ABSTRACT

In the face of exponential data growth, efficient management of data is crucial. NetApp provides a set of technologies to do more with less. These technologies allow for thin-provisioned storage: the ability to consolidate much more data on NetApp® storage controllers than would fit in the physically attached disks. This document explains how to achieve best-in-class storage use and how to manage thin-provisioned storage to enable storage efficiency in daily life while meeting service-level agreements.
Technical Report
Storage Efficiency Every Day: How to Achieve and Manage Best-in-Class Storage Use
Dr. Adolf Hohl, Georg Mey, NetApp, with support from the NetApp Field Centers for Innovation
October 2010 | RA-0007
TABLE OF CONTENTS
1 EXECUTIVE SUMMARY
2 INTRODUCTION
2.1 TERMINOLOGY
2.2 GOAL OF THIS DOCUMENT
2.3 AUDIENCE
2.4 SCENARIO
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
3 PROVISIONING
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
4 OPERATION
4.1 PHASES AND TRANSITIONS
4.2 MONITORING
4.3 NOTIFICATION
4.4 MITIGATE STORAGE USE
5 REAL-LIFE SETTINGS
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
6 STORAGE EFFICIENCY COOKBOOK
7 REFERENCES
8 ACKNOWLEDGMENTS
LIST OF TABLES
Table 1) NetApp technologies for storage efficiency and flexibility.
Table 2) Full fat provisioning.
Table 3) Zero fat provisioning.
Table 4) Full fat provisioning.
Table 5) Low fat provisioning.
Table 6) Zero fat provisioning.
Table 7) Comparison of provisioning methods.
Table 8) Mitigation alternatives to control use within aggregates.
Table 9) Mitigation activities for resource tightness within volumes.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
LIST OF FIGURES
Figure 1) Terminology in the context of the storage objects of volumes and aggregates.
Figure 2) Storage consolidation and growing utilization using thin provisioning.
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.
Figure 4) Mitigate to prevent uncontrolled utilization.
Figure 5) Sample service levels ordered by service disruption and recovery time.
Figure 6) Questions regarding storage efficiency from an operational point of view.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.
Figure 8) Provisioning model for SAN storage from scratch.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
Figure 17) Operations Manager screen to configure thresholds on operational metrics.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.
Figure 19) Storage efficiency dashboard in Operations Manager.
Figure 20) Configuring an alarm based on the threshold "aggregate almost full."
Figure 21) Storage to enable organic data growth between planned downtime windows.
Figure 22) Transition of changes depending on the metrics "aggregate capacity used" and "aggregate committed space."
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
Figure 24) Visualization of phase transitions depending on the metric "aggregate capacity used."
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
1 EXECUTIVE SUMMARY

This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.

Finally, this document presents real-life settings where high data consolidation is achieved by using NetApp storage efficiency technologies.
2 INTRODUCTION

Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space have made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.

NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:

• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency

NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth. They allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow the deferral of IT investments to the future.

In this document we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.

The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.
2.1 TERMINOLOGY

We use the following terminology to describe resource use on the level of exposing storage to applications and on the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.

• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by "capacity used."
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, performance, and configuration tools.
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level as a percentage.
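As a small illustration of how these two metrics differ (all numbers are hypothetical), the commitment rate and storage utilization of an aggregate can be computed as follows:

```shell
# Hypothetical aggregate: 10 TB usable capacity, thin-provisioned volumes
# whose configured sizes sum to 25 TB, of which 6 TB physically holds data.
usable_tb=10
committed_tb=25
used_tb=6

# Commitment rate: percentage of aggregate space committed to volumes.
commitment_rate=$(awk -v c="$committed_tb" -v u="$usable_tb" 'BEGIN { printf "%.0f", c / u * 100 }')

# Storage utilization: used capacity relative to usable capacity.
utilization=$(awk -v d="$used_tb" -v u="$usable_tb" 'BEGIN { printf "%.0f", d / u * 100 }')

echo "commitment rate: ${commitment_rate}%"     # 250%: thin provisioning at work
echo "storage utilization: ${utilization}%"     # 60%
```

A commitment rate far above 100% is expected and desirable with thin provisioning; it is the utilization metric that must be kept inside the operational sweet spot corridor.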
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define one interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot window. We define one interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time, or it can be considered an area where operational staff has less experience.

Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in the context of the storage objects of volumes and aggregates.
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
2.2 GOAL OF THIS DOCUMENT

The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.

The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.

To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.

• Provisioning phase: In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, in order to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes the slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
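A rough sketch of how the organic growth phase can be sized, using hypothetical numbers (in practice, Operations Manager provides a days-to-full trend, as shown in Figure 18):

```shell
# Hypothetical aggregate: estimate, under linear growth, how many days remain
# until capacity used crosses the upper bound of the sweet spot corridor.
used_tb=7.0              # capacity used today
corridor_upper_tb=8.5    # upper bound of the operational sweet spot corridor
growth_tb_per_day=0.05   # observed growth rate of capacity used

days_left=$(awk -v u="$used_tb" -v m="$corridor_upper_tb" -v g="$growth_tb_per_day" \
  'BEGIN { printf "%.0f", (m - u) / g }')

echo "days until the corridor upper bound is reached: $days_left"   # 30
```

If this estimate is shorter than the time to the next planned downtime or administration window, the mitigation phase must be entered early.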
Figure 2) Storage consolidation and growing utilization using thin provisioning.

Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate.

Figure 4) Mitigate to prevent uncontrolled utilization.
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support the decision making during the transitions between phases.
2.3 AUDIENCE

This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO

As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.

The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.

Predicting data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time.

[Figure 5 lists sample service levels ordered by disruption and recovery time, from best effort down to lowest: Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data), Bronze (production), Silver (production, low budget), Gold (production), and Platinum (production, premium customers).]
In this document the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.

We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and extends to promoting a continuous and smooth delivery of services.

The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.

• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions need to be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view.

[Figure 6 shows a cycle of Provision → Monitor → Notification → Mitigate. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone); where to provision to; which SLA; what are the defaults. Monitor: which tools; what to monitor. Notification: who is in charge to react; how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect it); the available options; implications on SLAs; when to act.]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility and to understand their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY

The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone® | Instantly creates thin-provisioned and space-efficient writable clones. | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space. | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources. It allows extending physical aggregates during operation. | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE

To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.

The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.

Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This enables high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.

In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.

TR-3827, "If You Are Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING

This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant for NAS and three variants are relevant for SAN storage:

• Full fat
• Low fat
• Zero fat

According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.

NAS

For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                Primary Data (Files & Directories) Space Allocation
                                Fat               Thin
Snapshot Copy       Fat         Full Fat Option   No Option
Space Allocation    Thin        No Option         Zero Fat Option

Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING

Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when reaching a certain volume threshold. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 2) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
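As a hedged sketch only: on a Data ONTAP 7-Mode controller, the Table 2 settings might be applied along the following lines. The volume name vol_nas1, aggregate aggr1, and all sizes and schedules are placeholders, not recommendations.

```
vol create vol_nas1 -s volume aggr1 1t    # space guarantee: volume
vol options vol_nas1 fractional_reserve 100
vol autosize vol_nas1 -m 1500g -i 50g on  # maximum and increment are illustrative
snap reserve vol_nas1 20                  # Snapshot reserve in percent
snap sched vol_nas1 0 2 6@8,12,16,20      # example automatic Snapshot schedule
snap autodelete vol_nas1 off
```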
ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes.

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when reaching a certain volume threshold. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using an SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
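Correspondingly, a zero fat NAS volume might be created as follows on a Data ONTAP 7-Mode controller (again, all names and sizes are placeholders); the only structural difference from the full fat sketch is the missing space guarantee:

```
vol create vol_nas2 -s none aggr1 1t      # no space guarantee (thin provisioned)
vol options vol_nas2 fractional_reserve 100
vol autosize vol_nas2 -m 1500g -i 50g on
snap reserve vol_nas2 20                  # or 0 if the Snapshot reserve area is omitted
snap sched vol_nas2 0 2 6@8,12,16,20
snap autodelete vol_nas2 off
```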
SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

                                Primary Data (LUN) Space Allocation
                                Fat               Thin
Snapshot Copy       Fat         Full Fat Option   No Option
Space Allocation    Thin        Low Fat Option    Zero Fat Option
FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP.

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
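As a small worked example of the sizing formulas (the numbers are hypothetical), the 2X + Δ sizing of full fat SAN volumes can be contrasted with the X + Δ sizing used for the NAS variants above:

```shell
# Hypothetical application: 500 GB of primary data (sum of LUN capacities)
# and 100 GB of expected Snapshot copy data.
X=500   # GB, primary data
D=100   # GB, Snapshot copy data (Δ)

x_plus_delta_gb=$((X + D))        # X + Δ sizing
san_full_fat_gb=$((2 * X + D))    # 2X + Δ: reserves full overwrite space

echo "X + Δ volume size:        ${x_plus_delta_gb} GB"    # 600 GB
echo "full fat SAN volume size: ${san_full_fat_gb} GB"    # 1100 GB
```

The doubling of X is the price of guaranteeing that every block of every LUN can be overwritten while Snapshot copies are in place.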
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 100 Even though it is technically possible, a fractional reserve below 100% carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize off Autosize could be used as an option to create free space needed for Snapshot copy creation
Volume Snapshot Options
reserve 0
schedule switched off
autodelete off
LUN Options
reservation enable
LOW FAT PROVISIONING
With low fat provisioning, volumes are provisioned in a more space-efficient way:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, blocks once allocated on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 0 Snapshot space is controlled by autodelete and autosize options
autosize on Turn autosize on
autosize options -m X -i Y The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try first volume_grow Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete on There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options volume oldest_first There is a precedence order determining which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation enable Reserves space for the LUN during creation
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none No space reservation for volume at all
fractional_reserve 0 As of Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
autosize on Turn autosize on
autosize options -m X -i Y The business model drives the maximum value (-m) for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try first volume_grow
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete off Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.
LUN Options
reservation disable No preallocation of blocks for LUN
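As a sketch, the recommended option settings of Tables 4 through 6 can be assembled into command lists. The Data ONTAP 7-Mode command strings and the autosize values (-m/-i) below are illustrative assumptions and should be verified against your Data ONTAP release:

```python
def provisioning_commands(method, vol, lun=None):
    """Map a provisioning method (full/low/zero fat) to the 7-Mode
    commands implementing the option settings of Tables 4 through 6.
    Volume/LUN names and autosize sizes are placeholders."""
    if method == "full":
        cmds = [f"vol options {vol} guarantee volume",
                f"vol options {vol} fractional_reserve 100"]
    elif method == "low":
        cmds = [f"vol options {vol} guarantee volume",
                f"vol options {vol} fractional_reserve 0",
                f"vol autosize {vol} -m 2t -i 10g on",
                f"snap autodelete {vol} on"]
    elif method == "zero":
        cmds = [f"vol options {vol} guarantee none",
                f"vol options {vol} fractional_reserve 0",
                f"vol autosize {vol} -m 2t -i 10g on"]
    else:
        raise ValueError(f"unknown method: {method}")
    # SAN best practice: no Snapshot reserve, no Snapshot schedule
    cmds += [f"snap reserve {vol} 0", f"snap sched {vol} 0 0 0"]
    if lun is not None:
        # LUN space reservation is enabled for full/low fat, disabled for zero fat
        state = "disable" if method == "zero" else "enable"
        cmds.append(f"lun set reservation /vol/{vol}/{lun} {state}")
    return cmds
```

Calling `provisioning_commands("zero", "vol1", "lun0")` yields the zero fat settings, including `lun set reservation ... disable`.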
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level; volumes grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
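The space consumption formulas compared above can be checked with a small helper; the sample capacities are illustrative:

```python
def space_consumption(method, lun_capacity, snap_delta, unused=0):
    """Space consumption per Table 7. X (lun_capacity) = sum of all LUN
    capacities in the volume, Δ (snap_delta) = Snapshot copy space,
    N (unused) = blocks logically allocated but not used."""
    x, d, n = lun_capacity, snap_delta, unused
    return {"full": 2 * x + d,          # 2X + Δ
            "low":  x + d,              # X + Δ
            "zero": x - n + d}[method]  # X − N + Δ

# 500 GB of LUNs, 100 GB Snapshot delta, 200 GB unused within the LUNs:
print(space_consumption("full", 500, 100))              # 1100
print(space_consumption("low", 500, 100))               # 600
print(space_consumption("zero", 500, 100, unused=200))  # 400
```

The example makes the efficiency gap concrete: zero fat consumes 400 GB where full fat would consume 1100 GB for the same primary data.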
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero fat-provisioned volume is done on demand, in theory the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum supported volume size depends on the storage controller model.
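As an illustration of the commitment rate metric, here is a minimal sketch; the formula (committed logical capacity as a percentage of aggregate capacity) is our working definition, not an official Operations Manager one:

```python
def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    """Committed (logical) capacity as a percentage of the aggregate.
    Values above 100% indicate overcommitment (thin provisioning)."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_size_gb

# Three 400 GB zero fat volumes sharing a 1000 GB aggregate:
rate = commitment_rate([400, 400, 400], 1000)
print(rate)  # 120.0 -> the aggregate is overcommitted by 20%
```

A rising commitment rate with stable physical use signals good data consolidation; a rate near 100% with high physical use signals that the volume sizing has little thin provisioning headroom left.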
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These commands pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows platforms, this can be configured in NetApp SnapDrive.

For Oracle database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• Instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; these savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. FlexClone block sharing links each instance's FlexVol volume to the template; deduplication block sharing operates within each FlexVol volume.
Impact on commitment and storage use: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically in Figure 12. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. Only when data in the clone is overwritten or new data is added by the application will the aggregate use grow.
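The commitment behavior described above can be sketched in a toy model; the class, method names, and sizes are illustrative assumptions, not a NetApp API:

```python
class Aggregate:
    """Toy model separating logical commitment from physical use in an
    aggregate, to illustrate the FlexClone behavior (sizes in GB)."""
    def __init__(self, size):
        self.size = size
        self.committed = 0  # sum of volume sizes (logical commitment)
        self.used = 0       # physically allocated blocks

    def provision_volume(self, vol_size, initially_used):
        self.committed += vol_size
        self.used += initially_used

    def clone_volume(self, vol_size):
        # Clone creation allocates only metadata: commitment rises,
        # physical use stays (almost) unchanged until the clone diverges.
        self.committed += vol_size

    def write_new_data(self, amount):
        self.used += amount  # clone divergence or new application data

aggr = Aggregate(1000)
aggr.provision_volume(300, initially_used=200)  # template volume
aggr.clone_volume(300)                          # FlexClone of the template
print(aggr.committed, aggr.used)                # 600 200
```

The printout shows the key point: cloning doubles the committed capacity while the physically used space is unchanged; only subsequent writes grow `used`.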
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings, achieved through the deduplication-centric storage layout and deduplication returns
• Instant storage efficiency savings, provided when cloning an application instance (for example, template application data) through a file/LUN FlexClone operation
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; however, the volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. Deduplication block sharing operates within each FlexVol volume, across the LUNs/qtrees of the template and instances it contains.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need to. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method for adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
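The corridor idea can be sketched as a simple selection rule; the watermarks, target, and nomad sizes are illustrative assumptions, not NetApp-prescribed values:

```python
def nomad_to_migrate(nomad_sizes_gb, used_gb, aggregate_gb,
                     high_water=85.0, target=70.0):
    """If aggregate use exceeds the high-water mark, return the smallest
    nomad whose migration brings use back to the target; otherwise None."""
    if 100.0 * used_gb / aggregate_gb <= high_water:
        return None  # still inside the sweet spot corridor
    excess = used_gb - target / 100.0 * aggregate_gb
    large_enough = [n for n in nomad_sizes_gb if n >= excess]
    # Fall back to the largest nomad if none is big enough on its own
    return min(large_enough) if large_enough else max(nomad_sizes_gb)

# 1000 GB aggregate at 90% use, nomads of 50/150/300 GB provisioned:
print(nomad_to_migrate([50, 150, 300], used_gb=900, aggregate_gb=1000))  # 300
```

Provisioning nomads of several sizes, as recommended above, is what gives this rule something to choose from: a small nomad handles a mild overshoot quickly, while a large one covers a severe growth scenario.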
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order). Instances with a high negative impact or outside the SLA (for example, all Fibre Channel-attached instances) are settled; instances with medium to low impact inside the SLA are nomads.
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order). Instances with the highest penalty costs remain settled, those with medium costs are semi-settled, and the rest are nomads.
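The two assessment steps can be condensed into a single ranking sketch; the instance names, impact scores, and the settled count are illustrative assumptions:

```python
def assess_instances(instances, settled_count):
    """instances: (name, negative_impact) pairs, where negative_impact
    combines the SLA (technical) and penalty-cost (business) views.
    The stickiest (highest-impact) instances stay settled; the rest
    become nomads and are candidates for DataMotion migration."""
    ranked = sorted(instances, key=lambda inst: inst[1], reverse=True)
    settled = [name for name, _ in ranked[:settled_count]]
    nomads = [name for name, _ in ranked[settled_count:]]
    return settled, nomads

apps = [("erp", 9), ("mail", 5), ("test", 1), ("web", 3)]
settled, nomads = assess_instances(apps, settled_count=2)
print(settled, nomads)  # ['erp', 'mail'] ['web', 'test']
```

A semi-settled tier, as in Figure 16, could be added by splitting the ranked list at a second cut point; the two-way split keeps the sketch minimal.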
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting from the beginning, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the tightness of aggregates outside planned downtime windows without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features of a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware Storage VMotion is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be preserved during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve the situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and some time might pass before their effect becomes evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient free space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or the link http://opsmgr-server:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgr-server:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgr-server:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
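The underlying days-to-full estimate can be sketched in a few lines: a linear regression over past capacity samples, extrapolated to the usable aggregate capacity (not to the aggregate full threshold). This is only an illustration of the principle; the sample data and function names are ours, not taken from Operations Manager.

```python
# Sketch of a days-to-full trend: least-squares fit of used capacity over
# time, extrapolated to the usable capacity of the aggregate.
def days_to_full(daily_used_gb, usable_capacity_gb):
    """Fit used = a + b*day and extrapolate to usable capacity."""
    n = len(daily_used_gb)
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in days)
    slope = cov / var                 # growth rate in GB/day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                   # no growth: trend never reaches "full"
    # Day at which the trend line crosses capacity, minus days already elapsed
    return (usable_capacity_gb - intercept) / slope - (n - 1)

# 90 days of linear growth, 10 GB/day starting at 1000 GB, 5000 GB usable
samples = [1000 + 10 * d for d in range(90)]
print(round(days_to_full(samples, 5000)))   # -> 311
```

Real growth is rarely this linear, which is why the report suggests comparing trends calculated over different intervals.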
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilized capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgr-server:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
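A minimal sketch of such an adapter in Python follows. The event field names (EVENT_NAME, EVENT_SOURCE) and the ticket format are placeholders of our own, not the documented Operations Manager interface; the point is only the glue pattern of translating an alarm into a payload for a downstream system.

```python
# Hypothetical notification adapter: Operations Manager would call a script
# like this via "dfm alarm create -s ...". Field names are assumptions.
def build_ticket(event):
    return {
        "summary": "Storage alarm: %s" % event.get("EVENT_NAME", "unknown"),
        "source": event.get("EVENT_SOURCE", "unknown object"),
        "queue": "storage-operations",   # route to the storage ops team
    }

# A real adapter would read the event data passed by Operations Manager and
# forward the payload to the ticketing system's API; here we only build it.
ticket = build_ticket({"EVENT_NAME": "aggregate-almost-full",
                       "EVENT_SOURCE": "aggr1"})
print(ticket["summary"])   # -> Storage alarm: aggregate-almost-full
```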
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore® and SnapMirror® licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and then the data must be migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
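The rows of Table 8 can be read as a shortlist filter: given how much SLA impact the situation tolerates and whether a downtime window is available, only some alternatives remain. The following sketch encodes that reading; the row names are abbreviated and the numeric ordering of impact levels is our own interpretation, not part of the report.

```python
# Table 8 condensed to (activity, SLA impact, needs downtime window).
ALTERNATIVES = [
    ("increase aggregate capacity by adding disks", "none", False),
    ("decrease aggregate Snapshot copy reserve",    "none", False),
    ("shrink other volumes in the aggregate",       "low",  False),
    ("run deduplication and shrink volumes",        "low",  False),
    ("migrate nomads online",                       "low",  False),
    ("migrate volumes to a different aggregate",    "high", True),
    ("stop the application and migrate",            "high", True),
]
IMPACT = {"none": 0, "low": 1, "high": 2}

def shortlist(max_sla_impact, downtime_window_available):
    """Return the activities the current situation still allows."""
    return [name for name, impact, needs_window in ALTERNATIVES
            if IMPACT[impact] <= IMPACT[max_sla_impact]
            and (downtime_window_available or not needs_window)]

# Outside a downtime window, only the low-impact online activities remain.
print(shortlist("low", downtime_window_available=False))
```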
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies not needed or those skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low/possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one makes use of neither online data migration nor the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
(Figure labels: data growth over time, in months, between two planned downtime windows.)
Note: Several months might pass between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
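The resulting decision logic of this setting can be sketched as a small function over the two metrics. The thresholds are the ones quoted in this section (50%/65% on capacity used, 110%/120% on space committed); the middle band is the assessment corridor, and the phase names are our shorthand.

```python
# Phase decision for sample setting 1, using the two aggregate metrics.
def phase(capacity_used_pct, space_committed_pct):
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "organic growth only; assess capacity"
    return "provision new storage"

print(phase(40, 90))    # -> provision new storage
print(phase(55, 100))   # -> organic growth only; assess capacity
print(phase(70, 125))   # -> mitigate in next planned downtime window
```

Note that either metric alone can push the aggregate out of the provisioning phase, which matches the "one or both thresholds" rule above.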
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
(Figure content: new storage is provisioned while aggregate capacity used is 0-50% and aggregate space committed is 0-110%; in between, capacity is assessed and thresholds are adapted; mitigation starts above 65% capacity used or 120% space committed.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
(Figure labels: settled and nomad data; detecting the need to act; effect of mitigation, e.g., migration, within hours.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, for example storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
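Because only one metric is involved, the table reduces to a simple threshold lookup. A sketch (the return strings paraphrase the mitigation column):

```python
# Phase transitions of sample setting 2, driven by a single metric:
# aggregate capacity used, in percent.
def mitigation(aggregate_capacity_used_pct):
    if aggregate_capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggregate_capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_capacity_used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"

print(mitigation(92))   # -> relax resource situation and migrate a nomad
```

The narrow 85-90% band is workable only because migrating a nomad takes hours rather than months.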
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
(Figure content: new storage is provisioned at 0-70% aggregate capacity used; already provisioned storage may still be extended at 70-85%; above 90%, utilization is relaxed by moving a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data can exceed the physically usable capacity several times over.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(Figure axes: capacity over elapsed time, with marks at 1 month and 3 months; curves for committed capacity and capacity used; overall trend and last-3-month trend lines; steps 1 to 3 marked.)
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the provided services. Operations Manager helps you understand the growth rate of the past.
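Steps b through d amount to a small calculation: the space that must stay free is the growth expected until the next planned downtime window, plus some headroom for the mitigation to show effect. A sketch with purely illustrative numbers; the function, the 5% headroom default, and the 80% cap (the starting point suggested in step a) are our own assumptions.

```python
# Working backward from growth rate and downtime interval to an aggregate
# full threshold, in percent of usable capacity.
def aggregate_full_threshold_pct(usable_tb, growth_tb_per_day,
                                 days_between_downtimes, headroom_pct=5.0):
    growth = growth_tb_per_day * days_between_downtimes  # expected growth (TB)
    threshold = 100.0 * (1.0 - growth / usable_tb) - headroom_pct
    # Never start above the conservative 80% suggested in step a.
    return max(0.0, min(80.0, threshold))

# 100 TB aggregate, 0.1 TB/day growth, downtime windows every 90 days:
# 9 TB must stay free, plus 5% headroom -> 86%, capped to 80%.
print(aggregate_full_threshold_pct(100, 0.1, 90))   # -> 80.0
```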
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially, size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to a zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as nomad candidates that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into a zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the reported aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
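Because days-to-full trending reports against 100% capacity used, it is worth recomputing the figure against your own red threshold. A hedged sketch (the function name and sample numbers are illustrative):

```python
def days_to_threshold(used_gb, size_gb, growth_gb_per_day,
                      threshold_pct=80.0):
    """Days until the aggregate crosses threshold_pct, not 100%.

    Operations Manager's days-to-full assumes a 100% target; an
    operational corridor with a lower red threshold leaves less time.
    """
    headroom_gb = size_gb * threshold_pct / 100.0 - used_gb
    if growth_gb_per_day <= 0:
        return float("inf")  # flat or shrinking: no deadline
    return max(headroom_gb, 0.0) / growth_gb_per_day

# 10 TB aggregate, 7 TB used, growing 25 GB/day, red threshold 80%:
print(days_to_threshold(7000, 10000, 25))  # → 40.0
```

The same aggregate trended against 100% would report 120 days, so planning against the corridor threshold shortens the reaction window considerably.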
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
LIST OF TABLES
Table 1) NetApp technologies for storage efficiency and flexibility 11
Table 2) Full fat provisioning 13
Table 3) Zero fat provisioning 14
Table 4) Full fat provisioning 16
Table 5) Low fat provisioning 16
Table 6) Zero fat provisioning 17
Table 7) Comparison of provisioning methods 18
Table 8) Mitigation alternatives to control use within aggregates 38
Table 9) Mitigation activities for resource tightness within volumes 38
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative 41
LIST OF FIGURES
Figure 1) Terminology in context of the storage objects of volumes and aggregates 6
Figure 2) Storage consolidation and growing utilization using thin provisioning 7
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate 7
Figure 4) Mitigate to prevent uncontrolled utilization 8
Figure 5) Sample service levels ordered by service disruption and recovery time 9
Figure 6) Questions regarding storage efficiency from an operational point of view 10
Figure 7) Provisioning model for NAS storage from scratch Technically only two out of four combinations are possible 13
Figure 8) Provisioning model for SAN storage from scratch 15
Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined; Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete. 20
Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined; Provisioning Manager deviates by not turning on autosize for zero fat. 21
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services 21
Figure 12) Volume-centric storage provisioning Application instances are aligned horizontally with their volumes 24
Figure 13) Dedupe-centric storage provisioning Application instances are aligned horizontally volumes are aligned vertically 26
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. 27
Figure 15) Alignment by technical impact (sorted by negative impact in descending order) 28
Figure 16) Alignment by business impact (sorted by negative impact in descending order) 28
Figure 17) Operations Manager screen to configure thresholds on operational metrics 32
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager 33
Figure 19) Storage efficiency dashboard in Operations Manager 34
Figure 20) Configuring an alarm based on the threshold aggregate almost full 36
Figure 21) Storage to enable organic data growth between planned downtime windows 39
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space 40
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months 41
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used 42
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe 43
1 EXECUTIVE SUMMARY
This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.
Finally, this document presents real-life settings where high data consolidation is achieved by using NetApp storage efficiency technologies.
2 INTRODUCTION
Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space have made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.
NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:
• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency
NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth: they allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and allow the deferral of IT investments to the future.
In this document we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.
The document is organized as follows:
• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.
2.1 TERMINOLOGY
We use the following terminology to describe resource use, both at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.
• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by capacity used.
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.
¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define the operational sweet spot corridor (green) as the interval where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow) where actions are taken to get back into the operational sweet spot window. Finally, we define a no-go area (red) where we do not intend to operate the aggregate. This area might act as a last buffer of time, or can be considered an area where operational staff has less experience.
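The three operational windows can be expressed as a simple classification over the utilization metric. The interval boundaries below are illustrative assumptions to be adapted per environment, not prescribed values:

```python
def corridor(used_pct, green_max=70.0, yellow_max=80.0):
    """Classify aggregate use into the operational windows.

    green : operational sweet spot corridor
    yellow: tolerance interval, trigger mitigation activities
    red   : no-go area, last buffer of time
    """
    if used_pct <= green_max:
        return "green"
    if used_pct <= yellow_max:
        return "yellow"
    return "red"

print(corridor(65.0), corridor(75.0), corridor(85.0))  # → green yellow red
```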
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in context of the storage objects of volumes and aggregates
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
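A commitment rate above 100% simply means that the sum of all volume sizes exceeds the aggregate's usable capacity. A sketch of the metric as defined above (names and numbers are illustrative, not Operations Manager internals):

```python
def commitment_rate(volume_sizes_gb, aggregate_usable_gb):
    """Percentage of aggregate space committed to volumes."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_usable_gb

# Five 3 TB thin-provisioned volumes on a 10 TB aggregate:
print(commitment_rate([3000] * 5, 10000))  # → 150.0
```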
2.2 GOAL OF THIS DOCUMENT
The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other side, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.
The difference in managing thin-provisioned storage compared to traditional storage is that, due to dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate
Figure 4) Mitigate to prevent uncontrolled utilization
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting the data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time
[Figure: four service levels ordered by disruption and recovery time, from Platinum (production, premium customers; lowest disruption and recovery time) through Gold (production) and Silver (production, low budget) to Bronze (best-effort services: dev/test, cold/fill-up data, dynamic/short-term data).]
In this document the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures in our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and promotes a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when to perform certain actions. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view
[Figure: the cycle Provision → Monitor → Notification → Mitigate, annotated with operational questions: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone); where to provision to; which SLA; what the defaults are; what to monitor and with which tools; what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect); who is in charge to react and how to notify; which mitigation options are available, their implications on SLAs, and when to act.]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility
Each technology is listed with the phase(s) in which its benefit applies:
• FlexClone: Instantly creates thin-provisioned and space-efficient writable clones. (Benefit during provisioning.)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Benefit during provisioning and operation.)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Benefit during provisioning and operation.)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments, without downtime. (Benefit during operation.)
• Aggregate extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Benefit during provisioning and operation.)
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate it) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations, for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.
[Figure: 2×2 matrix of primary data (files and directories) space allocation (fat/thin) versus Snapshot copy space allocation (fat/thin); full fat (fat/fat) and zero fat (thin/thin) are options, the mixed combinations are not.]
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning is the traditional (default) way to implement NFS/CIFS shares in NAS. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data runs low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 2) Full fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is then no artificially limited volume that needs to be monitored, and autosize allows growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value (-m) of the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Use the automatic Snapshot copy schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
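On a Data ONTAP 7-Mode system, the full fat NAS settings of Table 2 might be applied roughly as follows. This is a minimal sketch: the aggregate name (aggr1), volume name (nas_vol), sizes, autosize limits, and the Snapshot copy schedule are hypothetical placeholders to be replaced by values derived from your own sizing and SLAs.

```
vol create nas_vol aggr1 500g            # guarantee=volume is the default at creation
vol options nas_vol guarantee volume     # make the space guarantee explicit
vol autosize nas_vol -m 700g -i 25g on   # -m/-i are business-driven example values
snap reserve nas_vol 20                  # reserve 20% for Snapshot copies (depends on change rate)
snap sched nas_vol 0 2 6@8,12,16,20      # example: 2 nightly copies, 6 hourly copies at 8:00/12:00/16:00/20:00
snap autodelete nas_vol off              # keep Snapshot copies; deletion is an SLA decision
```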
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot copy data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data runs low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is then no artificially limited volume that needs to be monitored, and autosize allows growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value (-m) of the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | - | Autodelete is not recommended in most NAS environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot copy reserve area is omitted (no).
schedule | switched on | Use the automatic Snapshot copy schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
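The zero fat NAS recommendations of Table 3 differ from the full fat variant mainly in the space guarantee. A hedged 7-Mode sketch follows; volume and aggregate names, sizes, and the schedule are again hypothetical placeholders.

```
vol create nas_vol aggr1 500g
vol options nas_vol guarantee none       # allocate on demand; 500g is only a virtual container size
vol autosize nas_vol -m 1t -i 50g on     # example limits; derive -m/-i from the business model
snap reserve nas_vol 20                  # or 0 when the Snapshot copy reserve area is omitted
snap sched nas_vol 0 2 6@8,12,16,20      # schedule per file-service SLA
snap autodelete nas_vol off              # keep Snapshot copies for versioning/restores
```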
SAN
For SAN, we consider three options:
• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                 Primary Data (LUN) Space Allocation
                                 Fat                    Thin
Snapshot Copy        Fat         Full fat option        No option
Space Allocation     Thin        Low fat option         Zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot copy autodelete implementation have made full fat provisioning more or less obsolete. As of today, however, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these full fat settings. See the discussion of tools such as Provisioning Manager later in this section.
Table 4) Full fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |
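For reference, the historical full fat SAN setup of Table 4 could look like the following 7-Mode sketch. Names and sizes are hypothetical; the volume is sized at roughly 2X + Δ for a single 200g LUN.

```
vol create san_vol aggr1 420g               # ~2X + delta for a 200g LUN
vol options san_vol guarantee volume
vol options san_vol fractional_reserve 100  # guarantee Snapshot copy overwrite space
vol autosize san_vol off
snap reserve san_vol 0
snap sched san_vol 0 0 0                    # no controller-side Snapshot copy schedule
lun create -s 200g -t windows /vol/san_vol/lun0   # LUNs are space-reserved by default
```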
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value (-m) of the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the increase can be reverted afterward if the volume's free space grows again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
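Table 5 might translate into the following 7-Mode sketch. All names and sizes are hypothetical placeholders, and the autodelete settings must be negotiated against the backup SLAs mentioned above.

```
vol create san_vol aggr1 220g                     # ~X + delta for a 200g LUN
vol options san_vol guarantee volume
vol options san_vol fractional_reserve 0
vol options san_vol try_first volume_grow         # grow the volume before deleting Snapshot copies
vol autosize san_vol -m 400g -i 20g on            # example limits
snap reserve san_vol 0
snap sched san_vol 0 0 0
snap autodelete san_vol on
snap autodelete san_vol trigger volume            # act when the volume is nearly full
snap autodelete san_vol delete_order oldest_first
lun create -s 200g -t windows /vol/san_vol/lun0   # space-reserved LUN
```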
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value (-m) of the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow |
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
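A corresponding zero fat sketch for Table 6 in 7-Mode syntax; note the unreserved LUN. Names and sizes are again hypothetical.

```
vol create san_vol aggr1 220g
vol options san_vol guarantee none            # thin volume
vol options san_vol fractional_reserve 0      # modifiable without a guarantee as of Data ONTAP 7.3.3
vol options san_vol try_first volume_grow
vol autosize san_vol -m 400g -i 20g on
snap reserve san_vol 0
snap sched san_vol 0 0 0
snap autodelete san_vol off
lun create -s 200g -t windows -o noreserve /vol/san_vol/lun0   # thin LUN, no preallocation
```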
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which the volume belongs.
• Monitoring is needed only on the aggregate level; volumes grow on demand.
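Because zero fat shifts the monitoring focus to the aggregate, the free space situation can be checked with standard 7-Mode commands; aggr1 and san_vol are placeholder names.

```
df -Ah aggr1              # aggregate-level used/available space
aggr show_space -h aggr1  # how the volumes consume space inside the aggregate
df -h san_vol             # optional: block consumption of an individual volume
```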
Table 7) Comparison of provisioning methods
Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
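Template-based provisioning with volume cloning can be sketched with 7-Mode commands as follows. The names are hypothetical: tmpl_vol is the golden template volume, tmpl_base a consistent base Snapshot copy, and inst01 the volume for a new application instance.

```
snap create tmpl_vol tmpl_base                          # consistent base Snapshot copy of the template
vol clone create inst01 -s none -b tmpl_vol tmpl_base   # instant, space-efficient FlexClone volume
```

The -s none option keeps the clone from inheriting a full space guarantee, which is the prerequisite for space-efficient clones mentioned above.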
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. If a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: cloning the data of an application instance with FlexClone yields high instant savings, which might deteriorate over time.
• Long-term storage efficiency savings: deduplicating application data yields medium long-term savings.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. The figure shows a template FlexVol volume and the FlexVol volumes of instances 1 through n, each holding the LUNs/qtrees of a single application instance; FlexClone block sharing links the instance volumes to the template, and deduplication block sharing operates within each FlexVol volume.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be summarized as follows. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. Only when data in the clone is changed or new data is added by the application does the aggregate use grow.
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, and the deduplication process must be executed again to regain them. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of the template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, from template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
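Cloning a single storage object inside a shared volume can be sketched with the 7-Mode clone command (available with Data ONTAP 7.3.1 and later, given a FlexClone license). The volume and file names below are hypothetical examples for a virtualization scenario.

```
clone start /vol/vmstore/template.vmdk /vol/vmstore/vm01.vmdk   # file-level FlexClone of one virtual disk
clone status vmstore                                            # monitor running clone operations
```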
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that each such construct is created within an aggregate; the volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. The figure shows a template and instances 1 through n whose corresponding storage objects (LUNs/qtrees) are grouped vertically into shared FlexVol volumes, with deduplication block sharing operating within each FlexVol volume.
Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication ratio of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without a performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another while assuring their accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore® technology implements this feature using the vFiler® abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
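As a back-of-the-envelope illustration of the slicing idea above, the sketch below reserves nomad capacity equal to the growth expected in the settled part over its lifetime and splits it into slices of different sizes. The numbers and the weighting scheme are illustrative assumptions, not NetApp sizing guidance.

```python
def plan_nomads(aggregate_gb, settled_gb, growth_gb_per_month, lifetime_months, slices=4):
    """Split the growth headroom of an aggregate into nomad slices.

    The settled data plus its expected growth must fit into the aggregate;
    the headroom is provisioned as migratable nomads of different sizes so
    that small and large growth events can both be answered by a migration.
    All parameters are illustrative assumptions.
    """
    expected_growth = growth_gb_per_month * lifetime_months
    if settled_gb + expected_growth > aggregate_gb:
        raise ValueError("settled data will not fit for its whole lifetime")
    # Weighted split: slice k gets k shares, yielding several distinct sizes.
    weights = list(range(1, slices + 1))
    scale = expected_growth / sum(weights)
    return [round(w * scale, 1) for w in weights]

# 100 GB/month over 36 months -> 3600 GB of nomad capacity in four slices
sizes = plan_nomads(aggregate_gb=10000, settled_gb=6000,
                    growth_gb_per_month=100, lifetime_months=36)
```

Provisioning the slices in ascending sizes keeps a small nomad available for quick relief while the larger ones cover sustained growth.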
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
28 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because a migration in progress consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities is built mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially and take sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. This section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, forcing Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to the operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, the storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
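How the four thresholds partition the two aggregate metrics can be sketched as a small classifier. The percentage defaults below are placeholders, not recommended values, and the event names merely mimic the Operations Manager naming:

```python
def aggregate_events(used_pct, committed_pct,
                     nearly_full=80, full=90,
                     nearly_overcommitted=95, overcommitted=100):
    """Return the alarm-worthy events for one aggregate.

    used_pct:      aggregate block use in percent of usable capacity
    committed_pct: storage committed to applications in percent of capacity
    Threshold defaults are illustrative placeholders.
    """
    events = []
    if used_pct > full:
        events.append("aggregate-full")
    elif used_pct > nearly_full:
        events.append("aggregate-almost-full")   # earlier notification
    if committed_pct > overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

aggregate_events(85, 102)  # -> ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that the two metrics are evaluated independently: an aggregate can be overcommitted long before its blocks are actually used.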
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
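The days-to-full estimate described above can be approximated with an ordinary least-squares regression over past usage samples, extrapolated to the usable capacity. This is a simplified sketch of the idea, not the Operations Manager implementation:

```python
def days_to_full(daily_used_gb, capacity_gb):
    """Estimate days until an object of fixed capacity is full.

    Fits a least-squares line through daily usage samples (one per day)
    and extrapolates to the day on which usable capacity is exhausted,
    mirroring the trending idea described in the text.
    """
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var                     # growth in GB per day
    if slope <= 0:
        return None                       # flat or shrinking: no trend to full
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return day_full - (n - 1)             # days counted from the last sample

days_to_full([100, 110, 121, 130, 139], capacity_gb=1000)  # about 88 days
```

As the note below states for Operations Manager, the extrapolation target here is the full usable capacity, not a threshold value.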
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
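A minimal adapter might look like the sketch below. The DFM_* environment variable names are assumptions chosen for illustration; the variables actually passed to an alarm script are documented with your Operations Manager version, as is the ticketing interface you would forward the record to.

```python
#!/usr/bin/env python
"""Sketch of a notification adapter started by 'dfm alarm create -s ...'."""
import os
import time

def format_ticket(env):
    """Turn the (assumed) event environment into a one-line ticket record."""
    event = env.get("DFM_EVENT_NAME", "unknown-event")      # assumed variable name
    source = env.get("DFM_SOURCE_NAME", "unknown-object")   # assumed variable name
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s event=%s object=%s" % (stamp, event, source)

if __name__ == "__main__":
    # A real adapter would hand this line to the ticketing system;
    # printing it keeps the sketch side-effect free.
    print(format_ticket(os.environ))
```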
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
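The two cases can be told apart mechanically. The helper below is a toy decision rule under the assumptions stated in its comments; the 10% low-water mark is a placeholder, not a recommendation.

```python
def mitigation_level(volume_free_pct, volume_can_grow, aggregate_free_pct,
                     low_water_pct=10):
    """Decide on which level a mitigation activity is needed.

    A zero fat volume that can still grow only becomes tight when the
    aggregate's shared free block pool runs dry; a fixed-size volume can
    become tight on its own. Percentages are free space relative to size.
    """
    if aggregate_free_pct < low_water_pct:
        return "aggregate"   # shared pool tight: aggregate-level activities
    if not volume_can_grow and volume_free_pct < low_water_pct:
        return "volume"      # fixed-size container tight: volume-level activities
    return "none"

mitigation_level(volume_free_pct=5, volume_can_grow=False, aggregate_free_pct=40)  # -> 'volume'
```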
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the clients need to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.
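For mitigation activity 6, the SnapMirror-based switch-over can be sketched as an ordered command plan. The command strings follow the classic Data ONTAP 7-mode snapmirror CLI, but treat them as an assumption and verify the exact syntax against your Data ONTAP release; the function only builds a dry-run list:

```python
def volume_migration_plan(src_controller, dst_controller, volume):
    """Build the command sequence for an offline volume move (activity 6).

    The snapmirror syntax mirrors the classic Data ONTAP 7-mode CLI and
    should be verified against the release in use; this is a dry run only.
    """
    src = "%s:%s" % (src_controller, volume)
    dst = "%s:%s" % (dst_controller, volume)
    return [
        "snapmirror initialize -S %s %s" % (src, dst),  # baseline while data is served
        "snapmirror update -S %s %s" % (src, dst),      # incremental catch-up
        "-- detach clients from %s --" % src,           # start of short downtime
        "snapmirror update -S %s %s" % (src, dst),      # final, very small delta
        "snapmirror break %s" % dst,                    # replica becomes writable
        "-- reattach clients to %s --" % dst,           # end of downtime
    ]

plan = volume_migration_plan("filer1", "filer2", "vol_app1")
```

Keeping the client-visible downtime to the last three steps is what makes the few-minutes switch-over described above possible.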
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity instead.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate its data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
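The transition logic of this sample setting can be condensed into a small decision function. The thresholds (50%/65% capacity used, 110%/120% space committed) are the customer-specific values quoted in this section, not general recommendations:

```python
def phase(used_pct, committed_pct):
    """Phase decision for sample setting 1.

    used_pct:      aggregate capacity used (percent)
    committed_pct: aggregate space committed (percent of capacity)
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity, adapt thresholds"
    return "provision new storage"

phase(55, 100)  # -> 'stop provisioning; assess capacity, adapt thresholds'
```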
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.
(Figure 22 depicts the phase transitions: in the operational sweet spot corridor, with aggregate capacity used at 0–50% and aggregate space committed at 0–110%, new storage is provisioned. Above 50% used or 110% committed, provisioning of new storage stops and capacity is assessed and the thresholds are adapted. Above 65% used or 120% committed, mitigation is triggered.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and, thanks to vFiler technology, allows migrating nomad data flexibly and in a timely manner. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
(Figure 23 shows settled and nomad data in an aggregate: once the need to act is detected, a mitigation alternative such as migrating a nomad takes effect within hours rather than months, so the corridor can be kept narrow.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained through online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as the mitigation alternative.
Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of new storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax the resource situation and migrate a nomad
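The transitions in Table 10 can be expressed as a lookup on the single metric. Again, this is only a sketch; the function name and the returned action labels are illustrative, while the 70/85/90 thresholds are those from the table.

```python
def nomad_action(capacity_used_pct):
    """Return the Table 10 action for an aggregate's 'capacity used' value."""
    if capacity_used_pct > 90:
        return 'migrate nomad'       # relax the resource situation online
    if capacity_used_pct > 85:
        return 'stop extending'      # already provisioned storage may not grow
    if capacity_used_pct > 70:
        return 'stop provisioning'   # no new storage in this aggregate
    return 'provision'               # operational sweet spot corridor

print(nomad_action(65))  # provision
print(nomad_action(92))  # migrate nomad
```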
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
(Figure 24 visualizes the transitions: up to 70% aggregate capacity used, new storage is provisioned; between 70% and 85%, no new storage is provisioned but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity severalfold.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(Figure 25 plots committed capacity and capacity used over elapsed time, with an overall trend and a last-3-month trend; the marks 1, 2, and 3 correspond to the steps described below.)
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and the unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
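Steps b through d can be combined into a back-of-the-envelope calculation: the level at which provisioning must stop is the comfort level minus the organic growth expected before the next planned downtime window. A sketch with illustrative names and sample numbers:

```python
def provisioning_stop_threshold(comfort_pct, daily_growth_pct, days_to_next_downtime):
    """Aggregate 'capacity used' level at which to stop provisioning so that
    organic growth (daily_growth_pct percentage points per day) stays below
    comfort_pct until the next planned downtime window."""
    headroom_pct = daily_growth_pct * days_to_next_downtime
    return max(comfort_pct - headroom_pct, 0.0)

# 80% comfort level, 0.05 percentage points of growth per day,
# 90 days between planned downtime windows:
print(provisioning_stop_threshold(80, 0.05, 90))  # 75.5
```

If the result comes out uncomfortably low, the inputs say the corridor is too narrow for the observed growth rate: either shorten the interval between downtime windows or plan a mitigation alternative earlier.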
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try out deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a recurring time frame of low activity in which to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as nomad candidates that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
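This last point matters in practice: because the reported days-to-full figure is computed against 100% capacity used, the time until a lower operational threshold is crossed is proportionally shorter. A sketch (illustrative names) assuming simple linear growth:

```python
def days_until(threshold_pct, capacity_used_pct, daily_growth_pct):
    """Days until 'capacity used' reaches threshold_pct, assuming linear growth
    of daily_growth_pct percentage points per day."""
    if daily_growth_pct <= 0:
        return float('inf')  # no growth: the threshold is never reached
    return (threshold_pct - capacity_used_pct) / daily_growth_pct

# An aggregate at 60% capacity used, growing 0.25 percentage points per day:
print(days_until(100, 60, 0.25))  # 160.0 -> the figure reported against 100%
print(days_until(70, 60, 0.25))   # 40.0  -> actual time left to a 70% threshold
```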
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
LIST OF FIGURES
Figure 1) Terminology in context of the storage objects of volumes and aggregates 6
Figure 2) Storage consolidation and growing utilization using thin provisioning 7
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate 7
Figure 4) Mitigate to prevent uncontrolled utilization 8
Figure 5) Sample service levels ordered by service disruption and recovery time 9
Figure 6) Questions regarding storage efficiency from an operational point of view 10
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible. 13
Figure 8) Provisioning model for SAN storage from scratch. 15
Figure 9) Configuring full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete. 20
Figure 10) Configuring full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat. 21
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services. 21
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. 24
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. 26
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. 27
Figure 15) Alignment by technical impact (sorted by negative impact in descending order). 28
Figure 16) Alignment by business impact (sorted by negative impact in descending order). 28
Figure 17) Operations Manager screen to configure thresholds on operational metrics 32
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager 33
Figure 19) Storage efficiency dashboard in Operations Manager 34
Figure 20) Configuring an alarm based on the threshold aggregate almost full 36
Figure 21) Storage to enable organic data growth between planned downtime windows 39
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space 40
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months 41
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used 42
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe 43
1 EXECUTIVE SUMMARY
This document provides consolidated best practices to achieve and manage best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.
Finally, this document presents real-life settings in which high data consolidation is achieved by using NetApp storage efficiency technologies.
2 INTRODUCTION
Exponential data growth generates a serious challenge for IT managers. Gartner predicts that in the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.
NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:
• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency
NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth. They allow storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow deferring IT investments into the future.
In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined here, you can increase storage consolidation and agility as well as decrease operational risk.
The document is organized as follows:

• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.
2.1 TERMINOLOGY
We use the following terminology to describe resource use at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.
• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by "capacity used."
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.
¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.
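The rates defined above follow directly from their definitions. The sketch below (the function names are ours, not Operations Manager's) also illustrates that the commitment rate can exceed 100% with thin provisioning:

```python
def storage_utilization_pct(used_capacity, usable_capacity):
    """Used capacity as a percentage of usable capacity (no efficiency returns)."""
    return 100.0 * used_capacity / usable_capacity

def commitment_rate_pct(committed_capacity, usable_capacity):
    """Percentage of aggregate space committed to volumes; may exceed 100%."""
    return 100.0 * committed_capacity / usable_capacity

# A 10TB aggregate holding 4TB of data, with 14TB committed to thin volumes:
print(storage_utilization_pct(4, 10))  # 40.0
print(commitment_rate_pct(14, 10))     # 140.0
```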
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define the operational sweet spot corridor (green) as the interval in which the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow) in which actions are taken to get back into the operational sweet spot corridor. And we define a no-go area (red) in which we do not intend to operate the aggregate; this area might act as a last buffer of time, or can be considered an area where operational staff has less experience.
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in context of the storage objects of volumes and aggregates
(Figure 1 depicts an aggregate: the usable capacity of the aggregate contains volumes with LUNs/NAS data; the committed logical storage exceeds the usable capacity; and data growth moves the used capacity through the operational sweet spot corridor.)
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
2.2 GOAL OF THIS DOCUMENT
The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.
The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to level storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases for managing the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, in order to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate
Figure 4) Mitigate to prevent uncontrolled utilization
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:

• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes with their existing operations organization.
• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This translates directly into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time
(Figure 5 orders sample service levels by service disruption and recovery time: Bronze covers best-effort services such as dev/test, cold/fill-up data, and dynamic/short-term data; Silver covers low-budget production; Gold covers production; and Platinum covers production for premium customers, with the lowest disruption and recovery times.)
In this document, the focus is on the operational aspects of storage efficiency technologies for achieving data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and aims at promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). The relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager are described for delivering information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization that affects operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view
(Figure 6 arranges these questions in a cycle of Provision, Monitor, Notification, and Mitigate: how best to provision for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or from template/clone), where to provision to, which SLA, and what the defaults are; which tools to use and what to monitor; who is in charge of reacting and how to notify; and what is critical: when to stop provisioning, when to stop extending, when to relax tightness, how to detect it, the available options, the implications on SLAs, and when to act.)
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring their most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility (benefit during provisioning, during operation, or both).

• FlexClone: Instantly creates thin-provisioned and space-efficient writable clones. (Provisioning)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Provisioning and operation)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Provisioning and operation)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments, without downtime. (Operation)
• Aggregate extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed here and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Provisioning and operation)
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of the physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve the maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant to NAS and three variants are relevant to SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.

• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                  Primary data (files & directories) space allocation
                                  Fat                  Thin
Snapshot copy space    Fat        Full fat option      No option
allocation             Thin       No option            Zero fat option
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also helps when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 2) Full fat provisioning.

Option | Recommended Value | Notes
Volume options: | |
guarantee | volume |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot options: | |
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
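As a sketch, the full fat NAS settings above could be applied to an existing volume with Data ONTAP 7-Mode commands such as the following; the volume name and the size, increment, reserve, and schedule values are placeholders, not recommendations from this document:

```shell
# Full fat NAS volume: space guarantee, Snapshot reserve, autosize on, no autodelete
vol options vol_nas1 guarantee volume
vol options vol_nas1 fractional_reserve 100
vol autosize vol_nas1 -m 1200g -i 50g on   # -m maximum size, -i growth increment
snap reserve vol_nas1 20                   # percent of the volume hidden for Snapshot copies
snap sched vol_nas1 0 2 6@8,12,16,20       # weekly/nightly/hourly Snapshot schedule
snap autodelete vol_nas1 off
```

The concrete Snapshot reserve and schedule must follow the SLAs defined for the file service.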
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume options: | |
guarantee | none |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot options: | |
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
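A minimal 7-Mode sketch of the zero fat NAS settings differs from the full fat variant mainly in the missing space guarantee; again, the volume name and the size values are placeholders:

```shell
# Zero fat NAS volume: no space guarantee, autosize on, scheduled Snapshot copies kept
vol options vol_nas2 guarantee none
vol autosize vol_nas2 -m 2t -i 100g on     # virtual size may exceed current aggregate free space
snap reserve vol_nas2 20                   # optional; may be omitted (reserve 0)
snap sched vol_nas2 0 2 6@8,12,16,20
snap autodelete vol_nas2 off
```

Because nothing is preallocated, the aggregate, not the volume, must be monitored for free space.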
SAN
For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                  Primary data (LUN) space allocation
                                  Fat                  Thin
Snapshot copy space    Fat        Full fat option      No option
allocation             Thin       Low fat option       Zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume options: | |
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 carries a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot options: | |
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN options: | |
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume options: | |
guarantee | volume |
fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot options: | |
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst case, deleting Snapshot copies is not an option.
autodelete options | volume / oldest_first | There is an order of precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN options: | |
reservation | enable | Reserves space for the LUN during creation.
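A 7-Mode sketch of the low fat SAN settings could look as follows; the volume and LUN names, sizes, and the LUN type are placeholders chosen for illustration:

```shell
# Low fat SAN volume: guaranteed volume and LUN, Snapshot space on demand
vol options vol_san1 guarantee volume
vol options vol_san1 fractional_reserve 0
vol autosize vol_san1 -m 800g -i 20g on
vol options vol_san1 try_first volume_grow        # grow before deleting Snapshot copies
snap reserve vol_san1 0                           # no Snapshot reserve for SAN volumes
snap sched vol_san1 0 0 0                         # no scheduled Snapshot copies
snap autodelete vol_san1 on
snap autodelete vol_san1 trigger volume
snap autodelete vol_san1 delete_order oldest_first
lun create -s 500g -t windows /vol/vol_san1/lun0  # LUN is space-reserved by default
```

Application-consistent Snapshot copies are then typically driven by SnapManager or backup tooling rather than by schedules on the controller.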
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Option | Recommended Value | Notes
Volume options: | |
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | As of Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |
Volume Snapshot options: | |
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.
LUN options: | |
reservation | disable | No preallocation of blocks for the LUN.
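The zero fat SAN variant differs from low fat mainly in the two missing guarantees; a 7-Mode sketch with placeholder names and sizes:

```shell
# Zero fat SAN volume: nothing preallocated, everything allocated on demand
vol options vol_san2 guarantee none
vol options vol_san2 fractional_reserve 0
vol autosize vol_san2 -m 2t -i 100g on
vol options vol_san2 try_first volume_grow
snap reserve vol_san2 0
snap autodelete vol_san2 off
lun create -s 500g -t windows -o noreserve /vol/vol_san2/lun0   # thin LUN (-o noreserve)
```

With this configuration, monitoring and a notification/mitigation process on the aggregate level are mandatory, as Table 7 points out.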
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional, in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
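The space-consumption formulas in Table 7 can be compared numerically. The sketch below uses illustrative values for X, N, and Δ that are not taken from the document:

```python
def full_fat(x, delta):
    """Full fat: the overwrite (fractional) reserve doubles the primary data X."""
    return 2 * x + delta

def low_fat(x, delta):
    """Low fat: preallocated primary data plus Snapshot copy space."""
    return x + delta

def zero_fat(x, n, delta):
    """Zero fat: only the used blocks (X - N) plus Snapshot copy space."""
    return x - n + delta

# Illustrative values: 500 GB of LUN capacity, 200 GB of it unused, 50 GB Snapshot data
x, n, delta = 500, 200, 50
print(full_fat(x, delta), low_fat(x, delta), zero_fat(x, n, delta))  # 1050 550 350
```

The gap between 1050 GB and 350 GB for the same primary data illustrates why full fat should be avoided for SAN wherever possible.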
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero-fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
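To illustrate, the commitment rate of an aggregate can be derived from the provisioned (virtual) volume sizes and the aggregate capacity. A minimal sketch with made-up numbers:

```python
def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    """Ratio of the summed provisioned (virtual) volume sizes to the aggregate size.
    Values above 1.0 (100%) indicate overcommitment, i.e. the logical data
    consolidation achieved by thin provisioning."""
    return sum(volume_sizes_gb) / aggregate_size_gb

# Three thin-provisioned 400 GB volumes on a 1000 GB aggregate
rate = commitment_rate([400, 400, 400], 1000)
print(f"{rate:.0%}")  # 120%
```

A rising commitment rate with stable aggregate use signals good consolidation; a rate near 100% with high aggregate use signals that data should be left for organic growth.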
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of an application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; these savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. The figure shows a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing its LUNs/qtrees with deduplication block sharing within the volume, and FlexClone block sharing between the template and the instance volumes.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. Only when data in the clone is changed or new data is added by the application does the aggregate use grow.
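This behavior can be sketched numerically: cloning raises the aggregate's commitment immediately, while physical use grows only as the clone diverges. All numbers below are illustrative:

```python
def utilization(provisioned_gb, used_gb, aggregate_gb):
    """Return (commitment rate, physical use) of an aggregate as fractions."""
    return provisioned_gb / aggregate_gb, used_gb / aggregate_gb

aggregate_gb = 1000
template_gb = 200  # template volume, fully written

# Before cloning: 20% committed, 20% used
before = utilization(template_gb, template_gb, aggregate_gb)

# A FlexClone volume adds a 200 GB virtual copy that shares all blocks:
# commitment doubles, physical use stays (almost) unchanged
after_clone = utilization(template_gb + 200, template_gb, aggregate_gb)

# The application then changes/adds 50 GB in the clone: physical use grows
after_changes = utilization(template_gb + 200, template_gb + 50, aggregate_gb)

print(before, after_clone, after_changes)  # (0.2, 0.2) (0.4, 0.2) (0.4, 0.25)
```

The gap between commitment and physical use is exactly the FlexClone saving; monitoring the aggregate tracks when that gap closes.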
Best practice: A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also has a temporarily counterproductive effect on the deduplication savings; the deduplication process must be executed again to regain them. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, the data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of the application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of the template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: Instant storage efficiency savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, of template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within one aggregate; volumes can also be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
(Figure content: a template and two instances each contribute a LUN/qtree to several FlexVol volumes; deduplication block sharing operates within each FlexVol volume.)
Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. Aggregate usage grows with the provisioning and object usage within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
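The commitment arithmetic behind this layout can be sketched in a few lines; the helper functions and sizes are illustrative assumptions, not output of any NetApp tool:

```python
# Sketch of the commitment arithmetic for a dedupe-centric layout.
# Helper names and sizes are illustrative, not Data ONTAP output.

def overcommitment_rate(aggregate_size_gb, volume_sizes_gb):
    """Committed space (sum of FlexVol sizes) relative to physical capacity."""
    return sum(volume_sizes_gb) / aggregate_size_gb

def block_usage(aggregate_size_gb, used_gb):
    """Fraction of physical blocks actually consumed in the aggregate."""
    return used_gb / aggregate_size_gb

# A 1000 GB aggregate carrying five thin-provisioned 400 GB volumes is
# 200% committed; cloning a LUN inside one of these volumes with a
# file/LUN FlexClone operation adds nothing to the committed space.
print(overcommitment_rate(1000, [400] * 5))  # 2.0
print(block_usage(1000, 350))                # 0.35
```

The point of the sketch is that commitment is fixed at volume creation time, while block usage moves only with stored data.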
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually run similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as page and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings on such data are limited by its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganization. Because of the way NetApp storage controllers work, fragmented client data is served without a performance penalty, so such realignment yields no benefit while reducing deduplication savings.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant way to relax the usage of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
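The sizing reasoning above can be illustrated with a small planning sketch; the function, sizes, and growth rate are hypothetical and only model the idea of migrating the smallest resident nomad whenever the aggregate leaves its corridor:

```python
# Planning sketch for settled/nomad sizing: whenever projected usage
# leaves the corridor, the smallest resident nomad is migrated away.
# Sizes, growth rate, and the corridor bound are illustrative assumptions.

def migration_plan(aggregate_gb, settled_gb, nomads_gb, growth_gb_per_month,
                   months, upper_bound=0.85):
    """Return (month, nomad_size_gb) pairs telling when a nomad must leave."""
    plan = []
    resident = sorted(nomads_gb)                 # migrate small nomads first
    for month in range(1, months + 1):
        used = settled_gb + growth_gb_per_month * month + sum(resident)
        while resident and used / aggregate_gb > upper_bound:
            gone = resident.pop(0)
            used -= gone
            plan.append((month, gone))
    return plan

# 10 TB aggregate: 4 TB settled data growing 200 GB/month plus three nomads.
print(migration_plan(10000, 4000, [1000, 1500, 2000], 200, 24))
# [(1, 1000), (6, 1500), (13, 2000)]
```

Varying the nomad sizes in such a model shows how slicing the migratable entities controls how long the aggregate stays inside its use interval.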
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
(Figure content: an aggregate holding a settled part and two nomads.)
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block usage of an aggregate. The usage of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled), while applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
(Figure content: instances Inst1 to InstN sorted from high negative impact/outside SLA to low negative impact/inside SLA; the high-impact instances, e.g., all FC-attached, are settled, the rest are nomads.)
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
(Figure content: the remaining instances sorted by penalty cost; the costliest are semi-settled, the rest are nomads.)
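The two alignment steps can be reduced to a simple ranking sketch; the instance names, impact scores, and quota parameter are invented for illustration:

```python
# Ranking sketch for the settled/nomad assessment: instances with the
# highest negative migration impact are the stickiest and stay settled.
# Instance names and impact scores are invented for illustration.

def classify(instances, settled_quota):
    """instances maps name -> negative-impact score (higher = stickier)."""
    ranked = sorted(instances, key=instances.get, reverse=True)
    return {"settled": ranked[:settled_quota],
            "nomad": ranked[settled_quota:]}

apps = {"erp-db": 9, "exchange": 6, "web-cache": 2, "test-env": 1}
print(classify(apps, 2))
# {'settled': ['erp-db', 'exchange'], 'nomad': ['web-cache', 'test-env']}
```

A real assessment would combine technical constraints (for example, FC attachment) with penalty costs rather than a single score.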
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the utilization of storage controllers in a high-availability configuration so that the remaining controller can handle the load in the case of a failover. Doing so leaves enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for convenient offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends considering the settled/nomad setting from the start, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. The section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to preserve operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. The remaining time determines which mitigation alternatives can still be considered at a given moment.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available.
• Running too tight on storage. Over time, applications use more and more of the blocks committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − insufficient space within the volume in which the storage object is contained
  − insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should transition to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After further thresholds are exceeded, inspection or mitigation activities must be performed to address storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
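The phase transitions can be expressed as a small decision function; the threshold values below are illustrative assumptions, not NetApp defaults:

```python
# Sketch of the operational loop's phase decision. The threshold values
# are illustrative assumptions, not NetApp defaults.

def phase(used_fraction, provision_limit=0.5, mitigate_limit=0.85):
    """Map aggregate block usage to the current operational phase."""
    if used_fraction < provision_limit:
        return "provision"          # still room to place new storage
    if used_fraction < mitigate_limit:
        return "organic-growth"     # leave existing storage room to grow
    return "mitigate"               # trigger a mitigation activity

for usage in (0.30, 0.70, 0.90):
    print(usage, phase(usage))
```

In practice, the limits would be derived from the monitoring thresholds discussed in the next section.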
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters fall within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications it serves.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage committed to applications; it represents the level of consolidation as well as the width and growth of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity; this value is not calculated based on the aggregate full threshold setting.
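The days-to-full estimate can be reproduced with an ordinary least-squares sketch; the usage samples below are made up, and the function only models the idea, not Operations Manager's exact calculation:

```python
# Sketch of a days-to-full estimate via linear regression on daily usage
# samples, mirroring the idea of trending over up to 90 days of history.
# The sample data is invented for illustration.

def days_to_full(samples_gb, capacity_gb):
    """Least-squares growth rate in GB/day; None if usage is not growing."""
    n = len(samples_gb)
    mean_x = (n - 1) / 2
    mean_y = sum(samples_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_gb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    if slope <= 0:
        return None
    return (capacity_gb - samples_gb[-1]) / slope

# Ten days of samples growing 5 GB/day against a 1000 GB aggregate.
usage = [800 + 5 * d for d in range(10)]
print(days_to_full(usage, 1000))  # 31.0
```

Comparing the estimate over different sample windows corresponds to the recommendation to check whether growth rates over different intervals deviate significantly.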
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden with more specific values. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm; adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
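A script adapter can be as simple as a formatter that turns event details into a ticket payload. The sketch below is hypothetical: the argument layout, queue name, and urgency rule are assumptions, and how Operations Manager actually hands event details to the script must be taken from its documentation:

```python
# Hypothetical notification adapter for the "notify by script" path.
# The ticket fields and the urgency rule are assumptions; consult the
# Operations Manager documentation for what is actually passed to a script.
import json

def make_ticket(event, source):
    """Format an Operations Manager event as a ticket payload."""
    return json.dumps({
        "summary": f"{event} on {source}",
        "queue": "storage-ops",                      # assumed ticket queue
        "urgency": "high" if "full" in event else "normal",
    })

# Example: the payload a wrapper script would forward to a ticketing system.
print(make_ticket("aggregate-almost-full", "aggr1"))
```

Such an adapter keeps the mapping from detected situation to responsible group in one place, outside Operations Manager itself.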
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling usage within its defined corridor; the effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this case, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their contents are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other objects can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client must detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation incurs client downtime. Typically, inter-data center bandwidth allows synchronizing the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Low limits (Data ONTAP 7.x); high limits (Data ONTAP 8) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes; vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes; volume switch-over time
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes; migration time
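The constraints summarized in the table above can be encoded to shortlist viable alternatives for a given operational situation. The following is a minimal sketch, not a NetApp tool: the per-row encoding (for example, which activities need a downtime window) is a simplification of the table, and all names are illustrative.

```python
# Hypothetical encoding of the aggregate mitigation alternatives:
# (number, activity, needs_downtime_window)
AGGREGATE_MITIGATIONS = [
    (1, "add disks to the aggregate", False),
    (2, "decrease aggregate Snapshot copy reserve", False),
    (3, "shrink other volumes in the aggregate", False),
    (4, "run deduplication and shrink volumes", False),
    (5, "migrate nomads online", False),
    (6, "migrate volumes to another aggregate offline", True),
    (7, "stop application, then migrate offline", True),
]

def candidate_mitigations(downtime_possible, online_migration_available):
    """Filter the alternatives by two operational constraints:
    whether a downtime window can be used, and whether vFiler-based
    online migration (alternative 5) is available."""
    out = []
    for no, activity, needs_downtime in AGGREGATE_MITIGATIONS:
        if needs_downtime and not downtime_possible:
            continue
        if no == 5 and not online_migration_available:
            continue
        out.append(activity)
    return out
```

For example, an environment without MultiStore/SnapMirror licensing and without a near-term downtime window is left with alternatives 1 to 4.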
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes; volume migration time
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes; migration time
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth within an aggregate over a time span of months, bounded by two planned downtime windows.]
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by an alarm on the Operations Manager event "aggregate nearly full" (event configured when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
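The transition logic of this setting can be sketched in a few lines of code. The threshold values (50% and 65% for aggregate capacity used, 110% and 120% for aggregate space committed) come from this sample setting; the function name is illustrative and this is not an Operations Manager API.

```python
def setting1_action(capacity_used_pct, space_committed_pct):
    """Decide the operational action for sample setting 1.

    Provisioning stops when aggregate capacity used exceeds 50% or
    committed space exceeds 110%; mitigation is planned for the next
    downtime window when capacity used exceeds 65% or committed
    space exceeds 120%.
    """
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning, assess capacity, adapt thresholds"
    return "provision new storage"
```

Either metric alone is enough to trigger a transition, which matches the "one or both thresholds" rule described above.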
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [Figure: decision table. While aggregate capacity used is 0–50% and aggregate space committed is 0–110%, new storage is provisioned. Above these thresholds, capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% committed space, mitigation is triggered.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: settled and nomad data within an aggregate; after the need to act is detected, an online mitigation such as a nomad migration takes effect within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The "days to full" aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
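Expressed as code, these phase transitions reduce to a single-metric classification. A minimal sketch, assuming the thresholds of this sample setting (70%, 85%, 90%); the function name is illustrative.

```python
def setting2_actions(capacity_used_pct):
    """Return the actions in force at a given aggregate fill level,
    following the thresholds of sample setting 2 (Table 10)."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if capacity_used_pct > 90:
        actions.append("migrate a nomad to relax utilization")
    return actions
```

The restrictions are cumulative: above 90% capacity used, all three apply at once.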
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

Aggregate capacity used | 0–70% | 70–85% | > 90%
Provision new storage | Y | N | N
Extend already provisioned storage | Y | Y | N
Relax utilization (NetApp Data Motion of a nomad) | N | N | Y
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of the aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used plotted over elapsed time; capacity used drops during the first month while volumes are converted to zero fat and deduplicated, then the overall trend and the last three-month trend are derived. Markers 1, 2, and 3 correspond to the steps below.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data. The more information it collects, the better are the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
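Working backward can be reduced to a short calculation. The following is a minimal sketch under simplifying assumptions (a constant daily growth rate and a comfort ceiling of 80%); all names and parameters are illustrative.

```python
def provisioning_stop_threshold(aggregate_tb, daily_growth_tb,
                                days_between_downtimes, ceiling_pct=80):
    """Back-calculate the utilization threshold at which provisioning
    must stop so that organic growth stays below the comfort ceiling
    until the next planned downtime window."""
    # Space consumed by organic growth until the next downtime window.
    headroom_tb = daily_growth_tb * days_between_downtimes
    headroom_pct = 100.0 * headroom_tb / aggregate_tb
    # Stop provisioning early enough to leave that headroom free.
    return max(ceiling_pct - headroom_pct, 0.0)
```

For example, a 100TB aggregate growing by 0.05TB per day with 180 days between downtime windows needs 9% of headroom, so provisioning would stop at about 71% utilization.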
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full fat or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.
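The idea behind a "days to full" trend can be illustrated with a short sketch: fit a line through recent capacity-used samples and extrapolate to 100%. This is an assumption-laden approximation for illustration, not the Operations Manager implementation; the function name and the one-sample-per-day interval are illustrative.

```python
def days_to_full(daily_used_pct):
    """Estimate the days until an aggregate reaches 100% capacity used,
    given one capacity-used sample (in percent) per day.

    Fits a least-squares line through the samples and extrapolates to
    100%. Returns None if utilization is flat or shrinking.
    """
    n = len(daily_used_pct)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_pct) / n
    # Least-squares slope: percent of capacity gained per day.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_pct))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= 0:
        return None  # no growth: "days to full" is unbounded
    return (100.0 - daily_used_pct[-1]) / slope
```

This also shows why the trend can be misleading right after the zero fat conversion: the shrinking capacity used yields a negative slope, so the conversion period should be excluded from the trend, as recommended above.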
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
1 EXECUTIVE SUMMARY
This document provides consolidated best practices for achieving and managing best-in-class storage use. We introduce intervals and metrics that trigger changes in behavior in order to operate NetApp storage in a corridor of high utilization for as long as possible. Starting with provisioning models focused on high consolidation and operational agility, we describe the operational phases and their transitions. A list of mitigation alternatives describes the available options to control use in the face of data growth.
Finally, this document presents real-life settings in which high data consolidation is achieved by using NetApp storage efficiency technologies.
2 INTRODUCTION
Exponential data growth generates a serious challenge for IT managers. Gartner predicts that within the period from 2008 to 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.
NetApp's solution to rapid resource consumption is to remove storage controllers and disks from the resource equation by using storage more efficiently. Key benefits of this strategy are:
• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency
NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth; they allow storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow deferring IT investments to the future.
In this document, we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.
The document is organized as follows:
• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.
2.1 TERMINOLOGY
We use the following terminology to describe resource use, both at the level of exposing storage to applications and at the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.
• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications, provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by "capacity used."
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.
¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level, as a percentage.
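To make the metrics concrete, the following sketch computes storage utilization and commitment rate from raw capacities. The formulas follow the definitions above; the function names are illustrative and are not part of Operations Manager.

```python
def storage_utilization_pct(used_tb, usable_tb):
    """Storage utilization: used capacity relative to usable capacity."""
    return 100.0 * used_tb / usable_tb

def commitment_rate_pct(committed_volume_sizes_tb, aggregate_usable_tb):
    """Commitment rate: aggregate space committed to volumes, in percent.
    With thin provisioning, this can well exceed 100%."""
    return 100.0 * sum(committed_volume_sizes_tb) / aggregate_usable_tb
```

For example, three volumes of 30, 40, and 50 TB committed against a 100 TB aggregate yield a commitment rate of 120%, which is unremarkable in a thin-provisioned environment.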
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define one interval as the operational sweet spot corridor (green), where the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow), where actions are taken to get back into the operational sweet spot corridor. And we define an interval as a no-go area (red), where we do not intend to operate the aggregate. This area might act as a last buffer of time, or it can be considered an area where the operational staff has less experience.
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in the context of the storage objects volumes and aggregates. [Figure: an aggregate of fixed usable capacity containing volumes with LUNs/NAS data; the committed logical storage exceeds the usable capacity of the aggregate, data grows over time, and the operational sweet spot corridor bounds the used capacity.]
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
2.2 GOAL OF THIS DOCUMENT
The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget. On the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.
The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep the storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, in order to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and on the length of the time frame to the planned downtime windows. Figure 3 visualizes this slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning. [Figure: data growth filling an aggregate toward the operational sweet spot corridor.]
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate. [Figure: data growth flattening within the corridor once provisioning stops.]
Figure 4) Mitigation to prevent uncontrolled utilization. [Figure: a mitigation activity shifts the aggregate utilization back down into the corridor.]
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting the data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time. [Figure: four service levels. Platinum: production, premium customers; lowest disruption and recovery time. Gold: production; low disruption and recovery time. Silver: production, low budget. Bronze: best-effort services, dev/test, cold/fill-up data, dynamic/short-term data; best-effort disruption and recovery time.]
In this document, the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises It starts with provisioning storage in a NetApp shared storage infrastructure detection and monitoring of situations endangering the level of a service necessary response procedures and promoting a continuous and smooth delivery of services
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage Figure 6 shows important questions regarding storage efficiency from an operational point of view
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
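The monitor, notify, and mitigate steps above can be sketched as a minimal threshold check. This is an illustrative Python sketch; the function name and the 80%/90% thresholds are assumptions, not Operations Manager defaults.

```python
# Minimal sketch of the monitor -> notify -> mitigate decision for an
# aggregate. Thresholds are illustrative assumptions, not product defaults.

def check_aggregate(used_pct: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Map aggregate space usage (in percent) to an operator action."""
    if used_pct >= crit:
        return "mitigate"    # stop provisioning; free, grow, or move data
    if used_pct >= warn:
        return "notify"      # inform the people in charge
    return "provision"       # safe to keep provisioning into this aggregate

print(check_aggregate(75.0))  # provision
print(check_aggregate(95.0))  # mitigate
```

In practice such thresholds map to Operations Manager events; the point is only that each threshold crossing has a defined owner and action.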
Figure 6) Questions regarding storage efficiency from an operational point of view
The figure summarizes the cycle: Provision (how best to provision for storage efficiency: provisioning models, NetApp Data Motion awareness, from scratch or template/clone; where to provision to; which SLA; what the defaults are), Monitor (tools; what to monitor; what is critical; when to stop provisioning or extending; when to relax tightness; how to detect), Notification (who is in charge of reacting; how to notify), and Mitigate (available options; implications on SLAs; when to act).
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.
25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might diminish over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility
NetApp Technology: Benefit (phase in which the benefit applies)
• FlexClone: Instantly creates thin-provisioned and space-efficient writable clones. (Provisioning)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Provisioning, Operation)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Provisioning, Operation)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. (Operation)
• Aggregate Extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Provisioning, Operation)
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of the physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.
31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible. The figure is a matrix of primary data (files and directories) space allocation against Snapshot copy space allocation: fat primary data with fat Snapshot space is the full fat option; thin primary data with thin Snapshot space is the zero fat option; the two mixed combinations are not available.
Note: Full fat is characterized slightly differently in NAS and SAN because of their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 2) Full fat provisioning.
Volume options:
• guarantee: volume
• fractional_reserve: 100. Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize: on. Turn autosize on; there is then no artificially limited volume size that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot options:
• reserve: yes. The value depends on the number of Snapshot copies and the change rate within the volume.
• schedule: switched on. Automatic Snapshot technology schedules.
• autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 3) Zero fat provisioning.
Volume options:
• guarantee: none
• fractional_reserve: 100. Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize: on. Turn autosize on; there is then no artificially limited volume size that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first: -. Autodelete is not recommended in most environments.
Volume Snapshot options:
• reserve: yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
• schedule: switched on. Automatic Snapshot technology schedules.
• autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
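A hypothetical helper for choosing the autosize values can make the trade-off concrete. The 20% growth headroom and 5% increment below are pure assumptions for illustration; as the table notes, the right values follow from the business model and the volume's growth rate.

```python
# Hypothetical sizing helper for the volume autosize options (-m / -i).
# The headroom and increment factors are illustrative assumptions only.

def autosize_options(volume_size_gb: float,
                     max_growth_factor: float = 1.2,
                     increment_fraction: float = 0.05) -> dict:
    """Return illustrative -m (maximum size) and -i (increment) values in GB."""
    maximum = volume_size_gb * max_growth_factor
    increment = max(1.0, volume_size_gb * increment_fraction)
    return {"-m": round(maximum), "-i": round(increment)}

# A 500 GB volume may grow to 600 GB in 25 GB steps:
print(autosize_options(500))  # {'-m': 600, '-i': 25}
```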
SAN
For SAN, we consider three options:
• Full fat: Both the primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch. The figure is a matrix of primary data (LUN) space allocation against Snapshot copy space allocation: fat primary data with fat Snapshot space is the full fat option; fat primary data with thin Snapshot space is the low fat option; thin primary data with thin Snapshot space is the zero fat option; the remaining combination is not available.
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely while Snapshot copies are in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Full/Low/Zero Fat Provisioning with Provisioning Manager for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.
Volume options:
• guarantee: volume
• fractional_reserve: 100. Even though a fractional reserve below 100 is technically possible, it carries a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
• autosize: off. Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot options:
• reserve: 0
• schedule: switched off
• autodelete: off
LUN options:
• reservation: enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.
Volume options:
• guarantee: volume
• fractional_reserve: 0. Snapshot space is controlled by the autodelete and autosize options.
• autosize: on. Turn autosize on.
• autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first: volume_grow. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the increase can be reverted afterward if the volume free space grows again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot options:
• reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule: switched off
• autodelete: on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst-case scenario, deleting Snapshot copies is not an option.
• autodelete options: volume oldest_first. There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN options:
• reservation: enable. Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.
Volume options:
• guarantee: none. No space reservation for the volume at all.
• fractional_reserve: 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
• autosize: on. Turn autosize on.
• autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
• try_first: volume_grow
Volume Snapshot options:
• reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule: switched off
• autodelete: off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN options:
• reservation: disable. No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods.
• Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X – N + Δ (where N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used).
• Space efficient: full fat no; low fat partially, for Snapshot copies; zero fat yes.
• Monitoring: full fat optional; low fat required on the volume and aggregate level; zero fat required on the aggregate level.
• Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes.
• Pool benefiting from dedupe savings: full fat the volume fractional reserve area; low fat the volume free space area; zero fat the aggregate free space area.
• Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete any Snapshot copies; zero fat yes, when monitoring and notification processes are missing.
• Typical use cases: full fat small installations and sites with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructures, test/dev environments, and storage pools for virtualized servers.
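The space consumption formulas compared in Table 7 can be made concrete numerically. A minimal sketch using the report's symbols (X primary data, Δ Snapshot space, N allocated-but-unused blocks); the function name is an illustrative assumption:

```python
# Space consumption per provisioning model, following the report's formulas:
# full fat = 2X + D, low fat = X + D, zero fat = X - N + D, where X is the
# primary data (sum of LUN capacities), D the Snapshot copy space, and N the
# blocks logically allocated but never used.

def space_consumption(x_gb, delta_gb, unused_gb=0):
    return {
        "full_fat": 2 * x_gb + delta_gb,
        "low_fat": x_gb + delta_gb,
        "zero_fat": x_gb - unused_gb + delta_gb,
    }

# 1 TB of LUNs, 200 GB Snapshot space, 400 GB allocated but never written:
print(space_consumption(1000, 200, 400))
# {'full_fat': 2200, 'low_fat': 1200, 'zero_fat': 800}
```

The example shows why zero fat is preferred: the same workload occupies 800 GB instead of 2.2 TB of physical capacity.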
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services (or datasets) consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED
Because the physical allocation of data within a zero fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
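The commitment rate mentioned above can be computed as the sum of the provisioned volume sizes divided by the aggregate capacity; values above 100% indicate overcommitment through thin provisioning. A minimal sketch (the function name is an assumption):

```python
# Commitment rate of an aggregate as a consolidation metric: the sum of all
# volume sizes divided by the aggregate capacity. Values above 1.0 mean the
# aggregate is overcommitted through thin provisioning.

def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 4 TB zero fat volumes on a 10 TB aggregate:
rate = commitment_rate([4000, 4000, 4000], 10000)
print(f"{rate:.0%}")  # 120%
```

Tracking this rate per aggregate gives a simple signal for when to stop provisioning new volumes and leave the remaining space for organic growth.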
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information down the storage stack that a particular block is no longer used, and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of an application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; these savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to the instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes: the template and each instance (1 through n) keep their LUNs/qtrees in their own FlexVol volume with deduplication block sharing inside each volume, while FlexClone block sharing links the instances to the template.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate usage grows.
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that each construct is created within one aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
[Figure: the template and instances 1 and 2 are aligned horizontally; each FlexVol volume vertically groups one LUN/qtree from every instance, with deduplication block sharing within the FlexVol volume.]
Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties, so such realignments bring no benefit.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate at one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React on different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
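The slicing idea can be illustrated with a small calculation; all sizes, the target corridor, and the greedy selection policy are assumptions chosen for this example:

```python
# Illustrative sketch: pick the smallest nomads whose migration brings
# aggregate use back below a target corridor boundary.

def nomads_to_migrate(used_gb, aggregate_gb, target_ratio, nomad_sizes_gb):
    """Greedily pick the smallest nomads first (cheapest to migrate)."""
    picked = []
    for size in sorted(nomad_sizes_gb):
        if used_gb / aggregate_gb <= target_ratio:
            break            # back inside the sweet spot corridor
        picked.append(size)
        used_gb -= size      # migrating the nomad frees its blocks here
    return picked

# Aggregate at 92% use, target corridor upper bound 85%,
# with nomads of three sizes provisioned up front.
print(nomads_to_migrate(920, 1000, 0.85, [50, 100, 200]))  # [50, 100]
```

Provisioning nomads of several sizes, as recommended above, is what gives this selection its flexibility: small nomads handle mild growth, large ones handle severe growth.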
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for the assessment into settled and nomad instances.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
[Figure: instances Inst1 through InstN sorted by descending negative impact, from high (outside SLA; for example, all FC-attached; settled) through medium to low (inside SLA; nomad).]
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[Figure: remaining instances sorted by descending penalty cost ($$ to $), from settled through semi-settled to nomad.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLEDNOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting from the beginning, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage
• Leave room for organic growth; it might be desirable to still allow for extending the storage of previously provisioned applications
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient free space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
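The "running too tight on storage" point above can be captured in one line of arithmetic; the figures below are invented for illustration:

```python
# Illustrative sketch: the free block pool divided by the daily growth
# rate gives the time remaining before the aggregate runs out of space.

def days_to_react(free_gb, daily_growth_gb):
    """Free block pool translated into days before the aggregate is full."""
    if daily_growth_gb <= 0:
        return float("inf")  # static or shrinking data: no deadline
    return free_gb / daily_growth_gb

# 150GB of free blocks, applications growing by 5GB per day.
print(days_to_react(150.0, 5.0))  # 30.0 days to trigger a mitigation
```

This is the quantity that the monitoring thresholds and trending described in the following sections are designed to keep comfortably large.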
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume has been extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. Trending is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
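The trend calculation described above is a linear regression over historical samples. The following self-contained sketch, with invented sample data, reproduces the idea of estimating the remaining days until an aggregate is full; it is not the Operations Manager implementation:

```python
# Sketch of the trending idea: fit daily use samples with a linear
# regression and project when the usable aggregate capacity is reached.

def linear_fit(days, used_gb):
    """Least-squares fit; returns (daily growth rate, intercept)."""
    n = len(days)
    mx = sum(days) / n
    my = sum(used_gb) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(days, used_gb)) / \
            sum((x - mx) ** 2 for x in days)
    return slope, my - slope * mx

def days_to_full(days, used_gb, usable_gb):
    """Days from the last sample until projected use hits capacity."""
    slope, intercept = linear_fit(days, used_gb)
    return (usable_gb - intercept) / slope - days[-1]

# Five daily samples growing by ~10GB/day toward a 1000GB aggregate.
samples_x = [0, 1, 2, 3, 4]
samples_y = [500.0, 510.0, 520.0, 530.0, 540.0]
print(days_to_full(samples_x, samples_y, 1000.0))  # 46.0 days remaining
```

As the note below states for Operations Manager, the projection here is against the usable capacity, not against a threshold setting.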
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full increasing in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
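Such an adapter script might look like the following sketch. The command-line argument format, the event names, and the routing table are assumptions for illustration only; consult the Operations Manager documentation for the exact event information passed to a script:

```python
#!/usr/bin/env python
# Hypothetical notification adapter invoked by Operations Manager.
# The event-name argument and the routing table are illustrative
# assumptions; adapt both to your environment and ticketing system.
import sys

# Hypothetical mapping from monitoring events to operational groups.
ROUTING = {
    "aggregate-almost-full": "storage-capacity-planners",
    "aggregate-full": "storage-operations",
    "volume-almost-full": "storage-operations",
}

def route(event_name):
    """Map a monitoring event to the operational group owning it."""
    return ROUTING.get(event_name, "storage-operations")

def main(argv):
    event = argv[1] if len(argv) > 1 else "unknown"
    group = route(event)
    # A real adapter would open a ticket or call a webhook here.
    print("event=%s routed_to=%s" % (event, group))

if __name__ == "__main__":
    main(sys.argv)
```

Keeping the event-to-group mapping in the script (or in a file it reads) mirrors the note above: the routing of situations to responsible groups lives outside Operations Manager itself.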
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
38 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable up to the maximum aggregate size (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on and affect used capacity (in the aggregate).
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first one does not make use of online data migration and the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate's "days to full" trend value to get an idea of the available days to full, based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
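The "days to full" trend mentioned above can be approximated from past growth. The following sketch is illustrative only: the function name and the linear-growth assumption are ours, not an Operations Manager API; Operations Manager reports this trend directly.

```python
def days_to_full(capacity_used_gb, usable_capacity_gb, daily_growth_gb):
    """Estimated days until the aggregate reaches 100% capacity used,
    assuming data keeps growing at the observed daily rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no observed growth: the aggregate never fills
    free_gb = usable_capacity_gb - capacity_used_gb
    return free_gb / daily_growth_gb

# An aggregate with 6 TB used out of 10 TB, growing by 20 GB per day,
# reaches 100% in 200 days.
print(days_to_full(6000, 10000, 20))  # 200.0
```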
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by alarms on the Operations Manager events "aggregate nearly full" (configured to fire when the metric exceeds 50%) and "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
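The decision logic of this setting reduces to a few comparisons. The sketch below is a hypothetical illustration of this sample setting's thresholds (50%/65% on capacity used, 110%/120% on committed space); the function and its return labels are ours, not part of any NetApp tool.

```python
def aggregate_phase(capacity_used_pct, committed_pct):
    """Map the two Operations Manager metrics to the operating phase
    of this sample setting (thresholds are customer-specific)."""
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"        # migrate data in the next downtime window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "organic growth"  # stop provisioning, assess capacity
    return "provisioning"        # operational sweet spot corridor

print(aggregate_phase(45, 100))  # provisioning
print(aggregate_phase(55, 100))  # organic growth
print(aggregate_phase(70, 125))  # mitigate
```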
Figure 22) Transitions between phases depending on the metrics aggregate capacity used and aggregate committed space.
[Figure 22 shows the resulting decision matrix: while aggregate capacity used is in the 0-50% range and aggregate space committed is in the 0-110% range (the operational sweet spot corridor), new storage is provisioned. Once either threshold is exceeded, provisioning stops and capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% committed space, mitigation starts.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The aggregate "days to full" trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds that describe the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold (aggregate capacity used) | Notify | Mitigation Activity
> 70% | Storage operations | Stop provisioning new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
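As a hypothetical illustration, these phase transitions reduce to a single comparison chain on aggregate capacity used (the function and its labels are ours, and the thresholds are this sample setting's values, not product defaults):

```python
def nomad_action(capacity_used_pct):
    """Action triggered by the aggregate capacity used metric in the
    settled/nomad sample setting (thresholds are customer-specific)."""
    if capacity_used_pct > 90:
        return "migrate a nomad to relax the resource situation"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning new storage"
    return "normal operation"

print(nomad_action(72))  # stop provisioning new storage
print(nomad_action(92))  # migrate a nomad to relax the resource situation
```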
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure 24 shows these transitions graphically: up to 70% aggregate capacity used, new storage is provisioned; between 70% and 85%, no new storage is provisioned but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure 25 plots committed capacity and capacity used against elapsed time, showing the overall trend and the trend over the last three months; capacity used drops after the change to zero fat configurations and dedupe.]
As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow), depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager helps in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the provided services. Operations Manager helps you understand the growth rate of the past.
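These steps can be combined into a simple backward calculation. The sketch below is an illustrative assumption of ours (the function name, the formula, and the 25% safety margin are not from NetApp tooling): reserve enough headroom for the growth expected between downtime windows, and derive the upper threshold of the sweet spot corridor from it.

```python
def sweet_spot_upper_threshold_pct(usable_capacity_gb, daily_growth_gb,
                                   days_between_downtimes,
                                   safety_factor=1.25):
    """Highest aggregate utilization at which organic growth still safely
    reaches the next planned downtime window."""
    headroom_gb = daily_growth_gb * days_between_downtimes * safety_factor
    return max(0.0, 100.0 * (1 - headroom_gb / usable_capacity_gb))

# 10 TB aggregate, 20 GB/day growth, planned downtime every 90 days:
# keep utilization below roughly 77.5%.
print(sweet_spot_upper_threshold_pct(10000, 20, 90))
```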
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job; alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on

Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable

e. Identify storage of inactive data. Storage keeping inactive data is often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
2 INTRODUCTION
Exponential data growth generates a serious challenge for IT managers. Gartner predicts that between 2008 and 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008 (www.gartner.com/technology/media-products/newsletters/netapp/issue24/gartner3.html). Until recently, continuous improvements in cost for performance and storage space made it easy and affordable to solve storage concerns by adding more disks to existing storage systems. However, IT executives are discovering that there are limits to that solution: floor space, weight loads, rack space, network drops, power connections, cooling infrastructure, and even power itself are finite resources. Hitting any one of these limits significantly jeopardizes the ability of IT to meet business demands.
NetApp's solution to rapid resource consumption is to take storage controllers and disks out of the resource equation by using storage more efficiently. Key benefits of this strategy are:
• Less management involvement
• Reduced complexity, support, and service costs
• Improved performance and network efficiency
NetApp storage efficiency technologies are key to achieving data consolidation and managing future data growth: they allow for storing and managing several times more data on NetApp storage controllers than would fit on their physically attached disks, and they allow the deferral of IT investments to the future.
In this document we describe techniques and guidelines you can use to find the operational sweet spot for NetApp storage efficiency technologies in your environment. By adhering to the best practices outlined, you can increase storage consolidation and agility as well as decrease operational risk.
The document is organized as follows:
• Chapter 3 describes storage provisioning.
• Chapter 4 describes the monitoring process and supporting tools for daily operation.
• Chapter 5 describes concrete operational setups used in daily life.
• Chapter 6 concludes with a step-by-step cookbook to provision and manage storage efficiently and to adapt individual thresholds.
2.1 TERMINOLOGY
We use the following terminology to describe resource use on the level of exposing storage to applications and on the level of physical resource allocation within the aggregates in the storage controllers. Also refer to the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further explanation of this terminology.
• Logical storage refers to storage that is visible at the application layer. Logical storage does not necessarily require the allocation of usable capacity.
• Usable capacity refers to storage that is usable for the applications provided by NetApp storage controllers.
• Used capacity is a value that represents the amount of physical capacity that holds application or user data. In Operations Manager¹ terminology, this is represented by "capacity used."
• Storage utilization refers to the ratio of used capacity to usable capacity, without accounting for efficiency returns.

¹ NetApp Operations Manager is a central console that delivers comprehensive monitoring and management for NetApp storage, with alerts, reports, and performance and configuration tools.
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured as a percentage at both the volume and the aggregate level.
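As a hypothetical illustration of these two ratios (Operations Manager reports both values directly; the function names and sample numbers are ours):

```python
def commitment_rate_pct(committed_gb, usable_gb):
    # Percentage of aggregate space committed to volumes; values above
    # 100% indicate overcommitment through thin provisioning.
    return 100.0 * committed_gb / usable_gb

def deduplication_rate_pct(logical_gb, physical_gb):
    # Space saved by deduplication, relative to the logical data stored.
    return 100.0 * (logical_gb - physical_gb) / logical_gb

print(commitment_rate_pct(12000, 10000))  # 120.0 (overcommitted)
print(deduplication_rate_pct(1000, 600))  # 40.0
```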
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. The operational sweet spot corridor (green) is the interval where the aggregate should be operated for optimal utilization and service availability. A tolerance interval (yellow) is where actions are taken to get back into the operational sweet spot window. A no-go area (red) is where we do not intend to operate the aggregate; this area might act as a last buffer of time, or can be considered an area in which operational staff has less experience.
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in context of the storage objects of volumes and aggregates
[Figure 1 shows volumes with LUNs/NAS inside an aggregate: the committed logical storage exceeds the usable capacity of the aggregate, while the used capacity grows with the data toward the operational sweet spot corridor.]
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
2.2 GOAL OF THIS DOCUMENT
The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.
The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase: In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate
Figure 4) Mitigate to prevent uncontrolled utilization
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows the operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, number of users, and functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time.
[Figure: four service levels, ordered from best effort to lowest disruption and recovery time: Bronze (best-effort services: dev/test, cold/fill-up data, dynamic/short-term data), Silver (production, low budget), Gold (production), and Platinum (production, premium customers).]
In this document the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with detecting and monitoring situations endangering the level of a service, necessary response procedures, and promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view.
[Figure: a cycle of Provision, Monitor, Notification, and Mitigate. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone); where to provision to; which SLA; what the defaults are. Monitor: tools; what to monitor; what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect). Notification: who is in charge to react; how to notify. Mitigate: available options; implications on SLAs; when to act.]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations, for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant for NAS and three variants are relevant for SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                              Primary Data (Files & Directory) Space Allocation
                              Fat                Thin
Snapshot Copy      Fat        Full Fat Option    No Option
Space Allocation   Thin       No Option          Zero Fat Option
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
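The decision matrices of Figures 7 and 8 can be captured in a few lines of code. The following sketch (hypothetical helper names, not part of any NetApp tool) maps a (primary data, Snapshot copy space) allocation pair to the resulting provisioning model:

```python
# "fat" = preallocated (space guarantee), "thin" = allocated on demand.
NAS_MODELS = {
    ("fat", "fat"): "full fat",
    ("thin", "thin"): "zero fat",
}

SAN_MODELS = {
    ("fat", "fat"): "full fat",
    ("fat", "thin"): "low fat",     # primary preallocated, Snapshot space on demand
    ("thin", "thin"): "zero fat",
}

def provisioning_model(primary: str, snapshot: str, san: bool = False) -> str:
    """Return the provisioning model for a (primary, Snapshot space) allocation pair."""
    table = SAN_MODELS if san else NAS_MODELS
    return table.get((primary, snapshot), "not a supported combination")
```

For example, preallocated primary data with on-demand Snapshot space is the low fat option for SAN, but is not a supported combination for NAS.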
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when reaching a certain volume threshold. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
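As an illustration of the X + Δ formula, the following sketch estimates a full fat volume size. Estimating Δ from a daily change rate and a Snapshot retention period is an assumption made for this example, not a NetApp sizing rule:

```python
def nas_full_fat_volume_size(user_data_gb: float,
                             daily_change_rate: float,
                             snapshot_retention_days: int) -> float:
    """Sketch of the X + delta sizing formula.

    X     = sum of all user data (files and directories) within the volume
    delta = space to hold Snapshot data, estimated here (illustrative
            assumption) as user data * change rate * retention period.
    """
    x = user_data_gb
    delta = user_data_gb * daily_change_rate * snapshot_retention_days
    return x + delta

# 500 GB of user data, 2% daily change, 10 days of Snapshot retention:
# 500 + 500 * 0.02 * 10 = 600 GB
```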
Table 2) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when reaching a certain volume threshold. You can also use the autosize function when the space reserved for user data gets low.
• The space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | - | Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:
• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                              Primary Data (LUN) Space Allocation
                              Fat                Thin
Snapshot Copy      Fat        Full Fat Option    No Option
Space Allocation   Thin       Low Fat Option     Zero Fat Option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |

LUN Options
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when reaching a preset volume threshold.
Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume oldest_first | There is a precedence order determining which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options
reservation | enable | Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
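To illustrate the X – N + Δ formula, the following sketch (a hypothetical helper, illustrative only) computes the physical space consumed by a zero fat volume:

```python
def zero_fat_san_space(lun_capacities_gb, unused_blocks_gb, snapshot_delta_gb):
    """Physical space consumed under zero fat provisioning: X - N + delta.

    X     = sum of all LUN capacities within the volume
    N     = blocks inside the LUNs that were never written (thin-provisioning gain)
    delta = space needed to hold Snapshot copy data
    """
    x = sum(lun_capacities_gb)
    return x - unused_blocks_gb + snapshot_delta_gb

# Two LUNs of 200 GB and 300 GB, 150 GB never written, 50 GB of Snapshot data:
# 500 - 150 + 50 = 400 GB actually consumed
```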
Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try first | volume_grow |

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low.

LUN Options
reservation | disable | No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes will grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X – N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
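The space consumption row of Table 7 can be evaluated for a concrete scenario. The following sketch compares the three methods for an assumed volume with X = 1000 GB of LUN capacity, Δ = 100 GB of Snapshot data, and N = 400 GB of never-written blocks (all figures illustrative):

```python
def san_space_consumption(method: str, x_gb: float, delta_gb: float,
                          n_gb: float = 0.0) -> float:
    """Physical space consumed per Table 7.

    x_gb     = X, sum of all LUN capacities within the volume
    delta_gb = delta, space holding Snapshot copy data
    n_gb     = N, blocks logically allocated but not used (zero fat only)
    """
    if method == "full fat":
        return 2 * x_gb + delta_gb      # primary data plus full overwrite reserve
    if method == "low fat":
        return x_gb + delta_gb          # primary preallocated, Snapshot space on demand
    if method == "zero fat":
        return x_gb - n_gb + delta_gb   # everything allocated on demand
    raise ValueError(f"unknown method: {method}")

# X = 1000 GB, delta = 100 GB, N = 400 GB:
#   full fat -> 2100 GB, low fat -> 1100 GB, zero fat -> 700 GB
```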
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized "provisioning script" needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
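A minimal sketch of the commitment rate metric (illustrative, not the exact Operations Manager calculation): the ratio of logically provisioned space to physical aggregate capacity, where values above 1.0 indicate overcommitment:

```python
def commitment_rate(provisioned_gb: float, aggregate_capacity_gb: float) -> float:
    """Commitment rate = logically provisioned volume space / physical capacity.

    A value above 1.0 (100%) means the aggregate is overcommitted,
    i.e. thin provisioning is consolidating more logical data than
    would fit if everything were fully written.
    """
    return provisioned_gb / aggregate_capacity_gb

# Volumes totaling 15 TB provisioned on a 10 TB aggregate -> 1.5 (150%)
```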
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In volume-centric storage layout an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations such as instant cloning and volume-consistent Snapshot copies
In addition to the convenient ways to manage volumes volume-centric storage layouts have storage efficiency advantages in two dimensions
bull High instant storage efficiency savings High instant savings when cloning data of an application instance with FlexClone savings might deteriorate over time
bull Long-term storage efficiency savings Medium long-term savings when deduplicating application data
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. When data in the clone is changed or new data is added by the application, the aggregate use grows.
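The commitment arithmetic can be sketched as follows. This is an illustrative model, not Data ONTAP code; the helper name and all figures are invented for the example.

```python
# Illustrative model: how a volume FlexClone changes the overcommitment of
# an aggregate without changing its used space at clone-creation time.

def overcommitment(committed_gb, aggregate_gb):
    """Committed storage relative to the physical aggregate size."""
    return committed_gb / aggregate_gb

aggregate_gb = 1000
committed_gb = 600      # space promised to existing volumes
used_gb = 300           # blocks actually allocated

# Cloning a 200 GB template volume adds to the commitment only;
# at creation time the clone shares all blocks with its parent.
committed_gb += 200

print(overcommitment(committed_gb, aggregate_gb))  # 0.8
print(used_gb)  # unchanged: 300

# Only when the application overwrites or adds data does the
# aggregate use grow, e.g. 50 GB of changed blocks:
used_gb += 50
print(used_gb)  # 350
```

The point of the sketch: cloning moves the overcommitment metric immediately, while aggregate use moves only later, with actual data change.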
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align all application data in it that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to run again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume to have autogrow enabled.
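The contrast with the volume-centric case can be sketched in a few lines. The sizes and aggregate capacity are invented for the example; the point is that file/LUN cloning stays inside a volume and therefore leaves the aggregate commitment untouched.

```python
# Illustrative sketch: in a dedupe-centric layout the FlexVol sizes drive
# the commitment rate; a file/LUN FlexClone inside a volume adds no new
# commitment against the aggregate.

aggregate_gb = 2000
flexvol_sizes_gb = [500, 500, 400]   # volumes created for the layout

commitment_rate = sum(flexvol_sizes_gb) / aggregate_gb
print(commitment_rate)  # 0.7

# Cloning a template LUN with file/LUN FlexClone stays within its volume,
# so the commitment against the aggregate is unchanged.
commitment_rate_after_clone = sum(flexvol_sizes_gb) / aggregate_gb
print(commitment_rate_after_clone == commitment_rate)  # True
```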
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at a storage controller to another one while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor to react on data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad outside its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React on different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes at the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
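The two-step assessment — first by acceptable service disruption, then by penalty cost — can be sketched as a ranking. Instance names, field names, and sample values are invented; a real assessment would come from the SLA catalog.

```python
# Illustrative settled/nomad assessment: order application instances by
# tolerated service disruption (technical impact), breaking ties by
# penalty cost (business impact). Stickiest (settled) first.

instances = [
    {"name": "erp",  "acceptable_disruption_s": 0,   "penalty": 1000},
    {"name": "mail", "acceptable_disruption_s": 30,  "penalty": 100},
    {"name": "test", "acceptable_disruption_s": 600, "penalty": 0},
]

ranking = sorted(instances,
                 key=lambda i: (i["acceptable_disruption_s"], -i["penalty"]))
settled, nomads = ranking[0], ranking[1:]
print(settled["name"])               # erp stays settled
print([i["name"] for i in nomads])   # mail and test are nomad candidates
```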
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the progress of migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you initially consider the settled/nomad setting and take sizing and lifetime of storage into account, it is possible to implement this in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section the settled/nomad pattern was described to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned due to their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the use of aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative to implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). For applications this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
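The "free block pool translates into time to react" observation above is simple division; the sketch below makes it explicit. The helper name and figures are invented for the example.

```python
# Back-of-the-envelope sketch: the free block pool of an aggregate divided
# by the observed daily growth rate gives the time left to react.

def days_to_react(free_gb, daily_growth_gb):
    """Days until the free block pool is exhausted at the current growth."""
    if daily_growth_gb <= 0:
        return float("inf")   # no growth, no immediate pressure
    return free_gb / daily_growth_gb

print(days_to_react(free_gb=150, daily_growth_gb=10))  # 15.0
```

In practice the growth rate itself changes, which is why the trending discussed in section 4.2 matters more than a single snapshot.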
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness for certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page following Setup → Options → Default Thresholds, or through the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
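The aggregate threshold logic can be sketched as follows. The event names match those used elsewhere in this report; the default percentages are assumptions for the sketch — verify them on your Default Thresholds page.

```python
# Sketch of the aggregate threshold evaluation: which events fire for a
# given block use and commitment level. Percentages are illustrative.

def aggregate_events(use_pct, committed_pct,
                     full=90, nearly_full=80,
                     overcommitted=100, nearly_overcommitted=95):
    """Return the list of event names that would fire."""
    events = []
    if use_pct >= full:
        events.append("aggregate-full")
    elif use_pct >= nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct >= overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

print(aggregate_events(use_pct=83, committed_pct=97))
# ['aggregate-almost-full', 'aggregate-almost-overcommitted']
```

Note that block use and commitment are independent axes: a thin-provisioned aggregate can be heavily overcommitted while its block use is still comfortable, and vice versa.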
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression of up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
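The trend and days-to-full calculation can be sketched as a least-squares fit over daily use samples. The sample figures are invented; the real report fits up to 90 days of history and, per the note above, measures against usable capacity.

```python
# Sketch of the days-to-full estimate: a linear regression over daily use
# samples gives a growth rate; the remaining usable capacity divided by
# that rate gives the remaining time.

days = [0, 1, 2, 3, 4, 5, 6]
used_gb = [500, 510, 519, 531, 540, 552, 560]   # one sample per day
capacity_gb = 1000                               # usable aggregate capacity

# Least-squares slope (GB per day), stdlib only.
n = len(days)
mean_x = sum(days) / n
mean_y = sum(used_gb) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_gb))
         / sum((x - mean_x) ** 2 for x in days))

days_to_full = (capacity_gb - used_gb[-1]) / slope
print(round(slope, 1), "GB/day")
print(round(days_to_full), "days until full")
```

Comparing this estimate across different fitting intervals is exactly the "do the growth rates deviate" check recommended above.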
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rates descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes in Operations Manager.
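The distribution-list recommendation can be illustrated with a small sketch. Operations Manager composes its own messages; this example only shows, with the Python standard library and placeholder addresses, why addressing an alias decouples the alarm from individual people.

```python
# Illustrative alarm e-mail, built (not sent) with the stdlib. The alias
# "storage-ops@example.com" is a placeholder distribution list: membership
# changes happen in the mail system, not in the alarm configuration.
from email.message import EmailMessage

def build_alarm_mail(event, storage_object, to_alias):
    msg = EmailMessage()
    msg["From"] = "opsmgr@example.com"
    msg["To"] = to_alias   # a role alias, never an individual person
    msg["Subject"] = f"[OpsMgr] {event} on {storage_object}"
    msg.set_content(f"Event {event} was raised for {storage_object}. "
                    "Evaluate the trend and pick a mitigation alternative.")
    return msg

mail = build_alarm_mail("aggregate-almost-full", "aggr1",
                        "storage-ops@example.com")
print(mail["Subject"])
```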
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full, which starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
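A sketch of what such an adapter script might look like. How Operations Manager hands event details to the script depends on its version; the environment variable names below are assumptions for the example, as are the ticket fields — check your Operations Manager documentation for the actual interface.

```python
# Hypothetical adapter: translate an Operations Manager alarm into a ticket
# for the local infrastructure. EVENT_NAME and SOURCE_NAME are assumed
# variable names, not documented Operations Manager behavior.
import os

def build_ticket(environ):
    """Map the (assumed) event environment to a minimal ticket record."""
    return {
        "summary": environ.get("EVENT_NAME", "unknown-event"),
        "source": environ.get("SOURCE_NAME", "unknown-object"),
        "queue": "storage-ops",
    }

if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # A real adapter would now POST this to the ticketing system.
    print(ticket["summary"], ticket["source"])
```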
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk This section focuses on mitigation activities to preserve flexibility by controlling use within its defined corridor The effect of a mitigation activity should return the usage to its defined corridor
Storage tightness might occur in aggregates or volumes depending on their configuration When all volumes in an aggregate are thin provisioned with the zero fat configuration they use the shared pool of free blocks of the aggregate to deal with data growth To solve this situation a mitigation activity on the aggregate level is necessary
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation therefore incurs client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, stop the application to achieve a consistent state, and then migrate the data offline.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler unit migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
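Table 8 can also be read as a decision aid: try the alternatives in order and pick the first one whose prerequisite is satisfiable. The following Python sketch encodes that reading; the field names and the first-applicable selection policy are illustrative assumptions, not part of the source.

```python
# Table 8 as data, ordered from lowest to highest operational cost.
# "needs" names the prerequisite that must be available (an assumed encoding).
ALTERNATIVES = [
    {"no": 1, "activity": "add disks to the aggregate", "needs": "hw_procurement"},
    {"no": 2, "activity": "decrease aggregate Snapshot copy reserve", "needs": None},
    {"no": 3, "activity": "shrink other volumes in the aggregate", "needs": None},
    {"no": 4, "activity": "run dedupe and shrink volumes", "needs": None},
    {"no": 5, "activity": "migrate nomads online", "needs": "vfiler_setup"},
    {"no": 6, "activity": "migrate volumes offline", "needs": "downtime_window"},
    {"no": 7, "activity": "stop application, then migrate", "needs": "app_owner"},
]

def pick_mitigation(available):
    """Return the first alternative whose prerequisite is in `available`.

    `available` is the set of prerequisites the operations team can satisfy
    right now, e.g. {"vfiler_setup"}."""
    for alt in ALTERNATIVES:
        if alt["needs"] is None or alt["needs"] in available:
            return alt
    return None
```

With no prerequisites available, the first hit is alternative 2 (dropping the Snapshot reserve); once hardware procurement is possible, alternative 1 wins.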
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
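The "days to full" trend value mentioned above is essentially a linear projection of past growth onto the remaining free space. A minimal sketch of that arithmetic (the function name and units are assumptions; Operations Manager computes its own trend internally):

```python
def days_to_full(usable_gb, used_gb, daily_growth_gb):
    """Linear 'days to full' projection against 100% of usable capacity.

    Returns None when there is no measurable growth."""
    if daily_growth_gb <= 0:
        return None
    return (usable_gb - used_gb) / daily_growth_gb

# An aggregate with 10 TB usable, 6 TB used, growing 20 GB/day:
# (10000 - 6000) / 20 = 200 days of organic growth headroom.
```

Comparing this value against the distance to the next planned downtime window tells you whether the reserved space is sufficient.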
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
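The two-metric policy described above can be condensed into a small classifier, with the stricter verdict winning. This is a sketch of the initial thresholds from the text and Figure 22 (50%/65% capacity used, 110%/120% committed space); the function name and return strings are illustrative.

```python
def phase(capacity_used_pct, space_committed_pct):
    """Classify an aggregate in sample setting 1.

    Both metrics are checked; whichever is further along drives the phase."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"  # storage admins plan migration for next downtime
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess capacity, adapt thresholds"  # provisioning stopped
    return "provision new storage"  # operational sweet spot corridor
```

Note that either metric alone can stop provisioning: a lightly filled but heavily overcommitted aggregate is treated the same as a physically full one.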
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.
[Figure 22 depicts the phases along the two metrics: new storage is provisioned while aggregate capacity used is 0–50% and aggregate space committed is 0–110%; above those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% committed space, mitigation starts.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• "Days to full" aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
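Because this setting has a single metric, Table 10 reduces to a chain of threshold checks. A sketch of that logic (function name and return strings are illustrative):

```python
def action(capacity_used_pct):
    """Map aggregate capacity used (%) to the Table 10 phase transition."""
    if capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if capacity_used_pct > 70:
        return "stop provisioning of new storage"
    return "operate in sweet spot corridor"
```

Checking the highest threshold first keeps the verdict unambiguous as utilization climbs through the bands.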
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
You can achieve very high data consolidation in this setting using NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and deduplication.
[The figure plots committed capacity and capacity used against elapsed time (at the one-month and three-month marks), together with the overall trend and the trend over the last three months; markers 1, 2, and 3 correspond to the steps below.]
As a general rule, we don't introduce artificially limited container types: they increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. During this period the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative; use Operations Manager to help determine the trend. Make sure that the trend excludes the time frame in which the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
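The trend derivation in step 3 amounts to a least-squares fit over (day, capacity used) samples with the conversion window excluded, because the space freed by the zero fat conversion would otherwise drag the slope negative. A sketch of that calculation (Operations Manager computes this for you; the function below only illustrates the idea):

```python
def growth_trend(samples, exclude=()):
    """Least-squares slope of capacity used over time, in GB per day.

    samples: list of (day, used_gb) pairs.
    exclude: iterable of (first_day, last_day) spans to drop, e.g. the
    window in which volumes were converted to zero fat."""
    pts = [(d, u) for d, u in samples
           if not any(lo <= d <= hi for lo, hi in exclude)]
    n = len(pts)
    sx = sum(d for d, _ in pts)
    sy = sum(u for _, u in pts)
    sxx = sum(d * d for d, _ in pts)
    sxy = sum(d * u for d, u in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)
```

With the conversion window excluded, only the organic growth before or after the configuration change contributes to the slope.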
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
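The steps above combine into one calculation: the growth rate times the distance between downtimes gives the space that must stay free, which in turn fixes the utilization level at which mitigation must start. A sketch, with the function name and the 80% comfort cap taken from step (a) as an assumption:

```python
def mitigation_threshold_pct(usable_gb, daily_growth_gb,
                             days_between_downtimes, comfort_cap_pct=80):
    """Latest utilization (%) at which the next downtime is still safely
    reachable with organic growth, capped at the team's comfort level."""
    reserve_gb = daily_growth_gb * days_between_downtimes  # step b x step c
    threshold = 100.0 * (usable_gb - reserve_gb) / usable_gb  # step d
    return min(threshold, comfort_cap_pct)  # step a: never exceed the cap

# 10 TB usable, 10 GB/day growth, 180 days between downtimes:
# reserve = 1800 GB -> 82%, capped to the 80% comfort level.
```

Whichever bound is lower, the growth reserve or the comfort cap, wins; the attention area then sits just below that threshold.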
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing aggregates so that each can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially, size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full fat or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as nomad candidates that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.
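The four command sequences above differ only in two switches: SAN versus NAS, and Snapshot autodelete on or off. When converting many volumes, it can help to generate the sequences from those two flags so they can be reviewed before pasting to the console. A sketch in Python (the generator itself is a convenience assumption, not a NetApp tool; the emitted commands mirror the sequences above):

```python
def zero_fat_commands(volume, max_size, increment, san=False, autodelete=False):
    """Return the zero fat configuration commands for one volume."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san:
        # Repeat for each LUN in the volume; <lun> left as a placeholder.
        cmds.append("lun set reservation <lun> disable")
    return cmds
```

For example, `zero_fat_commands("vol1", "500g", "50g", san=True, autodelete=True)` reproduces the last sequence above for volume vol1.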
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
• Commitment rate is an Operations Manager term that refers to the percentage of aggregate space committed to volumes.
• Deduplication rate is an Operations Manager term that measures the efficiency of the deduplication functionality. This rate is measured at the volume and the aggregate level as a percentage.
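Both rates reduce to simple ratios. A sketch of the arithmetic, with assumed function and parameter names (Operations Manager reports these values directly):

```python
def commitment_rate_pct(committed_gb, aggregate_usable_gb):
    """Share of aggregate space committed to volumes; with thin
    provisioning this routinely exceeds 100%."""
    return 100.0 * committed_gb / aggregate_usable_gb

def deduplication_rate_pct(logical_gb, physical_gb):
    """Space saved by deduplication, relative to the logical data stored."""
    return 100.0 * (logical_gb - physical_gb) / logical_gb

# 14 TB committed on a 10 TB aggregate -> 140% commitment rate.
```

A commitment rate above 100% is exactly the overcommitment that the operational corridors in this document are designed to keep under control.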
For the aggregate, we define different operational windows, each characterized by an interval of storage utilization. We define the operational sweet spot corridor (green) as the interval in which the aggregate should be operated for optimal utilization and service availability. We define a tolerance interval (yellow) in which actions are taken to get back into the operational sweet spot corridor. Finally, we define a no-go area (red) in which we do not intend to operate the aggregate; this area might act as a last buffer of time, or can be considered an area in which the operational staff has less experience.
Figure 1 explains the terminology in the context of storage objects on a NetApp storage controller. The aggregate is a physically limited storage object. Aggregates are treated as fairly static containers and thus need proper size management.
Figure 1) Terminology in context of the storage objects of volumes and aggregates
In practice, commitment rates far above 100% are common in customer environments. This document describes how to manage this.
2.2 GOAL OF THIS DOCUMENT
The goal of this document is to achieve best-in-class storage efficiency and costs by consolidating the highest possible amount of application data while meeting the required service-level agreements. The idea is to enable thin provisioning while controlling the use of physical resources in the NetApp shared storage infrastructure within a desired corridor. NetApp storage efficiency technologies can save a significant amount of the IT budget; on the other hand, running at an uncontrolled use level can reduce flexibility and cause headaches in managing data growth and service-level fulfillment.
The difference in managing thin-provisioned storage compared to traditional storage is that, due to the dense consolidation of application data, accumulated application data growth rates might vary in a broader corridor than they would traditionally.
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases for managing the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase: In this phase, storage is provisioned from the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase: In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization while safely reaching the next planned downtime or administration window of the served applications. The organic growth phase is therefore sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes the slowed growth.
• Mitigation of storage tightness phase: This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower it. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate
Figure 4) Mitigate to prevent uncontrolled utilization
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth, to implement a basic setting, and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by the Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and the adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time
[Figure 5 orders four service levels by disruption and recovery time: Platinum (production, premium customers; lowest disruption and recovery time), Gold (production; low), Silver (production, low budget), and Bronze (best-effort services: dev/test, cold and fill-up data, dynamic or short-term data; best-effort disruption and recovery time).]
In this document the focus is on the operational aspects of storage efficiency technologies for achieving data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the service level, the necessary response procedures, and promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts with provisioning storage and finishes with deprovisioning it. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). The relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager for delivering information about certain events are described.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view
• Provision: How to provision best for storage efficiency (provisioning models; NetApp Data Motion awareness; from scratch or template/clone); where to provision to; which SLA; what the defaults are
• Monitor: which tools to use; what to monitor; what is critical (when to stop provisioning; when to stop extending; when to relax tightness; how to detect)
• Notification: who is in charge to react; how to notify
• Mitigate: available options; implications on SLAs; when to act
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.
25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by when they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility
• FlexClone®: Instantly creates thin-provisioned and space-efficient writable clones. (Benefit during provisioning.)
• FlexVol®: Implements thin provisioning and consumes only the needed space rather than the requested space. (Benefit during provisioning and operation.)
• Deduplication: Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage. (Benefit during provisioning and operation.)
• NetApp Data Motion: Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime. (Benefit during operation.)
• Aggregate extensibility in Data ONTAP®: Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation. (Benefit during provisioning and operation.)
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate it) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.
31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                          Primary Data (Files & Directories) Space Allocation
                                          Fat                   Thin
Snapshot Copy Space     Fat               Full fat option       No option
Allocation              Thin              No option             Zero fat option
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 2) Full fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 100 Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100. For later releases, 0 is the default.
autosize on Turn autosize on There is no artificial limited volume that needs to be monitored Autosize makes sense to allow growth of user data beyond the guaranteed space limit
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on
Volume Snapshot Options
reserve yes Value depends on number of Snapshot copies and change rate within the volume
schedule switched on Automatic Snapshot technology schedules
autodelete off Deleting Snapshot copies is not recommended in most NAS environments
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• The space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none
fractional_reserve 100 Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100. For later releases, 0 is the default.
autosize on Turn autosize on There is no artificial limited volume that needs to be monitored Autosize makes sense to allow growth of user data beyond the guaranteed space limit
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on
try first - Autodelete is not recommended in most environments
Volume Snapshot Options
reserve yes/no The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule switched on Automatic Snapshot technology schedules
autodelete off Deleting Snapshot copies is not recommended in most NAS environments
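The NAS settings in Tables 2 and 3 differ mainly in the space guarantee; autosize on, Snapshot schedule on, and autodelete off are shared. As a sketch, a small helper can render the corresponding Data ONTAP 7-Mode command sequence. The volume and aggregate names, sizes, and increments are illustrative placeholders, and the command spellings should be verified against the Data ONTAP release in use.

```python
# Sketch: render 7-Mode CLI lines corresponding to Tables 2 and 3.
# Names and sizes are hypothetical; verify syntax against your release.

def nas_volume_commands(vol, aggr, size, max_size, incr, model="zero_fat"):
    """Return CLI lines for a full fat or zero fat NAS volume."""
    # Full fat preallocates the volume; zero fat allocates on demand.
    guarantee = "volume" if model == "full_fat" else "none"
    return [
        f"vol create {vol} -s {guarantee} {aggr} {size}",
        # Autosize is on for both models; -m/-i come from the business model.
        f"vol autosize {vol} -m {max_size} -i {incr} on",
        # Keep a Snapshot reserve; autodelete stays off in NAS environments.
        f"snap reserve {vol} 20",
        f"snap autodelete {vol} off",
    ]

for cmd in nas_volume_commands("vol_nas01", "aggr1", "500g", "1t", "50g"):
    print(cmd)
```

A full fat volume would be generated the same way with `model="full_fat"`, changing only the `-s volume` guarantee.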
SAN
For SAN, we consider three options:
• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                          Primary Data (LUN) Space Allocation
                                          Fat                   Thin
Snapshot Copy Space     Fat               Full fat option       No option
Allocation              Thin              Low fat option        Zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 100 Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize off Autosize could be used as an option to create free space needed for Snapshot copy creation
Volume Snapshot Options
reserve 0
schedule switched off
autodelete off
LUN Options
reservation enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 0 Snapshot space is controlled by autodelete and autosize options
autosize on Turn autosize on
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on
try first volume_grow Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve 0 For NAS volumes setting a Snapshot copy reserve area and configuration of Snapshot copy schedules is a common setup For SAN volumes this needs to be switched off according to NetApp best practices (see Fibre Channel and iSCSI Configuration Guide)
schedule switched off
autodelete on There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options volume oldest_first There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation enable Reserves space for the LUN during creation
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
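The three SAN sizing formulas (2X + Δ for full fat, X + Δ for low fat, X − N + Δ for zero fat) can be compared numerically. The figures below are purely illustrative, not a sizing recommendation.

```python
# Numeric comparison of the volume sizing formulas for the three
# SAN provisioning models (example figures only).

def full_fat_size(x, delta):
    return 2 * x + delta           # 2X + Δ: data plus full overwrite reserve

def low_fat_size(x, delta):
    return x + delta               # X + Δ: no fractional reserve

def zero_fat_size(x, n, delta):
    return x - n + delta           # X − N + Δ: unused LUN blocks stay unallocated

x, n, delta = 1000, 300, 200       # GB: LUN capacity, unused blocks, Snapshot space
print(full_fat_size(x, delta))     # 2200
print(low_fat_size(x, delta))      # 1200
print(zero_fat_size(x, n, delta))  # 900
```

With 30% of the LUN blocks unused, zero fat consumes less than half the space of full fat for the same primary data.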
Table 6) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none No space reservation for volume at all
fractional_reserve 0 With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize on Turn autosize on
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on
try first volume_grow
Volume Snapshot Options
reserve 0 For NAS volumes setting a Snapshot copy reserve area and configuration Snapshot copy schedules is a common setup For SAN volumes this needs to be switched off according to NetApp best practices (see Fibre Channel and iSCSI Configuration Guide)
schedule switched off
autodelete off Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation disable No preallocation of blocks for the LUN.
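The zero fat SAN settings in Table 6 can likewise be sketched as a 7-Mode command sequence. All names and sizes below are hypothetical placeholders; the command spellings follow common 7-Mode usage and should be checked against the Data ONTAP release in use.

```python
# Sketch of the 7-Mode settings behind Table 6 (zero fat SAN),
# using placeholder names and sizes.

volume, lun, aggr = "vol_san01", "/vol/vol_san01/lun0", "aggr1"
commands = [
    f"vol create {volume} -s none {aggr} 500g",          # no volume guarantee
    f"vol options {volume} fractional_reserve 0",
    f"vol options {volume} try_first volume_grow",
    f"vol autosize {volume} -m 1t -i 50g on",
    f"snap reserve {volume} 0",                          # no Snapshot reserve for SAN
    f"snap sched {volume} 0 0 0",                        # schedules switched off
    f"lun create -s 200g -t windows -o noreserve {lun}", # no LUN space reservation
]
print("\n".join(commands))
```

The same skeleton covers low fat by changing the guarantee to `volume` and dropping `-o noreserve` from the LUN creation.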
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only at the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods

• Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X − N + Δ (2)
• Space efficient: full fat no; low fat partially, for Snapshot copies; zero fat yes
• Monitoring: full fat optional; low fat required on volume and aggregate level; zero fat required on aggregate level
• Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes
• Pool benefiting from dedupe savings: full fat volume fractional reserve area; low fat volume free space area; zero fat aggregate free space area
• Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete any Snapshot copies; zero fat yes, when monitoring and notification processes are missing
• Typical use cases: full fat small installations and environments with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructure, test/dev environments, and storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
FULLLOWZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULLLOWZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Fulllowzero fat provisioning policies for datasets and storage services
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
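The commitment rate mentioned above can be sketched as a simple ratio: the logical space promised to volumes versus the physical aggregate capacity, with values above 100% indicating overcommitment. The figures are illustrative.

```python
# Commitment rate as a consolidation metric (sketch):
# logical space promised to volumes vs. physical aggregate capacity.
# Values above 100% indicate overcommitment (thin provisioning).

def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three generously sized zero fat volumes on a 10 TB aggregate:
rate = commitment_rate([4000, 5000, 6000], 10000)
print(f"{rate:.0f}%")  # 150%
```

A rising commitment rate signals growing data consolidation; it is the metric to watch when deciding whether an aggregate can absorb further organic growth.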
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes: the template and each application instance (1 through n) keep their LUNs/qtrees in a dedicated FlexVol volume with deduplication block sharing within the volume, while FlexClone block sharing links the instance volumes to the template volume.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use will grow.
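The FlexClone effect described above can be captured in a toy model: creating a clone raises the aggregate's committed (logical) space immediately, while the used (physical) space grows only as the clone diverges. The class and figures below are illustrative, not a representation of Data ONTAP internals.

```python
# Toy model: FlexClone's impact on commitment vs. physical use.
# Committed space rises at clone creation; used space rises only
# when the clone diverges from its parent.

class Aggregate:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0   # sum of volume sizes presented to clients
        self.used = 0        # physically allocated blocks

    def add_volume(self, size_gb, used_gb):
        self.committed += size_gb
        self.used += used_gb

    def clone_volume(self, size_gb):
        # A clone shares all blocks with its parent: only metadata is new.
        self.committed += size_gb

    def write_new_data(self, gb):
        self.used += gb      # divergence consumes real space

aggr = Aggregate(10000)
aggr.add_volume(size_gb=2000, used_gb=1500)   # template volume
used_before = aggr.used
aggr.clone_volume(size_gb=2000)               # instant, space-efficient clone
assert aggr.used == used_before               # physical use unchanged
aggr.write_new_data(100)                      # clone diverges
print(aggr.committed, aggr.used)              # 4000 1600
```

This is why monitoring the aggregate, rather than the individual clones, is the relevant control point for template-based provisioning.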
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings, which must be regained by re-executing the deduplication process. If possible, avoid the following actions on client data:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved through the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of the file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate use. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the overdeduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications that use multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks, so grouping their storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without a performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online-migration capabilities of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technique for relaxing the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a useful way to react to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data; over the data lifetime, more and more nomads are migrated away, and at the end of the lifetime only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is a limited resource
• Operate the aggregate in its operational sweet-spot corridor over a long time frame; by slicing the migratable entities in the right way, you can keep the aggregate within a predefined use interval
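The sizing logic above can be sketched as a small planning calculation. The aggregate capacity, settled size, growth rate, and the 85% corridor ceiling below are all hypothetical values chosen for illustration:

```python
# Sketch (hypothetical sizes): planning nomad migrations so that an aggregate
# whose settled part grows over its lifetime stays inside a target use corridor.

def nomad_plan(aggregate_gb, settled_gb, growth_gb_per_month,
               lifetime_months, corridor_high=0.85):
    """Return (month, GB) pairs: when a nomad must leave the aggregate and how
    much capacity must be migrated away, until the nomad pool is exhausted."""
    limit_gb = aggregate_gb * corridor_high
    nomad_left_gb = aggregate_gb - settled_gb   # capacity provisioned as nomads
    used_gb = aggregate_gb                      # aggregate starts filled
    plan = []
    for month in range(1, lifetime_months + 1):
        used_gb += growth_gb_per_month          # settled data grows organically
        if used_gb > limit_gb:                  # corridor exceeded: migrate
            excess = min(used_gb - limit_gb, nomad_left_gb)
            if excess <= 0:
                break                           # no nomads left to migrate
            plan.append((month, excess))
            used_gb -= excess
            nomad_left_gb -= excess
    return plan

# 10 TB aggregate, 6 TB settled growing 200 GB/month over 24 months
for month, gb in nomad_plan(10_000, 6_000, 200, 24):
    print(f"month {month}: migrate >= {gb:.0f} GB of nomad capacity")
```

The output shows the pattern the text describes: a large initial migration to enter the corridor, then periodic smaller ones until the nomad pool runs out and only settled data remains.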
Figure 14) Settled/nomad provisioning into an aggregate. In the case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method for adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, as described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, the application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends sizing the storage controllers in a high-availability configuration so that the remaining controller can handle the load in the case of a failover. Doing so leaves enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for convenient offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends considering the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. This section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve the situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage
• Leave room for organic growth; it might be desirable to still allow extending the storage of previously provisioned applications
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on
These transitions must occur within a specified time frame to preserve operational flexibility and to avoid endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. The available time determines which mitigation alternatives can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
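The "running too tight on storage" point above can be made concrete: the free block pool, divided by the observed growth rate, yields the time left to react. The numbers and the 95% safety boundary below are hypothetical:

```python
# Sketch (hypothetical numbers): the free block pool of an aggregate, divided
# by the observed daily growth rate, gives the time the operations team has
# to react before a chosen safety boundary is crossed.

def days_to_react(aggregate_gb, used_gb, daily_growth_gb, safety_threshold=0.95):
    """Days until the aggregate crosses its safety boundary at the current
    growth rate; float('inf') if the data is not growing."""
    free_gb = aggregate_gb * safety_threshold - used_gb
    if daily_growth_gb <= 0:
        return float("inf")
    return max(free_gb, 0) / daily_growth_gb

# 20 TB aggregate, 17 TB used, growing 50 GB/day -> (19000 - 17000) / 50 = 40 days
print(days_to_react(20_000, 17_000, 50))  # 40.0
```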
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to the operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, the storage is left for organic growth. After further thresholds are exceeded, additional inspection or activities must be performed to mitigate storage tightness.
• Provision storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, the storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
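The three phases above amount to a small state decision driven by aggregate use. The corridor boundaries in this sketch (70% and 85%) are hypothetical; real values depend on the environment and the time needed to react:

```python
# Sketch: the phase model described above as a tiny state decision.
# The corridor boundaries are hypothetical, not NetApp defaults.

def next_phase(aggregate_use_pct, provision_limit=70, mitigate_limit=85):
    """Map aggregate block use to the phase a storage resource should be in."""
    if aggregate_use_pct < provision_limit:
        return "provision"            # room to place new storage
    if aggregate_use_pct < mitigate_limit:
        return "organic-growth"       # stop provisioning, let data grow
    return "mitigate"                 # trigger a mitigation alternative

print(next_phase(50))   # provision
print(next_phase(75))   # organic-growth
print(next_phase(90))   # mitigate
```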
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits, and Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of the preallocated and growable storage objects that host the application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; threshold settings and actions then tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric triggers an alarm that notifies the person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric triggers an alarm that notifies the person in charge. The metric refers to the amount of storage that is committed to applications; it represents the level of consolidation as well as the width and growth of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
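The four aggregate thresholds above can be sketched as a single evaluation per aggregate. The default percentages in this sketch are hypothetical; real deployments tune them per aggregate in Operations Manager:

```python
# Sketch: evaluating the four aggregate thresholds the way an Operations
# Manager-style monitor would. The threshold percentages are hypothetical.

def check_aggregate(used_pct, committed_pct,
                    nearly_full=90, full=95,
                    nearly_over=95, over=100):
    """Return the list of events that would fire for one aggregate."""
    events = []
    if used_pct >= full:
        events.append("aggregate-full")
    elif used_pct >= nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct >= over:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_over:
        events.append("aggregate-almost-overcommitted")
    return events

# 92% of blocks used, 130% of capacity committed to thin-provisioned volumes
print(check_aggregate(92, 130))  # ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that the two metrics are independent: a thin-provisioned aggregate is routinely overcommitted long before it is physically full, which is exactly why both threshold pairs exist.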
For volumes, Operations Manager provides thresholds that can be used to alert the operational staff when a volume is in a certain state:
• Volume full threshold. This event notifies the person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies the person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time-to-full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
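The regression and the days-to-full estimate can be sketched as follows. The capacity samples are invented for illustration; Operations Manager performs the equivalent calculation over up to 90 days of history:

```python
# Sketch: a least-squares trend over daily capacity samples, plus the resulting
# days-to-full estimate based on usable capacity (not the threshold setting).

def growth_trend(daily_used_gb):
    """Least-squares slope (GB/day) over the observed samples."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def days_to_full(daily_used_gb, usable_capacity_gb):
    """Days until the usable capacity is reached at the trended growth rate."""
    slope = growth_trend(daily_used_gb)
    if slope <= 0:
        return float("inf")
    return (usable_capacity_gb - daily_used_gb[-1]) / slope

samples = [8_000, 8_050, 8_100, 8_150, 8_200]   # invented: 50 GB/day, linear
print(growth_trend(samples))                    # 50.0
print(days_to_full(samples, 10_000))            # (10000 - 8200) / 50 = 36.0
```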
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time-to-full (ascending) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example through the links already provided in this report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
The operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument for keeping the management effort of the NetApp storage infrastructure low.
After being notified, the person in charge can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by the required skill set and the time to act, which allows easy alignment to a given organizational structure.
Operations Manager supports several methods of sending a notification, and the methods can be combined; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; its direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm; adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed to deliver the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
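A minimal adapter script might look like the sketch below. How Operations Manager passes event details to the script varies by version; this sketch simply assumes the event name and source object arrive as command-line arguments, which you would adapt to your environment:

```python
#!/usr/bin/env python
# Sketch of a script adapter that Operations Manager could invoke for an alarm.
# ASSUMPTION: the event name and source object arrive as command-line
# arguments; adapt the parsing to how your Operations Manager version
# actually delivers event details.

import sys
import time

def format_ticket(event_name, source, when=None):
    """Build a one-line ticket message for a downstream ticketing system."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    return f"[{stamp}] storage event '{event_name}' on {source}"

def main(argv):
    if len(argv) < 3:
        sys.stderr.write("usage: adapter.py <event-name> <source-object>\n")
        return 1
    # Replace this print with a call into your ticketing system's API.
    print(format_ticket(argv[1], argv[2]))
    return 0

# Operations Manager would invoke this as, for example:
#   python adapter.py aggregate-almost-full aggr1
```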
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness can occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth; resolving this situation requires a mitigation activity on the aggregate level.
When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using zero fat configuration They might grow on demand however because they live within an aggregate of physically limited size the growth of the storage object itself is also limited As described in the following list providing usable space in the aggregate automatically allows contained storage objects to grow
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
38 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
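For illustration only, the prioritization that Table 8 implies can be sketched in a few lines of Python. This is a sketch under our own assumptions; the data structure, impact ranking, and function names are illustrative and not part of any NetApp tool or API.

```python
from dataclasses import dataclass

@dataclass
class Mitigation:
    number: int
    description: str
    repeatable: bool
    sla_impact: str  # "none", "low", "medium", or "high"

# The seven activities of Table 8, with offline alternatives 6 and 7
# classified as high impact (illustrative simplification).
ACTIVITIES = [
    Mitigation(1, "Add disks to the aggregate", True, "none"),
    Mitigation(2, "Decrease aggregate Snapshot copy reserve", False, "none"),
    Mitigation(3, "Shrink preallocated volumes", False, "low"),
    Mitigation(4, "Deduplicate and shrink volumes", True, "low"),
    Mitigation(5, "Migrate nomads online", True, "low"),
    Mitigation(6, "Migrate volumes offline", True, "high"),
    Mitigation(7, "Stop application, then migrate", True, "high"),
]

IMPACT_RANK = {"none": 0, "low": 1, "medium": 2, "high": 3}

def candidate_activities(online_only: bool):
    """Return activities ordered by SLA impact; optionally drop offline ones."""
    pool = [a for a in ACTIVITIES if not (online_only and a.number in (6, 7))]
    return sorted(pool, key=lambda a: (IMPACT_RANK[a.sla_impact], a.number))
```

Filtering with online_only=True mirrors an environment in which the offline alternatives 6 and 7 are ruled out until the next planned downtime window.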
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one makes use of neither online data migration nor the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth over time (months) between planned downtime windows.]
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
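The decision logic of this sample setting can be condensed into a short sketch. The thresholds (50% nearly full, 65% full, 110% nearly overcommitted) come from the text above; the function and phase names are our own illustrative choices, not Operations Manager terminology.

```python
# Thresholds from sample setting 1 (percent).
NEARLY_FULL = 50            # aggregate capacity used
FULL = 65                   # aggregate capacity used
NEARLY_OVERCOMMITTED = 110  # aggregate space committed

def phase(capacity_used_pct: float, committed_pct: float) -> str:
    """Map the two aggregate metrics to the operational phase."""
    if capacity_used_pct > FULL:
        return "mitigate"            # plan migration for next downtime window
    if capacity_used_pct > NEARLY_FULL or committed_pct > NEARLY_OVERCOMMITTED:
        return "organic-growth"      # stop provisioning new storage
    return "provisioning"            # keep provisioning

print(phase(40, 90))    # provisioning
print(phase(55, 90))    # organic-growth
print(phase(70, 130))   # mitigate
```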
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [Figure: while aggregate capacity used is 0-50% and aggregate space committed is 0-110%, new storage is provisioned; above these thresholds, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: settled and nomad data over time (hours); the need to act is detected, and the mitigation (for example, migration) takes effect within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric into account, such as storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
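As a sketch only, Table 10 translates into a single-metric rule. The function name and action strings below are illustrative assumptions, not part of any NetApp product.

```python
def settlednomad_actions(capacity_used_pct: float) -> list:
    """Return the actions triggered at a given aggregate capacity used (%)."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if capacity_used_pct > 90:
        actions.append("migrate a nomad to relax the resource situation")
    return actions

print(settlednomad_actions(80))  # ['stop provisioning new storage']
```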
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Figure: at 0-70% capacity used, new storage is provisioned; at 70-85%, provisioning stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used over elapsed time (1 month, then 3 months), with an overall trend line and a last-3-month trend line; capacity used drops as volumes are changed to zero fat and deduplicated.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
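The backward calculation in steps a-d can be sketched as follows. The numbers in the example are purely illustrative assumptions, not sizing recommendations.

```python
def safe_threshold_pct(aggregate_tb: float,
                       growth_tb_per_month: float,
                       months_between_downtimes: float,
                       ceiling_pct: float = 80.0) -> float:
    """Highest utilization at which organic growth still fits below ceiling.

    headroom = growth rate * time between planned downtime windows;
    the threshold is the ceiling minus that headroom, as a percentage
    of the aggregate size.
    """
    headroom_tb = growth_tb_per_month * months_between_downtimes
    return ceiling_pct - 100.0 * headroom_tb / aggregate_tb

# 100 TB aggregate, 2 TB/month growth, 6 months between downtime windows:
# 12 TB of headroom is needed, so start mitigation planning at 68% used.
print(safe_threshold_pct(100, 2, 6))  # 68.0
```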
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes to the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
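Because days-to-full trending reports against 100% capacity used, you may want to rescale it to your own, lower threshold. The following sketch assumes roughly linear growth and is an illustration of the arithmetic, not an Operations Manager feature.

```python
def days_to_threshold(days_to_full: float,
                      current_used_pct: float,
                      threshold_pct: float) -> float:
    """Rescale days-to-full (reported against 100%) to a custom threshold.

    Assumes linear growth: the fraction of the remaining headroom that
    lies below the threshold determines the fraction of days remaining.
    """
    if current_used_pct >= threshold_pct:
        return 0.0  # threshold already crossed
    return (threshold_pct - current_used_pct) * days_to_full / \
           (100.0 - current_used_pct)

# 200 days to full at 60% used -> 100 days until an 80% threshold.
print(days_to_threshold(200, 60, 80))  # 100.0
```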
7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the NetApp Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
To control the level of physical resources, we outline methods for increasing storage utilization by provisioning storage with NetApp storage efficiency technologies. We also outline how to react to organic data growth and how to keep storage use within a corridor of high efficiency by using NetApp technologies for data center flexibility.
To summarize, this document introduces three phases to manage the storage on NetApp storage controllers: provisioning, organic growth, and mitigation of storage tightness. These phases play a vital role for aggregates, which are the coarsest storage abstraction of a NetApp storage controller.
• Provisioning phase. In this phase, storage is provisioned by the NetApp shared storage infrastructure, which increases the utilization of aggregates. The goal is to operate the aggregates with a high level of data consolidation in an efficient utilization corridor. Figure 2 visualizes this corridor.
• Organic growth phase. In this phase, no further storage is provisioned, to slow down the growth of aggregate utilization. The goal is to operate in a corridor of high utilization but safely reach the next planned downtime or administration window of the served applications. Thus, the organic growth phase is sized depending on the growth rate of capacity used and the length of the time frame to the planned downtime windows. Figure 3 visualizes slowed growth.
• Mitigation of storage tightness phase. This phase prevents an uncontrolled level of utilization and provides mitigation activities to lower this level. Several mitigation alternatives are presented to mitigate storage tightness and to shift the aggregate utilization back into the desired operational corridor. Figure 4 visualizes this mitigation.
Figure 2) Storage consolidation and growing utilization using thin provisioning. [Figure: data growth filling aggregate capacity within the operational sweet spot corridor.]
Figure 3) Controlled and slowed data growth within the operational sweet spot corridor in the aggregate. [Figure: slowed data growth against aggregate capacity.]
Figure 4) Mitigation to prevent uncontrolled utilization. [Figure: mitigation shifts aggregate utilization back into the corridor before the aggregate capacity is exhausted.]
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers. It provides decision makers with an understanding of how to align storage efficiency best practices and processes within their existing operations organization.
• Operational teams. It allows operational teams to understand monitoring and management of the storage infrastructure while mastering data growth. It allows them to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time. [Figure: Platinum (production, premium customers; lowest disruption and recovery time), Gold (production; low), Silver (production, low budget; low), and Bronze (best-effort services: dev/test, cold/fill-up data, dynamic/short-term data; best effort).]
In this document, the focus is on operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus, we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager that deliver information about certain events are described.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view. [Figure: a cycle of Provision (how to provision best for storage efficiency: provisioning models, NetApp Data Motion awareness, from scratch or template/clone; where to provision to; which SLA; what the defaults are), Monitor (tools; what to monitor), Notification (who is in charge of reacting; how to notify), and Mitigate (what is critical: when to stop provisioning, when to stop extending, when to relax tightness, how to detect; available options; implications on SLAs; when to act).]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document some prerequisites must be met We assume a NetApp shared storage infrastructure implemented using large aggregates This acts as a utility for delivering storage in a flexible manner for applications with different needs It scales with the demands and serves a variety of different service levels at the same time NetApp Operations Manager monitors the NetApp shared storage infrastructure This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers Based on this information Operations Manager indicates the necessity to change the phases and behavior in the data center
The NetApp shared storage infrastructure provides different ways for clients to consume its resources It can provide a traditional view where storage resources are located at a specific controller Using NetApp Provisioning Manager the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from their physical controllers The abstractions of a storage service catalog resource pools and datasets provide easy manageability in the face of massive scale If multi-tenancy is not required then this is the abstraction of choice
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This enables high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING

Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
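As an illustration, on a Data ONTAP 7-mode controller deduplication can be switched on and run against existing data at any time; a minimal sketch, assuming a hypothetical volume named app_vol:

```shell
# Enable deduplication on an existing volume (volume name is an example)
sis on /vol/app_vol

# Deduplicate data written before dedupe was enabled (-s scans existing blocks)
sis start -s /vol/app_vol

# Check progress and the resulting savings
sis status /vol/app_vol
df -s app_vol
```

By contrast, the provisioning decisions discussed below must be made up front, because they determine how space is reserved when the volume is created.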
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity without downtime, which is also an important aspect when planning to deliver services 24x7.

Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.
TR-3827, If You Are Doing This, Then Your Storage Could Be Underutilized, provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.

We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized in primary data and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:
- Full fat
- Low fat
- Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.

- Full fat: The primary data and Snapshot copy space are preallocated.
- Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of the four combinations are possible.

- Primary data (files and directories) fat, Snapshot copy space fat: full fat option
- Primary data thin, Snapshot copy space fat: no option
- Primary data fat, Snapshot copy space thin: no option
- Primary data thin, Snapshot copy space thin: zero fat option
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:

- Volumes are created with space guarantee.
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
- Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
- Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 2) Full fat provisioning.

Volume options:
- guarantee: volume
- fractional_reserve: 100. Leave at default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
- autosize: on. Turn autosize on; there is then no artificially limited volume size that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
- autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot options:
- reserve: yes. The value depends on the number of Snapshot copies and the change rate within the volume.
- schedule: switched on. Automatic Snapshot technology schedules.
- autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:

- Volumes are created without space guarantee.
- The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
- Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
- Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

Volume options:
- guarantee: none
- fractional_reserve: 100. Leave at default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
- autosize: on. Turn autosize on; there is then no artificially limited volume size that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
- autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
- try_first: -. Autodelete is not recommended in most NAS environments.

Volume Snapshot options:
- reserve: yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
- schedule: switched on. Automatic Snapshot technology schedules.
- autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
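Translated to the Data ONTAP 7-mode CLI, a zero fat NAS volume along these lines might be created as follows; a sketch, with hypothetical aggregate (aggr1), volume (app_nas), sizes, and schedule:

```shell
# Volume without space guarantee (zero fat)
vol create app_nas -s none aggr1 500g

# Let the volume grow on demand, up to 1 TB in 50 GB increments
vol autosize app_nas -m 1t -i 50g on

# Keep a Snapshot reserve and an automatic Snapshot schedule;
# autodelete stays at its default (off) for NAS volumes
snap reserve app_nas 20
snap sched app_nas 0 2 6@8,12,16,20
```

The maximum size and increment for vol autosize, as well as the reserve percentage, should follow the business model and data growth rate, as noted in Table 3.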
SAN
For SAN, we consider three options:

- Full fat: Both primary data and its Snapshot copy space are preallocated.
- Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
- Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

- Primary data (LUN) fat, Snapshot copy space fat: full fat option
- Primary data thin, Snapshot copy space fat: no option
- Primary data fat, Snapshot copy space thin: low fat option
- Primary data thin, Snapshot copy space thin: zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:

- Volumes are created with space guarantee.
- A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
- The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Volume options:
- guarantee: volume
- fractional_reserve: 100. Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
- autosize: off. Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot options:
- reserve: 0
- schedule: switched off
- autodelete: off

LUN options:
- reservation: enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:

- Volumes are created with space guarantee.
- LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
- Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Volume options:
- guarantee: volume
- fractional_reserve: 0. Snapshot space is controlled by the autodelete and autosize options.
- autosize: on. Turn autosize on.
- autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
- try_first: volume_grow. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot options:
- reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
- schedule: switched off
- autodelete: on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst case, deleting Snapshot copies is not an option.
- autodelete options: volume, oldest_first. There is a precedence order determining which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN options:
- reservation: enable. Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

- Volumes are created without space guarantee.
- LUNs are created without space guarantee.
- The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Volume options:
- guarantee: none. No space reservation for the volume at all.
- fractional_reserve: 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
- autosize: on. Turn autosize on.
- autosize options: -m X -i Y. The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
- try_first: volume_grow

Volume Snapshot options:
- reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
- schedule: switched off
- autodelete: off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN options:
- reservation: disable. No preallocation of blocks for the LUN.
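On the 7-mode CLI, a zero fat SAN volume and thin LUN along these lines could look as follows; a sketch, with hypothetical names, sizes, and OS type:

```shell
# Volume without space guarantee and without overwrite reserve (zero fat)
vol create app_san -s none aggr1 200g
vol options app_san fractional_reserve 0

# Grow on demand; no Snapshot reserve or schedule for SAN volumes
vol autosize app_san -m 400g -i 10g on
snap reserve app_san 0
snap sched app_san 0 0 0

# Thin LUN: -o noreserve disables space reservation for the LUN
lun create -s 100g -t linux -o noreserve /vol/app_san/lun0
```

With this setup, blocks are consumed in the aggregate only as the host actually writes data, which is why monitoring shifts to the aggregate level.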
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

- The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
- For SAN volumes, the block consumption can be easily monitored.
- Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
- Monitoring is needed only on the aggregate level. Volumes will grow on demand.
Table 7) Comparison of provisioning methods.

- Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X - N + Δ (N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used).
- Space efficient: full fat no; low fat partially, for Snapshot copies; zero fat yes.
- Monitoring: full fat optional; low fat required on the volume and aggregate level; zero fat required on the aggregate level.
- Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes.
- Pool benefiting from dedupe savings: full fat volume fractional reserve area; low fat volume free space area; zero fat aggregate free space area.
- Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete Snapshot copies; zero fat yes, when monitoring and notification processes are missing.
- Typical use cases: full fat small installations with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructures, test/dev environments, and storage pools for virtualized servers.
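The space consumption formulas compared above can be captured in a small helper; a sketch (the function name and parameterization are ours, not from the report):

```python
def provisioned_space(x, delta, n=0, method="zero"):
    """Space a volume must back in the aggregate, per provisioning method.

    x     -- size of the primary data (sum of LUN capacities / user data)
    delta -- space needed to hold Snapshot copy data
    n     -- blocks logically allocated but not used (thin provisioning impact)
    """
    if method == "full":   # primary data plus full overwrite reserve
        return 2 * x + delta
    if method == "low":    # primary data preallocated, Snapshot space on demand
        return x + delta
    if method == "zero":   # everything allocated on demand
        return x - n + delta
    raise ValueError("unknown method: %s" % method)

# A 100 GB data set with 20 GB of Snapshot churn and 30 GB of unused blocks:
for m in ("full", "low", "zero"):
    print(m, provisioned_space(100, 20, n=30, method=m))
# full 220, low 120, zero 90
```

The gap between the three results is exactly the efficiency argument of Table 7: zero fat backs only the blocks actually in use.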
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

- Faster than manually provisioning storage
- Easier to maintain than scripts
- Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
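As an illustration, the commitment rate of an aggregate can be computed from the provisioned volume sizes; a simplified sketch (in practice these values come from Operations Manager, and the function name is ours):

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Percentage of the aggregate capacity committed to volumes.

    Values above 100 indicate overcommitment (thin provisioning),
    which is the logical data consolidation this section describes.
    """
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three zero fat volumes sized to their expected growth on a 1,000 GB aggregate:
rate = commitment_rate([400, 300, 500], 1000)
print("commitment rate: %.0f%%" % rate)  # 120%: overcommitted by 20%
```

Sizing volumes to their expected content keeps this metric meaningful; sizing every volume to an arbitrarily high value would inflate it and hide the real consolidation.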
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

- Volume-centric storage layout
- Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

- High instant storage efficiency savings: Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.
- Long-term storage efficiency savings: Deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

- Simplicity of data management using volumes
- Individual control over the SLA of each application instance
- Application instances with a short duration
- No consideration of deduplication
- Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. (The template and each instance, 1 through n, keep their LUNs/qtrees in a dedicated FlexVol volume; deduplication block sharing works within each FlexVol volume, and FlexClone block sharing links the cloned instances to the template.)
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically in the figure. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. When data in the clone is overwritten and new data is added by the application, the aggregate use will grow.
Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings and requires re-executing the deduplication process. If possible, the following actions on client data should be avoided:

- Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
- Preformatting data
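On a 7-mode controller, provisioning a new instance from a volume template then reduces to a single volume FlexClone operation; a sketch, with hypothetical volume and Snapshot copy names:

```shell
# Create a consistent base Snapshot copy on the template volume
snap create app_template golden

# Space-efficient clone: -s none keeps the clone thin (zero fat)
vol clone create app_inst01 -s none -b app_template golden

# The clone shares all unchanged blocks with app_template;
# aggregate usage grows only as the new instance writes data
```

A FlexClone license is assumed; without -s none, the clone would inherit or reserve space and forfeit most of the instant savings.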
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
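A sketch of such file/LUN cloning inside a shared volume on a 7-mode controller; the paths and names are hypothetical, and the clone command assumes a FlexClone license on a recent 7G release:

```shell
# Clone a template virtual disk inside the shared volume
clone start /vol/vm_store/template.vmdk /vol/vm_store/vm01.vmdk

# Monitor the running clone operation
clone status /vol/vm_store

# For a LUN-backed template: lun clone creates a thin copy
# backed by a Snapshot copy of the volume
snap create vm_store base
lun clone create /vol/vm_store/vm01_lun -b /vol/vm_store/template_lun base
```

Either way, the clone stays inside the volume, so subsequent deduplication runs can consolidate it together with all other instances.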
This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:

- Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
- Short-term storage efficiency savings: Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. (The template and each instance place one LUN/qtree in each FlexVol volume; deduplication block sharing operates within each FlexVol volume, across the storage objects of all instances.)
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Grouping these storage objects therefore leads to a very high degree of consolidation through deduplication.
Quickly changing data, such as pagefiles and swap files, should not be included in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that deduplicates well.
We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers write data, client-side fragmentation incurs no performance penalty, so such realignments are unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage controller network is a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame. By slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval.
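The slicing idea can be sketched with a small calculation. All numbers and names below are illustrative and do not come from any NetApp tool: given an aggregate's size, its current use, and the sizes of the provisioned nomads, we can enumerate the use levels reachable by migrating nomads away.

```python
# Sketch: which aggregate use levels are reachable by migrating away
# one nomad at a time, smallest first? Sizes in GB, all values illustrative.

def reachable_use(aggr_size_gb, used_gb, nomad_sizes_gb):
    """Return the aggregate use ratios achievable after migrating away
    one more nomad at each step, smallest nomad first."""
    ratios = []
    freed = 0
    for nomad in sorted(nomad_sizes_gb):
        freed += nomad
        ratios.append(round((used_gb - freed) / aggr_size_gb, 3))
    return ratios

# A 10 TB aggregate at 85% use, with nomads of 0.5, 1, and 2 TB:
print(reachable_use(10240, 8704, [512, 1024, 2048]))  # → [0.8, 0.7, 0.5]
```

Migrating only the smallest nomad already brings the aggregate back to 80% use, which is why several differently sized nomads give finer control than one large one.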
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
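The two-step assessment can be expressed as a minimal sketch. The instance records, their field names, and the decision rule below are illustrative assumptions, not a NetApp API:

```python
# Sketch: classify instances as settled or nomad, first by technical
# constraints (FC-attached storage cannot be moved online), then by the
# tolerated service disruption. Field names are illustrative.

def classify(instances):
    settled, nomads = [], []
    for inst in instances:
        if inst["protocol"] == "FC" or inst["disruption_tolerance"] == "low":
            settled.append(inst["name"])   # stickiest: keep in place
        else:
            nomads.append(inst["name"])    # migratable online via vFiler
    return settled, nomads

apps = [
    {"name": "erp-db",    "protocol": "FC",    "disruption_tolerance": "low"},
    {"name": "mail",      "protocol": "iSCSI", "disruption_tolerance": "medium"},
    {"name": "fileshare", "protocol": "NFS",   "disruption_tolerance": "high"},
]
print(classify(apps))  # → (['erp-db'], ['mail', 'fileshare'])
```

In a real assessment the business-impact ranking (penalty costs) would further order the nomads, with the highest negative impact migrated last.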
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLEDNOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the load on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time: Some mitigation alternatives must be triggered in advance, and some time might be needed before their effect becomes evident. This lead time determines the number of mitigation alternatives that can still be considered at a given point.
• Running out of mitigation alternatives: Several mitigation alternatives exist to control usage. However, some are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available.
• Running too tight on storage: Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object, such as a LUN or a share, can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage: While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth: When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, the storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use: When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup→Options→Default Thresholds) or the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time needed to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold: This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold: This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold: This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold: This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold: This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold: This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized: This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
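The days-to-full idea can be approximated as follows. This is a simplified linear-regression sketch of the concept, not the actual Operations Manager algorithm; all numbers are illustrative.

```python
# Sketch: estimate daily growth by linear regression over recent use
# samples and derive a days-to-full value from the remaining capacity.

def days_to_full(used_gb_per_day, capacity_gb):
    """Least-squares slope (GB/day) over the samples, then
    remaining capacity divided by that slope."""
    n = len(used_gb_per_day)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb_per_day) / n
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, used_gb_per_day)) \
            / sum((x - mean_x) ** 2 for x in xs)
    remaining = capacity_gb - used_gb_per_day[-1]
    return remaining / slope if slope > 0 else float("inf")

# Ten daily samples growing 20 GB/day toward a 5,000 GB aggregate:
samples = [4000 + 20 * d for d in range(10)]
print(round(days_to_full(samples, 5000)))  # → 41
```

As in the Operations Manager trend, the estimate is only as good as the interval it is calculated over; enclosing a recent one-off data activity can distort the slope considerably.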
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full increasing, in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth: This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog. A selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilized capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
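Such an adapter script might look like the following sketch. The environment variable names used here are illustrative assumptions; consult the Operations Manager documentation of your version for the variables actually passed to alarm scripts.

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter called by Operations Manager.
# The DFM_* environment variable names below are assumptions for
# illustration only; verify them against your Operations Manager version.
import os

def build_ticket(env):
    """Format a one-line ticket summary from the alarm environment."""
    return "{0}: {1} on {2}".format(
        env.get("DFM_SEVERITY", "unknown"),
        env.get("DFM_EVENT_NAME", "unknown event"),
        env.get("DFM_SOURCE_NAME", "unknown object"),
    )

if __name__ == "__main__":
    # Hand the message to a ticketing system; printing stands in for
    # the actual API or CLI call of the customer infrastructure.
    print(build_ticket(os.environ))
```

The script keeps Operations Manager decoupled from the ticketing system: only the adapter needs to change when the downstream system changes.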
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their contents, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other objects can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state; the data can then be migrated offline.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
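For alternative 6, the client downtime is dominated by the final synchronization of the last SnapMirror delta. A back-of-the-envelope estimate can be sketched as follows; the numbers and the efficiency factor are illustrative assumptions.

```python
# Sketch: rough switch-over downtime for offline volume migration,
# i.e. the time to replicate the final delta over the inter-data-center
# link. Values are illustrative, not measured.

def switchover_minutes(delta_gb, link_gbit_per_s, efficiency=0.7):
    """Final-sync time: delta converted to gigabits, divided by the
    effective link throughput, returned in minutes."""
    seconds = (delta_gb * 8) / (link_gbit_per_s * efficiency)
    return seconds / 60

# A 50 GB final delta over a 10 Gbit/s link at 70% effective throughput:
print(round(switchover_minutes(50, 10), 1))  # → 1.0
```

This is consistent with the statement above that typical inter-data center bandwidth keeps the switch-over in the range of a few minutes; larger deltas or slower links stretch it accordingly.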
Table 8) Mitigation alternatives to control use within aggregates.

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss, stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility of online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol® volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
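The aggregate days to full trend mentioned above can be approximated by a linear projection of past growth. The following Python sketch is our illustration only; the function and parameter names are ours, not an Operations Manager interface:

```python
def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Linear projection of the days until an aggregate reaches 100% used,
    based on the average daily growth observed in the past."""
    if daily_growth_gb <= 0:
        return None  # no observed growth: no meaningful projection
    return (capacity_gb - used_gb) / daily_growth_gb

# A 10,000 GB aggregate at 6,000 GB used, growing 20 GB/day,
# has about 200 days left before reaching 100%.
print(days_to_full(10000, 6000, 20))  # 200.0
```

A value like this, tracked per aggregate, indicates whether the space reserved for organic growth safely covers the interval to the next planned downtime window.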
Figure 21) Storage to enable organic data growth between planned downtime windows
[Figure content: data growth over time in months; reserved space allows organic growth between planned downtime windows.]
Note: Several months might pass between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to allow organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by an alarm on the Operations Manager event aggregate
nearly full threshold (an event configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (an event configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that the decision can be made to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
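The resulting behavior can be condensed into a small decision rule. The sketch below is our reading of the thresholds described above (50%/65% for capacity used, 110%/120% for committed space, the upper committed-space value taken from Figure 22); it is illustrative, not NetApp tooling:

```python
def phase(capacity_used_pct, space_committed_pct):
    """Classify an aggregate into the phases of sample setting 1."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"   # schedule data migration for the next downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess"     # stop provisioning; assess capacity, adapt thresholds
    return "provision"      # inside the operational sweet spot corridor

print(phase(45, 100))  # provision
print(phase(55, 100))  # assess
print(phase(70, 115))  # mitigate
```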
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
[Figure content: phases plotted against aggregate capacity used (operational sweet spot corridor 0–50%, alarm beyond 65%) and aggregate space committed (0–110%, alarm beyond 120%). Within the corridor new storage is provisioned; beyond the first thresholds capacity is assessed and thresholds are adapted; at the upper thresholds mitigation starts.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler™ technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure content: settled and nomad data over time in hours; once the need to act is detected, a mitigation alternative such as migrating a nomad takes effect within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol® volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
• Days to full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as the mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
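The transitions in Table 10 amount to a simple threshold lookup; the following minimal sketch is our illustration (function and action names are ours):

```python
def actions_for(capacity_used_pct):
    """Return the actions triggered at a given aggregate capacity used,
    following the phase-transition thresholds of Table 10."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if capacity_used_pct > 90:
        actions.append("migrate a nomad to relax the resource situation")
    return actions

print(actions_for(75))  # ['stop provisioning new storage']
print(actions_for(92))  # all three actions
```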
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure content: phase transitions over aggregate capacity used. At 0–70% new storage is provisioned and already provisioned storage may be extended; at 70–85% provisioning of new storage stops; above 90% utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure content: capacity over elapsed time across three phases (roughly 1 month, then 3 months): committed capacity, capacity used, the overall trend, and the last-3-month trend. Capacity used drops after the change to zero fat configurations and deduplication, then resumes organic growth.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame in which the volume configuration was changed to zero fat and that it includes the relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
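Steps b through d amount to a back-of-the-envelope calculation: reserve at least the space that organic growth consumes between two planned downtime windows. The sketch below assumes linear growth; the 20% safety padding and all names are our assumptions, not figures from this report:

```python
def min_reserve_gb(daily_growth_gb, days_between_downtimes, safety=1.2):
    """Minimum free space needed to ride out organic growth between
    two planned downtime windows, padded by a safety factor."""
    return daily_growth_gb * days_between_downtimes * safety

def aggregate_full_threshold_pct(capacity_gb, daily_growth_gb, days_between_downtimes):
    """Work backward to the utilization threshold above which the
    remaining space no longer covers growth until the next window."""
    reserve = min_reserve_gb(daily_growth_gb, days_between_downtimes)
    return max(0.0, 100.0 * (capacity_gb - reserve) / capacity_gb)

# 20 GB/day growth and downtime windows 120 days apart on a 16 TB
# aggregate: about 2,880 GB must stay free, a threshold near 82%.
print(round(aggregate_full_threshold_pct(16384, 20, 120), 1))  # 82.4
```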
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days to full trending reported by Operations Manager to adapt the thresholds. Remember that days to full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Figure 4) Mitigate to prevent uncontrolled utilization
This document addresses best practices and tools to manage the NetApp storage infrastructure and to support decision making during the transitions between phases.
2.3 AUDIENCE
This document addresses two audiences:
• Decision makers: It provides decision makers with an understanding of how to align storage efficiency best practices and processes with their existing operations organization.
• Operational teams: It allows operational teams to understand the monitoring and management of the storage infrastructure while mastering data growth. It allows operational teams to implement a basic setting and to position their usage goals. We refer to existing NetApp best practices to increase the level of data consolidation and to achieve overcommitment for major applications.
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered by the Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service levels and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters, which are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time
[Figure content: service levels ordered by service disruption and recovery time, from lowest (Platinum: production, premium customers) through Gold (production) and Silver (production, low budget) to best effort (Bronze: best effort services, dev/test, cold/fill-up data, dynamic/short-term data).]
In this document the focus is on the operational aspects of storage efficiency technologies used to achieve data center consolidation and agility. Thus we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency.
We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure and continues with the detection and monitoring of situations endangering the level of a service, the necessary response procedures, and promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). The relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge of performing certain actions. The notification mechanisms within NetApp Operations Manager are described to deliver information in case of certain events.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view
[Figure content: the provision, monitor, notification, mitigate cycle. Provision: how to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone), where to provision to, which SLA, and what the defaults are. Monitor: which tools and what to monitor. Notification: who is in charge to react and how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect), the available options, implications on SLAs, and when to act.]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility and to understand their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility
NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate Extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion independent of the physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING Some features, such as data deduplication, can be turned on or off at any time. However, to achieve the maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two of the four combinations are possible.
[Figure content: matrix of primary data (files and directories) space allocation (fat or thin) against Snapshot copy space allocation (fat or thin). Fat/fat is the full fat option and thin/thin is the zero fat option; the mixed combinations are not options for NAS.]
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
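The X + Δ formula can be made concrete in a few lines. In the sketch below, Δ is expressed as a percentage of the primary data; the 20% default and the function name are purely illustrative assumptions, not recommendations from this report:

```python
def full_fat_volume_size_gb(primary_data_gb, snapshot_reserve_pct=20):
    """Size a full fat volume as X + delta: the primary data X plus the
    space delta set aside for Snapshot copies (here a percentage of X)."""
    delta = primary_data_gb * snapshot_reserve_pct / 100.0
    return primary_data_gb + delta

# 1,000 GB of files and directories with a 20% Snapshot reserve yields
# a 1,200 GB volume, all of it preallocated by the space guarantee.
print(full_fat_volume_size_gb(1000))  # 1200.0
```

In a zero fat configuration the same X + Δ value is used only as the virtual container size presented to NAS clients; the space is not preallocated.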
Table 2) Full fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
Volume Snapshot Options
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100%; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
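The zero fat NAS settings in Table 3 can be sketched as Data ONTAP 7-mode console commands. The volume name, aggregate, sizes, and schedule below are placeholder assumptions; verify the exact syntax against your Data ONTAP release.

```
# create the volume, then remove its space guarantee (zero fat)
vol create nas_vol aggr1 500g
vol options nas_vol guarantee none

# let the volume grow on demand up to a business-driven maximum
vol autosize nas_vol -m 1t -i 50g on

# keep a Snapshot reserve and automatic schedules; no autodelete
snap reserve nas_vol 20
snap sched nas_vol 0 2 6@8,12,16,20
snap autodelete nas_vol off
```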
SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                Primary data (LUN) space allocation
                                Fat            Thin
Snapshot copy        Fat        Full fat       (no option)
space allocation     Thin       Low fat        Zero fat
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to full fat provisioning. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100% carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, volumes are provisioned in a more space-efficient way:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when reaching a preset volume threshold.
Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; it can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | A precedence order determines which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
autosize | on | Turn autosize on.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value (-m) for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first | volume_grow |
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
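The zero fat SAN settings in Table 6 can likewise be sketched as Data ONTAP 7-mode console commands. All names, sizes, and the LUN type are placeholder assumptions; verify the syntax against your Data ONTAP release and the Fibre Channel and iSCSI Configuration Guide.

```
# volume without a space guarantee, growing on demand
vol create san_vol aggr1 200g
vol options san_vol guarantee none
vol options san_vol fractional_reserve 0
vol autosize san_vol -m 400g -i 20g on
vol options san_vol try_first volume_grow

# no Snapshot reserve or schedule on SAN volumes
snap reserve san_vol 0
snap sched san_vol 0 0 0

# thin LUN: no space reservation
lun create -s 100g -t linux /vol/san_vol/lun0
lun set reservation /vol/san_vol/lun0 disable
```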
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level; volumes grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
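The space consumption column of Table 7 can be illustrated with a small calculation. This is a sketch; the figures are invented for illustration only.

```python
def full_fat(x, delta):
    """Primary data preallocated twice (data + overwrite reserve) plus Snapshot data: 2X + delta."""
    return 2 * x + delta

def low_fat(x, delta):
    """Primary data preallocated once; Snapshot space on demand: X + delta."""
    return x + delta

def zero_fat(x, delta, n):
    """Allocate on demand; unused blocks (N) within the LUNs consume no space: X - N + delta."""
    return x - n + delta

# Example: 1000 GB of LUN capacity, 200 GB of Snapshot data, 400 GB never written
x, delta, n = 1000, 200, 400
print(full_fat(x, delta))      # 2200 GB preallocated
print(low_fat(x, delta))       # 1200 GB
print(zero_fat(x, delta, n))   # 800 GB actually consumed
```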
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services, or datasets, consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
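The commitment rate mentioned above can be computed as follows. This is a sketch; defining the rate as total provisioned volume size divided by aggregate capacity is an assumption based on the text, and the figures are illustrative.

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Ratio of logically provisioned space to physical aggregate capacity.
    Values above 1.0 indicate overcommitment (thin provisioning in effect)."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three thin volumes of 500 GB each provisioned on a 1000 GB aggregate
rate = commitment_rate([500, 500, 500], 1000)
print(f"{rate:.1f}x committed")  # 1.5x committed
```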
APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
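As a client-side illustration of the space reclamation idea above, on a Linux host the discard machinery can return unused blocks to a thin LUN. The device path and mount point are placeholder assumptions, and this Linux example is an addition; the original text covers Windows via SnapDrive.

```
# Option 1: reclaim continuously - mount with online discard so that
# file deletions issue SCSI UNMAP to the thin LUN as they happen
mount -o discard /dev/mapper/netapp_lun0 /mnt/data

# Option 2: reclaim periodically - trim free space in one batch
fstrim -v /mnt/data
```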
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily, because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes; note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Diagram: a template FlexVol and FlexVol volumes for instances 1 through n, each holding its own LUNs/qtrees. Deduplication block sharing operates within each FlexVol; FlexClone block sharing links each instance volume to the template. The entire construct resides in one aggregate.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. As data in the clone is changed and new data is added by the application, the aggregate use grows.
Best Practice
A volume-centric layout implicitly implements a consistency group; it is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings, and a temporarily counterproductive effect on the deduplication savings until the deduplication process runs again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
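The volume-centric cloning step described above can be sketched with Data ONTAP 7-mode commands. Volume and Snapshot names are placeholders; verify the syntax against your release (a FlexClone license is required).

```
# create a consistent base Snapshot copy of the template volume
snap create template_vol golden

# instantly clone the template for a new application instance;
# -s none keeps the clone thin (no space guarantee)
vol clone create instance1_vol -s none -b template_vol golden
```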
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance (for example, template application data) through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the application instances share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance; this is slightly more difficult than cloning with a volume FlexClone operation.
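A file/LUN-level clone as described above can be sketched in Data ONTAP 7-mode. The paths and file names are placeholder assumptions; verify the clone command syntax against your release.

```
# clone a template virtual disk inside the shared volume;
# the new file shares all unchanged blocks with the source
clone start /vol/shared_vol/template.vmdk /vol/shared_vol/instance1.vmdk

# check the progress of running clone operations
clone status
```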
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning: volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct need not be created within one aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.

[Diagram: the template and instances 1 through n form the rows; each FlexVol forms a column holding one LUN/qtree per instance, with deduplication block sharing within each FlexVol.]
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually contain similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed together with data that dedupes well in the same volume.

We further recommend not performing client data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers write data, fragmented client data is served with no performance penalty.
33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore® technology implements this feature using the vFiler® abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers, and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.

[Diagram: an aggregate containing a settled part and several nomads; one nomad is shown migrating out to another aggregate.]
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate; the use of an aggregate can be controlled and kept in a desired corridor.
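The corridor idea above can be sketched as a simple selection rule. This is illustrative only; the thresholds, the names, and the smallest-sufficient-nomad policy are assumptions, not from the original.

```python
def pick_nomad_to_migrate(aggr_used_pct, nomads_gb, aggr_size_gb,
                          high_water=85, target=75):
    """If aggregate use exceeds the corridor's high-water mark, pick the
    smallest nomad whose migration brings use back down to the target."""
    if aggr_used_pct <= high_water:
        return None  # still inside the sweet spot corridor
    need_gb = (aggr_used_pct - target) / 100 * aggr_size_gb
    candidates = sorted(size for size in nomads_gb if size >= need_gb)
    # if no single nomad is large enough, migrate the largest one available
    return candidates[0] if candidates else max(nomads_gb)

# 90% used on a 10 TB aggregate -> 1.5 TB must be freed; nomads of 1, 2, 4 TB
print(pick_nomad_to_migrate(90, [1000, 2000, 4000], 10000))  # 2000
```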
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side; Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled), and applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that likely will be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).

[Chart: instances Inst1 through InstN ordered from high negative impact (outside SLA, e.g., all FC-attached: settled) through medium to low negative impact (inside SLA: nomad).]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).

[Chart: remaining instances ordered by penalty cost from high ($$: settled) through semi-settled to low ($: nomad).]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate, or by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover Doing so should leave enough resources to perform migrations
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning in advance. This relaxes the dependence on aggregates outside planned downtime windows and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would otherwise violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery.
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a given point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the storage blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - insufficient space within the volume in which the storage object is contained
  - insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
41 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
42 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
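The interplay of the four aggregate thresholds above can be sketched as a small evaluation function. This is an illustration only, not Operations Manager's internals; the event names mirror the thresholds, and the default percentages are assumptions you would replace with your own settings.

```python
# Illustrative sketch: which aggregate events would fire for given metrics.
# Threshold defaults and event names are assumptions, not Operations Manager
# internals; "nearly" thresholds are checked only when the hard one is not hit.

def aggregate_events(used_pct, committed_pct,
                     nearly_full=80, full=90,
                     nearly_overcommitted=95, overcommitted=100):
    """Return the list of events an aggregate would raise for these metrics."""
    events = []
    if used_pct > full:
        events.append("aggregate-full")
    elif used_pct > nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct > overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

print(aggregate_events(85, 110))
# → ['aggregate-almost-full', 'aggregate-overcommitted']
```

The "nearly" thresholds exist purely to buy reaction time: they fire while the situation is still comfortable, which is why they are checked as the `elif` branch of the hard limits.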
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate when a certain situation will need to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The basis of the time-to-full calculation is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
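The trending described above, a linear regression over daily samples extrapolated to the usable capacity, can be sketched as follows. This mimics the idea, not Operations Manager's actual implementation; the sample data is invented.

```python
# Sketch of days-to-full trending: fit a line through (day, used_gb) samples
# and extrapolate to the usable aggregate capacity.  An illustration of the
# technique only, not Operations Manager's actual calculation.

def days_to_full(samples, capacity_gb):
    """samples: list of (day, used_gb) pairs, e.g. the last 90 days."""
    n = len(samples)
    xs = [d for d, _ in samples]
    ys = [u for _, u in samples]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))  # GB per day
    if slope <= 0:
        return None  # no growth: never full at the current trend
    today_estimate = y_mean + slope * (xs[-1] - x_mean)
    return (capacity_gb - today_estimate) / slope

samples = [(d, 100 + 2 * d) for d in range(30)]  # steady 2 GB/day growth
print(days_to_full(samples, 1000))  # → 421.0
```

Note that, as in the tool, the extrapolation target is the usable capacity (100%), not the aggregate full threshold; to estimate the time until a threshold fires, substitute the threshold capacity for `capacity_gb`.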
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:

• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
43 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
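A notification adapter of the kind just described might look like the following sketch. The environment variable names used here are hypothetical; consult the Operations Manager documentation for the variables actually passed to alarm scripts, and replace the stdout handover with a call into your ticketing system.

```python
#!/usr/bin/env python
# Sketch of a user-defined notification adapter started by a dfm alarm.
# The EVENT_NAME / SOURCE_NAME / EVENT_SEVERITY keys are assumptions, not
# documented Operations Manager variables -- adjust to your installation.
import json
import os
import sys
import time

def build_ticket(env):
    """Turn the alarm context into a ticket record for a downstream system."""
    return {
        "event":    env.get("EVENT_NAME", "unknown"),      # hypothetical key
        "source":   env.get("SOURCE_NAME", "unknown"),     # e.g. the aggregate
        "severity": env.get("EVENT_SEVERITY", "warning"),  # hypothetical key
        "created":  int(time.time()),
    }

if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # Hand over to the ticketing system; here we simply emit JSON on stdout.
    json.dump(ticket, sys.stdout)
```

Keeping the adapter this thin means the routing logic (which team owns which event) stays in the ticketing system, as recommended in the SNMP note above.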
44 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor: the effect of a mitigation activity should return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to accommodate data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| --- | --- | --- | --- | --- | --- |
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switchover time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
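Choosing among these alternatives is essentially a filter over lead time and downtime constraints. The following sketch encodes that idea; the activity list paraphrases Table 8, but the lead times in days are illustrative placeholders, not figures from this report.

```python
# Sketch: narrowing down the aggregate mitigation alternatives by the time
# available before the aggregate runs full.  Lead times are illustrative
# placeholders; substitute the values your team determined for its environment.

ALTERNATIVES = [  # (no., activity, lead_time_days, needs_downtime)
    (1, "add disks",                    14, False),  # HW procurement
    (2, "drop aggregate snap reserve",   0, False),
    (3, "shrink other volumes",          0, False),
    (4, "dedupe and shrink",             1, False),
    (5, "migrate nomad online",          1, False),
    (6, "migrate volume offline",       30, True),   # next downtime window
    (7, "stop application and migrate", 30, True),
]

def viable(days_to_full, downtime_possible):
    """Alternatives whose lead time still fits and whose downtime need is met."""
    return [a for a in ALTERNATIVES
            if a[2] <= days_to_full and (downtime_possible or not a[3])]

print([a[0] for a in viable(7, downtime_possible=False)])  # → [2, 3, 4, 5]
```

This makes the "running out of time" and "running out of mitigation alternatives" situations from section 4 concrete: as days-to-full shrinks, the viable list shortens.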
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| --- | --- | --- | --- | --- | --- |
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows.
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.

The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
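The two-metric rules of this setting can be sketched as a small decision function. The phase names are our own labels for the transitions described above; the percentages (50%/65% used, 110% committed) are the customer's initial values quoted in the text.

```python
# Sketch of sample setting 1: phase decision from the two aggregate metrics.
# Phase names are illustrative labels, not Operations Manager terminology.

def phase(used_pct, committed_pct):
    if used_pct > 65:                           # aggregate full threshold
        return "mitigate in next downtime window"
    if used_pct > 50 or committed_pct > 110:    # nearly full / nearly overcommitted
        return "organic growth only"            # stop provisioning new storage
    return "provision new storage"

print(phase(55, 100))  # → organic growth only
```

Note that either metric alone is enough to stop provisioning: a lightly used but heavily overcommitted aggregate is treated just as cautiously as a full one.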
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.
52 SAMPLE SETTING 2 SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
| --- | --- | --- |
| > 70% | Storage operations | Stop provisioning new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
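Because this setting uses a single metric, the whole policy reduces to a rule list evaluated from the highest threshold down, as in the following sketch. The thresholds are the ones from Table 10; the rule-list representation itself is our illustration.

```python
# Sketch of sample setting 2: Table 10 as a data-driven rule list, evaluated
# from the highest threshold down so the most urgent action wins.

RULES = [  # (threshold_pct, action)
    (90, "relax resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning new storage"),
]

def action(aggregate_used_pct):
    for threshold, act in RULES:
        if aggregate_used_pct > threshold:
            return act
    return None  # inside the operational sweet spot corridor

print(action(87))  # → stop extending provisioned storage
```

Keeping the thresholds in data rather than code makes it easy to adapt them per aggregate, mirroring the per-object threshold overrides described in section 4.2.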
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
As a general rule, we don't introduce artificially limited container types: they increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool: the aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates; note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
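Operations Manager reports this trend directly; if you want to cross-check it, a minimal least-squares fit over daily capacity samples gives the same kind of slope. The sample series below is purely illustrative.

```python
# Minimal least-squares trend estimate over daily aggregate 'capacity used'
# samples (GB), as a cross-check of the trending Operations Manager reports.
# The sample data is illustrative only.

def daily_growth_rate(samples):
    """Slope of a least-squares line through (day index, used GB) points, in GB/day."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

used_gb = [1000, 1004, 1009, 1013, 1018, 1022, 1027]  # one week of samples
print(round(daily_growth_rate(used_gb), 2))  # 4.5 GB/day for this series
```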
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
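The backward calculation above can be sketched in a few lines: given a growth rate and the lead time a mitigation action needs, at what capacity-used percentage must the team be alerted so the action completes before the comfort level is exceeded? All inputs below are illustrative assumptions.

```python
# Back-of-the-envelope sketch of the backward calculation in steps a-d.
# All numeric inputs are illustrative assumptions, not NetApp sizing rules.

def attention_threshold_pct(aggr_capacity_gb, comfort_pct, growth_gb_per_day, lead_time_days):
    """Percentage used at which to raise attention (yellow area)."""
    headroom_gb = growth_gb_per_day * lead_time_days       # space consumed while reacting
    threshold_gb = aggr_capacity_gb * comfort_pct / 100 - headroom_gb
    return 100 * threshold_gb / aggr_capacity_gb

# 10 TB aggregate, 80% comfort level, 20 GB/day growth, 14 days to migrate a nomad:
print(round(attention_threshold_pct(10240, 80, 20, 14), 1))  # alert at ~77.3% used
```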
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a recurring time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
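The four sequences differ only in the NAS/SAN distinction and the autodelete setting, so generating them programmatically (for example, to pipe to the controller over ssh) reduces typos. The helper below is hypothetical; only the emitted command strings are taken verbatim from the sequences above.

```python
# Hypothetical helper that emits the zero fat command sequences shown above,
# e.g. to be piped to 'ssh <controller>'. The command syntax is taken verbatim
# from this section (Data ONTAP 7-mode console); the helper itself is a sketch.

def zero_fat_cmds(volume, max_size, incr, san_lun=None, autodelete=False):
    """Return the console commands for a zero fat volume configuration."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {incr} on",
    ]
    if san_lun:                       # SAN variants also zero the Snapshot reserve
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san_lun:                       # and disable the LUN space reservation
        cmds.append(f"lun set reservation {san_lun} disable")
    return cmds

print("\n".join(zero_fat_cmds("vol_app1", "500g", "10g")))
```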
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
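Because days-to-full trending reports against 100% capacity used, a quick sketch (with illustrative numbers) shows how to restate the remaining time against your own comfort threshold instead:

```python
# Days-to-full reports against 100% capacity used; this sketch recomputes the
# remaining time against an arbitrary comfort threshold. Inputs are illustrative.

def days_to_threshold(capacity_gb, used_gb, growth_gb_per_day, threshold_pct=100):
    """Days until 'capacity used' reaches threshold_pct of the aggregate."""
    remaining_gb = capacity_gb * threshold_pct / 100 - used_gb
    return remaining_gb / growth_gb_per_day

print(round(days_to_threshold(10240, 7600, 20), 1))      # against 100% capacity used
print(round(days_to_threshold(10240, 7600, 20, 80), 1))  # against an 80% comfort level
```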
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
2.4 SCENARIO
As a scenario, we consider a service provider who delivers a set of IT services. This service provider might serve internal or external customers at different service levels. The service levels provided are characterized by unplanned downtime, as exemplified by Figure 5. This characterization is useful for aligning service data with physical resources. In our example, the highest level of service availability is delivered for Platinum services. It is further assumed that the provided services have different lifetimes and dates of creation.
The service provider's major goal is to deliver the services within the specified service level and to achieve maximum data consolidation through NetApp storage efficiency technologies. This directly translates into cost savings related to capital investments, floor space, cooling, maintenance, and operational expenses. However, storing data in a consolidated way using storage efficiency technologies needs to take into account the aggregated data growth rates of the applications.
Predicting data growth rates depends on several parameters that are usually outside the control and knowledge of the service provider. These parameters include usage characteristics, the number of users, and the functionality used. To compensate for the deficiencies in precisely predicting data growth over a specific time frame, we propose a scheme that the service provider can use to achieve operational flexibility and the adaptability to handle unpredictable growth rates.
Figure 5) Sample service levels ordered by service disruption and recovery time. From lowest disruption and recovery time to best effort: Platinum (production, premium customers), Gold (production), Silver (production, low budget), Bronze (production), and Best Effort services (dev/test, cold/fill-up data, dynamic/short-term data).
In this document the focus is on the operational aspects of storage efficiency technologies to achieve data center consolidation and agility. Thus we take a seat next to the operational staff of our sample service provider to understand their questions regarding the technologies that make up NetApp storage efficiency. We address the questions posed by the operational staff, such as: How do we set it up? How do we run this? How do we integrate the necessary procedures into our daily life?
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations endangering the level of a service and the necessary response procedures, and promotes a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
• Provisioning deals with the provisioning of storage. In this document, provisioning models are shown that achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how to provision best.
• Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
• Notifying deals with how to notify the people in charge when certain actions need to be performed. The notification mechanisms within NetApp Operations Manager that deliver information in case of certain events are described.
• Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view. The cycle Provision, Monitor, Notification, Mitigate raises, per phase: Provision: How to provision best for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or template/clone)? Where to provision to? Which SLA? What are the defaults? Monitor: Which tools, and what to monitor? What is critical: when to stop provisioning, when to stop extending, when to relax tightness, and how to detect it? Notification: Who is in charge to react, and how to notify? Mitigate: What are the available options, what are the implications on SLAs, and when to act?
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and their relevance in the provisioning and operational phases.
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might be reduced over time. In contrast, deduplication can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate Extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs; it scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate it) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations, for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch; technically, only two of the four combinations are possible. The matrix crosses primary data (files and directories) space allocation (fat or thin) with Snapshot copy space allocation (fat or thin): fat/fat is the full fat option, thin/thin is the zero fat option, and the mixed combinations are not options.
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold the Snapshot data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This also happens when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments; keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
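The sizing formula X + Δ above can be illustrated with a quick estimate. Note that estimating Δ from a daily change rate and the number of retained daily Snapshot copies is a common first approximation used here for illustration, not a NetApp sizing rule.

```python
# Illustration of the full fat sizing formula "X + Δ". Δ (Snapshot space) is
# approximated from a daily change rate and the number of retained daily
# Snapshot copies -- a rough first estimate, not a NetApp sizing rule.

def full_fat_volume_size_gb(primary_gb, daily_change_pct, retained_days):
    """Volume size = X (primary data) + Δ (estimated Snapshot space)."""
    delta_gb = primary_gb * daily_change_pct / 100 * retained_days
    return primary_gb + delta_gb

# 1 TB of user data, 2% daily change, 14 daily Snapshot copies retained:
print(round(full_fat_volume_size_gb(1024, 2, 14), 1))  # ~1310.7 GB to provision
```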
Table 2) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options | |
guarantee | volume |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot Options | |
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold the Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide the capacity taken up by Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments; keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes
Volume Options | |
guarantee | none |
fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options | |
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:
• Full fat: Both the primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch. The matrix crosses primary data (LUN) space allocation (fat or thin) with Snapshot copy space allocation (fat or thin): fat/fat is the full fat option, fat primary data with thin Snapshot space is the low fat option, thin/thin is the zero fat option, and thin primary data with fat Snapshot space is not an option.
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold the Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still apply these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option | Recommended Value | Notes
Volume Options | |
guarantee | volume |
fractional_reserve | 100 | Even if technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot Options | |
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options | |
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold the Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Option | Recommended Value | Notes
Volume Options | |
guarantee | volume |
fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options | |
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume / oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options | |
reservation | enable | Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold the Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none No space reservation for volume at all
fractional_reserve 0 With Data ONTAP 733 fractional_reserve can be modified even for volumes without a space guarantee of type volume Prior to Data ONTAP 733 the value was fixed at 100
autosize on Turn autosize on
18 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions A reasonable resizing increment depends on various factors such as data growth rate in the particular volume the volume size itself and so on
try first volume_grow
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
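The recommendations in Table 6 translate roughly into Data ONTAP 7-Mode console commands. The sketch below only assembles the command strings; the aggregate name aggr1, the volume and LUN names, and all sizes are example placeholders, so verify the exact syntax against your Data ONTAP release:

```python
def zero_fat_commands(vol, size, max_size, increment, lun_path, lun_size):
    """Return Data ONTAP 7-Mode console commands that apply the zero fat
    recommendations of Table 6 (names and sizes are examples only)."""
    return [
        f"vol create {vol} -s none aggr1 {size}",            # no volume guarantee
        f"vol options {vol} fractional_reserve 0",           # needs Data ONTAP >= 7.3.3
        f"vol options {vol} try_first volume_grow",
        f"vol autosize {vol} -m {max_size} -i {increment} on",
        f"snap reserve {vol} 0",                             # no Snapshot reserve (SAN)
        f"snap sched {vol} 0 0 0",                           # no Snapshot schedule
        f"snap autodelete {vol} off",
        f"lun create -s {lun_size} -t linux -o noreserve {lun_path}",
    ]

for cmd in zero_fat_commands("vol_zf", "500g", "1t", "50g",
                             "/vol/vol_zf/lun0", "400g"):
    print(cmd)
```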
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can provide space for volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods
Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X - N + Δ²
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
² N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
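The space consumption column of Table 7 can be compared numerically with a small sketch (the figures are invented for illustration):

```python
def space_consumption_gb(method, x, delta, n=0):
    """Physical space consumption per Table 7.

    x     = primary data (sum of LUN capacities)
    delta = Snapshot copy data
    n     = blocks logically allocated but not used (thin provisioning impact)
    """
    if method == "full":
        return 2 * x + delta        # 100% fractional reserve doubles X
    if method == "low":
        return x + delta            # fractional reserve 0, full volume guarantee
    if method == "zero":
        return x - n + delta        # allocate on demand
    raise ValueError(f"unknown method: {method}")

x, delta, n = 1000, 150, 400
for m in ("full", "low", "zero"):
    print(m, space_consumption_gb(m, x, delta, n))
# full 2150, low 1150, zero 750 -- zero fat avoids consuming the blocks
# the application has not written yet
```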
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
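Assuming the commitment rate is defined as committed space relative to aggregate capacity, it can be computed as in this hypothetical sketch:

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Committed space relative to the aggregate's capacity, in percent.
    Values above 100% indicate overcommitment: thin provisioning at work."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three zero fat volumes sized to their expected content on a 2TB aggregate:
print(commitment_rate([800, 700, 900], 2000))  # 120.0 -> overcommitted by 20%
```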
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. This passes the information through the storage stack that a particular block is no longer used, and allows unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
[Diagram: a template FlexVol volume and instance FlexVol volumes 1 to n, each containing LUNs/qtrees; deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the template to its clones.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data. It allocates space for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
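The described behavior can be illustrated with a toy model (all names and numbers are invented; this is not a model of Data ONTAP internals):

```python
class Aggregate:
    """Toy model of the effect described above: cloning raises commitment,
    not use; only new or changed data in the clone consumes blocks."""
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0.0   # sum of volume sizes
        self.used = 0.0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed += size_gb
        self.used += used_gb

    def flexclone(self, parent_size_gb):
        self.committed += parent_size_gb   # clone commits its full size ...
        # ... but shares all blocks with the parent: no change to self.used

    def write_to_clone(self, new_data_gb):
        self.used += new_data_gb           # changed/new data allocates blocks

agg = Aggregate(2000)
agg.provision_volume(500, used_gb=400)   # template volume
agg.flexclone(500)                       # instant, space-efficient copy
print(agg.committed, agg.used)           # 1000.0 400.0
agg.write_to_clone(50)
print(agg.committed, agg.used)           # 1000.0 450.0
```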
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
[Diagram: rows for the template, instance 1, and instance 2; columns of FlexVol volumes, each holding one LUN/qtree per instance, with deduplication block sharing within each FlexVol volume.]
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate, but it does affect the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and keeping autosize enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed together with data that dedupes well in the same volume.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalties, so defragmentation is unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the size of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
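One way to keep the aggregate in its use corridor is to pick the smallest nomad whose migration brings the aggregate back below a target threshold. A hypothetical selection helper (the threshold and all figures are invented examples):

```python
def pick_nomad(nomads_gb, used_gb, capacity_gb, target_use_pct=85):
    """Pick the smallest nomad whose migration brings the aggregate back
    into its sweet spot corridor. Returns None if the aggregate is already
    fine, or if even migrating the largest nomad is not enough."""
    target = capacity_gb * target_use_pct / 100.0
    if used_gb <= target:
        return None
    candidates = [n for n in sorted(nomads_gb) if used_gb - n <= target]
    return candidates[0] if candidates else None

# 9TB used on a 10TB aggregate, nomads of 0.3, 0.6, and 1.2TB provisioned:
# migrating the 0.6TB nomad is the cheapest way back under 85% use.
print(pick_nomad([300, 600, 1200], used_gb=9000, capacity_gb=10000))  # 600
```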
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of Data Motion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
[Diagram: instances Inst1 to InstN ordered by negative impact, from high (outside SLA, e.g., all FC-attached instances: settled) through medium to low (inside SLA: nomad).]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
[Diagram: instances ordered by penalty cost ($$ to $), grouped into settled, semi-settled, and nomad.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the progress of the migration consumes additional resources on the network and on the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of Data Motion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you initially consider the settled/nomad setting and take sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
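The three phases can be sketched as a simple threshold-driven mapping (the threshold values are examples, not NetApp defaults):

```python
def next_phase(use_pct, provision_limit=70, mitigate_limit=90):
    """Map aggregate use to the phases above. The two limits delimit the
    preferred operational corridor and are example values only."""
    if use_pct < provision_limit:
        return "provision storage"
    if use_pct < mitigate_limit:
        return "leave for organic growth"
    return "mitigate storage use"

for pct in (55, 80, 94):
    print(pct, next_phase(pct))
```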
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://&lt;opsmgr-server&gt;:&lt;port&gt;/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block use. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and growth of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes reach a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume's used capacity has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
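The two aggregate metrics these thresholds act on can be derived from the usable aggregate capacity, the capacity already used, and the sum of the committed sizes of the contained volumes. A short illustrative calculation (the function and field names are assumptions for this sketch, not the Operations Manager API):

```python
def aggregate_metrics(usable_gb, used_gb, committed_gb):
    """Derive the two percentages the aggregate thresholds are set on."""
    return {
        "capacity_used_pct": 100.0 * used_gb / usable_gb,
        # A value above 100% means more storage is promised to applications
        # than is physically present, i.e. the aggregate is overcommitted.
        "committed_pct": 100.0 * committed_gb / usable_gb,
    }

m = aggregate_metrics(usable_gb=10_000, used_gb=4_500, committed_gb=11_000)
```

In this example the aggregate is 45% full but 110% committed: thin provisioning has promised 10% more storage than physically exists, which is exactly the situation the overcommitted thresholds watch for.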
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size, because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://&lt;opsmgr-server&gt;:&lt;port&gt;/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
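The underlying estimate can be reproduced with an ordinary least-squares fit over past daily capacity samples; days to full is then the remaining usable capacity divided by the fitted daily growth rate. The following is a sketch of the idea, not the exact Operations Manager algorithm:

```python
def days_to_full(daily_used_gb, usable_gb):
    """Estimate days until 100% full from a linear regression over past samples.

    daily_used_gb: one used-capacity sample per day, oldest first.
    usable_gb: the usable aggregate capacity (the 100% reference).
    Returns None when the trend is flat or negative.
    """
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    # Least-squares slope = daily growth rate in GB/day
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb)) \
        / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # shrinking or flat usage: no meaningful prediction
    return (usable_gb - daily_used_gb[-1]) / slope
```

With samples of 100, 110, 120, and 130 GB on a 1,000 GB aggregate, the fitted growth is 10 GB/day, giving 87 days to full, which also illustrates why the prediction references 100% capacity rather than the alarm threshold.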
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://&lt;opsmgr-server&gt;:&lt;port&gt;/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) in order to focus on the most relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden with more specific ones. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort for the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment with a given organizational structure.
Operations Manager supports different notification methods, and they can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://&lt;opsmgr-server&gt;:&lt;port&gt;/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
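Such an adapter script simply translates the event into whatever format the customer infrastructure expects. The sketch below assumes the event name and affected object arrive as command-line arguments; the actual interface depends on the Operations Manager version, and the ticket fields and queue name are placeholders, not a real ticketing API:

```python
import json
import sys

def build_ticket(event_name, source_object, severity="warning"):
    """Translate an Operations Manager event into a generic ticket payload."""
    return {
        "title": f"{event_name} on {source_object}",
        "severity": severity,
        # Route to the operational group owning the storage object;
        # the queue name here is an illustrative placeholder.
        "queue": "storage-operations",
    }

if __name__ == "__main__" and len(sys.argv) >= 3:
    # e.g. invoked as: script_to_execute aggregate-almost-full aggr1
    print(json.dumps(build_ticket(sys.argv[1], sys.argv[2])))
```

The payload would then be handed to the ticketing system's own intake mechanism (HTTP endpoint, message queue, or CLI), which is site specific.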
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve tightness in this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows synchronizing the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|---|---|---|---|---|---|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switchover time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with application owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|---|---|---|---|---|---|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with application owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a phase transition are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. [Diagram: data growth over time, in months, between two planned downtime windows.]
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper limit of the operational sweet-spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.

| Phase | Aggregate Capacity Used | Aggregate Space Committed |
|---|---|---|
| Provision new storage | 0–50% | 0–110% |
| Assess capacity, adapt thresholds | 50–65% | 110–120% |
| Mitigate | > 65% | > 120% |
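The decision logic of this setting can be condensed into a few lines. A sketch using the sample thresholds above (50%/110% to stop provisioning, 65%/120% to trigger mitigation; the function name and return labels are illustrative):

```python
def setting1_action(used_pct, committed_pct):
    """Decide the operational action for sample setting 1.

    used_pct: aggregate capacity used, in percent.
    committed_pct: aggregate space committed, in percent of usable capacity.
    """
    if used_pct > 65 or committed_pct > 120:
        # Upper corridor limit crossed: plan a migration for the
        # next planned downtime window.
        return "mitigate-in-next-downtime-window"
    if used_pct > 50 or committed_pct > 110:
        # Stop provisioning; the aggregate is left for organic growth.
        return "stop-provisioning"
    return "provision-new-storage"
```

Exceeding either metric is sufficient to advance the phase, which matches the "one or both thresholds" rule described above.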
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet-spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Diagram: settled data and nomads (N) over time, in hours, showing the detection of the need to act and the effect of a mitigation such as a migration.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---|---|---|
| > 70% | Storage operations | Stop provisioning new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
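Because only a single metric is monitored, the transitions in Table 10 reduce to a lookup. A minimal sketch (the function name and return labels are illustrative):

```python
def setting2_action(used_pct):
    """Phase transitions for the settled/nomad setting (Table 10)."""
    if used_pct > 90:
        return "migrate-nomad"       # relax the resource situation online
    if used_pct > 85:
        return "stop-extending"      # no further extension of provisioned storage
    if used_pct > 70:
        return "stop-provisioning"   # no new storage on this aggregate
    return "normal-operation"
```

The much higher thresholds compared with sample setting 1 reflect that a nomad migration takes hours rather than waiting months for a downtime window.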
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

| Aggregate Capacity Used | 0–70% | 70–85% | > 90% |
|---|---|---|---|
| Provision new storage | Y | N | N |
| Extend already provisioned storage | Y | Y | N |
| Relax utilization (NetApp Data Motion of a nomad) | N | N | Y |
In this setting, you can achieve very high data consolidation on NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of the NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Diagram: committed capacity and capacity used over elapsed time (1 month, 3 months), with the overall trend and the last-3-month trend; markers 1 to 3 correspond to the steps below.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and the previously unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates per application makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially, size volumes to the expected size of the data you are going to store. That way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes to the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
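When many volumes are trimmed at once, the command sequences above lend themselves to scripted generation. The following sketch merely emits the NAS command sequence for review; it does not execute anything against a controller, and the volume names and sizes in the example are placeholders:

```python
def zero_fat_nas_commands(volume, max_size, increment, snap_autodelete=False):
    """Generate the Data ONTAP console commands that turn a NAS volume to zero fat.

    Mirrors the NAS command sequences in this section; review the output
    before pasting it into the storage controller console.
    """
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if snap_autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    return cmds

# Example: print the sequence for a hypothetical volume "vol1"
for cmd in zero_fat_nas_commands("vol1", "500g", "50g"):
    print(cmd)
```

Generating the sequence per volume keeps the rollout uniform and makes it easy to archive what was applied to each volume.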
7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the NetApp Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
A set of questions pertaining to the lifetime of a service instance and its storage arises. It starts with provisioning storage in a NetApp shared storage infrastructure, continues with the detection and monitoring of situations that endanger the level of a service and the necessary response procedures, and extends to promoting a continuous and smooth delivery of services.
The questions are structured around a cycle that starts at provisioning storage and finishes at deprovisioning storage. Figure 6 shows important questions regarding storage efficiency from an operational point of view.
- Provisioning deals with the provisioning of storage. In this document, provisioning models are shown to achieve a high level of storage consolidation while preserving operational flexibility. For individual applications, NetApp provides a rich library of technical reports on how best to provision.
- Monitoring deals with defining the goal of the monitoring process and which metrics to use to decide when to stop (for example, the provisioning of storage). Relevant metrics provided by NetApp Operations Manager are described.
- Notifying deals with how to notify the people in charge when certain actions must be performed. The notification mechanisms within NetApp Operations Manager that deliver information in case of certain events are described.
- Mitigation deals with mechanisms to prevent uncontrolled utilization from affecting operational flexibility and service fulfillment.
Figure 6) Questions regarding storage efficiency from an operational point of view.
[Figure: a cycle of four phases with their key questions. Provision: how best to provision for storage efficiency (provisioning models, NetApp Data Motion awareness, from scratch or from template/clone), where to provision to, which SLA, and what the defaults are. Monitor: which tools to use and what to monitor. Notification: who is in charge of reacting and how to notify. Mitigate: what is critical (when to stop provisioning, when to stop extending, when to relax tightness, how to detect), the available options, the implications on SLAs, and when to act.]
Before discussing the details of this cycle, it is important to understand the NetApp technologies that achieve storage efficiency and flexibility, and to understand their relevance in the provisioning and operational phases.
25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies provided in Table 1 are characterized by how they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages while provisioning, but the space advantage might be reduced over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility.

NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin-provisioned and space-efficient writable clones | X |
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table and provides flexibility in handling physical resources; it allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner for applications with different needs. It scales with the demands and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view, where storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, then this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler™) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of physical hardware. This makes possible high operational flexibility in the data center and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve the maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections, we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data space and its Snapshot™ copy space, there are four theoretical combinations, for both network-attached storage (NAS) and storage area network (SAN) environments. In practical applications, only two variants are relevant to NAS and three variants are relevant to SAN storage:
- Full fat
- Low fat
- Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
- Full fat: The primary data and Snapshot copy space are preallocated.
- Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.

                                 Primary Data (Files & Directories) Space Allocation
                                 Fat                 Thin
Snapshot Copy          Fat       Full Fat Option     No Option
Space Allocation       Thin      No Option           Zero Fat Option
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
- Volumes are created with space guarantee.
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
- Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
- Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
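As a rough illustration of the X + Δ sizing rule above (a sketch with made-up numbers, not a NetApp tool): with a Snapshot reserve of r percent, the reserve area Δ occupies r% of the volume, so a volume holding X of user data must be sized X / (1 - r/100).

```python
# Illustrative sizing helper for a full fat NAS volume (not a NetApp API).
# X = user data; the Snapshot reserve percentage determines delta.
def full_fat_volume_size_gb(user_data_gb, snap_reserve_pct=20):
    # With a reserve of r%, user data may occupy only (100 - r)% of the
    # volume, so the volume must be sized X / (1 - r/100).
    return user_data_gb / (1.0 - snap_reserve_pct / 100.0)

# 80 GB of files with a 20% Snapshot reserve needs a 100 GB volume.
print(round(full_fat_volume_size_gb(80, snap_reserve_pct=20)))
```

The actual reserve percentage depends on the number of Snapshot copies and the change rate within the volume, as noted in Table 2.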
Table 2) Full fat provisioning.

Option | Recommended Value | Notes

Volume options:
guarantee | volume |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.

Volume Snapshot options:
reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume.
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
- Volumes are created without space guarantee.
- The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
- Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
- Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
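To make these settings concrete, the following sketch assembles Data ONTAP 7-Mode style command strings for a zero fat NAS volume. The command shapes (vol options, vol autosize, snap autodelete) are assumed from 7-Mode documentation, and the volume name and sizes are made up; verify the syntax against your Data ONTAP release before use.

```python
# Sketch: render the zero fat NAS settings from Table 3 as 7-Mode style
# command strings. Volume name and sizes are illustrative.
def zero_fat_nas_commands(vol, max_size="500g", increment="20g"):
    return [
        f"vol options {vol} guarantee none",                    # no space guarantee
        f"vol autosize {vol} -m {max_size} -i {increment} on",  # grow on demand
        f"snap autodelete {vol} off",                           # keep Snapshot copies per SLA
    ]

for cmd in zero_fat_nas_commands("vol_nas1"):
    print(cmd)
```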
Table 3) Zero fat provisioning.

Option | Recommended Value | Notes

Volume options:
guarantee | none |
fractional_reserve | 100 | Leave at the default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.

Volume Snapshot options:
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using an SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:
- Full fat: Both the primary data and its Snapshot copy space are preallocated.
- Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
- Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                 Primary Data (LUN) Space Allocation
                                 Fat                 Thin
Snapshot Copy          Fat       Full Fat Option     No Option
Space Allocation       Thin      Low Fat Option      Zero Fat Option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
- Volumes are created with space guarantee.
- A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
- The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option | Recommended Value | Notes

Volume options:
guarantee | volume |
fractional_reserve | 100 | Even if technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation.

Volume Snapshot options:
reserve | 0 |
schedule | switched off |
autodelete | off |

LUN options:
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
- Volumes are created with space guarantee.
- LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
- Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Option | Recommended Value | Notes

Volume options:
guarantee | volume |
fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information. There is no reason not to increase the size of the volume. It can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot options:
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence order in which Snapshot copies become candidates for deletion; oldest_first is the current default.

LUN options:
reservation | enable | Reserves space for the LUN during creation.
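The interplay of try_first, autosize, and autodelete in the low fat configuration can be sketched as a simple decision rule (illustrative logic only, not Data ONTAP's actual implementation): grow the volume first, and fall back to deleting the oldest Snapshot copies only once the maximum autosize has been reached.

```python
# Illustrative decision logic implied by try_first = volume_grow.
def next_space_action(free_pct, at_max_autosize, threshold_pct=2.0):
    if free_pct > threshold_pct:
        return "none"                   # enough free space, do nothing
    if not at_max_autosize:
        return "volume_grow"            # autosize the volume first
    return "snap_autodelete_oldest"     # last resort: delete oldest Snapshot copies

print(next_space_action(10.0, at_max_autosize=False))  # none
print(next_space_action(1.0, at_max_autosize=False))   # volume_grow
print(next_space_action(1.0, at_max_autosize=True))    # snap_autodelete_oldest
```

The threshold value is hypothetical; Data ONTAP uses its own volume size-dependent thresholds to trigger these actions.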
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs. By default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
- Volumes are created without space guarantee.
- LUNs are created without space guarantee.
- The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Option | Recommended Value | Notes

Volume options:
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |

Volume Snapshot options:
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN options:
reservation | disable | No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
- The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
- For SAN volumes, the block consumption can be easily monitored.
- Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
- Monitoring is needed only on the aggregate level. Volumes will grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X - N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
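The space consumption formulas above can be compared with a quick sketch; the sample numbers are made up purely for illustration.

```python
# Compare physical space consumption of the three provisioning methods
# for one sample SAN workload, using the formulas from Table 7.
X = 500      # GB, sum of LUN capacities (primary data)
N = 200      # GB, blocks logically allocated inside the LUNs but unused
delta = 50   # GB, Snapshot copy space

consumption = {
    "full fat": 2 * X + delta,   # overwrite reserve doubles the primary data
    "low fat": X + delta,
    "zero fat": X - N + delta,
}
for method, gb in sorted(consumption.items(), key=lambda kv: kv[1]):
    print(f"{method}: {gb} GB")
```

For this workload, zero fat consumes 350 GB where full fat would consume 1050 GB, which is the storage efficiency ratio the text refers to.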
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
- Faster than manually provisioning storage
- Easier to maintain than scripts
- Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports an easy integration with existing management tools and orchestration frameworks. Policies, and their use in so-called datasets and storage services, allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
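As a rough illustration of the commitment rate as a consolidation metric (a simplified formula with made-up numbers; Operations Manager computes its own committed-space figures):

```python
# Simplified commitment rate: logical space promised to consumers versus
# physical aggregate capacity. Above 100%, the aggregate is overcommitted.
def commitment_rate_pct(committed_gb, aggregate_capacity_gb):
    return 100.0 * committed_gb / aggregate_capacity_gb

# Three 500 GB zero fat volumes on a 1000 GB aggregate:
print(commitment_rate_pct(3 * 500, 1000))  # 150.0 -> overcommitted by 50%
```

A rising commitment rate signals increasing data consolidation; at some point, the aggregate should be left for organic growth rather than receiving further provisioning.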
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily, because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
- Volume-centric storage layout
- Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
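A hedged sketch of cloning an instance from a golden template follows. The 7-Mode command shapes (snap create, and vol clone create with -s none for a space-efficient clone) are assumed from NetApp documentation, and all names are illustrative; verify the syntax against your Data ONTAP release.

```python
# Sketch: command strings for provisioning one application instance from a
# template volume via FlexClone. Volume and Snapshot names are illustrative.
def provision_instance_from_template(template_vol, instance_vol, snap="golden"):
    return [
        f"snap create {template_vol} {snap}",  # consistent base Snapshot copy
        # -s none: clone without space guarantee, i.e. space efficient
        f"vol clone create {instance_vol} -s none -b {template_vol} {snap}",
    ]

for cmd in provision_instance_from_template("vol_app_tmpl", "vol_app_i1"):
    print(cmd)
```

The base Snapshot copy can be reused for many instances, so each additional clone initially consumes almost no extra space.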
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings. Cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.
• Medium long-term storage efficiency savings. Deduplicating application data yields medium savings over the long term.
A volume-centric layout makes it easy to provision storage for another instance of an application: a consistent volume representing the template of the intended application is cloned and attached to the new instance, where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volumes. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage use. The impact of using FlexClone to clone a volume-centric storage layout for template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates blocks only on request, for storing changes to the cloned copy or new data. Thus the overcommitment of the aggregate containing the cloned data increases at clone creation, whereas the space used in the aggregate is not affected. As data in the clone is changed and new data is added by the application, the aggregate use grows.
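The arithmetic behind this effect can be sketched in a few lines. The following Python fragment uses purely illustrative sizes (not NetApp defaults) to show how cloning a space-reserved template raises the commitment rate of the aggregate while leaving its physical use untouched:

```python
# Sketch: how FlexClone affects commitment vs. physical use in an aggregate.
# All sizes in GB; the numbers are illustrative assumptions.

def commitment_rate(committed_gb, aggregate_gb):
    """Committed (promised) space relative to physical aggregate size."""
    return committed_gb / aggregate_gb

aggregate_size = 10_000      # physical capacity of the aggregate
template_volume = 500        # space-reserved template volume
used_blocks = 500            # blocks physically consumed by the template

# Creating 20 FlexClone volumes adds metadata only: committed space rises,
# but physical use stays flat until the clones diverge from the parent.
clones = 20
committed = template_volume * (1 + clones)

print(f"commitment rate: {commitment_rate(committed, aggregate_size):.2f}")  # 1.05
print(f"physical use   : {used_blocks / aggregate_size:.1%}")                # 5.0%
```

Only when the clones start to diverge from the parent do the changed blocks consume space from the aggregate's free pool, which is why aggregate monitoring matters here.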
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, is counterproductive for FlexClone savings. It also has a temporarily counterproductive effect on deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, template data must be cloned with the file/LUN FlexClone operation, which clones storage objects within a volume and thus provides finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and the resulting deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style through their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels among the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires applying file/LUN FlexClone functionality iteratively to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that each construct is created within one aggregate; different volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate of the aggregate; it does, however, affect the deduplication value of the volume itself. NetApp therefore recommends the zero fat configuration for these volumes, with autogrow enabled.
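To illustrate why grouping similar storage objects pays off, the following Python sketch estimates the deduplication return for a set of guest images. The image sizes and the 90% cross-instance overlap are assumptions for illustration only:

```python
# Sketch: deduplication return from grouping similar guest-OS images in
# one volume. Block sharing works within a volume, so the saving depends
# on how much data the grouped instances have in common.

def dedupe_savings(logical_gb, unique_gb):
    """Fraction of logical data eliminated by block sharing."""
    return 1 - unique_gb / logical_gb

# Assumed: 10 virtual machines, 20 GB boot image each; 90% of each image
# (OS plus common programs) is identical across the guests.
vms, image_gb, common_fraction = 10, 20, 0.9
logical = vms * image_gb                  # 200 GB as seen by the guests
unique = image_gb * common_fraction + vms * image_gb * (1 - common_fraction)
# one shared copy of the common blocks plus per-VM unique blocks

print(f"physical after dedupe: {unique} GB")              # 38.0 GB
print(f"savings: {dedupe_savings(logical, unique):.0%}")  # 81%
```

The higher the overlap between instances, the closer the physical footprint approaches a single image plus per-instance deltas, which is the consolidation effect this layout targets.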
Best Practice
This layout is very attractive for applications that use multiple but similar storage objects across service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually run similar operating systems and applications in dedicated virtual disks, so grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as page and swap files, should not be placed in deduplicated volumes on primary storage. Because of their high change rate, deduplication savings are limited and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers write data, fragmented client data incurs no performance penalty, so such realignments provide no benefit while reducing cloning and deduplication savings.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times for mitigating data growth become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad pattern is an apt metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are the moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data; over the data lifetime, more and more nomads are migrated away, until at the end of the lifetime only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
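As a rough sketch of this sizing idea, the following Python fragment picks the nomads that must be migrated so that the projected aggregate use stays inside an assumed sweet spot ceiling. All values, including the 85% ceiling, are illustrative assumptions, not NetApp recommendations:

```python
# Sketch: keeping an aggregate inside its sweet-spot corridor by sizing
# nomads against the expected growth of the settled data.

def nomads_to_migrate(aggregate_gb, settled_gb, nomad_sizes_gb,
                      growth_gb_per_month, months, ceiling=0.85):
    """Return the nomads (largest first) that must be migrated away so
    that projected use stays below the ceiling fraction, plus the
    resulting projected use."""
    projected = settled_gb + sum(nomad_sizes_gb) + growth_gb_per_month * months
    migrated = []
    for size in sorted(nomad_sizes_gb, reverse=True):
        if projected <= ceiling * aggregate_gb:
            break
        projected -= size
        migrated.append(size)
    return migrated, projected

plan, use = nomads_to_migrate(
    aggregate_gb=10_000, settled_gb=6_000,
    nomad_sizes_gb=[1_000, 500, 500], growth_gb_per_month=200, months=6)
print(plan, use)   # the 1000 GB nomad must move; projected use 8200 GB
```

Slicing the nomads at several sizes gives the operations team a choice between a quick small migration and a single large one, depending on which resource (time or network) is scarce.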
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility requirements of the applications, described by their individual service levels, are used to classify instances as settled or nomad.
We use the SLA metric of service disruption introduced earlier and map it to the stickiness of the settled/nomad instances. vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side; Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications whose SLAs fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled, for example all Fibre Channel-attached instances), whereas applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. For the data of the remaining applications, an assessment of penalty costs is made. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative business impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate or by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration so that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
Although NetApp recommends considering the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features of a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth. For example, VMware® Storage VMotion™ can transfer a virtual machine, including its storage, when it is attached through a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be preserved during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. The section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to preserve operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be kept within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and time might have to pass before their effect becomes evident. This lead time determines which mitigation alternatives can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some are one-time activities and some must be performed within a certain time frame; depending on the situation, not all alternatives might be available.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies a service disruption; data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should transition to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After further thresholds are exceeded, inspection or activities must be performed to mitigate storage tightness.
• Provision storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can still be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
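The phase loop above can be summarized in a few lines of Python. The threshold values here are illustrative placeholders, not NetApp recommendations:

```python
# Sketch of the phase loop: map aggregate block use to a phase.
# Assumed, purely illustrative thresholds:
PROVISION_BELOW = 0.50   # below this, keep provisioning new storage
MITIGATE_ABOVE = 0.85    # above this, trigger mitigation activities

def phase(use_fraction):
    """Classify an aggregate's block use into one of the three phases."""
    if use_fraction < PROVISION_BELOW:
        return "provision"
    if use_fraction <= MITIGATE_ABOVE:
        return "organic-growth"
    return "mitigate"

for u in (0.30, 0.70, 0.90):
    print(u, phase(u))   # provision, organic-growth, mitigate
```

In practice, the classification is driven by the Operations Manager thresholds discussed in the next section rather than by hard-coded constants.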
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page through Setup → Options → Default Thresholds, or through the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, this can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; in that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric triggers an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is committed to applications; it represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume block use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects of fixed size because it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
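The mechanics of this trend calculation can be illustrated with a short Python sketch. Operations Manager performs the regression internally; this fragment only mirrors the idea of fitting a linear growth rate to daily usage samples and deriving a days-to-full estimate:

```python
# Sketch: least-squares growth rate over daily usage samples and the
# resulting days-to-full estimate. Sample data is illustrative.

def days_to_full(samples_gb, capacity_gb):
    """Fit a linear trend to one-sample-per-day data and estimate the
    days until the usable capacity is reached (None if not growing)."""
    n = len(samples_gb)
    mean_x, mean_y = (n - 1) / 2, sum(samples_gb) / n
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in enumerate(samples_gb)) \
            / sum((x - mean_x) ** 2 for x in range(n))
    if slope <= 0:
        return None                     # flat or shrinking: no ETA
    return (capacity_gb - samples_gb[-1]) / slope

usage = [7000, 7050, 7110, 7150, 7210, 7260, 7300]  # GB, one week of samples
print(f"{days_to_full(usage, 10_000):.0f} days")    # -> "53 days"
```

As the Note below states, the reference value for "full" is the usable aggregate capacity, not the aggregate full threshold, which is why `capacity_gb` here is the raw capacity.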
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It helps signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten with more specific ones. To do so, select the aggregate or volume of choice, for example, through the links already provided in this report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters broken down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return, allowing you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument for keeping the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. The trends on operational parameters provided by Operations Manager further simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be distributed among different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment with a given organizational structure.
Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard; clicking Advanced Version accesses an advanced version of this page. The direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm; adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems; using SNMP, Operations Manager can therefore be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event "aggregate almost full" that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
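Such an adapter script might look like the following minimal sketch. The invocation contract is an assumption here — we take the event name and source object as the first two arguments; verify how your Operations Manager version actually passes event details (arguments or environment variables) before relying on it.

```shell
#!/bin/sh
# Minimal adapter sketch: turn an Operations Manager alarm into one line
# that a ticketing system or syslog can consume.
# ASSUMPTION: the event name and source object arrive as $1 and $2.
format_ticket_line() {
    printf 'event=%s source=%s\n' "$1" "$2"
}

format_ticket_line "${1:-unknown-event}" "${2:-unknown-source}"
# Forward the line to the infrastructure of choice, for example:
# format_ticket_line "$1" "$2" | logger -t dfm-alarm
```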
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor; the effect of a mitigation activity should return usage to that corridor.
Storage tightness might occur in aggregates or in volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve tightness in this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between the existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.
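On a Data ONTAP 7-mode console, the first three activities could be performed roughly as follows. The aggregate and volume names, disk count, and sizes are illustrative; check the syntax against your Data ONTAP version.

```
# 1) Increase the aggregate by adding disks, e.g. five spares of matching size
aggr add aggr1 5

# 2) Decrease the aggregate Snapshot copy reserve
#    (not with MetroCluster or SyncMirror configurations)
snap reserve -A aggr1 0

# 3) Shrink a preallocated volume to return free space to the aggregate
vol size vol_presized -50g
```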
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate "days to full" trend value to get an idea of the available days to full based on past data growth.
- All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
- Aggregate extension is not a mitigation alternative.
- Online migration is not a mitigation alternative.
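The "days to full" figure is essentially the remaining capacity divided by the observed average daily growth, calculated against 100% of the aggregate. A back-of-the-envelope check of the idea, with illustrative numbers:

```shell
# days to full = (total capacity - used capacity) / average daily growth,
# calculated against 100% capacity used of the aggregate
total_gb=10000
used_gb=4000
daily_growth_gb=30
days_to_full=$(( (total_gb - used_gb) / daily_growth_gb ))
echo "days to full: ${days_to_full}"   # 6000 GB headroom / 30 GB per day = 200 days
```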
Figure 21) Storage to enable organic data growth between planned downtime windows. (Figure: data grows over the months between two planned downtime windows; sufficient free capacity must be provisioned to bridge each interval.)
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
- First metric: aggregate capacity used
- Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists; thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
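If you mirror this setting in Operations Manager's global options, the commands would look roughly like the following. The option names are an assumption based on Operations Manager 4.x; verify them with `dfm options list` before applying, and note that the "aggregate full" event here is repurposed as the migration-decision trigger.

```
# thresholds of sample setting 1 (percentages as described above)
dfm options set aggrNearlyFullThreshold=50
dfm options set aggrFullThreshold=65
dfm options set aggrNearlyOvercommittedThreshold=110
dfm options set aggrOvercommittedThreshold=120
```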
Figure 22) Phase transitions depending on the metrics aggregate capacity used and aggregate space committed. (Figure: within the operational sweet spot corridor of 0–50% aggregate capacity used and 0–110% aggregate space committed, new storage is provisioned; beyond these values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and, thanks to vFiler technology, allows migrating nomad data flexibly and in a timely manner. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (Figure: once the need to act is detected, the effect of a mitigation such as a nomad migration shows within hours.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
- All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
- Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
- The "days to full" aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Phase transitions depending on the metric aggregate capacity used. (Figure: at 0–70% aggregate capacity used, new storage is provisioned; at 70–85%, only already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
In this setting, you can achieve very high data consolidation using NetApp storage controllers; the amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (Figure: over elapsed time, committed capacity keeps rising while capacity used drops after the configuration change and deduplication before growing again; the overall trend and the last 3-month trend are shown across 1-month and 3-month marks.)
As a general rule, we do not introduce artificially limited container types: they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with, check the boxes accompanying the provided list, and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration; for all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative; use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define an aggregate use level at which your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
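Steps b to d condense into simple arithmetic. The following sketch, with illustrative numbers, works backward from the growth rate and the downtime interval to the required reserve and the resulting aggregate threshold:

```shell
# Work backward: reserve = daily growth x days between downtime windows;
# threshold = share of the aggregate that may be used before the reserve is touched
total_gb=10000           # usable aggregate capacity
daily_growth_gb=10       # observed average daily growth (Operations Manager trend)
days_between_windows=90  # maximum distance between planned downtime windows
reserve_gb=$(( daily_growth_gb * days_between_windows ))
threshold_pct=$(( 100 * (total_gb - reserve_gb) / total_gb ))
echo "reserve: ${reserve_gb} GB, threshold: ${threshold_pct}%"
```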
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and "days to full" trending reported by Operations Manager to adapt the thresholds. Remember that "days to full" trending reports against 100% capacity used of the aggregate.
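The deduplication trial described in step 2b — estimating the savings on a clone before touching the original volume — could be sketched as follows in 7-mode. A FlexClone license is required, and the names are illustrative.

```
# Clone the volume and deduplicate the clone to estimate the savings
vol clone create vol1_dedupetest -b vol1
sis on /vol/vol1_dedupetest
sis start -s /vol/vol1_dedupetest   # -s scans the existing data
sis status /vol/vol1_dedupetest     # wait until the state returns to Idle
df -s vol1_dedupetest               # reports saved space and percentage

# Clean up after the estimate
vol offline vol1_dedupetest
vol destroy vol1_dedupetest -f
```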
7 REFERENCES
- TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
- TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
- TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
- TR-3786, "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
- TR-3814, "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
- TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
- TR-3881, "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
- NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
2.5 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
The NetApp technologies listed in Table 1 are characterized by where they bring the most significant advantage and value. For example, FlexClone® technology provides significant time and space advantages during provisioning, but the space advantage might shrink over time. In contrast, deduplication technology can achieve space savings over the entire storage lifetime.
Table 1) NetApp technologies for storage efficiency and flexibility
NetApp Technology | Benefit | During Provisioning | During Operation
FlexClone | Instantly creates thin provisioned and space-efficient writable clones | X | –
FlexVol® | Implements thin provisioning and consumes only the needed space rather than the requested space | X | X
Deduplication | Increases data consolidation by detecting and optimizing repeating patterns in primary and secondary storage | X | X
NetApp Data Motion | Provides flexibility for management and optimal load/capacity rebalancing in growing cloud environments without downtime | – | X
Aggregate extensibility in Data ONTAP® | Data ONTAP is the foundation for all features listed in this table; it provides flexibility in handling physical resources and allows extending physical aggregates during operation | X | X
Furthermore, NetApp RAID-DP®, SATA, and NetApp Flash Cache (formerly PAM II) are technologies that help to reduce the total cost of storage tremendously. It is assumed that these technologies are deployed according to the requirements of the use case.
NETAPP SHARED STORAGE INFRASTRUCTURE
To implement the practices outlined in this document, some prerequisites must be met. We assume a NetApp shared storage infrastructure implemented using large aggregates. This acts as a utility for delivering storage in a flexible manner to applications with different needs; it scales with demand and serves a variety of different service levels at the same time. NetApp Operations Manager monitors the NetApp shared storage infrastructure. This software acts as a central management station and consolidates information about the current status of all NetApp storage controllers. Based on this information, Operations Manager indicates the necessity to change the phases and behavior in the data center.
The NetApp shared storage infrastructure provides different ways for clients to consume its resources. It can provide a traditional view in which storage resources are located at a specific controller. Using NetApp Provisioning Manager, the infrastructure can also provide a service-oriented resource view that abstracts resource consumption and management from the physical controllers. The abstractions of a storage service catalog, resource pools, and datasets provide easy manageability in the face of massive scale. If multi-tenancy is not required, this is the abstraction of choice.
Supported by the NetApp technologies MultiStore® (vFiler®) and NetApp Data Motion, storage can be provided in a utility-like fashion, independent of the physical hardware. This makes high operational flexibility in the data center possible and allows building virtualized environments for multiple tenants with competing interests.
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, its migratability) into account. This allows administrators to easily move data off aggregates approaching capacity, without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus, the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the application and its data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch, as well as to already provisioned storage. When the technical dimensions of storage provisioning are categorized into space for primary data and space for its Snapshot™ copies, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:
- Full fat
- Low fat
- Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
- Full fat: The primary data and Snapshot copy space are preallocated.
- Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch; technically, only two out of four combinations are possible. (Figure: a 2x2 matrix of primary data (files and directories) space allocation, fat or thin, versus Snapshot copy space allocation, fat or thin; fat/fat is the full fat option, thin/thin is the zero fat option, and the two mixed combinations are not options.)
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning of NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
- Volumes are created with a space guarantee.
- The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
- Because the space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
- Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
- For volumes with deduplication enabled, volume autogrow is a mandatory option.
- Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning and restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 2) Full fat provisioning
Volume options:
- guarantee = volume
- fractional_reserve = 100: Leave at the default; this is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
- autosize = on: Turn autosize on. There is no artificially limited volume that needs to be monitored; autosize makes sense to allow growth of user data beyond the guaranteed space limit.
- autosize options = -m X -i Y: The business model drives the maximum value of the autosize configuration, because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
Volume Snapshot options:
- reserve = yes: The value depends on the number of Snapshot copies and the change rate within the volume.
- schedule = switched on: Automatic Snapshot technology schedules.
- autodelete = off: Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached, and also when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

Option               Recommended Value    Notes

Volume Options
guarantee            none
fractional_reserve   100                  Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize             on                   Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
autosize options     -m X -i Y            The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first            -                    Autodelete is not recommended in most environments.

Volume Snapshot Options
reserve              yes/no               The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot copy reserve area is omitted (no).
schedule             switched on          Automatic Snapshot technology schedules.
autodelete           off                  Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                  Primary Data (LUN) Space Allocation
                                  Fat                   Thin
Snapshot Copy        Fat          Full fat option       No option
Space Allocation     Thin         Low fat option        Zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot copy autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Option               Recommended Value    Notes

Volume Options
guarantee            volume
fractional_reserve   100                  Even though technically possible, a fractional reserve below 100 incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize             off                  Autosize could be used as an option to create free space needed for Snapshot copy creation.

Volume Snapshot Options
reserve              0
schedule             switched off
autodelete           off

LUN Options
reservation          enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Option               Recommended Value    Notes

Volume Options
guarantee            volume
fractional_reserve   0                    Snapshot copy space is controlled by the autodelete and autosize options.
autosize             on                   Turn autosize on.
autosize options     -m X -i Y            The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first            volume_grow          Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; it can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
reserve              0                    For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule             switched off
autodelete           on                   There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs; setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options   volume, oldest_first There is a precedence order in which Snapshot copies become candidates for deletion; oldest_first is the current default.

LUN Options
reservation          enable               Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Option               Recommended Value    Notes

Volume Options
guarantee            none                 No space reservation for the volume at all.
fractional_reserve   0                    With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize             on                   Turn autosize on.
autosize options     -m X -i Y            The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first            volume_grow

Volume Snapshot Options
reserve              0                    For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule             switched off
autodelete           off                  Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options
reservation          disable              No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat provisioning for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes will grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics                                  Full Fat                        Low Fat                                              Zero Fat
Space consumption                                2X + Δ                          X + Δ                                                X − N + Δ (*)
Space efficient                                  No                              Partially, for Snapshot copies                       Yes
Monitoring                                       Optional                        Required on volume and aggregate level               Required on aggregate level
Notification/mitigation process required         No                              Optional in most cases                               Yes
Pool benefiting from dedupe savings              Volume fractional reserve area  Volume free space area                               Aggregate free space area
Risk of an out-of-space condition on primary data  No                            No, as long as autodelete is able to delete any Snapshot copies   Yes, when monitoring and notification processes are missing
Typical use cases                                Small installations; none or few storage management skills (no monitoring infrastructure)   Large database environments   Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(*) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
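The space consumption column of Table 7 can be made concrete with a small sketch. The function name and the sample numbers are illustrative assumptions; N models the allocated-but-unused blocks defined in the table's footnote.

```python
# Sketch comparing the space consumption formulas of Table 7.
# X = primary data, delta = Snapshot copy space, N = allocated-but-unused
# blocks inside the LUNs (the thin provisioning effect).

def space_consumed(method, x, delta, n=0):
    if method == "full_fat":
        return 2 * x + delta   # primary data + 100% overwrite reserve + snapshots
    if method == "low_fat":
        return x + delta       # primary data + snapshots, no overwrite reserve
    if method == "zero_fat":
        return x - n + delta   # only blocks actually written, plus snapshots
    raise ValueError(method)

x, delta, n = 1000, 200, 300   # GB; n assumes 30% of LUN blocks never written
for m in ("full_fat", "low_fat", "zero_fat"):
    print(m, space_consumed(m, x, delta, n if m == "zero_fat" else 0))
# full_fat 2200, low_fat 1200, zero_fat 900
```

Even in the worst case for zero fat (N = 0, every LUN block written), its consumption only converges to the low fat value, which illustrates why full fat is the method to avoid.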
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
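A minimal sketch of the commitment rate as a consolidation metric, assuming it is defined as the ratio of the logical volume sizes to the aggregate's usable capacity (function name and numbers are illustrative):

```python
# Illustrative commitment (overcommitment) rate of an aggregate:
# the ratio of the logical sizes of all volumes to the aggregate's
# usable capacity. Values above 1.0 mean the thin provisioning
# headroom is being shared among the volumes.

def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    return sum(volume_sizes_gb) / aggregate_size_gb

# Three zero fat volumes, each sized to its expected growth,
# on a 5 TB aggregate:
vols = [2000, 3000, 1500]
print(commitment_rate(vols, 5000))  # 1.3 -> 130% committed
```

Sizing volumes to expected growth keeps this rate meaningful; setting every volume to an arbitrarily huge size would inflate the rate and destroy its value as a consolidation metric.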
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084: Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: High instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: Medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be described schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. Only when data in the clone is changed and new data is added by the application will the aggregate use grow.
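The bookkeeping described above can be sketched as follows. The function and figures are hypothetical and only mirror the described behavior: commitment grows at clone creation, while physical use grows only as the clone diverges.

```python
# Illustrative bookkeeping for a volume FlexClone in an aggregate.
# Names and numbers are assumptions, not Data ONTAP internals.

def clone_effect(committed_gb, used_gb, clone_logical_gb, new_data_gb=0):
    """Return (committed, used) after creating a clone and/or writing data."""
    committed_gb += clone_logical_gb  # the clone counts fully against commitment
    used_gb += new_data_gb            # only changed/new data consumes space
    return committed_gb, used_gb

# Cloning a 1000 GB volume: commitment rises, physical use does not.
committed, used = clone_effect(4000, 2500, clone_logical_gb=1000)
print(committed, used)  # 5000 2500

# Application writes 50 GB of changes into the clone: use grows.
committed, used = clone_effect(committed, used, 0, new_data_gb=50)
print(committed, used)  # 5000 2550
```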
Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: the data of the application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: Instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, of template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505: NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; however, the volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication (block sharing) ratio of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
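A sketch of the bookkeeping behind a file/LUN FlexClone in this layout, assuming the volume size (and hence the aggregate commitment) stays fixed while the logical data inside the volume grows through block sharing. All names and numbers are illustrative.

```python
# Illustrative effect of a file/LUN FlexClone inside one volume:
# logical data grows, but the new blocks are initially all shared
# with the template, so physical consumption stays flat.

def file_clone_effect(logical_gb, shared_gb, template_gb):
    """Clone a template of template_gb inside the same volume.
    Returns (logical data, shared blocks, physical blocks consumed)."""
    logical_gb += template_gb  # the new instance's logical data
    shared_gb += template_gb   # entirely shared with the template at first
    return logical_gb, shared_gb, logical_gb - shared_gb

# A 100 GB template cloned inside a volume already holding 500 GB:
print(file_clone_effect(500, 0, 100))  # (600, 100, 500) -> physical use flat
```

As the instance diverges from the template, the shared portion shrinks and physical use grows, which is why autogrow on a zero fat volume is the recommended safety net.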
Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks, so grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as paging and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmentation of client data incurs no performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technique for relaxing the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.

It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
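One possible way to operationalize the sweet spot corridor is to pick the smallest nomad whose migration brings the aggregate back below an upper use bound. This is an illustrative sketch; the function name, the threshold, and the sizes are assumptions, not NetApp guidance.

```python
# Sketch: keep an aggregate inside its sweet spot corridor by migrating
# the smallest nomad that brings use back below the upper bound.

def pick_nomad(aggregate_size_gb, used_gb, nomads_gb, upper_bound=0.85):
    """Return the smallest nomad whose migration brings use below the bound,
    None if the aggregate is already inside the corridor, or the largest
    nomad if no single nomad is big enough."""
    if used_gb / aggregate_size_gb <= upper_bound:
        return None
    excess = used_gb - upper_bound * aggregate_size_gb
    candidates = sorted(n for n in nomads_gb if n >= excess)
    return candidates[0] if candidates else max(nomads_gb)

# A 10 TB aggregate at 92% use, with nomads of three different sizes:
print(pick_nomad(10000, 9200, [300, 800, 1500]))  # 800 -> migrate this vFiler
```

Provisioning nomads in several sizes, as recommended above, is what gives this selection step something useful to choose from.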
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs The accessibility of the applications described by its individual service levels is used for an assessment into settled and nomad instances
We use the introduced SLA metric of service disruption and map it to the stickiness of the settlednomad instances The vFiler entities allow online migration of NFS and iSCSI-attached nomad instances without any changes at the client side Fibre Channel-attached storage cannot be migrated online at the time of writing Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
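As a sketch of this assignment, the snippet below (application names and disruption tolerances are invented for illustration) pins Fibre Channel-attached instances as settled, because they cannot be migrated online, and ranks the rest by their acceptable service disruption so that the most tolerant instances become the first nomad candidates:

```python
def classify_instances(instances):
    """Split (name, max_disruption_minutes, fc_attached) tuples into
    settled instances and nomad candidates ordered by tolerance."""
    settled, candidates = [], []
    for name, max_disruption, fc_attached in instances:
        (settled if fc_attached else candidates).append((name, max_disruption))
    # Most disruption-tolerant instances first: best nomad candidates.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return settled, candidates

apps = [("oltp-db", 0, True),       # FC-attached: must stay settled
        ("mail", 30, False),
        ("archive", 480, False),
        ("fileshare", 120, False)]
settled, nomads = classify_instances(apps)
print([name for name, _ in nomads])  # ['archive', 'fileshare', 'mail']
```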
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of Data Motion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLEDNOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially and take the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would otherwise violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This lead time determines how many mitigation alternatives can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient free space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to it.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is committed to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
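To make the two aggregate metrics behind these thresholds concrete, here is a minimal calculation (the capacity figures are invented): capacity used is physical block consumption, while committed space can exceed 100% of the usable capacity in an overcommitted aggregate.

```python
def aggregate_metrics(usable_gb, used_gb, committed_gb):
    """Return the metrics that the aggregate thresholds are checked against."""
    return {"capacity_used_pct": 100.0 * used_gb / usable_gb,
            "committed_pct": 100.0 * committed_gb / usable_gb}

# A 10 TB aggregate with 6.5 TB physically used and 12 TB committed to
# applications is 65% full but 120% overcommitted.
m = aggregate_metrics(usable_gb=10000, used_gb=6500, committed_gb=12000)
print(m)  # {'capacity_used_pct': 65.0, 'committed_pct': 120.0}
```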
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
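The trend calculation can be approximated with an ordinary linear regression over daily usage samples. The sketch below mirrors the note above by projecting against the usable capacity (100%), not against the aggregate full threshold (sample values invented):

```python
def days_to_full(daily_used_gb, capacity_gb):
    """Least-squares slope of daily usage, projected to usable capacity."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))  # growth in GB per day
    if slope <= 0:
        return None  # flat or shrinking usage: never full
    return (capacity_gb - daily_used_gb[-1]) / slope

# 10 days of samples growing 5 GB/day toward a 1000 GB aggregate.
samples = [500 + 5 * day for day in range(10)]
print(days_to_full(samples, 1000))  # 91.0
```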
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or by time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the thresholds as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
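A notification adapter invoked this way can be a small script. The sketch below is a hypothetical example: it builds a ticket payload and prints it instead of calling a real ticketing API, and the event name and affected object are hard-coded; consult the Operations Manager documentation for the exact script invocation interface (arguments or environment variables) before relying on this.

```python
"""Hypothetical notification adapter for use with 'dfm alarm create -s ...'."""
import json

def build_ticket(event_name, source_object):
    """Map a monitoring event to a payload for an external ticketing system."""
    severity = "critical" if "full" in event_name else "warning"
    return {"summary": "%s on %s" % (event_name, source_object),
            "severity": severity,
            "queue": "storage-operations"}

# In a real adapter, the event name and affected object would come from the
# Operations Manager invocation; here they are fixed example values.
payload = build_ticket("aggregate-almost-full", "aggr1")
print(json.dumps(payload))
```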
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might lie between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
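The decision logic of this sample setting can be condensed into a few lines. The thresholds below (50%/65% capacity used, 110%/120% committed) are the illustrative values of this particular setting, not general recommendations, and the 120% mitigation trigger is read from Figure 22:

```python
def phase(used_pct, committed_pct):
    """Return the action for sample setting 1, based on its two metrics."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(phase(45, 100))  # provision new storage
print(phase(55, 100))  # stop provisioning; assess capacity and adapt thresholds
print(phase(70, 115))  # mitigate in next planned downtime window
```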
Figure 22) Phase transitions depending on the metrics aggregate capacity used and aggregate committed space.
The figure content reduces to the following rules:
• Aggregate capacity used 0–50% and aggregate space committed 0–110%: provision new storage.
• Aggregate capacity used above 50% or space committed above 110%: stop provisioning; assess capacity and adapt thresholds.
• Aggregate capacity used above 65% or space committed above 120%: mitigate.
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, it is not necessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used Table 10 contains the thresholds describing the transition of phases
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold   Notify               Mitigation
> 70%                 Storage operations   Stop provisioning of storage
> 85%                 Storage operations   Stop extending provisioned storage
> 90%                 Storage operations   Relax resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
[Figure 24 is a flowchart over settled data growth in the aggregate: at 0–70% aggregate capacity used (the operational sweet spot corridor), new storage is provisioned; between 70% and 85%, only already provisioned storage is extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a significant factor.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Figure 25 plots committed capacity and capacity used over elapsed time, with markers 1, 2, and 3 for the cookbook steps: after roughly one month, capacity used drops as volumes are changed to zero fat and deduplicated; after three more months, a last-3-month trend can be derived alongside the overall trend.]
As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with, check the boxes accompanying the provided list, and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job; alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to a zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into a zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
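The days-to-full trending used in step 3 can be approximated with a simple linear extrapolation. This is a hedged sketch of the underlying arithmetic, not an Operations Manager API; the function name and the linear-growth assumption are ours.

```python
def days_to_full(capacity_total, capacity_used, daily_growth):
    """Linear days-to-full estimate against 100% of aggregate capacity.

    capacity_total, capacity_used, and daily_growth share one unit
    (for example, GB). Returns None when there is no positive growth
    to extrapolate.
    """
    if daily_growth <= 0:
        return None
    return (capacity_total - capacity_used) / daily_growth

# A 100TB aggregate at 60TB used, growing 0.2TB per day, is 200 days from full.
print(days_to_full(100.0, 60.0, 0.2))  # -> 200.0
```

Operations Manager derives its trend from historical samples rather than a single growth figure, but the thresholds in this section can be sanity-checked with this kind of estimate.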
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
3 PROVISIONING
Some features, such as data deduplication, can be turned on or off at any time. However, to achieve maximum consolidation and storage efficiency, more strategies must be considered during the data-provisioning phase.
In addition, provisioning should take the flexibility of storage (for example, the ability to migrate data) into account. This allows administrators to easily move data off aggregates approaching capacity without downtime. This is also an important aspect when planning to deliver services 24x7.
Thus the design goals for storage layouts are storage efficiency and operational flexibility. In the following sections we discuss three orthogonal dimensions of provisioning storage. Two of them focus on achieving data consolidation; the third focuses on achieving operational flexibility. All dimensions can be combined independently. Note that the achievable level of consolidation depends on the applications and their data.
TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," provides further understanding of storage efficiency and operational flexibility.
3.1 PROVISIONING FROM SCRATCH: FULL FAT TO ZERO FAT PROVISIONING
This section deals with the way data is provisioned and the consequences for storage efficiency. We recommend applying the so-called zero fat configurations.
We consider the storage setup for a single application instance. The presented configurations can be applied while provisioning storage from scratch and to already provisioned storage. When the technical dimensions of storage provisioning are categorized into primary data and its Snapshot™ copy space, there are four theoretical combinations for both network-attached storage (NAS) and storage area network (SAN) environments. In practice, only two variants are relevant for NAS and three variants are relevant for SAN storage:
• Full fat
• Low fat
• Zero fat
According to NetApp best practices, we do not mix block and file data within a single volume, which allows us to consider NAS and SAN environments separately.
NAS
For NAS, two options are recommended: full fat and zero fat.
• Full fat: The primary data and Snapshot copy space are preallocated.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 7) Provisioning model for NAS storage from scratch. Technically, only two out of four combinations are possible.
[Figure 7 is a 2x2 matrix of primary data (files and directories) space allocation (fat or thin) versus Snapshot copy space allocation (fat or thin): fat/fat is the full fat option, thin/thin is the zero fat option, and the two mixed combinations are not options.]
Note: Full fat is characterized slightly differently in NAS and SAN due to their technical properties.
FULL FAT PROVISIONING
Full fat provisioning for NAS is the traditional (default) way to implement NFS/CIFS shares. Volumes in a full fat configuration are characterized as follows:
• Volumes are created with a space guarantee.
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize function can be used to make space available when a certain volume threshold is reached. This would also happen when the space reserved for user data gets low.
• Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Normally, using autodelete is not recommended in NAS environments. Keeping a certain number of Snapshot copies for file versioning/restores might be part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 2) Full fat provisioning.

Volume Options
• guarantee: volume
• fractional_reserve: 100. Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize: on. Turn autosize on; there is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options: -m X -i Y. The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.

Volume Snapshot Options
• reserve: yes. The value depends on the number of Snapshot copies and the change rate within the volume.
• schedule: switched on. Automatic Snapshot technology schedules.
• autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:
• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide the capacity taken up by Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.
Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this will be a specific and individual decision.
Table 3) Zero fat provisioning.

Volume Options
• guarantee: none
• fractional_reserve: 100. Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
• autosize: on. Turn autosize on; there is then no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
• autosize options: -m X -i Y. The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
• try_first: (leave unset). Autodelete is not recommended in most environments.

Volume Snapshot Options
• reserve: yes/no. The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per SLA is the preferred way to provision NAS storage; however, there might be situations in which the Snapshot reserve area is omitted (no).
• schedule: switched on. Automatic Snapshot technology schedules.
• autodelete: off. Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:
• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.
[Figure 8 is a 2x2 matrix of primary data (LUN) space allocation (fat or thin) versus Snapshot copy space allocation (fat or thin): fat primary data with fat Snapshot space is the full fat option, fat primary data with thin Snapshot space is the low fat option, thin/thin is the zero fat option, and thin primary data with fat Snapshot space is not an option.]
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still apply these settings. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

Volume Options
• guarantee: volume
• fractional_reserve: 100. Even though a fractional reserve below 100 is technically possible, it incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
• autosize: off. Autosize could be used as an option to create free space needed for Snapshot copy creation.

Volume Snapshot Options
• reserve: 0
• schedule: switched off
• autodelete: off

LUN Options
• reservation: enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

Volume Options
• guarantee: volume
• fractional_reserve: 0. Snapshot space is controlled by the autodelete and autosize options.
• autosize: on. Turn autosize on.
• autosize options: -m X -i Y. The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
• try_first: volume_grow. Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the change can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.

Volume Snapshot Options
• reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule: switched off
• autodelete: on. There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs; setting this policy needs to be negotiated with the business requirements. In the worst case, deleting Snapshot copies is not an option.
• autodelete options: volume, oldest_first. There is a precedence determining which Snapshot copies are candidates for deletion; oldest_first is the current default.

LUN Options
• reservation: enable. Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

Volume Options
• guarantee: none. No space reservation for the volume at all.
• fractional_reserve: 0. With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
• autosize: on. Turn autosize on.
• autosize options: -m X -i Y. The business model drives the maximum value for the autosize configuration, because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
• try_first: volume_grow

Volume Snapshot Options
• reserve: 0. For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
• schedule: switched off
• autodelete: off. Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options
• reservation: disable. No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods.

Space consumption: full fat 2X + Δ; low fat X + Δ; zero fat X – N + Δ (N is the traditional thin provisioning impact, the amount of blocks logically allocated but not used).
Space efficient: full fat no; low fat partially (for Snapshot copies); zero fat yes.
Monitoring: full fat optional; low fat required on volume and aggregate level; zero fat required on aggregate level.
Notification/mitigation process required: full fat no; low fat optional in most cases; zero fat yes.
Pool benefiting from dedupe savings: full fat volume fractional reserve area; low fat volume free space area; zero fat aggregate free space area.
Risk of an out-of-space condition on primary data: full fat no; low fat no, as long as autodelete is able to delete Snapshot copies; zero fat yes, when monitoring and notification processes are missing.
Typical use cases: full fat small installations and environments with no or few storage management skills (no monitoring infrastructure); low fat large database environments; zero fat shared storage infrastructure, test/dev environments, and storage pools for virtualized servers.
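The space-consumption formulas in Table 7 can be compared numerically. The following sketch is illustrative: the formulas come from the table, while the function name and example figures are invented.

```python
def space_consumption(X, delta, N=0.0):
    """Volume space consumption per Table 7 (same unit for all inputs).

    X     : sum of LUN capacities (primary data)
    delta : space needed to hold Snapshot copy data
    N     : blocks logically allocated but unused (thin provisioning impact)
    """
    return {
        "full_fat": 2 * X + delta,   # primary data plus full overwrite reserve
        "low_fat": X + delta,        # primary data preallocated, Snapshot space on demand
        "zero_fat": X - N + delta,   # everything allocated on demand
    }

# Example: 1000GB of LUNs, 200GB of Snapshot data, 300GB of unused blocks.
print(space_consumption(1000, 200, 300))
# -> {'full_fat': 2200, 'low_fat': 1200, 'zero_fat': 900}
```

The gap between full fat and zero fat widens further once deduplication savings are returned to the aggregate's free space pool.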
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide."
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to higher-level management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction to provision storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configurations: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. As the unallocated space in the volume is not exclusively reserved for this volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
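To illustrate how the commitment rate works as a consolidation metric, the following sketch computes it as the ratio of the space promised to volumes to the physical capacity of the aggregate. The function name and sample figures are hypothetical, not an Operations Manager API.

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Commitment rate: logical space promised to volumes relative to the
    physical capacity of the aggregate. Values above 1.0 (100%) indicate
    a thin-provisioned, overcommitted aggregate."""
    return sum(volume_sizes_gb) / aggregate_capacity_gb

# Three zero fat volumes sized to their expected data, sharing a 10 TB aggregate
rate = commitment_rate([4000, 6000, 5000], 10000)
print(f"commitment rate: {rate:.0%}")  # 150% -> aggregate is overcommitted
```

Sizing volumes to their expected content keeps this rate meaningful; oversizing every volume "just in case" inflates it and hides the real degree of consolidation.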
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME- AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments as well as software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead for performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings. High instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings. Medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data. It allocates data for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. As data in the clone is changed and new data is added by the application, the aggregate use will grow.
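The clone-creation behavior described above can be sketched with a toy model: commitment rises by the clone's full size while physical use stays nearly flat until the application writes new data. The class and method names are illustrative only, not Data ONTAP internals.

```python
class Aggregate:
    """Toy model of aggregate commitment vs. physical use when cloning."""
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0.0   # sum of volume sizes promised to applications
        self.used = 0.0        # physically allocated blocks

    def provision_volume(self, size_gb, initial_data_gb=0.0):
        self.committed += size_gb
        self.used += initial_data_gb

    def clone_volume(self, size_gb, changed_data_gb=0.0):
        # A FlexClone shares blocks with its parent: commitment rises by the
        # clone's size, but only metadata plus later changes consume space.
        self.committed += size_gb
        self.used += changed_data_gb

aggr = Aggregate(10000)
aggr.provision_volume(2000, initial_data_gb=500)  # golden template volume
aggr.clone_volume(2000)                           # instant, nearly space-free
print(aggr.committed, aggr.used)                  # commitment up, use flat
```

The model shows why overcommitment, not used space, is the metric that jumps at clone time; used space catches up only as clones diverge from the template.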
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again to regain them. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally. Individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant storage efficiency savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate. Volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments provide no benefit.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor to react to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad outside its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
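The slicing idea can be sketched as picking the smallest nomad whose migration brings the aggregate back below the sweet-spot upper bound. The function name and corridor thresholds below are hypothetical, chosen only for illustration.

```python
def nomad_to_migrate(aggregate_used_gb, capacity_gb, nomad_sizes_gb,
                     upper=0.85):
    """Pick the smallest nomad whose migration returns aggregate use to
    the operational corridor (upper bound is an illustrative 85%)."""
    if aggregate_used_gb / capacity_gb <= upper:
        return None  # still inside the corridor; nothing to migrate
    for size in sorted(nomad_sizes_gb):
        if (aggregate_used_gb - size) / capacity_gb <= upper:
            return size
    return max(nomad_sizes_gb)  # even the largest nomad is not enough

# 9.2 TB used of 10 TB: migrating the 1500 GB nomad restores the corridor
print(nomad_to_migrate(9200, 10000, [200, 500, 1500]))  # -> 1500
```

Provisioning nomads in several sizes, as the list above recommends, is what gives this selection loop something to choose from: small nomads for quick relief, large ones for severe tightness.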
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes at the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the progress of migration consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially and take sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned due to their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage vMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put the SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain time.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness for certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
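The threshold pairs above can be sketched as a simple evaluation function mapping aggregate metrics to events. The event names and default percentages are illustrative, not the exact Operations Manager defaults.

```python
def aggregate_events(used_pct, committed_pct,
                     full=90, nearly_full=80,
                     overcommitted=100, nearly_overcommitted=95):
    """Map aggregate block use and commitment metrics to event names
    resembling the Operations Manager thresholds (values illustrative)."""
    events = []
    if used_pct >= full:
        events.append("aggregate-full")
    elif used_pct >= nearly_full:
        events.append("aggregate-almost-full")       # earlier notification
    if committed_pct >= overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

print(aggregate_events(used_pct=83, committed_pct=140))
# ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that the two metrics are independent: a thin-provisioned aggregate can be heavily overcommitted while block use is still comfortably inside the corridor, which is exactly the state thin provisioning aims for.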
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
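The arithmetic behind this trending can be sketched in a few lines. This is an illustration of the linear-regression idea described above, not the Operations Manager implementation.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for (day, used_gb) samples,
    mirroring the linear regression over up to 90 days of history.
    The slope is the daily growth rate in GB/day."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def days_to_full(samples, usable_gb):
    """Estimate the days until 100% of the usable capacity is reached,
    which is the calculation basis of the note above (not the
    aggregate full threshold)."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return float("inf")  # flat or shrinking: never full
    last_day = max(d for d, _ in samples)
    current = slope * last_day + intercept
    return (usable_gb - current) / slope
```

For example, an aggregate growing linearly by 10 GB/day that currently holds 130 GB of a 500 GB usable capacity yields a days-to-full estimate of 37 days.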
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary.
Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency-return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods for sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
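The combination of methods, and the repetition of a notification while a situation stays unresolved, can be modeled roughly as follows. The class below mimics the behavior described in the text; it is an illustrative sketch, not the Operations Manager API.

```python
import time

class Alarm:
    """Illustrative alarm that fans out to several notification methods
    and repeats while the triggering situation is unresolved."""

    def __init__(self, senders, repeat_interval_s=3600):
        # senders maps a method name to a callable, e.g.
        # {"email": send_mail, "snmp": send_trap, "script": run_adapter}
        self.senders = senders
        self.repeat_interval_s = repeat_interval_s
        self._last_sent = None

    def fire(self, event, now=None):
        """Deliver the event via every configured method, rate-limited so
        a persisting situation is re-notified only after the interval."""
        now = time.time() if now is None else now
        if self._last_sent is not None and \
                now - self._last_sent < self.repeat_interval_s:
            return []  # too soon to repeat
        self._last_sent = now
        delivered = []
        for name, send in self.senders.items():
            send(event)
            delivered.append(name)
        return delivered
```

Real integrations would plug in e-mail, SNMP, or script senders where the stubs are.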
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
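A skeleton of such an adapter script might look as follows. How Operations Manager hands the event details to the script is not documented here, so the environment variable names used below (ALARM_EVENT_NAME, ALARM_SOURCE) and the ticket queue are hypothetical placeholders.

```python
#!/usr/bin/env python
"""Skeleton of a user-defined alarm adapter started via
dfm alarm create -s <script>; variable names are illustrative."""
import os

def build_ticket(env):
    """Turn event details into a payload for a ticketing system."""
    return {
        "summary": "%s on %s" % (env.get("ALARM_EVENT_NAME", "unknown event"),
                                 env.get("ALARM_SOURCE", "unknown object")),
        "queue": "storage-operations",  # hypothetical target queue
    }

if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # Hand the ticket over to the customer infrastructure here,
    # for example via an HTTP call to the ticketing system's API.
    print(ticket["summary"])
```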
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks in the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
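The phase logic of this setting can be condensed into a small function. The sketch below uses the thresholds quoted in this section (50% and 65% capacity used, 110% and 120% space committed); it illustrates the decision flow and is not a NetApp tool.

```python
def setting1_phase(used_pct, committed_pct):
    """Phase transitions of sample setting 1, driven by the two metrics
    aggregate capacity used and aggregate space committed."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate: plan data migration for the next downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "sweet spot: provisioning new storage is allowed"
```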
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space
[Figure 22 shows the phase transitions: while aggregate capacity used is in the 0–50% range and aggregate space committed is in the 0–110% range, provisioning new storage is allowed; beyond those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure 23 sketches settled data with several nomads: the need to act is detected and the effect of the mitigation (for example, a migration) shows within hours rather than months.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, there is no need to take a further metric into account, for example, storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
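The transitions of Table 10 reduce to a single-metric decision. The following sketch illustrates them in code; it is an illustration, not a NetApp tool.

```python
def setting2_action(used_pct):
    """Phase transitions of Table 10: the sole metric is aggregate
    capacity used, and every alarm notifies storage operations."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```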
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure 24 shows the phases: up to 70% aggregate capacity used, provisioning new storage is allowed; between 70% and 85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure 25 plots committed capacity and capacity used over elapsed time (1 month, 3 months), together with the overall trend and the last 3-month trend; the markers 1, 2, and 3 correspond to the steps below.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
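Steps a through d boil down to simple arithmetic. The sketch below works backward from growth rate and downtime interval to a reserve and a threshold; the 20% buffer is an illustrative safety margin, not a NetApp recommendation.

```python
def required_reserve_gb(growth_gb_per_day, days_between_downtimes,
                        buffer=1.2):
    """Step d: minimum free space to ride out organic growth between
    two planned downtime windows, padded by an illustrative buffer."""
    return growth_gb_per_day * days_between_downtimes * buffer

def aggregate_used_threshold_pct(usable_gb, reserve_gb):
    """Step a, worked backward: the highest tolerable aggregate-used
    threshold once the reserve from step d is set aside."""
    return 100.0 * (usable_gb - reserve_gb) / usable_gb
```

For example, 10 GB/day of growth over a 90-day window calls for roughly 1,080 GB of reserve; on a 10 TB aggregate that caps the used threshold at about 89%.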
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
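Applying one of these sequences to many volumes is easy to script. The helper below generates the command lines shown above for a given volume; it is an illustrative convenience, not a NetApp tool.

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Emit the console command sequence for one volume, matching the
    NAS/SAN and autodelete variants listed above."""
    cmds = [
        "vol options %s guarantee none" % volume,
        "vol options %s try_first volume_grow" % volume,
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if san:
        cmds.append("snap reserve -V %s 0" % volume)
    if autodelete:
        cmds += ["snap autodelete %s trigger volume" % volume,
                 "snap autodelete %s delete_order oldest_first" % volume,
                 "snap autodelete %s on" % volume]
    else:
        cmds.append("snap autodelete %s off" % volume)
    if san and lun:
        cmds.append("lun set reservation %s disable" % lun)
    return cmds
```

The generated lines can then be fed to the controller console, for example over an SSH session.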
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," http://www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
13 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 7) Provisioning model for NAS storage from scratch Technically only two out of four combinations are possible
Primary Data (Files amp Directory) Space Allocation
Fat Thin
Full Fat Option No Option
No Option Zero Fat Option
Fat
Thin
Snapshot Copy Space
Allocation
Note Full fat is characterized slightly different in NAS and SAN due to their technical properties
FULL FAT PROVISIONING
Full fat provisioning NAS is the traditional (default) way to implement NFSCIFS shares Volumes in a full fat configuration are characterized as follows
bull Volumes are created with space guarantee bull The size of the volume follows the formula X + Δ
X is the size of the primary data = sum of all user data (files and directories) within the volume Δ is the amount of space needed to hold Snapshot data
bull Because space used for Snapshot copies might grow unexpectedly the autosize function can be used to make space available when reaching a certain volume threshold This would also happen when the space reserved for user data gets low
bull Space reservation for Snapshot copies is used to hide the capacity used for Snapshot copies from the consumers (NAS clients)
bull For volumes with deduplication enabled volume autogrow is a mandatory option bull Normally using autodelete is not recommended in NAS environments Keeping a certain number of
Snapshot copies for file versioningrestores might be part of the SLAs defined for file services
Note Deleting snapshots may be a reasonable approach when no other option for freeing up space is available but this will be a specific and individual decision
Table 2) Full fat provisioning.

| Option | Recommended Value | Notes |
|---|---|---|
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| Volume Snapshot Options | | |
| reserve | yes | The value depends on the number of Snapshot copies and the change rate within the volume. |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |
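Applied by hand, the Table 2 settings correspond roughly to the following Data ONTAP 7-Mode commands. This is a sketch only: the names vol_nas1 and aggr1 and all sizes are hypothetical, and the exact option syntax should be verified against the command reference for your Data ONTAP release.

```shell
# Full fat NAS volume: space guarantee "volume", Snapshot reserve, no autodelete.
vol create vol_nas1 -s volume aggr1 500g   # guarantee=volume (the 7-Mode default)
vol autosize vol_nas1 -m 750g -i 25g on    # -m/-i: hypothetical maximum size and increment
snap reserve vol_nas1 20                   # hide 20% of the volume for Snapshot copies
snap sched vol_nas1 0 2 6@8,12,16,20       # weekly/nightly/hourly Snapshot schedule
snap autodelete vol_nas1 off               # keep Snapshot copies for file versioning SLAs
```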
ZERO FAT PROVISIONING

The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot data. Sizing the volume defines a container with a virtual size for the consumers; NAS users are familiar with fixed-size file shares.
• Because the space used for Snapshot copies might grow unexpectedly, you can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide the capacity taken up by Snapshot copies from the consumers (NAS clients).
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning/restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning.

| Option | Recommended Value | Notes |
|---|---|---|
| Volume Options | | |
| guarantee | none | |
| fractional_reserve | 100 | Leave at default; mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default. |
| autosize | on | Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | - | Autodelete is not recommended in most environments. |
| Volume Snapshot Options | | |
| reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no). |
| schedule | switched on | Automatic Snapshot technology schedules. |
| autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments. |
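The zero fat NAS settings from Table 3 can be sketched in 7-Mode CLI terms as follows; vol_nas2 and aggr1 are hypothetical names, and sizes are examples only.

```shell
# Zero fat NAS volume: no space guarantee, autosize on, no autodelete.
vol create vol_nas2 -s none aggr1 500g     # guarantee=none; 500g is only the virtual container size
vol autosize vol_nas2 -m 1t -i 50g on      # grow on demand up to a business-defined maximum
snap reserve vol_nas2 20                   # optional reserve area; set to 0 to omit it
snap sched vol_nas2 0 2 6@8,12,16,20
snap autodelete vol_nas2 off
sis on /vol/vol_nas2                       # deduplication; autosize above is then mandatory
```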
SAN

For SAN, we consider three options:

• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.

Figure 8) Provisioning model for SAN storage from scratch.

| Snapshot Copy Space Allocation | Primary Data (LUN) Space Allocation: Fat | Primary Data (LUN) Space Allocation: Thin |
|---|---|---|
| Fat | Full fat option | No option |
| Thin | Low fat option | Zero fat option |
FULL FAT PROVISIONING

This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.

The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot copy autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning.

| Option | Recommended Value | Notes |
|---|---|---|
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100 carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided. |
| autosize | off | Autosize could be used as an option to create the free space needed for Snapshot copy creation. |
| Volume Snapshot Options | | |
| reserve | 0 | |
| schedule | switched off | |
| autodelete | off | |
| LUN Options | | |
| reservation | enable | |
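A 7-Mode sketch of the Table 4 settings follows; vol_san1, aggr1, and the LUN path and sizes are hypothetical, and the volume is sized toward 2X + Δ to leave room for a full overwrite reserve.

```shell
# Full fat SAN volume: everything preallocated, no autosize or autodelete.
vol create vol_san1 -s volume aggr1 1t               # guarantee=volume
vol options vol_san1 fractional_reserve 100          # full overwrite reserve (default up to 7.3.3)
snap reserve vol_san1 0                              # no Snapshot reserve for SAN volumes
snap sched vol_san1 0 0 0                            # no controller-scheduled Snapshot copies
lun create -s 400g -t windows /vol/vol_san1/lun0     # space-reserved LUN (reservation enabled)
```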
LOW FAT PROVISIONING

With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because the space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning.

| Option | Recommended Value | Notes |
|---|---|---|
| Volume Options | | |
| guarantee | volume | |
| fractional_reserve | 0 | Snapshot copy space is controlled by the autodelete and autosize options. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase the size of the volume. It can be reverted afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired. |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option. |
| autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default. |
| LUN Options | | |
| reservation | enable | Reserves space for the LUN during creation. |
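In 7-Mode CLI terms, the low fat settings of Table 5 might look like the following sketch; all names and sizes are hypothetical, and the autodelete option syntax should be checked against your release.

```shell
# Low fat SAN volume sized X + delta: LUNs preallocated, Snapshot space on demand.
vol create vol_san2 -s volume aggr1 500g
vol options vol_san2 fractional_reserve 0            # Snapshot space handled by autosize/autodelete
vol options vol_san2 try_first volume_grow           # grow the volume before deleting Snapshot copies
vol autosize vol_san2 -m 750g -i 25g on
snap reserve vol_san2 0
snap sched vol_san2 0 0 0
snap autodelete vol_san2 on
snap autodelete vol_san2 trigger volume              # act when the volume is nearly full
snap autodelete vol_san2 delete_order oldest_first   # oldest Snapshot copies go first
lun create -s 400g -t linux /vol/vol_san2/lun0       # LUN space reservation enabled (default)
```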
ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning.

| Option | Recommended Value | Notes |
|---|---|---|
| Volume Options | | |
| guarantee | none | No space reservation for the volume at all. |
| fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100. |
| autosize | on | Turn autosize on. |
| autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on. |
| try_first | volume_grow | |
| Volume Snapshot Options | | |
| reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide). |
| schedule | switched off | |
| autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached or when the aggregate's free space becomes low. |
| LUN Options | | |
| reservation | disable | No preallocation of blocks for the LUN. |
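A zero fat SAN sketch, again with hypothetical names and sizes; the essential differences from the low fat sketch are the missing guarantees on both the volume and the LUN.

```shell
# Zero fat SAN volume: nothing preallocated; blocks are consumed only as data is written.
vol create vol_san3 -s none aggr1 500g
vol options vol_san3 fractional_reserve 0            # settable for guarantee=none from 7.3.3 onward
vol options vol_san3 try_first volume_grow
vol autosize vol_san3 -m 1t -i 50g on
snap reserve vol_san3 0
snap sched vol_san3 0 0 0
snap autodelete vol_san3 off
lun create -s 400g -t linux /vol/vol_san3/lun0       # create the LUN, then drop its reservation
lun set reservation /vol/vol_san3/lun0 disable       # LUN consumes no space until written to
```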
SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described above; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
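Aggregate-level monitoring can be done with standard 7-Mode commands; aggr1 and vol1 are hypothetical names.

```shell
# With zero fat, watching the aggregate is sufficient; volumes grow on demand.
df -A aggr1               # aggregate capacity, used space, and free space
aggr show_space -h aggr1  # per-volume allocation within the aggregate
df -h vol1                # optional spot check of a single volume
```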
Table 7) Comparison of provisioning methods.

| Characteristics | Full Fat | Low Fat | Zero Fat |
|---|---|---|---|
| Space consumption | 2X + Δ | X + Δ | X − N + Δ² |
| Space efficient | No | Partially, for Snapshot copies | Yes |
| Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level |
| Notification/mitigation process required | No | Optional in most cases | Yes |
| Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area |
| Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing |
| Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers |

² N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized "provisioning script" needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding how much space should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
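The commitment rate can be derived from the sum of the provisioned volume sizes and the aggregate size. A minimal, self-contained sketch with hypothetical figures (three thin-provisioned 2 TB volumes on a 4 TB aggregate):

```shell
# Hypothetical sizes in GB: three thin-provisioned 2 TB volumes on a 4 TB aggregate.
aggr_size_gb=4096
vol_sizes_gb="2048 2048 2048"

total_gb=0
for s in $vol_sizes_gb; do
  total_gb=$((total_gb + s))
done

# Commitment rate = provisioned volume capacity / aggregate capacity.
# Values above 100% mean the aggregate is overcommitted (thin provisioned).
commitment_pct=$((total_gb * 100 / aggr_size_gb))
echo "commitment rate: ${commitment_pct}%"
```

With these numbers the script reports a commitment rate of 150%, that is, 1.5 times the physical capacity has been promised to consumers.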
APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
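Provisioning a new instance from a template volume can be sketched in 7-Mode CLI terms as follows; the volume and Snapshot copy names are hypothetical, and the exact clone syntax should be verified for your release.

```shell
# Provision a new application instance from a template volume via FlexClone.
snap create vol_template pristine              # consistent base Snapshot copy of the template
vol clone create vol_inst1 -b vol_template pristine
vol options vol_inst1 guarantee none           # make the clone space efficient (zero fat)
```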
CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. If a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: cloning the data of an application instance with FlexClone yields high instant savings; these savings might deteriorate over time.
• Long-term storage efficiency savings: deduplicating application data yields medium long-term savings.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [Diagram: a template FlexVol volume and one FlexVol volume per application instance (1 through n), each holding that instance's LUNs/qtrees. Deduplication block sharing works within each FlexVol volume; FlexClone block sharing links the instance volumes to the template.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.

Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings, requiring the deduplication process to run again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance (for example, template application data) through a file/LUN FlexClone operation.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
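Cloning an instance inside a shared volume and deduplicating across instances might look like the following 7-Mode sketch; the volume and LUN paths are hypothetical, and the file/LUN clone command is available starting with Data ONTAP 7.3.1.

```shell
# Clone a template LUN inside a shared volume (file/LUN FlexClone).
clone start /vol/vol_shared/template.lun /vol/vol_shared/inst1.lun
clone status /vol/vol_shared                 # watch the clone operation complete
sis on /vol/vol_shared                       # deduplicate across all instances in the volume
sis start -s /vol/vol_shared                 # initial scan of existing blocks
```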
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. [Diagram: the template and instances 1 through n each place one LUN/qtree in each of several FlexVol volumes, so that each volume groups the equivalent storage objects of all instances. Deduplication block sharing works within each FlexVol volume, across the instances.]
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.

Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data is not placed together with data that dedupes well in the same volume.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, the response time for mitigating data growth scenarios becomes independent of application-specific planned downtime windows. Storage is classified into potential migration candidates, which can be migrated away from a tight aggregate on one storage controller to another one while assuring accessibility. Thus it is an elegant technique for relaxing the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.

The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data's lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. [Diagram: an aggregate containing a settled part and several nomads; one nomad is moved out of the aggregate.]
To summarize, the settled/nomad provisioning pattern is an elegant method for adjusting the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.
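Moving a nomad is a MultiStore operation; a 7-Mode sketch might look as follows. The vFiler and controller names are hypothetical, the commands require MultiStore and SnapMirror licenses, and the exact `vfiler migrate` syntax should be verified against the MultiStore documentation for your release.

```shell
# Relax a tight aggregate by migrating a nomad vFiler unit to another controller.
vfiler status -r nomad1                        # inspect the nomad before moving it
vfiler migrate start nomad1@filer_src          # run on the destination controller
vfiler migrate status nomad1@filer_src         # monitor the SnapMirror-based baseline transfer
vfiler migrate complete nomad1@filer_src       # brief cutover; NFS/iSCSI clients stay connected
```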
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.

We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.

Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
[figure: instances Inst1…InstN sorted from high negative impact (outside SLA, e.g., all FC-attached → settled) to low negative impact (inside SLA → nomad)]
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[figure: instances sorted by penalty cost; highest negative impact → settled, medium → semi-settled, lowest → nomad]
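The two alignment steps can be sketched as a small classification routine. The data model (per-instance acceptable disruption, penalty cost, FC-attachment flag) and the threshold values are illustrative assumptions, not part of the NetApp tooling:

```python
# Sketch of the settled/nomad assessment (hypothetical data model).
# FC-attached or near-zero-disruption instances cannot be migrated
# online and stay settled; among the rest, the highest penalty costs
# are the "stickiest" and are assigned settled as well.

def classify(instances, max_disruption_for_nomad, settled_quota):
    """instances: list of dicts with 'name', 'acceptable_disruption_s',
    'penalty_cost', 'fc_attached'. Returns (settled, nomads)."""
    settled, candidates = [], []
    for inst in instances:
        # Alignment by technical impact.
        if inst.get("fc_attached") or inst["acceptable_disruption_s"] < max_disruption_for_nomad:
            settled.append(inst)
        else:
            candidates.append(inst)
    # Alignment by business impact: highest penalty cost first.
    candidates.sort(key=lambda i: i["penalty_cost"], reverse=True)
    settled += candidates[:settled_quota]
    nomads = candidates[settled_quota:]
    return settled, nomads

instances = [
    {"name": "erp", "acceptable_disruption_s": 0, "penalty_cost": 900, "fc_attached": True},
    {"name": "mail", "acceptable_disruption_s": 60, "penalty_cost": 500, "fc_attached": False},
    {"name": "home", "acceptable_disruption_s": 300, "penalty_cost": 10, "fc_attached": False},
]
settled, nomads = classify(instances, max_disruption_for_nomad=30, settled_quota=1)
print([i["name"] for i in settled], [i["name"] for i in nomads])
# ['erp', 'mail'] ['home']
```

Sorting the remaining candidates by penalty cost reproduces the stickiness ordering of the alignment figures.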
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because a migration in progress consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can handle the load in the case of a failover. Doing so should leave enough resources to perform migrations.
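This sizing rule can be expressed as a simple headroom check. The utilization figures and the migration reserve below are illustrative assumptions, not NetApp-prescribed values:

```python
# Sketch: verify that an HA pair leaves failover headroom plus a
# reserve for migration traffic. After a failover, one node carries
# both loads; migrations should still fit on top of that.

def ha_headroom_ok(util_a, util_b, migration_reserve=0.1):
    """util_a, util_b: fractional utilization of the two HA nodes."""
    return util_a + util_b + migration_reserve <= 1.0

print(ha_headroom_ok(0.40, 0.45))  # True: combined 0.95 fits on one node
print(ha_headroom_ok(0.50, 0.45))  # False: combined 1.05 exceeds one node
```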
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the pressure on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  – An application wants to write to committed storage but fails (NAS/SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  – An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  – insufficient space within the volume in which the storage object is contained
  – insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup→Options→Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
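The aggregate events are driven by two simple ratios: block use and committed space against usable capacity. A minimal sketch of the evaluation follows; the threshold percentages are illustrative examples, not Operations Manager defaults:

```python
# Sketch: evaluate aggregate thresholds the way the Operations Manager
# events above do. Threshold percentages are illustrative only.

def aggregate_events(used, committed, usable,
                     nearly_full=80, full=90,
                     nearly_over=95, over=100):
    """All capacities in the same unit (e.g., GB). Returns the list
    of events that would fire for this aggregate."""
    use_pct = 100.0 * used / usable
    committed_pct = 100.0 * committed / usable
    events = []
    if use_pct >= full:
        events.append("aggregate full")
    elif use_pct >= nearly_full:
        events.append("aggregate nearly full")
    if committed_pct >= over:
        events.append("aggregate overcommitted")
    elif committed_pct >= nearly_over:
        events.append("aggregate nearly overcommitted")
    return events

# 8 TB used and 13 TB committed against 10 TB usable capacity:
print(aggregate_events(used=8, committed=13, usable=10))
# ['aggregate nearly full', 'aggregate overcommitted']
```

A committed percentage above 100% is the normal operating mode of a thin-provisioned aggregate; only the thresholds decide when it becomes an event.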
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
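The days-to-full trending described above can be reproduced with an ordinary least-squares fit over daily capacity samples. A pure-Python sketch, with hypothetical sample data:

```python
# Sketch of days-to-full trending: fit a least-squares line through
# daily capacity-used samples and extrapolate to usable capacity
# (the calculation basis is the usable capacity, not the aggregate
# full threshold setting).

def days_to_full(samples, usable):
    """samples: capacity used per day, oldest first. Returns the
    estimated days until usable capacity is reached, or None if the
    trend shows no growth."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var              # growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    current = intercept + slope * (n - 1)   # trend value for today
    return (usable - current) / slope

# 100 GB growth per day, 5 TB used today, 10 TB usable:
print(days_to_full([4600, 4700, 4800, 4900, 5000], 10000))  # 50.0
```

Running the fit over different intervals, as recommended above, reveals whether recent growth deviates from the long-term trend.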
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
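A minimal adapter script might look as follows. Note that the environment variable names used to read the event details (DFM_EVENT_NAME, DFM_SOURCE_NAME) and the output format are illustrative assumptions; consult the Operations Manager documentation for the actual interface it exposes to alarm scripts:

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter script. The environment
# variable names below are assumptions for illustration; the real
# interface is documented with Operations Manager.
import os

def format_ticket(event, source):
    """Render a one-line ticket summary for the downstream system."""
    return "[storage] %s on %s - please assess mitigation" % (event, source)

if __name__ == "__main__":
    event = os.environ.get("DFM_EVENT_NAME", "unknown-event")
    source = os.environ.get("DFM_SOURCE_NAME", "unknown-object")
    # Hand over to the ticketing system of choice, e.g., print to a
    # spool file picked up by the orchestration framework.
    print(format_ticket(event, source))
```

The script is the place to implement the mapping between the detected situation and the responsible operational group.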
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use. The effect of a mitigation activity should be to return usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and the data migrated offline.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration and the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
[figure: data growth over several months between two planned downtime windows]
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed
[figure: decision matrix — while aggregate capacity used is 0–50% and aggregate space committed is 0–110%, new storage is provisioned; above those thresholds, capacity is assessed and thresholds are adapted while organic growth continues; above 65% used or 120% committed, mitigation is performed]
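The decision logic of this setting reduces to a few comparisons. A sketch using the thresholds configured for this customer (50%/110% to stop provisioning, 65%/120% to trigger mitigation):

```python
# Sketch of the sample setting 1 phase transitions. The threshold
# values mirror this customer's configuration; they are not defaults.

def phase(use_pct, committed_pct):
    """use_pct: aggregate capacity used; committed_pct: aggregate
    space committed, both in percent of usable capacity."""
    if use_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if use_pct > 50 or committed_pct > 110:
        return "organic growth only; assess capacity, adapt thresholds"
    return "provision new storage"

print(phase(40, 90))   # provision new storage
print(phase(55, 100))  # organic growth only; assess capacity, adapt thresholds
print(phase(70, 100))  # mitigate in next planned downtime window
```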
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[figure: aggregate use of settled and nomad instances over hours; after detecting the need to act, the effect of a mitigation (e.g., migration) returns use to the corridor]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% use.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

Detection Threshold | Notify | Mitigation
>70% | Storage operations | Stop provisioning new storage
>85% | Storage operations | Stop extending provisioned storage
>90% | Storage operations | Relax the resource situation and migrate a nomad
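The thresholds in Table 10 can be expressed as a simple monitoring rule. The following is a minimal sketch in Python; the function name and the returned action strings are illustrative assumptions, not an Operations Manager interface:

```python
def phase_actions(aggr_used_pct: float) -> list[str]:
    """Map aggregate capacity used (%) to the mitigation actions of Table 10.

    Thresholds are cumulative: at >90% all three actions apply.
    """
    actions = []
    if aggr_used_pct > 70:
        actions.append("stop provisioning new storage")
    if aggr_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if aggr_used_pct > 90:
        actions.append("relax resource situation: migrate a nomad")
    return actions
```

Inside the operational sweet spot corridor (up to 70%), the list stays empty and no notification is needed.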
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, with the overall trend and the last 3-month trend.)
As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of the change of the volume configurations to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
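The back-calculation of an attention (yellow) threshold from the growth rate and the time a mitigation alternative needs can be sketched as follows. This is an illustrative Python sketch with made-up numbers; the function name and the linear-growth assumption are our own, not Operations Manager output:

```python
def yellow_threshold_pct(capacity_tb: float,
                         daily_growth_tb: float,
                         days_to_mitigate: float,
                         red_pct: float = 90.0) -> float:
    """Work backward from the red threshold: the yellow (attention) area
    must start early enough to absorb organic growth during the time a
    mitigation alternative needs to show effect (linear growth assumed)."""
    growth_pct = 100.0 * daily_growth_tb * days_to_mitigate / capacity_tb
    return red_pct - growth_pct

# A 100 TB aggregate growing 0.1 TB/day, 30 days until mitigation shows
# effect: the attention area should start no later than 90% - 3% = 87%.
```

The same calculation, run with your measured growth rate and your team's mitigation lead time, yields the threshold values to configure for monitoring.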
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide"
www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO"
www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide"
www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates"
www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion"
www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized"
www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications"
www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide
now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
Volume Snapshot Options
reserve yes The value depends on the number of Snapshot copies and the change rate within the volume.
schedule switched on Automatic Snapshot technology schedules.
autodelete off Deleting Snapshot copies is not recommended in most NAS environments.
ZERO FAT PROVISIONING
The zero fat method is the most efficient way to provision NAS volumes:

• Volumes are created without a space guarantee.
• The size of the volume still follows the formula X + Δ, where X is the size of the primary data (the sum of all user data, files and directories, within the volume) and Δ is the amount of space needed to hold Snapshot copy data. Sizing the volume defines a container with a virtual size for the consumers. NAS users are familiar with fixed-size file shares.
• Space used for Snapshot copies can grow unexpectedly. You can use the autosize function to make space available when a certain volume threshold is reached. You can also use the autosize function when the space reserved for user data gets low.
• Space reserved for Snapshot copies is used to hide from the consumers (NAS clients) the capacity taken up by Snapshot copies.
• For volumes with deduplication enabled, volume autogrow is a mandatory option.
• Using autodelete is normally not recommended in NAS environments. Keeping a certain amount of space for Snapshot copies for file versioning and restores is part of the SLAs defined for file services.

Note: Deleting Snapshot copies may be a reasonable approach when no other option for freeing up space is available, but this is a specific and individual decision.
Table 3) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none
fractional_reserve 100 Leave at the default; this option is mostly relevant for SAN environments. The default value up to Data ONTAP 7.3.3 is 100; for later releases, 0 is the default.
autosize on Turn autosize on. There is no artificially limited volume that needs to be monitored. Autosize makes sense to allow growth of user data beyond the guaranteed space limit.
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first - Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve yes/no The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space per the SLA is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule switched on Automatic Snapshot technology schedules.
autodelete off Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:

• Full fat: Both the primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: The primary data and its Snapshot copy space are allocated on demand. This variant achieves the best storage efficiency ratio when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch.

                                 Primary Data (LUN) Space Allocation
                                 Fat                 Thin
Snapshot Copy      Fat           Full fat option     No option
Space Allocation   Thin          Low fat option      Zero fat option
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:

• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to these settings. See section 3.1, "Provisioning from Scratch: Full Fat to Zero Fat Provisioning," for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 100 Even though a fractional reserve below 100% is technically possible, it carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize off Autosize could be used as an option to create the free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve 0
schedule switched off
autodelete off
LUN Options
reservation enable
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:

• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 0 Snapshot copy space is controlled by the autodelete and autosize options.
autosize on Turn autosize on.
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first volume_grow Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the increase can be reverted afterward if the volume's free space grows again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete on There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated against the business requirements; in the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options volume, oldest_first There is a precedence order for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation enable Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:

• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X – N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none No space reservation for the volume at all.
fractional_reserve 0 With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize on Turn autosize on.
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first volume_grow
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete off Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation disable No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of its storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes will grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X – N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact = the amount of blocks logically allocated but not used.
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
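The space-consumption formulas in Table 7 can be compared numerically. A minimal Python sketch; X, N, and Δ are the symbols defined in the sections above, and the example numbers are made up for illustration:

```python
def full_fat(x, delta):
    # 2X + Δ: primary data plus a full overwrite (fractional) reserve
    return 2 * x + delta

def low_fat(x, delta):
    # X + Δ: primary data preallocated, Snapshot copy space on demand
    return x + delta

def zero_fat(x, n, delta):
    # X - N + Δ: only blocks actually used consume space
    return x - n + delta

# 1000 GB of LUN capacity, 300 GB of it never written, 100 GB Snapshot data:
x, n, delta = 1000, 300, 100
print(full_fat(x, delta))     # 2100
print(low_fat(x, delta))      # 1100
print(zero_fat(x, n, delta))  # 800
```

With these example values, zero fat consumes less than half the physical space of full fat for the same served capacity.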
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
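The commitment rate can be understood as the ratio of provisioned (committed) volume capacity to physical aggregate capacity. A minimal illustration; the function name and numbers are hypothetical, not an Operations Manager API:

```python
def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    """Committed volume capacity as a percentage of physical aggregate
    capacity. Values above 100% indicate thin-provisioning overcommitment."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three volumes sized to their expected data, on a 10 TB aggregate:
print(commitment_rate_pct([4000, 5000, 6000], 10000))  # 150.0 (overcommitted)
```

Sizing volumes to the expected data (rather than to an arbitrarily large maximum) is what keeps this metric meaningful as a measure of consolidation.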
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems that support space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings. High instant savings are achieved when cloning the data of an application instance with FlexClone; these savings might deteriorate over time.
• Long-term storage efficiency savings. Medium long-term savings are achieved when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application: a consistent volume representing the template of the intended application is cloned and attached to an instance, where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout for template-based provisioning is as follows. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates blocks for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, clone creation itself does not affect the space used in the aggregate. When data in the clone is changed or new data is added by the application, the aggregate use grows.
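To make the distinction between committed and used space concrete, the following sketch models an aggregate before and after a FlexClone operation. The numbers and the helper name are illustrative assumptions, not Data ONTAP output.

```python
# Illustrative model (assumed numbers): an aggregate with a template volume
# that is cloned. Cloning raises committed space but not used space.

def commitment_rate(committed_gb, aggregate_gb):
    """Committed space relative to physical aggregate size (>1.0 = overcommitted)."""
    return committed_gb / aggregate_gb

aggregate_gb = 1000
template_gb = 400          # size of the template volume (committed)
used_gb = 300              # blocks physically used by the template

# Creating a FlexClone volume commits another 400 GB ...
committed_after_clone = template_gb + template_gb
# ... but only metadata is written, so used space is effectively unchanged.
used_after_clone = used_gb

print(commitment_rate(template_gb, aggregate_gb))            # 0.4
print(commitment_rate(committed_after_clone, aggregate_gb))  # 0.8
print(used_after_clone)                                      # 300
```

Only when the application writes into the clone does used space grow toward the committed figure, which is why aggregate monitoring matters more than volume monitoring in this layout.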
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again to regain them. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, the template data must be cloned with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and the resulting deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example the template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that each such volume is created within an aggregate; the volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and the object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate of the aggregate; it affects only the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually use similar operating systems and applications in dedicated virtual disks. Grouping these storage objects therefore leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments are unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another one while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the interstorage controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval
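The ratio between settled and nomad parts can be estimated with simple arithmetic. The following sketch, using assumed numbers and a hypothetical helper name, sizes the nomad share so that settled growth over the data lifetime stays within the aggregate.

```python
# Illustrative sizing sketch (assumed numbers): how much of an aggregate to
# reserve for nomads so that settled data can grow for its whole lifetime.

def nomad_share(aggregate_gb, settled_gb, growth_gb_per_month, lifetime_months):
    """Fraction of the aggregate provisioned as nomads; migrating them away
    frees exactly the space the settled data grows into."""
    settled_end = settled_gb + growth_gb_per_month * lifetime_months
    if settled_end > aggregate_gb:
        raise ValueError("settled data outgrows the aggregate even without nomads")
    return (aggregate_gb - settled_gb) / aggregate_gb, settled_end

share, settled_end = nomad_share(
    aggregate_gb=10000, settled_gb=6000,
    growth_gb_per_month=100, lifetime_months=36)
print(round(share, 2))   # 0.4 -> 40% of the aggregate starts out as nomads
print(settled_end)       # 9600 GB settled at end of life, still < 10000 GB
```

Splitting the 40% nomad share into several vFiler units of different sizes then gives the granularity to react to faster or slower growth than planned.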
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, the application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for a handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting from the start, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the situation on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
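The difference matters for sizing the migration window: hypervisor-based migration moves logical, undeduplicated data. The following sketch, with assumed numbers and assuming that controller-side replication preserves block sharing, compares the transferred volume in both cases.

```python
# Illustrative comparison (assumed numbers): replication that preserves block
# sharing transfers deduplicated blocks, while hypervisor-based migration
# transfers the full logical data of each virtual machine.

logical_gb = 2000        # logical size of the VMs in the datastore
dedupe_savings = 0.60    # 60% of blocks are shared/deduplicated (assumption)

controller_transfer_gb = logical_gb * (1 - dedupe_savings)
hypervisor_transfer_gb = logical_gb  # savings are lost during the transfer

print(controller_transfer_gb)  # 800.0
print(hypervisor_transfer_gb)  # 2000
```

On the destination, running deduplication again shrinks the 2000 GB back toward 800 GB, but only after the transfer completes.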
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses the questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the time available to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). For the application this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
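The relationship between the free block pool and the time available to react can be expressed directly. The sketch below, with assumed numbers and a hypothetical helper name, turns the free space of an aggregate into the days left before mitigation must have taken effect.

```python
# Illustrative sketch (assumed numbers): the free space of an aggregate,
# divided by the net daily growth, is the time window left to react.

def days_to_full(free_gb, daily_growth_gb):
    """Days until the aggregate runs out of free blocks at the current rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # shrinking or flat usage: no deadline
    return free_gb / daily_growth_gb

# 1.2 TB free, growing 40 GB/day -> 30 days to choose and apply a mitigation
print(days_to_full(free_gb=1200, daily_growth_gb=40))  # 30.0
```

A mitigation alternative whose lead time (for example, drive procurement) exceeds this window is effectively unavailable, which is why thresholds should fire well before the pool is exhausted.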
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leaving storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
• Mitigating storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://<opsmgrserver>:<port>/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important because they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, this can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use triggers an alarm that notifies the person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage triggers an alarm that notifies the person in charge. This metric refers to the amount of storage that is committed to applications; it represents the level of consolidation as well as the width and growth of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
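The interplay of these four aggregate thresholds can be sketched as a small decision helper. The threshold values and the function name below are illustrative assumptions; Operations Manager evaluates the same metrics internally.

```python
# Illustrative sketch: classify an aggregate against the nearly-full/full and
# nearly-overcommitted/overcommitted thresholds (values are assumptions).

def classify_aggregate(use_pct, committed_pct,
                       nearly_full=85, full=90,
                       nearly_over=95, over=100):
    """Return the alarms a monitoring loop would raise for this aggregate."""
    alarms = []
    if use_pct >= full:
        alarms.append("aggregate-full")
    elif use_pct >= nearly_full:
        alarms.append("aggregate-almost-full")
    if committed_pct >= over:
        alarms.append("aggregate-overcommitted")
    elif committed_pct >= nearly_over:
        alarms.append("aggregate-almost-overcommitted")
    return alarms

print(classify_aggregate(use_pct=87, committed_pct=120))
# ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note that the two metrics are independent: a thin-provisioned aggregate is typically overcommitted long before it is full, which is exactly the corridor the nearly-full thresholds are meant to guard.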
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies the person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies the person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://<opsmgrserver>:<port>/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
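The trend computation itself is a least-squares fit on daily usage samples. The following sketch, with made-up sample data and hypothetical helper names, mirrors how a days-to-full estimate can be derived from such a regression.

```python
# Illustrative sketch: linear regression on daily aggregate usage samples
# (made-up data) and a days-to-full estimate from the fitted growth rate.

def fit_growth(samples_gb):
    """Least-squares slope (GB/day) and intercept over daily samples."""
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_gb)) \
            / sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

usage = [8000, 8040, 8085, 8120, 8165, 8200, 8240]  # GB, one sample per day
capacity_gb = 10000                                  # usable aggregate capacity

slope, intercept = fit_growth(usage)
days_to_full = (capacity_gb - usage[-1]) / slope
print(slope)                # 40.0 GB/day growth
print(round(days_to_full))  # 44 days until the aggregate is full
```

As the note below states, the reference point is the usable aggregate capacity, not the aggregate full threshold; an operations team would subtract its own safety margin from the 44 days.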
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://<opsmgrserver>:<port>/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or by time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten with more specific ones. In order to do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by the required skill set and the time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the thresholds as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm firing based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping of the detected situation to the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
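A minimal adapter script might look like the following Python sketch. The way Operations Manager passes event details to the script (here assumed to be command-line arguments) and the downstream ticketing call are assumptions, not part of the product; consult the dfm documentation for the exact invocation contract.

```python
#!/usr/bin/env python
"""Sketch of a notification adapter invoked via 'dfm alarm create -s ...'.

Assumption: the event name and source object arrive as command-line
arguments; the real invocation contract depends on your Operations
Manager version.
"""
import sys
import time


def format_ticket(event_name, source, severity="warning"):
    """Turn an Operations Manager event into a payload for a
    (hypothetical) downstream ticketing system."""
    return {
        "title": "%s on %s" % (event_name, source),
        "severity": severity,
        "created": time.strftime("%Y-%m-%d %H:%M:%S"),
        "runbook": "See section 4.4, mitigation activities for aggregates",
    }


def main(argv):
    event_name = argv[1] if len(argv) > 1 else "aggregate-almost-full"
    source = argv[2] if len(argv) > 2 else "unknown-aggregate"
    ticket = format_ticket(event_name, source)
    # Replace this print with a POST to your ticketing system's API.
    print("TICKET: %(title)s [%(severity)s] runbook=%(runbook)s" % ticket)


if __name__ == "__main__":
    main(sys.argv)
```

The script only formats and prints the ticket; wiring it to a concrete ticketing system is site specific.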
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. Where possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the previously preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another, within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable up to limits (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes; vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes; volume switch-over time
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes; migration time
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes

No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes; volume migration time
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes; migration time
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate
nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
[Figure content: for aggregate capacity used, 0-50% permits provisioning of new storage and above 65% triggers mitigation; for aggregate space committed, 0-110% permits provisioning and above 120% triggers mitigation. In between lies the operational sweet spot corridor, in which capacity is assessed and thresholds are adapted.]
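The transitions shown in Figure 22 can be expressed as a small decision function. The threshold values (50%/65% for capacity used, 110%/120% for committed space) are the sample values from this setting; the function and its names are illustrative, not part of Operations Manager.

```python
def phase_actions(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics of sample setting 1 to the allowed
    action, using the sample thresholds from this section."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"  # migrate data in the next planned downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess capacity, adapt thresholds"  # provisioning stopped
    return "provision new storage"  # inside the operational sweet spot corridor


# phase_actions(40, 100) -> "provision new storage"
# phase_actions(55, 100) -> "assess capacity, adapt thresholds"
# phase_actions(70, 100) -> "mitigate"
```

Either metric alone is enough to leave the corridor, which matches the "one or both thresholds" rule above.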
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure content: settled and nomad data share an aggregate; once the need to act is detected, the effect of mitigation (for example, migration of a nomad) shows within hours rather than months.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example, storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
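The single-metric phase logic of Table 10 can be sketched as follows; the thresholds are the sample values from this setting, and the function name is our own.

```python
def settled_nomad_action(aggr_used_pct):
    """Return the step from Table 10 for a given aggregate capacity
    used percentage (settled/nomad setting, sample thresholds)."""
    if aggr_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggr_used_pct > 85:
        return "stop extending provisioned storage"
    if aggr_used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"
```

Note how much narrower the corridor is than in sample setting 1: mitigation starts only above 90% because a nomad migration takes hours, not months.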
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure content: at aggregate capacity used 0-70%, new storage may be provisioned; at 70-85%, new provisioning stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure content: committed capacity and capacity used plotted over elapsed time, with markers at 1 month and 3 months, showing the overall trend and the last-3-month trend across steps 1 to 3.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
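The arithmetic behind steps b through d can be sketched as follows. The helper names and sample numbers are illustrative; Operations Manager reports the equivalent days-to-full trend values directly.

```python
def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    """Days until the aggregate reaches 100% at the observed growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: never fills up
    return (capacity_gb - used_gb) / daily_growth_gb


def required_headroom_gb(daily_growth_gb, days_between_downtimes):
    """Minimum free space needed to bridge the gap between planned
    downtime windows at the observed growth rate (step d)."""
    return daily_growth_gb * days_between_downtimes


# Example: a 10 TB aggregate with 6 TB used, growing 20 GB/day, and
# downtime windows 90 days apart:
# days_to_full(10240, 6144, 20) -> 204.8 days
# required_headroom_gb(20, 90)  -> 1800 GB of reserved headroom
```

In this example the aggregate would survive more than two downtime cycles, so an 80% upper threshold leaves a comfortable margin.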
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
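When trimming many volumes, the command sequences above can be generated with a small script. This sketch only emits the Data ONTAP commands shown in this section (it does not execute them); the function and parameter names are our own.

```python
def zero_fat_commands(volume, max_size, increment, san_lun=None, autodelete=False):
    """Emit the zero fat configuration sequence from this section for one
    volume; pass san_lun to include the SAN-specific commands."""
    cmds = [
        "vol options %s guarantee none" % volume,
        "vol options %s try_first volume_grow" % volume,
        "vol autosize %s -m %s -i %s on" % (volume, max_size, increment),
    ]
    if san_lun is not None:
        cmds.append("snap reserve -V %s 0" % volume)
    if autodelete:
        cmds += [
            "snap autodelete %s trigger volume" % volume,
            "snap autodelete %s delete_order oldest_first" % volume,
            "snap autodelete %s on" % volume,
        ]
    else:
        cmds.append("snap autodelete %s off" % volume)
    if san_lun is not None:
        cmds.append("lun set reservation %s disable" % san_lun)
    return cmds


# for cmd in zero_fat_commands("vol1", "2t", "50g", san_lun="/vol/vol1/lun0"):
#     print(cmd)
```

Review the generated commands before running them on a controller; defaults and limits vary by Data ONTAP version.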
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com (NOW download area, omsed_plugin InstallUserGuide)
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | - | Autodelete is not recommended in most environments.
Volume Snapshot Options
reserve | yes/no | The value depends on the number of Snapshot copies and the change rate within the volume. Displaying only the committed usable space using SLAs is the preferred way to provision NAS storage. However, there might be situations in which the Snapshot reserve area is omitted (no).
schedule | switched on | Automatic Snapshot technology schedules.
autodelete | off | Deleting Snapshot copies is not recommended in most NAS environments.
SAN
For SAN, we consider three options:
• Full fat: Both primary data and its Snapshot copy space are preallocated.
• Low fat: The primary data is preallocated; the Snapshot copy space is allocated on demand.
• Zero fat: Primary data and its Snapshot copy space are allocated on demand. This variant achieves the best ratio of storage efficiency when provisioning applications from scratch.
Figure 8) Provisioning model for SAN storage from scratch
[Figure content: a matrix of primary data (LUN) space allocation versus Snapshot copy space allocation. Fat primary with fat Snapshot space is the full fat option; fat primary with thin Snapshot space is the low fat option; thin primary with thin Snapshot space is the zero fat option; thin primary with fat Snapshot space is not an option.]
FULL FAT PROVISIONING
This method can be treated as the historical way of provisioning block storage with Data ONTAP:
• Volumes are created with a space guarantee.
• A fractional (overwrite) reserve is used to guarantee that the primary data can be overwritten completely with Snapshot copies in place. If this space is not available, Snapshot copy creation will fail.
• The size of the volume follows the formula 2X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
The enhancements to the volume autosize capabilities (such as volume-size-dependent thresholds) and the robustness of the Snapshot autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes and LUNs in Data ONTAP still correspond to these settings. See "Provisioning from Scratch: Full Fat to Zero Fat Provisioning" for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 100 | Even though technically possible, a fractional reserve below 100% incorporates a potential risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize | off | Autosize could be used as an option to create free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve | 0 |
schedule | switched off |
autodelete | off |
LUN Options
reservation | enable |
LOW FAT PROVISIONING
With low fat provisioning, we use a more space-efficient way to provision volumes:
• Volumes are created with a space guarantee.
• LUNs are created with a space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN, the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning

Option | Recommended Value | Notes
Volume Options
guarantee | volume |
fractional_reserve | 0 | Snapshot space is controlled by the autodelete and autosize options.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information. There is no reason not to increase the size of the volume; it can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst case, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | There is a precedence for which Snapshot copies are candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning

Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
autosize | on | Turn autosize on.
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described above; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X – N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers

(2) N is the traditional thin provisioning impact, that is, the amount of blocks logically allocated but not used.
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. As the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum supported volume sizes depend on the storage controller model.
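The commitment rate can be derived from the configured volume sizes and the aggregate capacity; a minimal sketch (all names and figures hypothetical):

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Sum of provisioned (logical) volume sizes relative to the aggregate's
    capacity. Values above 100% indicate overcommitment (thin provisioning)."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# A 10 TB aggregate carrying volumes whose configured sizes total 16 TB
vols = [4000, 6000, 6000]
print(f"{commitment_rate(vols, 10000):.0f}%")  # 160%, i.e. overcommitted
```

A rate well above 100% is not a problem per se; it quantifies how much logical data consolidation the shared free space pool is providing.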
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. This passes the information through the storage stack that a particular block is no longer used and allows unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead for performance, CPU, and memory.
There are two ways to align application data with a NetApp shared storage infrastructure:

• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

• High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
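This behavior, overcommitment rising at clone creation while physical use stays unchanged, can be modeled in a toy sketch (this is a simplified illustration, not Data ONTAP internals; all names and numbers are hypothetical):

```python
class Aggregate:
    """Toy model of an aggregate tracking committed vs. physically used space."""

    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0  # sum of configured volume sizes (commitment rate)
        self.used = 0       # physically allocated blocks

    def provision_volume(self, size_gb, initial_data_gb):
        self.committed += size_gb
        self.used += initial_data_gb

    def flexclone_volume(self, size_gb):
        # Clone creation adds only metadata: commitment grows, use does not.
        self.committed += size_gb

    def write_new_data(self, gb):
        # Changed or new blocks in the clone consume real space on demand.
        self.used += gb

aggr = Aggregate(10000)
aggr.provision_volume(2000, initial_data_gb=1500)  # template volume
aggr.flexclone_volume(2000)                        # clone: no extra use yet
assert aggr.committed == 4000 and aggr.used == 1500
aggr.write_new_data(100)                           # clone diverges over time
print(aggr.committed, aggr.used)  # 4000 1600
```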
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align all application data in it that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on FlexClone savings. It also temporarily reduces deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
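On Data ONTAP releases that support FlexClone at file and LUN granularity, such an operation is driven by the clone command; a hedged example (volume and file paths are hypothetical; verify the syntax for your release):

```
# Clone a template virtual disk within the same FlexVol volume
clone start /vol/vmvol/template/boot.vmdk /vol/vmvol/vm42/boot.vmdk

# Monitor the background cloning operation
clone status vmvol
```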
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings: achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide provides a deeper understanding of NetApp deduplication and its deployment
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is not necessarily confined to one aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pagefiles and swap space, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited by its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data together with data that dedupes well in the same volume.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty, so such realignments are unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is an apt metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:

• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
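Choosing which nomads to migrate to bring an aggregate back into its sweet spot corridor can be sketched as a small selection routine (the threshold, nomad names, and sizes are hypothetical):

```python
def pick_nomads(nomad_sizes_gb, used_gb, capacity_gb, target_use_pct=80):
    """Greedily pick the smallest nomads whose migration brings the
    aggregate's block use back below the target threshold."""
    to_free = used_gb - capacity_gb * target_use_pct / 100.0
    picked, freed = [], 0.0
    for name, size in sorted(nomad_sizes_gb.items(), key=lambda kv: kv[1]):
        if freed >= to_free:
            break
        picked.append(name)
        freed += size
    return picked

nomads = {"nomad_a": 500, "nomad_b": 1200, "nomad_c": 2500}
# 10 TB aggregate at 92% use; free enough space to get back below 80%
print(pick_nomads(nomads, used_gb=9200, capacity_gb=10000))
# ['nomad_a', 'nomad_b']
```

Migrating the smallest sufficient nomads first keeps migration time and inter-controller network load low, in line with the bullet points above.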
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in Oracle database and Microsoft Exchange environments.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order); for example, all FC-attached instances remain settled.
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
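The two alignment steps can be combined into a simple classification sketch (the protocols, penalty figures, and the penalty threshold are hypothetical; the settled/semi-settled/nomad classes follow Figures 15 and 16):

```python
def classify(instances):
    """Assign settled/nomad roles from migratability and penalty cost:
    FC-attached data stays settled (no online migration), low-penalty data
    becomes a nomad, and the rest is semi-settled."""
    roles = {}
    for name, inst in instances.items():
        if inst["protocol"] == "FC":           # cannot be migrated online
            roles[name] = "settled"
        elif inst["penalty_per_move"] <= 100:  # low business impact
            roles[name] = "nomad"
        else:
            roles[name] = "semi-settled"
    return roles

apps = {
    "erp_db":   {"protocol": "FC",    "penalty_per_move": 5000},
    "file_svc": {"protocol": "NFS",   "penalty_per_move": 50},
    "mail":     {"protocol": "iSCSI", "penalty_per_move": 800},
}
print(classify(apps))
# {'erp_db': 'settled', 'file_svc': 'nomad', 'mail': 'semi-settled'}
```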
Figure 16) Alignment by business impact (sorted by negative impact in descending order); instances fall into settled, semi-settled, and nomad classes.
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives, such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.

The following list summarizes situations that are critical for service delivery:
• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives: Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage: Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  – Application wants to write to committed storage but fails (NAS/SAN). To applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  – Application wants to allocate new storage but fails (NAS). An application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object, such as a LUN or a share, can be tight because of:
  – Insufficient free space within the volume in which the storage object is contained
  – Insufficient free space within the aggregate in which the storage object and its volume are contained
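The volume-versus-aggregate distinction above can be sketched as a small diagnostic helper (the free-space figures are hypothetical):

```python
def diagnose_tightness(volume_free_gb, aggregate_free_gb, needed_gb):
    """Report whether a storage object is tight because of its volume,
    its aggregate, or both."""
    causes = []
    if volume_free_gb < needed_gb:
        causes.append("volume")      # volume must grow (autosize) or be relieved
    if aggregate_free_gb < needed_gb:
        causes.append("aggregate")   # aggregate-level mitigation required
    return causes or ["none"]

print(diagnose_tightness(volume_free_gb=2, aggregate_free_gb=500, needed_gb=10))
# ['volume'] -> volume autosize can resolve it from the aggregate's free pool
```

The distinction matters because volume-level tightness can often be resolved automatically (autosize, autodelete), whereas aggregate-level tightness requires one of the mitigation alternatives discussed in this section.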
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage: When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth: When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or the prior phase.
• Mitigate storage use: When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness for certain situations Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation When the event triggers an alarm notification can be sent by e-mail pager Simple Network Management Protocol (SNMP) or customized scripts To raise awareness about a certain situation the event must be characterized using the metrics provided by Operations Manager To communicate the event an alarm must be set
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to it.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
32 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it is providing data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
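The overcommitment metric itself is simple arithmetic: storage committed to applications as a share of usable aggregate capacity. A short illustration:

```python
def overcommitment_pct(committed_bytes, usable_bytes):
    """Aggregate overcommitment: storage committed to applications as a
    percentage of usable aggregate capacity. Values above 100% mean
    more logical data is promised than physically fits, which thin
    provisioning makes possible."""
    return 100.0 * committed_bytes / usable_bytes
```

For example, 550TB committed against 500TB usable yields 110%, the kind of value the nearly overcommitted threshold watches for.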
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
33 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
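The trend-based days-to-full estimate can be sketched as a linear regression over daily usage samples. This is a simplified illustration of the idea, not Operations Manager's exact algorithm; note that, as stated above, it computes time to full against the usable capacity (100%), not against the aggregate full threshold.

```python
def days_to_full(samples, usable_capacity):
    """Estimate days until an aggregate is full.

    samples: list of (day_index, used_bytes) observations
    usable_capacity: usable aggregate capacity in bytes
    Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # bytes per day
    if slope <= 0:
        return None
    intercept = (sy - slope * sx) / n
    full_day = (usable_capacity - intercept) / slope
    return full_day - samples[-1][0]  # days from the last observation
```

Comparing estimates computed over different sample windows (one week, one month, three months) shows whether recent activity deviates from the long-term trend, as recommended above.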
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate, descending, or by time to full, increasing, in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
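A sketch of such an adapter follows. It maps an event to the operational group responsible for it and builds a ticket payload; the event field names and the routing table are illustrative assumptions, not the exact interface Operations Manager exposes to alarm scripts.

```python
import json

def build_ticket(event):
    """Map an Operations Manager event to a ticket payload.

    `event` is a dict of fields the adapter receives; the keys used
    here (name, source, severity) are hypothetical."""
    routing = {  # detected situation -> responsible operational group
        "aggregate-almost-full": "storage-operations",
        "aggregate-full": "storage-capacity-planning",
    }
    return json.dumps({
        "queue": routing.get(event["name"], "storage-operations"),
        "summary": "{0} on {1}".format(event["name"], event["source"]),
        "severity": event.get("severity", "warning"),
    })
```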
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows used to perform major mitigation alternatives.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure: while aggregate capacity used is 0–50% and aggregate space committed is 0–110%, new storage is provisioned; beyond those values, a capacity assessment is performed and the thresholds are adapted; beyond 65% capacity used or 120% committed space, mitigation takes place.]
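The decision logic of this sample setting can be condensed into a small function over the two metrics. The 50%/110% and 65% values are the thresholds quoted above; the 120% committed-space bound is taken from Figure 22.

```python
def decide(used_pct, committed_pct):
    """Sample setting 1: phase decision from aggregate capacity used
    and aggregate space committed (illustrative sketch)."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning, assess capacity"
    return "provision new storage"
```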
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, it is not necessary to take a further metric into account, such as storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure: at 0–70% aggregate capacity used, new storage is provisioned and already provisioned storage may be extended; at 70–85%, only extending already provisioned storage is allowed; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of the NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure: committed capacity and capacity used plotted over elapsed time, together with the overall trend and the last 3-month trend; capacity used drops after the configuration change and growth resumes afterward.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
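Steps (a) through (d) boil down to simple arithmetic: stop provisioning early enough that organic growth until the next planned downtime window stays below the comfort ceiling. A sketch, with an 80% ceiling as in step (a):

```python
def provisioning_stop_threshold(daily_growth_pct, days_to_next_window,
                                comfort_ceiling_pct=80.0):
    """Work backward from the comfort ceiling to the aggregate-use
    level at which provisioning of new storage should stop.
    Illustrative arithmetic only."""
    expected_growth = daily_growth_pct * days_to_next_window
    return max(comfort_ceiling_pct - expected_growth, 0.0)
```

For example, with 0.5% daily growth and 20 days to the next window, provisioning of new storage should stop at 70% aggregate use.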
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to first try deduplication on the storage controller, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Then the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
The enhancements to the volume autosize capabilities (such as volume size-dependent thresholds) and the robustness of the Snapshot copy autodelete implementation have made full fat provisioning more or less obsolete. As of today, the default settings for creating volumes/LUNs in Data ONTAP still correspond to full fat provisioning. See Provisioning from Scratch: Full Fat to Zero Fat Provisioning for a discussion of using tools such as Provisioning Manager.
Table 4) Full fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 100 Even though technically possible, a fractional reserve below 100% carries the risk of running out of Snapshot copy overwrite space. This situation should be avoided.
autosize off Autosize could be used as an option to create free space needed for Snapshot copy creation.
Volume Snapshot Options
reserve 0
schedule switched off
autodelete off
LUN Options
reservation enable
LOW FAT PROVISIONING
With low fat provisioning we use a more space-efficient way to provision volumes:
• Volumes are created with space guarantee.
• LUNs are created with space guarantee as well. This setup does not benefit from unused blocks within a LUN. (During the lifetime of a LUN the amount of free, unused blocks typically decreases. Without space reclamation techniques, allocated blocks on the storage system stay allocated.)
• The size of the volume follows the formula X + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume) and Δ is the amount of space needed to hold Snapshot copy data.
• Because space used for Snapshot copies might grow unexpectedly, the autosize and autodelete policies are used to make space available when a preset volume threshold is reached.
Table 5) Low fat provisioning
Option Recommended Value Notes
Volume Options
guarantee volume
fractional_reserve 0 Snapshot space is controlled by autodelete and autosize options
autosize on Turn autosize on
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value (-m) for the autosize configuration because it determines how much additional disk space is offered to the consumer. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first volume_grow Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; the size can be reduced afterward if the volume's free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete on There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements; in the worst case, deleting Snapshot copies is not an option.
autodelete options volume, oldest_first There is a precedence order in which Snapshot copies become candidates for deletion; oldest_first is the current default.
LUN Options
reservation enable Reserves space for the LUN during creation
ZERO FAT PROVISIONING
Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% allocate-on-demand concept:
• Volumes are created without space guarantee.
• LUNs are created without space guarantee.
• The size of the volume follows the formula X − N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
Table 6) Zero fat provisioning
Option Recommended Value Notes
Volume Options
guarantee none No space reservation for volume at all
fractional_reserve 0 Starting with Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100%.
autosize on Turn autosize on
Option Recommended Value Notes
autosize options -m X -i Y The business model drives the maximum value (-m) for the autosize configuration because it determines how much additional disk space is offered to the consumer. A reasonable resizing increment (-i) depends on various factors, such as the data growth rate in the particular volume and the volume size itself.
try_first volume_grow
Volume Snapshot Options
reserve 0 For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule switched off
autodelete off Deleting Snapshot copies might become an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation disable No preallocation of blocks for LUN
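The zero fat settings in Table 6 map to Data ONTAP 7-Mode commands roughly as follows. This is a hedged sketch: the volume, aggregate, and LUN names as well as the sizes are examples, and the exact syntax should be verified against the command reference for your Data ONTAP release.

```
vol create vol_app -s none aggr1 500g        # volume without space guarantee
vol options vol_app fractional_reserve 0     # no Snapshot copy overwrite reserve
vol options vol_app try_first volume_grow    # grow the volume before deleting Snapshot copies
vol autosize vol_app -m 1t -i 50g on         # autosize on: maximum 1 TB, 50 GB increments
snap reserve vol_app 0                       # no Snapshot copy reserve
snap sched vol_app 0 0 0                     # no Snapshot copy schedule
snap autodelete vol_app off                  # autodelete off, per Table 6
lun create -s 400g -t windows -o noreserve /vol/vol_app/lun0   # LUN without space reservation
```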
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods described above; however, full fat provisioning for SAN environments should be avoided wherever possible because of its poor storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X − N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
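To make the space-consumption column concrete, the formulas from Table 7 can be evaluated with a small sketch (illustrative only, not from the original document; x, n, and delta correspond to X, N, and Δ as defined above):

```python
def space_consumption(x, n, delta, method):
    """Physical space consumed per provisioning method (see Table 7).

    x: size of the primary data (sum of LUN capacities)
    n: blocks logically allocated but never written (thin provisioning impact)
    delta: space needed to hold Snapshot copy data
    """
    if method == "full":
        return 2 * x + delta   # 100% fractional reserve doubles the primary data
    if method == "low":
        return x + delta       # fully allocated LUNs; Snapshot space on demand
    if method == "zero":
        return x - n + delta   # only blocks actually written consume space
    raise ValueError(f"unknown provisioning method: {method}")

# 1000 GB of LUNs, 300 GB of them never written, 100 GB of Snapshot data
for method in ("full", "low", "zero"):
    print(method, space_consumption(1000, 300, 100, method))
```

For this example the three methods consume 2100 GB, 1100 GB, and 800 GB respectively, which illustrates why zero fat is the preferred method.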
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage
Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.
Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.
Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution: when Provisioning Manager runs conformance checks, it reverts such individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot copy autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum volume sizes depend on the storage controller model.
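As an illustration of the commitment rate mentioned above, a minimal sketch (the function name and the sizes are assumptions for illustration, not a NetApp API) computes it from the nominal volume sizes and the aggregate capacity:

```python
def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    """Ratio of logically provisioned capacity to physical aggregate capacity.

    A value above 1.0 means the aggregate is overcommitted, which is the
    intended effect of thin (zero fat) provisioning."""
    return sum(volume_sizes_gb) / aggregate_size_gb

# Three zero fat volumes sized to their expected data on a 6 TB aggregate
rate = commitment_rate([4096, 4096, 4096], 6144)
print(f"commitment rate: {rate:.0%}")  # prints "commitment rate: 200%"
```

A rate well above 100% indicates good data consolidation, provided the aggregate itself is monitored as described in section 4.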
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from both a short- and a long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings of the volume itself. Thus NetApp recommends using the zero fat configuration for the volume so that autogrow is enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed together with data that dedupes well in the same volume.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalties, so such defragmentation is unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at one storage controller to another while its accessibility is assured. Thus it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
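The idea of slicing nomads so that an aggregate keeps operating within a predefined use corridor can be sketched as follows. This is a hypothetical helper, not part of any NetApp tool; the 85%/80% corridor bounds are example values:

```python
def nomads_to_migrate(used_gb, capacity_gb, nomad_sizes_gb, high=0.85, low=0.80):
    """Select nomads to migrate so that aggregate use returns to its corridor.

    Once use crosses the upper bound (high), the smallest nomads are chosen
    first until the projected use falls to the lower bound (low) or below."""
    if used_gb / capacity_gb < high:
        return []                        # still inside the sweet spot corridor
    selected = []
    for size in sorted(nomad_sizes_gb):  # migrate smaller nomads first
        selected.append(size)
        if (used_gb - sum(selected)) / capacity_gb <= low:
            break
    return selected

# A 1 TB aggregate at 90% use with three nomads of different sizes
print(nomads_to_migrate(900, 1000, [50, 100, 200]))  # prints [50, 100]
```

Migrating smaller nomads first matches the recommendation above to prefer quick migrations when time or network bandwidth is limited.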
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
[Figure shows instances Inst1 to InstN sorted from high negative impact/outside SLA (e.g., all FC), assigned settled, down to low impact, assigned nomad.]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
[Figure shows instances classified as settled, semi-settled, or nomad according to their penalty costs.]
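The two alignment steps can be sketched as a simple classification routine. This is illustrative logic only, not a NetApp tool; the instance tuples and the ordering heuristic (lowest penalty first, then highest tolerated disruption) are assumptions:

```python
def assess(instances, migratable=frozenset({"NFS", "iSCSI"})):
    """Split application instances into settled data and ordered nomad candidates.

    instances: iterable of (name, protocol, acceptable_disruption_s, penalty).
    FC-attached or zero-disruption instances stay settled; the remaining
    instances are ordered so the cheapest-to-disrupt ones migrate first."""
    settled, nomads = [], []
    for name, proto, disruption, penalty in instances:
        if proto not in migratable or disruption == 0:
            settled.append(name)        # cannot (or must not) be moved online
        else:
            nomads.append((penalty, -disruption, name))
    nomads.sort()                       # low penalty, high tolerance first
    return settled, [name for _, _, name in nomads]

apps = [("erp", "FC", 0, 9), ("web", "NFS", 60, 5), ("test", "iSCSI", 600, 1)]
print(assess(apps))  # prints (['erp'], ['test', 'web'])
```

The FC-attached instance stays settled because, as noted above, Fibre Channel-attached storage cannot be migrated online at the time of writing.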
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting from the start, taking sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses the questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and by presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:

• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to preserve operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure that the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time: Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives: Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage: Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely: This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − insufficient free space within the volume in which the storage object is contained
  − insufficient free space within the aggregate in which the storage object and its volume are contained
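The third point above lends itself to a quick back-of-the-envelope calculation. The helper below is an illustrative sketch (not an Operations Manager feature): it translates the free block pool and an assumed constant daily growth rate into the remaining days to react.

```python
def days_to_react(free_gb, daily_growth_gb):
    """Estimate how many days remain before an aggregate's free
    block pool is exhausted, assuming a constant daily growth rate."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline to react
    return free_gb / daily_growth_gb

# An aggregate with 1,200 GB free and 40 GB/day of organic growth
# leaves roughly a 30-day window to trigger mitigation.
print(days_to_react(1200, 40))  # 30.0
```

In practice the growth rate is not constant, which is why the trending features described in section 4.2 base their estimate on a regression over past measurements.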
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage: While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth: When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use: When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold: This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold: This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold: This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation, and also the width and increase of the block-use corridor.
• Aggregate nearly overcommitted threshold: This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:

• Volume full threshold: This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold: This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized: This event notifies a person in charge when a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
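The days-to-full trending described above can be approximated with an ordinary least-squares fit. The sketch below is a simplified, hypothetical re-implementation of the idea; Operations Manager's own calculation may differ in detail.

```python
def fit_growth(samples):
    """Ordinary least-squares fit over capacity-used samples.
    samples: list of (day, used_gb) pairs, e.g. daily measurements.
    Returns (daily_growth_gb, intercept_gb)."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def days_to_full(samples, usable_capacity_gb):
    """Days until the trend line reaches the usable aggregate
    capacity (not the aggregate full threshold, per the note)."""
    slope, intercept = fit_growth(samples)
    if slope <= 0:
        return None  # shrinking or flat usage: no projected fill date
    last_day = max(d for d, _ in samples)
    used_now = intercept + slope * last_day
    return (usable_capacity_gb - used_now) / slope

# Perfectly linear growth of 10 GB/day starting at 500 GB:
data = [(d, 500 + 10 * d) for d in range(30)]
print(days_to_full(data, 1000))  # 21.0 days after the last sample
```

Real growth is rarely this linear, which is why comparing trends calculated over different intervals, as recommended above, is worthwhile.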
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth: This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog. A selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated among different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
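As a minimal sketch of such a script adapter: the example below assumes, hypothetically, that Operations Manager passes the event name and source object as positional arguments; check the script interface of your Operations Manager version before relying on this layout. The routing table mirrors the idea of mapping a detected situation to the responsible operational group.

```python
#!/usr/bin/env python
"""Hypothetical notification adapter started by an Operations Manager
alarm (dfm alarm create -s ...). The argument positions below are an
assumption, not a documented interface."""
import sys

# Map event names to the distribution alias that should own the ticket.
ROUTES = {
    "aggregate-almost-full": "storage-operations",
    "aggregate-almost-overcommitted": "storage-capacity-planning",
}

def route(event_name):
    """Return the responsible alias for an event, with a safe default."""
    return ROUTES.get(event_name, "storage-operations")

def main(argv):
    event, source = argv[1], argv[2]  # assumed positional arguments
    # In a real adapter this line would call the ticketing system's API.
    print(f"open ticket for {route(event)}: {event} on {source}")

if __name__ == "__main__" and len(sys.argv) > 2:
    main(sys.argv)
```

Using distribution aliases in the routing table follows the same recommendation given above for e-mail notifications: changing responsibilities then requires no change to the adapter.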
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by returning usage to its defined corridor.

Storage tightness might occur in aggregates or in volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to accommodate data growth. To resolve such a situation, a mitigation activity at the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow other objects to make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or that were skipped by the AutoDelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
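The transitions of this sample setting can be condensed into a small decision helper. The threshold values come from the text above; the function itself is purely illustrative, not part of any NetApp tooling.

```python
def phase(capacity_used_pct, space_committed_pct):
    """Classify an aggregate in sample setting 1.
    Thresholds from the text: nearly full 50%, full 65%,
    nearly overcommitted 110% (with thin provisioning, committed
    space may legitimately exceed 100% of physical capacity)."""
    if capacity_used_pct > 65:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; leave for organic growth"
    return "provision new storage"

print(phase(40, 90))   # provision new storage
print(phase(55, 90))   # stop provisioning; leave for organic growth
print(phase(70, 130))  # mitigate in next planned downtime window
```

Encoding the corridor this way also documents the operational policy in a reviewable form, which can be helpful when the thresholds are later adapted.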
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
(The figure depicts the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is between 0 and 50% and aggregate space committed is between 0 and 110%; above those levels, capacity is assessed and the thresholds are adapted; mitigation starts when aggregate capacity used exceeds 65% or aggregate space committed exceeds 120%.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% usage.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
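The Table 10 transitions can likewise be sketched as a tiny helper, for illustration only; the thresholds are those stated in the table.

```python
def action(aggregate_capacity_used_pct):
    """Phase transitions of sample setting 2 (Table 10 thresholds)."""
    if aggregate_capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggregate_capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_capacity_used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"

print(action(92))  # relax resource situation and migrate a nomad
```

Note how much narrower this corridor is than in sample setting 1: because a nomad can be migrated online within hours, usage can safely run up to 90%.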
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
(The figure shows the same transitions as Table 10: new storage is provisioned while aggregate capacity used is between 0 and 70%; already provisioned storage may still be extended up to 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(The chart plots committed capacity and capacity used against elapsed time over roughly three months, together with the overall trend and the last-3-month trend; the numbered markers 1 to 3 correspond to the steps below.)
As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
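The trend derivation in step 3 can be sketched with a simple least-squares fit that skips the samples collected before and during the zero fat conversion, since the one-off capacity drop would otherwise skew the slope. This is an illustrative sketch, not Operations Manager logic; the `growth_gb_per_day` helper and all sample numbers are invented for the example.

```python
def growth_gb_per_day(samples, start_day=0):
    """Least-squares slope of (day, used_gb) samples, ignoring days before start_day."""
    pts = [(d, u) for d, u in samples if d >= start_day]
    n = len(pts)
    sx = sum(d for d, _ in pts)
    sy = sum(u for _, u in pts)
    sxx = sum(d * d for d, _ in pts)
    sxy = sum(d * u for d, u in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# Illustrative capacity-used samples: steady 5 GB/day growth, with a drop
# around days 10-12 when the volumes were converted to zero fat.
samples = [(d, 1000.0 + 5.0 * d) for d in range(10)]          # before conversion
samples += [(10, 930.0), (11, 900.0), (12, 880.0)]            # during conversion
samples += [(d, 880.0 + 5.0 * (d - 12)) for d in range(13, 103)]  # ~3 months after

raw = growth_gb_per_day(samples)                  # skewed by the conversion drop
clean = growth_gb_per_day(samples, start_day=13)  # post-conversion trend: 5 GB/day
```

Fitting only the post-conversion window recovers the real organic growth rate, which is what the phase thresholds below should be based on.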
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and the time they take to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under its specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow | Increasing the size of the volume does not destroy any data or information, so there is no reason not to increase it; it can be reverted afterward if the volume free space increases again. There might be configurations where automatic volume growth is not desired.
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off
autodelete | on | There might be Snapshot copies that are needed to fulfill certain SLAs, such as backup SLAs. Setting this policy needs to be negotiated with the business requirements. In the worst-case scenario, deleting Snapshot copies is not an option.
autodelete options | volume, oldest_first | Defines the precedence of Snapshot copies as candidates for deletion; oldest_first is the current default.
LUN Options
reservation | enable | Reserves space for the LUN during creation.
ZERO FAT PROVISIONING

Full and low fat provisioning use fully allocated volumes and LUNs; by default, space allocation happens within the boundaries of the LUN and the volume. Zero fat follows a 100% "allocate on demand" concept:
• Volumes are created without a space guarantee.
• LUNs are created without a space guarantee.
• The size of the volume follows the formula X - N + Δ, where X is the size of the primary data (the sum of all LUN capacities within the volume), Δ is the amount of space needed to hold Snapshot copy data, and N is the amount of unused blocks within a given LUN.
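The formula above can be checked against the corresponding formulas for the other two methods (compare Table 7). The function names and the sample numbers are invented for illustration only:

```python
# Space consumption of the three provisioning methods, per Table 7:
#   x     = primary data (sum of LUN capacities in the volume), in GB
#   delta = space needed to hold Snapshot copy data, in GB
#   n     = blocks logically allocated in the LUNs but not used, in GB

def full_fat_gb(x, delta, n):
    return 2 * x + delta      # 100% fractional reserve doubles the primary data

def low_fat_gb(x, delta, n):
    return x + delta          # no fractional reserve, but LUNs fully allocated

def zero_fat_gb(x, delta, n):
    return x - n + delta      # allocate on demand: unused LUN blocks cost nothing

# Illustrative example: 1 TB of LUNs, 100 GB of Snapshot data, 400 GB unused.
x, delta, n = 1000, 100, 400
```

With these numbers, full fat consumes 2100 GB, low fat 1100 GB, and zero fat only 700 GB; the 400 GB of allocated-but-unused blocks stay in the aggregate's shared free space pool.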
Table 6) Zero fat provisioning
Option | Recommended Value | Notes
Volume Options
guarantee | none | No space reservation for the volume at all.
fractional_reserve | 0 | With Data ONTAP 7.3.3, fractional_reserve can be modified even for volumes without a space guarantee of type volume. Prior to Data ONTAP 7.3.3, the value was fixed at 100.
autosize | on | Turn autosize on.
Option | Recommended Value | Notes
autosize options | -m X -i Y | The business model drives the maximum value for the autosize configuration because it offers additional disk space for the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow
Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.
LUN Options
reservation | disable | No preallocation of blocks for the LUN.
SUMMARY OF PROVISIONING METHODS

There are good reasons for using any of the provisioning methods described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:
• The aggregate's free space is a global pool that can serve space for volumes. This gives more flexibility than volumes with their own dedicated free space.
• For SAN volumes, the block consumption can be easily monitored.
• Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
• Monitoring is needed only on the aggregate level; volumes grow on demand.
Table 7) Comparison of provisioning methods.

Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X - N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:
• Faster than manually provisioning storage
• Easier to maintain than scripts
• Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. In order to implement the provisioning methods outlined, a customized provisioning script needs to be provided to set the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
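The commitment rate as a consolidation metric can be sketched as follows; the helper name and all numbers are illustrative assumptions. Sizing volumes to their expected content, as recommended above, is what keeps this metric meaningful.

```python
def commitment_rate_pct(volume_sizes_gb, aggr_gb):
    """Committed (provisioned) volume capacity as a percentage of the aggregate."""
    return 100.0 * sum(volume_sizes_gb) / aggr_gb

# Illustrative: twelve 2 TB zero fat volumes provisioned into a 16 TB aggregate.
# A rate above 100% means the logical data served exceeds the physical capacity.
rate = commitment_rate_pct([2000] * 12, aggr_gb=16000)
```

Here the rate is 150%: the aggregate is overcommitted by half, which only works because zero fat volumes consume physical blocks on demand.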
APPLICATION RECOMMENDATIONS

Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. This passes the information through the storage stack that a particular block is no longer used, and allows unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [Figure: a template FlexVol volume and one FlexVol volume per application instance (instance 1 through instance n), each holding its LUNs/qtrees with deduplication block sharing inside the volume; FlexClone block sharing links the instance volumes to the template.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data for storing changes to the cloned copy, or new data, on request. Thus the overcommitment of the aggregate containing the cloned data increases when creating the clone; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
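The clone-creation effect described above can be captured in a small model. This is an illustrative toy, not Data ONTAP behavior or any real API: cloning raises the committed capacity immediately, while the used capacity grows only as the clone diverges from its parent.

```python
class Aggregate:
    """Toy model of an aggregate's committed vs. physically used capacity."""

    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0.0   # sum of provisioned volume sizes
        self.used_gb = 0.0        # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb):
        self.committed_gb += size_gb   # clone creation: metadata only, no blocks

    def write_to_clone(self, changed_gb):
        self.used_gb += changed_gb     # diverging blocks are allocated on demand

aggr = Aggregate(10000)
aggr.provision_volume(size_gb=2000, used_gb=1500)  # template volume
aggr.clone_volume(size_gb=2000)                    # instant, space-efficient copy
used_after_clone = aggr.used_gb                    # unchanged by the clone itself
aggr.write_to_clone(changed_gb=100)                # application renders new data
```

After cloning, committed capacity has doubled for that volume while used capacity is untouched; only the 100 GB of diverging writes consume physical space.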
Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. Such realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again to regain them. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.

This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings: achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, on template application data.

In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance; this is slightly more difficult than cloning with a volume FlexClone operation.

TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. [Figure: a template and instances 1 and 2 arranged as rows; each column is a FlexVol volume containing one LUN/qtree per instance, with deduplication block sharing within each FlexVol volume.]
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the over-deduplication value of the volumes themselves. Thus NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, fragmentation of client data is served with no performance penalty.
33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
Settlednomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage When the online migratability features of storage are exploited response times to mitigate data growth scenarios are independent of application-specific planned downtime windows Storage is classified into potential migration candidates and can be migrated away from a tight aggregate at a storage controller to another one while assuring its accessibility Thus it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications NetApp MultiStore technology implements this feature using the vFiler abstraction which NetApp recommends you consider in the provisioning process Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
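As a back-of-the-envelope illustration of this sizing question (this is not a NetApp formula; the function and variable names are assumptions for illustration), one can simulate monthly growth against a proposed settled/nomad split and derive when each nomad would have to be migrated away:

```python
def size_nomads(aggregate_gb, settled_gb, growth_gb_per_month, lifetime_months,
                nomad_sizes_gb):
    """Check whether a proposed set of nomad sizes can absorb the expected
    growth over the data lifetime.

    Returns the months at which a nomad must be migrated away to keep
    aggregate usage at or below its physical size."""
    free_gb = aggregate_gb - settled_gb - sum(nomad_sizes_gb)
    if free_gb < 0:
        raise ValueError("settled data and nomads exceed the aggregate")
    schedule = []
    used = settled_gb + sum(nomad_sizes_gb)
    # Walk month by month; when growth would overflow the aggregate, migrate
    # the smallest remaining nomad first (smaller nomads move faster).
    remaining = sorted(nomad_sizes_gb)
    for month in range(1, lifetime_months + 1):
        used += growth_gb_per_month
        while used > aggregate_gb and remaining:
            used -= remaining.pop(0)
            schedule.append(month)
        if used > aggregate_gb:
            raise ValueError("growth exceeds capacity even after all migrations")
    return schedule
```

For example, a 10 TB aggregate holding 6 TB of settled data growing 250 GB per month over 12 months, with nomads of 1 TB, 1 TB, and 2 TB, requires migrations in months 1, 5, and 9.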
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess data into settled and nomad instances.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
[Figure: application instances sorted by negative impact in descending order. High-impact instances outside the SLA (e.g., all FC-attached) are assigned settled; medium- and low-impact instances inside the SLA are assigned nomad.]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[Figure: remaining instances sorted by penalty cost in descending order. Instances with higher penalty costs ($$) are assigned semi-settled; instances with lower penalty costs ($) are assigned nomad.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so leaves enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage
• Leave room for organic growth; it might be desirable to still allow extending the storage of previously provisioned applications
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a given point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS and SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
  − Insufficient free space within the volume in which the storage object is contained
  − Insufficient free space within the aggregate in which the storage object and its volume are contained
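The relationship between the free block pool and the available time to react can be made concrete with a trivial calculation (illustrative only; the names are assumptions):

```python
def days_to_full(free_gb, daily_growth_gb):
    """Translate the free block pool of an aggregate into the number of
    days the operations team has before the aggregate runs completely full."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth means no deadline
    return free_gb / daily_growth_gb

# A 2 TB free pool with 50 GB/day of organic growth leaves 40 days to react.
```

The number of mitigation alternatives still available at any point is simply those whose preparation time plus time to show effect fits within this window.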
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, the storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, the provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup → Options → Default Thresholds, or the link http://opsmgrserver:port/dfm/edit/options). Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
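Operations Manager performs this regression internally; purely as an illustration of the idea (not the product's actual algorithm), a linear fit over daily usage samples extrapolated to the usable capacity might look like this:

```python
def trend_days_to_full(daily_used_gb, usable_capacity_gb):
    """Fit a least-squares line through daily usage samples (oldest first)
    and extrapolate to the day on which the usable capacity is reached.

    Returns None when there are too few samples or usage is not growing."""
    n = len(daily_used_gb)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var                      # fitted daily growth rate (GB/day)
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (usable_capacity_gb - intercept) / slope
    # Days remaining, counted from the most recent sample.
    return max(0.0, day_full - (n - 1))
```

Note that, as in Operations Manager, the extrapolation target is the usable capacity, not the aggregate full threshold.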
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm; adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
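Such an adapter script might look like the following sketch. The environment variable names and the ticket format here are assumptions for illustration; consult your Operations Manager documentation for the fields that are actually passed to alarm scripts, and replace the final print with a call into your ticketing system.

```python
import json
import os
import sys

def format_ticket(event_name, source, severity):
    """Build a ticket payload from the alarm fields. The field names are
    illustrative; adapt them to whatever your ticketing system expects."""
    return json.dumps({
        "summary": f"[{severity}] {event_name} on {source}",
        "queue": "storage-operations",
    }, sort_keys=True)

if __name__ == "__main__":
    # Hypothetical: read the alarm context passed by Operations Manager.
    # Verify the actual variable names in your Operations Manager release.
    payload = format_ticket(
        os.environ.get("DFM_EVENT_NAME", "unknown-event"),
        os.environ.get("DFM_SOURCE_NAME", "unknown-object"),
        os.environ.get("DFM_SEVERITY", "warning"),
    )
    sys.stdout.write(payload + "\n")  # replace with an HTTP POST to the ticket system
```

Keeping the formatting separate from the delivery makes the adapter easy to test outside the alarm path.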
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve space from the aggregate's free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other objects can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state, and the data must then be migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss, stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with application owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with application owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might be performed Depending on experiences and knowledge of the application growth rates seen in the past the thresholds may be adapted After the upper threshold of the operational sweet spot corridor is left an alarm based on aggregate full threshold (set initially to 65) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window In the meantime organic growth can take place in the yellow-marked area shown in Figure 22 The metrics used are
bull First metric Aggregate capacity used bull Second metric Aggregate space committed
Because all storage is provisioned using the zero fat option no artificial limited storage container exists Thus there is no need to consider a volume-based metric Figure 22 shows the behavior depending on metrics aggregate capacity used and aggregate committed space
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.

[Figure content: data growth against aggregate capacity, with the operational sweet spot corridor marked. For aggregate capacity used of 0-50% (space committed 0-110%), new storage is provisioned; beyond these thresholds, capacity is assessed and thresholds are adapted; above 65% capacity used (or above 120% committed), mitigation takes place.]
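As an illustration, the two-metric decision logic described in this sample setting can be sketched in a few lines of Python. This is a hypothetical helper; the function and key names are our own and not part of Operations Manager, while the thresholds are the ones from the text (stop provisioning above 50% capacity used or 110% space committed, plan a migration above 65% used).

```python
# Hypothetical sketch of the two-metric phase logic in sample setting 1.
# Thresholds from the text: provisioning stops above 50% capacity used
# or 110% space committed; above 65% used a migration is planned for
# the next downtime window.
def phase_actions(capacity_used_pct, space_committed_pct):
    actions = {
        "provision_new_storage": True,
        "assess_capacity": False,
        "plan_migration": False,
    }
    if capacity_used_pct > 50 or space_committed_pct > 110:
        actions["provision_new_storage"] = False
        actions["assess_capacity"] = True   # adapt thresholds if needed
    if capacity_used_pct > 65:
        actions["plan_migration"] = True    # "aggregate full" alarm
    return actions
```

In practice, the thresholds would be read from the adapted alarm configuration rather than hard-coded.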
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure content: settled and nomad data within an aggregate over time; once the need to act is detected, the effect of a mitigation (for example, migration) shows within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

- All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
- Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
- Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
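Table 10 can be read as a simple lookup from the aggregate capacity used metric to the required action. The following minimal sketch is purely illustrative (the function name is our own; thresholds and actions are taken from the table):

```python
# Table 10 as a lookup; the highest exceeded threshold wins.
def settled_nomad_action(aggregate_capacity_used_pct):
    if aggregate_capacity_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggregate_capacity_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_capacity_used_pct > 70:
        return "stop provisioning of new storage"
    return "normal operation"
```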
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

[Figure content: settled data growth against aggregate capacity, with the operational sweet spot corridor marked. For aggregate capacity used of 0-70%, new storage is provisioned; at 70-85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.

[Figure content: committed capacity and capacity used plotted over elapsed time (roughly one to three months), together with the overall trend and the last 3-month trend. Capacity used drops after the configuration change while committed capacity continues to grow; phases 1 through 3 are marked.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of the change of the volume configurations to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) sized depending on the mitigation alternatives and the time they need to show effect.
b. Determine the maximum distance between the planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
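For the growth rate and days-to-full figures above, Operations Manager reports the numbers directly; the underlying arithmetic can be approximated from past capacity samples with a linear least-squares trend against 100% used. The following sketch is our own illustrative code, not the Operations Manager algorithm:

```python
# Sketch: estimate days to full from past aggregate capacity samples
# via a linear least-squares trend line extrapolated to 100% used.
def days_to_full(samples):
    """samples: list of (day_number, pct_used) pairs, oldest first.
    Returns the estimated days from the last sample until the trend
    reaches 100% used, or None if usage is not growing."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # % per day
    if slope <= 0:
        return None  # flat or shrinking usage: no meaningful forecast
    intercept = (sy - slope * sx) / n
    last_day = samples[-1][0]
    return (100 - intercept) / slope - last_day
```

For example, an aggregate growing from 50% to 60% used over 20 days yields a forecast of about 80 more days to full.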
To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
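When many volumes have to be trimmed as in step 2d, it can help to render the command sequences from a script and review them before pasting them into the controller console. The following generator is a hypothetical helper of our own (7-Mode syntax as used in this report) for the SAN variants:

```python
# Hypothetical helper: render the zero fat SAN command sequences shown
# in step 2d for a given volume and LUN, for review before use.
def zero_fat_san_commands(volume, lun, max_size, increment, autodelete=True):
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
        f"snap reserve -V {volume} 0",
    ]
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

For example, `zero_fat_san_commands("vol1", "/vol/vol1/lun0", "500g", "50g")` returns the eight-line autodelete variant for review.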
7 REFERENCES

- TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
- TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
- TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
- TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
- TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
- TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
- TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
- NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Option | Recommended Value | Notes
autosize | options -m X -i Y | The business model drives the maximum value for the autosize configuration, because it offers additional disk space to the consumer under specific conditions. A reasonable resizing increment depends on various factors, such as the data growth rate in the particular volume, the volume size itself, and so on.
try_first | volume_grow |

Volume Snapshot Options
reserve | 0 | For NAS volumes, setting a Snapshot copy reserve area and configuring Snapshot copy schedules is a common setup. For SAN volumes, this needs to be switched off according to NetApp best practices (see the Fibre Channel and iSCSI Configuration Guide).
schedule | switched off |
autodelete | off | Deleting Snapshot copies might be an option when the volume can no longer be resized because the maximum configured size has been reached, or when the aggregate's free space becomes low.

LUN Options
reservation | disable | No preallocation of blocks for the LUN
SUMMARY OF PROVISIONING METHODS
There are good reasons for using any of the provisioning methods already described; however, full fat for SAN environments should be avoided wherever possible because of the storage efficiency ratio. Even with a 100% block usage ratio on primary data, zero fat provisioning has many advantages and is the preferred method:

- The aggregate's free space is a global pool that can serve space to volumes. This gives more flexibility than volumes with their own dedicated free space.
- For SAN volumes, the block consumption can be easily monitored.
- Deduplication savings go directly into the global pool of free space, which is the aggregate or the resource pool to which it belongs.
- Monitoring is needed only at the aggregate level. Volumes grow on demand.
Table 7) Comparison of provisioning methods
Characteristics | Full Fat | Low Fat | Zero Fat
Space consumption | 2X + Δ | X + Δ | X - N + Δ (2)
Space efficient | No | Partially, for Snapshot copies | Yes
Monitoring | Optional | Required on volume and aggregate level | Required on aggregate level
Notification/mitigation process required | No | Optional in most cases | Yes

(2) N is the traditional thin provisioning impact: the amount of blocks logically allocated but not used.
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefitting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; none or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
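The space consumption formulas in Table 7 can be checked with a small worked example. The helper below is purely illustrative; symbols follow the table, with x the size of the primary data, delta the space for Snapshot copies, and N the blocks logically allocated but not used:

```python
# Illustrative arithmetic for the Table 7 space consumption column.
# x: primary data size, delta: Snapshot copy space, n: blocks logically
# allocated but not used (the thin provisioning impact).
def space_consumed(method, x, delta, n=0):
    return {
        "full_fat": 2 * x + delta,   # 100% fractional reserve doubles x
        "low_fat": x + delta,        # fractional reserve set to 0
        "zero_fat": x - n + delta,   # only used blocks consume space
    }[method]
```

For 100 units of primary data with 10 units of Snapshot space and 40 unused allocated units, the three methods consume 210, 110, and 70 units respectively.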
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER
NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by applying policy-based automation to the entire NetApp NAS and SAN infrastructure. These processes are:

- Faster than manually provisioning storage
- Easier to maintain than scripts
- Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.
A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using such a provisioning policy, the settings apply automatically. For more information, refer to TR-3710, Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Postprovisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks, because these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select the checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select the checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero fat provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily keep all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits should be taken into account when using deduplication, because the maximum sizes depend on the storage controller.
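The commitment rate mentioned above can be computed as the sum of the configured (logical) volume sizes relative to the aggregate's usable capacity; values above 100% indicate overcommitment. A minimal sketch (the helper name and GB units are our own choice):

```python
# Minimal sketch of the commitment rate as a consolidation metric:
# total configured volume size relative to aggregate usable capacity.
def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb
```

This is why sizing volumes to the expected data size matters: oversized volumes inflate the rate and blur its value as a consolidation metric.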
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems that support space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information down the storage stack that a particular block is no longer used, allowing the unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template or golden copy, which is then customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

- Volume-centric storage layout
- Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

- High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
- Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application: clone a consistent volume representing the template of the intended application and attach it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

- Simplicity of data management using volumes
- Individual control over the SLA of each application instance
- Application instances with a short duration
- No consideration of deduplication
- Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.

[Figure content: a template FlexVol volume and FlexVol volumes for instance 1 through instance n, each holding LUNs/qtrees with deduplication block sharing inside the volume; FlexClone block sharing connects the instance volumes with the template.]
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement template-based provisioning can be visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates data for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when creating the clone. However, this does not affect the space used in the aggregate. When data in the clone is rendered and new data is added by the application, the aggregate use grows.
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. This realignment also has a temporarily counterproductive effect on the deduplication savings and requires the deduplication process to be executed again. If possible, the following actions on client data should be avoided:

- Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
- Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout the goal is to achieve high storage efficiency returns from the deduplication feature In contrast to the volume-centric storage layout data of different application instances is grouped to achieve storage efficiency returns across a set of application instances Figure 13 shows a sample dedupe-centric storage layout Data of application instances is organized horizontally Individual data of each application is grouped vertically in a volume to implement deduplication
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data To implement template-based provisioning with such a layout cloning template data must be performed with the fileLUN FlexClone operation FileLUN FlexClone allows storage objects to be cloned within a volume providing finer granularity
This storage layout provides the following storage efficiency advantages in a short- and long-term perspective
bull Very high long-term storage efficiency savings Long-term storage efficiency savings are achieved due to the deduplication-centric storage layout and deduplication returns
bull Short-term storage efficiency savings Instant storage efficiency savings are provided when cloning an application instance through a fileLUN FlexClone operation for example template application data
In contrast to the volume-centric storage layout application instances are bundled together in a matrix style because of their participation in a volume This implies that the applications share major operational tasks and are managed as a bundle From an SLA perspective a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout Achieving application-consistent Snapshot copies requires the iterative application of fileLUN FlexClone functionality to all storage objects of the instance This is slightly more difficult than cloning with a volume FlexClone operation
TR-3505 NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide provides a deeper understanding of NetApp deduplication and its deployment
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is created within an aggregate; volumes can also be assigned to different aggregates.
26 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
[Figure content: a template and multiple instances, each consisting of LUNs/qtrees distributed across several FlexVol volumes; deduplication block sharing operates within each FlexVol volume.]
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the overdeduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
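The arithmetic behind the commitment rate can be sketched as follows (a minimal illustration with hypothetical numbers, not Operations Manager output):

```python
# Commitment rate of an aggregate: the sum of the provisioned FlexVol
# sizes relative to the physical aggregate size. Values above 1.0 mean
# the aggregate is overcommitted. Numbers are purely illustrative.

def commitment_rate(volume_sizes_gb, aggregate_size_gb):
    return sum(volume_sizes_gb) / aggregate_size_gb

# Five 400 GB dedupe-centric volumes inside a 1000 GB aggregate:
print(commitment_rate([400] * 5, 1000))  # 2.0, i.e., 200% committed

# Cloning a new instance with file/LUN FlexClone creates no new volume,
# so the commitment rate stays unchanged.
```

This also makes the point above concrete: only creating or resizing volumes moves the commitment rate; file/LUN FlexClone operations inside an existing volume do not.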
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pagefiles and swap files, should not be included in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate of this data and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalties.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
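As a rough sizing aid, the settled/nomad split can be sketched like this (the helper and all figures are hypothetical; real sizing must reflect your own growth and lifetime estimates):

```python
# Sketch: size the nomad share of an aggregate so that migrating the
# nomads away over the data lifetime absorbs the growth of the settled
# part. Assumes a constant aggregate size; all figures are illustrative.

def plan_nomad_capacity(aggregate_gb, settled_initial_gb,
                        monthly_growth_gb, lifetime_months):
    growth = monthly_growth_gb * lifetime_months
    if settled_initial_gb + growth > aggregate_gb:
        raise ValueError("settled data outgrows the aggregate")
    # The nomads initially occupy the space the settled part will need.
    return growth

# 10 TB aggregate, 6 TB settled today, growing 100 GB/month for 3 years:
print(plan_nomad_capacity(10240, 6144, 100, 36))  # 3600 GB of nomads
```

Slicing the resulting nomad capacity into several entities of different sizes, as recommended above, keeps the migration granularity fine.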
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
28 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess them into settled and nomad instances.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
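The first-pass assignment rule can be expressed as a small sketch (the field names and the 60-second tolerance are assumptions for illustration, not a NetApp policy):

```python
# Sketch: first-pass settled/nomad assessment by technical impact.
# FC-attached storage cannot be migrated online, so it is settled by
# definition; the 60-second tolerance below is an arbitrary example.

def classify(protocol, acceptable_disruption_s):
    if protocol == "FC":
        return "settled"
    if acceptable_disruption_s >= 60:
        return "nomad"      # tolerates the short vFiler migration impact
    return "settled"

print(classify("FC", 0))       # settled
print(classify("NFS", 300))    # nomad
print(classify("iSCSI", 10))   # settled
```

Instances that this rule leaves ambiguous are then sorted by business impact, as described next.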
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
[Figure content: instances Inst1 through InstN ordered by negative impact, from high (outside SLA; for example, all FC, settled) through medium to low (inside SLA, nomad).]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
[Figure content: remaining instances ordered by penalty cost, from settled (highest penalty) through semi-settled to nomad (lowest penalty).]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines how many mitigation alternatives can still be considered at a given point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them, forcing Data ONTAP to allocate from the pool of free blocks. Assuming continued data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - insufficient space within the volume in which the storage object is contained
  - insufficient free space within the aggregate in which the storage object and its volume are contained
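The "running too tight" case translates directly into a deadline, sketched here with hypothetical numbers:

```python
# Sketch: the free block pool of an aggregate divided by the daily
# growth rate yields the time left to trigger and complete a mitigation.

def days_to_react(free_gb, daily_growth_gb):
    if daily_growth_gb <= 0:
        return float("inf")   # no growth, no deadline
    return free_gb / daily_growth_gb

# 800 GB free in the aggregate, data growing 25 GB per day:
print(days_to_react(800, 25))  # 32.0 days of room left
```

The result bounds which mitigation alternatives are still usable: anything whose lead time (for example, hardware procurement) exceeds the remaining days is effectively off the table.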
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.

Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be configured to trigger when operational parameters fall within a certain range and indicate a relevant situation. When an event triggers an alarm, a notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.

Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics.
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.

Monitoring the aggregates is very important because they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.

The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose that activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation of time to full is based on the usable aggregate capacity, not on the aggregate full threshold setting.
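A trend of this kind can be reproduced from daily usage samples with an ordinary least-squares slope (a simplified sketch of the idea, not the Operations Manager implementation; the sample data is invented):

```python
# Sketch: linear-regression growth trend and a days-to-full estimate
# computed against usable capacity. Illustrative only.

def growth_rate_gb_per_day(samples_gb):
    # Least-squares slope over daily samples indexed 0..n-1.
    n = len(samples_gb)
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_gb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

def days_to_full(samples_gb, usable_capacity_gb):
    slope = growth_rate_gb_per_day(samples_gb)
    if slope <= 0:
        return float("inf")
    return (usable_capacity_gb - samples_gb[-1]) / slope

used = [500, 510, 520, 530, 540]      # 10 GB/day over five days
print(days_to_full(used, 1000))       # 46.0 days until the aggregate is full
```

Comparing slopes computed over different sample windows is a quick way to spot the interval deviations mentioned above.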
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager.
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.

Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full.

Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
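A minimal adapter could look like the following sketch. The way Operations Manager hands event details to the script and the variable names used here (DFM_EVENT_NAME, DFM_SOURCE_NAME) are assumptions for illustration; check the documentation of your DFM version for the actual interface.

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter ("script_to_execute" above).
# DFM_EVENT_NAME and DFM_SOURCE_NAME are hypothetical variable names,
# not confirmed Operations Manager interface details.

import os
import sys

def build_message(event, source):
    # Glue logic: format the alert for a ticketing system, chat, etc.
    return "storage alert: {0} on {1}".format(event, source)

def main():
    event = os.environ.get("DFM_EVENT_NAME", "unknown-event")
    source = os.environ.get("DFM_SOURCE_NAME", "unknown-object")
    sys.stdout.write(build_message(event, source) + "\n")
    return 0

if __name__ == "__main__":
    main()
```

The forwarding target (ticketing API, chat webhook, and so on) is whatever your environment requires; the script only needs to be executable by the Operations Manager server.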
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve tightness in this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between the existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that other volumes can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation incurs client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before the data is migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 online for secondary storage.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the AutoDelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one does not make use of online data migration and the settled/nomad provisioning pattern; the second setting implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate's days-to-full trend value to get an idea of the available days to full based on past data growth.

- All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
- Aggregate extension is not a mitigation alternative.
- Online migration is not a mitigation alternative.
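The days-to-full trend mentioned above can be approximated from the observed daily growth rate. A minimal sketch in Python, with hypothetical numbers and linear growth assumed (Operations Manager computes this trend for you):

```python
def days_to_full(capacity_tb, used_tb, daily_growth_tb):
    """Estimate the days until an aggregate reaches 100% capacity used,
    assuming linear growth at the observed daily rate."""
    if daily_growth_tb <= 0:
        return float("inf")  # no growth: never fills up
    return (capacity_tb - used_tb) / daily_growth_tb

# Hypothetical aggregate: 100 TB capacity, 40 TB used, growing 0.25 TB/day.
print(days_to_full(100, 40, 0.25))  # 240.0
```

Comparing this figure with the interval between planned downtime windows shows whether the remaining free space is sufficient.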
Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth over time (months) between two planned downtime windows.]

Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.

Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

- First metric: aggregate capacity used
- Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
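These phase transitions can be expressed as a small decision function. A sketch using the thresholds quoted above (50% and 65% capacity used, 110% space committed); these values are starting points to be adapted per environment:

```python
def phase(used_pct, committed_pct):
    """Classify an aggregate per the sample setting's thresholds."""
    if used_pct > 65:
        return "mitigate"    # plan migration for the next downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning, assess capacity
    return "provision"       # inside the operational sweet spot corridor

print(phase(45, 100))   # provision
print(phase(55, 100))   # assess
print(phase(70, 130))   # mitigate
```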
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. [Figure: decision table over the operational sweet spot corridor. At 0–50% capacity used and 0–110% space committed, new storage is provisioned. Beyond these thresholds, provisioning stops while capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% space committed, mitigation takes place.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: settled and nomad data over time (hours); marks the point where the need to act is detected and the effect of the mitigation (e.g., migration).]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, for example, storage overcommitment.

- All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
- Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
- Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
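The thresholds in Table 10 map directly to actions; a minimal sketch of that mapping (threshold values taken from the table, to be adapted per environment):

```python
def settled_nomad_action(used_pct):
    """Return the Table 10 action for a given aggregate capacity used (%)."""
    if used_pct > 90:
        return "migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"

print(settled_nomad_action(92))  # migrate a nomad
```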
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Figure: the operational sweet spot corridor; at 0–70% aggregate capacity used, new storage is provisioned; at 70–85%, already provisioned storage may still be extended; above 90%, utilization is relaxed with NetApp Data Motion of a nomad.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used over elapsed time, with an overall trend line and a last-3-month trend line; steps 1–3 are marked at roughly the 1-month and 3-month points.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:

a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
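The steps above amount to a simple back-of-the-envelope calculation. A sketch with made-up numbers, assuming linear growth:

```python
def max_safe_use_pct(capacity_tb, daily_growth_tb, days_between_downtimes,
                     ceiling_pct=80):
    """Work backward from the growth rate and the downtime interval to the
    highest aggregate use at which organic growth still fits below the
    ceiling the operational team is comfortable with."""
    growth_pct = 100.0 * daily_growth_tb * days_between_downtimes / capacity_tb
    return max(ceiling_pct - growth_pct, 0.0)

# Hypothetical: 100 TB aggregate, 0.1 TB/day growth, 90 days between downtimes.
print(max_safe_use_pct(100, 0.1, 90))  # 71.0
```

Any aggregate above the returned level cannot comfortably absorb the expected growth until the next downtime window and should trigger an assessment.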
To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. You can also use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage of inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

- TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
- TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
- TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
- TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
- TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
- TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
- TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
- NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Characteristics | Full Fat | Low Fat | Zero Fat
Pool benefiting from dedupe savings | Volume fractional reserve area | Volume free space area | Aggregate free space area
Risk of an out-of-space condition on primary data | No | No, as long as autodelete is able to delete any Snapshot copies | Yes, when monitoring and notification processes are missing
Typical use cases | Small installations; no or few storage management skills (no monitoring infrastructure) | Large database environments | Shared storage infrastructure; test/dev environments; storage pools for virtualized servers
FULL/LOW/ZERO FAT PROVISIONING WITH PROVISIONING MANAGER

NetApp Provisioning Manager focuses on fast and efficient provisioning of storage resources in the NetApp storage infrastructure. It speeds up provisioning workflows and boosts capacity usage by using policy-based automation for the entire NetApp NAS and SAN infrastructure. These processes are:

- Faster than manually provisioning storage
- Easier to maintain than scripts
- Instrumental in minimizing the risk of data loss resulting from misconfigured storage

Provisioning Manager applies user-defined policies to consistently select the appropriate resources for each provisioning activity. This frees administrators from the headache of searching for available space to provision and allows more time for strategic issues. The use of a centralized management console allows administrators to monitor the status of their provisioned storage resources. Provisioning Manager can help improve business agility and capacity use, shrink provisioning time, and improve administrator productivity. Provisioning Manager's thin provisioning and deduplication capabilities provide a high level of storage efficiency from your NetApp storage investment.

A GUI allows you to implement the full/low/zero fat provisioning models within Provisioning Manager. See Figure 9 and Figure 10 for configuring storage efficiency in a provisioning policy for NAS and SAN. Whenever storage is provisioned using this provisioning policy, the settings apply automatically. For more information, refer to TR-3710: Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide.

Provisioning Manager encapsulates technical details when provisioning storage and supports easy integration with existing management tools and orchestration frameworks. Policies and their use in so-called datasets and storage services allow you to exploit NetApp storage efficiency technologies without exposing a high level of technical detail to a higher level of management software.

Note: Provisioning Manager up to version 4.0 does not allow you to specify autosize and autodelete individually; the policy template determines whether these features are used and which options are selected. To implement the provisioning methods outlined here, a customized provisioning script needs to be provided that sets the autosize and autodelete parameters according to the recommendations for the full/low/zero fat methods. Post-provisioning scripts are standard with Provisioning Manager. Use caution when Provisioning Manager runs conformance checks; these revert individual settings.
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.

FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES

Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes for storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.

Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.

Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?

Because physical allocation of data within a zero-fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.

Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.

Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
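The commitment rate is essentially the provisioned (logical) size relative to the physical capacity. A sketch with hypothetical numbers:

```python
def commitment_pct(aggregate_capacity_tb, volume_sizes_tb):
    """Aggregate space committed: the sum of the (thin) volume sizes
    relative to the aggregate's capacity; above 100% means overcommitted."""
    return 100.0 * sum(volume_sizes_tb) / aggregate_capacity_tb

# Hypothetical 50 TB aggregate carrying four 15 TB zero fat volumes.
print(commitment_pct(50, [15, 15, 15, 15]))  # 120.0
```

Sizing volumes to the expected data size keeps this figure meaningful; oversized volumes inflate it without reflecting real consolidation.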
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.

For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.

For Oracle® database best practices, refer to WP-7084: Storage Efficiency in an Oracle Environment.
3.2 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS

This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.

When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.

The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.

There are two ways to align application data to a NetApp shared storage infrastructure:

- Volume-centric storage layout
- Dedupe-centric storage layout

Depending on the data lifetime, suitability for deduplication, consistency requirements, and tool constraints, one way of aligning application data is more appropriate than the other.

In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.

In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:

- High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
- Long-term storage efficiency savings: medium long-term savings when deduplicating application data.

A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.

Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.

Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:

- Simplicity of data management using volumes
- Individual control over the SLA of each application instance
- Application instances with a short duration
- No consideration of deduplication
- Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [Figure: a template FlexVol volume and FlexVol volumes for instances 1 through n, each holding its own LUNs/qtrees; deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the template to the instances.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage-template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
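This behavior can be illustrated with a toy bookkeeping model (illustrative only; real Data ONTAP space accounting is more involved):

```python
class Aggregate:
    """Toy model of committed vs. used space in an aggregate."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.committed = 0.0  # sum of volume sizes (logical)
        self.used = 0.0       # physically allocated space

    def provision(self, size, used):
        self.committed += size
        self.used += used

    def clone(self, size):
        # FlexClone: only metadata is created; committed space rises,
        # used space stays (changed or new blocks are allocated later).
        self.committed += size

agg = Aggregate(100.0)
agg.provision(size=20.0, used=12.0)  # template volume
agg.clone(size=20.0)                 # FlexClone of the template
print(agg.committed, agg.used)       # 40.0 12.0
```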
Best Practice: A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.

Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. This realignment also temporarily reduces the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:

- Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
- Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout the goal is to achieve high storage efficiency returns from the deduplication feature In contrast to the volume-centric storage layout data of different application instances is grouped to achieve storage efficiency returns across a set of application instances Figure 13 shows a sample dedupe-centric storage layout Data of application instances is organized horizontally Individual data of each application is grouped vertically in a volume to implement deduplication
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data To implement template-based provisioning with such a layout cloning template data must be performed with the fileLUN FlexClone operation FileLUN FlexClone allows storage objects to be cloned within a volume providing finer granularity
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
[Figure 13 shows a template and two instances, each consisting of LUNs/qtrees placed across FlexVol volumes; deduplication block sharing operates within each FlexVol volume.]
Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as page and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client-side data realignments such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served without performance penalty, so such realignment is unnecessary.
33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another controller while remaining accessible. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part; when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
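The slicing idea above can be sketched as follows. This is a minimal illustration under assumed numbers and an assumed slicing scheme, not a sizing recommendation:

```python
# Sketch: slice the migratable (nomad) share of an aggregate into nomads of
# decreasing size so that migrating the smallest sufficient nomad returns
# utilization to the sweet-spot corridor. Slice ratios are illustrative.

def slice_nomads(aggregate_tb, settled_tb, slices=(0.5, 0.25, 0.125, 0.125)):
    """Return nomad sizes (TB) for the non-settled part of an aggregate."""
    nomad_total = aggregate_tb - settled_tb
    return [round(nomad_total * s, 2) for s in slices]

def pick_nomad(nomads_tb, used_tb, aggregate_tb, target_use=0.8):
    """Choose the smallest nomad whose migration brings use below target."""
    excess = used_tb - target_use * aggregate_tb
    candidates = [n for n in sorted(nomads_tb) if n >= excess]
    return candidates[0] if candidates else None

nomads = slice_nomads(100, 60)      # 40 TB migratable
print(nomads)                       # [20.0, 10.0, 5.0, 5.0]
print(pick_nomad(nomads, 87, 100))  # 10.0 (smallest nomad covering the 7 TB excess)
```

Having several small nomads alongside a large one covers both slow organic growth and sudden jumps.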
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
[Figure 15 shows instances Inst1 to InstN sorted by negative impact, from high (outside SLA; for example, all FC-attached instances, classed as settled) through medium to low (inside SLA, classed as nomad).]
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, the application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[Figure 16 shows instances sorted by penalty cost: instances with the highest negative impact ($$) are classed as settled or semi-settled; those with lower impact ($) are classed as nomads.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted so that the remaining controller can master the load in the case of a failover. This should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the use of aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
41 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
42 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page by following Setup → Options → Default Thresholds or the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is committed to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
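As an illustration only, the relationship among these four aggregate thresholds can be sketched as follows. The threshold values and event names here are placeholders, not Operations Manager defaults:

```python
# Sketch of evaluating the four aggregate thresholds; values are placeholders.
THRESHOLDS = {
    "nearly_full": 0.80,          # block use warning
    "full": 0.90,                 # block use alarm
    "nearly_overcommitted": 0.95, # committed storage warning
    "overcommitted": 1.00,        # committed storage alarm
}

def aggregate_events(used_gb, committed_gb, capacity_gb):
    """Return the events an aggregate would raise for its current state."""
    events = []
    use = used_gb / capacity_gb
    commitment = committed_gb / capacity_gb
    if use >= THRESHOLDS["full"]:
        events.append("aggregate-full")
    elif use >= THRESHOLDS["nearly_full"]:
        events.append("aggregate-almost-full")
    if commitment >= THRESHOLDS["overcommitted"]:
        events.append("aggregate-overcommitted")
    elif commitment >= THRESHOLDS["nearly_overcommitted"]:
        events.append("aggregate-almost-overcommitted")
    return events

print(aggregate_events(used_gb=850, committed_gb=980, capacity_gb=1000))
# ['aggregate-almost-full', 'aggregate-almost-overcommitted']
```

The "nearly" variants fire first, giving the operations team lead time before the hard limits are reached.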
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume block use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
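The linear-regression trend described above can be sketched in a few lines. This is a minimal illustration of the idea, not the Operations Manager implementation:

```python
# Minimal sketch of the trending idea: fit a linear regression to daily
# use samples and estimate the days until the usable capacity is full.

def days_to_full(samples_gb, capacity_gb):
    """Least-squares slope of daily samples; days until capacity is reached."""
    n = len(samples_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_gb))
             / sum((x - mean_x) ** 2 for x in xs))  # growth in GB per day
    if slope <= 0:
        return None  # no growth trend: never full
    return (capacity_gb - samples_gb[-1]) / slope

use = [500, 510, 521, 530, 541, 550]   # daily use samples (GB)
print(days_to_full(use, 1000))          # about 45 days at roughly 10 GB/day
```

As in Operations Manager, the estimate is only as good as the chosen sample window; compare trends over different intervals before acting on them.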
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog; when a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
43 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 42 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort for the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment with a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 42.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
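Such an adapter script might look like the following sketch. The environment variable names and the ticketing endpoint are assumptions for illustration only; consult the Operations Manager documentation for the actual variables passed to alarm scripts:

```python
#!/usr/bin/env python3
# Hypothetical alarm adapter: forwards event details handed over by
# Operations Manager (assumed here to arrive as environment variables;
# the DFM_* names below are illustrative, not the documented set) to a
# ticketing system's HTTP API.
import json
import os
import urllib.request

TICKET_URL = "http://ticketing.example.com/api/tickets"  # assumed endpoint

def build_ticket(env):
    """Map (assumed) alarm environment variables to a ticket payload."""
    return {
        "event":    env.get("DFM_EVENT_NAME", "unknown"),      # illustrative name
        "source":   env.get("DFM_SOURCE_NAME", "unknown"),     # illustrative name
        "severity": env.get("DFM_EVENT_SEVERITY", "warning"),  # illustrative name
    }

def notify(env=os.environ):
    """POST the ticket payload to the ticketing system."""
    body = json.dumps(build_ticket(env)).encode()
    req = urllib.request.Request(
        TICKET_URL, data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

if __name__ == "__main__":
    notify()
```

The script itself only needs to be executable by the Operations Manager server; routing to the right operational group then happens in the ticketing system, as with SNMP.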
44 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | Repeatable | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|---|---|---|---|---|---|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer- and application-specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
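The days-to-full trend mentioned above can be sketched as a simple linear extrapolation of past growth. This is a hypothetical helper to illustrate the idea, not the Operations Manager implementation:

```python
def days_to_full(capacity_total_gb, capacity_used_gb, daily_samples_gb):
    """Estimate days until an aggregate reaches 100% capacity used,
    assuming linear growth derived from past daily usage samples."""
    if len(daily_samples_gb) < 2:
        raise ValueError("need at least two samples to derive a trend")
    # Average daily growth over the sampled period.
    growth_per_day = (daily_samples_gb[-1] - daily_samples_gb[0]) / (len(daily_samples_gb) - 1)
    if growth_per_day <= 0:
        return float("inf")  # shrinking or flat trend: never full by this model
    return (capacity_total_gb - capacity_used_gb) / growth_per_day

# Illustrative numbers: 10 TB aggregate, 6 TB used, ~20 GB/day organic growth
samples = [5900, 5920, 5940, 5960, 5980, 6000]
print(round(days_to_full(10000, 6000, samples)))  # 200
```

With roughly 200 days to full, there is comfortable headroom to bridge the gap between planned downtime windows that are several months apart.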
Figure 21) Storage to enable organic data growth between planned downtime windows. [Figure: data growth over time (months) between two planned downtime windows.]
Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events "aggregate nearly full" (configured to fire when the metric exceeds 50%) and "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [Figure: within the operational sweet spot corridor (capacity used 0–50%, space committed 0–110%), new storage is provisioned; above those thresholds, capacity is assessed and thresholds are adapted; beyond the upper thresholds (capacity used > 65%, space committed > 120%), mitigation is performed.]
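The phase transitions of this setting can be condensed into a small decision helper. The thresholds follow the text (provisioning stops above 50% used or 110% committed; mitigation starts above 65% used; the 120% committed bound is read off Figure 22); the function itself is an illustrative sketch, not a product feature:

```python
def phase_actions(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics of sample setting 1 to an operational action."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        # Upper corridor boundary crossed: plan migration for the next window.
        return "mitigate"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        # Sweet spot corridor left: stop provisioning, reassess thresholds.
        return "assess capacity, stop provisioning"
    return "provision new storage"

print(phase_actions(45, 100))  # provision new storage
print(phase_actions(55, 100))  # assess capacity, stop provisioning
print(phase_actions(70, 130))  # mitigate
```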
52 SAMPLE SETTING 2 SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Figure: settled and nomad data in an aggregate over time (hours); the need to act is detected and the mitigation (e.g., migration of a nomad) takes effect within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.

Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---|---|---|
| > 70% | Storage operations | Stop provisioning new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Figure: within the operational sweet spot corridor (0–70% capacity used), new storage is provisioned and already provisioned storage is extended; at 70–85%, provisioning of new storage stops; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
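The transitions of Table 10 accumulate: each crossed threshold adds a restriction while the previous ones stay in force. A minimal sketch of this logic (illustrative only, using the thresholds from the table):

```python
def settled_nomad_actions(capacity_used_pct):
    """Phase transitions of sample setting 2 (Table 10): the sole metric is
    aggregate capacity used; every transition notifies storage operations."""
    actions = []
    if capacity_used_pct > 70:
        actions.append("stop provisioning new storage")
    if capacity_used_pct > 85:
        actions.append("stop extending provisioned storage")
    if capacity_used_pct > 90:
        actions.append("relax the resource situation: migrate a nomad online")
    return actions

print(settled_nomad_actions(75))  # ['stop provisioning new storage']
print(settled_nomad_actions(92))  # all three restrictions active
```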
You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Figure: committed capacity and capacity used over elapsed time (1 month, then 3 months), with the overall trend and the last-3-month trend; capacity used drops in steps as volumes are reconfigured.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configurations were changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
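The backward calculation in steps a–d can be sketched as follows, assuming linear growth; the helper and its numbers are illustrative, not a prescribed formula:

```python
def upper_use_threshold_pct(aggregate_gb, growth_gb_per_day,
                            days_between_downtimes, comfort_cap_pct=80):
    """Work backward from the growth rate and the distance between planned
    downtime windows: reserve enough free space to bridge that period, and
    never place the threshold above the team's comfort cap (80% at first)."""
    reserve_gb = growth_gb_per_day * days_between_downtimes
    threshold = 100.0 * (aggregate_gb - reserve_gb) / aggregate_gb
    return min(threshold, comfort_cap_pct)

# 20 TB aggregate, 25 GB/day growth, downtime windows 120 days apart
print(upper_use_threshold_pct(20000, 25, 120))  # 80 -> capped at comfort level
# Smaller aggregate, same growth: the growth reserve dominates
print(upper_use_threshold_pct(10000, 25, 120))  # 70.0
```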
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that each aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job; alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially, size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010
Figure 9) Configuring a full/zero fat provisioning policy using Provisioning Manager for NAS. Select checkboxes as outlined. Provisioning Manager deviates from zero/full fat by first growing volumes with autosize and then allowing Snapshot autodelete.
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services, or datasets, consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to the full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are configured for the zero fat configurations: one with deduplication and one without deduplication.
HOW SHOULD A VOLUME BE SIZED
Because physical allocation of data within a zero fat-provisioned volume is done on demand, theoretically the volume size can be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its contained objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controller.
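The commitment rate can be understood as the provisioned (committed) volume capacity relative to the aggregate capacity. A minimal sketch of this calculation (a hypothetical helper, not an Operations Manager API):

```python
def commitment_rate_pct(volume_sizes_gb, aggregate_capacity_gb):
    """Committed space of all volumes relative to the aggregate capacity.
    Values above 100% indicate overcommitment, i.e., thin provisioning."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three zero fat volumes sized to their expected data, in a 10 TB aggregate
print(commitment_rate_pct([4000, 5000, 3000], 10000))  # 120.0 -> overcommitted
```

Sizing volumes to their expected data, as recommended above, is what keeps this metric meaningful; sizing them arbitrarily large would inflate it without reflecting real consolidation.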
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies, such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used, allowing unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, "Storage Efficiency in an Oracle Environment."
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of an application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead for performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: high instant savings when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. [Figure: a template FlexVol volume and FlexVol volumes for instances 1 through n, each containing its own LUNs/qtrees; deduplication block sharing operates within each FlexVol volume, and FlexClone block sharing links the clones to the template.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy, or for new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created. However, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
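Schematically, the effect of creating a FlexClone volume on the two aggregate metrics can be modeled like this (a toy model for illustration only, not a representation of Data ONTAP internals):

```python
class Aggregate:
    """Toy model: track committed vs. physically used space in an aggregate."""
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.committed_gb = 0.0  # sum of volume sizes (commitment)
        self.used_gb = 0.0       # physically allocated blocks

    def create_clone(self, template_size_gb):
        # Cloning creates metadata only: commitment rises, used space does not.
        self.committed_gb += template_size_gb

    def write_new_data(self, gb):
        # Changed or new blocks in the clone consume physical space on demand.
        self.used_gb += gb

agg = Aggregate(10000)
agg.create_clone(2000)
print(agg.committed_gb, agg.used_gb)  # 2000.0 0.0 -> commitment up, use unchanged
agg.write_new_data(150)
print(agg.used_gb)                    # 150.0 -> use grows only as data changes
```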
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on FlexClone savings. It also has a temporarily counterproductive effect on deduplication savings, until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. [Figure: the template and instances 1 and 2 each spread their LUNs/qtrees across several FlexVol volumes; deduplication block sharing operates within each FlexVol volume, across the instances that share it.]
Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
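As a rough illustration of this effect (all sizes and names are hypothetical, not NetApp tooling), the commitment rate of an aggregate can be tracked as the sum of the provisioned FlexVol sizes over the aggregate capacity; a file/LUN FlexClone inside an existing volume changes neither value:

```python
# Sketch: how FlexVol provisioning affects the aggregate commitment rate.
# All sizes are hypothetical and in gigabytes.

def commitment_rate(aggregate_size_gb, volume_sizes_gb):
    """Committed space relative to aggregate capacity (>1.0 = overcommitted)."""
    return sum(volume_sizes_gb) / aggregate_size_gb

aggregate_gb = 1000
volumes_gb = [400, 400, 400]           # three growable zero fat FlexVol volumes

rate = commitment_rate(aggregate_gb, volumes_gb)
print(f"commitment rate: {rate:.2f}")  # 1.20 -> overcommitted

# Cloning a LUN inside an existing volume (file/LUN FlexClone) adds no new
# volume, so the commitment rate is unchanged; only use within the volume grows.
volumes_after_clone_gb = volumes_gb    # unchanged by the clone
assert commitment_rate(aggregate_gb, volumes_after_clone_gb) == rate
```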
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as page and swap files, should not be placed in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way that NetApp storage controllers work, client data is served with no performance penalty from fragmentation.
33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate at one storage controller to another while remaining accessible. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part holds data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage controller network is considered a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame. By slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval.
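To make the slicing idea concrete, the sketch below (hypothetical sizes and function names, not a NetApp tool) checks how many of the smallest nomads must be migrated away to bring an aggregate back below the upper bound of its use corridor:

```python
# Sketch: choose nomads to migrate so aggregate use returns to its corridor.
# Sizes in gigabytes; all numbers are hypothetical.

def nomads_to_migrate(aggregate_gb, used_gb, nomad_sizes_gb, upper_bound=0.8):
    """Greedily pick the smallest nomads whose migration brings aggregate
    use back below upper_bound of the aggregate capacity."""
    chosen = []
    for nomad in sorted(nomad_sizes_gb):        # migrate small nomads first
        if used_gb / aggregate_gb <= upper_bound:
            break                               # already back in the corridor
        used_gb -= nomad                        # nomad data leaves the aggregate
        chosen.append(nomad)
    return chosen

# Aggregate at 92% use; nomads of different sizes were provisioned on purpose.
print(nomads_to_migrate(aggregate_gb=1000, used_gb=920,
                        nomad_sizes_gb=[50, 100, 200]))  # -> [50, 100]
```

Provisioning several nomad sizes, as recommended above, is what lets the greedy choice free just enough space without moving more data than necessary.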
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
[Diagram: an aggregate containing a settled part and several nomads]
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the SLA metric of service disruption introduced earlier and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in Oracle database and Microsoft Exchange environments.
Alignment by technical impact: For data belonging to applications whose SLAs fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
[Diagram: instances Inst1 to InstN sorted by negative impact from high (outside SLA, e.g., all FC, settled) through medium to low (inside SLA, nomad)]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, the application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order).
[Diagram: remaining instances sorted by penalty cost in descending order; the costliest ($$) are classified semi-settled, the rest ($) as nomads]
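The two-step assessment can be mimicked with a small sort (the instance data and field names below are hypothetical): instances that cannot be migrated online are settled by technical impact; the remainder are ranked by penalty cost, with the costliest kept semi-settled:

```python
# Sketch: assess instances into settled / semi-settled / nomad classes.
# Penalty costs and attachment types are hypothetical example data.

instances = [
    {"name": "erp",  "attach": "fc",    "penalty": 900},
    {"name": "mail", "attach": "iscsi", "penalty": 500},
    {"name": "web",  "attach": "nfs",   "penalty": 50},
]

def assess(instances, semi_settled_count=1):
    # Technical impact: FC-attached storage cannot be migrated online -> settled.
    settled = [i["name"] for i in instances if i["attach"] == "fc"]
    movable = [i for i in instances if i["attach"] != "fc"]
    # Business impact: the highest penalty cost is the stickiest.
    movable.sort(key=lambda i: i["penalty"], reverse=True)
    semi = [i["name"] for i in movable[:semi_settled_count]]
    nomads = [i["name"] for i in movable[semi_settled_count:]]
    return settled, semi, nomads

print(assess(instances))  # (['erp'], ['mail'], ['web'])
```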
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends considering the settled/nomad setting from the start, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be preserved during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient free space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
41 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After further thresholds are exceeded, inspection or mitigation activities must be performed to relieve storage tightness.
• Provision storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
42 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics.
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, this can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
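To illustrate how these paired thresholds relate (the percentage values are hypothetical; Operations Manager evaluates thresholds internally), a simple check might look like this:

```python
# Sketch: evaluate aggregate metrics against nearly-full/full and
# nearly-overcommitted/overcommitted thresholds. Percentages are hypothetical.

THRESHOLDS = {
    "aggregate_nearly_full": 80, "aggregate_full": 90,
    "aggregate_nearly_overcommitted": 95, "aggregate_overcommitted": 100,
}

def aggregate_events(used_pct, committed_pct):
    """Return the event names a monitoring pass would raise."""
    events = []
    if used_pct >= THRESHOLDS["aggregate_full"]:
        events.append("aggregate-full")
    elif used_pct >= THRESHOLDS["aggregate_nearly_full"]:
        events.append("aggregate-almost-full")          # earlier notification
    if committed_pct >= THRESHOLDS["aggregate_overcommitted"]:
        events.append("aggregate-overcommitted")
    elif committed_pct >= THRESHOLDS["aggregate_nearly_overcommitted"]:
        events.append("aggregate-almost-overcommitted")  # earlier notification
    return events

print(aggregate_events(used_pct=85, committed_pct=120))
# -> ['aggregate-almost-full', 'aggregate-overcommitted']
```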
Operations Manager also provides thresholds and events that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
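The days-to-full trend can be reproduced outside Operations Manager with a plain least-squares fit over daily use samples (the sample data is hypothetical, and Operations Manager's own calculation may differ). Consistent with the note above, time to full is computed against the usable capacity, not a threshold:

```python
# Sketch: estimate the daily growth rate and days to full from use samples.
# Least-squares line fit over daily samples; the data is hypothetical.

def days_to_full(daily_used_gb, capacity_gb):
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))    # growth in GB per day
    if slope <= 0:
        return None                                   # no growth: never full
    return (capacity_gb - daily_used_gb[-1]) / slope

# Ten days of samples growing ~5 GB/day toward a 1000 GB usable capacity.
samples = [700, 705, 710, 715, 720, 725, 730, 735, 740, 745]
print(round(days_to_full(samples, 1000)))  # -> 51
```

Comparing fits over different sample windows, as the text suggests, reveals whether recent activity has changed the growth rate.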
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It helps signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, through the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters broken down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager.
43 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 42 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 42.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full.
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
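A minimal adapter script might look as follows. Note that the DFM_* environment variable names below are assumptions for illustration; consult the Operations Manager documentation for the variables actually passed to alarm scripts:

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter. The DFM_* variable names are
# hypothetical; Operations Manager documents the real script interface.
import json
import os
import sys

def build_ticket(env):
    """Map alarm information from the environment to a ticket record."""
    return {
        "source":   env.get("DFM_SOURCE_NAME", "unknown"),
        "event":    env.get("DFM_EVENT_NAME", "unknown"),
        "severity": env.get("DFM_SEVERITY", "warning"),
    }

if __name__ == "__main__":
    ticket = build_ticket(os.environ)
    # Hand the ticket to the customer infrastructure, e.g., write it to a
    # spool directory watched by the ticketing system; here: stdout.
    json.dump(ticket, sys.stdout)
```

Such a script keeps the glue logic outside Operations Manager, so the ticketing system integration can change without touching the alarm definition.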
44 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to their committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
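Table 8 can also be read as a decision aid. A toy encoding (activity labels simplified; the ordering by disruption and the capability flags are assumptions, not NetApp tooling) that picks the least disruptive applicable activity might be:

```python
# Sketch: pick the least disruptive applicable mitigation for a tight aggregate.
# Simplified encoding of Table 8; applicability flags come from the caller.

ACTIVITIES = [  # (table number, name, required capability), least disruptive first
    (2, "decrease aggregate Snapshot reserve", "snap_reserve_set"),
    (3, "shrink other volumes",                "volumes_have_free_space"),
    (4, "run dedupe and shrink volumes",       "dedupe_possible"),
    (5, "migrate nomad online",                "nomads_available"),
    (1, "add disks to aggregate",              "spare_disks"),
    (6, "migrate volume offline",              "downtime_window"),
]

def pick_mitigation(capabilities):
    """Return the first activity whose precondition holds."""
    for number, name, needs in ACTIVITIES:
        if capabilities.get(needs):
            return number, name
    return 7, "stop application and migrate"   # last resort from Table 8

print(pick_mitigation({"nomads_available": True, "spare_disks": True}))
# -> (5, 'migrate nomad online')
```

In practice, the ordering would also weigh preparation time (e.g., disk procurement) against the days-to-full trend of the aggregate.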
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
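The days-to-full trend can be approximated from historical capacity samples with a simple least-squares fit. Operations Manager reports this value directly; the sketch below, with made-up sample data, only illustrates the idea:

```python
def days_to_full(samples, capacity_gb):
    """Estimate days until an aggregate reaches 100% capacity.

    samples: list of (day, used_gb) observations, oldest first.
    Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    # Least-squares slope: daily growth rate in GB/day
    num = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    den = sum((d - mean_x) ** 2 for d, _ in samples)
    slope = num / den
    if slope <= 0:
        return None  # no organic growth observed
    _, last_used = samples[-1]
    return (capacity_gb - last_used) / slope

# Example: 10 GB/day growth, 600 GB headroom -> 60.0 days to full
print(days_to_full([(0, 100), (1, 110), (2, 120)], 720))
```

Note that, as stated above, the trend is calculated against 100% capacity used; the thresholds of this setting trigger well before that point.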
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
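The alarm logic of this sample setting can be sketched as a small decision function. The thresholds are the ones described above; the function name and return values are illustrative only, since the real notifications come from Operations Manager alarms:

```python
def phase(used_pct, committed_pct):
    """Classify an aggregate per sample setting 1.

    used_pct: aggregate capacity used (% of physical space)
    committed_pct: aggregate space committed (% of physical space)
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"   # migrate data in the next planned downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"     # stop provisioning new storage, reassess thresholds
    return "provision"      # sweet spot: new storage may be provisioned

print(phase(40, 90))   # -> provision
print(phase(55, 90))   # -> assess
print(phase(70, 130))  # -> mitigate
```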
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
(Chart: aggregate capacity used bands 0–50% and >65%, aggregate space committed bands 0–110% and >120%; within the sweet spot corridor new storage is provisioned; beyond the lower thresholds capacity is assessed and thresholds adapted; beyond the upper thresholds mitigation starts.)
52 SAMPLE SETTING 2 SETTLEDNOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
(Chart annotations: settled and nomad parts; detecting the need to act; effect of mitigation, e.g., migration; time scale in hours.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
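The Table 10 transitions can be sketched as code that maps the single metric to the allowed operations (names illustrative; the real notifications are Operations Manager alarms to storage operations):

```python
def actions(used_pct):
    """Map aggregate capacity used (%) to operational actions per Table 10."""
    return {
        "provision_new": used_pct <= 70,     # > 70%: stop provisioning new storage
        "extend_existing": used_pct <= 85,   # > 85%: stop extending provisioned storage
        "migrate_nomad": used_pct > 90,      # > 90%: relax by migrating a nomad
    }

print(actions(75))  # extending still allowed, no new provisioning
print(actions(92))  # migrate a nomad to relax the aggregate
```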
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a multiple.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(Axes: capacity over elapsed time, from 1 month to 3 months; curves: committed capacity, capacity used, overall trend, last 3-month trend; markers 1–3.)
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define an aggregate use level at which your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
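Steps b through d amount to simple arithmetic: the growth rate times the time between downtime windows gives the headroom to reserve, and the rest of the aggregate is the highest safe use level. A sketch with made-up numbers (the 20% safety margin is an assumption, not a recommendation from this report):

```python
def max_used_threshold(capacity_gb, growth_gb_per_day, days_between_downtimes,
                       safety=1.2):
    """Highest aggregate use (%) that still leaves room for organic growth
    until the next planned downtime window (safety=1.2 adds a 20% margin)."""
    reserve_gb = growth_gb_per_day * days_between_downtimes * safety
    return max(0.0, 100.0 * (capacity_gb - reserve_gb) / capacity_gb)

# 10 TB aggregate, 20 GB/day growth, 90 days between downtime windows
print(round(max_used_threshold(10240, 20, 90), 1))  # -> 78.9
```

A result near or above the 80% comfort level from step a would indicate that the growth rate, not the team's comfort, dictates the threshold.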
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
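The four command sequences above differ only in the environment (NAS or SAN) and in whether Snapshot autodelete is enabled. A hypothetical helper, shown as a sketch, can render the appropriate sequence for a volume; it merely prints the commands documented above and does not talk to a controller:

```python
def zero_fat_commands(volume, max_size, increment, san=False, autodelete=False,
                      lun=None):
    """Render the zero fat command sequence (illustrative helper;
    volume/LUN names and sizes are placeholders)."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# SAN variant with autodelete, for a hypothetical volume and LUN
for c in zero_fat_commands("vol1", "2t", "100g", san=True, autodelete=True,
                           lun="/vol/vol1/lun0"):
    print(c)
```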
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Figure 10) Configuring a full/low/zero fat provisioning policy using Provisioning Manager for SAN storage. Select checkboxes as outlined. Provisioning Manager deviates by not turning on autosize for zero fat.
FULL/LOW/ZERO FAT PROVISIONING FOR STORAGE SERVICES
Storage services are an easy abstraction for provisioning storage in a utility-like fashion. A storage service describes all characteristic attributes of the storage needed in a certain scenario. A storage service catalog lists the available templates and allows you to provision storage with these attributes on demand. Technically, storage services or datasets consist of one or more resource pools, a protection policy, and a provisioning policy. Full/low/zero fat provisioning for storage services is configured in the configuration wizard of the provisioning policy. Figure 11 shows the provisioning policies closest to full/low/zero fat configurations.
Figure 11) Full/low/zero fat provisioning policies for datasets and storage services.
Because this wizard is able to configure the deduplication feature, two policies are provided for the zero fat configuration: one with deduplication and one without.
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero fat provisioned volume happens on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate then acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controller.
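As an illustration, the commitment rate can be derived from the configured volume sizes and the aggregate capacity (a simplified sketch with made-up numbers; Operations Manager reports this metric directly):

```python
def commitment_rate(volume_sizes_gb, aggregate_capacity_gb):
    """Committed space as a percentage of aggregate capacity.

    Values above 100% indicate overcommitment (thin provisioning), i.e.,
    logical data consolidation beyond the physically attached capacity.
    """
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

# Three 4 TB zero fat volumes on a 10 TB aggregate -> 120.0 (% committed)
print(commitment_rate([4096, 4096, 4096], 10240))
```

Sizing volumes to the expected data size, as recommended above, is what makes this number meaningful; oversized volumes would inflate it without reflecting real consolidation.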
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat their data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of an application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in terms of performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: high instant savings when cloning data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings: medium long-term savings when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application by cloning a consistent volume representing the template of the intended application and attaching it to the instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout for storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, aggregate use grows.
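A toy model may help illustrate how commitment and used space evolve independently when cloning (all class names and numbers are made up for illustration):

```python
class Aggregate:
    """Toy model of aggregate commitment vs. use when cloning (illustrative)."""
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.committed = 0.0   # sum of configured volume sizes
        self.used = 0.0        # physically allocated blocks

    def provision(self, size_gb, used_gb):
        self.committed += size_gb
        self.used += used_gb

    def clone(self, size_gb):
        # FlexClone: metadata only -> commitment grows, used space does not
        self.committed += size_gb

    def write_new_data(self, gb):
        self.used += gb        # clone diverges: aggregate use grows

agg = Aggregate(10240)
agg.provision(4096, 2000)   # template volume with 2 TB of data
agg.clone(4096)             # instant clone: committed doubles, used unchanged
agg.write_new_data(300)     # changes written into the clone
print(agg.committed, agg.used)  # -> 8192.0 2300.0
```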
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align within it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages in a short- and long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example on template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance. This is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. Aggregate use grows with the provisioning and use of objects within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it does have an effect on the deduplication savings of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.
33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is an apt metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage-controller network is a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame. By slicing the migratable entities in the right way, you can be sure that the aggregate operates within a predefined use interval.
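As a back-of-the-envelope illustration, the slicing described above can be sketched in a few lines of Python. The function name, the terabyte units, and the equal-slice policy are assumptions made for illustration, not a NetApp tool:

```python
def plan_nomads(aggregate_size_tb, settled_start_tb, growth_tb_per_month, lifetime_months):
    """Slice the non-settled share of an aggregate into nomads sized to the
    expected monthly growth, so that migrating one nomad per month offsets
    organic growth of the settled data."""
    nomad_space = aggregate_size_tb - settled_start_tb
    needed = growth_tb_per_month * lifetime_months
    if nomad_space < needed:
        raise ValueError("aggregate too small for this growth scenario")
    # Equal slices keep the sketch simple; mixing sizes (for example, halves
    # and doubles of the monthly growth) allows finer-grained reactions to
    # different growth scenarios, as recommended above.
    return [growth_tb_per_month] * lifetime_months
```

For a 100TB aggregate holding 40TB of settled data growing 5TB per month over a year, this yields twelve 5TB nomads whose stepwise migration keeps the aggregate inside its corridor.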
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method for adjusting the block use of an aggregate: use can be controlled and kept within a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in Oracle database and Microsoft Exchange environments.
Alignment by technical impact. For data belonging to applications whose SLAs fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled), while applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that are likely to be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
[Figure content: instances Inst1 through InstN sorted by negative impact in descending order. Instances with high impact or disruption outside the SLA (for example, all FC-attached storage) are settled; medium- and low-impact instances inside the SLA are nomads.]
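The technical alignment can be sketched as follows. The rule that FC-attached instances are always settled (because they cannot be migrated online) comes from the text above; the dictionary keys and the ordering policy are illustrative assumptions:

```python
def align_by_technical_impact(apps):
    """Split application instances into settled and nomad lists.
    apps: dicts with 'name', 'protocol' ('nfs' | 'iscsi' | 'fc'), and
    'acceptable_disruption_s' (acceptable service disruption in seconds)."""
    # FC-attached storage cannot be migrated online: always settled.
    settled = [a["name"] for a in apps if a["protocol"] == "fc"]
    # The remaining instances become nomads, ordered so that the most
    # disruption-tolerant (highest acceptable disruption) migrate first.
    movable = [a for a in apps if a["protocol"] != "fc"]
    nomads = [a["name"]
              for a in sorted(movable, key=lambda a: -a["acceptable_disruption_s"])]
    return settled, nomads
```

The business-impact alignment described next refines this further by sorting the remaining candidates by penalty cost.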
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[Figure content: instances sorted by penalty cost in descending order. The highest-penalty instances are settled, followed by semi-settled instances, then nomads.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for convenient offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends considering the settled/nomad setting from the start, taking the sizing and lifetime of storage into account, it is also possible to introduce it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes aggregate use outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage vMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
- An application wants to write to committed storage but fails (NAS/SAN). For applications this looks like a storage failure and implies service disruption. Data integrity can be at risk.
- An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with the situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
- insufficient space within the volume in which the storage object is contained
- insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When an event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits; Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives can or should be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; thus threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric triggers an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
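A minimal sketch of how these four aggregate thresholds could be evaluated is shown below. The default percentages and event name strings are illustrative assumptions, not Operations Manager's actual settings:

```python
def aggregate_events(capacity, used, committed,
                     nearly_full=0.80, full=0.90,
                     nearly_over=1.00, over=1.20):
    """Evaluate the four aggregate thresholds described above and return
    the names of the events that would fire. The threshold defaults are
    illustrative, not the product's."""
    events = []
    use_ratio = used / capacity          # aggregate block use metric
    commit_ratio = committed / capacity  # committed storage metric
    if use_ratio >= full:
        events.append("aggregate-full")
    elif use_ratio >= nearly_full:
        events.append("aggregate-almost-full")
    if commit_ratio >= over:
        events.append("aggregate-overcommitted")
    elif commit_ratio >= nearly_over:
        events.append("aggregate-almost-overcommitted")
    return events
```

For example, an aggregate with 100TB capacity, 85TB used, and 105TB committed would raise the two "nearly" events but neither hard limit.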
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects of fixed size, because it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
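The days-to-full estimate can be reproduced in miniature with an ordinary least-squares fit over (day, used-capacity) samples. This is a simplified sketch of the idea, not Operations Manager's exact calculation:

```python
def days_to_full(samples, usable_capacity):
    """Fit a linear regression to (day, used) samples and return the
    estimated number of days until the usable capacity is reached.
    Returns infinity when there is no growth trend."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # growth per day
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return float("inf")  # shrinking or flat: never full
    last_day = samples[-1][0]
    current_estimate = slope * last_day + intercept
    return (usable_capacity - current_estimate) / slope
```

With samples growing 2TB per day from 10TB and a 30TB usable capacity, the estimate is 7 days from the last sample, which also illustrates the note above: the calculation runs against usable capacity, not against the full threshold.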
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual storage consumption behavior and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden with more specific values. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.
Operations Manager supports different notification methods, which can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios: a user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
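A sketch of such an adapter script follows. The DFM_* environment variable names are assumptions made for illustration; check which variables your Operations Manager version actually exports to alarm scripts before relying on them:

```python
import json
import os


def build_ticket(env):
    """Map alarm information from the environment to a ticket payload.
    The DFM_* variable names are illustrative assumptions, not a
    documented contract."""
    return {
        "summary": env.get("DFM_EVENT_NAME", "unknown-event"),
        "source": env.get("DFM_SOURCE_NAME", "unknown-object"),
        "severity": env.get("DFM_EVENT_SEVERITY", "warning"),
    }


if __name__ == "__main__":
    # Emit the payload; a real adapter would hand it to the in-house
    # ticketing or orchestration system instead of printing it.
    print(json.dumps(build_ticket(os.environ)))
```

Keeping the mapping logic in a small function like build_ticket makes the adapter easy to test without a running Operations Manager instance.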
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. To resolve such a situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve the upcoming volume tightness.
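The decision between the two mitigation levels can be sketched as follows; the 10% free-space floor is an illustrative assumption, not a recommended value:

```python
def mitigation_level(aggr_free_pct, vol_free_pct, volume_can_grow, floor=10.0):
    """Decide whether a tightness situation calls for an aggregate-level
    or a volume-level mitigation activity."""
    if aggr_free_pct < floor:
        # The shared free block pool of the aggregate is running out;
        # volume-level fixes cannot help here.
        return "aggregate"
    if vol_free_pct < floor and not volume_can_grow:
        # A fixed-size volume cannot reach its committed space.
        return "volume"
    return "none"
```

Note the ordering: aggregate tightness is checked first, because growing or fixing a volume inside an exhausted aggregate is pointless.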
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.
1. Increase aggregate capacity by adding disks. Repeatability: limited by the maximum aggregate size (low limits in Data ONTAP 7.x; high limits with Data ONTAP 8). SLA impact: none. Preparation time: hardware procurement. Time to show effect: immediate (plus rebalancing).
2. Decrease the aggregate's Snapshot copy reserve, if possible. Repeatability: one time. SLA impact: none. Preparation time: none. Time to show effect: immediate.
3. Shrink other volumes in the aggregate if they have enough free space. Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
4. Run deduplication and shrink volumes. Repeatability: repeatable. SLA impact: low. Preparation time: time to execute deduplication. Time to show effect: immediate.
5. Migrate nomads (online). Repeatability: repeatable. SLA impact: low. Preparation time: none. Time to show effect: minutes (vFiler migration time).
6. Migrate volumes to a different aggregate (offline). Repeatability: repeatable. SLA impact: medium to high. Preparation time: next planned downtime window. Time to show effect: minutes (volume switch-over time).
7. Prevent application data loss by stopping the application, then migrate (offline). Repeatability: repeatable. SLA impact: low to high. Preparation time: coordination with the application owner. Time to show effect: minutes (migration time).
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.
1. Reduce the volume's Snapshot copy reserve (if configured and not used). Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
2. Increase the volume if there is free space in the aggregate (see Table 8). Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
3. Delete Snapshot copies that are not needed or were skipped by the autodelete function. Repeatability: limited. SLA impact: low. Preparation time: none. Time to show effect: immediate.
4. Activate FAS deduplication for the volume (requires proper space guarantees). Repeatability: one time. SLA impact: low (possible performance impact). Preparation time: wait for schedule. Time to show effect: hours.
5. If the volume contains more than a single LUN, migrate those objects to another volume or aggregate. Repeatability: repeatable. SLA impact: high. Preparation time: next planned downtime window. Time to show effect: minutes (volume migration time).
6. Stop the application and migrate the data. Repeatability: repeatable. SLA impact: high. Preparation time: coordination with the application owner. Time to show effect: minutes (migration time).
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain flexibility for online data migrations.
The concrete threshold settings and approaches can be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or the physical systems are already fully equipped. A settled/nomad setting is not considered; thus the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached The operational teams are notified using an alarm on the Operations Manager event aggregate
nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage, and the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
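As an illustration, the alarm and phase logic of this sample setting can be sketched as a small helper. This is a hypothetical sketch, not part of Operations Manager; the threshold values are the ones quoted above (50%/65% capacity used, 110%/120% space committed).

```python
# Sketch of the phase logic in sample setting 1 (assumed helper):
# classify an aggregate by the two metrics and thresholds described above.
def classify_aggregate(capacity_used_pct, space_committed_pct):
    """Return the operational phase and whether provisioning may continue."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate", False   # plan data migration for the next downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess", False     # stop provisioning, reassess thresholds
    return "provision", True       # sweet spot: new storage may be provisioned

phase, may_provision = classify_aggregate(capacity_used_pct=58, space_committed_pct=105)
print(phase, may_provision)  # -> assess False
```

The corridor logic is deliberately conservative: crossing either metric's lower threshold stops provisioning, and either upper threshold triggers mitigation planning.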
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate space committed.
[Figure: aggregate capacity used has a sweet spot corridor of 0–50% and a mitigation threshold above 65%; aggregate space committed has a corridor of 0–110% and a mitigation threshold above 120%. Within the corridor, new storage is provisioned; between the thresholds, provisioning stops and capacity is assessed and thresholds are adapted; above the upper thresholds, mitigation starts.]
52 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
[Figure: aggregate use over time (in hours); after the need to act is detected, migrating a nomad quickly relaxes the use of the aggregate, so the settled data can be operated in a narrow corridor.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, such as storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
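The transitions in Table 10 can be sketched as a small helper. This is a hypothetical illustration, not a NetApp tool; the thresholds (70%, 85%, 90%) are taken from the table.

```python
# Sketch (assumed helper) of the Table 10 transitions: the sole metric is
# aggregate capacity used; each threshold disables or triggers one action.
def settled_nomad_actions(capacity_used_pct):
    """Map aggregate capacity used to the actions permitted or triggered."""
    return {
        "provision_new_storage": capacity_used_pct <= 70,
        "extend_provisioned_storage": capacity_used_pct <= 85,
        "migrate_nomad": capacity_used_pct > 90,
    }

# At 87%: provisioning and extension are stopped, migration is not yet triggered.
print(settled_nomad_actions(87))
```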
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
[Figure: at 0–70% aggregate capacity used, new storage is provisioned; at 70–85%, provisioning stops but already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity severalfold.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Figure: committed capacity and capacity used over elapsed time, with 1-month and 3-month marks, an overall trend line, and a last-3-month trend line.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1 Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2 Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3 Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and the time they need to show effect.
b Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
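The days-to-full trending that Operations Manager reports can be approximated with a simple linear fit over daily capacity samples. The following sketch is an assumed illustration, not the Operations Manager algorithm; like the product, it projects against 100% capacity used.

```python
# Sketch (assumed): estimate days-to-full for an aggregate with a linear
# least-squares trend over daily capacity-used samples (in percent).
def days_to_full(daily_used_pct, capacity_pct=100.0):
    """Linear trend; returns None with fewer than two samples or no growth."""
    n = len(daily_used_pct)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_pct) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_pct))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # percentage points of growth per day
    if slope <= 0:
        return None
    return (capacity_pct - daily_used_pct[-1]) / slope

# 0.5 points/day growth, currently at 62%: (100 - 62) / 0.5 days remain.
print(days_to_full([60.0, 60.5, 61.0, 61.5, 62.0]))  # -> 76.0
```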
To provision storage, follow these steps:
1 Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2 Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c Initially size volumes to the expected size of the data you are going to store. This way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
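When many volumes are trimmed to zero fat, generating the command sequences keeps repeated configurations consistent. The following is a hypothetical sketch, not a NetApp tool; in particular, the LUN path used in the SAN branch is an assumption for illustration.

```python
# Sketch (hypothetical helper): emit the zero fat command sequence for one
# volume, mirroring the NAS/SAN variants with and without Snapshot autodelete.
def zero_fat_commands(volume, max_size, increment, san=False, autodelete=False):
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san:
        # Space reservation is disabled per LUN; a single LUN path inside the
        # volume is assumed here purely for illustration.
        cmds.append(f"lun set reservation /vol/{volume}/lun0 disable")
    return cmds

for cmd in zero_fat_commands("vol_app1", "500g", "50g", san=True):
    print(cmd)
```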
e Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.
f Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g Turn already provisioned volumes into the zero fat configuration.
3 Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• "NetApp Operations Manager Efficiency Dashboard Installation and User Guide," now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
HOW SHOULD A VOLUME BE SIZED?
Because physical allocation of data within a zero fat provisioned volume is done on demand, the volume size can theoretically be set to a very high value that can easily hold all application data and Snapshot copies. Because the unallocated space in the volume is not exclusively reserved for the volume itself, all other applications can benefit from the shared pool of unallocated storage. However, NetApp recommends that you size the volume to the expected size of its containing objects and use the autogrow option to let it grow on demand. The important advantage is that the commitment rate acts as a metric for data consolidation.
Note: The commitment rate reflects the amount of logical data consolidation. This metric is suitable for deciding when data should be left for organic growth.
Additionally, the volume size limits when using deduplication should be taken into account, because the maximum sizes depend on the storage controllers.
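A minimal sketch of the two aggregate metrics this recommendation relies on may help. The formulas are assumptions for illustration (Operations Manager computes these for you): the commitment rate compares the summed volume sizes against the aggregate size and can exceed 100% with thin provisioning.

```python
# Sketch (assumed formulas): capacity used measures physical consumption,
# space committed measures the sum of provisioned volume sizes.
def aggregate_metrics(aggregate_size_gb, used_gb, volume_sizes_gb):
    committed = sum(volume_sizes_gb)
    return {
        "capacity_used_pct": 100.0 * used_gb / aggregate_size_gb,
        "space_committed_pct": 100.0 * committed / aggregate_size_gb,
    }

m = aggregate_metrics(aggregate_size_gb=1000, used_gb=400,
                      volume_sizes_gb=[300, 500, 600])
print(m)  # committed 140% despite only 40% physically used
```

Sizing volumes to the expected data size keeps the committed figure close to the real consolidation, which is why oversized volumes blur this metric.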
APPLICATION RECOMMENDATIONS
Thin provisioning is most effective when applications use data that is committed to them step by step. When applications preformat data, the immediate effect of thin provisioning is lost, and only deduplication may reclaim sharable blocks. Because thin provisioning has no performance penalty, the general recommendation is to provision with the zero fat configuration.
For SAN-attached storage, NetApp recommends that you use file systems supporting space reclamation technologies such as the SCSI UNMAP and SCSI WRITE SAME commands. These pass the information through the storage stack that a particular block is no longer used and allow unused space to be reclaimed. On Windows® platforms, this can be configured in NetApp SnapDrive®.
For Oracle® database best practices, refer to WP-7084, Storage Efficiency in an Oracle Environment.
32 PROVISIONING FROM TEMPLATES: VOLUME AND DEDUPE-CENTRIC LAYOUTS
This section deals with provisioning storage for similar applications from a golden template. A valid use case is a hosting provider who offers and serves predefined application services in mass quantities. Instead of provisioning each application from scratch, the data of the application instance is provisioned by creating a copy of a preconfigured template, or golden copy, that is customized using a postprocessing procedure.
When applications are provisioned this way, NetApp cloning technologies generate virtual copies of the template data instantly and with efficient use of space. This achieves a high degree of data consolidation and cost savings.
The potential of NetApp cloning technologies also plays a central role in development and test environments, as well as in software maintenance scenarios. Testing and updates can be performed very easily because these cloning capabilities work instantly and with almost no overhead in performance, CPU, and memory.
There are two ways to align application data to a NetApp shared storage infrastructure:
• Volume-centric storage layout
• Dedupe-centric storage layout
Depending on the data lifetime, suitability for deduplication, consistency, and tool constraints, one way of aligning application data is more appropriate than the other.
In both variants, the storage of the application template can be provisioned as either full, low, or zero fat. The cloning procedure inherits the attributes of the parent volume. To create space-efficient clones, the space guarantee must be set to none.
CONSEQUENCES FOR MONITORING
When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. If a low fat volume acts as a template that is cloned while preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT
In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from the Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings: cloning the data of an application instance with FlexClone yields high instant savings, although these savings might deteriorate over time.
• Long-term storage efficiency savings: deduplicating application data yields medium long-term savings.
A volume-centric layout makes it easy to provision storage for another instance of an application: clone a consistent volume representing the template of the intended application and attach it to an instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed at the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes.
[Figure: a template FlexVol volume holding the LUNs/qtrees of an application, and FlexVol volumes for instances 1 through n; deduplication shares blocks within each volume, and FlexClone shares blocks between the template and its clones.]
Impact on commitment and storage utilization: The impact of using FlexClone to clone a volume-centric storage layout for storage template-based provisioning can be described as follows. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for changes to the cloned copy or for new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
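This behavior can be modeled in a few lines. The sketch below is an assumed illustration, not Data ONTAP's space accounting: cloning adds to the committed space immediately, while used space grows only as the clone diverges from the template.

```python
# Sketch (assumed model) of how creating a FlexClone of a template volume
# changes the aggregate metrics.
def clone_impact(agg_size_gb, used_gb, committed_gb, clone_size_gb, changed_gb=0):
    committed_gb += clone_size_gb   # the clone counts fully against commitment
    used_gb += changed_gb           # only new/changed blocks consume space
    return {"used_pct": 100.0 * used_gb / agg_size_gb,
            "committed_pct": 100.0 * committed_gb / agg_size_gb}

r = clone_impact(1000, 400, 900, clone_size_gb=200)
print(r)  # commitment jumps to 110% while used space stays at 40%
```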
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database tablespace reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of tablespaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, template data must be cloned with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings: long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when an application instance is cloned through a file/LUN FlexClone operation, for example from template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide, provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance deduplication returns. Note that this construct is not confined to one aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
[Figure: the template and each application instance place their LUNs/qtrees across several FlexVol volumes; within each FlexVol volume, deduplication shares blocks across the application instances.]
Impact on commitment and aggregate usage: When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and with object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and enabling autogrow.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served with no performance penalty.
33 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
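The slicing idea can be sketched as follows. The ratios used are assumptions for illustration, not a NetApp recommendation beyond the guidance above: several nomads of descending size let small nomads be migrated quickly while larger ones cover bigger growth scenarios.

```python
# Sketch (assumed sizing heuristic): split the non-settled capacity of an
# aggregate into several nomads of descending size by the given ratios.
def slice_nomads(aggregate_gb, settled_gb, slices=(0.5, 0.3, 0.2)):
    """Return nomad sizes carved out of the migratable capacity."""
    migratable = aggregate_gb - settled_gb
    return [round(migratable * r) for r in slices]

print(slice_nomads(aggregate_gb=10000, settled_gb=4000))  # -> [3000, 1800, 1200]
```

Migrating the smallest nomad relaxes the aggregate quickly; the larger slices remain as reserves for bigger growth events.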
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
[Figure: an aggregate holding a settled part and several nomads; one nomad is migrated to another aggregate when space becomes tight.]
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime; you must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order).
[Figure: instances Inst1 through InstN sorted by negative impact from high (outside SLA) to low (inside SLA); instances such as all FC-attached ones fall into the settled class, the rest into the nomad class.]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
[Figure: instances sorted by penalty cost in descending order; the costliest instances are settled, followed by semi-settled instances, then nomads.]
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. The storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. This section addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would otherwise violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This lead time determines which mitigation alternatives can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available.
• Running too tight on storage. Over time, applications use more and more of the blocks committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. Thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold, but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold, but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold, but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
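To illustrate how these threshold metrics relate to raw capacity numbers, the following sketch derives the aggregate events from total, used, and committed capacity. The function name, the default threshold percentages, and the event strings are illustrative assumptions, not Operations Manager internals.

```python
# Illustrative sketch: deriving aggregate threshold events from raw
# capacity numbers. The threshold defaults and event names below are
# assumptions for illustration, not Operations Manager internals.

def aggregate_events(total_kb, used_kb, committed_kb,
                     full=90.0, nearly_full=80.0,
                     overcommitted=120.0, nearly_overcommitted=100.0):
    """Return the list of threshold events an aggregate would raise."""
    used_pct = 100.0 * used_kb / total_kb            # aggregate block use
    committed_pct = 100.0 * committed_kb / total_kb  # storage committed to applications
    events = []
    if used_pct > full:
        events.append("aggregate-full")
    elif used_pct > nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct > overcommitted:
        events.append("aggregate-overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events

# Example: a 1 TB aggregate with 850 GB used and 1300 GB committed
# is ~83% full (almost full) and ~127% committed (overcommitted).
print(aggregate_events(1024**3, 850 * 1024**2, 1300 * 1024**2))
# → ['aggregate-almost-full', 'aggregate-overcommitted']
```

Note how an overcommitment above 100% is normal and even desired in a thin-provisioned setup; only the upper threshold signals a situation requiring attention.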
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects of fixed size. It allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
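The trending idea can be sketched as a linear regression over daily capacity samples, projected forward to the usable capacity. This mirrors the concept only; Operations Manager's actual calculation may differ in detail.

```python
# Minimal sketch of days-to-full trending: fit a least-squares line to
# daily used-capacity samples and project when the usable capacity is
# reached. Conceptual only; Operations Manager's implementation may differ.

def days_to_full(daily_used_gb, usable_capacity_gb):
    """Return (growth rate in GB/day, estimated days until full)."""
    n = len(daily_used_gb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_used_gb) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used_gb))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return slope, float("inf")   # flat or shrinking: never full
    return slope, (usable_capacity_gb - daily_used_gb[-1]) / slope

# Ten days of samples growing 5 GB/day toward a 1000 GB aggregate
samples = [500 + 5 * d for d in range(10)]
growth, days = days_to_full(samples, 1000)
print(round(growth, 1), round(days))   # 5.0 91
```

As the note above states, the projection runs against the usable capacity (100%), not against the aggregate full threshold; to know when a threshold will be crossed, substitute the threshold capacity for `usable_capacity_gb`.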
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate (descending) or time to full (ascending) in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities for operating, planning, and administering the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent when the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
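A minimal adapter script might look like the following sketch. How Operations Manager hands event details to the script (environment variables versus command-line arguments) must be verified against your DFM version; the `DFM_*` variable names and the log file path below are hypothetical placeholders, not a documented interface.

```python
#!/usr/bin/env python
# Sketch of a user-defined alarm adapter started via "dfm alarm create -s".
# Assumption: event context arrives in DFM_*-prefixed environment variables.
# Verify the actual hand-over mechanism in your Operations Manager version;
# the variable names here are hypothetical placeholders.

import os
import time

def format_event(environ):
    """Render everything that looks like alarm context into one log line."""
    fields = {k: v for k, v in sorted(environ.items()) if k.startswith("DFM_")}
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s %s" % (stamp, " ".join("%s=%s" % kv for kv in fields.items()))

if __name__ == "__main__":
    # Forward the event to the system of choice; here we simply append to a
    # local log file that a ticketing system could poll. Adjust the path
    # (e.g., /var/log/...) to your environment.
    with open("dfm-alarms.log", "a") as log:
        log.write(format_event(os.environ) + "\n")
```

Instead of writing to a log, the adapter could open a ticket through the REST interface of your ticketing system; the glue-code structure stays the same.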
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use; the effect of a mitigation activity should be to return usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation implies client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state; the data can then be migrated offline.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
| No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switchover time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
| No | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
[Figure: data growth over several months between two planned downtime windows.]
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate
nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
[Figure: while aggregate capacity used is 0-50% and aggregate space committed is 0-110%, new storage is provisioned; above those values, capacity is assessed and thresholds are adapted; when capacity used exceeds 65% or space committed exceeds 120%, mitigation starts.]
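The decision logic of Figure 22 can be sketched as a simple mapping from the two aggregate metrics to the current phase. The thresholds are the ones quoted in the text (50%/65% capacity used, 110%/120% space committed); the function and phase names are illustrative.

```python
# Sketch of the phase logic of sample setting 1. The thresholds match the
# values quoted in the text; the function and phase names are illustrative.

def phase(used_pct, committed_pct):
    """Map aggregate capacity used and space committed to a phase."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "organic growth only: assess capacity, adapt thresholds"
    return "provision new storage"

print(phase(42, 95))    # provision new storage
print(phase(58, 105))   # organic growth only: assess capacity, adapt thresholds
print(phase(70, 115))   # mitigate in next planned downtime window
```

Exceeding either metric is sufficient to leave the current phase, which is why both thresholds are monitored independently.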
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure: with online migration, the need to act is detected and the effect of the mitigation (e.g., migration of a nomad) takes effect within hours, allowing a narrower corridor.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration means that no further metric, for example storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
| Detection Threshold | Notify | Mitigation |
| > 70% | Storage operations | Stop provisioning new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
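The transitions of Table 10 reduce to a single-metric lookup, which the following sketch expresses in code; the function name and the action strings are illustrative.

```python
# Table 10 expressed as a lookup on the single metric aggregate capacity
# used. Function and action names are illustrative sketches.

def settled_nomad_action(aggregate_used_pct):
    """Return the step Table 10 prescribes for storage operations."""
    if aggregate_used_pct > 90:
        return "relax utilization: migrate a nomad online"
    if aggregate_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_used_pct > 70:
        return "stop provisioning new storage"
    return "normal operation"

print(settled_nomad_action(65))   # normal operation
print(settled_nomad_action(88))   # stop extending provisioned storage
```

Compared with sample setting 1, the thresholds sit much higher (70/85/90% instead of 50/65%) because online migration shortens the reaction time from months to hours.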
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure: while aggregate capacity used is 0-70%, new storage is provisioned; at 70-85%, only already-provisioned storage is extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used against elapsed time over roughly three months, with the overall trend and the last-3-month trend indicated; the numbered marks 1–3 align with the steps that follow.)
As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limit on an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool; the aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
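The working-backward calculation in steps a through d can be sketched numerically. Assuming a measured growth rate and a known lead time for the chosen mitigation alternative (both figures below are hypothetical), the highest safe "act now" threshold is the point at which the remaining space still covers the growth expected while the mitigation takes effect:

```python
def red_threshold_pct(aggr_size_gb, growth_gb_per_day, mitigation_days,
                      ceiling_pct=100.0):
    """Highest safe action threshold: below it, the space left under the
    ceiling still covers the growth expected during the mitigation lead time."""
    buffer_gb = growth_gb_per_day * mitigation_days
    return ceiling_pct - 100.0 * buffer_gb / aggr_size_gb

# Hypothetical example: 10 TB aggregate, 20 GB/day growth,
# 30 days to negotiate and perform a migration.
print(round(red_threshold_pct(10_000, 20, 30), 1))  # → 94.0
```

Lowering `ceiling_pct` to the comfort level from step a (for example, 80%) yields the yellow attention threshold in the same way.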
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limit on an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication before enabling it, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
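The four zero fat command sequences above differ only in two switches (NAS vs. SAN, and Snapshot autodelete on vs. off). A small helper can assemble the right sequence for a given volume. The command strings follow the sequences shown; the helper itself is a convenience sketch, not NetApp tooling:

```python
def zero_fat_commands(volume, max_size, incr, san=False, autodelete=False, lun=None):
    """Assemble the Data ONTAP console commands to turn a volume into
    zero fat configuration, per the sequences above."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {incr} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds

# NAS volume without Snapshot autodelete (volume name is hypothetical).
for c in zero_fat_commands("vol1", "500g", "50g"):
    print(c)
```

The generated lines can then be reviewed and pasted into the storage controller console, or fed to whatever remote-execution mechanism your operations team already uses.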
7 REFERENCES

• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
CONSEQUENCES FOR MONITORING

When using one of the following layouts, very high data consolidation can be achieved. Because this effect depends on the usage characteristics of the corresponding applications, monitoring the aggregate is key. In case a low fat volume acts as a template that is cloned preserving the original space guarantees, monitoring is necessary for the cloned volumes as well.
VOLUME-CENTRIC STORAGE LAYOUT

In a volume-centric storage layout, an application instance is organized into one or a few volumes to benefit from Data ONTAP volume-centric management and maintenance operations, such as instant cloning and volume-consistent Snapshot copies.
In addition to the convenient ways to manage volumes, volume-centric storage layouts have storage efficiency advantages in two dimensions:
• High instant storage efficiency savings. High instant savings result when cloning the data of an application instance with FlexClone; savings might deteriorate over time.
• Long-term storage efficiency savings. Medium long-term savings result when deduplicating application data.
A volume-centric layout makes it easy to provision storage for another instance of an application: clone a consistent volume representing the template of the intended application and attach it to the instance where it is processed. This approach works for both NAS and SAN.
Figure 12 shows the data alignment of an application instance and its volume. An application instance organizes its data in one or more dedicated volumes. Note that the entire construct is created within one aggregate.
Because deduplication is performed on the volume level, long-term savings depend on the block-sharing rate within one instance of an application. Volume-centric layouts are preferred in the following cases:
• Simplicity of data management using volumes
• Individual control over the SLA of each application instance
• Application instances with a short duration
• No consideration of deduplication
• Management tools that require volume-centric layouts
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. (The figure shows a template FlexVol volume holding LUNs/qtrees, cloned via FlexClone block sharing into FlexVol volumes for instances 1 through n, each deduplicated internally.)
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data; it allocates space for storing changes to the cloned copy, or new data, on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created, but this does not affect the space used in the aggregate. Only when data in the clone is changed and new data is added by the application does the aggregate use grow.
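The described effect, overcommitment rising at clone creation while used space stays almost flat, can be illustrated with a toy accounting model (all names and numbers here are hypothetical, chosen only to make the bookkeeping visible):

```python
class Aggregate:
    """Toy model: committed space sums volume sizes; used space sums
    allocated blocks. A FlexClone adds commitment but almost no used space."""
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0.0
        self.used_gb = 0.0

    def add_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb, metadata_gb=0.1):
        # The clone shares blocks with its parent; only metadata is new.
        self.add_volume(size_gb, metadata_gb)

    def overcommitment(self):
        return self.committed_gb / self.size_gb

aggr = Aggregate(1000)               # 1 TB aggregate
aggr.add_volume(400, used_gb=300)    # template volume
before = aggr.used_gb
aggr.clone_volume(400)               # provision a new instance by cloning
print(aggr.overcommitment())         # commitment doubled → 0.8
print(round(aggr.used_gb - before, 3))  # used space barely moved → 0.1
```

As the clone's data diverges from the template, its changed blocks move from shared to owned, and the aggregate's used space grows toward the committed figure.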
Best Practice

A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings, which requires executing the deduplication process again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT

In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout: data of application instances is organized horizontally, and the individual data of each application is grouped vertically in a volume to implement deduplication.

This layout makes sense in virtualization scenarios where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation, which allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.

Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that each construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. (The figure shows a template and instances 1 through n, each contributing one LUN/qtree per FlexVol volume; deduplication block sharing operates within each FlexVol volume, across the instances.)
Impact on commitment and aggregate usage. When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects only the deduplication savings value of the volume itself. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice

This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually use similar operating systems and applications in dedicated virtual disks. Grouping these storage objects therefore leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as page and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to the high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, fragmented client data is served with no performance penalty.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION

Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another one while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime; it might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data.
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource.
• Operate the aggregate in its operational sweet spot corridor over a long time frame. By slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval.
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. (The figure shows an aggregate holding one settled part and two nomads.)
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
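Relaxing a tight aggregate can be sketched as choosing the smallest nomad whose migration brings use back into the sweet spot corridor. This is an illustrative selection policy only; a real decision also weighs SLA and business impact, as discussed in the assessment that follows:

```python
def pick_nomad(used_gb, size_gb, nomads_gb, target_pct=70.0):
    """Return the size (GB) of the smallest nomad whose migration brings
    aggregate use at or below the target corridor boundary, or None."""
    need_gb = used_gb - size_gb * target_pct / 100.0  # space to free
    candidates = [n for n in sorted(nomads_gb) if n >= need_gb]
    return candidates[0] if candidates else None

# Hypothetical 10 TB aggregate at 92% use with nomads of 0.5, 1, and 3 TB:
# only the 3 TB nomad frees the required 2.2 TB.
print(pick_nomad(9_200, 10_000, [500, 1_000, 3_000]))  # → 3000
```

Provisioning several nomads of different sizes, as recommended above, is what keeps this selection from degenerating to "migrate everything or nothing."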
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD

The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.

We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications whose SLAs fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order). (The figure arranges instances Inst1 through InstN from settled to nomad: instances with a high negative impact or outside the SLA, e.g., all FC-attached storage, stay settled, while medium- and low-impact instances inside the SLA become nomads.)
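The alignment by technical impact can be sketched as a simple partition: instances whose storage cannot be migrated online (for example, all FC-attached) stay settled, and the rest are ranked as nomads by their acceptable service disruption. Instance names, fields, and rank values below are hypothetical:

```python
def assign(instances):
    """Partition instances into (settled, nomads) by technical impact.
    Each instance: (name, fc_attached, disruption_rank), where a higher
    rank means a more tolerant SLA (larger acceptable disruption)."""
    settled = [i[0] for i in instances if i[1]]      # FC: cannot move online
    movable = [i for i in instances if not i[1]]
    movable.sort(key=lambda i: i[2], reverse=True)   # most tolerant first
    return settled, [i[0] for i in movable]

insts = [("erp", True, 1), ("fileshare", False, 3), ("test", False, 5)]
print(assign(insts))  # → (['erp'], ['test', 'fileshare'])
```

The nomad ordering gives a first-cut migration priority; the business-impact assessment below then refines it by penalty cost.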
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order). (The figure arranges the remaining instances from settled through semi-settled to nomad by the penalty cost of migration, from $$ down to $.)
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION

Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because a migration in progress consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends adjusting the use of the storage controllers in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution; it focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE

While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING

In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, and without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS

Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS/SAN-attached datastore.

In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.

We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.

Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK

Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain moment.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks committed to them, forcing Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with the situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve tight storage. Storage for an object such as a LUN or a share can be tight because of:
  - Insufficient space within the volume in which the storage object is contained
  - Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leaving storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigating storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making about how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
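To make the relationship between these thresholds concrete, the following sketch classifies an aggregate against them. The helper function and the concrete values (50/65/110/120) are illustrative assumptions borrowed from the sample settings later in this report; they are not Operations Manager defaults.

```python
# Hypothetical sketch: classify an aggregate against the four aggregate
# thresholds described above. Threshold percentages are assumptions taken
# from the sample settings in section 5, not product defaults.

def classify_aggregate(used_tb, committed_tb, usable_tb,
                       nearly_full=50, full=65,
                       nearly_overcommitted=110, overcommitted=120):
    """Return the list of threshold events this aggregate would trigger."""
    used_pct = 100.0 * used_tb / usable_tb            # aggregate block use
    committed_pct = 100.0 * committed_tb / usable_tb  # committed storage
    events = []
    if used_pct > full:
        events.append("aggregate full")
    elif used_pct > nearly_full:
        events.append("aggregate nearly full")
    if committed_pct > overcommitted:
        events.append("aggregate overcommitted")
    elif committed_pct > nearly_overcommitted:
        events.append("aggregate nearly overcommitted")
    return events

# 55% used and 115% committed trigger both "nearly" events.
print(classify_aggregate(used_tb=5.5, committed_tb=11.5, usable_tb=10.0))
```

Note how committed capacity can legitimately exceed 100% of usable capacity: that is exactly the overcommitment that thin provisioning enables and that these thresholds keep within a safe corridor.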
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
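A minimal sketch of this trending calculation, assuming evenly spaced daily capacity samples; the function names and the sample figures are invented for illustration:

```python
# Sketch of the trending described above: a least-squares linear regression
# over recent daily samples gives a growth rate, and days to full is then
# estimated against 100% of the usable capacity (not against the aggregate
# full threshold). All figures are illustrative.

def linear_growth_rate(samples):
    """Least-squares slope of (day, used_gb) samples, in GB per day."""
    n = len(samples)
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(u for _, u in samples) / n
    num = sum((d - mean_x) * (u - mean_y) for d, u in samples)
    den = sum((d - mean_x) ** 2 for d, _ in samples)
    return num / den

def days_to_full(samples, usable_gb):
    rate = linear_growth_rate(samples)  # GB/day
    current = samples[-1][1]
    if rate <= 0:
        return None                     # flat or shrinking: no estimate
    return (usable_gb - current) / rate

# 90 days of history growing 10 GB/day, against 10 TB usable capacity.
samples = [(day, 4000 + 10 * day) for day in range(90)]
print(round(days_to_full(samples, usable_gb=10000)))  # 511 days remaining
```

This also illustrates why deviating growth rates over different intervals matter: a regression over the last week can give a very different slope, and hence a very different days-to-full estimate, than one over 90 days.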
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or by time to full increasing in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies a person in charge when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. A concrete aggregate can be configured using the Edit Settings link and dialog; a concrete volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort for the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification, and the methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
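As a rough feel for the switch-over window in alternative 6: the final synchronization time is essentially the delta accumulated since the last transfer divided by the effective replication bandwidth. The sketch below uses invented figures and an assumed link-efficiency factor; it is an estimate, not a measurement:

```python
# Back-of-the-envelope estimate for the client-facing switch-over window of
# an offline volume migration: the last SnapMirror update must transfer the
# remaining delta. Delta size and efficiency factor are invented assumptions.

def switchover_minutes(delta_gb, link_gbit_per_s, efficiency=0.7):
    """Minutes to transfer delta_gb over a link with the given efficiency."""
    seconds = (delta_gb * 8) / (link_gbit_per_s * efficiency)
    return seconds / 60

# 50 GB of delta over a 1 Gbit/s inter-data center link at 70% efficiency.
print(round(switchover_minutes(delta_gb=50, link_gbit_per_s=1.0), 1))
```

This matches the statement above that typical inter-data center bandwidth keeps the final synchronization in the range of a few minutes, provided the delta is kept small by frequent scheduled transfers.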
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
The figure distinguishes three ranges:
− Aggregate capacity used 0–50% and aggregate space committed 0–110% (operational sweet spot corridor): new storage is provisioned.
− Aggregate capacity used above 50% or aggregate space committed above 110%: provisioning stops; capacity is assessed and thresholds are adapted.
− Aggregate capacity used above 65% or aggregate space committed above 120%: mitigation is triggered.
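The decision logic of this sample setting can be sketched as a small function. The thresholds (50%, 65%, 110%) come from the text above; the function itself and its return strings are illustrative assumptions:

```python
# Illustrative sketch of sample setting 1: both metrics must stay inside the
# operational sweet spot corridor before new storage is provisioned.
# Thresholds are taken from the text; the helper itself is an assumption.

def phase(used_pct, committed_pct):
    """Map the two aggregate metrics to the operational phase."""
    if used_pct > 65:
        return "mitigate in next planned downtime window"
    if used_pct > 50 or committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"

print(phase(45, 100))   # inside the corridor
print(phase(55, 100))   # aggregate nearly full threshold exceeded
print(phase(70, 115))   # aggregate full threshold exceeded
```

Either metric alone is enough to stop provisioning, which reflects that overcommitment can outrun block usage when many thin-provisioned volumes are still mostly empty.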
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account, allowing nomad data to be migrated flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
The figure distinguishes three ranges of the metric aggregate capacity used:
− 0–70% (operational sweet spot corridor): new storage is provisioned.
− 70–85%: provisioning of new storage stops; already provisioned storage can still be extended.
− Above 90%: utilization is relaxed by migrating a nomad with NetApp Data Motion.
You can achieve very high data consolidation in this setting using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by a significant factor.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(The figure plots committed capacity and capacity used over elapsed time, with an overall trend line and a last-3-month trend line; the first month and the following three months mark the monitoring periods referred to in steps 1 and 2 below.)
As a general rule, we do not introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better its predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
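These work-backward steps amount to simple arithmetic. The sketch below, with invented figures, derives the required headroom and the implied provisioning-stop threshold under the stated assumptions; it is illustrative, not a recommendation:

```python
# Hypothetical helper for the work-backward steps: given the observed growth
# rate and the interval between planned downtime windows, estimate the usable
# headroom an aggregate needs and the implied provisioning-stop threshold.
# The 80% comfort level and all figures are illustrative assumptions.

def provisioning_stop_threshold(usable_gb, growth_gb_per_day,
                                days_between_downtimes, comfort_pct=80):
    """Return (headroom in GB, provisioning-stop threshold in percent)."""
    headroom_gb = growth_gb_per_day * days_between_downtimes
    stop_pct = comfort_pct - 100.0 * headroom_gb / usable_gb
    return headroom_gb, stop_pct

# 20 TB aggregate, 10 GB/day growth, downtime windows six months apart.
headroom, stop = provisioning_stop_threshold(
    usable_gb=20000, growth_gb_per_day=10, days_between_downtimes=180)
print(headroom, stop)  # 1,800 GB of organic growth; stop provisioning around 71%
```

If the resulting stop threshold comes out uncomfortably low, that is a signal to shorten the downtime interval, add mitigation alternatives, or size the aggregates larger.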
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that each aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. The aggregate overcommitment metric in Operations Manager then represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated
f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative
g. Turn already provisioned volumes into a zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
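To make the arithmetic behind this distinction concrete, the following sketch (hypothetical function names and invented numbers, not part of Operations Manager) shows why days-to-full trending against 100% capacity differs from the time remaining until a configured threshold fires:

```python
def days_to_full(capacity_gb, used_gb, daily_growth_gb):
    # Days until the aggregate reaches 100% of its usable capacity,
    # which is what the days-to-full trend reports against
    if daily_growth_gb <= 0:
        return float("inf")  # flat or shrinking use: never fills
    return (capacity_gb - used_gb) / daily_growth_gb

def days_to_threshold(capacity_gb, used_gb, daily_growth_gb, threshold_pct):
    # Days until a configured threshold (for example, an aggregate
    # nearly full threshold of 90%) would fire instead
    if daily_growth_gb <= 0:
        return float("inf")
    free_to_threshold = max(capacity_gb * threshold_pct / 100.0 - used_gb, 0.0)
    return free_to_threshold / daily_growth_gb

# Example: 10 TB aggregate, 7 TB used, growing 50 GB per day
print(days_to_full(10240, 7168, 50))           # 61.44 days to 100%
print(days_to_threshold(10240, 7168, 50, 90))  # 40.96 days to a 90% alarm
```

The roughly 20-day gap between the two numbers is the reaction window the threshold setting buys you; adapt the threshold so this window covers your slowest mitigation alternative.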
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the NetApp Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Figure 12) Volume-centric storage provisioning. Application instances are aligned horizontally with their volumes. Each instance has its own FlexVol volume with deduplication block sharing across its LUNs/qtrees; new instances are provisioned from the template volume using FlexClone block sharing.
Impact on commitment and storage utilization. The impact of using FlexClone to clone a volume-centric storage layout to implement storage template-based provisioning is visualized schematically. At clone creation, Data ONTAP creates metadata for the new instance of the data and allocates space for storing changes to the cloned copy or new data on request. Thus, the overcommitment of the aggregate containing the cloned data increases when the clone is created; however, this does not affect the space used in the aggregate. When data in the clone is changed and new data is added by the application, the aggregate use grows.
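The behavior described above can be sketched with a toy model (all names and numbers are illustrative; this is not a Data ONTAP API): commitment rises at clone creation, while physical use stays flat until the clone diverges.

```python
# Illustrative model of an aggregate's committed vs. used space when a
# volume FlexClone is created (hypothetical class, invented numbers).
class Aggregate:
    def __init__(self, size_gb):
        self.size_gb = size_gb
        self.committed_gb = 0.0  # sum of provisioned volume sizes
        self.used_gb = 0.0       # physically allocated blocks

    def provision_volume(self, size_gb, used_gb):
        self.committed_gb += size_gb
        self.used_gb += used_gb

    def clone_volume(self, size_gb):
        # A FlexClone shares blocks with its parent: commitment rises,
        # physical use stays flat until the clone diverges
        self.committed_gb += size_gb

    def write_new_data(self, gb):
        self.used_gb += gb

    def overcommitment(self):
        return self.committed_gb / self.size_gb

aggr = Aggregate(size_gb=1000)
aggr.provision_volume(size_gb=400, used_gb=300)  # template volume
print(aggr.overcommitment(), aggr.used_gb)       # 0.4 committed, 300 GB used
aggr.clone_volume(size_gb=400)                   # clone for a new instance
print(aggr.overcommitment(), aggr.used_gb)       # 0.8 committed, still 300 GB
aggr.write_new_data(25)                          # clone diverges over time
print(aggr.used_gb)                              # 325 GB
```

The point of the sketch is the asymmetry: the clone doubles the commitment immediately, but the aggregate only fills as the clone's data actually diverges from the template.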
Best Practice
A volume-centric layout implicitly implements a consistency group. It is preferable to align in it all application data that should be recovered at a certain point in time. Cloning can achieve significant savings when a FlexClone volume is created to provision data for a new service instance.
Client-side data realignment, such as disk defragmentation or database table space reorganization, has a counterproductive effect on the FlexClone savings. It also has a temporarily counterproductive effect on the deduplication savings until the deduplication process is executed again. If possible, the following actions on client data should be avoided:
• Reorganizing data, for example, database reorganization of table spaces or defragmentation of virtual disks provisioned through cloning
• Preformatting data
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:
• Very high long-term storage efficiency savings. Long-term savings are achieved due to the deduplication-centric storage layout and the deduplication returns.
• Short-term storage efficiency savings. Instant savings are provided when cloning an application instance, for example template application data, through a file/LUN FlexClone operation.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. Deduplication block sharing operates within each FlexVol volume, across the LUNs/qtrees of the template and the instances.
Impact on commitment and aggregate usage. When the FlexVol volumes for this layout are created, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it affects the deduplication savings of the volume itself. Thus, NetApp recommends using a zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). Such instances usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments such as Windows disk defragmentation or database table space reorganizations. Because of the way NetApp storage controllers work, client data is served without performance penalty even when fragmented, so defragmentation is unnecessary.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of the storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while assuring accessibility. Thus, it is an elegant technique to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must use vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or the nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can make sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. The aggregate holds a settled part and several nomads.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
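Assuming the corridor idea above, a simple selection rule for which nomad to migrate might look like the following sketch (invented function names and sizes; not a NetApp tool):

```python
# Hypothetical helper: pick the smallest nomad whose migration brings the
# aggregate back under its target use corridor.
def nomad_to_migrate(aggr_size_gb, used_gb, nomads_gb, target_pct=70):
    """Return the size of the smallest sufficient nomad, or None if the
    aggregate is already inside the corridor or no single nomad suffices."""
    target_gb = aggr_size_gb * target_pct / 100.0
    excess_gb = used_gb - target_gb
    if excess_gb <= 0:
        return None  # already inside the corridor
    candidates = sorted(n for n in nomads_gb if n >= excess_gb)
    return candidates[0] if candidates else None

# 10 TB aggregate at ~85% use, with nomads of several sizes provisioned
print(nomad_to_migrate(10240, 8704, [512, 1024, 2048]))  # 2048
```

Migrating the smallest sufficient nomad keeps migration traffic low, which matches the recommendation to provision several nomads of different sizes.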
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order). Instances with a high negative impact or a service disruption outside the SLA (for example, all FC-attached storage) are settled; instances with a medium or low negative impact inside the SLA are nomads.
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order). Instances with the highest penalty costs ($$) are settled or semi-settled; instances with lower penalty costs ($) are nomads.
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration process consumes additional resources on the network and the participating storage controllers,
this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities is based mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially and take the sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to be remounted.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a means to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the load on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them, which forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies a service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient free space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
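The two cases can be told apart with a simple check; the following sketch uses a hypothetical helper and purely illustrative numbers:

```python
# Illustrative diagnosis: is a tight storage object constrained at the
# volume level (autogrow can help) or at the aggregate level (mitigation
# such as migration or deletion is needed)?
def diagnose_tightness(vol_used_gb, vol_size_gb, aggr_free_gb, needed_gb):
    vol_free_gb = vol_size_gb - vol_used_gb
    if vol_free_gb >= needed_gb:
        return "ok"
    if aggr_free_gb >= needed_gb - vol_free_gb:
        # The volume is tight, but the aggregate has enough free blocks
        # to back an autogrow of the volume
        return "volume-tight"
    return "aggregate-tight"

print(diagnose_tightness(90, 100, 500, 5))  # ok
print(diagnose_tightness(98, 100, 500, 5))  # volume-tight
print(diagnose_tightness(98, 100, 1, 5))    # aggregate-tight
```

This distinction matters because the mitigation alternatives differ: volume-level tightness can often be resolved automatically (autogrow, Snapshot autodelete), whereas aggregate-level tightness requires one of the mitigation activities discussed later.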
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. Once certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. While certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers an alarm, notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager; to communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies the person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies the person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies the person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies the person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The basis of the time-to-full calculation is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
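The trending described here is a linear regression over historic use samples. A minimal sketch of the underlying idea (illustrative only, not the Operations Manager implementation) is:

```python
# Fit a least-squares line to (day, used_gb) samples and project the day
# on which the storage object reaches its usable capacity.
def fit_growth(samples):
    """Return (slope, intercept) of a least-squares fit in GB per day."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

def days_until_full(samples, capacity_gb):
    slope, intercept = fit_growth(samples)
    if slope <= 0:
        return None  # flat or shrinking use: no projected fill date
    return (capacity_gb - intercept) / slope

usage = [(0, 7000), (1, 7050), (2, 7100), (3, 7150)]  # ~50 GB/day
print(days_until_full(usage, 10240))  # 64.8 days from day 0
```

As the note above points out, the projection runs against the full usable capacity; deviating growth rates over different fit intervals are a hint that recent activity has changed the trend.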
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or by time to full increasing in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
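Such an adapter can be as small as a script that maps the event details to a ticket payload. The sketch below is purely illustrative: the environment variable names and the queue name are assumptions, not documented Operations Manager behavior, so check the dfm documentation of your version for the exact interface it offers to alarm scripts.

```python
import json
import os

def build_ticket(env):
    """Map alarm details (assumed to arrive as environment variables;
    the names below are hypothetical) to a ticket payload."""
    return {
        "summary": "{} on {}".format(
            env.get("DFM_EVENT_NAME", "unknown event"),
            env.get("DFM_SOURCE_NAME", "unknown object"),
        ),
        "severity": env.get("DFM_SEVERITY", "warning"),
        # Route to the responsible operational group, as required by the
        # note on SNMP routing above.
        "queue": "storage-operations",
    }

if __name__ == "__main__":
    # Deliver the payload to the ticketing system of choice; here we
    # simply print the JSON that an HTTP POST would carry.
    print(json.dumps(build_ticket(os.environ), sort_keys=True))
```

The same skeleton works for any target system: only the delivery step at the end changes.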
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor. The effect of a mitigation activity should return the usage to that corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage objects themselves is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or that were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should be constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.
The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified through an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
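The phase transitions of this sample setting can be sketched in a few lines. The thresholds are the initial values from this customer setting (50%/65% capacity used, 110%/120% space committed); the function itself is illustrative and not part of Operations Manager.

```python
def phase(capacity_used_pct, space_committed_pct):
    """Classify an aggregate against the operational sweet spot corridor
    of sample setting 1 (two metrics, conservative thresholds)."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate"   # plan data migration for the next downtime window
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "assess"     # stop provisioning new storage; review thresholds
    return "provision"      # inside the sweet spot corridor
```

An aggregate at 58% capacity used and 115% committed, for example, is left for organic growth while the situation is assessed.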
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is within 0–50% and aggregate space committed is within 0–110%; beyond these values, capacity is assessed and thresholds are adapted; mitigation starts when capacity used exceeds 65% or space committed exceeds 120%.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
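When the >90% threshold is crossed and several nomads of different sizes are available, choosing which one to migrate is a simple selection problem. The sketch below is an illustration of the pattern, not a NetApp tool; function and parameter names are our own.

```python
def pick_nomad(used_tb, capacity_tb, nomad_sizes_tb, target_pct=85):
    """Pick the smallest nomad whose migration brings aggregate capacity
    used back under the target threshold (default: the 85% boundary of
    Table 10). Returns None if no single nomad suffices."""
    for size in sorted(nomad_sizes_tb):
        if (used_tb - size) / capacity_tb * 100 < target_pct:
            return size
    return None
```

Migrating the smallest sufficient nomad keeps the larger nomads in reserve for later growth, which is one reason to provision several nomads of different sizes.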
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (The figure shows: new storage is provisioned while aggregate capacity used is within 0–70%; already provisioned storage can still be extended up to 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers; the amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, marking the first month and the following three months, with the overall trend and the last 3-month trend indicated.)
As a general rule, we don't introduce artificially limited container types; they increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data: the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, keep the volume set to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
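Steps b through d boil down to simple arithmetic. The following sketch is illustrative only; the function names and the safety factor are our own and are not taken from Operations Manager.

```python
def min_reserve_tb(growth_tb_per_day, days_between_downtimes, safety=1.2):
    """Minimum free space needed to survive organic growth until the next
    planned downtime window, padded by an assumed safety factor."""
    return growth_tb_per_day * days_between_downtimes * safety

def provisioning_stop_threshold_pct(aggregate_tb, reserve_tb, ceiling_pct=80):
    """Threshold at which to stop provisioning new storage: the comfort
    ceiling (step a) minus the reserve as a share of the aggregate."""
    return ceiling_pct - (reserve_tb / aggregate_tb) * 100
```

For example, an aggregate growing 0.05 TB/day with 100 days between downtime windows needs roughly 6 TB of reserve, so on a 100 TB aggregate provisioning would stop at about 74% capacity used.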
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, keep the volume set to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
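As a back-of-the-envelope check of the trending values, the days-to-full arithmetic can be sketched as follows. The function names are ours; the only point taken from the text is that Operations Manager trends against 100% capacity used, so rescaling to an operational threshold is up to you.

```python
def days_to_full(capacity_tb, used_tb, daily_growth_tb):
    """Days until the aggregate reaches 100% capacity used at the current
    daily growth rate (the basis the reported trending works against)."""
    if daily_growth_tb <= 0:
        return float("inf")  # shrinking or flat: never full
    return (capacity_tb - used_tb) / daily_growth_tb

def days_to_threshold(capacity_tb, used_tb, daily_growth_tb, threshold_pct):
    """Rescale days-to-full to an operational threshold below 100%,
    for example the 80% comfort ceiling from the cookbook."""
    if daily_growth_tb <= 0:
        return float("inf")
    target_tb = capacity_tb * threshold_pct / 100.0
    return max(0.0, (target_tb - used_tb) / daily_growth_tb)
```

An aggregate of 100 TB at 80 TB used and 0.1 TB/day growth is 200 days from 100% full, but only about 100 days from a 90% threshold.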
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
DEDUPE-CENTRIC STORAGE LAYOUT
In a dedupe-centric storage layout, the goal is to achieve high storage efficiency returns from the deduplication feature. In contrast to the volume-centric storage layout, data of different application instances is grouped to achieve storage efficiency returns across a set of application instances. Figure 13 shows a sample dedupe-centric storage layout. Data of application instances is organized horizontally; the individual data of each application is grouped vertically in a volume to implement deduplication.
This layout makes sense in virtualization scenarios, where the images of the guest machines can be grouped easily. Grouping partitions containing boot images and commonly used programs is very effective because they share much of the same data. To implement template-based provisioning with such a layout, cloning of template data must be performed with the file/LUN FlexClone operation. File/LUN FlexClone allows storage objects to be cloned within a volume, providing finer granularity.
This storage layout provides the following storage efficiency advantages from a short- and long-term perspective:

• Very high long-term storage efficiency savings, achieved due to the deduplication-centric storage layout and deduplication returns.
• Short-term storage efficiency savings: instant savings are provided when cloning an application instance through a file/LUN FlexClone operation, for example, from template application data.
In contrast to the volume-centric storage layout, application instances are bundled together in a matrix style because of their participation in a volume. This implies that the applications share major operational tasks and are managed as a bundle. From an SLA perspective, a diversification of service levels within the application instances cannot be implemented as easily as with a volume-centric layout. Achieving application-consistent Snapshot copies requires the iterative application of file/LUN FlexClone functionality to all storage objects of the instance, which is slightly more difficult than cloning with a volume FlexClone operation.
TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," provides a deeper understanding of NetApp deduplication and its deployment.
Figure 13 illustrates dedupe-centric storage provisioning. Volumes are shared among several application instances to achieve cross-instance dedupe returns. Note that this construct is created within an aggregate; volumes can be assigned to different aggregates.
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically.
Impact on commitment and aggregate usage: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the deduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume, with autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). These usually contain similar operating systems and applications in dedicated virtual disks; thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.

Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings on such data are limited due to its high change rate and do not justify running the deduplication process. NetApp recommends not placing this type of data in the same volume as data that dedupes well.

We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way NetApp storage controllers lay out data, fragmented client data is served at no performance penalty.
27 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migration capabilities of the storage are exploited, the response time to mitigate data growth is independent of application-specific planned downtime windows. Storage is classified into potential migration candidates that can be migrated away from a tight aggregate on one storage controller to another while remaining accessible. Thus, it is an elegant way to relax the utilization of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a fitting metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away; at the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or the inter-storage-controller network is considered a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept within a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the previously introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
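A minimal sketch of this technical alignment, assuming each instance is described by a name, its attachment protocol, and its acceptable service disruption in seconds (the names and the cutoff value are illustrative; the Fibre Channel rule reflects the online-migration limitation noted above):

```python
def classify(instances, disruption_cutoff_s=60):
    """Split instances into settled and nomad candidates.
    Fibre Channel-attached storage cannot be migrated online, so it
    is settled by definition; instances below the acceptable-
    disruption cutoff also stay settled."""
    settled, nomads = [], []
    for inst in instances:
        name, protocol, disruption_s = inst
        if protocol == "fc" or disruption_s < disruption_cutoff_s:
            settled.append(inst)
        else:
            nomads.append(inst)
    # the most disruption-tolerant instances are migrated first
    nomads.sort(key=lambda i: i[2], reverse=True)
    return settled, nomads
```

The sorted nomad list directly gives the migration order: the most tolerant instance is the first candidate to move when an aggregate runs tight.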
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order; instances range from settled through semi-settled to nomad).
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover Doing so should leave enough resources to perform migrations
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus, it requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that will be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning in advance. This relaxes aggregate usage outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be preserved during the transfer; deduplication savings are regained by running the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses the questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to preserve operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would otherwise violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and some time might pass before their effect becomes evident. This time determines which mitigation alternatives can still be considered at a given point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks of storage that were committed to them. This forces Data ONTAP to allocate from the pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS and SAN). To the application, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application's behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− insufficient space within the volume in which the storage object is contained
− insufficient free space within the aggregate in which the storage object and its volume are contained
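The "running too tight on storage" item above translates the free block pool into reaction time. A simple illustrative sketch of that estimate (our own, not a NetApp tool):

```python
def days_to_react(free_gb, daily_growth_gb, mitigation_lead_days=0):
    """Time the operations team has before the free block pool of an
    aggregate is exhausted, minus the lead time a mitigation
    alternative needs before its effect becomes evident."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth, no deadline
    return free_gb / daily_growth_gb - mitigation_lead_days
```

For example, 500 GB of free space growing by 10 GB/day leaves 50 days; a mitigation that needs two weeks of lead time effectively leaves only 36 days to decide.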
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be communicated to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
• Leaving storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or prior phase.
• Mitigating storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
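The commitment metric behind these thresholds can be approximated as follows. This is a simplified model that treats committed storage as the sum of the volume sizes presented to applications; the function names and the threshold values in the example are illustrative, not Operations Manager internals:

```python
def commitment_pct(aggregate_capacity_gb, volume_sizes_gb):
    """Committed storage as a percentage of aggregate capacity;
    values above 100% mean the aggregate is overcommitted."""
    return 100.0 * sum(volume_sizes_gb) / aggregate_capacity_gb

def check(value_pct, nearly_pct, full_pct):
    """Return which threshold a metric value breaches, if any."""
    if value_pct >= full_pct:
        return "full/overcommitted threshold"
    if value_pct >= nearly_pct:
        return "nearly full/overcommitted threshold"
    return None
```

For example, three thin-provisioned volumes of 400, 400, and 500 GB on a 1,000 GB aggregate give a 130% commitment, breaching an overcommitted threshold set at 120%.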
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis for time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
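The trend itself is a linear regression over historical usage samples. A minimal re-implementation for illustration only (Operations Manager computes this internally; the function here is our own sketch):

```python
def days_to_full(samples_gb, usable_capacity_gb):
    """Least-squares fit over daily usage samples (oldest first) and
    extrapolation to the usable aggregate capacity -- matching the
    note that time to full is based on usable capacity, not on the
    aggregate full threshold. Returns (growth_gb_per_day, days)."""
    n = len(samples_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_gb) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, samples_gb))
    slope = cov_xy / var_x  # growth rate in GB/day
    if slope <= 0:
        return slope, float("inf")
    return slope, (usable_capacity_gb - samples_gb[-1]) / slope
```

Comparing the result over different sample windows (one week vs. three months) mirrors the deviation check recommended above.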
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending or by time to full increasing in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It helps to signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. To do so, select the aggregate or volume of choice, for example, using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be divided among different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://&lt;opsmgrserver&gt;:&lt;port&gt;/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows the setup of an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
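Such an adapter script might look like the following sketch. The environment variable names and the ticket format are assumptions for illustration only; consult the Operations Manager documentation for the exact interface that alarm scripts receive.

```python
#!/usr/bin/env python
import os
import sys

def format_ticket(event_name, source, severity):
    """Render an event as a one-line message for a downstream system."""
    return "[%s] %s on %s" % (severity.upper(), event_name, source)

def main():
    # Hypothetical variable names -- verify against your Operations
    # Manager version; alarm scripts typically receive event details
    # from their execution environment.
    event = os.environ.get("DFM_EVENT_NAME", "unknown-event")
    source = os.environ.get("DFM_SOURCE_NAME", "unknown-source")
    severity = os.environ.get("DFM_EVENT_SEVERITY", "warning")
    # Glue code: the ticket could be posted to a ticketing system or
    # message queue here; printing stands in for that delivery step.
    sys.stdout.write(format_ticket(event, source, severity) + "\n")

if __name__ == "__main__":
    main()
```

Keeping the formatting in a small function makes the adapter easy to test independently of the alarm that invokes it.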
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks in the aggregate to deal with data growth. To resolve this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their contents, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler units and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer- and application-specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a phase transition are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate's days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. [Chart: data growth over time (months), with planned downtime windows marked.]
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager events aggregate nearly full threshold (event configured when the metric exceeds 50%) and aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate space committed. [Chart: within the operational sweet spot corridor (capacity used 0–50%, space committed 0–110%) new storage is provisioned; beyond it, provisioning stops and capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts.]
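The decision logic of this setting can be expressed in a few lines. The following sketch uses the thresholds quoted above; the function and its return values are illustrative, not an Operations Manager API:

```python
# Hypothetical sketch of the decision logic in sample setting 1
# (illustrative thresholds, not an Operations Manager API).
def phase_for_aggregate(capacity_used_pct, committed_pct):
    """Map the two aggregate metrics to the action taken in this setting."""
    if capacity_used_pct > 65:
        return "mitigate"    # plan a data migration for the next downtime window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning; reassess and adapt thresholds
    return "provision"       # inside the operational sweet spot corridor

print(phase_for_aggregate(45, 90))    # inside the corridor
print(phase_for_aggregate(55, 90))    # corridor left: assess
print(phase_for_aggregate(70, 130))   # mitigation required
```

In practice these thresholds would be encoded as Operations Manager events and alarms rather than in a script; the sketch only makes the corridor explicit.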
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [Chart: settled data plus several nomads (N) over time (hours), showing the point of detecting the need to act and the effect of mitigation (e.g., migration).]
In this sample setting, as well as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric into account, for example, storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify             | Mitigation
>70%                | Storage operations | Stop provisioning of storage
>85%                | Storage operations | Stop extending provisioned storage
>90%                | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. [Chart: at 0–70% capacity used, new storage is provisioned and already provisioned storage may be extended; at 70–85%, provisioning stops but extensions are still possible; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. [Chart: committed capacity and capacity used over elapsed time (roughly one month, then three months), with the overall trend and the last-3-month trend marked; phases labeled 1 to 3.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better are the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
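Steps a through d boil down to simple arithmetic. The following sketch, with hypothetical numbers and a hypothetical safety factor, shows one way to derive a comfortable fill-level threshold from the growth rate and the downtime interval:

```python
# Back-of-the-envelope sizing sketch for working backward to thresholds.
# All figures are hypothetical; this is not an Operations Manager calculation.
def required_free_space_gb(daily_growth_gb, days_between_downtimes, safety_factor=1.2):
    """Free space needed to absorb organic growth until the next
    planned downtime window, padded by a safety factor."""
    return daily_growth_gb * days_between_downtimes * safety_factor

def max_used_pct(aggregate_size_gb, daily_growth_gb, days_between_downtimes):
    """Derive the comfortable fill-level threshold from that reserve."""
    reserve = required_free_space_gb(daily_growth_gb, days_between_downtimes)
    return 100.0 * (aggregate_size_gb - reserve) / aggregate_size_gb

# Example: 10 TB aggregate, 20 GB/day growth, downtime window every 90 days
print(round(max_used_pct(10240, 20, 90), 1))
```

If the derived threshold comes out above the 80% starting point recommended in step a, keep the more conservative value.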
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for an eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
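Because the trend is reported against 100% used, you may want to rescale it to your own corridor limit. A small hypothetical sketch of that correction (not an Operations Manager API):

```python
# Sketch of the days-to-full correction: Operations Manager trends against
# 100% capacity used, so rescale the trend to your own threshold.
def days_to_threshold(used_pct, daily_growth_pct, threshold_pct=100.0):
    """Days until an aggregate crosses threshold_pct at the current
    growth rate (daily_growth_pct in percentage points per day)."""
    if daily_growth_pct <= 0:
        return float("inf")  # flat or shrinking: the threshold is never reached
    return (threshold_pct - used_pct) / daily_growth_pct

print(days_to_threshold(60, 0.2))       # days until 100% full
print(days_to_threshold(60, 0.2, 80))   # days until an 80% corridor limit
```

The difference between the two figures is the extra reaction time you lose by waiting for the 100%-based trend instead of your corridor limit.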
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• "NetApp Operations Manager Efficiency Dashboard Installation and User Guide," now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Figure 13) Dedupe-centric storage provisioning. Application instances are aligned horizontally; volumes are aligned vertically. [Diagram: several FlexVol volumes, each containing one LUN/qtree of the template and of instances 1 and 2, with deduplication block sharing within each FlexVol volume.]
Impact on commitment and used aggregate capacity: When creating the FlexVol volumes for this layout, their individual sizes contribute to the commitment rate. The aggregate use grows with the provisioning and object use within the FlexVol volumes. Provisioning a new instance in this layout through a file/LUN FlexClone operation has no effect on the overcommitment rate; it has an effect on the overdeduplication value of the volumes themselves. Thus, NetApp recommends using the zero fat configuration for the volume and having autogrow enabled.
Best Practice
This layout is very attractive for applications using multiple but similar storage objects among service instances (for example, virtual disks in virtual machine hypervisors). They usually use similar operating systems and applications in dedicated virtual disks. Thus, grouping these storage objects leads to a very high degree of consolidation due to deduplication.
Quickly changing data, such as pages and swap files, should not be considered for inclusion in deduplicated volumes on primary storage. Deduplication savings are limited due to their high change rate and do not justify running the deduplication process. NetApp recommends that this type of data not be placed in the same volume as data that dedupes well.
We further recommend not performing client data realignments, such as Windows disk defragmentation or database tablespace reorganizations. Because of the way that NetApp storage controllers work, fragmented client data is served without performance penalties.
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios are independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another while assuring its accessibility. Thus, it is an elegant technology to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this feature using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage-controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate. [Diagram: an aggregate holding settled data and several nomads, with one nomad moving to another aggregate.]
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
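The sizing and migration decisions behind the pattern can be sketched in a few lines. All names and numbers below are hypothetical illustrations, not NetApp tooling:

```python
# Illustrative sketch of the settled/nomad pattern (hypothetical figures):
# size the nomad share from expected growth over the settled data's lifetime,
# and pick the smallest nomad whose migration returns the aggregate to its corridor.
def nomad_capacity_gb(daily_growth_gb, settled_lifetime_days):
    """Capacity to provision as migratable nomad vFiler units."""
    return daily_growth_gb * settled_lifetime_days

def pick_nomad(nomads_gb, used_gb, size_gb, target_pct=85):
    """Smallest nomad whose migration drops aggregate use below target_pct."""
    excess = used_gb - size_gb * target_pct / 100.0
    for n in sorted(nomads_gb):
        if n >= excess:
            return n
    return None  # no single nomad is large enough; migrate several

print(nomad_capacity_gb(15, 365))           # nomad share for one year of growth
print(pick_nomad([200, 500, 1000], 9200, 10000))
```

Provisioning several nomads of different sizes, as recommended above, is what gives `pick_nomad` something useful to choose from.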
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order). [Diagram: instances Inst1 to InstN ordered from settled (e.g., all FC; high negative impact, outside SLA) to nomad (medium to low negative impact, inside SLA).]
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order). [Diagram: instances ordered by penalty cost from settled ($$) through semi-settled to nomad ($).]
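The two-step assessment can be sketched as a simple sort and split. The instance data below is entirely made up for illustration; `fc` marks Fibre Channel-attached storage, which cannot be moved online:

```python
# Sketch of the SLA-based assessment with hypothetical instance data.
def classify(instances):
    """Split instances into settled and nomad candidates.

    instances: list of (name, fc_attached, penalty_cost) tuples.
    FC-attached data is settled by technical impact; the rest is
    ordered by penalty cost, stickiest (most expensive to disturb) first.
    """
    settled = [i for i in instances if i[1]]
    movable = sorted((i for i in instances if not i[1]),
                     key=lambda i: -i[2])
    return settled, movable

settled, nomads = classify([("erp", True, 900), ("mail", False, 100),
                            ("web", False, 10), ("crm", False, 400)])
print([n[0] for n in nomads])   # stickiest movable instance first
```

In this toy example the cheapest-to-disturb instances end up at the nomad end of the list and are the first migration candidates.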
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered due to heavy storage consumption in an aggregate. It might also be triggered due to performance limitations of the corresponding storage controller. Because the migration itself consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology and thus requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLEDNOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially, taking sizing and lifetime of storage into account, it is also possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are gained back by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− The application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− The application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception. Most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
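The NAS case above is worth handling explicitly in application code: a "No space left on device" error under overcommitment is recoverable and should not be treated as a storage outage. A minimal sketch (the function name and return strings are illustrative):

```python
import errno

# Minimal sketch of the NAS scenario described above: distinguish the
# expected out-of-space errno from a genuine storage failure.
def classify_write_error(exc):
    """Map an OSError from a failed write to an operator action."""
    if isinstance(exc, OSError) and exc.errno == errno.ENOSPC:
        return "out-of-space"   # expected under overcommitment; retry later
    return "storage-failure"    # unexpected; data integrity may be at risk

print(classify_write_error(OSError(errno.ENOSPC, "No space left on device")))
print(classify_write_error(OSError(errno.EIO, "I/O error")))
```

Verifying that your applications react this gracefully is part of the assessment before thin-provisioning their storage.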
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness for certain situations Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation When the event triggers an alarm notification can be sent by e-mail pager Simple Network Management Protocol (SNMP) or customized scripts To raise awareness about a certain situation the event must be characterized using the metrics provided by Operations Manager To communicate the event an alarm must be set
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation, which supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgr-server:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
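The same defaults can also be inspected and changed from the DataFabric Manager command line. The following is a sketch only; the option names shown are the commonly documented ones, so verify them against `dfm option list` on your Operations Manager version before relying on them:

```shell
# List the current global threshold defaults (names are assumptions;
# confirm with your Operations Manager version).
dfm option list | grep -i threshold

# Set aggregate and volume thresholds globally (values in percent).
dfm option set aggrNearlyFullThreshold=80
dfm option set aggrFullThreshold=90
dfm option set aggrNearlyOvercommittedThreshold=95
dfm option set aggrOvercommittedThreshold=100
dfm option set volNearlyFullThreshold=80
dfm option set volFullThreshold=90
```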
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there can be direct consequences for the applications it serves.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use triggers an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects of fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgr-server:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
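The trend arithmetic can be illustrated with a small sketch (this is not Operations Manager code): a least-squares fit of daily usage samples, projected against 100% of the usable capacity, as noted above:

```python
def days_to_full(samples, usable_capacity):
    """Estimate the days until an aggregate is full.

    samples: list of (day_index, used_bytes) daily measurements
    usable_capacity: total usable aggregate capacity in bytes
    Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    # Least-squares slope = daily growth rate in bytes per day.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None
    # Project against 100% usable capacity, not against the
    # aggregate full threshold, matching the note above.
    return (usable_capacity - intercept) / slope - samples[-1][0]
```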
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgr-server:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate descending, or by time to full increasing, in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example through the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a phase transition, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. The trends on operational parameters provided by Operations Manager further simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgr-server:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s <script_to_execute> -h aggregate-almost-full
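A minimal adapter could look like the following sketch. The exact event details Operations Manager passes to the script (command-line arguments or environment variables) depend on the version, so the fields below are placeholders; check your version's documentation:

```shell
#!/bin/sh
# Hypothetical alarm adapter invoked by Operations Manager when the event
# fires. Whatever event details arrive as arguments are appended to a log;
# a real adapter would forward them to a ticketing system instead.
LOG="${ALARM_LOG:-/tmp/dfm-alarms.log}"   # production might use /var/log
echo "$(date '+%Y-%m-%d %H:%M:%S') event=$*" >> "$LOG"
# Example of forwarding to a hypothetical ticketing endpoint:
# curl -s -d "summary=storage alarm: $*" http://ticketing.example/api/tickets
```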
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately, but their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
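For reference, activities 1 to 4 map to short Data ONTAP 7-Mode console commands. The following is an illustrative sketch with placeholder names (aggr1, vol1) and sizes; verify the syntax against your Data ONTAP release:

```shell
aggr add aggr1 4@450g       # 1: extend the aggregate by four 450GB disks
snap reserve -A aggr1 0     # 2: set the aggregate Snapshot copy reserve to zero
vol size vol1 -50g          # 3: shrink a preallocated volume by 50GB
sis on /vol/vol1            # 4: enable deduplication on the volume ...
sis start -s /vol/vol1      #    ... scan and deduplicate existing data ...
vol size vol1 -50g          #    ... then shrink by the reclaimed space
```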
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|---------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
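Activities 1 to 4 in Table 9 likewise correspond to short console commands. The following 7-Mode sketch uses placeholder names (vol1, nightly.7); check the syntax for your Data ONTAP release:

```shell
snap reserve vol1 0         # 1: release the volume Snapshot copy reserve
vol size vol1 +20g          # 2: grow the volume into aggregate free space
snap list vol1              # 3: inspect Snapshot copies, then delete, e.g.:
snap delete vol1 nightly.7
sis on /vol/vol1            # 4: activate deduplication (mind space guarantees)
sis start -s /vol/vol1
```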
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should stay constant over a long time frame or physical systems are already fully equipped. A settled/nomad setting is not considered. The thresholds that signal a phase transition are therefore set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows.
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists; thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space.
(The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is 0-50% and aggregate space committed is 0-110%; beyond those thresholds, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% committed space, mitigation starts.)
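The corridor of this sample setting can be summarized in a small decision sketch (the thresholds are those configured above; the 120% committed bound is taken from Figure 22, and the function itself is illustrative, not part of Operations Manager):

```python
def sample_setting_1_action(capacity_used_pct, committed_pct):
    """Phase decision for sample setting 1 (both metrics in percent)."""
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"    # migrate data in the next planned downtime window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning; assess capacity, adapt thresholds
    return "provision"       # normal operation: provision new storage
```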
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high utilization and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
(The figure shows the operational sweet spot corridor: new storage is provisioned at 0-70% aggregate capacity used, already provisioned storage may still be extended between 70% and 85%, and above 90% utilization is relaxed by migrating a nomad with NetApp Data Motion.)
Using NetApp storage controllers in this setting, you can achieve very high data consolidation: the amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
(The figure plots committed capacity and capacity used over elapsed time, with an overall trend and a last-3-month trend, across three phases spanning roughly one to three months.)
As a general rule, we don't introduce artificially limited container types: they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration; for all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool; the aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define an aggregate use level at which your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
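Steps b to d reduce to simple arithmetic. A hypothetical example (all figures, including the safety factor, are invented for illustration):

```python
def min_free_space_gb(daily_growth_gb, days_between_downtimes, safety_factor=1.5):
    """Minimum aggregate free space (GB) to ride out organic growth between
    planned downtime windows (step d), padded by a safety factor."""
    return daily_growth_gb * days_between_downtimes * safety_factor

# 20 GB/day observed growth and 90 days between downtime windows require
# keeping 20 * 90 * 1.5 = 2700 GB free below the comfort threshold.
```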
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that each aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense; free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store; thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off

Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable

Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
3.3 SETTLED/NOMAD PROVISIONING FOR NETAPP DATA MOTION
Settled/nomad provisioning is a provisioning pattern that helps increase the utilization of NetApp shared storage. When the online migratability features of storage are exploited, response times to mitigate data growth scenarios become independent of application-specific planned downtime windows. Storage is classified into potential migration candidates and can be migrated away from a tight aggregate on one storage controller to another one while assuring its accessibility. Thus it is an elegant way to relax the use of an aggregate outside the planned downtime windows of the served applications. NetApp MultiStore technology implements this capability using the vFiler abstraction, which NetApp recommends you consider in the provisioning process. Secure multi-tenancy environments implemented using MultiStore technology harmonize well with this provisioning approach.
The settled/nomad provisioning pattern is a perfect metaphor for reacting to data growth in an aggregate. Figure 14 illustrates the concept of settled/nomad provisioning in the aggregates of the storage controllers and the migration of a nomad out of its aggregate. The settled part describes data that does not move during its lifetime. It might use vFiler units to simplify operation and hardware maintenance of the storage controller, but there is no direct need. The nomad parts are considered moving parts and thus must make use of vFiler units. The ratio between the sizes of the settled and nomad parts depends on the growth rate and lifetime of the data in the settled part. Assuming that the aggregate size is constant over this period, the aggregate is filled with settled and nomad data. Over the data lifetime, more and more nomads are migrated away. At the end of the lifetime, only the settled data is left.
It is irrelevant whether the data growth happens in the settled or nomad part: when a nomad is migrated away, the resource situation on the aggregate is relaxed. It is preferable to provision several nomads of different sizes. This allows you to:
• React to different growth scenarios of the data
• Quickly migrate smaller nomads when time or an inter-storage controller network is considered to be a limited resource
• Operate the aggregate in its operational sweet spot corridor over a long time frame; by slicing the migratable entities in the right way, you can be sure that the aggregate operates in a predefined use interval
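The slicing rule in the last bullet can be sketched as a small calculation. The aggregate size, the 85% corridor ceiling, and the function name below are illustrative assumptions, not prescriptions from this report.

```python
# Sketch: choosing the smallest nomad whose migration brings aggregate
# block use back under the corridor ceiling. All figures are hypothetical.

def pick_nomad(aggregate_tb, used_tb, nomad_sizes_tb, corridor_high=0.85):
    """Return the smallest sufficient nomad size (TB), or None."""
    excess_tb = used_tb - aggregate_tb * corridor_high
    if excess_tb <= 0:
        return None  # already inside the corridor
    for size in sorted(nomad_sizes_tb):
        if size >= excess_tb:
            return size
    return None  # no single nomad is large enough

# A 100 TB aggregate at 90 TB used with 2, 5, and 10 TB nomads:
# 5 TB must leave to get back under 85 TB, so the 5 TB nomad is picked.
print(pick_nomad(100, 90, [10, 2, 5]))  # -> 5
```

Provisioning nomads in several sizes, as recommended above, keeps this selection cheap: a small growth excursion can be answered with a small migration.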
Figure 14) Settled/nomad provisioning into an aggregate. In case of aggregate tightness, a nomad is migrated to a separate aggregate.
To summarize, the settled/nomad provisioning pattern is an elegant method to adjust the block use of an aggregate. The use of an aggregate can be controlled and kept in a desired corridor.
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used for an assessment into settled and nomad instances.
We use the introduced SLA metric of service disruption and map it to the stickiness of the settled/nomad instances. The vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact. For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruptions should be considered as nomads. However, there might be data of application instances that likely will be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
Alignment by business impact. An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
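Combining the two alignment steps, a first-pass assessment could be scripted as below. The protocols, instance names, penalty figures, and class boundaries are all hypothetical illustrations of Figures 15 and 16, not values from this report.

```python
# Sketch of the two-step settled/nomad assessment: technical pinning first
# (FC-attached storage cannot migrate online), then ranking by hypothetical
# penalty costs of a migration-induced disruption.

def assess(instances, high_penalty=10000, low_penalty=1000):
    """instances: list of (name, protocol, penalty_cost_per_migration)."""
    classes = {}
    for name, protocol, penalty in instances:
        if protocol == "FC":
            classes[name] = "settled"       # technical: no online migration
        elif penalty > high_penalty:
            classes[name] = "semi-settled"  # business: expensive to disturb
        elif penalty <= low_penalty:
            classes[name] = "nomad"         # cheap to move online
        else:
            classes[name] = "semi-settled"
    return classes

apps = [("erp-db", "FC", 50000), ("mail", "NFS", 5000), ("web", "iSCSI", 200)]
print(assess(apps))
# -> {'erp-db': 'settled', 'mail': 'semi-settled', 'web': 'nomad'}
```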
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers in a high-availability configuration be adjusted in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror and MultiStore technology. Thus, it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends that you consider the settled/nomad setting initially and take sizing and lifetime of storage into account, it is possible to implement it in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the use of aggregates outside planned downtime windows without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth scenarios. For example, VMware Storage VMotion is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer. Deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow for extending storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
− An application wants to write to committed storage but fails (NAS/SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
− An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.

Two cases need to be differentiated when mitigation is necessary to solve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
− Insufficient space within the volume in which the storage object is contained
− Insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are not possible anymore. Monitoring should support making a decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back in the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup→Options→Default Thresholds, or using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides a set of thresholds, described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.

Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the metric volume use has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
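A rough way to see how the four aggregate thresholds interact is the evaluation sketch below. The percentages (80/90 for block use, 95/100 for commitment) are assumptions based on common Operations Manager defaults; verify the actual values on your Default Thresholds page. The event names follow the dfm naming used later in this report.

```python
# Sketch: mapping aggregate metrics to threshold events. The default
# percentages are assumptions; check your Operations Manager instance.

def aggregate_events(use_pct, committed_pct,
                     nearly_full=80, full=90, nearly_over=95, over=100):
    events = []
    if use_pct >= full:
        events.append("aggregate-full")
    elif use_pct >= nearly_full:
        events.append("aggregate-almost-full")
    if committed_pct >= over:
        events.append("aggregate-overcommitted")
    elif committed_pct >= nearly_over:
        events.append("aggregate-almost-overcommitted")
    return events

# 85% blocks used, 120% committed: early warning on use, alarm on commitment.
print(aggregate_events(85, 120))
# -> ['aggregate-almost-full', 'aggregate-overcommitted']
```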
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rates descending or time to full ascending in order to focus on the relevant candidates.
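The trend calculation described above can be reproduced as a simple least-squares sketch. The 100 TB usable capacity and the daily samples are hypothetical; Operations Manager reports these numbers directly.

```python
# Sketch of the linear-regression growth trend behind the days-to-full
# estimate. Daily used-capacity samples (TB) are hypothetical.

def growth_rate(samples_tb):
    """Least-squares slope over equally spaced daily samples (TB/day)."""
    n = len(samples_tb)
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples_tb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples_tb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

used = [80.0, 80.5, 81.0, 81.5, 82.0]     # five daily samples
rate = growth_rate(used)                   # 0.5 TB/day
days_to_full = (100.0 - used[-1]) / rate   # against usable capacity
print(rate, days_to_full)                  # -> 0.5 36.0
```

Consistent with the note above, the deadline is computed against usable capacity, not against the aggregate or volume full threshold.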
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and points the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden and made more specific. In order to do so, select your aggregate or volume of choice; for example, you can use the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup→Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed, which delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:

dfm alarm create -s script_to_execute -h aggregate-almost-full
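As an illustration, such an adapter could look like the following Python sketch. The way Operations Manager hands event details to the script (command-line arguments here) and the message format are assumptions for illustration; consult the Operations Manager documentation for the actual calling convention.

```python
#!/usr/bin/env python
# Hypothetical notification adapter started via "dfm alarm create -s".
# The argument-based interface is an assumption; the real calling
# convention is defined by Operations Manager.
import sys
import syslog

def notify(event, source):
    """Forward an event to the local syslog as glue toward a ticketing system."""
    message = "storage: %s on %s" % (event, source)
    syslog.syslog(syslog.LOG_WARNING, message)
    return message

if __name__ == "__main__":
    # Pad with placeholders so the script degrades gracefully without arguments.
    args = sys.argv[1:] + ["unknown-event", "unknown-source"]
    print(notify(args[0], args[1]))
```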
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

1. Increase aggregate capacity by adding disks. Repeatability: limited in Data ONTAP 7.x (low maximum sizes), high limits in Data ONTAP 8. SLA impact: none. Preparation time: hardware procurement. Time to show effect: immediate (plus rebalancing).
2. Decrease the aggregate's Snapshot copy reserve, if possible. Repeatability: one time. SLA impact: none. Preparation time: none. Time to show effect: immediate.
3. Shrink other volumes in the aggregate if they have enough free space. Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
4. Run deduplication and shrink volumes. Repeatability: repeatable. SLA impact: low. Preparation time: time to execute deduplication. Time to show effect: immediate.
5. Migrate nomads (online). Repeatability: repeatable. SLA impact: low. Preparation time: none. Time to show effect: minutes (vFiler migration time).
6. Migrate volumes to a different aggregate (offline). Repeatability: repeatable. SLA impact: medium to high. Preparation time: next planned downtime window. Time to show effect: minutes (volume switch-over time).
7. Prevent application data loss and stop the application, then migrate (offline). Repeatability: repeatable. SLA impact: low to high. Preparation time: coordinate with application owner. Time to show effect: minutes (migration time).
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

1. Reduce the volume's Snapshot copy reserve (if configured and not used). Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
2. Increase the volume if there is free space in the aggregate (see Table 8). Repeatability: one time. SLA impact: low. Preparation time: none. Time to show effect: immediate.
3. Delete Snapshot copies that are not needed or were skipped by the autodelete function. Repeatability: limited. SLA impact: low. Preparation time: none. Time to show effect: immediate.
4. Activate FAS deduplication for the volume (requires proper space guarantees). Repeatability: one time. SLA impact: low, possible performance impact. Preparation time: wait for schedule. Time to show effect: hours.
5. If the volume contains more than a single LUN, migrate those objects to another volume or aggregate. Repeatability: repeatable. SLA impact: high. Preparation time: next planned downtime window. Time to show effect: minutes (volume migration time).
6. Stop the application and migrate the data. Repeatability: repeatable. SLA impact: high. Preparation time: coordinate with application owner. Time to show effect: minutes (migration time).
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
- All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
- Aggregate extension is not a mitigation alternative.
- Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full threshold" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted threshold" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to decide on migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
- First metric: aggregate capacity used
- Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
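The decision logic of this sample setting can be sketched in a few lines. This is only an illustration of the thresholds quoted above (50%/110% for the sweet spot corridor, 65% for mitigation); the function name and the 120% committed-space mitigation boundary are assumptions taken from Figure 22:

```python
# Hedged sketch of sample setting 1's two-metric phase logic.
# Thresholds are the ones quoted in the text; names are illustrative.

def phase(capacity_used_pct: float, committed_pct: float) -> str:
    """Map the two aggregate metrics to an operational phase."""
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"    # alarm to storage admins; migrate in next window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning, assess capacity, adapt thresholds
    return "provision"       # new storage may still be provisioned

print(phase(40, 90))   # provision
print(phase(55, 90))   # assess (nearly full threshold exceeded)
print(phase(60, 125))  # mitigate (overcommitment beyond the corridor)
```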
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space
(Figure content: while aggregate capacity used stays at 0-50% and aggregate space committed at 0-110%, new storage is provisioned; beyond these thresholds, capacity is assessed and thresholds are adapted; beyond 65% capacity used or 120% committed space, mitigation starts.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, such as storage overcommitment.
- All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
- Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
- The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as a mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
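Because Table 10 is small and purely threshold-driven, it can be encoded directly as data. The following sketch is illustrative only (it is not an Operations Manager feature, and the names are invented):

```python
# Illustrative encoding of Table 10 as data: (detection threshold %, action).
TRANSITIONS = [
    (90, "Relax resource situation and migrate a nomad"),
    (85, "Stop extending provisioned storage"),
    (70, "Stop provisioning of new storage"),
]

def actions_for(capacity_used_pct: float) -> list[str]:
    """Return every action whose detection threshold is exceeded."""
    return [action for t, action in TRANSITIONS if capacity_used_pct > t]

print(actions_for(87))
# ['Stop extending provisioned storage', 'Stop provisioning of new storage']
```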
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
(Figure content: new storage is provisioned while aggregate capacity used is 0-70%; already provisioned storage may still be extended between 70% and 85%; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve a very high level of data consolidation in this setting using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(Figure content: committed capacity and capacity used are plotted over roughly three months, together with the overall trend and the last-3-month trend; capacity used drops during the conversion to zero fat and deduplication, then resumes organic growth.)
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
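Steps b through d boil down to simple arithmetic: the required free space is the growth rate times the longest stretch between planned downtimes, plus a safety margin. A worked example with invented numbers:

```python
# Worked example for steps b-d. All numbers are invented assumptions,
# not recommendations: 5 GB/day growth, 120 days between downtimes,
# and a 20% safety margin on top of the projection.

growth_gb_per_day = 5
days_between_downtimes = 120
safety_margin = 1.2

required_free_gb = growth_gb_per_day * days_between_downtimes * safety_margin
print(required_free_gb)  # 720.0
```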
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job. Alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. That way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full fat or low fat to the zero fat configuration. Use the following commands at the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
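For repeating configurations, the command sequences from step 2d can also be generated rather than typed. The following sketch only renders the command strings shown above for review; it does not connect to a controller, and the helper name is an assumption:

```python
# Hypothetical helper that renders the zero fat command sequence for a
# given volume. It only builds strings for review; sending them to a
# storage controller (e.g., over SSH) is left out deliberately.

def zero_fat_commands(volume: str, max_size: str, increment: str,
                      autodelete: bool) -> list[str]:
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    return cmds

for line in zero_fat_commands("vol1", "500g", "50g", autodelete=False):
    print(line)
```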
7 REFERENCES
- TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
- TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
- TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
- TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
- TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
- TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
- TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
- "NetApp Operations Manager Efficiency Dashboard Installation and User Guide," now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
SLA-BASED ASSESSMENT FOR SETTLED/NOMAD
The goals of an SLA-based assessment are to optimize SLA fulfillment and to avoid or minimize penalty costs. The accessibility of the applications, described by their individual service levels, is used to assess instances as settled or nomad.
We use the SLA metric of service disruption introduced earlier and map it to the stickiness of the settled/nomad instances. vFiler entities allow online migration of NFS- and iSCSI-attached nomad instances without any changes on the client side. Fibre Channel-attached storage cannot be migrated online at the time of writing. Refer to TR-3881 for an understanding of DataMotion in an Oracle database and Microsoft Exchange environment.
Alignment by technical impact: For data belonging to applications with SLAs that fit perfectly into what is provided, a direct assignment can be made. For example, application instances with the lowest acceptable service disruption should be the last candidates to be migrated (settled); applications with the highest acceptable service disruption should be considered as nomads. However, there might be data of application instances that will likely be migrated during the application lifetime. You must take into account the business impact of migrating these instances.
Figure 15) Alignment by technical impact (sorted by negative impact in descending order)
(Figure content: instances Inst1 to InstN are sorted by negative impact in descending order; instances with high impact or outside the provided SLA, for example all FC-attached instances, are settled, while those with medium or low impact inside the SLA become nomads.)
Alignment by business impact: An assessment of penalty costs is made for the data of the remaining applications. For vFiler migration, a very short negative impact on the performance of the service level must be taken into account during the migration. Thus, application data with the highest negative impact is considered to be the stickiest.
Figure 16) Alignment by business impact (sorted by negative impact in descending order)
(Figure content: instances are sorted by penalty cost in descending order; the costliest are settled, intermediate ones are semi-settled, and the cheapest are nomads.)
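The business-impact alignment is essentially a sort by penalty cost. A minimal sketch, assuming a simple fractional cutoff between settled and nomad instances (the cutoff is not prescribed by the text, and the instance data is invented):

```python
# Illustrative sketch of the business-impact alignment: instances are
# sorted by penalty cost in descending order; the costliest stay settled,
# the cheapest become nomads. The 50% cutoff is an assumption.

def align(instances, nomad_share=0.5):
    """instances: list of (name, penalty_cost). Returns (settled, nomads)."""
    ranked = sorted(instances, key=lambda i: i[1], reverse=True)
    cut = int(len(ranked) * (1 - nomad_share))
    return ranked[:cut], ranked[cut:]

settled, nomads = align([("erp", 900), ("mail", 300), ("test", 10), ("dev", 50)])
print([n for n, _ in settled])  # ['erp', 'mail']
print([n for n, _ in nomads])   # ['dev', 'test']
```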
PERFORMANCE AND THROUGHPUT IMPACT OF MIGRATION
Migration of a nomad might be triggered by heavy storage consumption in an aggregate. It might also be triggered by performance limitations of the corresponding storage controller. Because the migration consumes additional resources on the network and on the participating storage controllers, this consumption must be taken into account to avoid further intensifying the situation. Refer to TR-3881 for a quantitative evaluation of DataMotion.
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover. Doing so should leave enough resources to perform migrations.
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore technology; thus it requires these licenses on all participating storage controllers. TR-3814, "NetApp Data Motion," provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
Although NetApp recommends that you consider the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is possible to implement it later in a planned downtime window. If NFS-attached storage is to be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients attaching the storage need to remount it.
SETTLED/NOMAD-LIKE SETTING WITH SHORT-/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking the expected lifetime of provisioned storage into account allows you to plan deprovisioning situations in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features of a virtualization hypervisor provide a further alternative for implementing a responsive scheme to react to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION
This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among these phases:
- Provision storage.
- Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
- Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
- Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can be considered at a certain point.
- Running out of mitigation alternatives. Several mitigation alternatives exist to control usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
- Running too tight on storage. Over time, applications use more and more of the blocks that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
- Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  - An application wants to write to committed storage but fails (NAS/SAN). To the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  - An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
- insufficient free space within the volume in which the storage object is contained
- insufficient free space within the aggregate in which the storage object and its volume are contained
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
- Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support the decision to transition to the next phase.
- Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, the storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support the decision to transition to the next or the prior phase.
- Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness about a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react to a certain situation.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page (Setup > Options > Default Thresholds, or the link http://<opsmgrserver>:<port>/dfm/edit/options). Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation and also the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
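Conceptually, both pairs of thresholds compare a simple ratio against a "nearly" and a "full" limit. The following minimal sketch (plain Python with hypothetical function names, not part of Operations Manager) illustrates how the used and committed metrics relate to the usable aggregate capacity:

```python
def aggregate_metrics(usable_bytes, used_bytes, committed_bytes):
    """Return (used %, committed %) relative to the usable aggregate capacity.

    committed_bytes is the total size of all storage objects provisioned out
    of the aggregate; with thin provisioning it can exceed usable_bytes,
    which is why the committed metric may legitimately pass 100%.
    """
    used_pct = 100.0 * used_bytes / usable_bytes
    committed_pct = 100.0 * committed_bytes / usable_bytes
    return used_pct, committed_pct


def classify(value_pct, nearly_threshold, full_threshold):
    """Classify a metric against its 'nearly' and 'full' thresholds."""
    if value_pct >= full_threshold:
        return "full"
    if value_pct >= nearly_threshold:
        return "nearly full"
    return "ok"
```

For example, an aggregate with 1TB usable capacity, 500GB used, and 1.2TB committed yields 50% used and 120% committed, which with the default-style limits would be flagged as "nearly full" on use and "full" on commitment.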
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. Trending is important for all storage objects of fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity, and investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
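The underlying trend calculation is an ordinary least-squares regression. The following sketch (plain Python, illustrative data) shows how a daily growth rate and a days-to-full estimate against 100% of usable capacity can be derived; it mimics the idea, not the exact Operations Manager implementation:

```python
def linear_trend(samples):
    """Least-squares slope and intercept for (day, used_gb) samples."""
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / float(n * sxx - sx * sx)  # GB/day
    intercept = (sy - slope * sx) / float(n)                # GB at day 0
    return slope, intercept


def days_to_full(samples, usable_gb):
    """Estimate days until 100% of the usable capacity is used."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking: no meaningful prediction
    last_day = max(d for d, _ in samples)
    current_gb = slope * last_day + intercept
    return (usable_gb - current_gb) / slope


# Illustrative: 10 GB/day growth starting at 500 GB, 1 TB usable capacity.
samples = [(day, 500 + 10 * day) for day in range(30)]
```

With this data, the trend is exactly 10GB per day and the aggregate reaches 100% in 21 days, which matches the note above: the prediction runs against usable capacity, not against the aggregate full threshold.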
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate (descending) or time to full (increasing) to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden with more specific values. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. The report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown, and allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. The trends on operational parameters provided by Operations Manager further simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification, and the methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager also supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
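The shape of such an adapter script might look like the following sketch (Python). The command-line contract, the event fields, and the hand-over mechanism are assumptions for illustration; check how your Operations Manager version invokes alarm scripts before relying on them.

```python
#!/usr/bin/env python
"""Sketch of a notification adapter started by an Operations Manager alarm.

Assumption: the event name and the affected storage object are passed as
command-line arguments. Adapt this to the actual invocation contract of
your Operations Manager installation.
"""
import json
import sys


def build_ticket(event_name, source_object, severity="warning"):
    """Translate an event into a generic record for a downstream ticketing system."""
    return {
        "summary": "%s on %s" % (event_name, source_object),
        "severity": severity,
        "origin": "operations-manager",
    }


if __name__ == "__main__":
    event, source = sys.argv[1], sys.argv[2]
    # Hand the record over to the system of choice, e.g. by writing it to a
    # spool directory that the ticketing integration polls.
    print(json.dumps(build_ticket(event, source)))
```

The adapter stays deliberately thin: it only translates the event into the target system's format, so the routing logic (which group handles which situation) remains in the ticketing system, as noted for SNMP above.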
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to this corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned in the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth; to solve tightness in this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migration.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.
Table 8) Mitigation alternatives to control use within aggregates
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Low limits (Data ONTAP 7.x); high limits (Data ONTAP 8) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity instead.
Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively; after you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over a time axis of months, with planned downtime windows marking the start and end of the period.)
Note: Several months might fall between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators so that they can decide about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
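Read as code, the rule set of this sample setting is a small decision function (a sketch; the percentages are the customer-specific starting values given above, and the action strings are illustrative):

```python
def setting1_actions(used_pct, committed_pct):
    """Decision logic of sample setting 1.

    used_pct:      aggregate capacity used, in percent of usable capacity
    committed_pct: aggregate space committed, in percent of usable capacity
    """
    actions = []
    if used_pct > 65 or committed_pct > 120:
        # Upper corridor bound exceeded: prepare migration for the next window.
        actions.append("plan data migration for the next planned downtime window")
    if used_pct > 50 or committed_pct > 110:
        actions.append("stop provisioning new storage")
        actions.append("assess capacity and adapt thresholds")
    if not actions:
        actions.append("provision new storage as requested")
    return actions
```

An aggregate at 40% used and 100% committed stays in the sweet spot corridor and accepts new provisioning; at 55% used, provisioning stops and capacity is assessed; at 70% used or 130% committed, migration planning is added on top.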
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor at 0-50% aggregate capacity used and 0-110% aggregate space committed, in which new storage is provisioned; above these values, provisioning stops and capacity is assessed with threshold adaptation; above 65% used or 120% committed, mitigation starts.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots settled and nomad data over a time axis of hours, from detecting the need to act to the effect of a mitigation such as migration.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes; only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% of capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
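Because this setting uses a single metric, the phase transitions of Table 10 reduce to one threshold lookup on aggregate capacity used (a sketch; the mitigation strings are illustrative):

```python
# Phase transitions of Table 10, ordered from highest to lowest threshold.
PHASES = [
    (90, "relax the resource situation and migrate a nomad"),
    (85, "stop extending provisioned storage"),
    (70, "stop provisioning of new storage"),
]


def mitigation_for(used_pct):
    """Return the mitigation for the highest threshold exceeded, or None
    when the aggregate is still inside the sweet spot corridor."""
    for threshold, mitigation in PHASES:
        if used_pct > threshold:
            return mitigation
    return None
```

At 60% used, no action is needed; at 75%, provisioning of new storage stops; above 90%, a nomad is migrated to relax the aggregate.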
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (The figure shows the operational sweet spot corridor at 0-70% aggregate capacity used, in which new storage is provisioned; between 70% and 85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting using NetApp storage controllers; the amount of logical data served can exceed the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, with the overall trend and the last three-month trend marked; capacity used diminishes during the one- to three-month conversion period before growth resumes.)
As a general rule, we do not introduce artificially limited container types; they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better its predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve a migration; for all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative; use Operations Manager to help determine the trend. Make sure that the trend calculation excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
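These steps boil down to a back-of-the-envelope calculation. The following sketch (illustrative names and numbers) combines a measured growth rate, the distance between downtime windows, and a comfort ceiling into an upper bound for the sweet spot corridor:

```python
def required_free_pct(growth_gb_per_day, days_between_windows, usable_gb,
                      safety_factor=1.2):
    """Free space (% of usable capacity) needed to bridge organic growth
    between two planned downtime windows, padded by a safety factor."""
    needed_gb = growth_gb_per_day * days_between_windows * safety_factor
    return 100.0 * needed_gb / usable_gb


def corridor_upper_bound(growth_gb_per_day, days_between_windows, usable_gb,
                         comfort_ceiling=80.0):
    """Upper bound of the sweet spot corridor: stay below the comfort
    ceiling and leave headroom for growth until the next downtime window."""
    headroom = required_free_pct(growth_gb_per_day, days_between_windows,
                                 usable_gb)
    return min(comfort_ceiling, 100.0 - headroom)
```

For example, 10GB per day of growth over 90 days on a 10TB aggregate needs about 10.8% free space with a 20% safety pad, so the 80% comfort ceiling remains the binding limit; at 50GB per day, the growth headroom dominates and the corridor must end near 46%.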
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense; free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. In this way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% of the aggregate capacity used.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
this consumption must be taken into account to avoid further intensifying the situation Refer to TR-3881 for a quantitative evaluation of DataMotion
NetApp recommends that the use of storage controllers be adjusted in a high-availability configuration in such a way that the remaining controller can master the load in the case of a failover Doing so should leave enough resources to perform migrations
Migrating vFiler entities relies mainly on SnapMirror® and MultiStore® technology and thus requires these licenses on all participating storage controllers. TR-3814, NetApp Data Motion, provides a thorough presentation of migration using the NetApp Data Motion solution. It focuses on implementing and triggering the migration of vFiler entities using NetApp Provisioning Manager. Furthermore, vFiler units can be managed manually to allow for handy offline migration with a very short interruption of storage accessibility.
ENABLE SETTLED/NOMAD FOR ALREADY-PROVISIONED STORAGE
While NetApp recommends considering the settled/nomad setting initially, taking the sizing and lifetime of storage into account, it is also possible to implement it later in a planned downtime window. If NFS-attached storage should be migrated, existing volumes can be adopted by a vFiler entity. Because the vFiler entity has its own IP address, the clients must remount the attached storage.
SETTLED/NOMAD-LIKE SETTING WITH SHORT/LONG-TERM DATA PAIRING
In the previous section, the settled/nomad pattern was described as a way to mitigate organic data growth. The same effect can be achieved when storage is identified as belonging to instances that are going to be deprovisioned at their end of life. Taking into account the expected lifetime of provisioned storage allows you to plan deprovisioning in advance. This relaxes the dependence on aggregates outside planned downtime windows, without the technical requirements of a settled/nomad setting.
ONLINE MIGRATION IN VIRTUALIZED ENVIRONMENTS
Online migration features in a virtualization hypervisor provide a further alternative for implementing a responsive scheme that reacts to data growth. For example, VMware® Storage VMotion™ is capable of transferring a virtual machine, including its storage, when it is attached using a datastore. Storage of virtual machines served by a NetApp datastore can be migrated to another NetApp-served datastore by migrating each virtual machine. In such cases, a nomad can simply be implemented by a NAS- or SAN-attached datastore.
In contrast to a data transfer based on SnapMirror directly between NetApp storage controllers, the migration traffic flows through the hypervisor. This might have consequences for the execution of the virtual machines. Also, the NetApp storage efficiency savings cannot be exploited during the transfer; deduplication savings are regained by executing the deduplication process on the destination storage controller.
4 OPERATION

This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among the following phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to prevent endangering the SLAs. The point is to detect situations that will violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications. NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities, and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks from storage that were committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into the time available to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service. Furthermore, data integrity can be at risk. Consider the following scenarios:
  − An application wants to write to committed storage but fails (NAS and SAN). For applications, this looks like a storage failure and implies service disruption. Data integrity can be at risk.
  − An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of
− insufficient space within the volume in which the storage object is contained, or
− insufficient free space within the aggregate in which the storage object and its volume are contained.
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provision storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
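As an illustration, the phase transitions above can be sketched as a simple decision function on the aggregate capacity used metric. The threshold values here are invented examples for the sketch, not NetApp defaults; choose your own values as described in the following sections.

```python
# Illustrative phase thresholds (percent of aggregate capacity used).
PROVISION_LIMIT = 70   # below this, new storage may be provisioned
EXTEND_LIMIT = 85      # below this, already-provisioned storage may still be extended
MITIGATE_LIMIT = 90    # above this, a mitigation activity is required


def phase(aggregate_used_pct):
    """Map aggregate capacity used (%) to the current operational phase."""
    if aggregate_used_pct < PROVISION_LIMIT:
        return "provision"
    if aggregate_used_pct < EXTEND_LIMIT:
        return "organic-growth"
    if aggregate_used_pct < MITIGATE_LIMIT:
        return "organic-growth-no-extension"
    return "mitigate"
```

A monitoring loop would evaluate this function per aggregate and raise an alarm whenever the returned phase changes.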
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and can raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds, or by using the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important because they are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. In that case, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the aggregate block use metric triggers an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the committed storage metric triggers an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
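The aggregate thresholds above act on two simple metrics, both expressed relative to the usable capacity of the aggregate. A minimal sketch of the arithmetic (field names are illustrative, not Operations Manager API names):

```python
def aggregate_metrics(usable_gb, used_gb, committed_gb):
    """Compute the two aggregate metrics the thresholds act on.

    usable_gb:    physically usable capacity of the aggregate
    used_gb:      blocks actually consumed by data
    committed_gb: storage promised to applications (can exceed usable_gb
                  on a thin-provisioned, overcommitted aggregate)
    """
    return {
        "capacity_used_pct": 100.0 * used_gb / usable_gb,
        # A value above 100 means the aggregate is overcommitted.
        "space_committed_pct": 100.0 * committed_gb / usable_gb,
    }
```

For example, an aggregate with 1000 GB usable capacity, 450 GB used, and 1100 GB committed is 45% full but 110% committed, which would trip a nearly overcommitted threshold set at 110%.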
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. Trending is an important feature for all storage objects of fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of the time-to-full estimate is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
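The underlying estimate can be sketched as a least-squares regression over daily capacity samples. This is an illustration of the idea, not the exact Operations Manager algorithm; note that, as stated above, the estimate is measured against the usable capacity, not against a threshold.

```python
def days_to_full(samples, usable_gb):
    """Estimate days until an aggregate is full from daily capacity samples.

    samples: list of (day_index, used_gb) pairs, e.g. one sample per day
             for up to 90 days. A least-squares line is fitted through them.
    Returns None when the trend is flat or negative (no meaningful prediction).
    """
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # growth in GB per day
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None
    last_day = max(d for d, _ in samples)
    current = intercept + slope * last_day
    return (usable_gb - current) / slope
```

For instance, an aggregate growing 10 GB per day with 80 GB of usable capacity remaining yields an estimate of 8 days to full.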
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends ordering the view by growth rate, descending, or by time to full, increasing, in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overridden with more specific ones. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. A selected aggregate can be configured using the Edit Settings link and dialog; a selected volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument that keeps the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows easy alignment with a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a host name or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. The mapping between the detected situation and the responsible operational group must therefore be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed to deliver the information to the infrastructure or system of choice. A script can implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
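Such an adapter script can be very small. The following sketch assumes, for illustration only, that the event name reaches the script via its command line and routes it to an operational group; the ROUTING table, the ticket format, and the calling convention are invented for this example. Consult the Operations Manager documentation for the exact interface between dfm and user-defined scripts.

```python
"""Hypothetical adapter script started by a dfm alarm (sketch)."""
import sys

# Map Operations Manager event names to operational groups in the
# ticketing system (illustrative routing table).
ROUTING = {
    "aggregate-almost-full": "storage-operations",
    "aggregate-almost-overcommitted": "storage-capacity-planning",
}


def build_ticket(event_name):
    """Turn an event name into a minimal ticket record."""
    group = ROUTING.get(event_name, "storage-operations")
    return {"queue": group, "subject": "Operations Manager event: " + event_name}


def main(argv):
    # Assumption: the event name is the first command-line argument.
    event = argv[1] if len(argv) > 1 else "unknown-event"
    sys.stdout.write(str(build_ticket(event)) + "\n")
```

The routing performed here is exactly the mapping between detected situation and responsible group mentioned in the SNMP note above, implemented on the script side instead of in the ticketing system.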
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To resolve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler units as well as MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager can perform mitigation alternatives 3 to 6 online for secondary storage.
Table 8) Mitigation alternatives to control use within aggregates
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time)
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity instead.
Note: Some of these mitigation alternatives depend on and affect the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility of online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends starting conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified by an alarm on the Operations Manager event aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper bound of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
(The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is between 0% and 50% and aggregate space committed is between 0% and 110%. Beyond those values, capacity is assessed and thresholds are adapted. Mitigation starts when aggregate capacity used exceeds 65% or aggregate space committed exceeds 120%.)
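The transition logic of this sample setting can be summarized in code. The 50%, 65%, and 110% values come from the alarm configuration described above; the 120% committed-space limit is taken from Figure 22. This is a sketch of the decision table, not a NetApp tool.

```python
def setting1_action(capacity_used_pct, space_committed_pct):
    """Decision logic of sample setting 1 (thresholds from the text above)."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning; assess capacity and adapt thresholds"
    return "provision new storage"
```

Either metric alone can force the transition, so a moderately full but heavily overcommitted aggregate is treated as conservatively as a nearly full one.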
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by a multiple.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool; the aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that the trend excludes the time frame in which the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable; at first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
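The trend derivation and the backward threshold calculation described above can be sketched numerically. The following is a minimal sketch, assuming daily capacity samples exported from Operations Manager as (day, used GB) pairs; the function and variable names are ours, not part of any NetApp tool:

```python
def daily_growth_rate(samples, exclude=()):
    """Least-squares slope (GB/day) of (day, used_gb) samples, skipping
    day indices in `exclude` (e.g., the zero fat reconfiguration window)."""
    pts = [(d, u) for d, u in samples if d not in exclude]
    n = len(pts)
    sx = sum(d for d, _ in pts)
    sy = sum(u for _, u in pts)
    sxx = sum(d * d for d, _ in pts)
    sxy = sum(d * u for d, u in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

def phase_thresholds(capacity_gb, daily_growth_gb, mitigation_days,
                     comfort_pct=80.0):
    """Work backward: the red line leaves enough room for one mitigation
    activity to show effect; the yellow line leaves another reaction
    period of headroom before the red line is reached."""
    reaction_pct = 100.0 * daily_growth_gb * mitigation_days / capacity_gb
    red = min(comfort_pct, 100.0 - reaction_pct)
    yellow = max(0.0, red - reaction_pct)
    return yellow, red

# Example: 10 TB aggregate growing ~20 GB/day, with an anomalous window
# around days 10-12 that would distort the trend if included.
usage = [(d, 5000 + 20 * d + (300 if 10 <= d < 13 else 0)) for d in range(60)]
growth = daily_growth_rate(usage, exclude=range(10, 13))
yellow, red = phase_thresholds(10000, growth, 30)
```

Excluding the reconfiguration window keeps the one-time jump from distorting the regression, mirroring the advice in step 3.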
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense; free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job; also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
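To avoid copy-and-paste errors when trimming many volumes, the four command sequences above can be generated from one helper. This is a sketch under our own naming, not a NetApp tool; the emitted commands are exactly those listed in step d:

```python
def zero_fat_commands(volume, max_size, increment, san=False,
                      autodelete=False, lun=None):
    """Emit the Data ONTAP command sequence for a zero fat volume,
    mirroring the NAS/SAN variants with and without Snapshot autodelete."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        # SAN volumes additionally drop the Snapshot reserve.
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

Printing the result of `zero_fat_commands(...)` yields a sequence that can be pasted into the storage controller console or fed to an SSH loop over many volumes.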
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
4 OPERATION This section focuses on the operation and management of overcommitted storage. The goal of management is to fulfill the SLAs of the stored data while achieving a high level of efficiency. It addresses questions of how to detect situations that need manual assistance, how to raise the awareness of the operational staff, and how to resolve situations that arise.
We first consider situations that put SLA fulfillment at risk. Then we focus on actions that can be taken to avoid further aggravation by making the situation evident and presenting mitigation alternatives.
Consider the operational process as a loop that monitors and evaluates the current situation and triggers the transition of a storage resource among phases:
• Provision storage.
• Leave room for organic growth. It might be desirable to still allow extending the storage of previously provisioned applications.
• Reduce storage use with mitigation alternatives such as deletion, data motion, and so on.
These transitions must occur within a specified time frame to optimize operational flexibility and to avoid endangering the SLAs. The point is to detect situations that would violate the SLAs in the future.
SITUATIONS PUTTING SLA FULFILLMENT AT RISK
Over time, more and more data is stored and processed by the provided applications, and NetApp storage efficiency technologies compensate for this growth. To prevent running out of physical resources, usage must be managed within safe boundaries. This makes sure the operations team has enough time to react with the appropriate mitigation strategy.
The following list summarizes situations that are critical for service delivery:
• Running out of time. Some mitigation alternatives must be triggered in advance, and a passage of time might be needed for their effect to become evident. This lead time determines the number of mitigation alternatives that can still be considered at a certain point.
• Running out of mitigation alternatives. Several mitigation alternatives exist to control the usage. However, some alternatives are one-time activities and some must be performed within a certain time frame. Depending on the situation, not all alternatives might be available for use.
• Running too tight on storage. Over time, applications use more and more of the blocks of the storage that was committed to them. This forces Data ONTAP to allocate from a pool of free blocks. Assuming data growth, the size of the free block pool directly translates into available time to react.
• Running out of storage completely. This must be prevented because it has a high impact on the availability of the service; furthermore, data integrity can be at risk. Consider the following scenarios:
– An application wants to write to committed storage but fails (NAS/SAN). For the application this looks like a storage failure and implies service disruption. Data integrity can be at risk.
– An application wants to allocate new storage but fails (NAS). The application is confronted with a "No space left on device" exception. Verify the application behavior on this exception; most applications can deal with this situation, and data integrity is not at risk.
Two cases need to be differentiated when mitigation is necessary to resolve a situation of tight storage. Storage for an object such as a LUN or a share can be tight because of:
– insufficient space within the volume in which the storage object is contained
– insufficient free space within the aggregate in which the storage object and its volume are contained
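The observation above, that the size of the free block pool translates into available time to react, can be made concrete. A minimal sketch, assuming a known free pool and a daily growth rate taken from Operations Manager trending (names are ours):

```python
def days_to_react(free_pool_gb, daily_growth_gb):
    """Translate the aggregate's free block pool into the time the
    operations team has before storage runs out completely."""
    if daily_growth_gb <= 0:
        return float("inf")  # no growth: no deadline
    return free_pool_gb / daily_growth_gb

# Example: 600 GB of free blocks consumed at 20 GB/day leaves 30 days
# in which a mitigation alternative must be chosen and completed.
window = days_to_react(600, 20)
```

Comparing this window against the preparation time and time-to-show-effect of each mitigation alternative tells you which alternatives are still realistic.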
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making a decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making a decision to transition to the next or the prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to put the storage resource back into the preferred operational corridor. Monitoring should support making a decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software vendors.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example to notify the operational team that an alarm situation exists; the thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page, following Setup → Options → Default Thresholds or the link http://<opsmgrserver>:<port>/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits, and Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important: they are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, this can have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken during the lifetime of the data, mitigation actions must be performed in scheduled downtime windows; threshold settings and actions then tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. The metric refers to the amount of storage that is assigned to applications; it represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge when a volume was extended using the autogrow functionality.
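The aggregate thresholds above act on two simple percentages: block use and committed storage, both relative to usable capacity. The following sketch shows the underlying arithmetic; the function names and the threshold defaults are illustrative assumptions, not NetApp recommendations:

```python
def aggregate_metrics(usable_gb, used_gb, committed_gb):
    """Percentages behind the aggregate full and overcommitted thresholds."""
    return {
        "used_pct": 100.0 * used_gb / usable_gb,            # (nearly) full
        "committed_pct": 100.0 * committed_gb / usable_gb,  # (nearly) overcommitted
    }

def breached(metrics, full=90.0, nearly_full=85.0,
             overcommitted=300.0, nearly_overcommitted=250.0):
    """Return the names of thresholds the current metrics exceed."""
    events = []
    if metrics["used_pct"] >= full:
        events.append("aggregate-full")
    elif metrics["used_pct"] >= nearly_full:
        events.append("aggregate-almost-full")
    if metrics["committed_pct"] >= overcommitted:
        events.append("aggregate-overcommitted")
    elif metrics["committed_pct"] >= nearly_overcommitted:
        events.append("aggregate-almost-overcommitted")
    return events
```

Note that the committed percentage can legitimately exceed 100% on a thin-provisioned aggregate; it measures consolidation, not physical consumption.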
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size: it allows you to estimate the time frame within which a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://<opsmgrserver>:<port>/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
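Because the reported days-to-full counts toward 100% of usable capacity, it can be re-based onto your own aggregate full threshold. A sketch of the conversion, assuming a constant daily growth rate (the function name is ours):

```python
def rebase_days_to_full(reported_days, usable_gb, daily_growth_gb,
                        threshold_pct):
    """Subtract from the reported days-to-full the days that would be
    spent above your own aggregate full threshold, yielding the days
    until the threshold itself is hit."""
    margin_gb = usable_gb * (100.0 - threshold_pct) / 100.0
    return reported_days - margin_gb / daily_growth_gb

# Example: a 1000 GB aggregate growing 10 GB/day reports 30 days to
# full; against a 90% threshold, only 20 days remain.
days_at_threshold = rebase_days_to_full(30, 1000, 10, 90)
```

This keeps the dashboard number useful even when your alarms fire well before the aggregate is physically full.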
The trending on the volume level is analogous to the trending on the aggregate level. In your Operations Manager instance, access the link http://<opsmgrserver>:<port>/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending, or by time to full increasing, in order to focus on the relevant candidates.
On the volume level you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful to signal unusual behavior concerning storage consumption and to point the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. A concrete aggregate can be configured using the Edit Settings link and dialog; a concrete volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters drilled down by utilization, capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. We therefore characterize the mitigation activities by required skill set and time to act, which allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios: a user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
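The executed script receives the event details and forwards them to the system of choice. The following sketch of such an adapter assumes the event name and source arrive as command-line arguments; that layout is our assumption, so verify the actual interface of your DFM version before relying on it:

```python
import json
import sys

def format_event(argv):
    """Build a ticket payload from the arguments handed to the adapter
    script (argument layout is an assumption; check your DFM version)."""
    event = argv[1] if len(argv) > 1 else "unknown-event"
    source = argv[2] if len(argv) > 2 else "unknown-source"
    return json.dumps({
        "summary": f"{event} on {source}",
        "queue": "storage-operations",
        "urgency": "high" if "full" in event else "normal",
    })

if __name__ == "__main__":
    # A real adapter would POST this payload to the ticketing system's
    # API instead of printing it.
    print(format_event(sys.argv))
```

The queue name and urgency mapping are placeholders to be replaced by whatever routing rules your ticketing system expects.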
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor; the effect of a mitigation activity should return the usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth; to resolve tightness in this setting, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to resolve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in a uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate, within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migration.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|--------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Data ONTAP 7.x: low limits; Data ONTAP 8: high limits | None | HW procurement | Immediate (+ rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|--------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS This section summarizes two different operational settings. The first one does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific To exploit NetApp storage efficiency features in your own data center NetApp recommends that you start conservatively After you are familiar with the process work toward the customer-specific optimum
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting a customer started with It makes use of a limited set of mitigation alternatives This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped A settlednomad setting is not considered Thus the threshold to signal a transition of the phases are set lower and more conservatively for this customer Because on-line data migration and aggregate extension are not available as a mitigation alternative sufficient available space is required to safely reach the next planned downtime window as shown in Figure 21 In practice refer to the aggregate days to full trend value to get an idea of available days to full based on past data growth
bull All storage is provisioned using the zero fat option with growable FlexVol volumes Only aggregate monitoring is used
bull Aggregate extension is not a mitigation alternative bull Online migration is not a mitigation alternative
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate
nearly full threshold (event configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is exceeded, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
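The phase transitions of this setting can be reduced to a small decision rule. The following sketch (a hypothetical helper, not a NetApp tool) encodes the two-metric thresholds used here; adapt the numbers to your own growth rates:

```python
def classify_aggregate(used_pct, committed_pct):
    """Map the two aggregate metrics of sample setting 1 onto a phase.

    Thresholds follow this section (50%/65% capacity used,
    110%/120% space committed); they are a starting point, not fixed values.
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"      # plan data migration for the next downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"        # stop provisioning; leave aggregate for organic growth
    return "provision"         # new storage may still be provisioned

# example: aggregate at 55% used and 105% committed
print(classify_aggregate(55, 105))  # -> assess
```

Feeding the function with the current Operations Manager readings for each aggregate gives a quick overview of which phase each aggregate is in.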
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
(Figure 22 plots aggregate capacity against time: the operational sweet spot corridor spans aggregate capacity used 0–50% and aggregate space committed 0–110%. Within the corridor, new storage is provisioned; above it, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation starts.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high utilization and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, thanks to the flexibility gained with online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds that describe the transitions between phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
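Because only one metric drives this setting, the transitions in Table 10 reduce to a single lookup. The sketch below is a hypothetical helper using the thresholds from the table:

```python
def nomad_phase(used_pct):
    """Map aggregate capacity used onto the actions of Table 10.

    Thresholds (70%/85%/90%) come from this sample setting and
    should be tuned against the observed days-to-full trend.
    """
    if used_pct > 90:
        return "migrate nomad"            # relax utilization via online migration
    if used_pct > 85:
        return "stop extending storage"
    if used_pct > 70:
        return "stop provisioning"
    return "normal operation"

print(nomad_phase(92))  # -> migrate nomad
```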
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
(Figure 24 plots aggregate capacity used against time: at 0–70% new storage is provisioned, at 70–85% already provisioned storage may still be extended, and above 90% utilization is relaxed by moving a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
(Figure 25 plots committed capacity and capacity used against elapsed time over the three steps below, with markers at one month and three months, an overall trend line, and a last-3-month trend line.)
As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25; each change in the volume configuration can usually be detected. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
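Steps a through d amount to simple arithmetic: the free space to reserve is the growth rate multiplied by the interval between downtime windows, plus a margin. A sketch (the 20% safety factor is an assumption for illustration, not a NetApp recommendation):

```python
def required_free_space_gb(daily_growth_gb, days_between_downtimes, safety_factor=1.2):
    """Minimum aggregate free space needed to ride out organic growth
    between two planned downtime windows.

    daily_growth_gb: trend value taken from Operations Manager
    days_between_downtimes: agreed interval between downtime windows
    safety_factor: headroom margin (assumed value, tune to your environment)
    """
    return daily_growth_gb * days_between_downtimes * safety_factor

# example: 5 GB/day growth, 90 days between downtime windows
print(required_free_space_gb(5, 90))  # -> 540.0
```

The result, compared with the aggregate's current free space, tells you whether the chosen thresholds leave enough room until the next window.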
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing aggregates so that each can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a recurring time frame of low activity to schedule the deduplication job. Alternatively, schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. That way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes utilization and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% of the aggregate's usable capacity.
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
The following sections focus on how to detect that a change is necessary and that a storage resource should be transitioned to another phase. We describe the different phases, how monitoring can support their detection, and how this information can be made known to operational groups.
4.1 PHASES AND TRANSITIONS
This section outlines the phases of a storage resource. Starting with an empty aggregate, storage is provisioned up to certain thresholds. After that, storage is left for organic growth. After certain thresholds are exceeded, further inspection or activities must be performed to mitigate storage tightness.
• Provisioning storage. When certain thresholds are within a defined range, storage is provisioned to the aggregates. Monitoring should support making the decision to transition to the next phase.
• Leave storage for organic growth. When certain thresholds are exceeded, provisioned storage is left for organic growth. Depending on the environment, storage of existing applications might still be extended, and a second threshold might signal that extensions are no longer possible. Monitoring should support making the decision to transition to the next or prior phase.
• Mitigate storage use. When certain thresholds are exceeded, this phase must make sure that committed storage can be delivered to store application data. The effect of a mitigation activity should be to bring the storage resource back into the preferred operational corridor. Monitoring should support making the decision to transition back to the organic growth phase.
4.2 MONITORING
NetApp Operations Manager delivers comprehensive monitoring and management for NetApp shared storage. It provides alerts, reports, performance monitoring, and configuration tools to keep the NetApp storage infrastructure in line with business requirements for maximum availability and efficiency. NetApp Operations Manager provides a single human interface and an application programming interface (API) for integration with third-party management and orchestration software.
Operations Manager monitors the NetApp shared storage infrastructure and is able to raise awareness of certain situations. Events can be set to trigger an action when operational parameters are within a certain range and indicate a relevant situation. When the event triggers, an alarm notification can be sent by e-mail, pager, Simple Network Management Protocol (SNMP), or customized scripts. To raise awareness of a certain situation, the event must be characterized using the metrics provided by Operations Manager. To communicate the event, an alarm must be set.
THRESHOLDS
Operations Manager monitors relevant parameters that indicate the presence of specified situations. Thresholds can be set to trigger actions, for example, to notify the operational team that an alarm situation exists. The thresholds can be set to notify in advance. Operations Manager also performs trending on operational parameters to express the urgency of a certain situation. This supports the decision making on how to react.
Within your Operations Manager instance, the thresholds can be verified and set by navigating to the Default Thresholds page via Setup → Options → Default Thresholds, or via the link http://opsmgrserver:port/dfm/edit/options. Figure 17 shows a sample configuration page.
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, it could have direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric of committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
bull Volume autosized This event notifies a person in charge when a volume was extended using the autogrow functionality
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time remaining until a certain situation must be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down into, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
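The trend computation itself is an ordinary least-squares fit. The following sketch reproduces the idea of the days-to-full estimate; it is an illustration of the principle, not the exact Operations Manager algorithm:

```python
def days_to_full(samples, usable_capacity_gb):
    """Estimate days until an aggregate is full from daily capacity samples.

    samples: list of (day_index, used_gb) pairs.
    The trend is a least-squares line; the estimate is against 100% of the
    usable capacity, not against the aggregate full threshold setting.
    Returns None when there is no growth on the fitted trend.
    """
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # GB per day
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None  # shrinking or flat: never full on this trend
    current = slope * samples[-1][0] + intercept
    return (usable_capacity_gb - current) / slope

# 10 GB/day growth, 1000 GB usable, currently at 400 GB
print(days_to_full([(0, 300), (5, 350), (10, 400)], 1000))  # -> 60.0
```

Comparing estimates fitted over different sample windows is a quick way to spot the interval deviations mentioned above.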
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate, descending, or by time to full, ascending, in order to focus on the relevant candidates.
On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior in storage consumption and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example, by using the links already provided in this technical report. A concrete aggregate can be configured using the Edit Settings link and dialog. A concrete volume's configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition between phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Furthermore, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
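Such an adapter script can be very small. The sketch below is a hypothetical Python adapter; how Operations Manager hands the event details to the script (environment variables, arguments) is version specific, so the variable names used here are illustrative assumptions:

```python
"""Minimal alarm adapter sketch for the dfm script notification above.

The environment variable names EVENT_NAME and EVENT_SOURCE are
assumptions for illustration; check your Operations Manager version
for the actual hand-over mechanism.
"""
import os
import sys


def handle_event(event_name, source):
    # Glue code: forward the event to a ticketing system, chat channel,
    # or logging pipeline of choice. Here we simply format and emit it.
    line = "ALARM %s on %s" % (event_name, source)
    sys.stdout.write(line + "\n")
    return line


if __name__ == "__main__":
    handle_event(os.environ.get("EVENT_NAME", "aggregate-almost-full"),
                 os.environ.get("EVENT_SOURCE", "unknown"))
```

The `handle_event` body is where the integration with the customer infrastructure would go, for example an HTTP call into the ticketing system.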
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should be to return the usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks in the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate's Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. A volume can be migrated from one aggregate to another within the same or a different storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data-center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium to high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low to high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in volumes, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|---------------------|---------------|------------|------------------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements the settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data over time in months; data grows organically between two planned downtime windows.)
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth until the next agreed planned downtime window. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
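The phase logic of this setting can be sketched as a small function. The threshold values (50%/110% to stop provisioning, 65%/120% to trigger mitigation) come from the text above; the function name and return strings are illustrative assumptions.

```python
def phase(capacity_used_pct, committed_pct):
    """Classify an aggregate for sample setting 1 based on the two metrics
    aggregate capacity used and aggregate space committed (both in percent)."""
    if capacity_used_pct > 65 or committed_pct > 120:
        return "mitigate"    # migrate data in the next planned downtime window
    if capacity_used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning, assess capacity, adapt thresholds
    return "provision"       # new storage may be provisioned; organic growth allowed

print(phase(40, 90), phase(55, 100), phase(70, 115))  # -> provision assess mitigate
```

Note that crossing either metric's threshold is sufficient to leave the current phase, which matches the "one or both thresholds" rule above.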
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is between 0% and 50% and aggregate space committed is between 0% and 110%; above those values, capacity is assessed and thresholds are adapted; above 65% capacity used or 120% space committed, mitigation takes place.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots settled and nomad data over time in hours, marking the point where the need to act is detected and the effect of the mitigation, e.g., a migration.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• The days-to-full aggregate trend was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
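Because this setting uses a single metric, the transitions of Table 10 reduce to a one-dimensional mapping. A sketch follows; the threshold values are taken from the table, while the function name and return strings are illustrative.

```python
def setting2_action(aggregate_used_pct):
    """Map aggregate capacity used (percent) to the action of Table 10."""
    if aggregate_used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if aggregate_used_pct > 85:
        return "stop extending provisioned storage"
    if aggregate_used_pct > 70:
        return "stop provisioning of storage"
    return "normal operation"

print(setting2_action(95))  # -> relax resource situation and migrate a nomad
```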
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is between 0% and 70%; between 70% and 85%, already provisioned storage may still be extended but no new storage is provisioned; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time, from the first month through the following three months, together with the overall trend and the last-3-month trend.)
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected there. So far, only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
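Steps a through d can be combined into a simple headroom calculation. The sketch below assumes linear growth; all names and the 5% safety buffer are illustrative assumptions for this example, not NetApp-recommended values.

```python
def required_free_pct(daily_growth_pct, days_between_downtimes, buffer_pct=5.0):
    """Space (percent of aggregate capacity) needed to bridge organic growth
    between two planned downtime windows, plus a safety buffer."""
    return daily_growth_pct * days_between_downtimes + buffer_pct

def max_safe_use_pct(comfort_limit_pct, daily_growth_pct, days_between_downtimes):
    """Highest aggregate use at which the comfort limit (step a, e.g., 80%)
    is not crossed before the next planned downtime window."""
    return comfort_limit_pct - required_free_pct(daily_growth_pct,
                                                 days_between_downtimes)

# Example: 0.05% growth per day, downtime windows 90 days apart, 80% comfort limit
print(max_safe_use_pct(80, 0.05, 90))  # -> 70.5
```

In this example, provisioning would need to stop once the aggregate exceeds about 70% used, leaving 9.5% of headroom until the next downtime window.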
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job; alternatively, use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full fat or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
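Because days-to-full trending reports against 100% capacity used, the figure must be rescaled when your effective limit is a lower threshold such as the 80% comfort level. Assuming the same linear growth rate, the conversion can be sketched as follows (names are illustrative):

```python
def days_to_threshold(days_to_full, used_pct, threshold_pct):
    """Convert a days-to-full value (measured against 100% capacity used)
    into the days remaining until a configured threshold is crossed,
    assuming linear growth at the same rate."""
    if used_pct >= threshold_pct:
        return 0.0
    return days_to_full * (threshold_pct - used_pct) / (100.0 - used_pct)

# An aggregate at 60% used with 200 days to full crosses an 80% threshold in:
print(days_to_threshold(200, 60, 80))  # -> 100.0
```

In other words, a comfortable-looking days-to-full value can correspond to considerably fewer days of actual operational headroom.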
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
Figure 17) Operations Manager screen to configure thresholds on operational metrics
For aggregates, Operations Manager provides the set of thresholds described in the following list. They represent absolute limits. Operations Manager alarms can be used to notify operational staff and raise awareness of a specific situation.
Monitoring the aggregates is very important. They are the physical containers of the preallocated and growable storage objects that host application data. If an aggregate of a storage controller runs at uncontrolled usage, there could be direct consequences for the applications for which it provides data.
The concrete settings for these thresholds depend on the time necessary to relax aggregate block usage. When no mitigation alternatives should or can be taken over the lifetime of the data, mitigation actions must be performed in scheduled downtime windows. Thus, threshold settings and actions tend to be more conservative, to avoid SLA-endangering situations.
• Aggregate full threshold. This threshold on the metric aggregate block use allows triggering an alarm that notifies a person in charge.
• Aggregate nearly full threshold. This threshold is the counterpart of the aggregate full threshold but provides an earlier notification.
• Aggregate overcommitted threshold. This threshold on the metric committed storage allows triggering an alarm that notifies a person in charge. This metric refers to the amount of storage that is assigned to applications. It represents the level of consolidation, as well as the width and increase of the block use corridor.
• Aggregate nearly overcommitted threshold. This threshold is the counterpart of the aggregate overcommitted threshold but provides an earlier notification.
Operations Manager also provides thresholds that can be used to alert operational staff when volumes are in a certain state:
• Volume full threshold. This event notifies a person in charge that the preset threshold on the volume use metric has been reached.
• Volume almost full threshold. This event is the counterpart of the volume full threshold but provides an earlier notification.
• Volume autosized. This event notifies a person in charge that a volume was extended using the autogrow functionality.
TRENDING
Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size. It allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://opsmgrserver:port/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of the trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
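The trend is a least-squares linear regression over daily samples; the idea can be sketched as follows. This is a simplified stand-in for illustration, not Operations Manager's actual implementation.

```python
def linear_trend(daily_samples):
    """Least-squares fit of capacity-used samples (percent, one per day).
    Returns (daily_growth_pct, days_until_full counted from the last sample)."""
    n = len(daily_samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_samples) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_samples))
             / sum((x - mean_x) ** 2 for x in xs))
    days_to_full = (100.0 - daily_samples[-1]) / slope if slope > 0 else float("inf")
    return slope, days_to_full

# Five days of samples growing 0.5% per day, currently at 52% used:
growth, dtf = linear_trend([50.0, 50.5, 51.0, 51.5, 52.0])
print(round(growth, 2), round(dtf, 1))  # -> 0.5 96.0
```

As the text recommends, run the fit over different intervals; if the resulting slopes deviate significantly, the growth is not steady and the days-to-full estimate should be treated with caution.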
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager
Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
Trending at the volume level is analogous to trending at the aggregate level. In your Operations Manager instance, access the link http://opsmgrserver:port/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or by time to full ascending, in order to focus on the relevant candidates.
At the volume level, you can set an alarm to fire when volume growth is outside the usual boundary:
• Abnormal volume growth. This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and for pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select your aggregate or volume of choice, for example by using the links already provided in this technical report. When a concrete aggregate is selected, it can be configured using the Edit Settings link and dialog. When a concrete volume is selected, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods of sending a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://opsmgrserver:port/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
37 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
44 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk This section focuses on mitigation activities to preserve flexibility by controlling use within its defined corridor The effect of a mitigation activity should return the usage to its defined corridor
Storage tightness might occur in aggregates or volumes depending on their configuration When all volumes in an aggregate are thin provisioned with the zero fat configuration they use the shared pool of free blocks of the aggregate to deal with data growth To solve this situation a mitigation activity on the aggregate level is necessary
When storage objects in a fixed size volume cannot grow to the committed space a mitigation activity on the volume level is necessary to solve upcoming volume tightness
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using zero fat configuration They might grow on demand however because they live within an aggregate of physically limited size the growth of the storage object itself is also limited As described in the following list providing usable space in the aggregate automatically allows contained storage objects to grow
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate so that others can make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and a MultiStore and SnapMirror license. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Medium-high | Next planned downtime window | Minutes (volume switchover time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
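The trade-offs in Table 8 can be encoded so that a runbook or script proposes candidate mitigations automatically. The following Python sketch is illustrative only; the entries simply transcribe the table, with SLA impact ranked 0 (none) to 3 (high):

```python
# Transcription of Table 8: (number, activity, repeatable, sla_impact)
MITIGATIONS = [
    (1, "increase aggregate capacity by adding disks", True,  0),
    (2, "decrease aggregate Snapshot copy reserve",    False, 0),
    (3, "shrink other volumes in the aggregate",       False, 1),
    (4, "run deduplication and shrink volumes",        True,  1),
    (5, "migrate nomads online",                       True,  1),
    (6, "migrate volumes to a different aggregate",    True,  3),
    (7, "stop the application, then migrate",          True,  3),
]

def candidate_mitigations(max_sla_impact, already_used=()):
    """Return table rows whose SLA impact is acceptable, skipping
    one-time activities that were already performed."""
    return [
        (no, activity)
        for no, activity, repeatable, impact in MITIGATIONS
        if impact <= max_sla_impact and (repeatable or no not in already_used)
    ]
```

For example, if the Snapshot copy reserve was already reduced and only low-impact activities are acceptable, the helper proposes alternatives 1, 3, 4, and 5.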
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on, and affect, used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low (possible performance impact) | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first one makes use of neither online data migration nor the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.

5.1 SAMPLE SETTING 1: REAL-LIFE SETTING

This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should stay constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of the phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows.
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision about migrating data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
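The decision logic of this setting can be sketched in a few lines. The thresholds transcribe the prose above (50% used and 110% committed to stop provisioning, 65% used to plan a migration); this is an illustrative sketch of the policy, not an Operations Manager configuration:

```python
def sample_setting_1_actions(capacity_used, space_committed):
    """Decide operational actions for sample setting 1.

    capacity_used:   aggregate capacity used, as a fraction (0.50 = 50%)
    space_committed: aggregate committed space, as a fraction (1.10 = 110%)
    """
    return {
        # New storage may be provisioned only inside the sweet spot corridor.
        "provision_new_storage":
            capacity_used <= 0.50 and space_committed <= 1.10,
        # Crossing either threshold triggers an assessment of the situation.
        "assess_and_adapt_thresholds":
            capacity_used > 0.50 or space_committed > 1.10,
        # Above the aggregate full threshold, plan migration for the
        # next planned downtime window.
        "plan_migration_next_downtime": capacity_used > 0.65,
    }
```

At 40% used and 100% committed, provisioning continues; at 70% used, both the assessment and the migration planning are triggered.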
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. The figure maps bands of aggregate capacity used (0-50%, greater than 65%) and aggregate space committed (0-110%, greater than 120%) to the actions: provisioning new storage, assessing capacity and adapting thresholds, and mitigating.
5.2 SAMPLE SETTING 2: SETTLED/NOMAD

This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration makes it unnecessary to take a further metric into account, for example, storage overcommitment.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.

The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
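The phase transitions in Table 10 amount to a simple threshold ladder, which a monitoring hook could evaluate as follows. The sketch is illustrative; the threshold values are taken directly from the table:

```python
def phase_actions(aggregate_capacity_used):
    """Map aggregate capacity used (a fraction, e.g. 0.88 = 88%)
    to the mitigation actions from Table 10.

    Each threshold that is crossed adds its action, so at 88% both
    provisioning and extension of storage are stopped."""
    ladder = [
        (0.70, "stop provisioning of new storage"),
        (0.85, "stop extending provisioned storage"),
        (0.90, "relax the resource situation and migrate a nomad"),
    ]
    return [action for threshold, action in ladder
            if aggregate_capacity_used > threshold]
```

Below 70%, the list is empty and the aggregate stays in normal operation.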
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. For the bands 0-70%, 70-85%, and greater than 90%, the figure indicates whether new storage is provisioned, already provisioned storage is extended, or utilization is relaxed by moving a nomad with NetApp Data Motion.
You can achieve very high data consolidation in this setting using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.

Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. The figure plots committed capacity and capacity used over elapsed time, together with the overall trend and the last 3-month trend.
As a general rule, we do not introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:

1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
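Steps a through d can be combined into a back-of-the-envelope computation. The following sketch is illustrative, and the numbers plugged in below are invented for the example:

```python
def organic_growth_threshold(aggregate_capacity_gb, daily_growth_gb,
                             days_between_downtimes, comfort_cap=0.80):
    """Work backward from the growth rate (step c) and the downtime
    distance (step b) to a red threshold: the highest utilization that
    still leaves room for organic growth until the next planned downtime
    (step d), capped at the team's comfort level (step a)."""
    reserve_gb = daily_growth_gb * days_between_downtimes
    threshold = 1.0 - reserve_gb / aggregate_capacity_gb
    # Never start the red zone above the comfort level.
    return min(threshold, comfort_cap)

# Example: a 10,000 GB aggregate growing 30 GB/day with planned
# downtimes 90 days apart needs a 2,700 GB reserve, so the red
# threshold starts at 73% rather than the 80% comfort cap.
```

The attention area (yellow) would then sit just below the returned value, wide enough to cover the time the chosen mitigation alternatives need to show effect.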
To provision storage, follow these steps:

1. Create big aggregates to enable shared storage in your data center. We recommend sizing in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
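Because the four command sequences in this section differ in only a few lines, a small generator can keep provisioning scripts consistent. This Python sketch merely assembles the command strings shown above; the volume, size, and LUN values are placeholders supplied by the caller:

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Assemble the zero fat configuration sequence for one volume,
    mirroring the four command listings in this section."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

The generated strings would still be executed on the storage controller console (or via a remote shell); the generator only removes the risk of the four variants drifting apart in scripts.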
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
TRENDING

Operations Manager 4.0 supports a variety of trending features for certain storage objects. This is an important feature for all storage objects with a fixed size because it allows you to estimate the time frame within which a certain situation needs to be mitigated. The trend is calculated as a linear regression over up to 90 days in the past. For aggregates, Operations Manager calculates a trend on the daily growth rate. In your Operations Manager instance, use the link http://<opsmgrserver>:<port>/dfm/report/view/aggregates-growth-rates for trending of aggregate growth rates and the estimated remaining time until the storage object is full. Each aggregate can be drilled down, and you can select trending based on an interval of one day, one week, one month, three months, or one year. To see the effect of recent data activities, set the interval of a trend calculation to enclose this activity. Investigate whether growth rates calculated over different intervals deviate significantly.
Figure 18) Trending of data growth and days-to-full prediction in Operations Manager.

Note: The calculation basis of time to full is the usable aggregate capacity. This value is not calculated based on the aggregate full threshold setting.
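The days-to-full estimate can be reproduced with a plain linear regression over historical capacity samples. The following Python sketch is an approximation of the idea, not Operations Manager's actual implementation; it fits a least-squares line to (day, used-GB) samples and extrapolates to 100% of usable capacity, matching the calculation basis noted above:

```python
def days_to_full(samples, usable_capacity_gb):
    """Estimate days until an aggregate is full.

    samples: list of (day_index, used_gb) observations, e.g. one per day.
    Returns the days from the last sample until the fitted trend line
    reaches 100% of usable capacity, or None if the trend is flat or
    shrinking (in which case "days to full" is unbounded).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Least-squares slope (GB per day) and intercept.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in samples)
             / sum((x - mean_x) ** 2 for x, _ in samples))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    day_full = (usable_capacity_gb - intercept) / slope
    return day_full - last_day
```

For example, an aggregate at 120 GB of 200 GB usable and growing 10 GB per day yields an estimate of 8 days to full. Running the same fit over different sample windows reproduces the interval comparison recommended above.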
Trending on the volume level is analogous to trending on the aggregate level. In your Operations Manager instance, access the link http://<opsmgrserver>:<port>/dfm/report/view/volumes-growth-rates for trending of volume growth rates. NetApp recommends that you order the view by growth rate descending or time to full ascending in order to focus on the relevant candidates.

On the volume level, you can set an alarm to fire when the volume growth is outside the usual boundary:

• Abnormal volume growth: This event notifies you when the growth rate of a volume exceeds a preset limit. It is helpful for signaling unusual behavior concerning storage consumption and pointing the operational staff to the right storage object.
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME

For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of your choice, for example, using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS

NetApp Operations Manager provides a dashboard to visualize storage efficiency returns in the NetApp shared storage infrastructure. This report lists important parameters drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further details on the information provided by this dashboard.

Figure 19) Storage efficiency dashboard in Operations Manager.
4.3 NOTIFICATION

Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.

After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.

Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows an easy alignment to a given organizational structure.

Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL

An alarm can be sent to multiple destinations by e-mail. Repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.

NetApp recommends using distribution lists or aliases with meaningful names rather than addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require you to make corresponding changes in Operations Manager.
NOTIFY BY SNMP

Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows setting up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full.
Note The SNMP event must be routed to the responsible groups or persons in the ticketing system Thus mapping the detected situation and responsible operational group must be implemented there
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios A user-defined adapter can be executed which delivers the information to the infrastructure or system of choice A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure To set an alarm on the event aggregate almost full which starts a script instrument Operations Manager on the command line
dfm alarm create ndashs script_to_execute ndashh aggregate-almost-full
37 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
44 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk This section focuses on mitigation activities to preserve flexibility by controlling use within its defined corridor The effect of a mitigation activity should return the usage to its defined corridor
Storage tightness might occur in aggregates or volumes depending on their configuration When all volumes in an aggregate are thin provisioned with the zero fat configuration they use the shared pool of free blocks of the aggregate to deal with data growth To solve this situation a mitigation activity on the aggregate level is necessary
When storage objects in a fixed size volume cannot grow to the committed space a mitigation activity on the volume level is necessary to solve upcoming volume tightness
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using zero fat configuration They might grow on demand however because they live within an aggregate of physically limited size the growth of the storage object itself is also limited As described in the following list providing usable space in the aggregate automatically allows contained storage objects to grow
1 Increase the aggregate You can add drives to aggregates during operation You can repeat this mitigation activity The maximum aggregate size depends on the Data ONTAP version the type of aggregate and the type of storage controller Aggregates with 64-bit supported with Data ONTAP 8 have very high limits Additional drives can be used immediately however their procurement needs to be taken into account Rebalancing data between existing and new drives results in a uniformly distributed use of the drives
2 Decrease aggregate Snapshot copy reserve This reserve is needed in MetroCluster and for SyncMirrorreg configurations In other configurations you can decrease this reserve or set it to zero
3 Shrink preallocated volumes Volumes with preallocated space reserve available aggregate-free space When possible these volumes can be shrunk returning the freed space to the aggregate to allow others to make use of the preallocated space
4 Enable deduplication and shrink the volume 5 If available migrate a nomad online to a different storage controller Doing this on the NetApp storage
controller level requires storage provisioning based on vFiler and a MultiStore and SnapMirror license Adequate free space on the aggregates of the target storage controller is required This mitigation activity is not limited in its applicability
6 A volume can be migrated from one aggregate to another aggregate within the same or another storage controller SnapMirror replicates the data while it is still served To switch over to the replicated data the client needs to detach from the source and reattach to the replica After completion the replica is considered the new source This operation has an impact on client downtime Typically inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before its data is migrated.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
38 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Table 8) Mitigation alternatives to control use within aggregates.

| No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect |
|-----|--------------------|---------------|------------|------------------|---------------------|
| 1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (plus rebalancing) |
| 2 | Decrease the aggregate's Snapshot copy reserve area, if possible | One time | None | None | Immediate |
| 3 | Shrink other volumes in the aggregate if they have enough free space | One time | Low | None | Immediate |
| 4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate |
| 5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time) |
| 6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time) |
| 7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time) |
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

| No. | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect |
|-----|--------------------|---------------|------------|-----------|---------------------|
| 1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate |
| 2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate |
| 3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate |
| 4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours |
| 5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time) |
| 6 | Stop the application and migrate its data | Repeatable | High | Coordinate with app owner | Minutes (migration time) |
5 REAL-LIFE SETTINGS

This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame, or when the physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition of phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
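With aggregate extension and online migration off the table, deciding whether an aggregate can safely reach the next planned downtime window reduces to a headroom calculation against the observed growth rate. The following sketch illustrates the idea; the function names and the linear-growth assumption are ours, not part of Operations Manager:

```python
# Hypothetical helper: check whether an aggregate has enough free space to
# absorb organic growth until the next planned downtime window.

def days_to_full(capacity_gb: float, used_gb: float, daily_growth_gb: float) -> float:
    """Days until the aggregate reaches 100% capacity used, assuming linear growth."""
    if daily_growth_gb <= 0:
        return float("inf")  # flat or shrinking: never full
    return (capacity_gb - used_gb) / daily_growth_gb

def safe_until_next_window(capacity_gb: float, used_gb: float,
                           daily_growth_gb: float, days_to_window: float) -> bool:
    """True if organic growth can be absorbed until the next planned downtime."""
    return days_to_full(capacity_gb, used_gb, daily_growth_gb) > days_to_window

# Example: 10 TB aggregate, 6 TB used, ~15 GB/day growth, downtime in 120 days.
print(safe_until_next_window(10240, 6144, 15, 120))  # → True (about 273 days to full)
```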
Figure 21) Storage to enable organic data growth between planned downtime windows. [The figure plots data growth over time in months, with enough free capacity reserved to bridge the interval between two planned downtime windows.]
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage containers exist. Thus, there is no need to consider a volume-based metric. Figure 22 shows the phase transitions depending on the metrics aggregate capacity used and aggregate space committed.
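The overcommitment metric itself is simply the committed space expressed as a percentage of the aggregate capacity. The following sketch, with the thresholds taken from this setting and otherwise illustrative names, shows how the two metrics map onto the phases:

```python
# Illustrative classifier for sample setting 1. Thresholds (50%/65% used,
# 110%/120% committed) come from the text; the function is a sketch, not an
# Operations Manager API.

def classify(used_pct: float, committed_pct: float) -> str:
    """Map the two aggregate metrics onto the phases of Figure 22."""
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"                 # migrate data in the next downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess capacity"          # stop provisioning, adapt thresholds
    return "provision new storage"        # normal operation

# 7 TB used and 12 TB committed on a 10 TB aggregate:
used_pct = 100 * 7168 / 10240        # 70.0% capacity used
committed_pct = 100 * 12288 / 10240  # 120.0% space committed
print(classify(used_pct, committed_pct))  # → mitigate
```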
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed. [The figure shows the operational sweet spot corridor: while aggregate capacity used is at 0-50% and aggregate space committed is at 0-110%, new storage is provisioned; above these thresholds, capacity is assessed and the thresholds are adapted; when capacity used exceeds 65% or space committed exceeds 120%, mitigation takes place.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. [The figure plots settled and nomad data over time in hours, from detecting the need to act to the effect of a mitigation such as a migration.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, the flexibility gained with online data migration removes the need to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
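A days-to-full trend of this kind can be reproduced from periodic capacity samples with a least-squares fit projected to 100% capacity used. The sketch below is illustrative; Operations Manager computes its trends internally:

```python
# Sketch: derive a days-to-full trend from periodic capacity-used samples (%),
# calculated against 100% capacity used. Names and data are illustrative.

def trend_days_to_full(samples_pct, interval_days=1.0):
    """Least-squares slope of capacity-used samples, projected to 100%."""
    n = len(samples_pct)
    xs = [i * interval_days for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples_pct) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_pct)) \
            / sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return float("inf")  # shrinking or flat: never full
    return (100.0 - samples_pct[-1]) / slope

# Weekly samples growing ~0.1% per day leave roughly 293 days to full at 70.7%:
print(round(trend_days_to_full([69.3, 70.0, 70.7], interval_days=7)))  # → 293
```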
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

| Detection Threshold | Notify | Mitigation |
|---------------------|--------|------------|
| > 70% | Storage operations | Stop provisioning of new storage |
| > 85% | Storage operations | Stop extending already provisioned storage |
| > 90% | Storage operations | Relax the resource situation and migrate a nomad |
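The thresholds in Table 10 can be read as a simple lookup from aggregate capacity used to the required action. A minimal sketch (illustrative code, not an Operations Manager interface):

```python
# Sketch of the phase transitions of Table 10; thresholds are from the text.

def phase_action(used_pct: float) -> str:
    """Return the mitigation step for the settled/nomad setting."""
    if used_pct > 90:
        return "relax resource situation and migrate a nomad"
    if used_pct > 85:
        return "stop extending provisioned storage"
    if used_pct > 70:
        return "stop provisioning of new storage"
    return "normal operation"

print(phase_action(92))  # → relax resource situation and migrate a nomad
```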
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.

| Aggregate Capacity Used | 0-70% | 70-85% | > 90% |
|-------------------------|-------|--------|-------|
| Provisioning new storage | Y | N | N |
| Extending already provisioned storage | Y | Y | N |
| Relax utilization (NetApp Data Motion of a nomad) | N | N | Y |
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served can exceed the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK

To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of the aggregate capacity metrics while turning to zero fat configurations and dedupe. [The figure plots committed capacity and capacity used over one to three months of elapsed time, together with the overall trend and the trend of the last three months; the marks 1-3 correspond to the numbered steps below.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent the pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. During this period, the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and the unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame in which the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level your operational team is comfortable with. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the provided services. Operations Manager helps you to understand the growth rate of the past.
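Steps a through d can be combined into one back-of-the-envelope calculation: the growth expected between two downtime windows, expressed as a percentage of the aggregate, is subtracted from the comfort ceiling to obtain the alarm threshold. A sketch with illustrative names and a linear-growth assumption:

```python
# Illustrative threshold derivation, not a NetApp tool: work backward from the
# comfort ceiling, the growth rate, and the distance between downtime windows.

def alarm_threshold_pct(capacity_gb: float, daily_growth_gb: float,
                        days_between_windows: float,
                        ceiling_pct: float = 80.0) -> float:
    """Highest 'capacity used %' at which organic growth still fits below
    the ceiling until the next planned downtime window."""
    growth_pct = 100.0 * daily_growth_gb * days_between_windows / capacity_gb
    return max(0.0, ceiling_pct - growth_pct)

# 20 TB aggregate, 10 GB/day growth, 180 days between windows:
# growth consumes ~8.8% of the aggregate, so set the alarm near 71% used.
print(round(alarm_threshold_pct(20480, 10, 180), 1))  # → 71.2
```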
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that an aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity in which to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Turn existing volumes provisioned in full fat or low fat into the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:

vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as nomad candidates that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES

• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS

This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
INDIVIDUAL THRESHOLDS PER AGGREGATE OR VOLUME
For each aggregate or volume, the general default settings can be overwritten and made more specific. To do so, select the aggregate or volume of choice, for example by using the links already provided in this technical report. When selecting a concrete aggregate, it can be configured using the Edit Settings link and dialog. When selecting a concrete volume, its configuration can be adapted using the Edit Quota Settings link and dialog.
MONITORING STORAGE EFFICIENCY RETURNS
NetApp Operations Manager provides a dashboard to visualize the storage efficiency returns of the NetApp shared storage infrastructure. This report lists important parameters, drilled down by utilization capacity, unused reserve capacity, storage efficiency, and efficiency return breakdown. It allows you to judge the effectiveness of the NetApp storage efficiency technologies. Figure 19 provides a sample screenshot of the storage efficiency dashboard in NetApp Operations Manager. Consult the NetApp Operations Manager Efficiency Dashboard Installation and User Guide for further information on the data provided by this dashboard.
Figure 19) Storage efficiency dashboard in Operations Manager
4.3 NOTIFICATION
Operational staff must be notified when situations occur that require a transition of phases, especially situations with negative consequences for SLA fulfillment. Operations Manager provides alarms for notification. Alarms are bound to the metrics and thresholds explained in section 4.2 and notify operational staff, storage administrators, or storage capacity planners. Alarms are the instrument used to keep the management effort of the NetApp storage infrastructure low.
After being notified, the responsible person can evaluate the situation and decide which actions to take. Further, the trends on operational parameters provided by Operations Manager simplify the decision-making process.
Depending on the organizational structure, the responsibilities to operate, plan, and administer the storage infrastructure can be separated into different groups, persons, or roles. Thus, we characterize the mitigation activities by required skill set and time to act. This allows easy alignment to a given organizational structure.
Operations Manager supports different methods to send a notification. The notification methods can be used in combination; for example, a notification can be sent by both e-mail and SNMP.
NOTIFY BY E-MAIL
An alarm can be sent to multiple destinations by e-mail, and repeated notifications can be sent while the situation is not resolved. To set an alarm, access the alarm configuration page by following Setup → Alarms from the default Operations Manager dashboard. Clicking Advanced Version accesses an advanced version of this page; the direct link for the advanced version is http://<opsmgrserver>:<port>/dfm/edit/alarms-advanced. Figure 20 shows how to configure an alarm. Adjust the threshold as described in section 4.2.
NetApp recommends using distribution lists or aliases with meaningful names rather than the addresses of individual persons. If you follow this recommendation, changing responsibilities and roles does not require corresponding changes in Operations Manager.
NOTIFY BY SNMP
Operations Manager supports the notification of alarms using SNMP, a widely used standard that is supported by most orchestration frameworks and ticketing systems. Using SNMP, Operations Manager can be integrated into existing ticketing systems. Figure 20 shows how to set up an alarm that fires based on the aggregate almost full threshold. The SNMP trap host is configured using a hostname or IP address and the port on which the SNMP agent is listening. The alarm can then be saved and tested.
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s <script_to_execute> -h aggregate-almost-full
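Such an adapter script can be very small. How Operations Manager hands over the event details depends on the version in use, so the sketch below simply assumes that the event name and its source arrive as command-line arguments (an assumption, not a documented interface); a real adapter would forward the message to the ticketing system:

```python
#!/usr/bin/env python
# Minimal sketch of a user-defined notification adapter. We assume the event
# name and its source are passed as command-line arguments; verify how your
# Operations Manager version actually delivers event details.
import sys

def format_ticket(event_name: str, source: str) -> str:
    """Build a one-line ticket summary for the downstream ticketing system."""
    return "[storage] %s on %s: assess aggregate and start mitigation" % (
        event_name, source)

if __name__ == "__main__":
    event = sys.argv[1] if len(sys.argv) > 1 else "aggregate-almost-full"
    source = sys.argv[2] if len(sys.argv) > 2 else "unknown"
    # In a real adapter, this line would call the ticketing system's API.
    print(format_ticket(event, source))
```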
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use within its defined corridor. The effect of a mitigation activity should return the usage to its defined corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve tightness in this situation, a mitigation activity at the aggregate level is necessary.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller Finer-grained storage objects such as FlexVol volumes and their content are usually thin provisioned using zero fat configuration They might grow on demand however because they live within an aggregate of physically limited size the growth of the storage object itself is also limited As described in the following list providing usable space in the aggregate automatically allows contained storage objects to grow
1 Increase the aggregate You can add drives to aggregates during operation You can repeat this mitigation activity The maximum aggregate size depends on the Data ONTAP version the type of aggregate and the type of storage controller Aggregates with 64-bit supported with Data ONTAP 8 have very high limits Additional drives can be used immediately however their procurement needs to be taken into account Rebalancing data between existing and new drives results in a uniformly distributed use of the drives
2 Decrease aggregate Snapshot copy reserve This reserve is needed in MetroCluster and for SyncMirrorreg configurations In other configurations you can decrease this reserve or set it to zero
3 Shrink preallocated volumes Volumes with preallocated space reserve available aggregate-free space When possible these volumes can be shrunk returning the freed space to the aggregate to allow others to make use of the preallocated space
4 Enable deduplication and shrink the volume 5 If available migrate a nomad online to a different storage controller Doing this on the NetApp storage
controller level requires storage provisioning based on vFiler and a MultiStore and SnapMirror license Adequate free space on the aggregates of the target storage controller is required This mitigation activity is not limited in its applicability
6 A volume can be migrated from one aggregate to another aggregate within the same or another storage controller SnapMirror replicates the data while it is still served To switch over to the replicated data the client needs to detach from the source and reattach to the replica After completion the replica is considered the new source This operation has an impact on client downtime Typically inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes
7 If none of the listed activities can be used the application must be stopped to achieve a consistent state
The mitigation activities for aggregate tightness are summarized in Table 8 Note that Provisioning Manager performs mitigation alternative 3 to 6 for secondary storage online
38 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Table 8) Mitigation alternatives to control use within aggregates
No Mitigation Activity Repeatability SLA Impact Preparation Time
Time to Show Effect
1 Increase aggregate capacity by adding disks (Data ONTAP 7x) Low limits
None HW procurement
Immediate (+rebalancing)
Data ONTAP 8 High limits
2 Decrease the aggregatelsquos Snapshot copy reserve area if possible One time None None Immediate
3 Shrink other volumes in the aggregate if they have enough free space
One time Low None Immediate
4 Run deduplication and shrink volumes Repeatable Low
Time to execute dedupe
Immediate
5 Migrate nomads (online) Repeatable Low None Minutes vFiler migration time
6 Migrate volumes to a different aggregate (offline) Repeatable Medndashhigh
Next planned downtime window
Minutes Volume switch-over time
7 Prevent application data loss and stop the application then migrate (offline)
Repeatable Lowndashhigh Coordinate with app owner
Minutes Migration time
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size When it is not possible to enable growth for storage objects contained in volumes you need to perform an aggregate mitigation activity
Note Some of these mitigation alternatives depend on and affect used capacity (in the aggregate)
Table 9) Mitigation activities for resource tightness within volumes
No Mitigation Activity Repeatability SLA Impact Prep Time Time to show effect
1 Reduce the volumelsquos Snapshot copy reserve (if configured and not used) One time Low None Immediate
2 Increase the volume if there is free space in the aggregate (see Table 8) One time Low None Immediate
3 Delete Snapshot copies not needed or those skipped by the AutoDelete function Limited Low None Immediate
4 Activate FAS deduplication for the volume (requires proper space guarantees) One time
Lowpossible performance impact
Wait for schedule Hours
5 If the volume contains more than a single LUN migrate those objects to another volume or aggregate
Repeatable High Next planned downtime window
Minutes Volume migration time
6 Stop application and migrate data Repeatable High Coordinate wapp owner
Minutes Migration time
39 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
5 REAL-LIFE SETTINGS This section summarizes two different operational settings The first one does not make use of online data migration and settlednomad provisioning pattern the second setting implements a settlednomad provisioning pattern to maintain the flexibility for online data migrations
The concrete threshold settings and approaches might be very customer and application specific To exploit NetApp storage efficiency features in your own data center NetApp recommends that you start conservatively After you are familiar with the process work toward the customer-specific optimum
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting a customer started with It makes use of a limited set of mitigation alternatives This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped A settlednomad setting is not considered Thus the threshold to signal a transition of the phases are set lower and more conservatively for this customer Because on-line data migration and aggregate extension are not available as a mitigation alternative sufficient available space is required to safely reach the next planned downtime window as shown in Figure 21 In practice refer to the aggregate days to full trend value to get an idea of available days to full based on past data growth
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
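The days-to-full trend mentioned above can be approximated by hand from the observed growth rate. The following Python sketch assumes linear growth and uses purely hypothetical numbers; Operations Manager computes this trend for you in practice:

```python
def days_to_full(capacity_tb, used_tb, daily_growth_tb):
    """Estimate days until an aggregate reaches 100% capacity used,
    assuming linear growth at the observed daily rate."""
    if daily_growth_tb <= 0:
        return float("inf")  # flat or shrinking usage never fills up
    return (capacity_tb - used_tb) / daily_growth_tb

# Example: a 100 TB aggregate at 55 TB used, growing 0.15 TB per day,
# leaves roughly 300 days to bridge to the next planned downtime window.
print(round(days_to_full(100, 55, 0.15)))  # -> 300
```

If the result is shorter than the distance to the next planned downtime window, mitigation must be planned early.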
Figure 21) Storage to enable organic data growth between planned downtime windows.
[Figure: data growth over time, in months, with sufficient reserved space to bridge the gap between two planned downtime windows.]
Note: Several months might pass between planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event "aggregate nearly full" (configured to fire when the metric exceeds 50%) and the event "aggregate nearly overcommitted" (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the "aggregate full" threshold (set initially to 65%) is sent to the storage administrators so that they can decide to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate space committed.
[Figure: a decision grid around the operational sweet spot corridor. Aggregate capacity used bands: 0–50%, 50–65%, >65%; aggregate space committed bands: 0–110%, 110–120%, >120%. The grid indicates per band whether new storage may be provisioned (Y/N), when to assess capacity and adapt thresholds, and when to mitigate.]
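The decision logic of this sample setting can also be written down directly. The following Python sketch is illustrative only; the thresholds are this setting's initial values and should be adapted to your environment:

```python
def phase(capacity_used_pct, space_committed_pct):
    """Map the two aggregate metrics of sample setting 1 to an action.
    Thresholds follow the initial values of this sample setting:
    50%/65% capacity used, 110%/120% space committed."""
    if capacity_used_pct > 65 or space_committed_pct > 120:
        return "mitigate in next planned downtime window"
    if capacity_used_pct > 50 or space_committed_pct > 110:
        return "stop provisioning, assess capacity, adapt thresholds"
    return "provision new storage"

print(phase(42, 95))   # -> provision new storage
print(phase(58, 95))   # -> stop provisioning, assess capacity, adapt thresholds
print(phase(70, 115))  # -> mitigate in next planned downtime window
```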
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
[Figure: settled data and several nomads (N) in an aggregate over time, in hours, showing the point where the need to act is detected and the effect of a mitigation such as a nomad migration.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric into account, such as storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
[Figure: for aggregate capacity used bands 0–70%, 70–85%, and >90%, the grid indicates whether new storage may be provisioned (Y/N), whether already provisioned storage may be extended (Y/N), and when to relax utilization by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
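That consolidation can be expressed as a simple ratio of logical data served to physical capacity used. A minimal Python sketch with purely illustrative numbers, not measured values:

```python
def consolidation_factor(logical_served_tb, physical_used_tb):
    """Ratio of logically served data to physically used capacity.
    Values above 1.0 reflect the combined savings of thin
    provisioning and deduplication."""
    return logical_served_tb / physical_used_tb

# Example: 180 TB of logical data served from 60 TB of physical capacity.
print(consolidation_factor(180, 60))  # -> 3.0
```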
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Figure: committed capacity and capacity used plotted over elapsed time (1 month, then 3 months), with an overall trend line and a last-3-month trend line; the marked points 1, 2, and 3 correspond to the numbered steps below.]
As a general rule, we don't introduce artificially limited container types. They increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data; the more information it collects, the better the predictions and trending. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25. Usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
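Converting many volumes by hand is error prone. The following Python sketch renders the zero fat command sequence used in this document for a given volume; the volume name and sizes are hypothetical placeholders, and the generated commands should be reviewed before running them on a controller:

```python
# Zero fat (no Snapshot autodelete) sequence, as used in this report.
ZERO_FAT_NAS = (
    "vol options {vol} guarantee none",
    "vol options {vol} try_first volume_grow",
    "vol autosize {vol} -m {max} -i {inc} on",
    "snap autodelete {vol} off",
)

def zero_fat_commands(volume, max_size="2t", increment="10g"):
    """Render the command sequence for one volume. The size values
    here are placeholders, not recommendations."""
    return [c.format(vol=volume, max=max_size, inc=increment)
            for c in ZERO_FAT_NAS]

for cmd in zero_fat_commands("vol_app1"):
    print(cmd)
```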
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month- and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
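Under the assumption of linear growth, the steps above can be combined into a simple back-of-the-envelope calculation. The numbers below are hypothetical:

```python
def upper_threshold_pct(capacity_tb, daily_growth_tb,
                        days_between_downtimes, ceiling_pct=80):
    """Work backward from the growth rate and the distance between
    planned downtimes to the highest safe 'aggregate full' threshold:
    the space needed for organic growth until the next downtime is
    subtracted from the comfort ceiling (80% here, per the initial
    guidance in this cookbook)."""
    growth_pct = 100 * daily_growth_tb * days_between_downtimes / capacity_tb
    return max(ceiling_pct - growth_pct, 0)

# Example: 100 TB aggregate, 0.05 TB/day growth, downtimes 180 days
# apart: 9% of capacity must stay free, so alarm at 71% used.
print(upper_threshold_pct(100, 0.05, 180))  # -> 71.0
```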
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide," www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO," www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide," www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates," www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion," www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized," www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications," www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide, http://now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
3 Derive the growth trend of the aggregates Note that the overall trend might still be negative Use Operations Manager to help determine the trend Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications such as month- and year-end closing of business applications or regular software maintenance updates (for example in virtualized environments)
44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not
exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect
b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives
c Determine the growth rate Operations Manager provides help in determining the trend of data growth
d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past
To provision storage following these steps
1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes
2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler
entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there
is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller
c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely
d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable
e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated
f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative
g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days
to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate
46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo
wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo
wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices
Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html
bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html
bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html
bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html
bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html
bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf
47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
36 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 20) Configuring an alarm based on the threshold aggregate almost full
Note: The SNMP event must be routed to the responsible groups or persons in the ticketing system. Thus, the mapping between the detected situation and the responsible operational group must be implemented there.
NOTIFY BY SCRIPT
Operations Manager supports notifications in highly customized integration scenarios. A user-defined adapter can be executed that delivers the information to the infrastructure or system of choice. A script can be used to implement such an adapter and act as the glue between Operations Manager and the customer infrastructure. To set an alarm on the event aggregate almost full that starts a script, instrument Operations Manager on the command line:
dfm alarm create -s script_to_execute -h aggregate-almost-full
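What such an adapter script looks like depends entirely on the ticketing infrastructure; the sketch below only illustrates the glue role. The argument interface is an assumption (event name and source as positional arguments), so check how your Operations Manager version actually passes event details:

```shell
#!/bin/sh
# Hypothetical adapter: turn an Operations Manager event into a ticket line.
# Assumption: event name and source arrive as positional arguments; the real
# interface is version dependent.
notify_ticket() {
    event_name="${1:-unknown-event}"
    event_source="${2:-unknown-source}"
    # Stand-in for the call into the real ticketing system:
    printf 'ticket: %s on %s\n' "$event_name" "$event_source"
}

notify_ticket "aggregate-almost-full" "aggr1"
```

In a real deployment, the `printf` line would be replaced by the ticketing system's API call or mail gateway.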
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by controlling use; the effect of a mitigation activity should return usage to its defined corridor.

Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the shared pool of free blocks of the aggregate to deal with data growth. To solve this situation, a mitigation activity on the aggregate level is necessary.

When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity on the volume level is necessary to solve upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage object within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They might grow on demand; however, because they live within an aggregate of physically limited size, the growth of the storage object itself is also limited. As described in the following list, providing usable space in the aggregate automatically allows contained storage objects to grow.

1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller. 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately; however, their procurement needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, these volumes can be shrunk, returning the freed space to the aggregate to allow others to make use of the preallocated space.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this on the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another aggregate within the same or another storage controller. SnapMirror replicates the data while it is still served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica. After completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within the range of a few minutes.
7. If none of the listed activities can be used, the application must be stopped to achieve a consistent state before migrating.

The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
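On the storage controller, several of these alternatives map to single commands. The following is a sketch in Data ONTAP 7-Mode syntax; names in angle brackets are placeholders, and the disk count and sizes are illustrative:

```shell
# Alternative 1: increase aggregate capacity, e.g., by adding four disks
aggr add <aggregate> 4
# Alternative 2: set the aggregate Snapshot reserve to zero
# (only when no MetroCluster/SyncMirror configuration needs it)
snap reserve -A <aggregate> 0
# Alternative 3: shrink a preallocated volume by 100 GB,
# returning the space to the aggregate
vol size <volume> -100g
# Alternative 4: enable deduplication and scan the existing data
sis on /vol/<volume>
sis start -s /vol/<volume>
```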
Table 8) Mitigation alternatives to control use within aggregates
No | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable (Data ONTAP 7.x: low limits; Data ONTAP 8: high limits) | None | HW procurement | Immediate (+ rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes: vFiler migration time
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med–high | Next planned downtime window | Minutes: volume switch-over time
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low–high | Coordinate with app owner | Minutes: migration time
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for storage objects contained in volumes, you need to perform an aggregate mitigation activity.

Note: Some of these mitigation alternatives depend on and affect used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes
No | Mitigation Activity | Repeatability | SLA Impact | Prep Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes: volume migration time
6 | Stop application and migrate data | Repeatable | High | Coordinate with app owner | Minutes: migration time
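Most of the volume-level alternatives in Table 9 likewise map to single controller commands. Again a 7-Mode sketch with placeholder names and an illustrative size:

```shell
# Alternative 1: reduce the volume Snapshot reserve to zero
snap reserve <volume> 0
# Alternative 2: grow the volume by 50 GB if the aggregate has free space
vol size <volume> +50g
# Alternative 3: delete a Snapshot copy that is no longer needed
snap delete <volume> <snapshot_name>
# Alternative 4: activate deduplication on the volume
sis on /vol/<volume>
sis start -s /vol/<volume>
```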
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.

The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively. After you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives. This is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered. Thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might fall between planned downtime windows in which to perform major mitigation alternatives.

The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using alarms on the Operations Manager events aggregate nearly full threshold (configured to fire when the metric exceeds 50%) and aggregate nearly overcommitted threshold (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.

An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:

• First metric: aggregate capacity used
• Second metric: aggregate space committed

Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
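The events used here fire on Operations Manager's global aggregate thresholds, which can be set on the DataFabric Manager server. A sketch, assuming the DFM 4.x option names (verify against your version with `dfm options list`):

```shell
# Map this setting's thresholds onto Operations Manager global options
# (values in percent; option names assumed per DFM/Operations Manager 4.x)
dfm options set aggrNearlyFullThreshold=50
dfm options set aggrFullThreshold=65
dfm options set aggrNearlyOvercommittedThreshold=110
dfm options set aggrOvercommittedThreshold=120
```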
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
[Figure: data growth against aggregate capacity with an operational sweet spot corridor. With aggregate capacity used at 0–50% and aggregate space committed at 0–110%, new storage is provisioned. Beyond those thresholds, provisioning of new storage stops and capacity is assessed and thresholds are adapted. With capacity used > 65% or space committed > 120%, mitigation takes place.]
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure: settled data growth; only hours pass between detecting the need to act and the effect of mitigation (e.g., migration).]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, there is no need to take a further metric, such as storage overcommitment, into account.

• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and online migration as mitigation alternative
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
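The mitigation in the last row relies on MultiStore. At the command line, the online move of a vFiler unit looks roughly like this, run on the destination controller; this is a 7-Mode sketch with placeholder names, so verify the exact syntax for your Data ONTAP release:

```shell
# Start the SnapMirror-based transfer of the nomad vFiler unit
vfiler migrate start <vfiler>@<source_controller>
# Check progress; repeat until the baseline transfer is done
vfiler migrate status <vfiler>@<source_controller>
# Cut over: final incremental transfer, then serve the data from here
vfiler migrate complete <vfiler>@<source_controller>
```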
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure: settled data growth against aggregate capacity with an operational sweet spot corridor. Phases by aggregate capacity used: at 0–70%, new storage is provisioned; at 70–85%, no new storage is provisioned, but already provisioned storage is still extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The amount of logical data served exceeds the physically usable capacity by factors.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure: committed capacity and capacity used plotted over elapsed time (1 month / 3 months), with the overall trend and the last 3-month trend indicated.]
As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending are. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame of changing the volume configuration to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
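The backward calculation in these steps can be sketched numerically. The figures below (100 TB aggregate, 0.2 TB/day growth trend, 90 days to the next planned downtime window, 80% comfort cap) are illustrative assumptions, not recommendations:

```shell
# Work backward: how full may the aggregate get while still leaving room
# for organic growth until the next planned downtime window?
aggr_size_tb=100        # usable aggregate capacity
growth_tb_per_day=0.2   # growth trend from Operations Manager
days_to_next_window=90  # distance between planned downtimes
comfort_cap_pct=80      # do not exceed this at first

awk -v s="$aggr_size_tb" -v g="$growth_tb_per_day" \
    -v d="$days_to_next_window" -v c="$comfort_cap_pct" 'BEGIN {
        headroom = g * d                # space needed until next window (TB)
        t = 100 * (1 - headroom / s)   # raw threshold in percent
        if (t > c) t = c               # respect the comfort cap
        printf "threshold: %.1f%%\n", t
    }'
```

With these numbers, 18 TB of headroom is needed, giving a raw threshold of 82%, which the 80% comfort cap then lowers to 80%.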
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared. A few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job. Also consider deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full/low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo
wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo
wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices
Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html
bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html
bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html
bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html
bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html
bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf
47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang
NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document
copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010
37 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
4.4 MITIGATE STORAGE USE
Uncontrolled storage use can limit operational flexibility and might put SLA fulfillment at risk. This section focuses on mitigation activities that preserve flexibility by keeping use within its defined corridor; the effect of a mitigation activity should return usage to that corridor.
Storage tightness might occur in aggregates or volumes, depending on their configuration. When all volumes in an aggregate are thin provisioned with the zero fat configuration, they use the aggregate's shared pool of free blocks to deal with data growth. Resolving tightness in this situation requires a mitigation activity at the aggregate level.
When storage objects in a fixed-size volume cannot grow to the committed space, a mitigation activity at the volume level is necessary to resolve the upcoming volume tightness.
MITIGATION ACTIVITIES FOR AGGREGATES
Aggregates are the coarsest storage objects within a NetApp storage controller. Finer-grained storage objects, such as FlexVol volumes and their content, are usually thin provisioned using the zero fat configuration. They can grow on demand; however, because they live within an aggregate of physically limited size, their growth is also limited. As described in the following list, providing usable space in the aggregate automatically allows the contained storage objects to grow.
1. Increase the aggregate. You can add drives to aggregates during operation, and you can repeat this mitigation activity. The maximum aggregate size depends on the Data ONTAP version, the type of aggregate, and the type of storage controller; 64-bit aggregates, supported with Data ONTAP 8, have very high limits. Additional drives can be used immediately, although their procurement time needs to be taken into account. Rebalancing data between existing and new drives results in uniformly distributed use of the drives.
2. Decrease the aggregate Snapshot copy reserve. This reserve is needed in MetroCluster and SyncMirror® configurations. In other configurations, you can decrease this reserve or set it to zero.
3. Shrink preallocated volumes. Volumes with preallocated space reserve available aggregate free space. When possible, shrink these volumes to return the freed space to the aggregate so that others can make use of it.
4. Enable deduplication and shrink the volume.
5. If available, migrate a nomad online to a different storage controller. Doing this at the NetApp storage controller level requires storage provisioning based on vFiler entities and MultiStore and SnapMirror licenses. Adequate free space on the aggregates of the target storage controller is required. This mitigation activity is not limited in its applicability.
6. Migrate a volume from one aggregate to another within the same or another storage controller. SnapMirror replicates the data while it is still being served. To switch over to the replicated data, the client needs to detach from the source and reattach to the replica; after completion, the replica is considered the new source. This operation has an impact on client downtime. Typically, inter-data center bandwidth allows you to synchronize the source and the replica within a few minutes.
7. If none of the listed activities can be used, stop the application to achieve a consistent state, and then migrate.
The mitigation activities for aggregate tightness are summarized in Table 8. Note that Provisioning Manager performs mitigation alternatives 3 to 6 for secondary storage online.
Table 8) Mitigation alternatives to control use within aggregates.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Increase aggregate capacity by adding disks | Repeatable; low limits with Data ONTAP 7.x, high limits with Data ONTAP 8 | None | HW procurement | Immediate (plus rebalancing)
2 | Decrease the aggregate's Snapshot copy reserve, if possible | One time | None | None | Immediate
3 | Shrink other volumes in the aggregate, if they have enough free space | One time | Low | None | Immediate
4 | Run deduplication and shrink volumes | Repeatable | Low | Time to execute dedupe | Immediate
5 | Migrate nomads (online) | Repeatable | Low | None | Minutes (vFiler migration time)
6 | Migrate volumes to a different aggregate (offline) | Repeatable | Med-high | Next planned downtime window | Minutes (volume switch-over time)
7 | Prevent application data loss and stop the application, then migrate (offline) | Repeatable | Low-high | Coordinate with app owner | Minutes (migration time)
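To make the choice of an alternative repeatable, the rows of Table 8 can be encoded as data and filtered by the SLA impact the team is willing to accept. A minimal sketch in Python (the encoding, ranking, and names are illustrative, not part of any NetApp tooling):

```python
# Table 8, encoded as (number, activity, SLA impact) for a simple selection helper.
MITIGATIONS = [
    (1, "increase aggregate capacity by adding disks", "none"),
    (2, "decrease the aggregate Snapshot copy reserve", "none"),
    (3, "shrink other volumes in the aggregate", "low"),
    (4, "run deduplication and shrink volumes", "low"),
    (5, "migrate nomads (online)", "low"),
    (6, "migrate volumes to a different aggregate (offline)", "med-high"),
    (7, "stop the application, then migrate (offline)", "low-high"),
]

# Rank the impact labels so they can be compared against a tolerance.
IMPACT_RANK = {"none": 0, "low": 1, "med-high": 2, "low-high": 2}

def candidates(max_impact):
    """Return the numbers of the alternatives whose SLA impact is acceptable."""
    limit = IMPACT_RANK[max_impact]
    return [no for no, _activity, impact in MITIGATIONS
            if IMPACT_RANK[impact] <= limit]

print(candidates("low"))  # the online alternatives, 1 through 5
```

With "low" as the tolerated impact, only the online alternatives remain; alternatives 6 and 7 additionally require coordinating a downtime window.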
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size. When it is not possible to enable growth for the storage objects contained in a volume, you need to perform an aggregate-level mitigation activity.
Note: Some of these mitigation alternatives depend on, and affect, the used capacity in the aggregate.
Table 9) Mitigation activities for resource tightness within volumes.

No. | Mitigation Activity | Repeatability | SLA Impact | Preparation Time | Time to Show Effect
1 | Reduce the volume's Snapshot copy reserve (if configured and not used) | One time | Low | None | Immediate
2 | Increase the volume, if there is free space in the aggregate (see Table 8) | One time | Low | None | Immediate
3 | Delete Snapshot copies that are not needed or were skipped by the autodelete function | Limited | Low | None | Immediate
4 | Activate FAS deduplication for the volume (requires proper space guarantees) | One time | Low; possible performance impact | Wait for schedule | Hours
5 | If the volume contains more than a single LUN, migrate those objects to another volume or aggregate | Repeatable | High | Next planned downtime window | Minutes (volume migration time)
6 | Stop the application and migrate the data | Repeatable | High | Coordinate with app owner | Minutes (migration time)
5 REAL-LIFE SETTINGS
This section summarizes two different operational settings. The first does not make use of online data migration or the settled/nomad provisioning pattern; the second implements a settled/nomad provisioning pattern to maintain the flexibility for online data migrations.
The concrete threshold settings and approaches might be very customer and application specific. To exploit NetApp storage efficiency features in your own data center, NetApp recommends that you start conservatively; after you are familiar with the process, work toward the customer-specific optimum.
5.1 SAMPLE SETTING 1: REAL-LIFE SETTING
This section describes a real-life setting that a customer started with. It makes use of a limited set of mitigation alternatives, which is especially beneficial when the installed storage capacity should remain constant over a long time frame or when physical systems are already fully equipped. A settled/nomad setting is not considered; thus, the thresholds that signal a transition between phases are set lower and more conservatively for this customer. Because online data migration and aggregate extension are not available as mitigation alternatives, sufficient available space is required to safely reach the next planned downtime window, as shown in Figure 21. In practice, refer to the aggregate days-to-full trend value to get an idea of the available days to full based on past data growth.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows. (The figure plots data growth over months, with sufficient free capacity reserved to bridge the gap between planned downtime windows.)
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to allow organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full (configured to fire when the metric exceeds 50%) and on the event aggregate nearly overcommitted (configured to fire when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators, who decide whether to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
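The phase transitions driven by these two metrics can be sketched as a small decision function. The threshold values are the ones quoted above (50% and 65% for capacity used, 110% and 120% for space committed); the function itself is only an illustration, not NetApp tooling:

```python
def phase(used_pct, committed_pct):
    """Classify the aggregate state for sample setting 1.

    used_pct      -- aggregate capacity used, in percent
    committed_pct -- aggregate space committed, in percent
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"    # migrate data in the next planned downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning, assess capacity, adapt thresholds
    return "provision"       # new storage may be provisioned; organic growth

print(phase(45, 100))   # inside the operational sweet spot corridor
print(phase(55, 100))   # corridor left via capacity used
print(phase(40, 125))   # corridor left via space committed
```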
Figure 22) Transition of phases depending on the metrics aggregate capacity used and aggregate committed space. (The figure shows the operational sweet spot corridor: new storage is provisioned while aggregate capacity used is at 0-50% and aggregate space committed is at 0-110%; beyond these thresholds, capacity is assessed and the thresholds are adapted; above 65% capacity used or 120% space committed, mitigation is triggered.)
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and, thanks to vFiler technology, allows migrating nomad data flexibly and in a timely manner. This setting requires storage space at alternative locations to which nomads can be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and within narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months. (The figure plots data growth over hours, from detecting the need to act to the effect of a mitigation such as a migration.)
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, with the flexibility gained through online data migration, no further metric, such as storage overcommitment, needs to be taken into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.

Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of new storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax the resource situation and migrate a nomad
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used. (New storage is provisioned while aggregate capacity used is at 0-70%; between 70% and 85%, already provisioned storage may still be extended; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.)
You can achieve very high data consolidation in this setting with NetApp storage controllers; the amount of logical data served can exceed the physically usable capacity several times over.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe. (The figure plots committed capacity and capacity used over elapsed time: capacity used drops during the roughly one-month conversion phase, and the overall and last-3-month trends are then derived over the following three months.)
As a general rule, we don't introduce artificially limited container types; they increase the monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager, the earlier the better. From day one, Operations Manager collects data, and the more information it collects, the better its predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure that all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with; check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve a migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25, and usually each change in the volume configuration can be detected. So far, only metadata has changed; unused space in the volumes is now available in a common shared pool, and the aggregated free space is available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
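Steps a through d can be combined into a quick estimate: the space reserved for organic growth between downtimes determines the upper threshold of the corridor. A sketch with illustrative numbers (the safety factor and the example values are assumptions, not taken from this report):

```python
def upper_threshold_pct(aggr_total_gb, growth_gb_per_day,
                        days_between_downtimes, safety=1.25):
    """Estimate the aggregate-used threshold that still leaves room for
    organic growth until the next planned downtime window.

    safety -- multiplier on the expected growth to absorb variation
    """
    reserve_gb = growth_gb_per_day * days_between_downtimes * safety
    return max(0.0, 100.0 * (aggr_total_gb - reserve_gb) / aggr_total_gb)

# 20 TB aggregate, 30 GB/day growth, downtime windows 120 days apart:
print(round(upper_threshold_pct(20_000, 30, 120), 1))  # prints 77.5
```

A faster growth rate or a longer gap between downtime windows pushes the threshold down, which matches the conservative settings chosen in sample setting 1.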
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that an aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication first, create a clone of the intended volume on the storage controller and deduplicate the clone to estimate the effect. Use Performance Advisor to identify a recurring time frame of low activity in which to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. This way, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Convert existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands at the storage controller console to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Convert already provisioned volumes to the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
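Because the trend is computed against 100% capacity used, translating it to your own corridor threshold is a one-line adjustment. A minimal sketch with illustrative numbers:

```python
def days_to_pct(total_gb, used_gb, growth_gb_per_day, target_pct=100.0):
    """Days until the aggregate reaches target_pct capacity used,
    assuming the current linear daily growth rate continues."""
    remaining_gb = total_gb * target_pct / 100.0 - used_gb
    return max(0.0, remaining_gb / growth_gb_per_day)

total, used, growth = 20_000, 12_000, 40       # GB, GB, and GB/day
print(days_to_pct(total, used, growth))        # days to 100%, as Operations Manager reports it
print(days_to_pct(total, used, growth, 70.0))  # days to a 70% corridor threshold
```

The gap between the two values shows why the thresholds must be set well below 100%: the corridor boundary is reached much earlier than the reported days-to-full figure suggests.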
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• "NetApp Operations Manager Efficiency Dashboard Installation and User Guide": now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
38 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Table 8) Mitigation alternatives to control use within aggregates
No Mitigation Activity Repeatability SLA Impact Preparation Time
Time to Show Effect
1 Increase aggregate capacity by adding disks (Data ONTAP 7x) Low limits
None HW procurement
Immediate (+rebalancing)
Data ONTAP 8 High limits
2 Decrease the aggregatelsquos Snapshot copy reserve area if possible One time None None Immediate
3 Shrink other volumes in the aggregate if they have enough free space
One time Low None Immediate
4 Run deduplication and shrink volumes Repeatable Low
Time to execute dedupe
Immediate
5 Migrate nomads (online) Repeatable Low None Minutes vFiler migration time
6 Migrate volumes to a different aggregate (offline) Repeatable Medndashhigh
Next planned downtime window
Minutes Volume switch-over time
7 Prevent application data loss and stop the application then migrate (offline)
Repeatable Lowndashhigh Coordinate with app owner
Minutes Migration time
MITIGATION ACTIVITIES FOR VOLUME TIGHTNESS
Mitigation activities for volume tightness are relevant for volumes that are at risk because their storage objects cannot grow to the committed size When it is not possible to enable growth for storage objects contained in volumes you need to perform an aggregate mitigation activity
Note Some of these mitigation alternatives depend on and affect used capacity (in the aggregate)
Table 9) Mitigation activities for resource tightness within volumes
No Mitigation Activity Repeatability SLA Impact Prep Time Time to show effect
1 Reduce the volumelsquos Snapshot copy reserve (if configured and not used) One time Low None Immediate
2 Increase the volume if there is free space in the aggregate (see Table 8) One time Low None Immediate
3 Delete Snapshot copies not needed or those skipped by the AutoDelete function Limited Low None Immediate
4 Activate FAS deduplication for the volume (requires proper space guarantees) One time
Lowpossible performance impact
Wait for schedule Hours
5 If the volume contains more than a single LUN migrate those objects to another volume or aggregate
Repeatable High Next planned downtime window
Minutes Volume migration time
6 Stop application and migrate data Repeatable High Coordinate wapp owner
Minutes Migration time
39 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
5 REAL-LIFE SETTINGS This section summarizes two different operational settings The first one does not make use of online data migration and settlednomad provisioning pattern the second setting implements a settlednomad provisioning pattern to maintain the flexibility for online data migrations
The concrete threshold settings and approaches might be very customer and application specific To exploit NetApp storage efficiency features in your own data center NetApp recommends that you start conservatively After you are familiar with the process work toward the customer-specific optimum
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting a customer started with It makes use of a limited set of mitigation alternatives This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped A settlednomad setting is not considered Thus the threshold to signal a transition of the phases are set lower and more conservatively for this customer Because on-line data migration and aggregate extension are not available as a mitigation alternative sufficient available space is required to safely reach the next planned downtime window as shown in Figure 21 In practice refer to the aggregate days to full trend value to get an idea of available days to full based on past data growth
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Aggregate extension is not a mitigation alternative.
• Online migration is not a mitigation alternative.
Figure 21) Storage to enable organic data growth between planned downtime windows
Note: Several months might pass between the planned downtime windows in which major mitigation alternatives can be performed.
The primary concern is preventing the critical situation in which aggregates reach a utilization level that is too high to enable organic growth during the period between agreed planned downtime windows. To prevent this situation, sufficient space must be reserved to enable data growth. Second, the level of data consolidation is monitored to manage accumulated growth rates safely.
Provisioning of new data is stopped when one or both thresholds on the first and second metrics are reached. The operational teams are notified using an alarm on the Operations Manager event aggregate nearly full threshold (event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might then be performed. Depending on experience and knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is crossed, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists. Thus, there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate committed space.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
[Figure 22 content: within the operational sweet spot corridor (aggregate capacity used 0–50%, aggregate space committed 0–110%), provisioning of new storage is permitted. Between 50–65% capacity used or 110–120% space committed, capacity is assessed and thresholds are adapted. Above 65% capacity used or 120% space committed, mitigation is performed.]
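As an illustration, the two metrics and the thresholds of this setting can be expressed as a small classification routine. This is a sketch only; the function name is hypothetical, and the thresholds (50%/65% for capacity used, 110%/120% for space committed) are the ones used in this sample setting.

```python
def classify_aggregate(used_pct, committed_pct):
    """Map the two sample-setting-1 metrics to an operational phase.

    used_pct:      aggregate capacity used, in percent
    committed_pct: aggregate space committed, in percent
    Thresholds: 50/65 (capacity used), 110/120 (space committed).
    """
    if used_pct > 65 or committed_pct > 120:
        return "mitigate"    # migrate data in the next planned downtime window
    if used_pct > 50 or committed_pct > 110:
        return "assess"      # stop provisioning; assess capacity, adapt thresholds
    return "provision"       # inside the operational sweet spot corridor


# Example: 8 TB aggregate, 4.6 TB used, 9.5 TB committed
used = 4.6 / 8 * 100        # 57.5%
committed = 9.5 / 8 * 100   # 118.75%
print(classify_aggregate(used, committed))  # "assess"
```

In practice these decisions come from Operations Manager alarms; the routine only mirrors the corridor logic shown in Figure 22.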
52 SAMPLE SETTING 2 SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
[Figure 23 content: settled data with nomad volumes (N); the time between detecting the need to act and the effect of mitigation (e.g., migration) is on the order of hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is one in which aggregates become too full. With the flexibility gained through online data migration, however, it is not necessary to take a further metric into account, for example, storage overcommitment.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100% capacity used.
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transition of phases.
Table 10) Phase transitions with the settled/nomad provisioning pattern and the online migration mitigation alternative.
Detection Threshold | Notify | Mitigation
> 70% | Storage operations | Stop provisioning of storage
> 85% | Storage operations | Stop extending provisioned storage
> 90% | Storage operations | Relax resource situation and migrate a nomad
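The transitions in Table 10 can be sketched as a lookup on the single metric. This is a hypothetical illustration; only the threshold values come from the table.

```python
def phase_actions(used_pct):
    """Return permitted actions for an aggregate in sample setting 2.

    Thresholds per Table 10: above 70% stop provisioning new storage,
    above 85% stop extending provisioned storage, above 90% relax the
    resource situation by migrating a nomad.
    """
    return {
        "provision_new_storage": used_pct <= 70,
        "extend_provisioned_storage": used_pct <= 85,
        "migrate_nomad": used_pct > 90,
    }


# At 88%: no new provisioning or extending, but migration is not yet triggered
print(phase_actions(88))
```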
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
[Figure 24 content: at 0–70% aggregate capacity used (the operational sweet spot corridor), both provisioning new storage and extending already provisioned storage are permitted; at 70–85%, only extending already provisioned storage is permitted; above 90%, utilization is relaxed by migrating a nomad with NetApp Data Motion.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers. The served amount of logical data exceeds the physically usable capacity several times over.
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
[Figure 25 content: committed capacity and capacity used plotted over elapsed time (1 month, 3 months), with the overall trend and the last-3-month trend marked across three phases.]
As a general rule, we don't introduce artificially limited container types. They increase monitoring effort and might prevent pooling of unused space. For an existing landscape, proceed as follows:
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period, the capacity used diminishes, as shown in Figure 25. Usually, each change in the volume configuration can be detected. So far, only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
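Operations Manager reports this trend directly. For illustration only, a simple least-squares fit over daily usage samples shows how a days-to-full estimate can be derived, including the case where the overall trend is still negative right after the zero fat change. The function and the sample data are hypothetical.

```python
def days_to_full(samples, capacity):
    """Estimate days until an aggregate reaches 100% full.

    samples:  used capacity per day (same unit as capacity), oldest first
    capacity: total aggregate capacity
    Returns None if the fitted trend is flat or negative (no fill-up
    expected). A least-squares fit stands in for trending here.
    """
    n = len(samples)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    denom = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in zip(xs, samples)) / denom  # growth per day
    if slope <= 0:
        return None  # overall trend still negative, e.g., right after dedupe
    return (capacity - samples[-1]) / slope


# 10 days of samples growing by 20 GB/day toward a 10,000 GB aggregate
usage = [4000 + 20 * d for d in range(10)]
print(days_to_full(usage, 10_000))  # 291.0
```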
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time needed to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you to understand the growth rate of the past.
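Steps a through d can be combined into a back-of-the-envelope calculation. The following sketch works backward from the growth rate and the downtime interval to a provisioning-stop threshold; all names and numbers are illustrative, not a NetApp formula.

```python
def stop_threshold_pct(capacity, growth_per_day, days_between_downtimes,
                       comfort_pct=80):
    """Work backward to a provisioning-stop threshold for one aggregate.

    Reserve enough free space to absorb organic growth until the next
    planned downtime window, and never exceed the comfort level
    (80% at first, per the cookbook). Units of capacity and
    growth_per_day must match (e.g., GB).
    """
    reserve = growth_per_day * days_between_downtimes   # space for organic growth
    threshold = (capacity - reserve) / capacity * 100   # stop provisioning here
    return min(threshold, comfort_pct)


# 10,000 GB aggregate growing 15 GB/day, downtime windows 120 days apart
print(stop_threshold_pct(10_000, 15, 120))   # 80: capped at the comfort level
# Faster growth forces a lower threshold
print(stop_threshold_pct(10_000, 25, 120))   # 70.0
```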
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them in such a way that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist in which a silo-centric approach with dedicated aggregates for applications makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in the zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for the autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication on the storage controller first, then create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to the zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Storage keeping inactive data is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning of storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes to the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
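Because the same four zero fat command sequences recur for every volume being trimmed, scripting them reduces typing errors. The following Python sketch generates the documented sequences from a few parameters; the function and its interface are illustrative, not a NetApp tool, and the emitted commands would still be run on the controller console (or pushed over SSH).

```python
def zero_fat_commands(volume, max_size, increment, env="nas",
                      autodelete=False, lun=None):
    """Emit the zero fat console command sequence from the cookbook.

    env selects the NAS or SAN variant; autodelete selects the Snapshot
    autodelete behavior; lun (SAN only) names the LUN whose reservation
    is disabled. Returns the commands as a list of strings.
    """
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if env == "san":
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if env == "san" and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds


# NAS volume, no Snapshot autodelete: prints the four documented commands
for cmd in zero_fat_commands("vol1", "500g", "50g"):
    print(cmd)
```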
7 REFERENCES
• TR-3505: "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide" www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563: "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO" www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710: "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide" www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786: "A Thorough Introduction to 64-Bit Aggregates" www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814: "NetApp Data Motion" www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827: "If You're Doing This, Then Your Storage Could Be Underutilized" www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881: "DataMotion for Volumes for Enterprise Applications" www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
39 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
5 REAL-LIFE SETTINGS This section summarizes two different operational settings The first one does not make use of online data migration and settlednomad provisioning pattern the second setting implements a settlednomad provisioning pattern to maintain the flexibility for online data migrations
The concrete threshold settings and approaches might be very customer and application specific To exploit NetApp storage efficiency features in your own data center NetApp recommends that you start conservatively After you are familiar with the process work toward the customer-specific optimum
51 SAMPLE SETTING 1 REAL-LIFE SETTING
This section describes a real-life setting a customer started with It makes use of a limited set of mitigation alternatives This is especially beneficial when the installed storage capacity should be constant over a long time frame or physical systems are already fully equipped A settlednomad setting is not considered Thus the threshold to signal a transition of the phases are set lower and more conservatively for this customer Because on-line data migration and aggregate extension are not available as a mitigation alternative sufficient available space is required to safely reach the next planned downtime window as shown in Figure 21 In practice refer to the aggregate days to full trend value to get an idea of available days to full based on past data growth
bull All storage is provisioned using the zero fat option with growable FlexVol volumes Only aggregate monitoring is used
bull Aggregate extension is not a mitigation alternative bull Online migration is not a mitigation alternative
Figure 21) Storage to enable organic data growth between planned downtime windows
Data Data Growth
Planned Downtime Window
Planned Downtime Window
Months Time
Note Several months might fall between planned downtime windows to perform major mitigation alternatives
The primary concern is preventing the critical situation where aggregates reach a utilization level that is too high to enable organic growth during the period of agreed planned downtime windows To prevent this situation sufficient space must be reserved to enable data growth Second the level of data consolidation is monitored to manage accumulated growth rates safely
Provisioning new data is stopped when one or both thresholds on the first and second metrics are reached The operational teams are notified using an alarm on the Operations Manager event aggregate
40 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
nearly full threshold (event configured when metric exceeds 50) and the event aggregate nearly over committed threshold (event configured when metric exceeds 110) These alarms stop the responsible entities from provisioning new storage the aggregate is left for organic growth
An assessment of the storage situation might be performed Depending on experiences and knowledge of the application growth rates seen in the past the thresholds may be adapted After the upper threshold of the operational sweet spot corridor is left an alarm based on aggregate full threshold (set initially to 65) is sent to the storage administrators to make the decision for migrating data in the next planned downtime window In the meantime organic growth can take place in the yellow-marked area shown in Figure 22 The metrics used are
bull First metric Aggregate capacity used bull Second metric Aggregate space committed
Because all storage is provisioned using the zero fat option no artificial limited storage container exists Thus there is no need to consider a volume-based metric Figure 22 shows the behavior depending on metrics aggregate capacity used and aggregate committed space
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate committed space
Data Data GrowthAggregate Capacity
Operational Sweet Spot Corridor
Aggregate Capacity Used
Aggregate Space Committed
0ndash50 gt 65
0ndash110 gt 120
Provisioning New Storage Y
Capacity Assessment Adapt Thresholds
Mitigate
Y Y
Y
Provisioning New Storage Y
Assess Capacity Y
41 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
52 SAMPLE SETTING 2 SETTLEDNOMAD
This section describes a setting that takes the settlednomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner thanks to vFiler technology This setting requires storage space at alternative locations where nomads might be migrated It is seen more often in larger environments with an emphasis on NFS-attached storage It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors Figure 23 visualizes the effect of a mitigation alternative that can be performed online
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months
Settled
Detecting the Need to Act
Effect of Mitigation (eg migration)
Hours Time
N NN N N
In this sample setting as well as in sample setting 1 the critical situation to prevent is where aggregates become too full However the flexibility gained with online data migration does not require taking a further metric into account for example storage overcommitment
bull All storage is provisioned using the zero fat option with growable FlexVol volumes Only aggregate monitoring is used
bull Storage is provisioned using the settlednomad pattern with ability to perform online migration bull Days to full aggregate trending was more than 200 days on average Note that this value depends on
the individual situation and is calculated against 100
The sole metric in this setting is aggregate capacity used Table 10 contains the thresholds describing the transition of phases
Table 10) Phase transitions with settlednomad provisioning pattern and on-line migration mitigation alternative
Detection Threshold Notify Mitigation
gt 70 Storage operations Stop provisioning of storage
gt 85 Storage operations Stop extending provisioned storage
gt 90 Storage operations Relax resource situation and migrate nomad
42 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
Settled Data GrowthAggregate Capacity
Operational Sweet Spot Corridor
Aggregate Capacity Used 0ndash70 70ndash85 gt 90
Provisioning New Storage Y
Extending Already Provisioned Storage
Relax UtilizationmdashNetApp Data Motion a Nomad
Y Y
Y
N N N
You can achieve a very high data consolidation in this setting by using NetApp storage controllers The served amount of logical data exceeds the physical usable capacity by factors
43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
6 STORAGE EFFICIENCY COOKBOOK To increase consolidation we propose the following steps to exploit the advantages of NetApp storage efficiency technologies
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
Elapsed Time
Capacity
1 Month 3 Months
Committed Capacity
Overall Trend
Last 3-Month Trend
Capacity Used
1 2 3
As a general rule we donrsquot introduce artificially limited container types They increase monitoring effort and might prevent pooling unused space For an existing landscape proceed as follows
1 Install and configure Operations Manager the earlier the better From day one Operations Manager collects data The more information it collects the better are the predictions and trending The diagrams provided by Operations Manager give a good idea of growths rates and their steadiness Make sure all NetApp storage controllers are monitored Wait for one month Define which mitigation alternatives your operational team is comfortable with Check the boxes accompanying the provided list and identify the time your team needs to perform the actions If you can perform online migrations for nomads define the time to negotiate and approve the migration For all other data define the time to the next planned downtime window
2 Change all volumes to zero fat configuration with the autogrow feature set to on Since there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow During this period the capacity used diminishes as shown in Figure 25 Usually each change in the volume configuration can be detected So far only metadata has changed and unused space in the volumes is now available from a common shared pool The aggregated free space is available for the same applications storing the data We recommend monitoring for three months to understand the growth rate of your environment
3 Derive the growth trend of the aggregates Note that the overall trend might still be negative Use Operations Manager to help determine the trend Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications such as month- and year-end closing of business applications or regular software maintenance updates (for example in virtualized environments)
44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not
exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect
b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives
c Determine the growth rate Operations Manager provides help in determining the trend of data growth
d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past
To provision storage following these steps
1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes
2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler
entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there
is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller
c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely
d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable
e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated
f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative
g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days
to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate
46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
nearly full threshold (an event configured when the metric exceeds 50%) and the event aggregate nearly overcommitted threshold (an event configured when the metric exceeds 110%). These alarms stop the responsible entities from provisioning new storage; the aggregate is left for organic growth.
An assessment of the storage situation might be performed. Depending on experience and on knowledge of the application growth rates seen in the past, the thresholds may be adapted. After the upper threshold of the operational sweet spot corridor is left, an alarm based on the aggregate full threshold (set initially to 65%) is sent to the storage administrators to make the decision to migrate data in the next planned downtime window. In the meantime, organic growth can take place in the yellow-marked area shown in Figure 22. The metrics used are:
• First metric: aggregate capacity used
• Second metric: aggregate space committed
Because all storage is provisioned using the zero fat option, no artificially limited storage container exists; thus there is no need to consider a volume-based metric. Figure 22 shows the behavior depending on the metrics aggregate capacity used and aggregate space committed.
Figure 22) Transition of changes depending on the metrics aggregate capacity used and aggregate space committed.
[Figure: data growth plotted against aggregate capacity, with the operational sweet spot corridor. While aggregate capacity used is at 0–50% and aggregate space committed at 0–110%, provisioning new storage is allowed; above those thresholds, capacity is assessed and thresholds are adapted; beyond 65% capacity used or 120% space committed, mitigation starts.]
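The two-metric decision logic described above can be sketched as code. This is a minimal illustration, not NetApp tooling; the function name and the "green/yellow/red" labels are mine, while the thresholds (capacity used 50%/65%, space committed 110%/120%) come from the text. The phase is driven by the worse of the two metrics.

```python
def classify_aggregate(used_pct: float, committed_pct: float) -> str:
    """Return 'green' (provision new storage), 'yellow' (assess capacity,
    adapt thresholds) or 'red' (mitigate, e.g. migrate at next downtime)."""
    if used_pct > 65 or committed_pct > 120:
        return "red"
    if used_pct > 50 or committed_pct > 110:
        return "yellow"
    return "green"
```

For example, an aggregate at 42% capacity used and 95% space committed is still in the sweet spot ("green"), while 55% used already triggers the assessment phase even though the committed metric is unremarkable.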
5.2 SAMPLE SETTING 2: SETTLED/NOMAD
This section describes a setting that takes the settled/nomad provisioning pattern into account and allows migrating nomad data flexibly and in a timely manner, thanks to vFiler technology. This setting requires storage space at alternative locations to which nomads might be migrated. It is seen more often in larger environments with an emphasis on NFS-attached storage. It allows operating the NetApp storage infrastructure at very high use and in narrower operational sweet spot corridors. Figure 23 visualizes the effect of a mitigation alternative that can be performed online.
Figure 23) Narrower corridors due to the ability to perform mitigation alternatives in hours instead of months.
[Figure: settled and nomad data over time; the need to act is detected and the effect of mitigation (e.g., migration) shows within hours.]
In this sample setting, as in sample setting 1, the critical situation to prevent is aggregates becoming too full. However, given the flexibility gained with online data migration, there is no need to take a further metric, such as storage overcommitment, into account.
• All storage is provisioned using the zero fat option with growable FlexVol volumes. Only aggregate monitoring is used.
• Storage is provisioned using the settled/nomad pattern, with the ability to perform online migration.
• Days-to-full aggregate trending was more than 200 days on average. Note that this value depends on the individual situation and is calculated against 100%.
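Since the 200-day figure above is reported against 100% capacity used, it can be rescaled to an operational threshold under an assumed linear growth rate. A hypothetical sketch (`days_to_threshold` is my name, not an Operations Manager API):

```python
def days_to_threshold(used_pct: float, days_to_full: float,
                      threshold_pct: float) -> float:
    """Days until used_pct reaches threshold_pct, assuming linear growth
    and a days-to-full figure reported against 100% capacity used."""
    if used_pct >= threshold_pct:
        return 0.0
    growth_per_day = (100.0 - used_pct) / days_to_full  # percent points/day
    return (threshold_pct - used_pct) / growth_per_day
```

For example, an aggregate at 40% capacity used with 200 days to full grows 0.3 percent points per day and reaches a 70% mitigation threshold in about 100 days, half the reported trend.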
The sole metric in this setting is aggregate capacity used. Table 10 contains the thresholds describing the transitions between phases.
Table 10) Phase transitions with settled/nomad provisioning pattern and online migration as mitigation alternative.

Detection Threshold | Notify             | Mitigation
> 70%               | Storage operations | Stop provisioning of storage
> 85%               | Storage operations | Stop extending provisioned storage
> 90%               | Storage operations | Relax resource situation and migrate a nomad
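The transitions in Table 10 can be expressed as a small lookup that tells operators which actions are still allowed at a given fill level. A sketch under the thresholds from the table (the function and key names are mine):

```python
def setting2_actions(used_pct: float) -> dict:
    """Map aggregate capacity used (%) to the allowed/required actions
    of Table 10 (settled/nomad pattern, online migration available)."""
    return {
        "provision_new_storage": used_pct <= 70,
        "extend_provisioned_storage": used_pct <= 85,
        "migrate_nomad": used_pct > 90,  # relax the resource situation
    }
```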
Figure 24) Visualization of phase transitions depending on the metric aggregate capacity used.
[Figure: settled data growth against aggregate capacity with the operational sweet spot corridor. At 0–70% capacity used, provisioning new storage and extending provisioned storage are allowed; at 70–85%, only already provisioned storage is extended; above 90%, utilization is relaxed by NetApp Data Motion of a nomad.]
You can achieve very high data consolidation in this setting by using NetApp storage controllers: the amount of logical data served exceeds the physically usable capacity by a large factor.
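That factor is simple arithmetic: logical data served divided by physically usable capacity. A minimal illustration (the function name and the example figures are mine, not from the report); thin provisioning and deduplication are what push the factor above 1.

```python
def consolidation_factor(logical_tb: float, physical_usable_tb: float) -> float:
    """Ratio of logical data served to physically usable capacity."""
    return logical_tb / physical_usable_tb

# e.g. 120 TB of logical data served from 40 TB of usable capacity
factor = consolidation_factor(120, 40)
```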
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Figure: capacity over elapsed time, annotated with phases 1–3 (one month, then three months). Curves: committed capacity and capacity used, with the overall trend and the last-3-month trend.]
As a general rule, we don't introduce artificially limited container types: they increase monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better the predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored. Wait for one month. Define which mitigation alternatives your operational team is comfortable with. Check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. During this period the capacity used diminishes, as shown in Figure 25; usually each change in the volume configuration can be detected. So far only metadata has changed, and unused space in the volumes is now available in a common shared pool. The aggregated free space is available for the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
4. Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided. Operations Manager helps you understand the growth rate of the past.
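Steps 3 and 4 can be sketched as two small calculations: a linear trend fit over daily capacity samples that excludes the zero-fat conversion window (whose apparent shrink would distort the slope), and a threshold worked backward from a comfort ceiling and the headroom organic growth needs until the next downtime. This is a hedged illustration, not Operations Manager logic; all names and figures are mine.

```python
def growth_gb_per_day(samples, exclude_days=()):
    """Plain least-squares slope over (day, used_gb) pairs, skipping the
    day indices listed in exclude_days (e.g. the conversion window)."""
    pts = [(d, gb) for d, gb in samples if d not in exclude_days]
    n = len(pts)
    sx = sum(d for d, _ in pts)
    sy = sum(gb for _, gb in pts)
    sxx = sum(d * d for d, _ in pts)
    sxy = sum(d * gb for d, gb in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

def mitigation_threshold_pct(aggregate_gb, growth_gb_day,
                             days_to_next_downtime, ceiling_pct=80.0):
    """Comfort ceiling minus the headroom (as % of the aggregate) that
    organic growth consumes until the next planned downtime."""
    headroom_pct = 100.0 * growth_gb_day * days_to_next_downtime / aggregate_gb
    return ceiling_pct - headroom_pct
```

For example, an aggregate growing 10 GB/day with 90 days between downtimes needs 900 GB of headroom; on a 10,000 GB aggregate that is 9%, so mitigation should start at 71% rather than at the 80% ceiling.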
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense. Free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try deduplication before committing, create a clone of the intended volume on the storage controller and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or use deduplication scheduled by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus the aggregate overcommitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that could be migrated.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% of aggregate capacity used.
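The four command sequences in step 2d differ only in protocol (NAS vs. SAN) and the autodelete choice, so they lend themselves to templating. A hypothetical helper that renders the appropriate sequence for a volume; it only builds strings, it does not talk to a controller, and the function and parameter names are mine:

```python
def zero_fat_commands(volume, max_size, increment,
                      san=False, autodelete=False, lun=None):
    """Render the zero fat configuration sequence for one volume,
    mirroring the command sequences shown in step 2d."""
    cmds = [
        f"vol options {volume} guarantee none",
        f"vol options {volume} try_first volume_grow",
        f"vol autosize {volume} -m {max_size} -i {increment} on",
    ]
    if san:
        cmds.append(f"snap reserve -V {volume} 0")
    if autodelete:
        cmds += [
            f"snap autodelete {volume} trigger volume",
            f"snap autodelete {volume} delete_order oldest_first",
            f"snap autodelete {volume} on",
        ]
    else:
        cmds.append(f"snap autodelete {volume} off")
    if san and lun:
        cmds.append(f"lun set reservation {lun} disable")
    return cmds
```

Rendering the sequences this way keeps the repeated `vol options`/`vol autosize` boilerplate in one place, so a threshold or flag change propagates to all four variants.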
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
© Copyright 2010 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MultiStore, RAID-DP, SnapDrive, SnapMirror, Snapshot, SyncMirror, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. VMware is a registered trademark and VMotion is a trademark of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. RA-0007-1010
42 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Figure 24) Visualization of phase transitions depending on metric aggregate capacity used
Settled Data GrowthAggregate Capacity
Operational Sweet Spot Corridor
Aggregate Capacity Used 0ndash70 70ndash85 gt 90
Provisioning New Storage Y
Extending Already Provisioned Storage
Relax UtilizationmdashNetApp Data Motion a Nomad
Y Y
Y
N N N
You can achieve a very high data consolidation in this setting by using NetApp storage controllers The served amount of logical data exceeds the physical usable capacity by factors
43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
6 STORAGE EFFICIENCY COOKBOOK
To increase consolidation, we propose the following steps to exploit the advantages of NetApp storage efficiency technologies.
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe.
[Figure: capacity plotted over elapsed time (roughly one month for the reconfiguration, then three months of observation), showing committed capacity and capacity used together with the overall trend and the last 3-month trend; the numbered markers 1–3 correspond to the steps below.]
As a general rule, we do not introduce artificially limited container types: they increase the monitoring effort and might prevent pooling unused space. For an existing landscape, proceed as follows.
1. Install and configure Operations Manager; the earlier, the better. From day one, Operations Manager collects data, and the more information it collects, the better its predictions and trending become. The diagrams provided by Operations Manager give a good idea of growth rates and their steadiness. Make sure all NetApp storage controllers are monitored, and wait for one month. Define which mitigation alternatives your operational team is comfortable with: check the boxes accompanying the provided list and identify the time your team needs to perform the actions. If you can perform online migrations for nomads, define the time needed to negotiate and approve the migration. For all other data, define the time to the next planned downtime window.
2. Change all volumes to a zero fat configuration with the autogrow feature set to on. Since there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. During this period the capacity used diminishes, as shown in Figure 25, and each change in the volume configuration is usually visible in the metrics. So far only metadata has changed, and unused space in the volumes is now available from a common shared pool. The aggregated free space remains available to the same applications storing the data. We recommend monitoring for three months to understand the growth rate of your environment.
3. Derive the growth trend of the aggregates. Note that the overall trend might still be negative. Use Operations Manager to help determine the trend. Make sure that it excludes the time frame when the volume configuration was changed to zero fat, and that it includes relevant operations of your applications, such as month-end and year-end closing of business applications or regular software maintenance updates (for example, in virtualized environments).
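The growth-trend derivation in step 3 can be sketched with a few lines of awk over periodic capacity samples. The sample data below is an illustrative assumption (day index, GB used, GB total per aggregate); in practice you would export these figures from Operations Manager rather than maintain them by hand, and Operations Manager's own trending is the authoritative source.

```shell
#!/bin/sh
# Hypothetical sketch: estimate aggregate daily growth and days-to-full from
# periodic capacity samples. Columns: day_index used_gb total_gb
cat > samples.txt <<'EOF'
0 500 1000
7 514 1000
14 528 1000
21 542 1000
EOF

summary=$(awk '
NR == 1 { d0 = $1; u0 = $2 }          # first sample
{ d1 = $1; u1 = $2; total = $3 }      # last sample wins
END {
    growth = (u1 - u0) / (d1 - d0)    # GB per day, simple first-to-last slope
    days_to_full = (total - u1) / growth
    printf "growth %.1f GB/day, days to full %.0f", growth, days_to_full
}' samples.txt)
echo "$summary"
```

With the assumed samples this prints a growth rate of 2.0 GB/day and 229 days to full. A first-to-last slope is deliberately crude; a least-squares fit over more samples would be less sensitive to outliers such as month-end closing runs.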
Work backward to determine the thresholds of the phases:
a. Define the aggregate use at a level where your operational team is comfortable. At first, do not exceed 80%. Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect.
b. Determine the maximum distance between the planned downtimes, or the time to perform the intended mitigation alternatives.
c. Determine the growth rate. Operations Manager provides help in determining the trend of data growth.
d. Determine the minimum space required to comfortably allow organic growth in the period between the agreed planned downtimes of the services provided. Operations Manager helps you to understand the past growth rate.
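The backward calculation in steps a–d can be made concrete with a small worked example. All figures below are illustrative assumptions, not NetApp recommendations; substitute the values observed in your own environment.

```shell
#!/bin/sh
# Hypothetical worked example of the backward threshold calculation.
AGG_TOTAL_GB=10000          # usable capacity of the aggregate (assumption)
GROWTH_GB_PER_DAY=20        # growth rate from Operations Manager (step c)
DOWNTIME_INTERVAL_DAYS=90   # distance between planned downtimes (step b)
MITIGATION_LEAD_DAYS=14     # time for a mitigation alternative to show effect

# Step d: minimum free space to absorb organic growth until the next
# planned downtime, plus the lead time of the chosen mitigation.
min_free_gb=$(( GROWTH_GB_PER_DAY * (DOWNTIME_INTERVAL_DAYS + MITIGATION_LEAD_DAYS) ))

# Step a: the resulting upper bound for comfortable aggregate use.
max_use_pct=$(( (AGG_TOTAL_GB - min_free_gb) * 100 / AGG_TOTAL_GB ))

echo "minimum free space: ${min_free_gb} GB"
echo "aggregate use threshold: ${max_use_pct}%"
```

With the assumed numbers, the minimum free space comes out at 2,080 GB and the comfortable-use threshold at 79%, which is consistent with the "do not exceed 80%" guidance above.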
To provision storage, follow these steps:
1. Create big aggregates to enable shared storage in your data center. We recommend sizing them so that the aggregate can be extended once for eventual aggregate mitigation. Very few situations exist where a silo-centric approach with dedicated aggregates per application makes sense: free space and performance in an aggregate can be shared, and a few big aggregates reduce the monitoring effort. Also, build aggregates in a limited number of standardized configurations and sizes.
2. Create volumes in a zero fat configuration with the autogrow feature set to on. Because there is no artificial space limitation for an autogrow volume, monitoring is restricted to aggregate monitoring. When using deduplication, set the volume to autogrow. Whenever possible, use Provisioning Manager for convenience and for repeating configurations.
a. Classify your data and provision for flexibility. Give NFS a preference and make use of vFiler entities.
b. Turn on deduplication. Even in situations where deduplication rates are expected to be low, there is sometimes a big surprise. If you prefer to try out deduplication on the storage controller first, create a clone of the intended volume and deduplicate it to estimate the effect. Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job, or schedule deduplication by change rate. Mind the maximum volume sizes, which depend on the storage controller.
c. Initially size volumes to the expected size of the data you are going to store. Thus, the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely.
d. Trim existing volumes provisioned in full or low fat to a zero fat configuration. Use the following commands on the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> off
lun set reservation <lun> disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on:
vol options <volume> guarantee none
vol options <volume> try_first volume_grow
vol autosize <volume> -m <maximum size> -i <increment size> on
snap reserve -V <volume> 0
snap autodelete <volume> trigger volume
snap autodelete <volume> delete_order oldest_first
snap autodelete <volume> on
lun set reservation <lun> disable
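When many volumes need to be trimmed to zero fat, repeating these sequences by hand is error prone. The following sketch generates the SAN (autodelete on) sequence for a list of volumes so it can be reviewed before being pasted into the storage controller console. The volume names, LUN path, and autosize values are illustrative assumptions; adapt them to your environment.

```shell
#!/bin/sh
# Sketch: generate (not execute) the zero fat command sequence per volume.
# vol_db01/vol_db02, /vol/<vol>/lun0, and the 500g/10g sizes are assumptions.
volumes="vol_db01 vol_db02"

script=""
for vol in $volumes; do
    # Append one full command sequence per volume for later review.
    script="${script}vol options ${vol} guarantee none
vol options ${vol} try_first volume_grow
vol autosize ${vol} -m 500g -i 10g on
snap reserve -V ${vol} 0
snap autodelete ${vol} trigger volume
snap autodelete ${vol} delete_order oldest_first
snap autodelete ${vol} on
lun set reservation /vol/${vol}/lun0 disable
"
done
printf '%s' "$script"
```

Generating the commands as text keeps a human review step in the loop, which is prudent because disabling LUN reservations changes the failure mode of a full aggregate.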
e. Identify storage holding inactive data. Such storage is most often perfectly suited to act as a nomad candidate that can be migrated.
f. Identify storage that is close to being deprovisioned. Deprovisioning storage relaxes use and can act as a mitigation alternative.
g. Turn already provisioned volumes into a zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and the days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
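Because days-to-full trends against 100% capacity used, it overstates the time you actually have before your own red threshold is reached. Assuming linear growth, the reported figure can be rescaled as sketched below; the percentages are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: convert the reported "days to full" (against 100% capacity used)
# into days until a self-defined red threshold is hit. Assumes linear growth;
# all three input figures are illustrative assumptions.
USED_PCT=70          # current aggregate capacity used
DAYS_TO_FULL=120     # days-to-full as reported by the monitoring tool
RED_PCT=85           # your red threshold

# Linear growth: the remaining 30 points of capacity take 120 days,
# so the 15 points up to the red threshold take proportionally less.
days_to_red=$(( DAYS_TO_FULL * (RED_PCT - USED_PCT) / (100 - USED_PCT) ))
echo "days until ${RED_PCT}% threshold: ${days_to_red}"
```

With the assumed numbers, 120 reported days to full shrink to 60 days until the 85% threshold, which is the figure that should drive your mitigation planning.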
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.
copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010
- EXECUTIVE SUMMARY
- INTRODUCTION
-
- 21 TERMINOLOGY
- 22 GOAL OF THIS DOCUMENT
- 23 AUDIENCE
- 24 SCENARIO
- 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
-
- PROVISIONING
-
- 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
- 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
- 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
-
- OPERATION
-
- 41 PHASES AND TRANSITIONS
- 42 MONITORING
- 43 NOTIFICATION
- 44 MITIGATE STORAGE USE
-
- REAL-LIFE SETTINGS
-
- 51 SAMPLE SETTING 1 REAL-LIFE SETTING
- 52 SAMPLE SETTING 2 SETTLEDNOMAD
-
- STORAGE EFFICIENCY COOKBOOK
- REFERENCES
- ACKNOWLEDGMENTS
-
43 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
6 STORAGE EFFICIENCY COOKBOOK To increase consolidation we propose the following steps to exploit the advantages of NetApp storage efficiency technologies
Figure 25) Typical picture of aggregate capacity metrics while turning to zero fat configurations and dedupe
Elapsed Time
Capacity
1 Month 3 Months
Committed Capacity
Overall Trend
Last 3-Month Trend
Capacity Used
1 2 3
As a general rule we donrsquot introduce artificially limited container types They increase monitoring effort and might prevent pooling unused space For an existing landscape proceed as follows
1 Install and configure Operations Manager the earlier the better From day one Operations Manager collects data The more information it collects the better are the predictions and trending The diagrams provided by Operations Manager give a good idea of growths rates and their steadiness Make sure all NetApp storage controllers are monitored Wait for one month Define which mitigation alternatives your operational team is comfortable with Check the boxes accompanying the provided list and identify the time your team needs to perform the actions If you can perform online migrations for nomads define the time to negotiate and approve the migration For all other data define the time to the next planned downtime window
2 Change all volumes to zero fat configuration with the autogrow feature set to on Since there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow During this period the capacity used diminishes as shown in Figure 25 Usually each change in the volume configuration can be detected So far only metadata has changed and unused space in the volumes is now available from a common shared pool The aggregated free space is available for the same applications storing the data We recommend monitoring for three months to understand the growth rate of your environment
3 Derive the growth trend of the aggregates Note that the overall trend might still be negative Use Operations Manager to help determine the trend Make sure that it excludes the time frame when changing the volume configuration to zero fat and that it includes relevant operations of your applications such as month- and year-end closing of business applications or regular software maintenance updates (for example in virtualized environments)
44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not
exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect
b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives
c Determine the growth rate Operations Manager provides help in determining the trend of data growth
d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past
To provision storage following these steps
1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes
2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler
entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there
is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller
c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely
d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable
e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated
f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative
g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days
to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate
46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo
wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo
wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices
Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html
bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html
bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html
bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html
bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html
bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf
47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang
NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document
copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010
- EXECUTIVE SUMMARY
- INTRODUCTION
-
- 21 TERMINOLOGY
- 22 GOAL OF THIS DOCUMENT
- 23 AUDIENCE
- 24 SCENARIO
- 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
-
- PROVISIONING
-
- 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
- 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
- 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
-
- OPERATION
-
- 41 PHASES AND TRANSITIONS
- 42 MONITORING
- 43 NOTIFICATION
- 44 MITIGATE STORAGE USE
-
- REAL-LIFE SETTINGS
-
- 51 SAMPLE SETTING 1 REAL-LIFE SETTING
- 52 SAMPLE SETTING 2 SETTLEDNOMAD
-
- STORAGE EFFICIENCY COOKBOOK
- REFERENCES
- ACKNOWLEDGMENTS
-
44 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Work backward to determine the thresholds of the phases a Define the aggregate use at a level where your operational team is comfortable At first do not
exceed 80 Add an attention area (yellow) depending on the mitigation alternatives and their time to show effect
b Determine the maximum distance between the planned downtimes or the time to perform the intended mitigation alternatives
c Determine the growth rate Operations Manager provides help in determining the trend of data growth
d Determine the minimum space required to comfortably allow organic growth in the period between agreed planned downtimes of the services provided Operations Manager helps you to understand the growth rate of the past
To provision storage following these steps
1 Create big aggregates to enable shared storage in your data center We recommend to size in such a way that the aggregate can be extended once for eventual aggregate mitigation Very few situations exist where a silo-centric approach with dedicated aggregates for applications makes sense Free space and performance in an aggregate can be shared Few big aggregates reduce the monitoring effort Also build aggregates in a limited number of standardized configurations and sizes
2 Create volumes in zero fat configuration with autogrow feature set to on Because there is no artificial space limitation for the autogrow volume monitoring is restricted to aggregate monitoring When using deduplication set the volume to autogrow Whenever possible use Provisioning Manager for convenience and for repeating configurations a Classify your data and provision for flexibility Give NFS a preference and make use of vFiler
entities b Turn on deduplication Even in situations where deduplication rates are expected to be low there
is sometimes a big surprise If you prefer to try deduplication on the storage controller then create a clone of the intended volume and deduplicate it to estimate the effect Use Performance Advisor to identify a repeating time frame of low activity to schedule the deduplication job Also use deduplication scheduled by change rate Mind the maximum sizes depending on the storage controller
c Initially size volumes to the expected size of the data you are going to store Thus the aggregate over-commitment metric in Operations Manager represents the data consolidation more precisely
d Trim existing volumes provisioned in fulllow fat to zero fat configuration Use the following commands of the console of the storage controller to configure zero fat without Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt off
Use the following command sequence to configure zero fat with Snapshot autodelete for NAS environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable
e Identify storage of inactive data Storage keeping inactive data is most often perfectly suited to act as nomad candidates that could be migrated
f Identify storage that is close to deprovisioning Deprovisioning of storage relaxes use and can act as a mitigation alternative
g Turn already provisioned volumes in zero fat configuration 3 Let Operations Manager monitor the landscape Use reported aggregate daily growth rates and days
to full trending reported by Operations Manager to adapt the thresholds Remember that days to full trending reports against 100 capacity used of aggregate
46 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
7 REFERENCES bull TR-3505 ldquoNetApp Deduplication for FAS and V-Series Deployment and Implementation Guiderdquo
wwwnetappcomuslibrarytechnical-reportstr-3505html bull TR-3563 ldquoNetApp Thin Provisioning Improving Storage Utilization and Reducing TCOrdquo
wwwnetappcomuslibrarytechnical-reportstr-3563html bull TR-3710 ldquoOperations Manager Provisioning Manager and Protection Manager Best Practices
Guiderdquo wwwnetappcomuslibrarytechnical-reportstr-3710html
bull TR-3786 ldquoA Thorough Introduction to 64-Bit Aggregatesrdquo wwwnetappcomuslibrarytechnical-reportstr-3786html
bull TR-3814 ldquoNetApp Data Motionrdquo wwwnetappcomuslibrarytechnical-reportstr-3814html
bull TR-3827 ldquoIf Yoursquore Doing This Then Your Storage Could Be Underutilizedrdquo wwwnetappcomuslibrarytechnical-reportstr-3827html
bull TR-3881 rdquoDataMotion For Volumes For Enterprise Applicationsrdquo httpwwwnetappcomuslibrarytechnical-reportstr-3881html
bull NetApp Operations Manager Efficiency Dashboard Installation and User Guide httpnownetappcomNOWdownloadtoolsomsed_pluginInstallUserGuidepdf
47 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
8 ACKNOWLEDGMENTS This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise It would not have been possible without the input of many experts Significant contributions were made by Matthew Agoni Carlos Alvarez Jeff Berks Manfred Buchmann Hans Deuerlein Erik Dybwad Niels Reker Oliver Dziuba Larry Freeman Gary Garcia Pretoom Goswami Naveen Harsani George John Nigel Maddock Andreas Martinovsky Holger Niermann Cesar Orosco Christian Ott Shiva Raja Michael Reusch Maurice Skubski John Tyrrell Oliver Walsdorf and Allen Wang
NetApp provides no representations or warranties regarding the accuracy reliability or serviceability of any information or recommendations provided in this publication or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein The information in this document is distributed AS IS and the use of this information or the implementation of any recommendations or techniques herein is a customerrsquos responsibility and depends on the customerrsquos ability to evaluate and integrate them into the customerrsquos operational environment This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document
copy Copyright 2010 NetApp Inc All rights reserved No portions of this document may be reproduced without prior written consent of NetApp Inc Specifications are subject to change without notice NetApp the NetApp logo Go further faster Data ONTAP FlexClone FlexVol MultiStore RAID-DP SnapDrive SnapMirror Snapshot SyncMirror and vFiler are trademarks or registered trademarks of NetApp Inc in the United States andor other countries Windows is a registered trademark of Microsoft Corporation Oracle is a registered trademark of Oracle Corporation VMware is a registered trademark and VMotion is a trademark of VMware Inc All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such RA-0007-1010
- EXECUTIVE SUMMARY
- INTRODUCTION
-
- 21 TERMINOLOGY
- 22 GOAL OF THIS DOCUMENT
- 23 AUDIENCE
- 24 SCENARIO
- 25 NETAPP TECHNOLOGIES FOR STORAGE EFFICIENCY AND FLEXIBILITY
-
- PROVISIONING
-
- 31 PROVISIONING FROM SCRATCH FULL FAT TO ZERO FAT PROVISIONING
- 32 PROVISIONING FROM TEMPLATES VOLUME AND DEDUPE-CENTRIC LAYOUTS
- 33 SETTLEDNOMAD PROVISIONING FOR NETAPP DATA MOTION
-
- OPERATION
-
- 41 PHASES AND TRANSITIONS
- 42 MONITORING
- 43 NOTIFICATION
- 44 MITIGATE STORAGE USE
-
- REAL-LIFE SETTINGS
-
- 51 SAMPLE SETTING 1 REAL-LIFE SETTING
- 52 SAMPLE SETTING 2 SETTLEDNOMAD
-
- STORAGE EFFICIENCY COOKBOOK
- REFERENCES
- ACKNOWLEDGMENTS
-
45 Storage Efficiency Every Day How to Achieve and Manage Best-in-Class Storage Use
Use the following commands to configure zero fat without Snapshot autodelete for SAN environments
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt off lun set reservation ltlungt disable
Use the following command sequence to configure zero fat for SAN environments with autodelete set to on
vol options ltvolumegt guarantee none vol options ltvolumegt try_first volume_grow vol autosize ltvolumegt -m ltmaximum sizegt -i ltincrement sizegt on snap reserve -V ltvolumegt 0 snap autodelete ltvolumegt trigger volume snap autodelete ltvolumegt delete_order oldest_first snap autodelete ltvolumegt on lun set reservation ltlungt disable
e. Identify storage holding inactive data. Storage that keeps inactive data is usually well suited to act as a nomad candidate for migration.
f. Identify storage that is close to deprovisioning. Deprovisioning storage relaxes utilization and can serve as a mitigation alternative.
g. Turn already provisioned volumes into the zero fat configuration.
3. Let Operations Manager monitor the landscape. Use the aggregate daily growth rates and days-to-full trending reported by Operations Manager to adapt the thresholds. Remember that days-to-full trending reports against 100% capacity used of the aggregate.
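Because days-to-full trends against 100% of aggregate capacity, the reported value is essentially the remaining space divided by the daily growth rate, which makes a quick sanity check easy. The numbers below are illustrative, not taken from any real controller.

```shell
#!/bin/sh
# Sketch: estimate days to full for an aggregate (illustrative numbers).
# Days-to-full trends against 100% of aggregate capacity, so the remaining
# space is total capacity minus used capacity.
total_gb=4000          # aggregate capacity
used_gb=3200           # capacity currently used
growth_gb_per_day=25   # daily growth rate reported by Operations Manager

free_gb=$((total_gb - used_gb))
days_to_full=$((free_gb / growth_gb_per_day))
echo "days to full: $days_to_full"
```

With these sample numbers the estimate is 32 days; if that is shorter than the lead time for ordering and installing new disk shelves, the monitoring thresholds should be tightened.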
7 REFERENCES
• TR-3505, "NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide": www.netapp.com/us/library/technical-reports/tr-3505.html
• TR-3563, "NetApp Thin Provisioning: Improving Storage Utilization and Reducing TCO": www.netapp.com/us/library/technical-reports/tr-3563.html
• TR-3710, "Operations Manager, Provisioning Manager, and Protection Manager Best Practices Guide": www.netapp.com/us/library/technical-reports/tr-3710.html
• TR-3786, "A Thorough Introduction to 64-Bit Aggregates": www.netapp.com/us/library/technical-reports/tr-3786.html
• TR-3814, "NetApp Data Motion": www.netapp.com/us/library/technical-reports/tr-3814.html
• TR-3827, "If You're Doing This, Then Your Storage Could Be Underutilized": www.netapp.com/us/library/technical-reports/tr-3827.html
• TR-3881, "DataMotion for Volumes for Enterprise Applications": www.netapp.com/us/library/technical-reports/tr-3881.html
• NetApp Operations Manager Efficiency Dashboard Installation and User Guide: now.netapp.com/NOW/download/tools/omsed_plugin/InstallUserGuide.pdf
8 ACKNOWLEDGMENTS
This report was developed in concert with the Field Centers for Innovation and covers field best practices and product group expertise. It would not have been possible without the input of many experts. Significant contributions were made by Matthew Agoni, Carlos Alvarez, Jeff Berks, Manfred Buchmann, Hans Deuerlein, Erik Dybwad, Niels Reker, Oliver Dziuba, Larry Freeman, Gary Garcia, Pretoom Goswami, Naveen Harsani, George John, Nigel Maddock, Andreas Martinovsky, Holger Niermann, Cesar Orosco, Christian Ott, Shiva Raja, Michael Reusch, Maurice Skubski, John Tyrrell, Oliver Walsdorf, and Allen Wang.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.