
EMC VPLEX Metro Witness Technology and High Availability

Version 2.0

• EMC VPLEX Witness

• VPLEX Metro High Availability

• Metro HA Deployment Scenarios

Jennifer Aspesi
Oliver Shorey


Copyright © 2010, 2011 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date regulatory document for your product line, go to the Technical Documentation and Advisories section on EMC Powerlink.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.

Part number H7113.2

Contents

Preface

Chapter 1 VPLEX Family and Use Case Overview
Introduction
VPLEX value overview
VPLEX product offerings
    VPLEX Local, VPLEX Metro, VPLEX Geo
    Architecture highlights
Metro High Availability design considerations
    Planned application mobility compared with disaster restart

Chapter 2 Hardware and Software
Introduction
    VPLEX I/O
    High-level VPLEX I/O discussion
    Distributed coherent cache
    VPLEX family clustering architecture
    VPLEX single, dual, quad
    VPLEX sizing tool
    Upgrade paths
    Hardware upgrades
    Software upgrades
VPLEX management interfaces
    Web-based GUI
    VPLEX CLI
    SNMP support for performance statistics
    LDAP/AD support
    VPLEX Element Manager API
Simplified storage management
Management server user accounts
Management server software
    Management console
    Command line interface
    System reporting
Director software
Configuration overview
    Small configurations
    Medium configurations
    Large configurations
I/O implementation
    Cache coherence
    Meta-directory
    How a read is handled
    How a write is handled

Chapter 3 System and Component Integrity
Overview
Cluster
Path redundancy through different ports
Path redundancy through different directors
Path redundancy through different engines
Path redundancy through site distribution
Safety check

Chapter 4 Foundations of VPLEX High Availability
Foundations of VPLEX High Availability
Failure handling without VPLEX Witness (Static bias)

Chapter 5 Introduction to VPLEX Witness
VPLEX Witness overview and architecture
VPLEX Witness target solution, rules, and best practice
VPLEX Witness failure semantics
CLI example outputs
    VPLEX Witness – The importance of the third failure domain

Chapter 6 Combining VPLEX High Availability and VPLEX Witness
Metro HA overview
VPLEX Metro HA with Cross-Cluster Connect
VPLEX Metro HA without Cross-Cluster Connect

Chapter 7 Conclusion
Conclusion
    Better protection from storage-related failures
    Protection from a larger array of possible failures
    Greater overall resource utilization

Glossary


Figures

1 Application and data mobility example
2 HA infrastructure example
3 Distributed data collaboration example
4 VPLEX offerings
5 Architecture highlights
6 VPLEX cluster example
7 VPLEX Management Console
8 Management Console welcome screen
9 VPLEX small configuration
10 VPLEX medium configuration
11 VPLEX large configuration
12 Port redundancy
13 Director redundancy
14 Engine redundancy
15 Site redundancy
16 High level functional sites in communication
17 High level Site A failure
18 High level Inter-site link failure
19 VPLEX active and functional between two sites
20 VPLEX concept diagram with failure at Site A
21 Correct resolution after volume failure at Site A
22 VPLEX active and functional between two sites
23 Inter-site link failure and cluster partition
24 Correct handling of cluster partition
25 VPLEX static detach rule
26 Typical detach rule setup
27 Non-preferred site failure
28 Volume remains active at Cluster 1
29 Typical detach rule setup before link failure
30 Inter-site link failure and cluster partition
31 Suspension after inter-site link failure and cluster partition
32 Cluster 2 has bias
33 Preferred site failure causes full Data Unavailability
34 High Level VPLEX Witness architecture
35 High Level VPLEX Witness deployment
36 Supported VPLEX versions for VPLEX Witness
37 VPLEX Witness volume types and rule support
38 Typical VPLEX Witness configuration
39 VPLEX Witness and an inter-cluster link failure
40 VPLEX Witness and static bias after cluster partition
41 VPLEX Witness typical configuration for Cluster 2 detaches
42 VPLEX Witness diagram showing Cluster 2 failure
43 VPLEX Witness with static bias override
44 Possible dual failure cluster isolation scenarios
45 Highly unlikely dual failure scenarios that require manual intervention
46 Two further dual failure scenarios that would require manual intervention
47 High-level diagram of a Metro HA Cross-Cluster Connect solution for VMware
48 Metro HA Cross-Cluster Connect diagram with failure domains
49 Metro HA Cross-Cluster Connect diagram with disaster in zone A1
50 Metro HA Cross-Cluster Connect diagram with failure in zone A2
51 Metro HA Cross-Cluster Connect diagram with failure in zone A3 or B3
52 Metro HA Cross-Cluster Connect diagram with failure in zone C1
53 Metro HA Cross-Cluster Connect diagram with intersite link failure
54 Metro HA Standard High-level diagram
55 Metro HA high-level diagram with fault domains
56 Metro HA high-level diagram with failure in domain A2
57 Metro HA high-level diagram with intersite failure



Tables

1 Overview of VPLEX features and benefits
2 Configurations at a glance
3 Management server user accounts
4 Output from ls for brief VPLEX Witness status
5 Output from ll command for brief VPLEX Witness component status




Preface

This EMC Engineering TechBook describes how implementing VPLEX leads to a higher level of availability.

As part of an effort to improve and enhance the performance and capabilities of its product lines, EMC periodically releases revisions of its hardware and software. Therefore, some functions described in this document may not be supported by all versions of the software or hardware currently in use. For the most up-to-date information on product features, refer to your product release notes. If a product does not function properly or does not function as described in this document, please contact your EMC representative.

Audience

This document is part of the EMC VPLEX family documentation set, and is intended for use by storage and system administrators.

Readers of this document are expected to be familiar with the following topics:

◆ Storage Area Networks

◆ Storage Virtualization Technologies

◆ EMC Symmetrix and CLARiiON Products

Related documentation

Related documents include:

◆ EMC VPLEX Architecture Guide

◆ EMC VPLEX Installation and Setup Guide

◆ EMC VPLEX Site Preparation Guide

◆ Implementation and Planning Best Practices for EMC VPLEX Technical Notes



◆ Using VMware Virtualization Platforms with EMC VPLEX - Best Practices Planning

◆ VMware KB: Using VPLEX Metro with VMware HA

This document is divided into the following chapters:

◆ Chapter 1, “VPLEX Family and Use Case Overview,” summarizes the VPLEX family. It also covers some of the key features of the VPLEX family system, architecture and use cases.

◆ Chapter 2, “Hardware and Software,” summarizes hardware, software, and network components of the VPLEX system. It also highlights the software interfaces that can be used by an administrator to manage all aspects of a VPLEX system.

◆ Chapter 3, “System and Component Integrity,” summarizes how VPLEX clusters are able to handle hardware failures in any subsystem within the storage cluster.

◆ Chapter 4, “Foundations of VPLEX High Availability,” summarizes the industry-wide dilemma of building absolute HA environments and how VPLEX Metro functionality addresses this historical challenge through manual means.

◆ Chapter 5, “Introduction to VPLEX Witness,” explains how VPLEX functionality can provide absolute HA capability by introducing a “Witness” to the inter-cluster environment.

◆ Chapter 6, “Combining VPLEX High Availability and VPLEX Witness,” uses VMware as an example to show how application features, combined with VPLEX and a Witness, create “absolute” HA integrity.

◆ Chapter 7, “Conclusion,” summarizes the benefits of VPLEX technology as it relates to VPLEX Witness and High Availability.

Authors

This TechBook was authored by the following individuals from the Enterprise Storage Division, VPLEX Business Unit based at EMC Headquarters, Hopkinton, MA.

Jennifer Aspesi has over 10 years of work experience with EMC in Storage Area Networks (SAN), Wide Area Networks (WAN), and Network and Storage Security technologies. Jen currently manages the Corporate Systems Engineer team for the VPLEX Business Unit. She earned her M.S. in Marketing and Technological Innovation from Worcester Polytechnic Institute, Massachusetts.


Oliver Shorey has over 11 years of experience working within the Business Continuity arena, seven of which have been with EMC engineering, designing and documenting high-end replication and geographically-dispersed clustering technologies. He is currently a Principal Corporate Systems Engineer in the VPLEX Business Unit.

Additional contributors

Additional contributors to this book include:

Colin Durocher has 8 years of experience in developing software for the EMC VPLEX product, in both its predecessor and current forms, as well as testing it and helping customers implement it. He is currently working on the product management team for the VPLEX business unit. He has a B.S. in Computer Engineering from the University of Alberta and is currently pursuing an MBA from the John Molson School of Business.

Gene Ortenberg has more than 15 years of experience in building fault-tolerant distributed systems and applications. For the past 8 years he has been designing and developing highly-available storage virtualization solutions at EMC. He currently holds the position of Software Architect for the VPLEX Business Unit under the EMC Enterprise Storage Division.

Fernanda Torres has over 10 years of Marketing experience in the Consumer Products industry, most recently in consumer electronics. Fernanda is the Product Marketing Manager for VPLEX under the EMC Enterprise Storage Division. She has an undergraduate degree from the University of Notre Dame and a bilingual degree (English/Spanish) from IESE in Barcelona, Spain.

Typographical conventions

EMC uses the following type style conventions in this document:

Normal
    Used in running (nonprocedural) text for:
    • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    • Names of resources, attributes, pools, Boolean expressions, buttons, DQL statements, keywords, clauses, environment variables, functions, utilities
    • URLs, pathnames, filenames, directory names, computer names, links, groups, service keys, file systems, notifications

Bold
    Used in running (nonprocedural) text for:
    • Names of commands, daemons, options, programs, processes, services, applications, utilities, kernels, notifications, system calls, man pages
    Used in procedures for:
    • Names of interface elements (such as names of windows, dialog boxes, buttons, fields, and menus)
    • What user specifically selects, clicks, presses, or types

Italic
    Used in all text (including procedures) for:
    • Full titles of publications referenced in text
    • Emphasis (for example a new term)
    • Variables

Courier
    Used for:
    • System output, such as an error message or script
    • URLs, complete paths, filenames, prompts, and syntax when shown outside of running text

Courier bold
    Used for:
    • Specific user input (such as commands)

Courier italic
    Used in procedures for:
    • Variables on command line
    • User input variables

< >    Angle brackets enclose parameter or variable values supplied by the user

[ ]    Square brackets enclose optional values

|      Vertical bar indicates alternate selections - the bar means “or”

{ }    Braces indicate content that you must specify (that is, x or y or z)

...    Ellipses indicate nonessential information omitted from the example

We'd like to hear from you!

Your feedback on our TechBooks is important to us! We want our books to be as helpful and relevant as possible, so please feel free to send us your comments, opinions and thoughts on this or any other TechBook:

[email protected]

Chapter 1 VPLEX Family and Use Case Overview

This chapter provides a brief summary of the main use cases for the EMC VPLEX family and design considerations for High Availability. It also covers some of the key features of the VPLEX family system. Topics include:

◆ Introduction
◆ VPLEX value overview
◆ VPLEX product offerings
◆ Metro High Availability design considerations

Introduction

The purpose of this TechBook is to introduce EMC® VPLEX™ High Availability and VPLEX Witness as they are typically architected by customer storage administrators and EMC Solutions Architects. When properly designed into a VPLEX Metro environment, VPLEX Witness provides customers with “absolute” physical and logical fabric and cache-coherent redundancy.

This guide is designed to provide an overview of the features and functionality associated with the VPLEX Metro configuration and the importance of Active/Active data resiliency for today’s advanced host applications.



VPLEX value overview

At the highest level, VPLEX has unique capabilities that storage administrators value and seek in order to enhance their existing data centers. It delivers distributed, dynamic, and smart functionality into existing or new data centers to provide storage virtualization across geographical boundaries.

◆ VPLEX is distributed, because it is a single interface for multi-vendor storage and it delivers dynamic data mobility: the ability to move applications and data in real time, with no outage required.

◆ VPLEX is dynamic, because it provides data availability and flexibility, and maintains business through failures that traditionally required outages or manual restore procedures.

◆ VPLEX is smart, because its unique AccessAnywhere technology can present and keep the same data consistent within and between sites and enable distributed data collaboration.

Because of these capabilities, VPLEX delivers unique and differentiated value to address three distinct requirements within our target customers’ IT environments:

◆ The ability to dynamically move applications and data across different compute and storage installations, whether within the same data center, across a campus, within a geographical region, or now, with VPLEX Geo, across even greater distances.

◆ The ability to create high-availability storage and a compute infrastructure across these same varied geographies with unmatched resiliency.

◆ The ability to provide efficient real-time data collaboration over distance for such “big data” applications as video, geographic/oceanographic research, and more.

EMC VPLEX technology is a scalable, distributed-storage federation solution that provides non-disruptive, heterogeneous data movement and volume management functionality.

When VPLEX technology is inserted between hosts and storage in a storage area network (SAN), data can be extended over distance within, between, and across data centers.



The VPLEX architecture provides a highly available solution suitable for many deployment strategies including:

◆ Application and Data Mobility — The movement of virtual machines (VM) without downtime. An example is shown in Figure 1.

• Storage administrators have the ability to automatically balance loads through VPLEX, using storage and compute resources from either cluster’s location. When combined with server virtualization, VPLEX allows users to transparently move and relocate Virtual Machines and their corresponding applications and data over distance. This provides a unique capability allowing users to relocate, share and balance infrastructure resources between sites, which can be within a campus or between data centers, up to 10 ms apart with VPLEX Metro, or further apart (50ms RTT) across asynchronous distances with VPLEX Geo.

Figure 1 Application and data mobility example

◆ HA Infrastructure — Reduces recovery time objective (RTO). An example is shown in Figure 2.

• High Availability is a term that several products will claim they can deliver. Ultimately, a High Availability solution is supposed to protect against a failure and keep an application online. Storage administrators plan around HA to provide near continuous uptime for their critical applications, and automate the restart of an application once a failure has occurred, with as little human intervention as possible. With conventional solutions, customers typically have to choose a Recovery Point Objective and a Recovery Time Objective. But even while some solutions offer small RTOs and RPOs, there can still be downtime, and for most customers, any downtime at all can be costly.
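The relationship between RPO and RTO can be made concrete with a small calculation. The following is a minimal sketch, not an EMC tool: the function name and the incident timestamps are hypothetical, and it assumes the usual definitions, in which the achieved recovery point is the gap between the last consistent remote copy and the failure, and the achieved recovery time is the gap between the failure and the restoration of service.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_consistent_copy, failure, service_restored):
    """Return (data-loss window, outage duration) for a single failure event.

    Hypothetical helper for illustration only.
    """
    if not (last_consistent_copy <= failure <= service_restored):
        raise ValueError("timestamps must be in chronological order")
    rpo = failure - last_consistent_copy   # writes made in this window are lost
    rto = service_restored - failure       # time the application was offline
    return rpo, rto


# Hypothetical incident: the last replication cycle completed at 02:00,
# the site failed at 02:12, and the application was restarted at 02:40.
rpo, rto = achieved_rpo_rto(
    datetime(2011, 1, 1, 2, 0),
    datetime(2011, 1, 1, 2, 12),
    datetime(2011, 1, 1, 2, 40),
)
print(f"achieved RPO: {rpo}, achieved RTO: {rto}")
```

Even with values this small, the application is offline for the full recovery time, which is the downtime the paragraph above refers to.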

Figure 2 HA infrastructure example

◆ Distributed Data Collaboration — Increases utilization of passive disaster recovery (DR) assets and provides simultaneous access to data. An example is shown in Figure 3.

• This is when a workforce has multiple users at different sites that need to work on the same data, and maintain consistency in the dataset when changes are made. Use cases include co-development of software where the development happens across different teams from separate locations, and collaborative workflows such as engineering, graphic arts, videos, educational programs, designs, research reports, and so forth.

• When customers have tried to build collaboration across distance with traditional solutions, they have normally had to save the entire file at one location and then send it to another site using FTP. This is slow, can incur heavy bandwidth costs for large files (or even for small files that move regularly), and negatively impacts productivity because the other sites can sit idle while they wait to receive the latest data. If teams decide to do their own work independently of each other, the dataset quickly becomes inconsistent, as multiple people are working on it at the same time and are unaware of each other’s most recent changes. Bringing all of the changes together in the end is time-consuming and costly, and grows more complicated as the dataset gets larger.
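To give a sense of why whole-file transfer between sites is slow, the arithmetic can be sketched as follows. This is a rough estimate under stated assumptions: the function name, the file size, the link speed, and the 80 percent link efficiency are all illustrative values, not figures from this TechBook.

```python
def transfer_seconds(file_bytes, link_bits_per_second, efficiency=0.8):
    """Estimate the time to copy one file across an inter-site link.

    efficiency is an assumed fraction of the raw link bandwidth that the
    transfer can actually use (protocol overhead, sharing with other traffic).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    return file_bytes * 8 / (link_bits_per_second * efficiency)


# Illustrative values: a 50 GB dataset sent over a 100 Mbit/s link.
size_bytes = 50 * 10**9
link_bps = 100 * 10**6
seconds = transfer_seconds(size_bytes, link_bps)
print(f"estimated transfer time: {seconds / 3600:.1f} hours")
```

Every site that needs the latest copy pays this cost again after each change, which is the idle time described above.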

Figure 3 Distributed data collaboration example



VPLEX product offerings

VPLEX first meets high-availability and data mobility requirements and then scales up to the I/O throughput required for the front-end applications and back-end storage.

High-availability and data mobility features are characteristics of VPLEX Local, VPLEX Metro, and VPLEX Geo.

A VPLEX cluster consists of one, two, or four engines (each containing two directors), and a management server. A dual-engine or quad-engine cluster also contains a pair of Fibre Channel switches for communication between directors.

Each engine is protected by a standby power supply (SPS), and each Fibre Channel switch gets its power through an uninterruptible power supply (UPS). (In a dual-engine or quad-engine cluster, the management server also gets power from a UPS.)

The management server has a public Ethernet port, which provides cluster management services when connected to the customer network.
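The composition rules described above can be captured in a small model. This is an illustrative sketch only: the class and attribute names are hypothetical and do not correspond to any VPLEX API, and the model encodes just the counts stated in the text (one, two, or four engines, two directors per engine, and a pair of Fibre Channel switches in dual-engine and quad-engine clusters).

```python
from dataclasses import dataclass

DIRECTORS_PER_ENGINE = 2
VALID_ENGINE_COUNTS = (1, 2, 4)   # single, dual, and quad configurations


@dataclass(frozen=True)
class VplexCluster:
    """Hypothetical model of the components in one VPLEX cluster."""
    engines: int

    def __post_init__(self):
        if self.engines not in VALID_ENGINE_COUNTS:
            raise ValueError("a cluster has one, two, or four engines")

    @property
    def directors(self):
        return self.engines * DIRECTORS_PER_ENGINE

    @property
    def fibre_channel_switches(self):
        # Only dual-engine and quad-engine clusters contain the switch pair
        # used for communication between directors.
        return 2 if self.engines > 1 else 0

    @property
    def management_server_on_ups(self):
        # Per the text, the management server is UPS-powered in dual-engine
        # and quad-engine clusters.
        return self.engines > 1


for count in VALID_ENGINE_COUNTS:
    cluster = VplexCluster(engines=count)
    print(count, cluster.directors, cluster.fibre_channel_switches)
```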

VPLEX Local, VPLEX Metro, VPLEX Geo

EMC offers VPLEX in three configurations to address customer needs for high-availability and data mobility:

◆ VPLEX Local

◆ VPLEX Metro

◆ VPLEX Geo

Figure 4 provides an example of each.



Figure 4 VPLEX offerings

VPLEX Local

VPLEX Local provides seamless, non-disruptive data mobility and the ability to manage multiple heterogeneous arrays from a single interface within a data center.

VPLEX Local allows increased availability, simplified management, and improved utilization across multiple arrays.

VPLEX Metro with AccessAnywhere

VPLEX Metro with AccessAnywhere enables active-active, block level access to data between two sites within synchronous distances. The distance is limited by what synchronous behavior can withstand, as well as by considerations of host application stability and MAN traffic. Depending on the application, it is recommended that Metro be considered for latencies less than or equal to 5 ms RTT (see footnote 1).

The combination of virtual storage with VPLEX Metro and virtual servers enables the transparent movement of virtual machines and storage across a distance. This technology provides improved utilization across heterogeneous arrays and multiple sites.
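A latency recommendation expressed as round-trip time (RTT) can be related to distance with a simple propagation estimate. The sketch below assumes that light travels through optical fiber at roughly 200 km per millisecond (about two-thirds of its speed in a vacuum) and deliberately ignores switching, protocol, and equipment delays, so real links will show higher RTT than this estimate. The function names are hypothetical.

```python
FIBER_KM_PER_MS = 200.0   # assumed propagation speed in optical fiber


def propagation_rtt_ms(fiber_route_km):
    """Lower-bound round-trip time for a fiber route of the given length."""
    if fiber_route_km < 0:
        raise ValueError("route length cannot be negative")
    return 2 * fiber_route_km / FIBER_KM_PER_MS


def max_route_km(rtt_budget_ms):
    """Longest fiber route whose propagation delay alone fits the RTT budget."""
    if rtt_budget_ms < 0:
        raise ValueError("RTT budget cannot be negative")
    return rtt_budget_ms * FIBER_KM_PER_MS / 2


print(propagation_rtt_ms(100))   # RTT contributed by a 100 km route
print(max_route_km(5))           # route length that consumes a 5 ms budget
```

Because the fiber route is usually longer than the straight-line distance between sites, measured RTT rather than map distance should drive the design.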

1. Refer to VPLEX and vendor-specific White Papers for confirmation of latency limitations.



VPLEX Geo with AccessAnywhere

VPLEX Geo with AccessAnywhere enables active-active, block level access to data between two sites within asynchronous distances. VPLEX Geo enables more cost-effective use of resources and power. Geo provides the same distributed device flexibility as Metro but extends the distance up to and within 50 ms RTT. As with any asynchronous transport medium, bandwidth, along with application sharing on the link, is important to consider for optimal behavior.

This TechBook focuses on the Metro configuration only. VPLEX Witness is supported with VPLEX Geo; however, that configuration is beyond the scope of this TechBook.

Architecture highlights

VPLEX support is open and heterogeneous, supporting both EMC storage and common arrays from other storage vendors, such as HDS, HP, and IBM. VPLEX conforms to established World Wide Name (WWN) guidelines that can be used for zoning.

VPLEX supports operating systems in both physical and virtual server environments, including VMware ESX and Microsoft Hyper-V. VPLEX supports network fabrics from Brocade and Cisco, including legacy McData SANs.

An example of the architecture is shown in Figure 5.




Figure 5 Architecture highlights



Table 1 lists an overview of VPLEX features along with the benefits.

For all VPLEX products, the appliance-based VPLEX technology:

◆ Presents storage area network (SAN) volumes from back-end arrays to VPLEX engines

◆ Packages the SAN volumes into sets of VPLEX virtual volumes with user-defined configuration and protection levels

◆ Presents virtual volumes to production hosts in the SAN via the VPLEX front-end

◆ For VPLEX Metro and VPLEX Geo products, presents a global, block-level directory for distributed cache and I/O between VPLEX clusters.

Location and distance determine high-availability and data mobility requirements. For example, if all storage arrays are in a single data center, a VPLEX Local product federates back-end storage arrays within the data center.

When back-end storage arrays span two data centers, the AccessAnywhere feature in a VPLEX Metro or a VPLEX Geo product federates storage in an active-active configuration between VPLEX clusters. Choosing between VPLEX Metro and VPLEX Geo depends on distance and data synchronicity requirements.
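The selection logic described above can be sketched as a simple decision function. This is an illustration under stated assumptions, not a sizing tool: the function name and parameters are hypothetical, and the thresholds are the 5 ms RTT recommendation for Metro and the 50 ms RTT limit for Geo given in this chapter. Supported limits should be confirmed against current VPLEX documentation.

```python
METRO_MAX_RTT_MS = 5.0    # recommended upper bound for synchronous distances
GEO_MAX_RTT_MS = 50.0     # upper bound for asynchronous distances


def suggest_offering(data_centers, rtt_ms=None, needs_synchronous=False):
    """Suggest a VPLEX offering, or None if no offering fits the inputs."""
    if data_centers == 1:
        return "VPLEX Local"
    if data_centers != 2:
        raise ValueError("this sketch covers one or two data centers only")
    if rtt_ms is None or rtt_ms < 0:
        raise ValueError("a non-negative inter-site RTT is required")
    if rtt_ms <= METRO_MAX_RTT_MS:
        return "VPLEX Metro"
    if needs_synchronous:
        return None   # synchronous access is not offered beyond Metro distances
    if rtt_ms <= GEO_MAX_RTT_MS:
        return "VPLEX Geo"
    return None


print(suggest_offering(1))
print(suggest_offering(2, rtt_ms=3.0))
print(suggest_offering(2, rtt_ms=20.0))
```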

Table 1 Overview of VPLEX features and benefits

Features                         Benefits
Mobility                         Move data and applications without impact on users.
Resiliency                       Mirror across arrays without host impact, and increase high availability for critical applications.
Distributed cache coherency      Automate sharing, balancing, and failover of I/O across the cluster and between clusters.
Advanced data caching            Improve I/O performance and reduce storage array contention.
Virtual Storage federation       Achieve transparent mobility and access in a data center and between data centers.
Scale-out cluster architecture   Start small and grow larger with predictable service levels.




Application and back-end storage I/O throughput determine the number of engines in each VPLEX cluster. High-availability features within the VPLEX cluster allow for non-disruptive software upgrades and expansion as I/O throughput increases.



Metro High Availability design considerations

VPLEX Metro 5.0 introduces high availability concepts beyond what is traditionally known as physical high availability. In the design of a high availability environment, the introduction of the “Witness” helps prevent failures and asserts the activity between clusters in a multi-site architecture. EMC VPLEX is the first product to bring to market the features and functionality provided by VPLEX Witness.

Through this TechBook, storage administrators and customers gain an easy-to-understand overview of the high availability solution, which provides them with:

◆ Automatic load balancing between their data centers

◆ Active/Active use of both of their data centers

◆ High availability for their applications (no single points of storage failure, auto-restart)

◆ Fully automatic failure handling

◆ Better resource utilization

◆ Lower CapEx and lower OpEx as a result

Broadly speaking, legacy environments typically implement “highly” available designs within a data center and deploy disaster recovery functionality between data centers.

One of the main reasons is that, within a data center, components generally operate in Active/Active mode (or Active/Passive with automatic failover), whereas between data centers legacy replication technologies use Active/Passive techniques that require manual failover to use the passive component.

When VPLEX Metro Active/Active replication technology is used in conjunction with new features such as the VPLEX Witness server (as described in “Introduction to VPLEX Witness” on page 83), the lines between local High Availability and long-distance Disaster Recovery are somewhat blurred, since HA can be stretched beyond the data center walls. Since “replication” is a by-product of federated and distributed storage, disaster avoidance is also achievable within these geographically dispersed HA environments.



Planned application mobility compared with disaster restart

This section compares planned application mobility and disaster restart.

Planned application mobility

Planned application mobility is, conceptually, a planned event in which an application is moved fully online (without disruption) from one location to another, whether within the same data center or to a remote one. Critically, this can only be performed when all components that participate in the movement are available and the running state of the application exists in volatile memory.

An example of online application mobility is VMware vMotion, where a virtual machine must be fully operational before it can be moved. It may sound obvious, but if the VM is offline, the movement cannot be performed online. This is important to understand and is the key difference from application restart.

When vMotion is executed, all live components that are required to make the VM function are copied elsewhere in the background before the VM is cut over.

Because these types of mobility tasks are seamless to the user, one associated use case is disaster avoidance, where an application or VM can be moved ahead of a disaster (such as a hurricane or tsunami) while the running state is still available to be copied. Another is load balancing across multiple systems or even data centers.

Because the running state must be available for these types of relocations, these movements are always deemed planned activities.

Disaster restart

Disaster restart is where an application or service is restarted in another location after a failure (whether on a different server or in a different data center). It typically interrupts the service or application during the failover.

A good example of this technology is a VMware HA cluster configured over two geographically dispersed sites using VPLEX Metro. The cluster is formed over a number of ESX servers, and one or more virtual machines can run on any of the ESX servers within the cluster.

If an active ESX server fails (perhaps due to site failure), the VM can be restarted on a remaining ESX server within the cluster at the remote site, because the datastore where it was running is configured on a VPLEX Metro distributed volume and therefore spans the two locations. This is deemed an unplanned failover and incurs a small outage of the application: the running state of the VM was lost when the ESX server failed, so the service is unavailable until the VM has restarted elsewhere.

Although a planned application mobility event and an unplanned disaster restart produce the same outcome (a service relocating elsewhere), there is a big difference between them: planned mobility keeps the application online during the relocation, whereas disaster restart takes the application offline during the relocation while a restart is conducted.

A prerequisite for a geographical cluster to perform disaster restart is an Active/Active underlying replication solution (VPLEX Metro only at the time of this publication). Legacy Active/Passive solutions in these scenarios typically require an extra step over and above standard application failover, since a storage failover is also required. This is where VPLEX can assist greatly: because it is Active/Active, in most cases no manual intervention at the storage layer is required. The value of VPLEX Witness, combined with following best practices for physically highly available and redundant hardware connectivity, is to provide customers with “Absolute” availability.




Chapter 2 Hardware and Software

This chapter provides insight into the hardware and software interfaces that can be used by an administrator to manage all aspects of a VPLEX system. In addition, a brief overview of the internal system software is included. Topics include:

◆ Introduction

◆ VPLEX management interfaces

◆ Simplified storage management

◆ Management server user accounts

◆ Management server software

◆ Director software

◆ Configuration overview

◆ I/O implementation


Introduction

This section provides basic information on the following:

◆ “VPLEX I/O” on page 32

◆ “High-level VPLEX I/O discussion” on page 32

◆ “Distributed coherent cache” on page 33

◆ “VPLEX family clustering architecture ” on page 33

VPLEX I/O

VPLEX is built on a lightweight protocol that maintains cache coherency for storage I/O. The VPLEX cluster provides highly available cache memory, processing power, and front-end and back-end Fibre Channel interfaces.

EMC hardware powers the VPLEX cluster design so that all devices are always available and I/O that enters the cluster from anywhere can be serviced by any node within the cluster.

The AccessAnywhere feature in the VPLEX Metro and VPLEX Geo products extends the cache coherency between data centers at a distance.

High-level VPLEX I/O discussion

VPLEX abstracts a block-level ownership model into a high level directory that is updated for every I/O and shared across all engines. The directory uses a small amount of metadata and tells all other engines in the cluster, in 4k block transmissions, which block of data is owned by which engine and at what time.

After a write completes and ownership is reflected in the directory, VPLEX dynamically manages read requests for the completed write in the most efficient way possible.

When a read request arrives, VPLEX checks the directory for an owner. After VPLEX locates the owner, the read request goes directly to that engine.

On reads from other engines, VPLEX checks the directory and tries to pull the read I/O directly from the engine cache to avoid going to the physical arrays to satisfy the read.



This model enables VPLEX to stretch the cluster as VPLEX distributes the directory between clusters and sites. VPLEX is efficient with minimal overhead and enables I/O communication over distance.
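The ownership model described above can be pictured with a small sketch. It is an illustration only, not the VPLEX implementation: the class name `BlockDirectory`, the engine labels, and the dictionary-based storage are all hypothetical. Only the idea that a directory records which engine owns which block, and that reads are steered to the owner before going to the array, comes from the text.

```python
# Illustrative sketch only; names and structures are assumptions, not VPLEX internals.

class BlockDirectory:
    """Toy directory that records which engine last wrote (owns) each block."""

    def __init__(self):
        self._owner = {}  # block number -> engine name

    def record_write(self, block, engine):
        # After a write completes, ownership is reflected in the directory.
        self._owner[block] = engine

    def route_read(self, block):
        # A read goes to the owning engine's cache when an owner is known;
        # otherwise it falls back to the back-end array.
        return self._owner.get(block, "back-end array")


directory = BlockDirectory()
directory.record_write(block=42, engine="engine-1")
print(directory.route_read(42))  # engine-1
print(directory.route_read(7))   # back-end array
```

A later write to the same block from a different engine simply replaces the recorded owner, which is how the sketch mimics the directory being updated for every I/O.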

Distributed coherent cache

The VPLEX engine includes two directors that have a total of 26 GB of local cache. Cache pages are keyed by volume and go through a lifecycle from staging, to visible, to draining.

The global cache is the combination of all director caches and spans all clusters. Cache page holder information is maintained in an in-memory data structure called a directory.

The directory is divided into chunks that are distributed among the VPLEX directors. Locality controls where ownership is maintained.

A meta-directory identifies which director owns which directory chunks within the global directory.

VPLEX family clustering architecture

The VPLEX family uses a unique clustering architecture to help customers break the boundaries of the data center and allow servers at multiple data centers to have read/write access to shared block storage devices. A VPLEX cluster, as shown in Figure 6 on page 34, can scale up through the addition of more engines, and scale out by connecting clusters into an EMC VPLEX Metro-Plex™ (two VPLEX Metro clusters connected within Metro distances).



Figure 6 VPLEX cluster example

VPLEX Metro transparently moves and shares workloads for a variety of applications, VMs, databases, and cluster file systems. VPLEX Metro consolidates data centers and optimizes resource utilization across data centers. In addition, it provides non-disruptive data mobility, heterogeneous storage management, and improved application availability. VPLEX Metro supports up to two clusters, which can be in the same data center or at two different sites within synchronous environments. VPLEX Geo, introduced alongside the solutions described in this TechBook, is the asynchronous counterpart to Metro for clusters across greater distances. Analyzing Geo capabilities with VPLEX Witness is outside the scope of this document.



VPLEX single, dual, quad

The VPLEX cluster supports 16000 storage volumes and 16000 virtual volumes, with UI responsiveness under 10 seconds for common operations. The VPLEX cluster also supports 2000 initiators per cluster.

The VPLEX engine provides cache and processing power with redundant directors. Each director includes two I/O modules and one optional WAN COM I/O module for use in VPLEX Metro and VPLEX Geo configurations.

The rackable hardware components are shipped in NEMA standard racks or provided, as an option, as a field rackable product. Table 2 provides a list of configurations.

Single-engine VPLEX

◆ Two directors

◆ 32 Fibre Channel ports

◆ 64 GB cache

◆ I/O throughput characteristics

Table 2 Configurations at a glance

Component | Single engine | Dual engine | Quad engine
Directors | 2 | 4 | 8
Redundant Engine SPSs | Yes | Yes | Yes
FE Fibre Channel ports | 8 | 16 | 32
BE Fibre Channel ports | 8 | 16 | 32
Cache size | 72 GB | 144 GB | 288 GB
Management Servers | 1 | 1 | 1
Internal Fibre Channel switches (Local Comm) | None | 2 | 2
Uninterruptible Power Supplies (UPSs) | None | 2 | 2
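Most values in Table 2 scale linearly with the number of engines, which the following sketch makes explicit. The per-engine figures are taken from the single-engine column of Table 2; the function name `cluster_summary` is hypothetical, and the code is only a convenience for checking the table, not an EMC sizing tool.

```python
# Per-engine values copied from the single-engine column of Table 2.
DIRECTORS_PER_ENGINE = 2
FE_PORTS_PER_ENGINE = 8
BE_PORTS_PER_ENGINE = 8
CACHE_GB_PER_ENGINE = 72


def cluster_summary(engines):
    """Return the Table 2 figures for a 1-, 2-, or 4-engine cluster."""
    if engines not in (1, 2, 4):
        raise ValueError("Table 2 lists only single, dual, and quad engine clusters")
    # Table 2 shows internal FC switches and UPSs only for multi-engine clusters.
    local_com_switches = 0 if engines == 1 else 2
    return {
        "directors": engines * DIRECTORS_PER_ENGINE,
        "fe_ports": engines * FE_PORTS_PER_ENGINE,
        "be_ports": engines * BE_PORTS_PER_ENGINE,
        "cache_gb": engines * CACHE_GB_PER_ENGINE,
        "fc_switches": local_com_switches,
        "upss": local_com_switches,
    }


for n in (1, 2, 4):
    print(n, cluster_summary(n))
```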



Dual-engine VPLEX

◆ Four directors


◆ 128 GB cache


Quad-engine VPLEX

◆ Eight directors


◆ 256 GB cache


VPLEX sizing tool

Use the EMC VPLEX sizing tool provided by EMC Global Services Software Development to configure the right VPLEX cluster configuration.

The sizing tool concentrates on I/O throughput requirement for installed applications (mail exchange, OLTP, data warehouse, video streaming, etc.) and back-end configuration such as virtual volumes, size and quantity of storage volumes, and initiators.

Upgrade paths

VPLEX facilitates application and storage upgrades without a service window through its flexibility to shift production workloads throughout the VPLEX technology.

In addition, high-availability features of the VPLEX cluster allow for non-disruptive VPLEX hardware and software upgrades.

This flexibility means that VPLEX is always servicing I/O and never has to be completely shut down.

Hardware upgrades

Upgrades are supported for single-engine VPLEX systems to dual- or quad-engine systems.

Two VPLEX Local systems can be reconfigured to work as a VPLEX Metro or VPLEX Geo.



Information for VPLEX hardware upgrades is in the Procedure Generator that is available through EMC PowerLink.

Software upgrades

VPLEX features a robust non-disruptive upgrade (NDU) technology to upgrade the software on VPLEX engines. Management server software must be upgraded before running the NDU.

Due to the VPLEX distributed coherent cache, directors elsewhere in the VPLEX installation service I/Os while the upgrade is taking place. This alleviates the need for service windows and reduces RTO.

The NDU includes the following steps:

◆ Preparing the VPLEX system for the NDU

◆ Starting the NDU

◆ Transferring the I/O to an upgraded director

◆ Completing the NDU



VPLEX management interfaces

Within the VPLEX cluster, TCP/IP-based management traffic travels through a private network subnet to the components in one or more clusters. In VPLEX Metro and VPLEX Geo, VPLEX establishes a VPN tunnel between the management servers of both clusters. Once VPLEX Witness is implemented in an environment, the VPN is extended to the VPLEX Witness as well, forming a three-way VPN.

Web-based GUI

VPLEX includes a Web-based graphical user interface (GUI) for management. The EMC VPLEX Management Console Help provides more information on using this interface.

To perform other VPLEX operations that are not available in the GUI, use the CLI, which supports full functionality. The EMC VPLEX CLI Guide provides a comprehensive list of VPLEX commands and detailed instructions on using those commands.

The EMC VPLEX Management Console includes, but is not limited to, the following functions:

◆ Supports storage array discovery and provisioning

◆ Local provisioning

◆ Distributed provisioning

◆ Mobility Central

◆ Online help

VPLEX CLI

VPlexcli is a command line interface (CLI) to configure and operate VPLEX systems. It also generates the EZ Wizard Setup process to make installation of VPLEX easier and quicker.

The CLI is divided into command contexts. Some commands are accessible from all contexts, and are referred to as ‘global commands’.

The remaining commands are arranged in a hierarchical context tree and can only be executed from the appropriate location in that tree.
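The split between global commands and context-bound commands can be modeled as follows. The context paths and command names in this sketch are invented for illustration and are not taken from the EMC VPLEX CLI Guide. The sketch only shows the rule that global commands run from anywhere, while other commands run only from their own context.

```python
# Hypothetical contexts and commands, for illustration only.
GLOBAL_COMMANDS = {"cd", "ls", "help"}

CONTEXT_COMMANDS = {
    "/": set(),
    "/example-context": {"example-create"},
    "/example-context/child": {"example-show"},
}


def can_run(command, context):
    """Return True if `command` may be executed from `context`."""
    if context not in CONTEXT_COMMANDS:
        raise KeyError(f"unknown context: {context}")
    # Global commands are accessible from all contexts.
    if command in GLOBAL_COMMANDS:
        return True
    # Every other command is tied to one location in the tree.
    return command in CONTEXT_COMMANDS[context]


print(can_run("ls", "/example-context/child"))            # True
print(can_run("example-show", "/"))                       # False
print(can_run("example-show", "/example-context/child"))  # True
```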



The VPlexcli encompasses all management capabilities, so that it can be used if the management station is unavailable. It is fully functional and comprehensive, supporting full configuration, provisioning, and advanced systems management.

SNMP support for performance statistics

The VPLEX SNMP agent (SNMPv2c):

◆ Supports retrieval of performance-related statistics as published in the VPLEX-MIB.mib.

◆ Runs on the management server and fetches performance related data from individual directors using a firmware specific interface.

◆ Provides SNMP MIB data for directors for the local cluster only.

LDAP/AD support

VPLEX supports Lightweight Directory Access Protocol (LDAP) or Active Directory as an authentication directory service.

VPLEX Element Manager API

The VPLEX Element Manager API uses the Representational State Transfer (REST) software architecture for distributed systems such as the World Wide Web. It allows software developers and other users to create scripts that run VPLEX CLI commands.

The VPLEX Element Manager API supports all VPLEX CLI commands that can be executed from the root context on a director.
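As a rough sketch of what scripting against a REST interface of this kind might look like, the following builds (but does not send) an HTTPS GET request using only the Python standard library. The `/vplex` URL prefix, the `Username` and `Password` header names, the host name, and the context path are all assumptions made for the example; confirm the actual URL layout and authentication scheme in the Element Manager API documentation before use.

```python
import urllib.request


def build_context_request(host, context_path, username, password):
    """Build (but do not send) a GET request for a CLI context.

    Assumptions: the '/vplex' URL prefix and the 'Username'/'Password'
    header names are placeholders; confirm them in the API guide.
    """
    url = f"https://{host}/vplex/{context_path.strip('/')}"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Username", username)
    request.add_header("Password", password)
    return request


req = build_context_request("mgmt.example.com", "/clusters", "example-user", "secret")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the script easy to inspect and test without a management server available.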

The system management software for VPLEX family systems consists of the following high-level components:

◆ Command line utility

◆ Management console (web interface)

◆ Business layer

◆ Firmware layer

Each cluster in a VPLEX deployment requires one management server, which is embedded in the VPLEX cabinet along with other essential components, such as the directors and internal Fibre Channel switches. The management server communicates through private, redundant IP networks with each director. The management server is the only VPLEX component that is configured with a public IP address on the customer network.

The management server is accessed through Secure Shell (SSH). Additionally, the administrator may connect a VNC client to the management server. Within the SSH session, the administrator can run a CLI utility called VPlexcli to manage the system. Alternatively, the VPLEX management console web interface (GUI) can be started by pointing a browser at the management server’s public IP address.

The following processes run on the management server:

◆ System Management Server — Communicates with the directors, retrieves logs by querying system state, supports multiple concurrent CLI and HTTP sessions, listens to the system events and determines which events are of interest for call home, and interprets the call home list and initiates the call home.

◆ EmaAdapter — Collects events from VPLEX components and sends them to ConnectEMC.

◆ ConnectEMC — Receives the formatted events and sends them to EMC.com



Simplified storage management

VPLEX supports a variety of arrays from various vendors covering both active/active and active/passive type arrays. VPLEX simplifies storage management by allowing simple LUNs, provisioned from the various arrays, to be managed through a centralized management interface that is simple to use and very intuitive. In addition, a Metro-Plex or Geo-Plex environment that spans data centers allows the storage administrator to manage both locations through the one interface from either location by logging in at the local site.



Management server user accounts

The management server requires the setup of user accounts for access to certain tasks. Table 3 describes the types of user accounts on the management server.

Some service and administrator tasks require OS commands that require root privileges. The management server has been configured to use the sudo program to provide these root privileges just for the duration of the command. Sudo is a secure and well-established UNIX program for allowing users to run commands with root privileges.

VPLEX documentation will indicate which commands must be prefixed with "sudo" in order to acquire the necessary privileges. The sudo command will ask for the user's password when it runs for the first time, to ensure that the user knows the password for his account. This prevents unauthorized users from executing these privileged commands when they find an authenticated SSH login that was left open.

Table 3 Management server user accounts

Account type | Purpose
admin (customer) | Performs administrative actions, such as user management; creates and deletes Linux CLI accounts; resets passwords for all Linux CLI users; modifies the public Ethernet settings
service (EMC service) | Starts and stops necessary OS and VPLEX services; cannot modify user accounts; (customers do have access to this account)
Linux CLI accounts | Uses VPlexcli to manage federated storage
All account types | Uses VPlexcli; modifies their own password; can SSH or VNC into the management server; can SCP files off the management server from directories to which they have access



Management server software

The management server software is installed during manufacturing and is fully field upgradeable. The software includes:

◆ VPLEX Management Console

◆ VPlexcli

◆ Server Base Image Updates (when necessary)

◆ Call-home software

Management console

The VPLEX Management Console provides a graphical user interface (GUI) to manage the VPLEX cluster. The GUI can be used to provision storage, as well as manage and monitor system performance.

Figure 7 on page 44 shows the VPLEX Management Console window with the cluster tree expanded to show the objects that are manageable from the front-end, back-end, and the federated storage.



Figure 7 VPLEX Management Console

The VPLEX Management Console provides online help for all of its available functions. You can access online help in the following ways:

◆ Click the Help icon in the upper right corner on the main screen to open the online help system, or in a specific screen to open a topic specific to the current task.

◆ Click the Help button on the task bar to display a list of links to additional VPLEX documentation and other sources of information.



Figure 8 is the welcome screen of the VPLEX Management Console GUI, which utilizes a secure http connection via a browser. The interface uses Flash technology for rapid response and unique look and feel.

Figure 8 Management Console welcome screen

Command line interface

The VPlexcli is a command line interface (CLI) for configuring and running the VPLEX system, for setting up and monitoring the system’s hardware and intersite links (including com/tcp), and for configuring global inter-site I/O cost and link-failure recovery. The CLI runs as a service on the VPLEX management server and is accessible using Secure Shell (SSH).



For information about the VPlexcli, refer to the EMC VPLEX CLI Guide.

System reporting

VPLEX system reporting software collects various configuration information from each cluster and each engine. The resulting configuration file (XML) is zipped and stored locally on the management server or presented to the SYR system at EMC via call home.

You can schedule a weekly job to automatically collect SYR data (VPlexcli command scheduleSYR), or manually collect it whenever needed (VPlexcli command syrcollect).



Director software

The director software provides:

◆ Basic Input/Output System (BIOS ) — Provides low-level hardware support to the operating system, and maintains boot configuration.

◆ Power-On Self Test (POST) — Provides automated testing of system hardware during power on.

◆ Linux — Provides basic operating system services to the Vplexcli software stack running on the directors.

◆ VPLEX Power and Environmental Monitoring (ZPEM) — Provides monitoring and reporting of system hardware status.

◆ EMC Common Object Model (ECOM) —Provides management logic and interfaces to the internal components of the system.

◆ Log server — Collates log messages from director processes and sends them to the SMS.

◆ GeoSynchrony (I/O Stack) — Processes I/O from hosts, performs all cache processing, replication, and virtualization logic, interfaces with arrays for claiming and I/O.



Configuration overview

The VPLEX configurations are based on how many engines are in the cabinet. The basic configurations are small, medium, and large.

The configuration sizes refer to the number of engines in the VPLEX cabinet. The remainder of this section describes each configuration size.

Small configurations

The VPLEX-02 (small) configuration includes the following:

◆ Two directors

◆ One engine

◆ Redundant engine SPSs

◆ 8 front-end Fibre Channel ports

◆ 8 back-end Fibre Channel ports

◆ One management server

The unused space between engine 1 and the management server in Figure 9 on page 49 is intentional.



Figure 9 VPLEX small configuration

Medium configurations

The VPLEX-04 (medium) configuration includes the following:

◆ Four directors

◆ Two engines





◆ Redundant Fibre Channel COM switches for local COM; UPS for each Fibre Channel switch



Figure 10 shows an example of a medium configuration.

Figure 10 VPLEX medium configuration

Large configurations

The VPLEX-08 (large) configuration includes the following:

◆ Eight directors

◆ Four engines








◆ Redundant Fibre Channel COM switches for local COM; UPS for each Fibre Channel switch

Figure 11 shows an example of a large configuration.

Figure 11 VPLEX large configuration



I/O implementation

The VPLEX cluster utilizes a write-through mode whereby all writes are written through the cache to the back-end storage. Writes are completed to the host only after they have been completed to the back-end arrays, maintaining data integrity.

This section describes the VPLEX cluster caching layers, roles, and interactions. It gives an overview of how reads and writes are handled within the VPLEX cluster and how distributed cache coherency works. This is important to the introduction of high availability concepts.

Cache coherence

Cache coherence creates a consistent global view of a volume.

Distributed cache coherence is maintained using a directory. There is one directory per user volume and each directory is split into chunks (4096 directory entries within each). These chunks exist only if they are populated. There is one directory entry per global cache page, with responsibility for:

◆ Tracking page owner(s) and remembering the last writer

◆ Locking and queuing

Meta-directory

Directory chunks are managed by the meta-directory, which assigns and remembers chunk ownership. These chunks can migrate using Locality-Conscious Directory Migration (LCDM). This meta-directory knowledge is cached across the share group for efficiency.
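The relationship between directory entries, chunks, and the meta-directory can be sketched in a few lines. The chunk size of 4096 entries comes from the text; everything else (the class name `MetaDirectory`, the director labels, and the round-robin assignment of new chunks) is an assumption made for the example and does not describe the actual assignment or LCDM migration policy.

```python
ENTRIES_PER_CHUNK = 4096  # from the text: 4096 directory entries per chunk


def locate_entry(page_number):
    """Map a global cache page number to (chunk index, offset within chunk)."""
    return divmod(page_number, ENTRIES_PER_CHUNK)


class MetaDirectory:
    """Toy meta-directory: remembers which director owns each directory chunk."""

    def __init__(self, directors):
        self._directors = list(directors)
        self._chunk_owner = {}  # (volume, chunk index) -> director

    def owner_of(self, volume, page_number):
        chunk, _ = locate_entry(page_number)
        key = (volume, chunk)
        if key not in self._chunk_owner:
            # Chunks exist only once populated; the round-robin assignment
            # here is an assumption made for the example.
            index = len(self._chunk_owner) % len(self._directors)
            self._chunk_owner[key] = self._directors[index]
        return self._chunk_owner[key]

    def migrate(self, volume, chunk, new_director):
        # Stand-in for chunk migration (LCDM in the text); policy not modeled.
        self._chunk_owner[(volume, chunk)] = new_director


meta = MetaDirectory(["director-1A", "director-1B"])
print(locate_entry(5000))            # (1, 904)
print(meta.owner_of("vol1", 5000))   # director-1A
```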

How a read is handled

When a host makes a read request, VPLEX first searches its local cache. If the data is found there, it is returned to the host.



If the data is not found in local cache, VPLEX searches global cache. Global cache includes all directors that are connected to one another within the VPLEX cluster. When the read is serviced from global cache, a copy is also stored in the local cache of the director from where the request originated.

If a read cannot be serviced from either local cache or global cache, it is read directly from the back-end storage. In this case both the global and local cache are updated to maintain cache coherency.

I/O flow of a read miss

1. Read request issued to virtual volume from host.

2. Look up in local cache of ingress director.

3. On miss, look up in global cache.

4. On miss, data read from storage volume into local cache.

5. Data returned from local cache to host.

I/O flow of a local read hit

1. Read request issued to virtual volume from host.

2. Look up in local cache of ingress director.

3. On hit, data returned from local cache to host.

I/O flow of a global read hit

1. Read request issued to virtual volume from host.

2. Look up in local cache of ingress director.

3. On miss, look up in global cache.

4. On hit, data read from owner director into local cache.

5. Data returned from local cache to host.
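The three read flows above can be condensed into one function. This is a simplified model, not the VPLEX implementation: plain dictionaries stand in for the ingress director's local cache, the other directors' caches (global cache), and the back-end storage volume, and the function name `read_page` is hypothetical.

```python
def read_page(page, local_cache, global_cache, backend):
    """Return (data, source) following the read flows listed above.

    local_cache: dict for the ingress director
    global_cache: dict standing in for the other directors' caches
    backend: dict standing in for the storage volume
    """
    if page in local_cache:                      # local read hit
        return local_cache[page], "local cache"
    if page in global_cache:                     # global read hit
        local_cache[page] = global_cache[page]   # copy kept at ingress director
        return local_cache[page], "global cache"
    data = backend[page]                         # read miss: go to the array
    local_cache[page] = data
    # Stand-in for recording the new page holder in the global directory.
    global_cache[page] = data
    return data, "back-end storage"


local, glob, disk = {}, {"p2": b"two"}, {"p1": b"one", "p2": b"two"}
print(read_page("p1", local, glob, disk)[1])  # back-end storage
print(read_page("p1", local, glob, disk)[1])  # local cache
print(read_page("p2", local, glob, disk)[1])  # global cache
```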

How a write is handled

All writes are written through cache to the back-end storage. Writes are completed to the host only after they have been completed to the back-end arrays.

When performing writes, the VPLEX system Data Management (DM) component includes a per-volume caching subsystem that utilizes a subset of the caching capabilities:



◆ Local Node Cache: cache data management, and back-end I/O interaction.

◆ Distributed Cache (DMG – Directory Manager): Cache coherence, dirty data protection, and failure recovery mechanics (fault-tolerance).

I/O flow of a write miss

1. Write request issued to virtual volume from host.

2. Look for prior data in local cache.

3. Look for prior data in global cache.

4. Transfer data to local cache.

5. Data is written through to back-end storage.

6. Write is acknowledged to host.

I/O flow of a write hit

1. Write request issued to virtual volume from host.

2. Look for prior data in local cache.

3. Look for prior data in global cache.

4. Invalidate prior data.

5. Transfer data to local cache.

6. Data is written through to back-end storage.

7. Write is acknowledged to host.
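The write miss and write hit flows differ only in the invalidation step, which the following sketch shows. As with any toy model, the dictionaries standing in for local cache, global cache, and back-end storage, and the function name `write_page`, are assumptions; the ordering of the steps follows the lists above.

```python
def write_page(page, data, local_cache, global_cache, backend):
    """Write-through: acknowledge only after the back-end write completes."""
    # Steps 2-3: look for prior data in local and global cache.
    hit = page in local_cache or page in global_cache
    if hit:
        # Write hit: invalidate prior copies before accepting new data.
        local_cache.pop(page, None)
        global_cache.pop(page, None)
    local_cache[page] = data      # transfer data to local cache
    backend[page] = data          # write through to back-end storage
    # Only now is the write acknowledged to the host.
    return "write hit" if hit else "write miss"


local, glob, disk = {}, {"p1": b"old"}, {"p1": b"old"}
print(write_page("p1", b"new", local, glob, disk))  # write hit
print(write_page("p9", b"x", local, glob, disk))    # write miss
```

Because the acknowledgement is returned only after the back-end dictionary is updated, the sketch preserves the write-through property that the host never sees a completed write that is missing from the array.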


Chapter 3 System and Component Integrity

This chapter explains how VPLEX clusters are able to handle hardware failures in any subsystem within the storage cluster. Topics include:

◆ Overview

◆ Cluster

◆ Path redundancy through different ports

◆ Path redundancy through different directors

◆ Path redundancy through different engines

◆ Path redundancy through site distribution

◆ Safety check

Overview

VPLEX clusters are capable of surviving any single hardware failure in any subsystem within the overall storage cluster, including the host connectivity subsystem, the memory subsystem, and others. A single failure in any subsystem will not affect the availability or integrity of the data. Multiple failures in a single subsystem, and certain combinations of single failures in multiple subsystems, may affect the availability or integrity of data.

This availability requires that host connections be redundant and that hosts are supplied with multipath drivers. In the event of a front-end port failure or a director failure, hosts without redundant physical connectivity to a VPLEX cluster and without multipathing software installed may be susceptible to data unavailability.



Cluster

A cluster is a collection of one, two, or four engines in a physical cabinet. A cluster serves I/O for one storage domain and is managed as one storage cluster.

All hardware resources (CPU cycles, I/O ports, and cache memory) are pooled:

◆ The front-end ports on all directors provide active/active access to the virtual volumes exported by the cluster.

◆ For maximum availability, virtual volumes must be presented through each director so that all directors but one can fail without causing data loss or unavailability. All directors must be connected to all storage.



Path redundancy through different ports

Because all paths are duplicated, when a director port goes down for any reason, I/O continues seamlessly through a port of the other director, as shown in Figure 12.

Figure 12 Port redundancy

Multipathing software plus redundant volume presentation yields continuous data availability in the presence of port failures.



Path redundancy through different directors

If a director were to go down, the other director can completely take over the I/O processing from the host, as shown in Figure 13.

Figure 13 Director redundancy

Multipathing software plus volume presentation on different directors yields continuous data availability in the presence of director failures.



Path redundancy through different engines

In a clustered environment, if one engine goes down, another engine completes the host I/O processing, as shown in Figure 14.

Figure 14 Engine redundancy

Multipathing software plus volume presentation on different engines yields continuous data availability in the presence of engine failures.
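The three levels of path redundancy described above (port, director, engine) can be checked with a small model. The following is a minimal sketch, not VPLEX code: the Path class, the component names, and the survives_single_failure function are hypothetical and exist only for illustration. It assumes a host keeps access as long as one of its paths avoids the failed component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One host-to-volume path; all three names are hypothetical labels."""
    engine: str
    director: str
    port: str


def survives_single_failure(paths, level):
    """Return True if at least one path remains after any single failure
    of one component at the given level ('port', 'director' or 'engine')."""
    keys = {
        "port": lambda p: (p.engine, p.director, p.port),
        "director": lambda p: (p.engine, p.director),
        "engine": lambda p: p.engine,
    }
    if level not in keys:
        raise ValueError(f"unknown level: {level}")
    # Paths spread over two or more distinct components cannot all be
    # removed by the failure of just one of those components.
    return len({keys[level](p) for p in paths}) >= 2


# Two paths through different directors of the same engine.
paths = [
    Path("engine-1", "director-A", "FE0"),
    Path("engine-1", "director-B", "FE0"),
]
for level in ("port", "director", "engine"):
    print(level, survives_single_failure(paths, level))
```

With this layout the host is protected against a port or director failure but not against the loss of the whole engine, which is why volumes are also presented through different engines where more than one is available.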



Path redundancy through site distribution

Distributed site redundancy, now enabled through Metro HA, ensures that if a site goes down, or even if the link to that site goes down, the other site can continue seamlessly processing the host I/O, as shown in Figure 15. If Site B fails, I/O continues unhindered on Site A.

Figure 15 Site redundancy



Safety check

In addition to the redundancy fail-safe features, the VPLEX cluster provides event logs and call home capability.


4

This chapter explains VPLEX architecture and operation:

◆ Foundations of VPLEX High Availability

◆ Failure handling without VPLEX Witness (Static bias)

Foundations of VPLEX High Availability


Foundations of VPLEX High Availability

The following section discusses, at a high level, several disruptive scenarios for a multiple-site VPLEX configuration. Its purpose is to help the customer or solutions architect understand site failure semantics before implementing VPLEX Witness and the related solutions outlined in this book. This section is not intended to highlight flaws in the current high availability architecture as implemented in basic VPLEX best practices. All solutions deployed in a Metro HA Active/Active state, whether VPLEX or not, run into the same issues when deployed without a “Witness.” Whether an architect applies the VPLEX Witness capabilities or enhances connectivity paths across data centers using the Metro HA Cross-Cluster Connect solution depends on their basic failover needs.

Note: To keep the explanation at a high level, the graphics in the following section have been broken down into major objects (for example, Site A, Site B, and Link). Assume that a VPLEX cluster resides within each site, so that a site failure, when shown, also causes a full VPLEX cluster failure within that site. Assume also that the link object between sites represents the main inter-cluster data network connected to each VPLEX cluster in either site, and that all components within a site share the same failure domain. A site failure therefore affects all components within this failure domain, including the VPLEX cluster.

Figure 16 shows normal operation, where all three components are fully operational. (Note: green symbolizes normal operation and red symbolizes failure.)

Figure 16 High level functional sites in communication

Figure 17 on page 65 demonstrates that site A has failed. Suppose a service, application, or VM was running only in site A at the time of the incident; it would now need to be restarted at the remaining site B. We know this because we have an external perspective and can see the entire diagram. From site B’s perspective, however, all the VPLEX would know is that communication with site A has been lost; it would be impossible to distinguish a full failure at site A from a simple link failure.

Figure 17 High level Site A failure

A link failure as depicted by the red arrow in Figure 18 is representative of an inter-cluster link failure.

Figure 18 High level Inter-site link failure

As in the previous example, looking at this from an overall perspective we can see that it is the link that is faulted. From Site A’s or Site B’s perspective, however, all that the VPLEX knows is that communication with the other site is lost (exactly as in the previous example); it cannot distinguish whether the link or the site is at fault.
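The ambiguity described in these two examples can be made concrete with a toy model. This is a minimal sketch under a strong simplification: the only thing the cluster at Site B can observe is whether its peer answers over the inter-cluster link. The function name and its parameters are hypothetical.

```python
def view_from_site_b(site_a_up, link_up):
    """Return what the cluster at Site B can observe about Site A.

    Site B has no external vantage point: it only knows whether the
    peer answers over the inter-cluster link.
    """
    return "peer reachable" if (site_a_up and link_up) else "peer unreachable"


# Two physically different failures...
site_failure = view_from_site_b(site_a_up=False, link_up=True)
link_failure = view_from_site_b(site_a_up=True, link_up=False)

# ...produce exactly the same local observation at Site B.
print(site_failure, "|", link_failure)
```

Because both failures collapse to the same observation, no decision taken at Site B alone can treat them differently.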

Now take the basic Site A or Site B failure scenario as a basic disaster recovery scenario and apply the concepts of the Active/Active philosophy. The next section shows how different failures affect a VPLEX distributed volume and highlights the different resolution required in each case, starting with the site failure scenario. Figure 19 on page 66 shows, at a high level, a VPLEX distributed volume spanning two sites:




Figure 19 VPLEX active and functional between two sites

As shown, the distributed volume is made up of a mirror at each site (M1 and M2). Using the distributed cache coherency semantics provided by VPLEX GeoSynchrony, a consistent presentation of a logical volume is achieved across both clusters. Furthermore, cache coherency enables Active/Active data access (both read and write) from the two sites. The example also shows a distributed network where users are able to access either site, which would be true in a fully active/active environment.

In Figure 20 on page 67, if there was a failure at one of the sites (in this case site A has failed) then the distributed volume would become degraded since the hardware required at site A to support this particular mirror leg is no longer available. For a resolution to this example we would want to simply keep the volume active at site B so the application can resume there.



Figure 20 VPLEX concept diagram with failure at Site A

Figure 21 on page 68 shows the desired resolution if failure at site A was to occur. As discussed previously the outcome of this is to keep the volume online in site B.




Figure 21 Correct resolution after volume failure at Site A

The next section discusses the outcome after an inter-cluster link partition/failure. Figure 22 on page 69 shows the configuration before the failure.



Figure 22 VPLEX active and functional between two sites

Recall from the simple Site A / Site B failure scenarios that when a link failed, neither site knew the exact nature of the failure. With an Active/Active distributed volume, a link failure would also degrade the distributed volume, since write I/O at either site could not be propagated to the remote site.

Figure 23 on page 70 shows what would happen if there were no “mechanism” to suspend I/O at one of the sites in this scenario.




Figure 23 Inter-site link failure and cluster partition

As shown, this would lead to a conflicting detach, or split brain: writes could be accepted at both sites, potentially leaving two different copies of the data. To protect against data corruption this situation must be avoided, so VPLEX must act and suspend access to the distributed volume on one of the clusters.

Figure 24 on page 71 displays a valid and acceptable state in the event of a link partition as site A is now suspended. This is the default and automatic behavior of VPLEX distributed volumes and protects against data corruption and split brain scenarios. The following section explains in more detail how this functions.
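The difference between the two outcomes can be illustrated with a toy model of the two mirror legs. This is a minimal sketch, not a description of VPLEX internals: the mirrors are plain dictionaries, and the function names, block name, and write values are hypothetical.

```python
BASE = {"block0": "v0"}  # content of both mirror legs before the partition


def run_partition(suspended_cluster=None):
    """Apply one write per site while the inter-cluster link is down.

    Writes arriving at the suspended cluster (if any) are not accepted.
    Returns the resulting content of mirror legs M1 and M2.
    """
    m1, m2 = dict(BASE), dict(BASE)
    writes = [("cluster-1", "block0", "written at A"),
              ("cluster-2", "block0", "written at B")]
    for cluster, block, data in writes:
        if cluster == suspended_cluster:
            continue  # I/O is suspended here, the write is not accepted
        (m1 if cluster == "cluster-1" else m2)[block] = data
    return m1, m2


def split_brain(m1, m2):
    """True if both legs changed independently and now disagree."""
    return m1 != BASE and m2 != BASE and m1 != m2


print(split_brain(*run_partition()))                               # True
print(split_brain(*run_partition(suspended_cluster="cluster-1")))  # False
```

When one cluster is suspended, only one leg moves forward; the other leg is merely out of date, so it can later be brought back in line from the active leg instead of having to be reconciled against conflicting writes.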



Figure 24 Correct handling of cluster partition




Failure handling without VPLEX Witness (Static bias)

As previously demonstrated, in the presence of failures, VPLEX Active/Active distributed solutions require different resolutions depending on the type of failure. However, VPLEX version 4.0 had no means to perform external arbitration, so no “mechanism” existed to distinguish between a site failure and a link failure. To overcome this, a feature called “static bias” is used to guard against split brain scenarios.

The premise of static bias is to set a rule, ahead of any failure, for each distributed volume that spans two VPLEX clusters. The rule defines which cluster is declared the preferred cluster and maintains access to the volume, and which cluster is declared the alternative and therefore suspends access, should the VPLEX clusters lose communication with each other (this covers both site and link failure). This is known as a “detach rule”; it means that one cluster can unilaterally detach the other cluster and assume that the detached cluster is either dead or will stay suspended if it is alive.

Figure 25 on page 73 shows how static bias can be set for each distributed volume (referred to here, for naming’s sake, as a DR1).



Figure 25 VPLEX static detach rule

This detach rule can either be set within the VPLEX GUI or via VPLEX CLI.

Each volume can be either set to Cluster 1 detaches, or Cluster 2 detaches.

If the DR1 is set to Cluster 1 detaches, then in any failure scenario the preferred cluster for that volume would be declared as Cluster 1, but if the DR1 detach rule is set to Cluster 2 detaches, then in any failure scenario the preferred cluster for that volume would be declared as Cluster 2.

Note: Some people when looking at this prefer to substitute the word detaches for the word preferred which is perfectly acceptable and can make it easier to understand.



Setting the rule set on a volume to Cluster 1 detaches means that Cluster 1 is the preferred site for the given volume. (It is also appropriate to say that Cluster 1 has the bias for the given volume.)

Once this rule is set then regardless of the failure (be it link or site) the rule will always be invoked.
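The detach rule can be expressed as a small decision function. This is a minimal sketch of the behavior described in this section, not VPLEX code: static_bias_outcome and the cluster labels are hypothetical, and the function only models which clusters keep serving I/O for one distributed volume.

```python
CLUSTERS = {"cluster-1", "cluster-2"}


def static_bias_outcome(preferred, failure):
    """Return the set of clusters that keep serving I/O for one volume.

    preferred: the cluster named in the detach rule.
    failure:   'link' for an inter-cluster link failure, or the name of
               the cluster whose site has failed.
    """
    if preferred not in CLUSTERS:
        raise ValueError(f"unknown cluster: {preferred}")
    if failure == "link":
        # Both clusters are alive; only the preferred one continues.
        return {preferred}
    if failure not in CLUSTERS:
        raise ValueError(f"unknown failure: {failure}")
    (survivor,) = CLUSTERS - {failure}
    # The survivor cannot tell a site failure from a link failure, so it
    # applies the same rule: it continues only if it is the preferred one.
    return {survivor} if survivor == preferred else set()


for failure in ("link", "cluster-2", "cluster-1"):
    print(failure, static_bias_outcome("cluster-1", failure))
```

The empty set returned when the preferred cluster itself fails corresponds to the case in which the surviving cluster suspends I/O and manual intervention is needed.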

The following examples show the rule set in action for different failures.

The first example shows a site loss at B with a single DR1 set to Cluster 1 detaches. Figure 26 shows the initial running setup of the configuration. We can see that the volume is set to Cluster 1 detaches.

Figure 26 Typical detach rule setup



If there was a problem at site B, then the DR1 will become degraded as shown in Figure 27.

Figure 27 Non-preferred site failure

As the bias rule was set to Cluster 1 detaches, then the distributed volume will remain active at site A. This is shown in Figure 28 on page 76.




Figure 28 Volume remains active at Cluster 1

Therefore in this scenario if the service, application or VM was running only at site A (the preferred site) then it would continue uninterrupted without needing to re-start, however if the application was running only at site B on the given distributed volume then it will need to be restarted at site A, but since VPLEX is an active/active solution no manual intervention at the storage layer will be required in this case.

The next example shows static bias working under link failure conditions.

Figure 29 on page 77 shows a configuration with a distributed volume set to Cluster 1 detaches as per the previous configuration.



Figure 29 Typical detach rule setup before link failure

If the link were now lost then the distributed volume will again be degraded as shown in Figure 30 on page 78.




Figure 30 Inter-site link failure and cluster partition

To ensure that split brain does not occur after this type of failure, the static bias rule is applied. In this case I/O is suspended at Cluster 2, because the rule is set to Cluster 1 detaches.

This can be observed in Figure 31 on page 79.



Figure 31 Suspension after inter-site link failure and cluster partition

Therefore, in this scenario, if the service, application, or VM was running only at site A, it would continue uninterrupted without needing to restart. If the application was running only at site B, it will need to be restarted at site A, since the bias rule set suspends access to the given distributed volumes on Cluster 2. Again, no manual intervention is required at the storage level in this case, as the volume at Cluster 1 automatically remained available.

In summary, static bias is a very effective method of preventing split brain. However, one particular scenario results in manual intervention if the static bias feature is used alone: a VPLEX cluster or site failure at the “preferred cluster” (that is, the pre-defined preferred cluster for the given distributed volume). This is shown in the example below, which begins with the configuration shown in Figure 32 on page 80, where a distributed volume has Cluster 2 detaches set on the DR1.




Figure 32 Cluster 2 has bias

If site B had a total failure in this example disruption will now also occur at site A as shown in Figure 33 on page 81.



Figure 33 Preferred site failure causes full Data Unavailability

As we can see, the preferred site has now failed and the bias rule has been applied. Because the rule is “static” and cannot distinguish between a link failure and a remote site failure, the remaining site becomes suspended in this example. Manual intervention is therefore required to bring the volume online at site A.

Static bias is a very powerful rule. It provides zero RPO and zero RTO resolution for non-preferred cluster failure and inter-cluster partition scenarios, and it completely avoids split brain. In the presence of a preferred cluster failure, however, it requires manual intervention and therefore provides a non-zero RTO. Note that this feature works without any additional automation and is a valuable fallback when VPLEX Witness is unavailable or the customer infrastructure cannot accommodate it.

However, what if there were a “mechanism” other than standard CLI intervention, one that provided a global view of failures in the Metro environment in the previous example? VPLEX Witness has been designed to overcome these scenarios, since it can override the static bias and leave what was the non-preferred site active.





5


◆ VPLEX Witness overview and architecture

◆ VPLEX Witness target solution, rules, and best practice

◆ VPLEX Witness failure semantics

◆ CLI example outputs

Introduction to VPLEX Witness


VPLEX Witness overview and architecture

VPLEX Metro 5.0 systems can now rely on a new component called VPLEX Witness. VPLEX Witness is an optional component designed to be deployed in customer environments where the regular bias rule sets are insufficient to provide seamless zero or near-zero RTO fail-over in the presence of site disasters and VPLEX cluster failures.

As described in the previous section, without VPLEX Witness, all distributed volumes rely on configured rule sets to identify the preferred cluster in the presence of cluster partition or cluster/site failure. However, if the preferred cluster happens to fail (as the result of a disaster event, for example), VPLEX is unable to automatically allow the alternative cluster to continue I/O to the affected distributed volumes. VPLEX Witness has been designed specifically to overcome this case.

An external VPLEX Witness Server is installed as a virtual machine running on a customer supplied VMware ESX host deployed in a failure domain separate from either of the VPLEX clusters (to eliminate the possibility of a single fault affecting both the cluster and the VPLEX Witness). VPLEX Witness connects to both VPLEX clusters over the management IP network. By reconciling its own observations with the information reported periodically by the clusters, the VPLEX Witness enables the cluster(s) to distinguish between inter-cluster network partition failures and cluster failures and automatically resume I/O in these situations.

Figure 34 on page 85 shows a high level architecture of VPLEX Witness and how it can augment an existing static bias solution. The VPLEX Witness server resides in a fault domain separate from VPLEX Cluster 1 and Cluster 2.



Figure 34 High Level VPLEX Witness architecture

Since the VPLEX Witness server is external to both production locations, it gains more perspective on the nature of a particular failure, so that the correct action can be taken. As mentioned previously, this perspective is vital for distinguishing between a site outage and a link outage, since each of these scenarios requires a different action.

Figure 35 on page 86 shows a high-level circuit diagram of how the VPLEX Witness Server should be connected.



Figure 35 High Level VPLEX Witness deployment

As you can see the VPLEX Witness server is connected via the VPLEX management IP network in a third failure or fault domain.

Depending on the scenarios to be protected against, this third fault domain could reside on a different floor within the same building as VPLEX Cluster 1 and Cluster 2. It can also be located in a geographically dispersed data center, which could even be in a different country.

Note: VPLEX Witness Server supports up to 1 second of network latency over the management IP network.

Clearly, in the example of a different floor in the same building, you would not be protected from a total building failure, so careful consideration should be given to the choice of this third failure domain, depending on the requirement.



VPLEX Witness target solution, rules, and best practice

VPLEX Witness is architecturally designed for intersite VPLEX clusters. Customers who wish to use VPLEX Local will not require VPLEX Witness functionality.

Furthermore VPLEX Witness is only suitable for customers who have a third failure domain connected via two physical networks from each of the data centers where the VPLEX clusters reside into each VPLEX management station Ethernet port.

VPLEX Witness failure handling semantics apply only to distributed volumes in consistency groups on a pair of VPLEX v5.x clusters, and only if VPLEX Witness is enabled.

VPLEX Witness failure handling semantics do not apply to:

◆ Local volumes

◆ Distributed volumes outside of a consistency group

◆ Distributed volumes within a consistency group if the VPLEX Witness is disabled
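The applicability rules above reduce to a single condition. The following is a minimal sketch with a hypothetical function name and hypothetical boolean parameters; it only restates the list and does not query a real system.

```python
def witness_semantics_apply(distributed, in_consistency_group, witness_enabled):
    """Return True if VPLEX Witness failure handling covers the volume.

    All three conditions from the text must hold: the volume is
    distributed, it belongs to a consistency group, and VPLEX Witness
    is enabled.
    """
    return distributed and in_consistency_group and witness_enabled


print(witness_semantics_apply(True, True, True))    # distributed volume in a group
print(witness_semantics_apply(False, True, True))   # local volume
print(witness_semantics_apply(True, False, True))   # outside a consistency group
print(witness_semantics_apply(True, True, False))   # VPLEX Witness disabled
```

Volumes for which the function returns False fall back to their configured detach rule.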

At the time of writing, only one VPLEX Witness Server can be configured for a given Metro system, and when it is configured and enabled, its failure semantics apply to all configured consistency groups.

Additionally a single VPLEX Witness Server (virtual machine) can only support a single VPLEX Metro system (however more than one VPLEX Witness Server can be configured onto a single physical ESX host).

Figure 36 on page 88 shows the supported versions (at the time of writing) for VPLEX Witness.



Figure 36 Supported VPLEX versions for VPLEX Witness

As mentioned in Figure 36, depending on the solution, VPLEX static bias alone without VPLEX Witness may still be relevant in some cases. Figure 37 shows the volume types and rules that can be supported with VPLEX Witness.

Figure 37 VPLEX Witness volume types and rule support

Check the latest VPLEX ESSM (EMC simple support matrix) for the latest information including VPLEX Witness server physical host requirements and site qualification.



VPLEX Witness failure semantics

As seen in the previous section, VPLEX Witness operates at the consistency group level for a group of distributed devices and functions in conjunction with the detach rule set within the consistency group.

Starting with the inter-cluster link partition the next few pages discuss failure scenarios (both site and link) which were raised in previous sections and show how the failure semantics differ using VPLEX Witness compared to just using static bias alone.

Figure 38 shows a typical setup for VPLEX 5.x with a single distributed volume configured in a consistency group that has a rule set configured for Cluster 2 detaches (that is, Cluster 2 is preferred). Additionally, it shows that the VPLEX Witness server is connected via the management network in a third failure domain.

Figure 38 Typical VPLEX Witness configuration



If the inter-cluster link were to fail in this scenario, VPLEX Witness would still be able to communicate with both VPLEX clusters, since the management network that connects the VPLEX Witness server to both VPLEX clusters is still operational. By communicating with both VPLEX clusters, the VPLEX Witness feature deduces that the inter-cluster link has failed, since both clusters report to the VPLEX Witness server that connectivity with the remote VPLEX cluster has been lost (that is, Cluster 1 reports that Cluster 2 is unavailable and vice versa). This is shown in Figure 39.

Figure 39 VPLEX Witness and an inter-cluster link failure

In this case the clusters adhere to the pre-configured static bias rules and volume access at Cluster 1 will be suspended since the rule set was configured as Cluster 2 detaches. Figure 40 on page 91 shows the final state after this failure.



Figure 40 VPLEX Witness and static bias after cluster partition

The next example shows how VPLEX Witness can assist if we have a site failure at the preferred site. As discussed above, this type of failure without VPLEX Witness would cause the volumes in the remaining site to go offline. This is where VPLEX Witness greatly improves the outcome of this event and removes the need for manual intervention.

Figure 41 on page 92 shows a typical setup for VPLEX v5.x with a distributed volume configured in a consistency group that has a rule set configured for Cluster 2 detaches (that is, Cluster 2 wins).




Figure 41 VPLEX Witness typical configuration for Cluster 2 detaches

Figure 42 on page 93 shows that site B has now failed.



Figure 42 VPLEX Witness diagram showing Cluster 2 failure

As we know from the previous section, when a site has failed then the distributed volumes are now degraded, however unlike our previous example where there was a site failure at the preferred site and the static bias rule was used forcing volumes into a suspend state at Cluster 1, VPLEX Witness will now observe that communication is still possible to Cluster 1 (but not Cluster 2). Additionally since Cluster 1 cannot contact Cluster 2, VPLEX Witness can make an informed decision and instruct Cluster 1 to override the static rule set and proceed with I/O.




Figure 43 shows the outcome:

Figure 43 VPLEX Witness with static bias override

Clearly this is a big improvement on the scenario where this happened with just the static bias rule set and no VPLEX Witness. Previously, volumes had to be suspended at Cluster 1 because there was no way to tell the difference between a site failure and a link failure.
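The two scenarios walked through in this section can be sketched as a decision function. This is a simplified illustration, not the VPLEX Witness algorithm: witness_guidance, its parameters, and the cluster labels are hypothetical, and only the cases described above are modeled. Any other combination (for example, cluster isolation) returns None, because those rules are covered by the product documentation rather than by this text.

```python
CLUSTERS = frozenset({"cluster-1", "cluster-2"})


def witness_guidance(reachable, reports_peer_lost, preferred):
    """Return the set of clusters allowed to continue I/O, or None.

    reachable:         clusters the Witness can reach over the management network
    reports_peer_lost: clusters reporting loss of inter-cluster connectivity
    preferred:         the cluster named in the detach rule
    """
    reachable = frozenset(reachable)
    reports_peer_lost = frozenset(reports_peer_lost)
    if reachable == CLUSTERS and not reports_peer_lost:
        return set(CLUSTERS)      # normal operation
    if reachable == CLUSTERS and reports_peer_lost == CLUSTERS:
        return {preferred}        # link partition: follow the static bias
    if len(reachable) == 1 and reports_peer_lost == reachable:
        return set(reachable)     # peer failed: override the static bias
    return None                   # not covered by this sketch


# Inter-cluster link failure with Cluster 2 preferred.
print(witness_guidance(CLUSTERS, CLUSTERS, "cluster-2"))
# Failure of the preferred site (Cluster 2).
print(witness_guidance({"cluster-1"}, {"cluster-1"}, "cluster-2"))
```

In the second call the surviving cluster proceeds even though it is not the preferred one, which is the override that static bias alone cannot provide.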

Refer to VPLEX Witness product documentation to fully understand all other rules and states of the feature such as cluster isolation.



CLI example outputs

On systems where VPLEX Witness is deployed and configured, the VPLEX Witness CLI context appears under the root context as "cluster-witness." By default, this context is hidden and is not visible until VPLEX Witness has been deployed by running the "cluster-witness configure" command. Once the user has deployed VPLEX Witness, the VPLEX Witness CLI context becomes visible.

The CLI context typically displays the following information:

VPlexcli:/> cd cluster-witness/

VPlexcli:/cluster-witness> ls
Attributes:
Name                Value
------------------  -------------
admin-state         enabled
private-ip-address  128.221.254.3
public-ip-address   10.31.25.45

Contexts:
components

VPlexcli:/cluster-witness> ll components/
/cluster-witness/components:
Name        ID  Admin State  Operational State    Mgmt Connectivity
----------  --  -----------  -------------------  -----------------
cluster-1   1   enabled      in-contact           ok
cluster-2   2   enabled      in-contact           ok
server      -   enabled      clusters-in-contact  ok

VPlexcli:/cluster-witness> ll components/*
/cluster-witness/components/cluster-1:
Name                     Value
-----------------------  ------------------------------------------------------
admin-state              enabled
diagnostic               INFO: Current state of cluster-1 is in-contact (last state change: 0 days, 13056 secs ago; last message from server: 0 days, 0 secs ago.)
id                       1
management-connectivity  ok
operational-state        in-contact

/cluster-witness/components/cluster-2:
Name                     Value
-----------------------  ------------------------------------------------------
admin-state              enabled
diagnostic               INFO: Current state of cluster-2 is in-contact (last state change: 0 days, 13056 secs ago; last message from server: 0 days, 0 secs ago.)
id                       2
management-connectivity  ok
operational-state        in-contact

/cluster-witness/components/server:
Name                     Value
-----------------------  ------------------------------------------------------
admin-state              enabled
diagnostic               INFO: Current state is clusters-in-contact (last state change: 0 days, 13056 secs ago.) (last time of communication with cluster-2: 0 days, 0 secs ago.) (last time of communication with cluster-1: 0 days, 0 secs ago.)
id                       -
management-connectivity  ok
operational-state        clusters-in-contact
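Output such as the components listing can be checked mechanically once it has been captured as text. The following is a minimal sketch that parses a captured copy of the example listing; it does not call any VPLEX API, and parse_components, unhealthy, FIELDS, and SAMPLE are hypothetical names. It assumes that no column value contains a space, which holds for the example output.

```python
FIELDS = ("name", "id", "admin_state", "operational_state", "mgmt_connectivity")

# Captured text, copied from the example output in this section.
SAMPLE = """\
Name        ID  Admin State  Operational State    Mgmt Connectivity
----------  --  -----------  -------------------  -----------------
cluster-1   1   enabled      in-contact           ok
cluster-2   2   enabled      in-contact           ok
server      -   enabled      clusters-in-contact  ok
"""


def parse_components(text):
    """Turn the data rows of the listing into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        is_separator = bool(parts) and set(line.strip()) <= set("- ")
        # The header splits into more than five words, so it is skipped
        # by the length check; the dashed rule is skipped explicitly.
        if len(parts) != len(FIELDS) or is_separator:
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def unhealthy(rows):
    """Return the names of components that are not in a healthy state."""
    healthy_states = {"in-contact", "clusters-in-contact"}
    return [r["name"] for r in rows
            if r["operational_state"] not in healthy_states
            or r["mgmt_connectivity"] != "ok"]


rows = parse_components(SAMPLE)
print([r["name"] for r in rows], unhealthy(rows))
```

For the example output every component is healthy, so the second list is empty.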

Details of cluster-witness CLI context attributes

On systems where Cluster Witness is deployed, the Cluster Witness CLI context appears as cluster-witness under the root context. By default, the Cluster Witness context is an optional hidden context and must be created with the cluster-witness configure command after Cluster Witness deployment.

See the VPLEX CLI Guide for more information on the cluster-witness configure command.

The cluster-witness context includes the following sub-contexts:

◆ /cluster-witness/components

◆ /cluster-witness/components/cluster-1

◆ /cluster-witness/components/cluster-2

◆ /cluster-witness/components/server

Use the ls and ll commands to display VPLEX Witness status information.

Use ll command to display status related to the VPLEX Witness components on Cluster 1, Cluster 2, and the VPLEX Witness Server.

VPlexcli:/cluster-witness> ls
Attributes:
Name                Value
------------------  -------------
admin-state         enabled
private-ip-address  128.221.254.3
public-ip-address   10.31.25.45

Contexts:
components

Table 4 Output from ls for brief VPLEX Witness status

Field name Description

admin-state

This attribute identifies whether VPLEX Witness functionality (as a whole) is enabled or disabled.

If VPLEX Witness functionality is enabled, the clusters send health observations to the VPLEX Witness Server, and the VPLEX Witness Server provides guidance to the clusters when it observes inter-cluster partition and cluster failure/isolation scenarios.

If VPLEX Witness functionality is disabled, the clusters follow configured detach rule sets to allow or suspend I/O to the distributed volumes in all consistency groups when inter-cluster partition or cluster failure/isolation scenarios occur. The communication of health observations and guidance between the clusters and the VPLEX Witness Server stops. In this case, all distributed volumes in all consistency groups leverage their pre-configured detach rule sets regardless of VPLEX Witness.

To determine the administrative state of individual components, refer to the admin-state attribute associated with the individual component context. The admin-state value at the top-level cluster-witness context is one of the following:

◆ unknown: There is partial management network connectivity between this Management Server and VPLEX Witness components that are supposed to report their administrative state. To identify the component that is unreachable over the management network, refer to the output of the individual component contexts.

◆ enabled: All VPLEX Witness components are reachable over the management network and report their administrative state as enabled.

◆ disabled: All VPLEX Witness components are reachable over the management network and report their administrative state as disabled.

◆ inconsistent: All VPLEX Witness components are reachable over the management network, but some components report their administrative state as disabled while others report it as enabled. This should be an extremely rare state, which may result from a potential but highly unlikely failure during enabling or disabling. Please call EMC Customer Service if you see this state.

private-ip-address

This read-only attribute identifies the private IP address of the VPLEX Witness Server VM (128.221.254.3) that is used for VPLEX Witness-specific traffic.

public-ip-address

This read-only attribute identifies the public IP address of the VPLEX Witness Server VM that is used as an endpoint of the IPsec tunnel.

components

This sub-context displays all the individual components of VPLEX Witness, which include both VPLEX clusters configured with VPLEX Witness functionality and the VPLEX Witness Server. Each sub-context displays details for the corresponding individual component.
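The four top-level admin-state values in Table 4 follow from the states reported by the individual components. The following is a minimal sketch of that derivation, with a hypothetical function name and a hypothetical input format in which None stands for a component that is unreachable over the management network. Treating the case where no component is reachable as unknown is an assumption; the table only describes partial connectivity.

```python
def aggregate_admin_state(component_states):
    """Derive the top-level admin-state from per-component states.

    component_states maps a component name to 'enabled', 'disabled',
    or None when the component cannot be reached.
    """
    states = list(component_states.values())
    if not states or any(s is None for s in states):
        return "unknown"        # at least one component did not report
    if all(s == "enabled" for s in states):
        return "enabled"
    if all(s == "disabled" for s in states):
        return "disabled"
    return "inconsistent"       # reachable, but the reports disagree


print(aggregate_admin_state(
    {"cluster-1": "enabled", "cluster-2": "enabled", "server": "enabled"}))
print(aggregate_admin_state(
    {"cluster-1": "enabled", "cluster-2": None, "server": "enabled"}))
print(aggregate_admin_state(
    {"cluster-1": "enabled", "cluster-2": "disabled", "server": "enabled"}))
```

The unreachable case is checked first, so a mixture of reported and missing states is classified as unknown rather than inconsistent, matching the wording of the table.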




Use ll command to display status related to the VPLEX Witness components on Cluster 1, Cluster 2, and the VPLEX Witness Server.

From the VPlexcli:/cluster-Witness> context, issue:

VPlexcli:/cluster-witness> ll components/
/cluster-witness/components:
Name        ID  Admin State  Operational State    Mgmt Connectivity
----------  --  -----------  -------------------  -----------------
cluster-1   1   enabled      in-contact           ok
cluster-2   2   enabled      in-contact           ok
server      -   enabled      clusters-in-contact  ok

Table 5 Output from ll command for brief VPLEX Witness component status (page 1 of 2)

Field name Description

admin-state This field identifies whether the corresponding component is enabled or not. The supported values are: enabled: VPLEX Witness functionality is enabled on this component disabled: VPLEX Witness functionality is disabled on this component. unknown: This component is not reachable and its administrative state cannot be determined.

diagnostic This is a diagnostic string is generated by CLI based on the analysis of the data and state information reported by the corresponding component.

id The cluster-id for the cluster components. The VPLEX CLI ignores this field for the VPLEX Witness Server and reports the value as a dash “-”.

management-connectivity

This field displays the communication status to the VPLEX Witness component from the local CLI session over the management network. The possible values are: ok: The component is reachable failed: The component is not reachable



operational-state(server component)

This field represents the operational state of the corresponding server component. The clusters-in-contact state is the only healthy state. All other states indicate a problem.clusters-in-contact: According to the latest data reported by each of clusters, both clusters are in contact with each other over the inter-cluster network. cluster-partition: According to VPLEX Witness Server observations, the clusters partitioned from each other over the inter-cluster network, while the VPLEX Witness Server could still talk to each of them. cluster-unreachable: According to VPLEX Witness Server observations, one cluster has either failed or become isolated (that is partitioned from its peer cluster and disconnected from the VPLEX Witness Server). unknown: VPLEX Witness Server does not know the states of one or both of the clusters and needs to learn them before it can start making decisions. VPLEX Witness Server assumes this state upon startup.When the server operational state is set to "cluster-partition" or "cluster-unreachable", this operational state may not necessarily reflect the current observation of the VPLEX Witness Server. After VPLEX Witness Server transitions to this state and provides guidance to both clusters, it stays in this state regardless of more recent observations until it observes complete recovery of the clusters and their inter-cluster connectivity. (This prevents split brain.) The VPLEX Witness Server state and the guidance that it provides to the clusters based on its state is sticky in a sense that if VPLEX Witness Server observes a failure (changes its state and provides guidance to the clusters), the VPLEX Witness Server will maintain this state even if current observations change. VPLEX Witness Server will maintain its failure state and guidance until both cluster and their connectivity fully recover. This policy is implemented in order to avoid potential Data Corruption scenarios due to possible split brain.

operational-state(cluster component)

This field represents the operational state of the corresponding cluster component.

in-contact: This cluster is in contact with its peer over the inter-cluster network. Rebuilds may be in progress. Subject to other system-wide restrictions, I/O to all distributed volumes in all consistency groups is allowed from VPLEX Witness’ perspective.

cluster-partition: This cluster is not in contact with its peer and the VPLEX Witness Server declared that the two clusters are partitioned. Subject to other system-wide restrictions, I/O to all distributed volumes in all consistency groups is allowed from VPLEX Witness’ perspective.

remote-cluster-isolated-or-dead: This cluster is not in contact with its peer and the VPLEX Witness Server declared that the remote cluster (that is, the peer) was isolated or dead. Subject to other system-wide restrictions, I/O to all distributed volumes in all consistency groups is allowed from VPLEX Witness’ perspective.

local-cluster-isolated: This cluster is not in contact with its peer and the VPLEX Witness Server declared the remote cluster (that is, the peer) as the only proceeding cluster. This cluster must suspend I/O to all distributed volumes in all consistency groups regardless of bias.

unknown: This cluster is not in contact with its peer over the inter-cluster network and is awaiting guidance from the VPLEX Witness Server. I/O to all distributed volumes in all consistency groups is suspended regardless of bias.

Table 5 Output from ll command for brief VPLEX Witness component status (page 2 of 2)
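The state descriptions in Table 5 can be condensed into a small model. The following Python sketch is illustrative only: the names WitnessServerModel, observe, and io_allowed are hypothetical and are not part of the VPLEX CLI or GeoSynchrony. It simply encodes the sticky server state and the per-cluster I/O rules as they are worded in the table.

```python
from enum import Enum


class ServerState(Enum):
    UNKNOWN = "unknown"
    CLUSTERS_IN_CONTACT = "clusters-in-contact"
    CLUSTER_PARTITION = "cluster-partition"
    CLUSTER_UNREACHABLE = "cluster-unreachable"


# The two failure states that Table 5 describes as sticky.
FAILURE_STATES = {ServerState.CLUSTER_PARTITION, ServerState.CLUSTER_UNREACHABLE}

# Cluster component states mapped to whether VPLEX Witness allows I/O to
# distributed volumes (other system-wide restrictions may still apply).
CLUSTER_IO_ALLOWED = {
    "in-contact": True,
    "cluster-partition": True,
    "remote-cluster-isolated-or-dead": True,
    "local-cluster-isolated": False,
    "unknown": False,
}


class WitnessServerModel:
    """Toy model of the sticky state policy (an assumption, not EMC code)."""

    def __init__(self):
        # The server assumes the unknown state upon startup.
        self.state = ServerState.UNKNOWN

    def observe(self, observation):
        # Once in a failure state, ignore newer observations until
        # complete recovery (clusters-in-contact) is observed.
        sticky = self.state in FAILURE_STATES
        recovered = observation is ServerState.CLUSTERS_IN_CONTACT
        if not sticky or recovered:
            self.state = observation
        return self.state


def io_allowed(cluster_state):
    """Return True if the Witness permits I/O in the given cluster state."""
    return CLUSTER_IO_ALLOWED[cluster_state]
```

In this model, a server that has moved to cluster-partition stays there even if it later observes something that looks like cluster-unreachable, and only returns to clusters-in-contact after full recovery.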




VPLEX Witness cluster isolation semantics and dual failures

As discussed in the previous section, deploying a VPLEX solution with VPLEX Witness gives continuous availability to the storage volumes regardless of whether a site failure or an inter-cluster link failure occurs. These types of failure are deemed single component failures, and we have shown that no single point of failure can induce data unavailability when VPLEX Witness is used.

It should be noted, however, that in rare situations more than one fault or component outage can occur. This is especially relevant to the inter-cluster communication links: if two of them failed at once, the VPLEX cluster at a given site would become isolated.

For instance, a typical VPLEX setup with VPLEX Witness automatically has three failure domains (let’s call them A, B, and C, where VPLEX Cluster 1 resides at A, VPLEX Cluster 2 at B, and the VPLEX Witness server at C). In this case there will be an inter-cluster link between A and B (Cluster 1 and 2), plus a management IP link between A and C as well as a management IP link between B and C, effectively giving a triangulated topology.

In rare situations there is a chance that if any two of these three links fail then one of the sites will be isolated (cut off).

Due to the nature of VPLEX Witness, these types of isolation can also be dealt with effectively without manual intervention.

This is possible because a site isolation is very similar in terms of technical behavior to a full site outage. The main difference is that the isolated site is still fully operational and powered up (but needs to be forced into I/O suspension), unlike a site failure where the failed site is not operational.

In these cases the failure semantics with VPLEX Witness are effectively the same. However, two further actions are taken at the site that becomes isolated:

◆ I/O is shut off/suspended at the isolated site.

◆ The VPLEX cluster will attempt to call home.
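The triangulated topology described above can be checked with a few lines of Python. This is a minimal sketch under the stated assumptions (sites A, B, and C joined by exactly three links); the function names are hypothetical. It enumerates every dual link failure and reports which site ends up cut off from both of the others.

```python
from itertools import combinations

SITES = ("A", "B", "C")  # Cluster 1, Cluster 2, VPLEX Witness server
# Three links: A-B (inter-cluster), A-C and B-C (management IP).
LINKS = [frozenset(pair) for pair in combinations(SITES, 2)]


def isolated_sites(failed_links):
    """Return the sites whose links to every other site have all failed."""
    failed = {frozenset(link) for link in failed_links}
    return [
        site for site in SITES
        if all(link in failed for link in LINKS if site in link)
    ]


def dual_failure_outcomes():
    """Map each pair of failed links to the site(s) it isolates."""
    outcomes = {}
    for first, second in combinations(LINKS, 2):
        key = tuple(sorted("-".join(sorted(link)) for link in (first, second)))
        outcomes[key] = isolated_sites([first, second])
    return outcomes


if __name__ == "__main__":
    for links, sites in dual_failure_outcomes().items():
        print(links, "isolates", sites)
```

There are exactly three ways to lose two of the three links, and each one isolates a different site, which corresponds to the three isolation scenarios discussed in this section.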



Figure 44 shows the three scenarios that are described above:

Figure 44 Possible dual failure cluster isolation scenarios

As discussed previously, it is extremely rare to experience a double failure, and Figure 44 showed how VPLEX can automatically ride through isolation scenarios. However, there are also some other possible situations where a dual failure could occur and require manual intervention at one of the VPLEX clusters, as VPLEX Witness will not be able to distinguish the actual failure.

Note: If best practices are followed, the likelihood of these scenarios occurring is significantly lower than even the rare isolation incidents discussed above, mainly because the faults would have to disrupt components in totally different fault domains that would be spread over many miles.




Figure 45 shows three scenarios where a double failure would require manual intervention to bring the remaining component online since VPLEX Witness would not be able to determine the gravity of the failure.

Figure 45 Highly unlikely dual failure scenarios that require manual intervention

VPLEX Witness – The importance of the third failure domain

As discussed in the previous section we now understand that dual failures can occur but are highly unlikely. As also mentioned many times within this TechBook, it is imperative that if VPLEX Witness is to be deployed then the VPLEX Witness server component is installed into a different failure domain than either of the two VPLEX clusters.



Figure 46 shows two further dual failure scenarios where both a VPLEX cluster and the VPLEX Witness server have failed.

Figure 46 Two further dual failure scenarios that would require manual intervention

Again, if best practice is followed and each component resides within its own fault domain, these two situations are just as unlikely as the previous three scenarios that required manual intervention. However, now consider what could happen if the VPLEX Witness server was not deployed within a third failure domain, but rather in the same domain as one of the VPLEX clusters.

This situation would mean that a single domain failure would potentially induce a dual failure as two components may have been residing in the same fault domain. This effectively turns a highly unlikely scenario into a more probable single failure scenario and should be avoided.

By deploying the VPLEX Witness server into a third failure domain, the dual failure risk is substantially lowered and manual intervention is far less likely to be required, since a fault would have to disable more than one dissimilar component, potentially hundreds of miles apart and spread over different fault domains.
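The effect of Witness placement can be illustrated with a short sketch. The placement dictionaries and function names below are hypothetical; the code only counts how many of the three components a single fault domain failure would take out under each placement.

```python
def components_lost(placement, failed_domain):
    """Return the components that reside in the failed fault domain."""
    return sorted(
        name for name, domain in placement.items() if domain == failed_domain
    )


def worst_single_domain_failure(placement):
    """Return the largest number of components lost to any one domain failure."""
    domains = set(placement.values())
    return max(len(components_lost(placement, d)) for d in domains)


# Best practice: each component resides in its own fault domain.
RECOMMENDED = {"cluster-1": "A", "cluster-2": "B", "witness": "C"}
# Witness deployed in the same domain as Cluster 1 (to be avoided).
COLOCATED = {"cluster-1": "A", "cluster-2": "B", "witness": "A"}

if __name__ == "__main__":
    print(worst_single_domain_failure(RECOMMENDED))
    print(worst_single_domain_failure(COLOCATED))
    print(components_lost(COLOCATED, "A"))
```

With the recommended placement, any single domain failure removes one component. With the co-located placement, the loss of domain A removes both Cluster 1 and the Witness, which is one of the dual failure scenarios that requires manual intervention.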





Chapter 6 Combining VPLEX High Availability and VPLEX Witness

◆ Metro HA overview
◆ VPLEX Metro HA with Cross-Cluster Connect
◆ VPLEX Metro HA without Cross-Cluster Connect

Metro HA overview

From a technical perspective, VPLEX Metro HA solutions are effectively three new flavors of reference architecture that utilize the new VPLEX Witness feature in VPLEX v5.0. They greatly enhance an overall solution’s ability to tolerate component failure, causing less disruption than legacy solutions (or none at all) with little or no human intervention over either Cross-Cluster or Metro distances.

The two main architecture types enabled by the VPLEX Witness feature are:

◆ Metro HA with Cross-Cluster Connect, for clusters that are within the limitations of host ISL cross connectivity.

◆ Metro HA without Cross-Cluster Connect, for distances greater than the limitations of ISL cross connectivity.

This section will look at each of these solutions in turn and show how value can be derived by stepping through the different failure scenarios.



VPLEX Metro HA with Cross-Cluster Connect

VPLEX Metro HA Cross-Cluster Connect can be deployed when two sites are within campus distance of each other (up to 1 ms round trip latency). A VPLEX Metro distributed volume can then be deployed across the two sites using a cross connect front end configuration and a VPLEX Witness server installed within a different fault domain.

Figure 47 shows a high level schematic of a Metro HA Cross-Cluster Connect solution for VMware.

Figure 47 High-level diagram of a Metro HA Cross-Cluster Connect solution for VMware

The key benefit of this solution is that it can, in most cases, eliminate RTO altogether if objects or components were to fail.



Failure scenarios

Although the following VPLEX Metro HA environments are compatible with multiple cluster technologies, including Hyper-V and Microsoft Cluster Services, we will assume for these failure scenarios that vSphere 4.1 or higher is configured in a stretched HA topology with DRS so that all of the physical hosts (ESX servers) are within the same HA cluster.

As discussed previously, this type of configuration brings the ability to teleport virtual machines over distance, which is extremely useful in disaster avoidance, load balancing, and cloud infrastructure use cases, all using out-of-the-box features and functions. However, additional value can be derived from deploying the VPLEX Metro HA Cross-Cluster Connect solution to ensure total availability.

Figure 48 on page 109 shows the topology of a Metro HA Cross-Cluster Connect environment divided up into logical fault domains. The following sections will demonstrate the recovery automation for a single failure within any of these domains and show how no single fault in any domain can take down the system as a whole, in most cases without even an interruption of service.



Figure 48 Metro HA Cross-Cluster Connect diagram with failure domains




If a physical host failure were to occur in either domain A1 or B1, the VMware HA cluster would restart the affected virtual machines on the remaining ESX servers.

Figure 49 shows all physical ESX hosts failing in domain A1. Since all of the physical hosts in domain B1 are connected to the same datastores via the VPLEX Metro distributed device, VMware HA can restart the virtual machines on any of the physical ESX hosts in domain B1.

Figure 49 Metro HA Cross-Cluster Connect diagram with disaster in zone A1

The next example describes what will happen in the unlikely event that a VPLEX cluster were to fail in either domain A2 or B2. Examples of how this could happen include power outage, flood, or fire.

In this instance there would be no interruption of service to any of the virtual machines.



Figure 50 shows a full VPLEX cluster outage in domain A2. As the graphic shows, the ESX servers are cross connected to both VPLEX clusters in each site, so VMware will simply re-route the I/O to the alternate path. That path is still available because VPLEX is configured with a VPLEX Witness protected distributed volume, which ensures that the distributed volume remains online in domain B2. The VPLEX Witness Server observes that it cannot communicate with the VPLEX cluster in A2, and the cluster in B2 cannot communicate with A2 either, which means that A2 is either isolated or failed. The VPLEX Witness Server therefore guides the VPLEX cluster in B2 to remain online.

Note: Similarly, in the event of a full isolation at A2, the distributed volumes would simply suspend at A2, since communication would not be possible with either the VPLEX Witness Server or the VPLEX cluster in domain B2. In this case the outcome is identical from a VMware perspective and there will be no interruption.

Figure 50 Metro HA Cross-Cluster Connect diagram with failure in zone A2
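The reasoning that the VPLEX Witness Server applies in this scenario can be sketched as a simple decision function. This is a simplified illustration rather than the actual Witness algorithm: the function name, its parameters, and the "no-guidance" label are assumptions made for this example, and the sticky state behavior described earlier is deliberately left out.

```python
def witness_guidance(inter_cluster_link_up, witness_sees_a, witness_sees_b,
                     bias_winner="A"):
    """Return (server_state, clusters_that_keep_serving_io).

    Clusters are named "A" and "B" after their fault domains; bias_winner
    is the cluster preferred by the static bias (detach) rule.
    """
    if inter_cluster_link_up:
        # Clusters are in contact; the Witness is not in the I/O path.
        return "clusters-in-contact", {"A", "B"}
    if witness_sees_a and witness_sees_b:
        # Pure inter-cluster partition: the static bias rule decides.
        return "cluster-partition", {bias_winner}
    if witness_sees_a:
        # B is unreachable from both its peer and the Witness.
        return "cluster-unreachable", {"A"}
    if witness_sees_b:
        # A is unreachable from both its peer and the Witness.
        return "cluster-unreachable", {"B"}
    # Hypothetical label: the Witness cannot reach either cluster, so it
    # cannot guide them and manual intervention may be needed.
    return "no-guidance", set()


if __name__ == "__main__":
    # Full outage (or isolation) of the cluster in domain A2, as in Figure 50.
    print(witness_guidance(False, False, True))
```

For the outage shown in Figure 50, the function returns the cluster-unreachable state with cluster B as the only proceeding cluster, which matches the behavior described above.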




The next example describes what will happen in the event of a failure to one (or all of) the back end storage arrays in either domain A3 or B3.

Again in this instance there would be no interruption to any of the virtual machines.

Figure 51 shows the failure of all storage arrays that reside in domain A3. Since a cache coherent VPLEX Metro distributed volume is configured between domains A2 and B2, I/O can continue to be actively serviced from the VPLEX in A2 even though the local back end storage has failed. This is due to the embedded VPLEX cache coherency, which will efficiently cache any reads into the A2 domain whilst also propagating writes to the back end storage in domain B3 via the remote VPLEX cluster in domain B2.

Figure 51 Metro HA Cross-Cluster Connect diagram with failure in zone A3 or B3



The next example describes what will happen in the event of a VPLEX Witness server failure in domain C1.

Again in this instance there would be no interruption to any of the virtual machines or VPLEX clusters.

Figure 52 shows a complete failure of domain C1, where the VPLEX Witness server resides. Since the VPLEX Witness is not within the I/O path and is only an optional component, I/O will actively continue for any distributed volume in domains A2 and B2. The inter-cluster link is still available, which means that cache coherency can be maintained between the VPLEX cluster domains.

Although the service is uninterrupted, both VPLEX clusters will now dial home and report that they have lost communication with the VPLEX Witness Server. In this case, the system is in jeopardy of a DU should a cluster failure or an inter-cluster network partition happen while the Witness is down. If the Witness is expected to be down long-term, it may be disabled manually so that static bias rules are invoked.

Figure 52 Metro HA Cross-Cluster Connect diagram with failure in zone C1




The next example describes what will happen in the event of a failure to the inter-cluster link between domains A2 and B2.

Again in this instance there would be no interruption to any of the virtual machines or VPLEX clusters.

Figure 53 on page 115 shows that the inter-cluster link has failed between domains A2 and B2. In this instance the static bias rule set which was defined previously will be invoked, since neither VPLEX cluster can communicate with the other (although the VPLEX Witness Server can communicate with both VPLEX clusters). Access to the given distributed volume will therefore be suspended within one of the domains A2 or B2. Since in this example alternate paths are still available to the remote VPLEX cluster where the volume remains online, VMware will simply re-route the traffic to that cluster, and the virtual machine will remain online and unaffected whichever site it was running on.

Note: It is plausible in this example that the alternate path is physically routed across the same ISL that has failed. In this instance there could be a small interruption if a virtual machine was running in A1, as it will be restarted in B1 since the alternate path is also dead.



Figure 53 Metro HA Cross-Cluster Connect diagram with intersite link failure




VPLEX Metro HA without Cross-Cluster Connect

A VPLEX Metro HA deployment without Cross-Cluster Connect is very similar to the Metro HA Cross-Cluster Connect deployment described in the previous section. However, this solution is designed to cover distances beyond the campus range (campus being used for latencies of up to 1 ms round trip) and into the metropolitan range, where round trip latency would be around 5 ms but does not exceed 10 ms (assuming the application is tolerant of this). A VPLEX Metro distributed volume can then be deployed across the two sites, with a VPLEX Witness server deployed within a separate, third failure/fault domain.

Figure 54 shows a high level schematic of a Metro HA solution for VMware without the Cross-Cluster deployment.
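As a rough aid, the latency guidance above can be expressed as a small helper. This is only a sketch: the function name and the returned strings are hypothetical, the thresholds are the 1 ms and 10 ms round trip figures quoted in this chapter, and a real design decision would also depend on application tolerance and host ISL cross-connect limits.

```python
CROSS_CONNECT_MAX_RTT_MS = 1.0   # campus distance, per this chapter
METRO_MAX_RTT_MS = 10.0          # upper bound quoted for Metro distances


def metro_ha_option(rtt_ms):
    """Suggest a Metro HA flavor for a measured round trip latency (ms)."""
    if rtt_ms < 0:
        raise ValueError("round trip latency cannot be negative")
    if rtt_ms <= CROSS_CONNECT_MAX_RTT_MS:
        return "Metro HA with Cross-Cluster Connect"
    if rtt_ms <= METRO_MAX_RTT_MS:
        return "Metro HA without Cross-Cluster Connect"
    return "beyond the Metro HA latency range"


if __name__ == "__main__":
    for rtt in (0.5, 5.0, 12.0):
        print(rtt, "ms ->", metro_ha_option(rtt))
```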

Figure 54 Metro HA Standard High-level diagram



The key benefit to this solution is a significant reduction and in some cases the elimination of RTO altogether if objects or components were to fail.

Failure scenarios

Again, for these failure scenarios we will assume that vSphere 4.1 or higher is configured in a stretched HA topology so that all of the physical hosts at either site (ESX servers) are within the same HA cluster. As with the stretched VMware configuration with Metro HA in the previous section, it is also possible to enable long distance virtual machine teleportation, since the virtual machine datastores still reside on a VPLEX Metro distributed volume.

Figure 55 shows the topology of a Metro HA environment divided up into logical fault domains. The next section will demonstrate the recovery automation for a single failure within any of these domains.

Figure 55 Metro HA high-level diagram with fault domains



The following example describes what will happen in the unlikely event that a VPLEX cluster were to fail in either domain A2 or B2. In this instance (a failure in A2) there would be no interruption of service to any virtual machines running in domain B1; however, any virtual machines that were running in domain A1 would see a minor interruption as they are restarted at B1.

Figure 56 on page 119 shows a full VPLEX cluster outage in domain A2. As the graphic shows, the ESX servers are not cross zoned/presented to both VPLEX clusters in each site, so VMware will have to perform an HA restart for the virtual machines within domain A1. It can do this because the distributed volumes will remain active at B2: VPLEX is configured with a VPLEX Witness protected distributed volume, and the Witness will deduce that domain A2 is unavailable (since neither the VPLEX Witness Server nor the VPLEX cluster in B2 can communicate with the VPLEX cluster in A2). VPLEX Witness will therefore guide the VPLEX cluster in B2 to remain online.



Figure 56 Metro HA high-level diagram with failure in domain A2

The next example describes what will happen in the event of a failure to the inter-cluster link between domains A2 and B2.

One of two outcomes will result from this scenario:

◆ If the static bias for a given distributed volume was set to Cluster 1 detaches (assuming Cluster 1 resides in domain A2) and the virtual machine was running at the same site where the volume remains online (that is, the preferred site), then there is no interruption to service.

◆ If the static bias for a given distributed volume was set to Cluster 1 detaches (assuming Cluster 1 resides in domain A2) and the virtual machine was running at the remote site (domain B1), then the virtual machine’s storage will be in the suspended state. Most guest operating systems will fail in this case, allowing the virtual machine to be restarted in domain A1 after a small amount of disruption. However, it is possible with vSphere 4.0/4.1 that the guest OS will simply hang and VMware HA will not be prompted to restart it.

Note: Though it is beyond the scope of this TechBook, to avoid any disruption, VMware DRS host affinity rules can be used to ensure that virtual machines are always running in their preferred location – the location that the storage they rely on is biased towards.

Figure 57 shows that the inter-cluster link has failed between domains A2 and B2. In this instance the static bias rule set, which was previously defined as Cluster 1 detaches, will be invoked since neither VPLEX cluster can communicate with the other (although the VPLEX Witness Server can communicate with both VPLEX clusters). Access to the given distributed volume will therefore be suspended within domain B2 whilst remaining active at A2.

Therefore, virtual machines that were running at A1 will be uninterrupted and virtual machines that were running at B1 will be restarted at A1.
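The difference between the two Metro HA flavors during an inter-cluster link failure can be summarized in a short sketch. The function name and return strings are hypothetical. The sketch assumes that the cross-connect path does not share the failed ISL and that VMware HA actually restarts the guest, neither of which is guaranteed, as the notes in this chapter point out.

```python
def vm_outcome_on_link_failure(preferred_site, vm_site, cross_connect):
    """Describe what happens to a VM when the inter-cluster link fails.

    preferred_site: host domain next to the cluster that the static bias
                    rule keeps online (for example "A1").
    vm_site:        host domain where the VM is running ("A1" or "B1").
    cross_connect:  True if hosts are cross connected to both clusters.
    """
    if vm_site == preferred_site:
        # The volume stays online at the VM's own site.
        return "uninterrupted"
    if cross_connect:
        # I/O is re-routed over the alternate path to the online cluster.
        return "uninterrupted (I/O re-routed)"
    # Local storage is suspended; the VM is restarted at the preferred site.
    return "restarted at " + preferred_site


if __name__ == "__main__":
    print(vm_outcome_on_link_failure("A1", "A1", cross_connect=False))
    print(vm_outcome_on_link_failure("A1", "B1", cross_connect=False))
    print(vm_outcome_on_link_failure("A1", "B1", cross_connect=True))
```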



Figure 57 Metro HA high-level diagram with intersite failure

The remaining failure scenarios with this solution are identical to the previously discussed VPLEX Metro HA Cross-Cluster Connect solutions. For failure handling in domains A1, B1, A3, B3 or C, see “VPLEX Metro HA with Cross-Cluster Connect” on page 107.





Chapter 7 Conclusion

This chapter provides a VPLEX conclusion:

◆ Conclusion

Conclusion

As outlined in this book, by using VPLEX AccessAnywhere™ technology in combination with High Availability and VPLEX Witness, storage administrators and data center managers will be able to provide absolute physical and logical high availability for their organizations’ mission critical applications with less resource overhead and less dependency on manual intervention. Increasingly, those mission critical applications are virtualized, in most cases using VMware vSphere or Microsoft Hyper-V “virtual machine” technologies. It is expected that VPLEX customers will use the HA / VPLEX Witness solution to incorporate several application-specific clustering and virtualization technologies to provide HA benefits for targeted mission critical applications.

As described, the storage administrator is provided with two specific VPLEX Metro-based High Availability solutions, outlined here for VMware ESX 4.1 or higher: the VPLEX Metro HA Cross-Cluster Connect environment and the VPLEX Metro HA environment without Cross-Cluster connectivity. VPLEX Metro HA Cross-Cluster Connect provides a slightly higher level of HA than the deployment without Cross-Cluster connectivity; however, it is limited to in-data center use or cases where the network latency between data centers is negligible.

Both solutions are ideal for customers who are highly virtualized, or are planning to become so, and who are also looking for the following:

◆ Elimination of the “night shift” storage and server administrator positions. To accomplish this, they must be comfortable that their applications will ride through any failures that happen during the night.

◆ Reduction of capital expenditures by moving from an active/passive data center replication model to a fully active highly available data center model.

◆ Increased application availability by protecting against flood and fire disasters that could affect their entire data center.

Taking a holistic view of both types of solution, the following benefits are common to both, with some variances. EMC VPLEX technology with VPLEX Witness provides the following:



Better protection from storage-related failures

Within a data center, applications are typically protected against storage-related failures through the use of multipathing software such as EMC PowerPath™. This allows applications to ride through HBA failures, switch failures, cable failures, or storage array controller failures by routing I/O around the location of the failure. The VPLEX Metro HA Cross-Cluster Connect solution extends this protection to the rack and/or data center level by multipathing between VPLEX clusters in independent failure domains. The VPLEX Metro HA solution adds to this the ability to restart the application in the other data center in case no alternative route for the I/O exists in its current data center. As an example, if a fire were to affect an entire VPLEX rack, the application could be restarted in the backup data center automatically. This provides customers a much higher level of availability and a lower level of risk.

Protection from a larger array of possible failures

To highlight the advantages of VPLEX Witness functionality, let’s recall how VMware HA operates.

VMware HA and other offerings provide automatic restart of virtual machines (applications) in the event of virtual machine failure for any reason (server failure, failed connection to storage, and so on). This restart involves a complete boot-up of the virtual machine’s guest operating system and applications. While VM failure leads to an outage, the recovery from that failure is usually automatic.

VMware FT (Fault Tolerance) provides an additional level of protection by maintaining a “shadow VM” that matches the precise state of the primary VM. If the primary VM should fail, the shadow VM can take over immediately and without any significant disruption to the application.

VMware HA, on its own, provides protection from server failures within a data center. When combined with VPLEX in the Metro HA configuration, it provides the same level of protection for data center scale disaster scenarios.


Greater overall resource utilization

Turning from recovery capabilities to utilization, and keeping the same server virtualization point of view, VMware DRS (Distributed Resource Scheduler) can automatically move applications between servers in order to balance their computational and memory load over all the available servers. Within a data center, this has increased server utilization because administrators no longer need to size individual servers to the applications that will run on them. Instead, they can size the entire data center to the suite of applications that will run within it.

By adding HA configuration (Metro and Campus), the available pool of server resources now covers both the primary and backup data centers. Both can actively be used and excess compute capacity in one data center can be used to satisfy new demands in the other.

Alternative Vendor Solutions:

◆ Microsoft Hyper-V Server 2008 R2 with Performance and Resource Optimization (PRO)

Overall, as data centers continue their expected growth patterns and storage administrators struggle to expand capacity and consolidate at the same time, by introducing EMC VPLEX they can reduce several areas of concern. To recap, these areas are:

◆ Hardware and component failures impacting data consistency

◆ System integrity

◆ High availability without manual intervention

◆ Witness to protect the entire highly available system

In reality, by reducing inter-site overhead and dependencies on disaster recovery, administrators can depend on VPLEX to guarantee that their data is available at any time while the beepers and cell phones are silenced.


Glossary

This glossary contains terms related to VPLEX federated storage systems. Many of these terms are used in this manual.

A

AccessAnywhere The breakthrough technology that enables VPLEX clusters to provide access to information between clusters that are separated by distance.

active/active A cluster with no primary or standby servers, because all servers can run applications and interchangeably act as backup for one another.

active/passive A powered component that is ready to operate upon the failure of a primary component.

array A collection of disk drives where user data and parity data may be stored. Devices can consist of some or all of the drives within an array.

asynchronous Describes objects or events that are not coordinated in time. A process operates independently of other processes, being initiated and left for another task before being acknowledged.

For example, a host writes data to the blades and then begins other work while the data is transferred to a local disk and across the WAN asynchronously. See also ”synchronous.”



B

bandwidth The range of transmission frequencies a network can accommodate, expressed as the difference between the highest and lowest frequencies of a transmission cycle. High bandwidth allows fast or high-volume transmissions.

bias When a cluster has the bias for a given DR1, it will remain online if connectivity is lost to the remote cluster (in some cases this may be overruled by VPLEX Cluster Witness).

bit A unit of information that has a binary digit value of either 0 or 1.

block The smallest amount of data that can be transferred following SCSI standards, which is traditionally 512 bytes. Virtual volumes are presented to users as a contiguous list of blocks.

block size The actual size of a block on a device.

byte Memory space used to store eight bits of data.

C

cache Temporary storage for recent writes and recently accessed data. Disk data is read through the cache so that subsequent read references are found in the cache.

cache coherency Managing the cache so data is not lost, corrupted, or overwritten. With multiple processors, data blocks may have several copies, one in the main memory and one in each of the cache memories. Cache coherency propagates the blocks of multiple users throughout the system in a timely fashion, ensuring the data blocks do not have inconsistent versions in the different processors’ caches.

cluster Two or more VPLEX directors forming a single fault-tolerant cluster, deployed as one to four engines.

cluster ID The identifier for each cluster in a multi-cluster deployment. The ID is assigned during installation.

cluster deployment ID A numerical cluster identifier, unique within a VPLEX cluster. By default, VPLEX clusters have a cluster deployment ID of 1. For multi-cluster deployments, all but one cluster must be reconfigured to have different cluster deployment IDs.



clustering Using two or more computers to function together as a single entity. Benefits include fault tolerance and load balancing, which increases reliability and up time.

COM The intra-cluster communication (Fibre Channel). The communication used for cache coherency and replication traffic.

command line interface (CLI) A way to interact with a computer operating system or software by typing commands to perform specific tasks.

continuity of operations (COOP) The goal of establishing policies and procedures to be used during an emergency, including the ability to process, store, and transmit data before and after.

controller A device that controls the transfer of data to and from a computer and a peripheral device.

D

data sharing The ability to share access to the same data with multiple servers regardless of time and location.

detach rule A rule set applied to a DR1 to declare a winning and a losing cluster in the event of a failure.

device A combination of one or more extents to which you add specific RAID properties. Devices use storage from one cluster only; distributed devices use storage from both clusters in a multi-cluster plex. See also ”distributed device.”

director A CPU module that runs GeoSynchrony, the core VPLEX software. There are two directors in each engine, and each has dedicated resources and is capable of functioning independently.

dirty data The write-specific data stored in the cache memory that has yet to be written to disk.

disaster recovery (DR) The ability to restart system operations after an error, preventing data loss.

disk cache A section of RAM that provides cache between the disk and the CPU. RAM’s access time is significantly faster than disk access time; therefore, a disk-caching program enables the computer to operate faster by placing recently accessed data in the disk cache.



distributed device A RAID 1 device whose mirrors are in geographically separate locations.

distributed file system (DFS) Supports the sharing of files and resources in the form of persistent storage over a network.

Distributed RAID1 device (DR1) A cache coherent VPLEX Metro or Geo volume that is distributed between two VPLEX clusters.

E

engine Enclosure that contains two directors, management modules, and redundant power.

Ethernet A Local Area Network (LAN) protocol. Ethernet uses a bus topology, meaning all devices are connected to a central cable, and supports data transfer rates of between 10 megabits per second and 10 gigabits per second. For example, 100 Base-T supports data transfer rates of 100 Mb/s.

event A log message that results from a significant action initiated by a user or the system.

extent A slice (range of blocks) of a storage volume.

F

failover Automatically switching to a redundant or standby device, system, or data path upon the failure or abnormal termination of the currently active device, system, or data path.

fault domain A concept where each component of an HA solution is separated by a logical or physical boundary so that if a fault happens in one domain it will not transfer to the other. The boundary can represent any item which could fail (for example, a separate power domain would mean that power would remain in the second domain if it failed in the first domain).

fault tolerance Ability of a system to keep working in the event of hardware or software failure, usually achieved by duplicating key system components.



Fibre Channel (FC) A protocol for transmitting data between computer devices. Longer distance requires the use of optical fiber; however, FC also works using coaxial cable and ordinary telephone twisted pair media. Fibre Channel offers point-to-point, switched, and loop interfaces. Used within a SAN to carry SCSI traffic.

field replaceable unit (FRU) A unit or component of a system that can be replaced on site as opposed to returning the system to the manufacturer for repair.

firmware Software that is loaded on and runs from the flash ROM on the VPLEX directors.

G

geographically distributed system A system physically distributed across two or more geographically separated sites. The degree of distribution can vary widely, from different locations on a campus or in a city to different continents.

Geoplex A DR1 device configured for VPLEX Geo.

gigabit (Gb or Gbit) 1,073,741,824 (2^30) bits. Often rounded to 10^9.

gigabit Ethernet The version of Ethernet that supports data transfer rates of 1 Gigabit per second.

gigabyte (GB) 1,073,741,824 (2^30) bytes. Often rounded to 10^9.

global file system (GFS) A shared-storage cluster or distributed file system.

H

host bus adapter (HBA) An I/O adapter that manages the transfer of information between the host computer's bus and memory system. The adapter performs many low-level interface functions automatically or with minimal processor involvement to minimize the impact on the host processor's performance.

I

input/output (I/O) Any operation, program, or device that transfers data to or from a computer.

internet Fibre Channel protocol (iFCP) Connects Fibre Channel storage devices to SANs or the Internet in geographically distributed systems using TCP.



intranet A network operating like the World Wide Web but with access restricted to a limited group of authorized users.

internet small computer system interface (iSCSI) A protocol that allows commands to travel through IP networks, carrying data from storage units to servers anywhere in a computer network.

I/O (input/output) The transfer of data to or from a computer.

K

kilobit (Kb) 1,024 (2^10) bits. Often rounded to 10^3.

kilobyte (K or KB) 1,024 (2^10) bytes. Often rounded to 10^3.

L

latency The amount of time required to fulfill an I/O request.

load balancing Distributing the processing and communications activity evenly across a system or network so no single device is overwhelmed. Load balancing is especially important when the number of I/O requests issued is unpredictable.

local area network (LAN) A group of computers and associated devices that share a common communications line and typically share the resources of a single processor or server within a small geographic area.

logical unit number (LUN) Used to identify SCSI devices, such as external hard drives, connected to a computer. Each device is assigned a LUN, which serves as the device's unique address.

M

megabit (Mb) 1,048,576 (2^20) bits. Often rounded to 10^6.

megabyte (MB) 1,048,576 (2^20) bytes. Often rounded to 10^6.
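The kilo, mega, and giga entries in this glossary all give a power-of-two value and note that it is often rounded to a power of ten. As a small illustrative sketch (the dictionary and function names are assumptions made for this example, not part of any VPLEX tool), the following Python snippet shows how far the rounded figure drifts from the exact one as the prefix grows:

```python
# Binary prefix values as defined in this glossary, and their decimal roundings.
BINARY = {"kilo": 2**10, "mega": 2**20, "giga": 2**30}
DECIMAL = {"kilo": 10**3, "mega": 10**6, "giga": 10**9}


def rounding_error_percent(prefix: str) -> float:
    """Return how much larger the binary value is than its decimal rounding, in percent."""
    return (BINARY[prefix] - DECIMAL[prefix]) / DECIMAL[prefix] * 100


for prefix in ("kilo", "mega", "giga"):
    print(
        f"{prefix}: {BINARY[prefix]:,} vs {DECIMAL[prefix]:,} "
        f"({rounding_error_percent(prefix):.2f}% larger)"
    )
```

The gap widens from roughly 2.4 percent at the kilo level to roughly 7.4 percent at the giga level, which is worth keeping in mind when comparing capacities quoted in different conventions.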

metadata Data about data, such as data quality, content, and condition.

metavolume A storage volume used by the system that contains the metadata for all the virtual volumes managed by the system. There is one metadata storage volume per cluster.



Metro-Plex Two VPLEX Metro clusters connected within metro (synchronous) distances, approximately 60 miles or 100 kilometers.

metroplex A DR1 device configured for VPLEX Metro.

mirroring The writing of data to two or more disks simultaneously. If one of the disk drives fails, the system can instantly switch to one of the other disks without losing data or service. RAID 1 provides mirroring.

miss An operation where the cache is searched but does not contain the data, so the data instead must be accessed from disk.

N

namespace A set of names recognized by a file system in which all names are unique.

network System of computers, terminals, and databases connected by communication lines.

network architecture Design of a network, including hardware, software, method of connection, and the protocol used.

network-attached storage (NAS) Storage elements connected directly to a network.

network partition When one site loses contact or communication with another site.

P

parity The even or odd number of 0s and 1s in binary code.

parity checking Checking for errors in binary data. Depending on whether the byte has an even or odd number of 1 bits, an extra 0 or 1 bit, called a parity bit, is added to each byte in a transmission. The sender and receiver agree on odd parity, even parity, or no parity. If they agree on even parity, a parity bit is added that makes the number of 1 bits in each byte even. If they agree on odd parity, a parity bit is added that makes the number of 1 bits in each byte odd. If the data is transmitted incorrectly, the change in parity will reveal the error.
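The parity checking entry can be illustrated with a short Python sketch. It assumes 8-bit bytes and a single parity bit per byte; the function names `parity_bit` and `passes_parity_check` are hypothetical and chosen only for this example.

```python
def parity_bit(byte: int, even: bool = True) -> int:
    """Return the parity bit the sender appends to an 8-bit byte.

    With even parity the bit makes the total count of 1 bits even;
    with odd parity it makes the total count odd.
    """
    ones = bin(byte & 0xFF).count("1")
    bit_for_even = ones % 2  # 1 only when the byte has an odd number of 1 bits
    return bit_for_even if even else bit_for_even ^ 1


def passes_parity_check(byte: int, received_parity: int, even: bool = True) -> bool:
    """Receiver side: recompute the parity bit and compare it with the one received."""
    return parity_bit(byte, even) == received_parity


sent = 0b10110010                      # four 1 bits
p = parity_bit(sent)                   # even parity is assumed as the agreed scheme
corrupted = sent ^ 0b00000100          # one bit flipped in transit
print(passes_parity_check(sent, p))        # intact byte
print(passes_parity_check(corrupted, p))   # single-bit error
```

A single parity bit reveals any odd number of flipped bits; an even number of flips leaves the parity unchanged and goes undetected.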

partition A subdivision of a physical or virtual disk, which is a logical entity only visible to the end user, not any of the devices.



plex A VPLEX single cluster.

R

RAID The use of two or more storage volumes to provide better performance, error recovery, and fault tolerance.

RAID 0 A performance-oriented striped or dispersed data mapping technique. Uniformly sized blocks of storage are assigned in regular sequence to all of the array's disks. Provides high I/O performance at low inherent cost. No additional disks are required. The advantages of RAID 0 are a very simple design and an ease of implementation.

RAID 1 Also called mirroring, this has been used longer than any other form of RAID. It remains popular because of simplicity and a high level of data availability. A mirrored array consists of two or more disks. Each disk in a mirrored array holds an identical image of the user data. RAID 1 has no striping. Read performance is improved since either disk can be read at the same time. Write performance is lower than single disk storage. Writes must be performed on all disks, or mirrors, in the RAID 1. RAID 1 provides very good data reliability for read-intensive applications.

RAID leg A copy of data, called a mirror, that is located at a user's current location.

rebuild The process of reconstructing data onto a spare or replacement drive after a drive failure. Data is reconstructed from the data on the surviving disks, assuming mirroring has been employed.

redundancy The duplication of hardware and software components. In a redundant system, if a component fails then a redundant component takes over, allowing operations to continue without interruption.

reliability The ability of a system to recover lost data.

remote direct memory access (RDMA) Allows computers within a network to exchange data using their main memories and without using the processor, cache, or operating system of either computer.

Recovery Point Objective (RPO) The amount of data that can be lost before a given failure event.



Recovery Time Objective (RTO) The amount of time the service takes to fully recover after a failure event.

S

scalability Ability to easily change a system in size or configuration to suit changing conditions, to grow with your needs.

simple network management protocol (SNMP) Monitors systems and devices in a network.

site ID The identifier for each cluster in a multi-cluster plex. By default, in a non-geographically distributed system the ID is 0. In a geographically distributed system, one cluster's ID is 1, the next is 2, and so on, each number identifying a physically separate cluster. These identifiers are assigned during installation.

small computer system interface (SCSI) A set of evolving ANSI standard electronic interfaces that allow personal computers to communicate faster and more flexibly than previous interfaces with peripheral hardware such as disk drives, tape drives, CD-ROM drives, printers, and scanners.

split brain Condition when a partitioned DR1 accepts writes from both clusters.

storage RTO The amount of time taken for the storage to be available after a failure event. In all cases this is a smaller time interval than the RTO, since storage availability is a prerequisite for recovering the service.

stripe depth The number of blocks of data stored contiguously on each storage volume in a RAID 0 device.

striping A technique for spreading data over multiple disk drives. Disk striping can speed up operations that retrieve data from disk storage. Data is divided into units and distributed across the available disks. RAID 0 provides disk striping.
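The stripe depth and striping entries together imply a simple address mapping. The Python sketch below is a generic illustration of RAID 0 style round-robin placement, not a description of how VPLEX lays out data internally; the function name `locate_block` and its parameters are assumptions made for this example.

```python
from typing import Tuple


def locate_block(logical_block: int, stripe_depth: int, num_volumes: int) -> Tuple[int, int]:
    """Map a logical block number to (volume index, block offset on that volume).

    stripe_depth is the number of contiguous blocks placed on one storage
    volume before the layout moves on to the next volume.
    """
    stripe_unit = logical_block // stripe_depth   # which run of stripe_depth blocks
    volume = stripe_unit % num_volumes            # runs are dealt out round-robin
    row = stripe_unit // num_volumes              # full passes across all volumes so far
    offset = row * stripe_depth + logical_block % stripe_depth
    return volume, offset


# Example: stripe depth of 4 blocks across 3 storage volumes.
for block in (0, 3, 4, 11, 12):
    print(block, locate_block(block, stripe_depth=4, num_volumes=3))
```

With these example values, blocks 0 through 3 land on the first volume, blocks 4 through 7 on the second, blocks 8 through 11 on the third, and block 12 wraps back to the first volume at offset 4.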

storage area network (SAN) A high-speed special purpose network or subnetwork that interconnects different kinds of data storage devices with associated data servers on behalf of a larger network of users.

storage view A combination of registered initiators (hosts), front-end ports, and virtual volumes, used to control a host's access to storage.



storage volume A LUN exported from an array.

synchronous Describes objects or events that are coordinated in time. A process is initiated and must be completed before another task is allowed to begin.

For example, in banking two withdrawals from a checking account that are started at the same time must not overlap; therefore, they are processed synchronously. See also “asynchronous.”

T

throughput 1. The number of bits, characters, or blocks passing through a data communication system or portion of that system.

2. The maximum capacity of a communications channel or system.

3. A measure of the amount of work performed by a system over a period of time. For example, the number of I/Os per day.

tool command language (TCL) A scripting language often used for rapid prototypes and scripted applications.

transmission control protocol/Internet protocol (TCP/IP) The basic communication language or protocol used for traffic on a private network and the Internet.

U

uninterruptible power supply (UPS) A power supply that includes a battery to maintain power in the event of a power failure.

universal unique identifier (UUID) A 64-bit number used to uniquely identify each VPLEX director. This number is based on the hardware serial number assigned to each director.

V

virtualization A layer of abstraction implemented in software that servers use to divide available physical storage into storage volumes or virtual volumes.

virtual volume A virtual volume looks like a contiguous volume, but can be distributed over two or more storage volumes. Virtual volumes are presented to hosts.



VPLEX Cluster Witness A new feature in VPLEX V5.x that can augment and improve upon the failure handling semantics of Static Bias.

W

wide area network (WAN) A geographically dispersed telecommunications network. This term distinguishes a broader telecommunication structure from a local area network (LAN).

world wide name (WWN) A specific Fibre Channel Name Identifier that is unique worldwide and represented by a 64-bit unsigned binary value.
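Because a WWN is a 64-bit unsigned value, it is commonly displayed as eight hexadecimal byte pairs separated by colons. The Python sketch below shows that conversion in both directions; the function names and the sample value are hypothetical and do not correspond to any real device.

```python
def format_wwn(value: int) -> str:
    """Render a 64-bit unsigned integer as eight colon-separated hex byte pairs."""
    if not 0 <= value < 2**64:
        raise ValueError("a WWN must fit in 64 unsigned bits")
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(8, "big"))


def parse_wwn(text: str) -> int:
    """Convert the colon-separated form back to its 64-bit integer value."""
    parts = text.split(":")
    if len(parts) != 8:
        raise ValueError("expected eight colon-separated byte pairs")
    return int("".join(parts), 16)


sample = 0x5000144260000010  # made-up value, used only for illustration
print(format_wwn(sample))
print(parse_wwn(format_wwn(sample)) == sample)
```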

write-through mode A caching technique in which the completion of a write request is communicated only after data is written to disk. This is almost equivalent to non-cached systems, but with data protection.
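A minimal Python sketch of the write-through behavior defined above follows. It is a toy model under stated assumptions: a dictionary stands in for the disk, and the class name `WriteThroughCache` and its methods are invented for this example rather than taken from VPLEX.

```python
class WriteThroughCache:
    """Toy write-through cache in front of a dict that stands in for the disk."""

    def __init__(self, disk: dict):
        self.disk = disk
        self.cache = {}
        self.misses = 0

    def write(self, block: int, data: bytes) -> bool:
        # Persist to the backing store first, then update the cache.
        self.disk[block] = data
        self.cache[block] = data
        # Completion is reported only after the disk write, so the cache
        # never holds dirty data.
        return True

    def read(self, block: int) -> bytes:
        if block not in self.cache:
            # Cache miss: the data must be fetched from disk.
            self.misses += 1
            self.cache[block] = self.disk[block]
        return self.cache[block]


disk = {9: b"cold"}
cache = WriteThroughCache(disk)
cache.write(7, b"payload")
print(disk[7])          # already on disk when write() returns
print(cache.read(9))    # first read of block 9 is a miss
print(cache.misses)
```

The cost of this design is that every write waits for the disk, which is why the glossary describes it as almost equivalent to a non-cached system; the benefit is that a cache failure cannot lose acknowledged writes.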

