VPLEX Architecture and Design
Copyright © 2010 EMC Corporation. Do not Copy ‐ All Rights Reserved. 1
© 2010 EMC Corporation. All rights reserved.These materials may not be copied without EMC’s written consent.
Support: Education Services
EMC VPLEX Architecture and Design
April 2010
Welcome to EMC VPLEX Architecture and Design. Click the play button in the lower right hand corner of this screen to continue.
Copyright © 2010 EMC Corporation. All rights reserved.
These materials may not be copied without EMC's written consent.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC² , EMC, EMC ControlCenter, AdvantEdge, AlphaStor, ApplicationXtender, Avamar, Captiva, Catalog Solution, Celerra, Centera, CentraStar, ClaimPack, ClaimsEditor, ClaimsEditor, Professional, CLARalert, CLARiiON, ClientPak, CodeLink, Connectrix, Co‐StandbyServer, Dantz, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, Document Sciences, Documentum, EmailXaminer, EmailXtender, EmailXtract, enVision, eRoom, Event Explorer, FLARE, FormWare, HighRoad, InputAccel,InputAccelExpress, Invista, ISIS, Max Retriever, Navisphere, NetWorker, nLayers, OpenScale, PixTools, Powerlink, PowerPath, Rainfinity, RepliStor, ResourcePak, Retrospect, RSA, RSA Secured, RSA Security, SecurID, SecurWorld, Smarts, SnapShotServer, SnapView/IP, SRDF, Symmetrix, TimeFinder, VisualSAN, VSAM‐Assist, WebXtender, where information lives, xPression, xPresso, Xtender, Xtender Solutions; and EMC OnCourse, EMC Proven, EMC Snap, EMC Storage Administrator, Acartus, Access Logix, ArchiveXtender, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, C‐Clip, Celerra Replicator, CLARevent, Codebook Correlation Technology, Common Information Model, CopyCross, CopyPoint, DatabaseXtender, Digital Mailroom, Direct Matrix, EDM, E‐Lab, eInput, Enginuity, FarPoint, FirstPass, Fortress, Global File Virtualization, Graphic Visualization, InfoMover, Infoscape, MediaStor, MirrorView, Mozy, MozyEnterprise, MozyHome, MozyPro, NetWin, OnAlert, PowerSnap, QuickScan, RepliCare, SafeLine, SAN Advisor, SAN Copy, SAN Manager, SDMS, SnapImage, SnapSure, SnapView, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix DMX, UltraFlex, UltraPoint, UltraScale, Viewlets, VisualSRM are trademarks of EMC Corporation.
All other trademarks used herein are the property of their respective owners.
Course Overview
Audience
This course is intended for those who are presently, or are planning to be, engaged in positioning VPLEX and performing VPLEX solutions design.
Objectives
Upon successful completion of this course, you should be able to:
• Describe VPLEX system architecture and configuration options
• Position solutions utilizing VPLEX, and describe their benefits to the customer
• Describe key VPLEX features, how they can be effectively used, and high‐level tasks for implementing them
• Explain how VPLEX can be integrated into your customer's production environment
• Perform planning and design for VPLEX deployment
EMC believes the information in this course is accurate as of its publication date. It is based on pre‐GA product information, which is subject to change without notice. For the most current information, see the EMC Support Matrix and product release notes in Powerlink.
Description
This course provides detailed coverage of VPLEX in typical data center environments. It comprehensively addresses product architecture, host‐to‐virtual‐storage implementation, system environment sizing, and management and monitoring of VPLEX environments.
This course provides an introduction to EMC VPLEX. It describes VPLEX system architecture, key features, and recommended implementations.
This training provides familiarity with the major VPLEX solutions design concerns. It also includes a high‐level view of implementation tasks related to specific VPLEX features, functionality, and management.
Course Modules
Module 1: VPLEX Technology and Positioning
Module 2: Architecture ‐ Physical and Logical Components
Module 3: VPLEX Functionality and Management
Module 4: Planning and Design Considerations
This eLearning course is structured into four modules:
Module 1 briefly covers EMC’s vision on block storage virtualization, and how VPLEX is being positioned.
Module 2 discusses the underlying technology and architecture.
Module 3 covers the major features and capabilities available in the current release.
Module 4 addresses the significant planning and design considerations relevant to VPLEX deployment.
Module 1: VPLEX Technology and Positioning
Upon successful completion of this module, you should be able to:
• Articulate how VPLEX can enable EMC's vision of the journey to the private cloud
• Describe VPLEX local and distributed federation
• Provide a high‐level system view of VPLEX Local and VPLEX Metro
• Describe typical scenarios where VPLEX technology can be effectively applied
This module introduces fundamental concepts relevant to VPLEX technology, local federation and distributed federation.
The introductory module briefly outlines EMC’s vision on block storage virtualization, and positions VPLEX enabled solutions within the broader context of that vision.
Journey to the Private Cloud

Transitioning to Private Cloud: Information Infrastructure
• Reduce CapEx & OpEx: leverage efficiency technologies
• Optimize service levels: tier and consolidate
• Deliver "always on": 24 x forever availability
• Manage at scale: simplify and automate

When EMC speaks of the Private Cloud, it is describing a strategy for your infrastructure that enables optimized resource use. This means you are optimized for energy, power and cost savings. You can scale up and out simply and apply automated policies, and you can guarantee greater availability and access for your production environment, significantly reducing or eliminating downtime.
EMC Vision: Virtual Storage

Efficient | Secure | Always‐on | Automated | On‐Demand | Integrated
Physical storage + FAST + Federation + Storage Virtualization
Capabilities that free information from physical storage:
• 24 x forever: run applications without restart. Ever!
• Move thousands of VMs over thousands of miles
• Batch process in low‐cost energy locations
• Dynamic workload balancing and relocation
• Aggregate big data centers from separate ones
For years, users have relied on "physical storage" to meet their information needs. New and evolving changes, such as virtualization and the adoption of Private Cloud computing, have placed new demands on how storage and information are managed.
To meet these new requirements, storage must evolve to deliver capabilities that free information from a physical element into a virtualized resource that is fully automated, integrated within the infrastructure, consumed on demand, cost effective and efficient, always on, and secure. The technology enablers needed to deliver this combine unique EMC capabilities such as FAST, Federation, and storage virtualization.
The result is a next generation Private Cloud infrastructure that allows users to:
• Move thousands of VMs over thousands of miles
• Batch process in low‐cost energy locations
• Enable boundary‐less workload balancing and relocation
• Aggregate big data centers
• Deliver "24 x forever", and run or recover applications without ever having to restart
EMC VPLEX Architecture

Local & Distributed Federation: next generation data mobility and access (available April 2010)
• Scale‐out cluster architecture: start small and grow big with predictable service levels
• Advanced data caching: improve I/O performance and reduce storage array contention
• Distributed cache coherence: automatic sharing, balancing and failover of storage domains within and across VPLEX Engines
• Access Anywhere across EMC and non‐EMC arrays
EMC VPLEX is a next generation architecture for data mobility and information access.
It is based on unique technology that combines scale out clustering and advanced data caching, with the unique distributed cache coherence intelligence to deliver radically new and improved approaches to storage management.
This architecture allows data to be accessed and shared between locations over distance via a distributed federation of storage resources.
The first products being introduced based on this architecture include configurations that support local and metro environments, with additional products planned for future releases.
EMC VPLEX Capabilities

Storage Virtualization | Local Federation | Distributed Federation
Access Anywhere: EMC and non‐EMC arrays
• Streamline storage refreshes, consolidations and migrations
• Simplify multi‐array allocation, management, and provisioning
• Pool storage capacity to extend useful life for N‐1 storage assets
These capabilities operate within, across, and between data centers over distance, enable information to be "accessed anywhere", and provide "just in time" storage services via scale‐out.
Distributed federation builds on traditional virtualization by adding the ability to transparently move and migrate data within and across data centers. This simplifies multi‐array storage management and multi‐site information access, as well as allows capacity to be pooled and efficiently scaled on demand.
VPLEX Local: Overview

VPLEX Local (Single Cluster)
• Simplify provisioning and volume management: centralize management of block storage in the data center; simplify storage provisioning, management and monitoring; physical storage needs to be provisioned just once, to the virtualization layer
• Non‐disruptive data mobility: optimize performance, redistribute and balance workloads among arrays
• Workload resiliency: improve reliability, scale out performance
• Storage pooling: manage available capacity across multiple frames based on SLAs
Around 2003, storage virtualization was introduced as a viable solution. The primary value proposition of storage virtualization was moving data non‐disruptively. Customers looked to this technology for transparent tiering, moving back‐end storage data without having to disrupt hosts, simplified operations over multiple frames, as well as ongoing data moves for tech refreshes and lease rollovers.
Customers required tools that enabled storage moves to be made without forcing interaction at the host and database administration levels. The concept of a virtualization controller was introduced and took its place in the market. While EMC released its own version of this with the Invista split‐path architecture, we also continued development on both Symmetrix and CLARiiON to integrate multiple tiers of storage within a single array. Today, we offer Flash, Fibre Channel and SATA within EMC arrays, and a very transparent method of moving data across different storage types and tiers with our virtual LUN capability. We found that providing both choices allowed our products to meet a wider set of customer challenges than offering only one of the two options.
The challenges addressed by traditional storage virtualization – which can be broadly categorized as simplified storage management ‐ still exist today. VPLEX local federation can solve this class of problems within the context of a single data center.
However, we’ve also seen these data center issues evolve. Newer, different problems have emerged that require new solutions – as we’ll see next, when we discuss distributed federation.
VPLEX Metro: Overview

VPLEX Metro (Two Clusters): Cluster‐1/Site A and Cluster‐2/Site B
• AccessAnywhere: block storage access within, between and across data centers
• Within synchronous distances: approximately 60 miles or 100 kilometers
• Connects two VPLEX storage clusters together over distance
• Enables virtual volumes to be shared by both clusters, with unique distributed cache coherency for all reads and writes
• Both clusters maintain the same identity for a volume, and preserve the same SCSI state for the logical unit
• Enables VMware VMotion over distance
With VPLEX distributed federation, it becomes possible to configure shared volumes to hosts that are in different sites or failure domains. This enables a new set of solutions that can be implemented over synchronous distances, where earlier these solutions could reside only within a single data center. VMware VMotion over distance is a prime example of such solutions.
Another key technology that enables AccessAnywhere is remote access. This makes it possible for block storage to be accessed as though it were local, even though it is remote.
Example: Current Workload Relocation within Sites

[Diagram: two sites, Domain 1/Site 1 and Domain 2/Site 2, separated by synchronous distance (100 km). Each site has its own SAN and VMFS volumes on Symmetrix, CLARiiON and third‐party arrays, hosting virtualized Microsoft applications: MS Exchange mail servers (Mail_1 through Mail_4), SQL Server 2008, SharePoint 2007 web front ends and Excel services, and a file and print server on Windows 2008 Server. VMotion moves VMs only within each site.]

Challenges:
• Uneven resource utilization across sites
• Planned events requiring shutdown
This typical scenario deals with a dual‐site environment with virtualized Microsoft application servers at each site. VMotion can currently leverage shared SAN storage to move VMs across ESX servers within each site.
However, the customer is now looking to expand the scope of VMotion beyond site boundaries to further improve resource utilization, and to handle planned events that may affect an entire site.
Proposed: VMotion over Distance with VPLEX

[Diagram: the same two sites, now connected by an FC MAN over synchronous distance (100 km). The VMFS volume resides on a VPLEX distributed device shared between the sites, enabling Distance VMotion of the application VMs from one site to the other.]

Addressing the challenges:
• Distance VMotion: load‐balance across sites
• Planned site‐wide events: move applications proactively to the other site

Other potential benefits:
• Disaster avoidance
• Improved infrastructure availability and performance
• Power/energy savings by moving VMs across sites
The proposed solution accomplishes this as follows. It involves a VPLEX Metro spanning the sites, with the application VMs using shared data stores built on VPLEX distributed devices. This enables non‐disruptive distance VMotion across sites, thereby addressing the customer's primary challenges. Distance VMotion also opens up other possibilities for this customer, as listed here.
VPLEX Local: Single Cluster

• 1 to 4 Virtualization Engines per rack
• Up to 8,000 total virtual devices per cluster
• N+1 performance scaling
• Cache write‐through to preserve array functionality

Supported user environments at General Availability:
Host Platforms: ESX, Windows, Solaris, AIX, HP‐UX, Linux
Multi‐Pathing: PowerPath, VMware NMP
Volume Managers: VxVM, AIX LVM, HP LVM
Arrays (at GA): VMAX, DMX, CLARiiON, HDS 99X0, USP‐V, USP‐VM
SAN Fabrics: Brocade, McData and Cisco

[Rack diagram: engines with redundant power supplies, two 8‐port FC switches, switch UPSs, and a Management Server.]
Shown is a summary of the key characteristics of a VPLEX Local or single cluster configuration.
Among our key value propositions: you can start small and scale up, you can have centralized management, as well as predictable performance and availability.
The engines are arranged in a true cluster, which means I/O that enters the cluster from anywhere can be serviced from anywhere.
The engines are arranged in an N+1 configuration, which means that as you add more engines, you increase the memory, ports and performance of the total cluster. The cluster can withstand the failure of any device and any component, and will continue to operate and provide storage services as long as just one device survives. You get transparent mobility across heterogeneous arrays. If you need to extend these capabilities over distance, or across multiple failure domains within a single site, a VPLEX Metro configuration may be a more appropriate choice.
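The scale-out effect of adding engines can be illustrated with a short sketch. This is an illustration only, not an official EMC sizing tool; it assumes resources scale linearly with engine count and uses per-engine figures quoted elsewhere in this course (32 FC ports and 64 GB raw cache per engine, with 1, 2 or 4 engines per cluster):

```python
# Rough N+1 scale-out illustration (assumption: linear scaling).
# Per-engine figures are taken from this course's engine description:
# 32 Fibre Channel ports and 64 GB of raw cache per engine.

PORTS_PER_ENGINE = 32
CACHE_GB_PER_ENGINE = 64

def cluster_resources(engines: int) -> dict:
    """Aggregate FC ports and raw cache for a cluster of N engines."""
    if engines not in (1, 2, 4):  # supported engine counts per cluster
        raise ValueError("a VPLEX cluster has 1, 2 or 4 engines")
    return {
        "fc_ports": engines * PORTS_PER_ENGINE,
        "cache_gb": engines * CACHE_GB_PER_ENGINE,
    }

print(cluster_resources(4))  # {'fc_ports': 128, 'cache_gb': 256}
```

Adding engines grows ports and cache, while the device limit (8,000 virtual devices per cluster) stays fixed regardless of engine count.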
VPLEX Metro: Dual Cluster

[Diagram: a Metro‐Plex of two VPLEX racks, each with engines, redundant power supplies, two 8‐port FC switches, switch UPSs, and a Management Server.]

• Up to 8 Virtualization Engines
• 16K (8K per cluster, or shared) total virtual devices
• Within or across data centers
• Synchronous distance support
Here is a brief synopsis of VPLEX Metro configurations, limits and key capabilities.
As we saw with VPLEX Local, each single cluster can support 8,000 back‐end storage volumes and 8,000 virtual volumes, regardless of whether you specify 1, 2 or 4 engines. The number of engines influences the total number of FE/BE ports available, and thus the scalability and obtainable performance relative to the number of host and storage array ports to be serviced. A VPLEX Metro dual cluster can support a total of 16,000 front‐end and 16,000 back‐end devices. However, when creating distributed RAID 1 devices, remember that you are consuming two devices, one from each cluster in the Metro; so if all devices are DR1s, the limit is 8,000 front‐end devices.
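The DR1 device accounting described above can be sketched in a few lines. This is a hypothetical illustration of the arithmetic, not an EMC tool; the 8,000-device-per-cluster figure comes from this course:

```python
# Illustrative sketch of the distributed-device accounting above.
# Assumption: each distributed RAID-1 (DR1) device consumes one device
# from each cluster's 8,000-device budget.

PER_CLUSTER_LIMIT = 8000

def remaining_local_devices(dr1_count: int) -> int:
    """Devices left on each cluster for non-distributed use after
    provisioning dr1_count distributed RAID-1 devices."""
    if not 0 <= dr1_count <= PER_CLUSTER_LIMIT:
        raise ValueError("DR1 count exceeds the per-cluster device limit")
    return PER_CLUSTER_LIMIT - dr1_count

print(remaining_local_devices(8000))  # all-DR1 case: 0 devices left per cluster
```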
One view of a Metro‐Plex is each cluster servicing a different physical site, with up to 100 km between sites.
An equally useful alternate view is two joined clusters at a single site with shared LUNs between them. You may choose to implement these two clusters as two different targets within separate failure domains, for example, in the same data center.
At GA, VPLEX will support clustered host file systems, including VMFS. With this deployment, multiple ESX servers can read/write the same file system simultaneously, while individual virtual machine files are locked. We will also extend support over time to include Sun Cluster, HP cluster, IBM cluster, and CXFS.
Currently there is a limitation for stretched host clusters over distance: if one site fails, you must manually restart that site's applications.
Module 2: Architecture ‐ Physical and Logical Components
Upon successful completion of this module, you should be able to:
• Provide a comprehensive view of VPLEX Local and VPLEX Metro
• Describe VPLEX hardware and software architecture at a high level
This module describes physical and logical components comprising a VPLEX system, the currently‐available federation features, and their internal operation.
VPLEX Architecture

[Diagram: two sites, Cluster‐1/Site A and Cluster‐2/Site B, connected by an FC MAN and an IP link. At each site, hosts attach to the VPLEX front‐end ports of a VPLEX Engine; the VPLEX Directors within the engine communicate over LCOM; the VPLEX back‐end ports attach to EMC and non‐EMC arrays; virtual volumes are presented to the hosts; and a VPLEX Management Server manages each cluster.]
Let's look at a typical production SAN environment, and how VPLEX fits and works within it.
The basic building block of a VPLEX system is the Engine. Multiple engines can be configured to form a single VPLEX cluster for scalability. Each Engine includes two High‐Availability Directors with front‐end and back‐end Fibre Channel ports for integration with the customer's fabrics. VPLEX does not rely on (or require) any particular fabric intelligence. The Director FE and BE ports show up as standard F‐ports on the fabrics. VPLEX technology can work equally well with Brocade or Cisco fabrics with no dependency on switching hardware or firmware. Directors within a cluster communicate with each other via redundant, private Fibre Channel links called LCOM links.
Each cluster includes a 1‐U Management Server with a public IP port for system management and administration over the customer’s network. The Management Server also has private, redundant IP network connections to each Director within the cluster.
VPLEX implementation fundamentally involves three tasks: presenting SAN volumes from back‐end arrays to VPLEX engines via each Director’s back‐end ports; packaging these into sets of VPLEX Virtual Volumes with the desired configurations and protection levels; and presenting Virtual Volumes to production hosts in the SAN via the VPLEX front‐end.
Currently a VPLEX system can support a maximum of two clusters. A dual‐cluster system is called a Metro‐Plex. For a dual‐cluster implementation, the two sites must be less than 100 km apart, with round‐trip latency of 5 msecs or less on the FC links. VPLEX clusters within a Metro‐Plex communicate via FC over the Directors’ FC‐MAN ports.
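As a back-of-the-envelope check (my own calculation, not part of the course material; it assumes a typical single-mode fiber refractive index of about 1.47), raw light propagation accounts for only about 1 ms of the 5 ms round-trip budget at 100 km; the remainder absorbs switching, protocol, and equipment latency:

```python
# Propagation-delay sanity check for the 100 km / 5 ms RTT requirement.
# Assumption: light travels through fiber at roughly c / 1.47.

SPEED_OF_LIGHT_KM_S = 299_792          # vacuum, km/s
FIBER_REFRACTIVE_INDEX = 1.47          # typical single-mode fiber (assumption)

def fiber_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over fiber, in milliseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX
    one_way_seconds = distance_km / speed_in_fiber
    return 2 * one_way_seconds * 1000

print(f"100 km fiber RTT: {fiber_rtt_ms(100):.2f} ms")  # ~0.98 ms
```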
VPLEX implements a VPN tunnel between the Management Servers of the two clusters. This enables each Management Server to communicate with Directors in either cluster via the private IP networks. With this design, it’s possible to conveniently manage a Metro‐Plex from either of the two sites.
VPLEX Engine: Characteristics

[Diagram: each of the engine's two directors contains a multi‐core CPU complex, global memory, and 8 Gb/s Fibre Channel host and array ports.]

• Dual HA Directors per engine
• GeoSynchrony software runs on each Director to provide VPLEX features and functionality
• 32 8 Gb/s Fibre Channel FE/BE ports for fabric connectivity to hosts and storage arrays
• Fibre Channel interconnect between Directors
• Intel multi‐core CPUs
• 64 GB (raw) of cache memory
• Redundant power supplies
• Integrated battery backup
• Built‐in "Call Home" support
The engine itself is designed with a highly available hardware architecture. It hosts two Directors with a total of 32 Fibre Channel ports: 16 FE and 16 BE. All major engine components are redundant.
The engine is built for performance with a large cache, and has fully redundant power supplies, battery backups and EMC Call Home capabilities to align with our support best practices.
Distributed Cache Coherency

[Diagram: one host issues a new write of block 3 to an engine while another host reads block 3 through a different engine. Each director holds a cache with its own cache directory (A through H), and each engine maintains a cache coherency directory mapping block addresses (1, 2, 3, …) to the owning cache (A, C, E or G).]
The VPLEX environment is dynamic and uses a hierarchy to keep track of where I/Os go.
An I/O request can come from anywhere and will be serviced by any available engine in the VPLEX cluster. VPLEX abstracts the ownership model into a high‐level directory that is updated for every I/O and shared across all engines. The directory uses a small amount of metadata to tell all other engines in the cluster which 4 KB block of data is owned by which engine, and at what time. The communication that actually occurs is much smaller than the 4 KB blocks being updated.
If a read request comes in, VPLEX automatically checks the directory for an owner. Once the owner is located, the read request goes directly to that engine.
Once a write is done and the table is modified, if another read request comes in from another engine, it checks the table and can then pull the read directly from that engine's cache. If it's still in cache, there is no need to go to the disk to satisfy the read. This model also enables VPLEX to stretch the cluster, as we can distribute this directory between clusters and therefore, between sites. The design has minimal overhead, is very efficient, and enables effective communication over distance.
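The read/write flow just described can be sketched as a toy model. This is a hypothetical illustration of directory-based ownership, not VPLEX's actual implementation: a write makes the writing engine the owner of a block, and a later read from any engine consults the shared directory and is serviced from the owner's cache without touching disk.

```python
# Toy model of a shared cache coherency directory (illustration only).

class CoherencyDirectory:
    def __init__(self):
        self.owner = {}    # block address -> owning engine id
        self.caches = {}   # engine id -> {block address: data}

    def write(self, engine, block, data):
        # A write records new ownership and caches the data on
        # the writing engine.
        self.owner[block] = engine
        self.caches.setdefault(engine, {})[block] = data

    def read(self, engine, block):
        # Any engine can service the read by looking up the owner;
        # if the block is still in the owner's cache, no disk access
        # is needed.
        owning_engine = self.owner.get(block)
        if owning_engine is None:
            return None  # no owner recorded: go to back-end storage
        return self.caches[owning_engine].get(block)

directory = CoherencyDirectory()
directory.write("engine-1", 3, b"new data")  # new write: block 3
print(directory.read("engine-2", 3))         # another engine reads block 3
```

The same directory can be distributed between clusters, which is what lets the model stretch across sites.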
Hardware Components: Engine

Directors:
• Front‐end ports provide active/active access to virtual volumes
• Process Fibre Channel SCSI commands from hosts

[Photos: VPLEX engine front and back, showing Director A and Director B.]
The two directors within a VPLEX engine are designated "A" and "B"; Director A is below Director B. Each director contains dual Intel quad‐core CPUs running at 2.4 GHz, 32 GB of read cache memory, and a total of 16 8‐Gbps FC ports: 8 front‐end and 8 back‐end. Both directors are active during cluster operations.
Hardware Components: I/O Modules

[Diagram: the engine's module slots, labeled Front‐End, Back‐End, COM and GigE.]
There are a total of 12 I/O modules in a VPLEX engine. 10 of these modules are Fibre Channel and 2 are GigE. The Fibre Channel ports can negotiate up to 8 Gbps. Four FC modules are dedicated for front‐end use and four for the back‐end. The two remaining FC modules are used for inter/intra cluster communication. The two GigE I/O modules are not utilized in this release of VPLEX.
Hardware Components: DAE

[Photos: the internal DAE behind its screen, the DAE with the screen removed, and an SSD drive carrier.]
VPLEX internal SSDs can be accessed from the front of a VPLEX system. Each director is assigned one SSD, and boots from it. SSDs reside within an SSD Drive Carrier behind the DAE screen. Each SSD Drive Carrier can hold two 2.5 inch SSDs. However, only one SSD is installed per drive carrier. Each SSD has a drive capacity of 30 GB.
Hardware Components: I/O Module Carrier

[Photo: an I/O module carrier.]
A VPLEX engine contains two I/O Module carriers, one for Director A and one for Director B. The one on the right is for Director A and the one on the left for Director B. There are two I/O modules per carrier. The one that is shown in this picture contains a Fibre Channel module and a GigE module. As we just discussed, the Fibre Channel module is used for inter‐ and intra‐cluster communication within a VPLEX system.
Hardware Components: I/O Module Types

FC IOM:
• 4‐port 8 Gbps Fibre Channel IOM
• Used for FC COM and FC WAN connectivity within an I/O Module carrier

[Photo: the FC IOM in an I/O module carrier, with ports numbered 0 through 3.]
This is the FC I/O Module from an I/O Module Carrier which is used for inter‐ and intra‐ cluster communication. In this module, Ports 0 and 1 are used for local COM. Ports 2 and 3 are used for WAN COM between clusters in a Metro‐Plex. In medium and large configurations, FC I/O COM ports run at 4 Gbps. In terms of physical hardware, this FC I/O module is identical to the I/O modules used for front‐end and back‐end connectivity in the director slots.
Hardware Components: Management and Power
• Allows for daisy chain connection between engines within a cluster
• USB port unused
Power Supplies
Management Modules
Each engine contains two management modules and two power supplies. Each management module contains two serial ports and two Ethernet ports. The upper of the two serial ports is open, and can be utilized by EMC field personnel for BIOS and POST access. The lower serial port ships pre‐cabled. It is used to monitor the SPS and UPS. The Ethernet ports are used to connect to the Management Server and also to other Directors within the cluster, in a daisy‐chain fashion.
Hardware Components: VPLEX Management Server
Central Point of Management
The VPLEX Management Server is the central point of management for a VPLEX Local and VPLEX Metro system. It ships with a dual‐core Xeon processor, a 250 GB SATA near‐line drive and 4 GB of memory. The Management Server interfaces between the customer network and the VPLEX cluster. It isolates the VPLEX internal management networks from the customer LAN. It communicates with VPLEX firmware layers within the directors over the private IP connections. A Management server ships with each VPLEX cluster.
Note that the loss of a Management Server does not impact host I/O to VPLEX‐provided virtual storage. Within a Metro‐Plex there are two Management Servers, one for each cluster. Both clusters can be controlled from either Management Server. A Metro‐Plex utilizes a secure management connection between the two Management Servers via a VPN connection. A VPLEX cluster can be controlled through the Management Console, which runs on the Management Server.
The Management Server also enables remote support via an ESRS Gateway. With this functionality in place, VPLEX is able to send Call Home events and system reports to the ESRS Gateway.
Hardware Components: Fibre Channel COM Switches
Connectrix DS‐300B: creates a redundant Fibre Channel network for COM
Connectrix DS‐300B switches are used for intra‐cluster communication in a VPLEX medium or large configuration. A pair of DS‐300B switches ship pre‐cabled, with medium or large configurations. These switches create redundant Fibre Channel networks for the internal LCOM connections. Each director has two independent LCOM paths to every other director. A VPLEX medium configuration uses 4 ports per switch and a VPLEX large configuration uses 8 ports per switch. 16 ports remain disabled, unused and unlicensed. Each port runs at 4 Gbps. The LCOM networks are completely private ‐ no customer connections are permitted on these switches. Each Connectrix DS‐300B utilizes an independent UPS.
VPLEX Local: Supported Configurations
[Diagram: rack layouts. Single Engine: Engine 1 with redundant SPSs and a Management Server. Dual Engine: Engines 1-2 with redundant SPSs, Management Server, FC Switches A/B, and UPS A/B. Quad Engine: Engines 1-4 with redundant SPSs, Management Server, FC Switches A/B, and UPS A/B]
All supported VPLEX configurations ship in a standard, single rack.
The shipped rack contains the selected number of engines, one Management Server, redundant Standby Power Supplies (SPSs) for each Engine and any other needed internal components. For the dual and quad configurations only, these include redundant internal FC switches for LCOM connection between the Directors. In addition, dual and quad configurations contain redundant Uninterruptible Power Supplies (UPSs) that service the FC switches and the Management Server.
The software is pre‐installed, the system is pre‐cabled, and also pre‐tested.
Engines are numbered 1‐4 from the bottom to the top. Any spare space in the shipped rack is to be preserved for potential engine upgrades in the future. The customer may not repurpose this space for unrelated uses. Since the engine number dictates its physical position in the rack, numbering will remain intact as engines get added during a cluster upgrade.
Configurations at a Glance
                                     Single Engine   Dual Engine   Quad Engine
Directors                            2               4             8
Redundant Engine SPSs                Yes             Yes           Yes
FE Fibre Channel ports               16              32            64
BE Fibre Channel ports               16              32            64
Management Servers                   1               1             1
Internal FC switches (for LCOM)      None            2             2
Uninterruptible Power Supplies (UPS) None            2             2
Cache                                64 GB           128 GB        256 GB
Start Small and Transparently Scale Out Engines
This table provides a quick comparison of the three different VPLEX single cluster configurations available at GA.
VPLEX Management: IP Infrastructure
[Diagram: a management client on the customer LAN connects via HTTPS or SSH to the Management Server, which reaches the Directors in the EMC VPLEX cluster over redundant internal IP networks]
Shown is a high‐level architectural view of single cluster management. The Management Server is the only VPLEX component that gets configured with a “public” IP on the customer network.
From the customer network, the Management Server can be accessed by a VPLEX storage administrator via an SSH session. Within the SSH session, the administrator can run a CLI utility, called VPlexcli, to manage all aspects of the cluster. A browser‐based GUI is also available.
VPLEX Management
VPLEX Management Console (GUI)
VPlexcli (CLI)
VPLEX provides two management interfaces: the VPlexcli and the VPLEX Management Console. The VPlexcli can be accessed via a telnet session to TCP port 49500 on the Management Server. The VPLEX Management Console is accessed by pointing a browser at the Management Server IP using the https protocol. Currently the VPLEX CLI is the more mature interface, providing complete support for all documented features and functionality. The Management Console has known limitations in some areas; for example, mobility operations can only be performed using the CLI.
Every time the VPlexcli is accessed, it creates a session log in the /var/log/VPlex/cli/ directory. Logging in through the Management Console also creates a session file in /var/log/VPlex/cli.

VPLEX Management Console:
• Accessed via an https session to the Management Server
• Intuitive, easy‐to‐use interface for simplified storage management
• Incorporates comprehensive online help
VPLEX Federation: Constructs
[Diagram: Storage Volumes are carved into Extents, which back Devices]
Let’s examine the various types of managed storage objects within EMC VPLEX, their inter‐relationships, and how they relate to entities external to VPLEX – such as customer hosts and customer storage arrays.
Back‐end storage arrays are configured to present LUNs to VPLEX back‐end ports.
Each presented back‐end LUN maps to one VPLEX Storage Volume. Storage Volumes are initially in the “unclaimed” state. Unclaimed storage volumes may not be used for any purpose within VPLEX other than to create meta‐volumes, which are for system internal use only.
Once a Storage Volume has been claimed within VPLEX, it may be carved into one or more contiguous Extents. A single Extent may map to an entire Storage Volume; however, it cannot span multiple Storage Volumes.
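The claim-then-carve rules above can be sketched as a small model. This is an illustrative sketch only, not VPLEX code; the class and names are hypothetical. It enforces that extents are created only on claimed storage volumes and never span past the end of a volume:

```python
# Hypothetical model of VPLEX extent-carving rules (illustration only).

class StorageVolume:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks       # total capacity in blocks
        self.claimed = False
        self.extents = []          # list of (start, length) tuples

    def claim(self):
        self.claimed = True

    def create_extent(self, start, length):
        # Extents are only allowed on claimed storage volumes,
        # and a single extent may not span beyond its storage volume.
        if not self.claimed:
            raise ValueError("storage volume must be claimed first")
        if start + length > self.blocks:
            raise ValueError("extent cannot span beyond its storage volume")
        self.extents.append((start, length))
        return (self.name, start, length)

sv = StorageVolume("storage_vol_1", blocks=1000)
sv.claim()
whole = sv.create_extent(0, 1000)   # one extent mapping the entire volume
```

A single extent may cover the whole volume, as in `whole` above, but an attempt to carve past the end of the volume, or to carve an unclaimed volume, is rejected.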
A VPLEX Device is the entity that enables RAID implementation across multiple storage arrays. VPLEX supports RAID‐0 for striping, RAID‐1 for mirroring, and RAID‐C for concatenation. The simplest possible device is a single RAID‐0 device comprising one extent, as shown here.
Shown next is a more complex device – for example a striped RAID‐0 device across two extents. Note that the underlying extents could even be from multiple backend storage arrays.
VPLEX Federation: Constructs

[Diagram: a Storage View containing registered host initiators, VPLEX front‐end ports, and a Virtual Volume mapped to a Top Level Device (TLD), which is layered on devices, extents, and storage volumes]
Devices may be layered on top of other devices. For example, we could create a RAID‐1 mirrored device with two dissimilar mirror legs, as shown in this example. Only devices at the top‐level may have a front‐end SCSI personality and be presented to hosts. These are called Top Level Devices.
“Storage View” is the masking construct that controls how virtual storage is exposed through the front‐end. An operational Storage View is configured with three sets of entities as shown next.
First, any hosts that the Storage View must present storage to should have one or more initiator ports (HBAs) in the Storage View. Host initiators should be registered with one of several specifically recognized and supported host personality types within VPLEX, such as “default” which corresponds to most open systems hosts: Windows and Linux, HP‐UX, and VCS. A high‐availability host should have a minimum of two registered initiator ports each within its Storage View.
Second, one or more VPLEX front‐end ports needs to be configured as part of the Storage View. A typical high‐availability configuration would use a minimum of one front‐end port per fabric, each of them servicing a separate host initiator.
Third, a Virtual Volume that maps to the appropriate Top Level Device needs to be created and then configured as part of the Storage View.
Once a Storage View is properly configured as described and operational, the host should be able to detect and use Virtual Volumes after initiating a bus‐scan on its HBAs. Every front‐end path to a Virtual Volume is an active path, and the current version of VPLEX presents volumes with the product ID “Invista”. The host requires supported multi‐pathing software in a typical high‐availability implementation.
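The three required ingredients of an operational Storage View can be captured in a short sketch. This is a hypothetical model, not VPLEX code; the class, WWPNs, and port names are illustrative only:

```python
# Hypothetical sketch of the Storage View masking construct: a view is
# operational only once it holds host initiators, VPLEX front-end ports,
# and at least one virtual volume.

class StorageView:
    def __init__(self, name):
        self.name = name
        self.initiators = set()    # registered host HBA ports (WWPNs)
        self.fe_ports = set()      # VPLEX front-end ports
        self.volumes = set()       # virtual volumes mapped to top-level devices

    def is_operational(self):
        # All three sets must be non-empty before hosts can see storage.
        return bool(self.initiators) and bool(self.fe_ports) and bool(self.volumes)

view = StorageView("host1_view")
view.initiators.update({"0x10000000c987422a", "0x10000000c987422b"})  # two HBAs for HA
view.fe_ports.update({"fe_port_fabric_a", "fe_port_fabric_b"})        # one FE port per fabric
view.volumes.add("virtual_vol_0001")
```

With any of the three sets empty, the view would not present storage; the high‐availability recommendation in the narration maps to two initiators and two front‐end ports, one per fabric.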
Module 3: VPLEX Functionality and Management
Upon successful completion of this module, you should be able to:
• Describe local federation capabilities within a VPLEX cluster
• Describe distributed federation capabilities in a Metro‐Plex
• Explain the VPLEX internal data flow operations for host‐to‐storage I/O under various scenarios
• Describe key VPLEX administration and maintenance features
This module provides a detailed look at the core VPLEX product functionality available at GA.
Provisioning: Using the VPLEX Management Console
Tasks
Provisioning Overview
Provision Storage
This is the home section of the EMC VPLEX Management Console. This is a good logical starting point for many VPLEX management operations.
On the right of the screen are the storage provisioning steps. These steps are also links that take the user to the page where each step is implemented.
On the left of the screen is a picture showing the task sequence for provisioning virtual volumes out of VPLEX. To the right of the Home button there are two more links, "Provision Storage" and "Help". The Provision Storage link takes the user to an alternative page from which provisioning can be implemented. The Help link takes the user to the VPLEX Online Help page.
Brown‐field Implementation: Encapsulation
Encapsulation: the process of converting existing production SAN volumes on hosts to VPLEX volumes, via "one‐for‐one" mapping

• EMC VPLEX maintains physical separation of metadata from host data
  - VPLEX metadata is stored separately on metadata volumes
  - Basis for simple data‐in‐place mobility
• High‐level steps:
  1. Present native array LUN with existing data to VPLEX back‐end
  2. Claim the LUN as a storage volume from VPLEX
  3. Create one extent consisting of the entire storage volume
  4. Create a RAID‐0 device on the extent
  5. Create a Virtual Volume on the device
  6. Un‐provision native LUN from host
  7. Present VPLEX Virtual Volume to host
• One‐time disruption to host
Encapsulation is essentially a "data‐in‐place" migration of existing production data into VPLEX, and therefore does not require any additional storage. Encapsulation is disruptive, since you cannot simultaneously present storage both through VPLEX and directly from the storage array without risking data corruption, due to read‐caching at the VPLEX level.
You have to cut‐over from direct array access to VPLEX virtualized access. This implies a period where all paths to storage are unavailable to the application. With proper planning and execution, this downtime can be minimized. When PowerPath Migration Enabler (PPME) support is put in place, it can help eliminate any disruption.
An alternative migration strategy for existing production hosts is to perform host based replication from native‐array volumes to net‐new VPLEX volumes. This is non‐disruptive but requires additional storage. Host‐based copy also consumes cycles on the host, and may need to be planned in a live production environment.
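The safety constraint behind the disruptive cutover can be expressed as a tiny ordering check. This is an illustrative sketch under the assumption that the steps are exactly those listed on the slide; it is not a VPLEX tool. The invariant: the host must never hold the native LUN path and the encapsulating VPLEX volume path at the same time:

```python
# Illustrative cutover-order check for encapsulation (not VPLEX code).

STEPS = [
    "present_lun_to_vplex_backend",
    "claim_storage_volume",
    "create_whole_volume_extent",
    "create_raid0_device",
    "create_virtual_volume",
    "unpresent_native_lun_from_host",
    "present_virtual_volume_to_host",
]

def host_paths(completed):
    """Which kinds of paths the host holds after the completed steps."""
    native = "unpresent_native_lun_from_host" not in completed
    vplex = "present_virtual_volume_to_host" in completed
    return native, vplex

def safe_order(steps):
    completed = []
    for step in steps:
        completed.append(step)
        native, vplex = host_paths(completed)
        if native and vplex:
            return False   # simultaneous access risks corruption
    return True
```

Running the listed sequence passes the check; swapping the last two steps (presenting the VPLEX volume before removing native access) fails it, which is exactly why the narration recommends removing host access to the original SAN volumes first.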
[Diagram: host and VPLEX connected across Fabric A and Fabric B]

Array Storage Volumes found:
  VPD83T3:600601606bb02500aab2affa35b5de11
  VPD83T3:600601606bb025006a17a18d5bfade11
  VPD83T3:600601606bb02500ba7b6b1c49fade11
Host Initiator Ports detected:
  UNREGISTERED‐0x10000000c987422a
  UNREGISTERED‐0x10000000c987422b
Virtual Volumes detected
Encapsulation: Migrating a Host to VPLEX
This example illustrates the process of cutting over from native SAN volumes to VPLEX volumes via encapsulation. Observe the system state transitions as you step through this task sequence.
The basic idea is to logically integrate VPLEX into your production fabrics between your hosts and storage arrays.
To do this, the back‐end ports of VPLEX are first connected to the production fabrics.
Via suitable zoning and LUN masking, VPLEX back‐end ports, which are technically initiators, detect the back‐end storage arrays and volumes. Native array volumes or LUNs are then claimed by VPLEX, allowing your storage administrator to layer VPLEX virtual volumes on them for presentation to hosts.
Front‐end configuration is the next logical step. VPLEX front‐end ports are connected to the fabrics, and the zoning configuration modified to allow hosts to detect these ports as targets.
Once this is done, VPLEX can detect the host initiators (HBAs) which should then be registered with the appropriate host personality.
At this point, by creating a suitable storage view within VPLEX, it becomes possible to present VPLEX volumes to the host initiators. Note that in this process, the original SAN volumes from the array are now repackaged as VPLEX volumes and presented via new FC targets, (i.e. the VPLEX FE ports). The recommendation is to remove host access to the original SAN volumes, before presenting the encapsulating VPLEX volumes.
Storage Provisioning: Devices
• RAID‐1 – Mirrored VPLEX Device
  - Use arrays from the same tier
  - Ideal for nesting other devices
• RAID‐0 – Striped VPLEX Device
  - Ideal for encapsulated devices
  - Consider stripe depth
  - Avoid striping striped storage volumes
• RAID‐C – Concatenated VPLEX Device
  - Most flexible to grow

[Diagram: devices built from extents, and devices nested on other devices]
The VPLEX “device” construct forms the basis of core RAID capabilities supplied by VPLEX. The key value‐add is that VPLEX can enable RAID functionality across storage arrays.
A RAID‐1 VPLEX Device mirrors data to two extents or devices.
A RAID‐0 VPLEX Device stripes data across multiple extents or devices. The simplest possible device is a RAID‐0 device that uses one extent; this is typically what you would configure during encapsulation.
A RAID‐C VPLEX Device concatenates multiple extents or devices.
Viewing these as building blocks allows you to consider an organized system of device “nesting” to meet your customer’s specific needs.
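The "building blocks" view of nesting lends itself to a small capacity calculation. This is an illustrative model, not VPLEX code, and it simplifies real constraints (for example, RAID‐0 striping typically wants equal‐size members): RAID‐0 and RAID‐C aggregate their children's capacity, while RAID‐1 keeps only one usable copy, bounded by its smaller leg:

```python
# Sketch of nested VPLEX device capacity math (illustration only).

def capacity(device):
    """device is either an int (extent size in GB) or (raid_type, [children])."""
    if isinstance(device, int):
        return device
    raid, children = device
    sizes = [capacity(c) for c in children]
    if raid in ("raid-0", "raid-c"):
        return sum(sizes)          # stripe/concat: children's capacities add up
    if raid == "raid-1":
        return min(sizes)          # mirror: usable space bounded by smaller leg
    raise ValueError("unknown raid type: " + raid)

# A RAID-1 mirror whose legs are a striped pair and a concatenated pair:
nested = ("raid-1", [
    ("raid-0", [100, 100]),        # 200 GB striped leg
    ("raid-c", [150, 50]),         # 200 GB concatenated leg
])
```

Here `capacity(nested)` evaluates recursively, which mirrors how VPLEX lets devices be layered on devices: each level is just another building block.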
Provisioning: Multi‐pathing with EMC PowerPath
By default, EMC VPLEX volumes appear with vendor ID "EMC" and product ID "Invista". Thus, any version of PowerPath that can manage Invista volumes can also recognize and manage EMC VPLEX volumes. This example shows a Virtual Volume on a front‐end Linux host, as reported by PowerPath. Note that the default load‐balancing policy with PowerPath for a VPLEX volume is ADaptive. Other multi‐pathing options, including native OS multi‐pathing, are discussed later, in the Planning and Design module.
Extent Mobility
• Mobility of block data across extents, non‐disruptive to the host
• Extent mobility can only be performed within a cluster
• Original extent is freed up for reuse
• Fundamental use: non‐disruptive data mobility across heterogeneous storage arrays
[Diagram: host I/O to a Virtual Volume continues while its device's extent mapping moves from one storage volume to another]
VPLEX Local supports mobility of Extents – potentially across storage array frames – that is completely transparent to any layered virtual volume that is actively servicing I/O requests from a host.
As this example shows, the device‐to‐extent mapping changes at the end of a committed Mobility operation. However, the host to which the volume is provisioned is not even aware of this change.
Note that extent mobility requires that both the source extent and the target extent belong to the same VPLEX cluster.
Device Mobility

[Diagram: a Virtual Volume's underlying device is replaced by a new device on different extents and storage volumes, transparent to the host]
Another Mobility option with VPLEX Local is mobility at the device level. This could be used for example to move data across disparate storage arrays, or even to change the RAID level of a device without disruption.
Device mobility is supported across clusters as well, in a Metro‐Plex environment.
Mobility: Typical Task Sequence
1. dm migration start -n <name> -f <source extent/device> -t <target extent/device>
2. dm migration commit -m <name> --force
3. dm migration clean -m <name> --force
4. dm migration remove -m <name> --force

[Diagram: a temporary RAID‐1 copies the data blocks from the source extent/device to the target extent/device]
There are four basic operations involved in moving extents or devices. They are: start, commit, clean, and remove. Data mobility is accomplished by using RAID‐1 operations.
The start operation first creates a RAID‐1 device on top of the source device, specifying the source as one leg and the target device or extent as the other. It then copies the source's data to the target. This operation can be canceled as long as it has not been committed.
The commit operation removes the pointer to the source leg. Committing the operation immediately is not best practice.
At this point the target device is the only device accessible through the Virtual Volume.
The clean operation breaks the source device down all the way to the storage‐volume level. This operation is optional; note that the data on the source device is not deleted.
The remove operation removes the record from the mobility operation list. Data mobility operations can also be paused and resumed. These commands may be used in conjunction with the VPLEX scheduler to mitigate or eliminate disruption to production I/O.
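The start/commit/clean/remove lifecycle above can be sketched as a toy state machine. This is an illustration only, not VPlexcli behavior; the class and state names are hypothetical, but the one rule the narration states (no cancel after commit) is enforced:

```python
# Toy state machine for the dm migration lifecycle (illustration only).

class Migration:
    def __init__(self, name, source, target):
        self.name, self.source, self.target = name, source, target
        self.state = "created"

    def start(self):
        # Builds a temporary RAID-1 over source and target and begins the copy.
        self.state = "started"

    def cancel(self):
        # A migration can be canceled only before it is committed.
        if self.state == "committed":
            raise RuntimeError("cannot cancel after commit")
        self.state = "cancelled"

    def commit(self):
        # Drops the source leg; the target now backs the virtual volume.
        self.state = "committed"

    def clean(self):
        # Optional: tears the source back down to the storage-volume level.
        self.state = "cleaned"

    def remove(self):
        # Removes the record from the mobility operation list.
        self.state = "removed"

m = Migration("mig1", "extent_src", "extent_tgt")
m.start(); m.commit(); m.clean(); m.remove()
```

Running the four operations in order ends with the record removed; trying to cancel a committed migration raises an error, matching the "can be canceled as long as it is not committed" rule.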
Batched Mobility
• Enables scripting of extent and device mobility
• A batch can process either extents or devices, but not a mix of both
Task sequence for batched mobility:
1. Create migration plan: batch-migrate create-plan plan.txt -f <source> -t <destination>
2. Check plan for errors: batch-migrate check-plan plan.txt
3. Start migration, copy data to targets: batch-migrate start plan.txt
4. Commit migration: batch-migrate commit plan.txt
5. Clean up migration: batch-migrate clean --file plan.txt
6. Remove migration record: batch-migrate remove
Batched mobility provides the ability to script large‐scale migrations without having to specify individual extent‐by‐extent or device‐by‐device migration jobs.
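A plan check like the `batch-migrate check-plan` step can be sketched as follows. The real validation logic is not public, so this is a hypothetical illustration that only enforces the one constraint the slide states (a plan may hold extents or devices, but not a mix) plus an obvious sanity rule on targets:

```python
# Hypothetical sketch of a batch-migration plan check (not VPlexcli code).

def check_plan(jobs):
    """jobs: list of (kind, source, target); kind is 'extent' or 'device'.
    Returns a list of error strings; an empty list means the plan is valid."""
    kinds = {kind for kind, _, _ in jobs}
    if len(kinds) > 1:
        return ["plan mixes extents and devices"]
    errors = []
    targets = set()
    for _kind, _src, tgt in jobs:
        # Each target should receive data from exactly one source.
        if tgt in targets:
            errors.append("target " + tgt + " used twice")
        targets.add(tgt)
    return errors

plan = [("extent", "e1", "e10"), ("extent", "e2", "e11")]
```

A clean plan returns no errors; mixing an extent job and a device job in one plan, or reusing a target, is flagged before any data moves, which is the point of checking the plan before `start`.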
AccessAnywhere with VPLEX Metro

[Diagram: left, a Distributed Device presents one Virtual Volume with mirror legs on storage arrays at Cluster‐1/Site A and Cluster‐2/Site B, separated by synchronous distance; right, a Remote Device at one cluster is presented to hosts at the other cluster]
AccessAnywhere provides a logical device with full read/write access to multiple hosts at multiple locations – with the current release, separated by synchronous distance up to 100 km.
A key enabling VPLEX Metro technology for AccessAnywhere is distributed mirroring. It enables you to configure a RAID‐1 mirrored device with two legs, one on each cluster. Hosts at either site may issue I/O to this shared volume concurrently. Distributed coherent shared cache preserves data integrity of this volume.
This mirrored device has the same volume identity at both clusters, while being presented via distinct FC targets (i.e., VPLEX FE ports at each cluster).
Another enabling VPLEX Metro technology for AccessAnywhere is remote access.
This allows a device configured at one site to be presented to initiators at the other site for full read/write access. For remote exports, VPLEX's use of sequential read‐detection logic within the caching layer can significantly improve performance. Feasible configurations therefore include hosts with no SAN storage within their local site.
Distributed Device: I/O Operation

[Diagram: distributed-device write flow across the FC MAN between VPLEX Cluster‐1/Site A and VPLEX Cluster‐2/Site B]
1. Host in Cluster‐2/Site B writes data to the shared volume.
2. Data is written through cache to back‐end storage at both sites.
3. Data is acknowledged by the back‐end arrays.
4. The write is acknowledged to the host once written to disk.
Let’s examine the mechanics of I/O access of each of these enabling technologies in greater detail.
With a distributed device, when a host issues a write to the device, the data is placed in the cache of the ingress Director and then written through to the storage arrays at both sites. Only after both storage arrays have acknowledged write completion does the host receive the "write‐complete" acknowledgement from VPLEX.
This design completely eliminates the risk of losing host data in the event of VPLEX component failures.
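The write-through rule can be reduced to one line of logic. This is a minimal sketch of the acknowledgement semantics described above, not VPLEX internals; real failure handling (detach rules, mirror rebuilds) is deliberately omitted:

```python
# Minimal sketch of write-through acknowledgement for a distributed device:
# the host 'write-complete' is returned only after BOTH back-end arrays
# have acknowledged the write. (Illustration only, not VPLEX code.)

def distributed_write(data, site_a_array, site_b_array):
    """Each array is modeled as a callable returning True once data is on disk."""
    ack_a = site_a_array(data)      # write-through to Site A storage
    ack_b = site_b_array(data)      # write-through to Site B storage
    return ack_a and ack_b          # host ack requires both array acks

ok = distributed_write(b"10110", lambda d: True, lambda d: True)
not_acked = distributed_write(b"10110", lambda d: True, lambda d: False)
```

Because the host ack depends on both legs, there is no window in which a host believes data is durable while only one site holds it, which is the property the narration highlights.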
Remote Device: I/O Operation

[Diagram: remote-device I/O across the FC MAN between VPLEX Cluster‐1/Site A and VPLEX Cluster‐2/Site B]
• Host in Cluster‐2/Site B writes data to the volume; the write is acknowledged to the host once written to disk.
• Host in Cluster‐1/Site A reads data from the volume.
• Host in Cluster‐1/Site A writes data to the volume; the write is acknowledged to the host once written to disk.
With remote access:
Writes from hosts on the same cluster as the exported device work the same as writes to any local device – then written to the back‐end array, before the acknowledgement is sent to the host.
Reads from remote hosts can effectively exploit local cache, remote cache and sequential read‐ahead for near‐local performance.
For a write from a remote host, the new data is cached at the remote site. Existing data in the local cache is invalidated with an RPC message; the new data is then sent to the local site and written to the back‐end storage.
Distributed Device: Handling “Split‐brain”
Consider a distributed system with two sites:
From Site A’s perspective the following two conditions are indistinguishable:
Addressing this is fundamental to the design of distributed applications. With a Metro‐Plex distributed device, it is handled with a configurable "detach rule".

[Diagram: Partition Failure (Site A and Site B both up, FC‐MAN link down) versus Site Failure (Site B down entirely)]
Let’s examine the logistics of failure handling in a Metro‐Plex environment.
There are two types of failures in a Metro‐Plex, partition failures and site failures. Partition failures typically occur more often than site failures. However, from one site’s point of view, both partition failures and site failures are handled the same way. Metro‐Plex handles both types of failures using detach rules, as we’ll see next.
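The detach-rule behavior described above can be sketched in a few lines. This is an assumed model built from the narration, not VPLEX code: the cluster named by the rule keeps read/write access if it survives, the other suspends I/O, and from the survivor's perspective a partition and a remote site failure look the same:

```python
# Sketch of detach-rule semantics (assumed behavior per the narration).

def io_state(cluster, detach_rule, surviving_clusters):
    """Return this cluster's I/O state for a distributed device after a failure.
    detach_rule names the winning cluster, e.g. "cluster-1" for the
    "cluster-1 detaches" rule set."""
    if cluster not in surviving_clusters:
        return "down"
    return "read-write" if cluster == detach_rule else "suspended"

# Partition: both clusters survive but cannot see each other.
partition = [io_state(c, "cluster-1", {"cluster-1", "cluster-2"})
             for c in ("cluster-1", "cluster-2")]

# Site failure: cluster-2 is gone entirely.
site_fail = [io_state(c, "cluster-1", {"cluster-1"})
             for c in ("cluster-1", "cluster-2")]
```

In both scenarios cluster-1 (the biased site under this rule) continues serving I/O, which is why a single detach rule covers partition and site failures alike.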
Distributed Device: Configuring Detach Rule
Can specify a pre‐defined rule set or customized rule set
Failure handling behavior is configured by tying a specific “detach rule” to each distributed device. In the example shown, the ruleset “cluster‐1 detaches” implies that upon failure, if cluster‐1 survives then it will continue to provide read/write access to the volume, while cluster‐2 will suspend I/O activity to this device at the other site. The detach rule can be changed by selecting the distributed device’s supporting device and then selecting a different cluster to detach from. Detach rules may be customized to meet specific needs.
Distributed Devices: Supported Detach Options
Detach options currently supported with VPLEX distributed devices in a Metro‐Plex:
• Biased‐site detach
• Non‐biased‐site detach
• Manual detach
  - Use with an automated script on production host(s) to activate read/write access from either site after a failure event
There are three major categories or approaches to detach rules.
Biased‐site detach and non‐biased‐site detach are both simple to implement, with pre‐defined rule sets in place. Either may adequately address the customer's needs, for example when one site can clearly be viewed as the production site and the other as secondary, within the context of a given DR1.
To enable complete control of the VPLEX DR1 environment from a stretched host cluster, the use of “manual detach” with scripting is recommended.
Monitoring: VPLEX Performance

• Creating monitors: monitor create --name <name> --period <time> --director <Director_Name> --stats <stat>
• Listing monitors
• Destroying monitors: monitor destroy <monitor>
Performance data can be collected on the VPLEX system by creating monitors and sinks. Monitors collect performance statistics on various VPLEX components. These monitors are created within the VPlexcli using the monitor command. By default, monitors collect statistics every 30 seconds. This collection time can be modified if desired.
Once a monitor is created, it can be found in the /monitoring directory. Monitors only start collecting data when they have at least one associated “sink”, as we’ll see next. Monitors can be destroyed using the monitor destroy command.
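The monitor/sink relationship can be modeled in a short sketch. This is a toy illustration, not VPlexcli code; it captures the two behaviors the narration describes: the default 30‑second period, and the rule that a monitor records nothing until at least one sink is attached:

```python
# Toy model of monitors and sinks (illustration only, not VPlexcli).

class Monitor:
    def __init__(self, name, period=30):
        self.name = name
        self.period = period       # seconds between collections (default 30)
        self.sinks = []            # console, file, or SNMP sinks

    def add_file_sink(self, path):
        self.sinks.append(("file", path))

    def collect(self, stats):
        """Deliver one sample to each sink; returns number of sinks written."""
        if not self.sinks:
            return 0               # no sink attached: nothing is recorded
        for _kind, _dest in self.sinks:
            pass                   # a real file sink would append a CSV row here
        return len(self.sinks)

mon = Monitor("director-1-fe-stats")
before = mon.collect({"fe-ops": 120})                 # no sink yet: 0 writes
mon.add_file_sink("/var/log/VPlex/cli/fe-stats.csv")  # hypothetical path
after = mon.collect({"fe-ops": 120})                  # one sink: 1 write
```

The sink path above is illustrative only; the point is simply that `collect` is a no-op until a sink exists, mirroring the statement that monitors start collecting data only once they have at least one associated sink.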
Monitoring: VPLEX Performance (Cont’d)Monitoring: VPLEX Performance (Cont’d)
• Listing statistics available for monitoringmonitor stat-list
• Monitor collect Updates a performance monitor immediately
Ad‐hoc manual collect of data
• Supported monitor “sink” types: console, file, SNMP
• Adding sinks for monitors
monitor add-file-sink -n <name> -f <file_location> -m <monitor_to_add>
• Removing a sink
monitor remove-sink <sink>
To activate and view the statistics collected by a monitor, at least one sink must be created. Sinks receive the output from monitors. Three types of sinks can be defined: console, file, and SNMP; however, SNMP sinks are not supported at this time. File sink output is written as comma‐separated values, so a .csv file name extension is a convenient choice, and the file can then be loaded into a program such as MS Excel for easier viewing. Console sinks have limited use because their output interferes with console typing.
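The monitor commands above can be combined into a short VPlexcli session. The monitor name, sink name, and output path below are illustrative placeholders of my own; the director name and statistic keep the source's placeholder syntax, and the exact argument forms should be verified against the VPlexcli reference.

```
# Create a monitor (statistics are sampled every 30 seconds by default)
monitor create --name dirA-mon --period <time> --director <Director_Name> --stats <stat>

# Attach a file sink; the monitor only starts collecting once it has a sink.
# Comma-separated output makes a .csv extension a convenient choice.
monitor add-file-sink -n dirA-sink -f /var/log/dirA-mon.csv -m dirA-mon

# Ad-hoc, immediate collection
monitor collect

# Tear down: remove the sink, then destroy the monitor
monitor remove-sink dirA-sink
monitor destroy dirA-mon
```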
Monitoring: Event Handling and Report Generation
ESRS Gateway
Management Server
ConnectEMC
Call Home Listener
SYR
EMA_Adaptor
VPlexcli
Engine
TCP ports 22, 9010, 443 and 5901
Shown is the high level architecture of event handling and messaging flow from the Engine to the Management server, to a properly configured ESRS gateway.
VPlexcli, which runs on the Management Server, pulls events every second from a process on a Director. The Call Home Listener on the Management Server examines the events and determines which events should initiate a call home. It then places those events into the /opt/emc/VPlex/Event_Msg_Folder directory as .txt files.
The EMA_adaptor’s job is to take the text files from the Event_Msg_Folder directory and create the required XML files using the EMA API. The EMA_adaptor then places those files into the /opt/emc/connectemc/poll directory. The ConnectEMC process picks up the XML event files and sends them to the ESRS Gateway. If the events are successfully sent to the gateway, they are also copied into the /opt/emc/connectemc/archive directory. If transmission fails for some reason, the corresponding events are placed into the /opt/emc/connectemc/failed directory.
TCP ports 22, 9010, 443, and 5901 must be open between the Management Server and the ESRS Gateway. The ESRS Gateway classifies incoming events as belonging to this VPLEX instance via the “Top‐Level Assembly” field within each event. The “Top‐Level‐Assembly” is a cluster‐unique identifier that is preset at the factory on all engines of a VPLEX cluster.
Generating System Reports: SYR
SYR generates a complete report of the VPLEX System
Task                                                       Command
Manually run SYR                                           syrcollect
List SYR                                                   scheduleSYR list
Configure SYR (sends a weekly report to the ESRS Gateway)  scheduleSYR add -d <day> -t <hour> -m <minute>
SYR is a process that collects VPLEX system reports to send to the ESRS gateway. SYR reports use the same directories as ESRS events. SYR can be run manually using the syrcollect command; or it can be run at a scheduled time using the scheduleSYR command. SYR reports are sent to the ESRS Gateway by the ConnectEMC process. Once SYR has been scheduled, it will run weekly at the scheduled time.
Collecting VPLEX Log Files
collect-diagnostics
• Collects logs, cores, and configuration information from the Management Server and the directors
• Places a tar.gz file in /diag/collect-diagnostics-out
The collect-diagnostics command can be used when troubleshooting VPLEX issues. This command produces a tar.gz file containing logs, cores, and configuration information about the Management Server and directors within a VPLEX system. This file is very large and should be moved off the system once it has been generated.
Scheduling: “cron”‐style
schedule
Manage and control the timing of specific tasks
The VPlexcli schedule command may be used to run commands in batch mode at an arbitrary time, or periodically on a schedule. This can be particularly useful for offloading certain types of activity, for example mobility, to off‐production hours.
Maintenance: Non‐disruptive Code Upgrade (NDU)
• NDU process for VPLEX: code upgrades with no disruption to production hosts performing I/O to VPLEX virtual volumes
• Requires best practices to be followed for host connectivity, and supported multi‐pathing software
• Uses a notion of “first upgraders” and “second upgraders”
First: Director A of every engine is upgraded, then rebooted
Second: Director B of every engine is upgraded, then rebooted
VPLEX Metro upgrade: both clusters are upgraded with a single “ndu” operation issued on one Management Server
These are the steps to perform an NDU. I/O continues while one side of an engine is being upgraded. The time to complete an NDU should be roughly the same regardless of the number of engines in the system, because an NDU upgrades all A directors at once, and then all B directors at once.
First upgraders: every engine’s A directors are upgraded
A directors’ firmware is shut down during the upgrade
I/O is automatically redirected to B directors
Once upgraded, A directors reboot
A directors begin serving I/O again
Second upgraders: every engine’s B directors are upgraded
B directors’ firmware is shut down during the upgrade
I/O is automatically redirected to A directors
Once upgraded, B directors reboot
B directors begin serving I/O again
Module 4: Planning and Design Considerations
Upon successful completion of this module, you should be able to:
• Perform planning and design for VPLEX deployment
• State and explain the rationale for recommended best practices with VPLEX implementations
This module covers key planning and design considerations relevant to VPLEX solutions.
FE
FE
BE
BE
VPLEX Physical Connectivity: SAN Best Practices
Volume 2
Volume 1
Hosts
Arrays
• Deploy mirrored fabrics
• Connect every host and every storage array to both fabrics
• For each VPLEX Director, distribute front‐end ports over both fabrics
• For each VPLEX Director, distribute back‐end ports over both fabrics
• For each FE module and BE module, distribute ports over both fabrics
Fabric B
Fabric A
When deploying the VPLEX cluster, the general rule is to use a configuration that provides the best combination of simplicity and redundancy. In many instances connectivity can be configured to varying degrees of redundancy. However, there are some minimal requirements that should be met.
Deploy mirrored fabrics: this is standard EMC practice. In addition, it is preferable to isolate the front‐end fabrics from the back‐end fabrics. This would ensure clean separation of hosts from storage arrays. This is appropriate in environments where all encapsulation of existing production data is complete, and any future provisioning to hosts will be exclusively from VPLEX.
Connect every host and every storage array to both fabrics.
Each Director should be assigned ports on both fabrics; otherwise, a fabric failure could reduce the paths and computing power of the VPLEX, doubling the workload for the surviving Directors. Distribute FE ports of each director over both fabrics.
Distribute BE ports of each director over both fabrics.
The above two rules ensure that a complete outage on one fabric does not render a Director completely non‐operational on either the front end or the back end.
Thus the processing power of the VPLEX system is not compromised by a fabric outage.
Distribute the four ports of each I/O module over both fabrics.
Again this minimizes loss of VPLEX efficiency and processing power in the event of complete failure on one fabric.
• Each director must be provided access to every BE volume in the cluster
• Active/Active array: For each director, provide at least one BE path to each volume via each fabric
• Active /Passive array: For each director, provide BE paths via both controllers to each LUN via each fabric
• VPLEX BE port “initiator personality”: open systems host; use failovermode=1 with CLARiiON arrays
VPLEX Logical Connectivity: Back‐end
VMAX
Volume
CX4‐960
LUN
A0
A1
B0
B1
Fabric B
Fabric A
It is a requirement that each Director have at least one viable, active path to every Storage Volume in a VPLEX cluster.
This means that, to be usable, a Storage Volume must be presented to every Director in the same cluster.
For active/passive storage arrays, make sure that a given Director has both active and passive paths to the storage volume.
VPLEX Logical Connectivity: Front‐end
Engine 2
Engine 1
Engine 2
Director B
Engine 1
Director A
Director A
Director B
• Single Engine configuration: For each host, configure FE paths to both Director A and Director B
• Dual Engine and Quad Engine configuration: For each host, configure FE paths to A and B of separate engines
Fabric B
Fabric A
Volume 2
Volume 1
Hosts
Arrays
Front‐end hosts should be configured with paths to VPLEX front‐end ports, which serve as virtualization targets, via separate fabrics. In a single engine system, configure at least one front‐end path to each director. This enables the host to maintain I/O access to VPLEX volumes during an NDU code upgrade.
With dual engine or quad engine systems, additional resiliency can be obtained by using “A” and “B” directors on distinct engines. This ensures that the host does not lose I/O access to volumes even during a planned or unplanned shutdown of one engine.
SAN Volume Requirements: VPLEX Meta Volume
• One active VPLEX meta volume per cluster
• Used internally for storing meta data
• Failure impact: does not affect production I/O to existing VPLEX volumes
Meta Volume Best Practices:
• Required capacity: 78 GB or larger
• Recommended: run VPLEX meta volume backup periodically
• General requirements for SAN volumes to be used for metas: Highest possible availability
Not demanding of performance:
Low write I/O ‐ only during configuration changes
High read I/O – only during Director boot and NDU
Listed are the requirements and best practices for VPLEX Meta Volumes. I/O throughput capability is not a serious consideration for a meta volume, since it is updated only during configuration changes. Availability is the overriding concern here. It is critical to mirror the Meta Volume onto two different arrays. An additional recommendation is to create meta volumes on two arrays with different refresh timelines, thus avoiding having to migrate the data off both arrays at once. It is important to make periodic backups of the Meta Volume, especially after VPLEX configuration changes or upgrades; this prevents the system from ever losing access to newly created VPLEX objects.
SAN Volume Requirements: VPLEX Logging Volume
• Required only in Metro‐Plex: at least one logging volume per cluster
• Used internally to track changes between legs of distributed RAID‐1 devices during loss of connectivity between clusters
• Required capacity: 1 bit for every 4‐Kbyte page of distributed device
One 10 GB logging volume can support 320 TB of distributed devices
• General requirements for SAN volumes to be used for logging:
Very high performance requirement
No I/O activity on logging volumes under normal conditions
High random, small‐block write I/O rate during loss of connectivity
High small‐block read I/O rate during incremental re‐synchronization
Highest possible availability
Use striped and mirrored volumes to meet these requirements
Listed are the requirements and best practices for VPLEX logging volumes.
A pre‐requisite for creating a distributed device, or a remote device, is that you must have a logging volume at each cluster. Single‐cluster systems and systems that do not have distributed devices do not require logging volumes. Logging volumes keep track of changed blocks during an inter‐cluster link failure. After a link is restored, the system uses the information in logging volumes to synchronize the distributed devices by sending only changed block regions across the link.
The logging volume must be large enough to contain one bit for every page of distributed storage space. So for example, you only need about 10 GB of logging volume space for 320 TB of distributed devices in a Metro‐Plex. The logging volume receives a large amount of I/O during and after link outages. So it must be able to handle I/O quickly and efficiently.
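The sizing rule above (one bit of logging space per 4-Kbyte page of distributed capacity) can be checked with a small Python sketch; the helper name is mine, not from the course.

```python
# Logging-volume sizing: 1 bit per 4 KB page of distributed device capacity.
def logging_volume_bytes(distributed_bytes, page_size=4096):
    """Minimum logging capacity: one bit per page, eight bits per byte."""
    pages = distributed_bytes // page_size
    return pages // 8

TB = 2 ** 40
GB = 2 ** 30

# 320 TB of distributed devices requires a 10 GB logging volume,
# matching the figure quoted on this slide.
print(logging_volume_bytes(320 * TB) // GB)  # → 10
```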
Storage Views: Best Practices
• Each storage view should have:
At least two registered initiators (HBA ports) from each host
Recommended: HBAs distributed over redundant fabrics
At least two VPLEX FE ports: one from an A director, one from a B director
Recommended: ports from different engines when possible, and distributed over redundant fabrics
• Create one storage view for all the hosts that need access to the same storage
Storage View
V Vol
Host Initiator
Host Initiator
FE Port
FE Port
When creating storage views, follow these best practices: Create one storage view for all hosts that need access to the same storage, and then add all required volumes to the view.
Redundancy requirements are based on standard EMC guidelines for SAN configuration. Each host should have at least two registered initiators in the view. Access to the volumes should be enabled via at least two VPLEX front‐end ports in the view. When selecting the front‐end ports for a storage view, make sure to follow the previously‐discussed best practices – use ports from at least one A director and one B director, and whenever possible, from directors in separate engines.
Partition Alignment
• VPLEX page size = 4K
• VMAX track size = 32K
• Minimum recommended alignment = 64K
• Can’t go wrong with 1M
When creating VPLEX virtual volumes, pay attention to partition alignment in order to avoid host‐to‐storage performance problems in production.
Follow these best practices for partition alignment:
Best practices that apply to directly‐accessed storage volumes also apply to virtual volumes
I/O operations to a storage device that cross page, track, or cylinder boundaries must be minimized; these lead to multiple read or write operations to satisfy a single I/O request
Misaligned partitions can consume additional resources in VPLEX and the underlying storage array(s), leading to less than optimal performance
Align partitions for any x86‐based OS platform
Align partitions on 32 KB boundaries
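A quick way to reason about the boundaries above is a modulo check; the helper and the legacy sector-63 example below are illustrative, using the figures from this slide (4K page, 32K track, 64K minimum recommended alignment).

```python
# Partition-alignment check against the boundaries quoted on this slide.
KB = 1024

def is_aligned(offset_bytes, boundary_bytes=64 * KB):
    """True if a partition's starting offset falls on the given boundary."""
    return offset_bytes % boundary_bytes == 0

# Legacy x86 partitions often started at sector 63 (63 * 512 = 32256 bytes),
# which misses the 4K page, 32K track, and 64K boundaries alike.
print(is_aligned(63 * 512))   # → False

# A 1 MB offset is a multiple of 4K, 32K, and 64K,
# hence the "can't go wrong with 1M" advice.
print(is_aligned(1024 * KB))  # → True
```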
VPLEX Encapsulation: Best Practices
• “Data‐in‐place” migration: minimizes downtime
• Best Practices: Claim storage volumes using the application consistent flag
Prevents reconfigurations other than “one‐for‐one” (single extent spanning entire SAN volume)
Ensures that production data does not become unavailable or corrupted
Migrate into VPLEX in phases
Divide migrations by hosts or initiator groups
• Limitation: Capacity of encapsulation target must be an integral multiple of 4 Kbytes
Avoid concurrent I/O activity from host to the native array volume, and to the VPLEX encapsulated volume
Here are some of the best practices and requirements for encapsulation.
A storage volume to be encapsulated must have a capacity that is an integral multiple of 4 Kbytes. Otherwise, encapsulation will render it inaccessible to the host.
During encapsulation hosts should be allowed to perform I/O to virtual volumes or storage volumes, but not both at the same time – that can cause data corruption.
Migrations should be performed on an initiator group basis. This way any necessary driver updates can be conveniently handled on a host by host basis.
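The 4-Kbyte capacity limitation above lends itself to a trivial pre-check; the helper name is mine, shown only to make the rule concrete.

```python
# Encapsulation pre-check: the target's capacity must be an integral multiple
# of 4 Kbytes, or the encapsulated volume will be inaccessible to the host.
def can_encapsulate(capacity_bytes):
    return capacity_bytes % 4096 == 0

GB = 2 ** 30
print(can_encapsulate(10 * GB))        # → True: 10 GB is 4K-aligned
print(can_encapsulate(10 * GB + 512))  # → False: trailing 512-byte remainder
```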
VPN and MAN‐COM: Best Practices
• Metro‐Plex requirement: distance <= 100 km; FC‐MAN round trip latency < 5 milliseconds
• Supported distance extension technologies: FC over dark fibre; DWDM
• Best Practice:
Two physical MAN links with similar characteristics, such as latency
Configure long‐distance links between VPLEX clusters using ISLs
Redundant MAN fabrics; one connection to each MAN fabric from every VPLEX Director
VPLEX Management Server
Cluster‐1/Site A
Director B
Director A
Director B
Director A
Switch
Switch
VPLEX Management Server
Cluster‐2/Site B
Director B
Director A
Director B
Director A
Switch
Switch
WAN
Engine 2
Engine 1
ISL 1
ISL 2
Engine 2
Engine 1
IPsec Tunnel
The diagram illustrates the requirements for IP and FC connectivity between the two clusters in a Metro‐Plex.
A fundamental requirement – without which the Metro‐Plex cannot be installed – is IP connectivity between the VPLEX Management Servers. As part of initial Metro‐Plex install, a VPN tunnel is established for secure connection and interchange of configuration data between these servers.
Additionally, the VPLEX Directors of each cluster need visibility to the Directors of the other cluster via their MAN‐COM ports. Currently, distances of up to 100 km between clusters are supported. Round‐trip latency on this link must be less than 5 milliseconds. Bandwidth requirements will depend on the specific customer application; in general, a minimum of 45 Mbps is the guideline.
The FC‐MAN links can use either dark fibre or DWDM.
When configuring a Metro‐Plex it is best to make use of two fabrics for the FC‐MAN connection, allowing a Director to communicate with all the other Directors on either of the two fabrics. This provides the best possible performance and fault tolerance.
If MAN traffic must share the same physical link as customer production traffic, then logical isolation must be implemented using VSANs or LSANs.
Note that there are specific zoning practices to be followed when exposing Director FC‐MAN ports to each other. Refer to the product installation guide for details.
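To see why the 100 km and 5 ms figures are compatible, a rough propagation estimate helps. The ~5 microseconds/km figure for light in optical fibre is a common rule of thumb, not a number from this course, so treat the result as an approximation.

```python
# Rough fibre propagation-latency estimate for the FC-MAN link.
US_PER_KM = 5.0  # assumed rule of thumb: ~5 microseconds per km in optical fibre

def round_trip_ms(distance_km):
    return 2 * distance_km * US_PER_KM / 1000.0

# At the 100 km maximum, raw propagation is about 1 ms round trip, leaving
# headroom within the 5 ms Metro-Plex budget for switch and equipment delays.
print(round_trip_ms(100))  # → 1.0
```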
Mobility Recommendations
• Device Mobility
Mobility between dissimilar arrays
Relocate hot devices from one array type to another
Relocate devices across clusters in a Metro‐Plex
• Batch Mobility
For non‐disruptive tech refreshes and lease rollovers
For non‐disruptive cross‐Plex device mobility
Only 25 devices or extents can be in transit at one time
Additional mobility will be queued if greater than 25
• Extent Mobility
Load balance across storage volumes
Listed are some typical applications for each supported type of Mobility.
Extent mobility can be used for load balancing across the storage volumes. This can also be used for array mobility where source and target arrays have a similar configuration, i.e. same number of storage volumes, identical capacities, etc.
Device mobility can be used for data mobility between dissimilar arrays, relocating a “hot” device from one type of storage to another.
Distributed Devices: Host Connect Topologies
• Local Access
Each host accesses the volume via FE ports on one cluster only
• Spanned Access (NOT Supported in V4.0)
Each host accesses the volume via FE ports on both clusters
There are two fundamental models for host access to DR1 volumes in a Metro‐Plex.
With “Local Access”, the fabrics at the two sites remain separate, with hosts at each site accessing DR1 volumes via local VPLEX FE ports only.
With “Spanned Access”, the hosts have access to fabrics at both sites and can therefore access DR1 volumes through FE ports at both sites. This provides for additional resiliency in a stretched host cluster – since with this access model, the host can tolerate loss of an entire VPLEX cluster at either site. Note that Spanned Access is not supported in v4.0.
Scalability and Limits
Parameter                                  Maximum
Active inter‐cluster rebuilds              25
Active intra‐cluster rebuilds              25
Total storage provisioned in a system      8 PB
Storage volume size                        Up to 32 TB
Virtual volume size                        Up to 32 TB
Meta volume size                           78 GB
RAID‐1 mirror legs                         2
Storage volumes                            8000 per cluster
Virtual volumes                            8000 per cluster
Extents                                    24000
Initiators (HBA ports)                     400
Shown are some key design limits; a complete table of all EMC VPLEX‐related design limits is published in the Release Notes. Always refer to the current version of the product Release Notes for these limits, which are subject to change until GA.
Volume Limits in a Metro‐Plex: Example
Cluster‐1/Site A
Hosts
6000 “local” volumes
6000 local‐devices
2000 distributed‐devices
2000 local‐devices 2000 local‐devices
2000 “stretched” volumes
6000 “local” volumes
6000 local‐devices
Cluster‐2/Site B
Hosts
Here is an example to illustrate how the maximum limit of 8000 volumes per cluster can be effectively exploited in a Metro‐Plex solution.
In this scenario, we have 2000 distributed devices with the corresponding 2000 “stretched” volumes that can be presented to hosts at both sites. These volumes can potentially be shared by hosts across sites, for example to accommodate distance VMotion or stretched host clustering applications. Note that our 2000 “top‐level” distributed devices (i.e. devices that are enabled for front‐end presentation) are layered upon 2000 local devices within each cluster.
In addition, you can configure up to 6000 more “top‐level” local devices at each site, that are presented to local hosts only. These would be suitable for data that doesn’t need to be shared across sites.
This example shows how to conform to the 8000 volumes per VPLEX cluster limit, while also maximizing the benefit to the customer.
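The arithmetic behind this example can be sketched in a few lines of Python; the variable names are mine, the figures are from the slide.

```python
# The 8000-volumes-per-cluster limit, split between shared and site-local
# volumes in the Metro-Plex example on this slide.
PER_CLUSTER_LIMIT = 8000

distributed_volumes = 2000  # "stretched" volumes presented to hosts at both sites
local_volumes = 6000        # "local" volumes presented to local hosts only

# Each distributed device is layered on one local device within each cluster,
# so both clusters carry the same 2000-volume cost for the stretched set.
for cluster in ("Cluster-1/Site A", "Cluster-2/Site B"):
    total = distributed_volumes + local_volumes
    print(cluster, total, total <= PER_CLUSTER_LIMIT)  # → 8000 True at each site
```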
EMC VPLEX: Solution Design Tools
• Simple Support Matrix (SSM)
• VPLEX Sizing Tool (VST)
Currently a calculator to determine cluster size
Plan is to integrate with BCSD in the future
• HEAT
Check for host compatibility with VPLEX
• VPLEX Deployment Tool (VDT)
Assists with VPLEX configurations, implementations, and modifications in VPLEX clusters
Executable that runs on Windows
• SVC Qualifier
These are the current VPLEX solution design tools in active development.
Network quality and latency assessment is recommended.
VPLEX Sizing Tool
The VPLEX Sizing Tool can be used to validate a proposed VPLEX solution – either single cluster or Metro‐Plex.
It requires basic information about the type of workload, volume count, host initiator count, etc.
Given this data, the tool checks whether the proposed design is capable of handling the workload from a performance standpoint, and whether it conforms to the complete list of configuration limits, as listed in the Release Notes.
Simple Support Matrix (SSM)
• Current VPLEX SSM is downloadable from:
https://elabnavigator.emc.com/emcpubs/elab/esm/pdf/EMC_VPLEX.pdf
The Simple Support Matrix provides a comprehensive view of current interoperability statements within a compact layout. It will be accessible through eLab Navigator.
Supported operating system base platforms, multi‐pathing options, volume management options, and host clustering options are presented here in an easy‐to‐read format, for quick reference.
Interoperability: Current Limitations
• Timefinder/Clone/Snap: NOT Supported at this time
• MirrorView/SRDF: can be used only when target or R2 site volumes are not virtualized with VPLEX
Only 1:1 mapping between a VPLEX virtual volume and an array physical volume is supported, because the remote site (target/R2) won't be virtualized
• Currently VPLEX supports only thick‐to‐thick data moves
Virtual provisioning and support for thick‐to‐thin non‐disruptive mobility in VPLEX are planned to be added over time
• RecoverPoint: not integrated and supported with VPLEX
Shown are some of the key interoperability limitations at launch time.
In v4.0, Timefinder/Clone/Snap is not supported.
MirrorView/SRDF can be used on the VPLEX back end as long as the target or R2 site volumes are not virtualized with VPLEX. This also means that only 1:1 mapping between a VPLEX virtual volume and an array physical volume is supported, because the remote site (target/R2) won't be virtualized.
In v4.0, VPLEX will support only thick‐to‐thick data moves. Virtual provisioning and support for thick‐to‐thin non‐disruptive mobility in VPLEX are planned to be added over time.
RecoverPoint is not integrated and supported with v4.0. This functionality will be added over time.
Course Summary
• EMC VPLEX represents innovative local and distributed federation technology. It is positioned to address non‐disruptive workload relocation, distributed data access, workload resiliency and simplified storage management.
• VPLEX Local supports local federation including consolidation, heterogeneous pooling and non‐disruptive mobility within a data center.
• VPLEX Metro supports the above, as well as distributed federation across sites or failure domains, within synchronous distances (up to 100 km, latency < 5 msec).
• VPLEX offers AccessAnywhere with the key enablers including: distributed virtual volumes over distance, remote access, and mobility within and across clusters.
This concludes the instructional portion of this training. These are the key points that have been covered in this course.
Please proceed to take the assessment.