
    White Paper  

    Abstract

This white paper describes the performance characteristics, metrics, and testing considerations for the EMC VPLEX family of products. Its intent is to refine performance expectations, to review key planning considerations, and to describe testing best practices for VPLEX Local, Metro, and Geo. This paper is not suitable for planning for exceptional situations. In configuring for performance, every environment is unique and actual results may vary.

EMC VPLEX: ELEMENTS OF PERFORMANCE AND TESTING BEST PRACTICES DEFINED


    Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

    The information in this publication is provided “as is”. EMC Corporation makes no

    representations or warranties of any kind with respect to the information in this

    publication, and specifically disclaims implied warranties of merchantability or

    fitness for a particular purpose.

    Use, copying, and distribution of any EMC software described in this publication

    requires an applicable software license.

    For the most up-to-date listing of EMC product names, see EMC Corporation

    Trademarks on EMC.com.

    All other trademarks used herein are the property of their respective owners.

    Part Number h11299


    Table of Contents

Executive summary
Audience
Introduction
    Transaction-based workloads
    Throughput-based workloads
    The Role of Applications in Determining Acceptable Performance
Section 1: VPLEX Architecture
    VPLEX hardware platform
    VPLEX GeoSynchrony 5.1 System Configuration Limits
    Read/Write IO Limits
Section 2: VPLEX Performance Highlights
    Understanding VPLEX overhead
    Native vs. VPLEX Local Performance
    OLTP Workload Example
    Native vs. VPLEX Metro Performance
    Native vs. VPLEX Geo Performance
Section 3: Hosts and Front-end Connectivity
    Host Environment
    Host Paths
    Host to director connectivity
    Host Path Monitoring
    Policy based path monitoring
    VPLEX Real-time GUI Performance Monitoring Stats
    Remote Monitoring and Scripting
    Watch4Net
    Perpetual Logs
    Benchmarking Applications, Tools and Utilities
Section 4: Application Performance Considerations
    High Transaction environments
    High Throughput environments
    VPLEX Device Geometry
Section 5: Back-end Performance Considerations
    Storage Considerations
    Storage Array Block Size
    SAN Architecture for Storage Array Connectivity


    Active/Active Arrays
    Active/Passive Arrays
    Additional Array Considerations
    Automated Storage Tiering
    Performance Metrics for Back-end IO
    Back-end Connectivity Summary
Section 6: SAN and WAN Performance
    SAN Redundancy
    Redundancy through Cisco VSANs or Brocade Virtual Fabrics
    Planning SAN Capacity
    ISL Considerations
    FC WAN Sizing
    Brocade switches
    IP WAN Settings VPLEX Metro-IP and VPLEX Geo
    Areas to Check to Avoid SAN and WAN Performance Issues
Section 7: VPLEX Performance Checklist
Section 8: Benchmarking
    Tips when running the benchmarks
    Take a scientific approach when testing
    Typical Benchmarking Mistakes
    Real World Testing Mistake Example
    Understand the Metamorphosis of an IO
    VPLEX Performance Benchmarking Guidelines
    IOMeter Example
Conclusion
References


    Executive summary

For several years, businesses have relied on traditional physical storage to meet their information needs. Developments such as server virtualization and the growth of multiple sites throughout a business's network have placed new demands on how storage is managed and how information is accessed.

To keep pace with these new requirements, storage must evolve to deliver new methods of freeing data from a physical device. Storage must be able to connect to virtual environments and still provide automation, integration with existing infrastructure, consumption on demand, cost efficiency, availability, and security.

    The EMC® VPLEX™ family is the next generation solution for information mobility and

    access within, across, and between data centers. It is the first platform in the world

    that delivers both Local and Distributed Federation.

•  Local Federation provides the transparent cooperation of physical elements within a site.

•  Distributed Federation extends access between two locations across distance.

VPLEX is a solution for federating both EMC and non-EMC storage.

VPLEX completely changes the way IT is managed and delivered, particularly when deployed with server virtualization. By enabling new models for operating and managing IT, resources can be federated (pooled and made to cooperate through the stack) with the ability to dynamically move applications and data across geographies and service providers. The VPLEX family breaks down technology silos and enables IT to be delivered as a service.

VPLEX resides at the storage layer, where optimal performance is vital. This document focuses on key considerations for VPLEX performance, performance metrics, and testing best practices. The information provided is based on VPLEX Release 5.1. The subject is advanced, and it is assumed the reader has a basic understanding of VPLEX technology. For additional information on VPLEX best practices and detailed technologies, see the appendix for a reference list and hyperlinks to relevant documents.

    Audience

This white paper is intended for storage, network, and system administrators who desire a deeper understanding of the performance aspects of EMC VPLEX, the testing best practices, and/or the planning considerations for the future growth of their VPLEX virtual storage environment(s). This document outlines how VPLEX technology interacts with existing storage environments, how existing environments might impact VPLEX technology, and how to apply best practices through basic guidelines and troubleshooting techniques as uncovered by EMC VPLEX performance engineering and EMC field experiences.


    Introduction

Before we begin our discussion, it is important to know why we are providing guidance on interpretation of the performance data provided in this document. The business unit that has delivered VPLEX to the market has a guiding policy to be as open and transparent as possible with EMC field resources, partners, and customers. We believe that all modern storage products have limitations and constraints, and therefore the most successful and satisfied customers are those that fully understand the various constraints and limitations of the technology they intend to implement. This approach leads our customers to success because there are fewer surprises and the product expectations match the reality. Our intent is to be as candid as possible. We ask readers to use the information to understand the performance aspects of VPLEX implementations and to make better informed judgments about nominal VPLEX capabilities, rather than use the document as the final word on all VPLEX performance (as competitors may be tempted to do). If you have questions about any of the content in this document, please contact your local EMC Sales or Technical representatives.

When considering a given solution from any vendor, there will undoubtedly be strengths and weaknesses that need to be considered. There will always be a specific, unique IO profile that poses challenges in servicing the application load; the key is to understand the overall IO mix and how this will impact real production workloads. It is misleading to extrapolate a specific IO profile to be representative of an entire environment unless the environment homogeneously shares a single IO profile.

Let's begin our discussion of VPLEX performance by considering performance in general terms. What is good performance anyway? Performance can be considered to be a measure of the amount of work that is being accomplished in a specific time period. Storage resource performance is frequently quoted in terms of IOPS (IOs per second) and/or throughput (MB/s). While IOPS and throughput are both measures of performance, they are not synonymous and are actually inversely related, meaning if you want high IOPS, you typically get low MB/s. This is driven in large part by the size of the IO buffers used by each storage product and the time it takes to load and unload each of them. This produces a relationship between IOPS and throughput as shown in Figure 1 below.


    Figure 1

For example, an application requests 1,000 IOPS at an 8KB IO size, which equals 8MB/s of throughput (1,000 IOPS x 8KB = 8MB/s). Using 200MB/s Fibre Channel, 8MB/s doesn't intuitively appear to be good performance (8MB/s is only 4% utilization of the Fibre Channel bus) if you're thinking of performance in terms of MB/s. However, if the application is requesting 1,000 IOPS and the storage device is supplying 1,000 IOPS without queuing (queue depth = 0), then the storage resource is servicing the application needs without delay, meaning the performance is actually good.

Conversely, if a video streaming application is sequentially reading data with a 64MB IO size and 3 concurrent streams, it would realize 192MB/s aggregate performance across the same 200MB/s Fibre Channel connection (64MB x 3 streams = 192MB/s). While there's no doubt that 192MB/s performance is good (96% utilization of the Fibre Channel bus), it's equally important to note we're only supporting 3 IOPS in this application environment.

    These examples illustrate the context dependent nature of performance  –   that is,

    performance depends upon what you are trying to accomplish (MB/s or IOPS).
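To make the relationship concrete, the back-of-the-envelope arithmetic behind the two examples above can be captured in a few lines. This is purely an illustrative sketch of the IOPS x IO-size = throughput relationship; the figures are the ones used in the examples, not measured results.

# Illustrative sketch of the IOPS vs. throughput relationship discussed above.
# Numbers are taken from the text's two examples, not from measurements.

def throughput_mb_s(iops: float, io_size_mb: float) -> float:
    """Throughput (MB/s) produced by a given IOPS rate at a given IO size."""
    return iops * io_size_mb

FC_LINK_MB_S = 200.0   # nominal 200MB/s Fibre Channel link from the examples

# OLTP-style example: 1,000 IOPS at 8KB (0.008MB) -> 8 MB/s, ~4% of the link.
oltp = throughput_mb_s(1_000, 8 / 1000)
print(f"OLTP example:      {oltp:.0f} MB/s ({oltp / FC_LINK_MB_S:.0%} of the link)")

# Streaming example: 3 concurrent 64MB reads per second -> 192 MB/s, ~96% of the link.
video = throughput_mb_s(3, 64)
print(f"Streaming example: {video:.0f} MB/s ({video / FC_LINK_MB_S:.0%} of the link)")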

    Knowing and understanding how your host servers and applications handle their IO

    workload is the key to being successful with VPLEX performance optimization. In

    general, there are two types of IO workloads:

    •  Transaction-based

    •  Throughput-based

As you saw in Figure 1, these workloads are quite different in terms of their objectives and must be planned for in specific ways. We can describe these two types of workloads in the following ways:

•  A workload that is characterized by a high number of IOs per second (IOPS) is called a transaction-based workload.


•  A workload that is characterized by a large amount of data transferred, normally with large IO sizes, is called a throughput-based workload.

    What should you expect to see from each type of workload?

    Transaction-based workloads

High performance transaction-based environments cannot typically be built using low-cost, and consequently low-IOPS, back-end arrays. Transaction processing rates are heavily dependent on the competency of the back-end array. Ultimately, the number of back-end physical drives that are available within a storage system to process host IO becomes the limiting factor. In general, transaction-based processing is limited by the physical spindle count and individual disk IO capabilities of the array rather than the size of the connectivity pipes, the transfer buffer sizes, or the internal bandwidth of the array.
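As a rough way to reason about this limit, the sketch below estimates how many physical drives are needed to sustain a target small-block random IOPS rate. The per-drive IOPS figures are illustrative assumptions chosen for the example, not VPLEX or array specifications; real sizing should use the array vendor's tools.

import math

# Rough spindle-count estimate for a transaction-based workload.
# The per-drive IOPS figures are illustrative assumptions, not array specifications.
ASSUMED_DRIVE_IOPS = {"15k_fc": 180, "10k_sas": 140, "7200_sata": 80}

def drives_needed(target_host_iops: int, drive_type: str = "15k_fc") -> int:
    """Estimate how many physical drives are needed to sustain a random-IO rate."""
    return math.ceil(target_host_iops / ASSUMED_DRIVE_IOPS[drive_type])

# Example: a 10,000 IOPS small-block random workload on assumed 15K FC drives.
print(drives_needed(10_000, "15k_fc"))   # -> 56 drives, before any RAID or replication overhead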

Another common characteristic of transaction-intense applications is that they use a small, random data block pattern to transfer data. With this type of data pattern, having more back-end drives enables more host IO to be processed simultaneously. When transaction-based workloads, or workloads of any type, are random and write-biased, the efficacy of read cache is diminished, as misses need to be retrieved from the physical disks.

In many cases, slow transaction performance problems can be traced directly to "hot" files that cause a bottleneck on a critical component (such as a single physical disk). This situation can occur even when the overall storage subsystem sees a fairly light workload. When bottlenecks occur, they can be extremely difficult and frustrating to resolve.

    Throughput-based workloads

Throughput-based workloads are seen with applications or processes that require massive amounts of data to be transmitted in as few IOs as possible. Generally, these workloads use large sequential blocks to reduce the impact of disk latency. Applications such as satellite imagery, high performance computing (HPC), video streaming, seismic research, surveillance, and the like fit into this category. Relatively speaking, a smaller number of physical drives is needed to reach adequate IO performance compared to transaction-based workloads. In a throughput-based environment, read operations make use of the storage subsystem cache to pre-fetch large chunks of data at a time to improve overall performance. Throughput rates are heavily dependent on the connectivity pipe size, IO buffer size, and the storage subsystem's internal bandwidth. Modern storage subsystems with high-bandwidth internal busses are able to reach higher throughput numbers and bring higher rates to bear.

    The Role of Applications in Determining Acceptable Performance

    Regardless of the capability of a given storage frame, it cannot provide more IO

    than the application requests. Ultimately, the application is the real performance

    driver. For example, say an application generates requests for 2,500 IOPS from a


    storage resource - is there any performance difference at the application level

    between a storage frame capable of delivering 2,500 IOPS and another storage

    frame capable of delivering 10,000 IOPS? Obviously, the answer is a resounding “No”.

Either resource is capable of servicing the 2,500 IOPS requirement. It's like traveling in a car at 65mph on the freeway: if everyone obeys the 65 mph speed limit, then any car that goes the speed limit will get you there in the same amount of time, whether it's a Chevy Lumina or a Ferrari Enzo.

The point we are trying to make is that performance is very much dependent on the point of view. Ultimately, performance can be considered good if the application is not waiting on the storage frame. Understanding the application's performance requirements and providing compatible storage resources ensures maximum performance and application productivity. It goes without saying to always be cautious about performance claims and spec-sheet speeds and feeds. If the environment that generated the claims is not identical to, or does not closely approximate, your environment, you may very well not see the same performance results.


Section 1: VPLEX Architecture

    VPLEX hardware platform

    A VPLEX system with GeoSynchrony 5.1 is composed of one or two VPLEX clusters:

    one cluster for VPLEX Local systems and two clusters for VPLEX Metro and VPLEX Geo

    systems. These clusters provide the VPLEX AccessAnywhere capabilities.

Each VPLEX cluster consists of:

•  A VPLEX Management Console

•  One, two, or four engines

•  One standby power supply for each engine

In configurations with more than one engine, the cluster also contains:

•  A pair of Fibre Channel switches

•  An uninterruptible power supply for each Fibre Channel switch

As you add engines, you add cache, front-end, back-end, and WAN-COM connectivity capacity, as indicated in Table 2 below.

VPLEX GeoSynchrony 5.1 System Configuration Limits

Capacity                              Local                  Metro                  Geo
Maximum virtualized capacity          No Known Limit         No Known Limit         No Known Limit
Maximum virtual volumes               8,000                  16,000                 16,000
Maximum storage elements              8,000                  16,000                 16,000
Minimum/maximum virtual volume size   100MB / 32TB           100MB / 32TB           100MB / 32TB
Minimum/maximum storage volume size   No VPLEX Limit / 32TB  No VPLEX Limit / 32TB  No VPLEX Limit / 32TB
Number of host initiators             1600                   1600                   800

Table 1


Engine Type   Model    Cache [GB]   FC speed [Gb/s]   Engines   FC Ports   Announced
VPLEX VS1     Single   64           8                 1         32         10-May-10
VPLEX VS1     Dual     128          8                 2         64         10-May-10
VPLEX VS1     Quad     256          8                 4         128        10-May-10
VPLEX VS2     Single   72           8                 1         16         23-May-11
VPLEX VS2     Dual     144          8                 2         32         23-May-11
VPLEX VS2     Quad     288          8                 4         64         23-May-11

Table 2

Table 1 and Table 2 show the current limits and hardware specifications for the VPLEX VS1 and VS2 hardware versions. Although the VS2 engines have half the number of ports as VS1, the actual system throughput is improved, as each VS2 port can supply full line rate (8 Gbps) of throughput whereas the VS1 ports are over-subscribed. Several of the VPLEX maximums are determined by the limits of the externally connected physical storage frames and are therefore unlimited in terms of VPLEX itself. The latest configuration limits are published in the GeoSynchrony 5.1 Release Notes, which are available on Powerlink.EMC.com.

Read/Write IO Limits

    VPLEX with GeoSynchrony 5.1 can be configured with one to four engines per cluster.

    For a fully configured four-engine VS2 VPLEX cluster the maximums work out as

    follows:

•  IOPS: up to 3 million IOPS

•  Bandwidth: up to 23.2 gigabytes per second (GB/s)


    Section 2: VPLEX Performance Highlights

    Understanding VPLEX overhead

Properly understanding VPLEX performance capabilities and dependencies will greatly benefit many of the design decisions for your VPLEX environment. In general, with VPLEX's large per-director cache, host reads are comparable to, and in some cases better than, native array performance. Writes, on the other hand, follow VPLEX's write-through caching model on VPLEX Local and Metro and will inevitably have slightly higher latency than native.

    There are many factors involved in determining if and when latency is added by

    VPLEX. Factors such as host IO dispensation size, IO type, VPLEX internal queue

    congestion, SAN congestion, and array congestion will play a role in whether or not

    latency is introduced by VPLEX. In real world production environments, however,

    what do all of these factors add up to? Let’s take a look at the average latency

    impact. We can break these latencies into the following 3 categories based on the

    type of host IO and whether or not the data resides in VPLEX cache:

For VPLEX read cache hits, the VPLEX read response time typically ranges from 85-150 microseconds, depending on the overall system load and IO size. In many cases this is lower than the latency of the native storage array and can actually be considered a reduction in latency. For local devices, VPLEX adds a small amount of latency to each VPLEX cache read miss and each write operation:

•  Typical read miss: about 200-400 microseconds

•  Typical write: about 200-600 microseconds

These latency values will vary slightly depending on the factors mentioned earlier; for example, large block IO requests may need to be broken up into smaller parts (based on VPLEX or individual array capabilities) and then written serially in smaller pieces to the storage array. Further, if you are comparing native array performance to VPLEX performance, the comparison will be heavily dependent on the overall load on the array. If you have an array that is under cache pressure, adding VPLEX to the environment can actually improve read performance. The additive cache from VPLEX may offload a portion of read IO from the array, thereby reducing average IO latency. Additional discussion on this topic is provided later in the subsequent host and storage sections.
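As an illustration of how these per-IO overheads combine, the sketch below estimates average host latency for a mixed read/write workload on VPLEX Local, using the typical figures quoted above (read hit roughly 85-150 microseconds of service time, read miss +200-400 microseconds, write +200-600 microseconds). The workload mix and the native array latencies are assumptions for the example, not measurements.

# Illustrative estimate of average host latency through VPLEX Local for a mixed workload.
# Overhead figures come from this section; all workload numbers are assumptions.

def avg_latency_us(read_fraction, hit_ratio, native_read_us, native_write_us,
                   hit_us=120, miss_overhead_us=300, write_overhead_us=400):
    """Weighted average host latency (microseconds) through VPLEX Local."""
    write_fraction = 1.0 - read_fraction
    read_lat = hit_ratio * hit_us + (1 - hit_ratio) * (native_read_us + miss_overhead_us)
    write_lat = native_write_us + write_overhead_us
    return read_fraction * read_lat + write_fraction * write_lat

# Example: 70% reads, 30% of reads hit VPLEX cache, native array at 500us read / 300us write.
print(f"{avg_latency_us(0.70, 0.30, 500, 300):.0f} us average host latency")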

    Native vs. VPLEX Local Performance

Native performance tests use a direct connection between a host and a storage array. VPLEX Local testing inserts VPLEX in the path between the host and the array.

    4KB Random Read Hit

    Random read hits are tested over a working set size that fits entirely into array or

    VPLEX cache.


    Figure 2 VPLEX Random 4KB Read Hit Native vs. VPLEX

This test actually reveals that VPLEX performs faster than the storage frame used in this test. Here, the lower the latency the better! Though this may not be typical of every situation, it does help illustrate a case where VPLEX read hits can help to improve overall storage environment performance.

    4KB Random Read Miss

    Random read misses by-pass VPLEX cache (but can be a cache hit on the array)

    because of their large working set size. Here we see the typical VPLEX read miss

    overhead of about 300 microseconds.

    Figure 3 Random Read Miss Native vs. VPLEX


    Figure 4 Random 4KB Write Native vs. VPLEX

4KB IOs are used in our tests to illustrate a high number of IO operations. In Figure 4 we see a VPLEX write overhead of about 350 microseconds. This test reveals that VPLEX adds a measurable but relatively small impact to each IO. Here, the lower the latency the better.

    Figure 5 128 KB Sequential Write

128KB IOs are used in our tests to illustrate high throughput (bandwidth) operations. In Figure 5 we see an average VPLEX write overhead of about 500 microseconds. This test reveals that VPLEX adds a measurable but relatively small impact to each IO. Here, the lower the latency the better.


    OLTP Workload Example

Our synthetic OLTP-heavy benchmark workload (called OLTP2HW below) is a mix of 8KB and 64KB IO request sizes, with a 1:1 ratio of reads and writes.

In this test, the application demonstrates slightly more host latency with VPLEX compared to native. The additional latency overhead is about 600 microseconds.

    Native vs. VPLEX Metro Performance

VPLEX Metro write performance is highly dependent upon the WAN round-trip-time (RTT) latency. The general rule of thumb for Metro systems is that host write IO latency will be approximately 1x-3x the WAN round-trip time. While some may view this as an overly negative impact, we would caution against this view and highlight the following points. First, VPLEX Metro uses a synchronous cache model and is therefore subject to the laws of physics when it comes to data replication. In order to provide a true active-active storage presentation, it is incumbent on VPLEX to provide a consistent and up-to-date view of data at all times. Second, many workloads have a considerable read component, so the net WAN latency impact can be masked by the improvements in read latency provided by VPLEX read cache. This is another reason that we recommend a thorough understanding of the real application workload, so as to ensure that any testing that is done is applicable to the workload and environment you are attempting to validate.
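The rule of thumb above can be turned into a quick planning estimate. The sketch below computes the expected range of host write latency for a few WAN RTT values using the 1x-3x multiplier from this section; the RTT values themselves are illustrative assumptions, and real results depend on the workload and link.

# Illustrative VPLEX Metro write-latency estimate using the 1x-3x WAN RTT rule of thumb.
# RTT values are examples only; actual latency depends on workload and link quality.

def metro_write_latency_ms(wan_rtt_ms):
    """Return the (low, high) expected host write latency range for a given WAN RTT."""
    return 1.0 * wan_rtt_ms, 3.0 * wan_rtt_ms

for rtt in (1, 3, 5, 10):                      # assumed round-trip times in milliseconds
    low, high = metro_write_latency_ms(rtt)
    print(f"RTT {rtt:>2} ms -> expected host write latency {low:.0f}-{high:.0f} ms")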

In comparing VPLEX Metro to native array performance, it is important to ensure that the native array testing is also synchronously replicating data across a WAN link and distance equal to that used by VPLEX. Comparing Metro write performance to a single array that is not doing synchronous replication is an apples-to-bananas comparison.


    Figure 6 Metro WAN Latency Impact

    Figure 6 illustrates the impact of WAN latency on VPLEX Metro. As WAN latency is

    added there is a corresponding impact on write IO. The OLTP (green) lines show a

    simulated OLTP application (8KB and 64 KB IO with roughly equal read and write IO)

    and the overall impact of WAN latency with VPLEX Metro.

Write throughput-intensive applications, such as backups, need to be aware of the maximum available WAN bandwidth between VPLEX clusters. If the write workload exceeds the WAN link bandwidth, response time will spike, and other applications may also see severe performance degradation.

    Native vs. VPLEX Geo Performance

Given the fundamental architectural differences of VPLEX Geo from Local and Metro, namely its write-back caching model and asynchronous data replication, it's even more difficult to accurately compare native array performance to VPLEX Geo performance.

In short, VPLEX Geo performance will be limited by the available drain rate, which is a function of the available WAN bandwidth and storage-array performance at each cluster. If a VPLEX director's incoming host write rate exceeds what the outgoing write rate can achieve, there will inevitably be push-back or throttling on the host, which will negatively affect host per-operation write latency, causing it to rise. Ensure the WAN and arrays are properly configured, and that the various VPLEX Geo-related settings are tuned properly.
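A simple way to reason about this constraint is to compare the sustained host write rate against the drain rate, taken here as the lesser of WAN bandwidth and back-end array write bandwidth at the remote cluster. The sketch below is only a planning aid under that simplified model; the bandwidth figures are assumptions, not VPLEX limits.

# Simplified VPLEX Geo drain-rate check: sustained host writes must not exceed
# the slower of the WAN link and the remote array's write bandwidth.
# All figures are illustrative assumptions for planning discussion only.

def geo_drain_check(host_write_mb_s, wan_mb_s, array_write_mb_s):
    drain_rate = min(wan_mb_s, array_write_mb_s)
    if host_write_mb_s <= drain_rate:
        return f"OK: {drain_rate - host_write_mb_s:.0f} MB/s of drain headroom remains"
    return (f"Risk: host writes exceed the drain rate by "
            f"{host_write_mb_s - drain_rate:.0f} MB/s; expect throttling and rising write latency")

# Example: 400 MB/s of host writes, a 1 Gb/s WAN (~125 MB/s), arrays good for 600 MB/s.
print(geo_drain_check(400, 125, 600))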


    Section 3: Hosts and Front-end Connectivity

There are certain baseline configuration recommendations when using VPLEX to provision virtual storage to hosts. These considerations include how many paths through the fabric are allocated to the host, how many host ports to use, how to spread the hosts across VPLEX directors, logical unit number (LUN) mapping, and the correct size of virtual volumes to use. Maximizing connectivity and following best practices for inline devices such as VPLEX will optimize performance for your virtual storage environment.

    Host Environment

When configuring a new host for VPLEX, the first step is to determine the EMC-supported operating system, driver, firmware, and host bus adapters in order to prevent unexpected problems due to untested configurations. Consult the VPLEX Simple Support Matrix prior to bringing a new host into VPLEX for recommended levels. The VPLEX support matrix is available at http://powerlink.emc.com or at Support.EMC.com.

In addition, verify that all host path management software is enabled and operating correctly.

Path management applications should be set as follows:

Operating System                 Recommended Policy
Hewlett-Packard HP-UX            PVLinks set to Failover
VMware ESX                       NMP policy set to Fixed
IBM AIX                          Native MPIO set to Round Robin
All Linux                        MPIO set to Round Robin Load Balancing
All Veritas DMP                  Balanced policy with partitionsize set to 16MB
All platforms using Powerpath    EMC Powerpath policy set to Adaptive

Table 3

    Note: The most current and detailed information for each host OS is provided in the

    corresponding Host Connectivity Guides on Powerlink at: http://powerlink.emc.com 

    Host Paths

EMC recommends that you limit the total number of paths that the multipathing software on each host is managing to four, even though the maximum supported is considerably more than four. Following these rules helps prevent many issues that might otherwise occur and leads to improved performance.


    The major reason to limit the number of paths available to a host from the VPLEX is for

    error recovery, path failover, and path failback purposes. These are also important

    during the VPLEX non-disruptive upgrade (NDU) process. The overall time for

    handling path loss by a host is significantly reduced when you keep the total number

    of host paths to a reasonable number required to provide the aggregate

    performance and availability. Additionally, the consumption of resources within the

    host is greatly reduced each time you remove a path from path management

    software.

    During NDU, there are intervals where only half of the VPLEX directors and associated

    front-end ports (on first and second upgraders, respectively) are available on the

    front-end fabric. NDU front-end high availability checks ensure that the front-end

    fabric is resilient against single points of failure during the NDU, even when either the

    first or second upgrader front-end ports are offline.

From a host pathing perspective there are two types of configurations:

•  High availability configurations – VPLEX configurations that include sufficient redundancy to avoid data unavailability during NDU, even in the event of front-end fabric or port failures. The NDU high-availability pre-checks succeed for these configurations.

•  Minimal configurations – VPLEX configurations that do not include sufficient redundancy to avoid data unavailability in the event of front-end fabric or port failures.

For minimal configurations, the NDU high-availability pre-checks fail. Instead, the pre-checks for these configurations must be performed manually. This can take a considerable amount of time in large environments, and in general EMC believes that the benefit of lower port-count requirements is not justified given the increased operational impact.

    High availability configurations

    VPLEX Non-Disruptive Upgrade (NDU) automated pre-checks verify that VPLEX is

    resilient in the event of failures while the NDU is in progress.

In high availability configurations:

•  In dual- or quad-engine systems, each view has front-end target ports across two or more engines in the first upgrader set (A directors), and two or more engines in the second upgrader set (B directors).

•  In single-engine systems, each initiator port in a view has a path to at least one front-end target port in the first upgrader (A director) and second upgrader (B director). (See Figure 7.)

There are two variants of front-end configurations to consider for high availability that will pass the high-availability pre-checks:


•  An optimal configuration for a single-engine cluster is one in which there are redundant paths (dotted and solid lines in Figure 7) between both front-end fabrics and both directors. In addition to protecting against failures of an initiator port, HBA, front-end switch, VPLEX front-end port, or director, these redundant paths also protect against front-end port failures during NDU.

•  A high-availability configuration for a single-engine cluster is one in which there is a single path between the front-end fabrics and the directors (solid lines in Figure 7). Like the optimal configuration described above, a high-availability configuration protects against failures of initiator ports, HBAs, front-end switches, and directors during NDU.

A high availability configuration provides protection against front-end port failures during NDU.

    Figure 7 High Availability Front-end Configuration (single-engine)

    Minimal configurations

A minimal configuration is not considered highly available, and the automated NDU pre-check will not pass. For a single-engine cluster, a minimal configuration is one in which each fabric has a single path to a single director. Minimal configurations support failover, but have no redundancy during NDU.

Strict high-availability pre-checks for front-end and back-end connectivity have been implemented in VPLEX 5.1 and higher. If the high-availability pre-check detects that one or more storage views do not conform to the front-end high availability requirements of NDU, it will specify which storage views are in question. For example:

Error: Storage view /clusters/cluster-2/exports/storage-views/lsca3195_win2k3 does not have target ports from two or more directors in the second upgrader set at cluster-2.


Update these views to satisfy the high availability requirement. Ensure the storage view in question has front-end target ports across two or more engines in the first upgrader set (A directors) and the second upgrader set (B directors).

Figure 8 illustrates a single-engine cluster with a minimal front-end configuration:

Figure 8 Minimal Front-end Configuration

For minimal configurations, the automated high-availability pre-checks fail, and the pre-checks must be performed manually. Refer to the VPLEX Procedure Generator documentation on upgrading for the necessary manual pre-checks, commands, and options for minimal configurations.

    Host to director connectivity

VPLEX caching algorithms send cache coherency messages between directors via the internal Fibre Channel networks or via the built-in CMI bus contained within each engine chassis. The CMI bus is a low latency, high speed communication bus; it allows the two directors within the same engine to communicate directly. The recommendation when two or more VPLEX engines are available is to connect each host to two directors. In addition, ensure each host is connected to an A and a B director on different VPLEX engines. There are certainly possible exceptions to the two-director connectivity rule. For example, for a heavy IO workload (OLTP) server with 4 or more adapter ports, the director connectivity would need to be at least 4 or possibly 8 directors. The key takeaway is that VPLEX system performance under normal loads will be virtually equivalent whether you use two directors in the same engine or two directors in different engines. The benefits from the added availability tip the scale in


    favor of connecting hosts to one director on two different engines. In general,

    consuming just 2 directors per host will provide the best overall scalability and

    balance of resource consumption for your VPLEX system.

    Figure 9 Current VPLEX NDU Enforced Single and Dual Engine Connectivity

Note: For code releases through VPLEX GeoSynchrony code version 5.1 Patch 3, the non-disruptive upgrade pre-check strictly enforces connecting hosts across 4 directors with 2- and 4-engine VPLEX systems. This restriction will likely be relaxed in future releases to better align with the reasoning presented above.

When considering attaching a host to more than two directors in a dual-engine or quad-engine VPLEX configuration, both the performance and the scalability of the VPLEX complex should be considered. Though this may contradict what the automated NDU pre-check will accept, this guidance is provided for the following reasons:

•  Utilizing more than two directors per host increases cache update traffic among the directors.

•  Utilizing more than two directors per host decreases the probability of read cache hits on the ingress director.

•  Based on the reliability and availability characteristics of VPLEX hardware, attaching a host to just two directors provides a high availability configuration without unnecessarily impacting the performance and scalability of the solution.


General best practice considerations for multipath software:

•  With EMC Powerpath, the pathing policy should be set to Adaptive mode.

•  Avoid connecting to multiple A directors and multiple B directors with a single host or host cluster.

•  Avoid a round-robin policy that alternates on every single IO. Alternating every single IO across directors is not efficient for cache-coherency traffic, and defeats the VPLEX director's read-ahead cache pre-fetch. When using a round-robin policy, set the burst or stream count to something greater than one so more consecutive IOs are sent to the same director before another director is chosen.

•  For Veritas DMP, using the balanced policy with a partitionsize value of 16MB is optimal for VPLEX director cache coherency.

•  Separate latency-sensitive applications from each other, preferably using independent directors and independent front-end ports.

•  For VPLEX Metro-FC cross-connect solutions, be aware of which path(s) the hosts are using, and configure the hosts to prefer the local paths over the remote paths.

    Host Path Monitoring

    Host IO monitoring tools are available across virtually every open systems OS

    supported by VPLEX. In particular, EMC Powerpath provides a consistent set of

    commands and outputs across operating systems such as AIX, Linux, VMware, and

    Windows.

    Individual host path performance can be monitored using the powermt display 

    command:

    Example 1 - Windows path monitoring with Powerpath for Windows 

    powermt display dev=all

Pseudo name=harddisk12
Invista ID=FNM00103600####
Logical device ID=6000144000000010A001ED129296E028
state=alive; policy=ADaptive; priority=0; queued-IOs=0
==============================================================================
---------------- Host ---------------   - Stor -  -- I/O Path -  -- Stats ---
###  HW Path                 I/O Paths   Interf.  Mode    State  Q-IOs Errors
==============================================================================
  4  port4\path0\tgt0\lun10  c4t0d10     04       active  alive      8      0

    Also, latency by path is available with the powermt display latency command:

    powermt display latency

Invista logical device count=86
==============================================================================


----- Host Bus Adapters ---------  ------ Storage System ----  - Latency (us) -
###  HW Path                       ID              Interface   Current     Max
==============================================================================
  3  port3\path0                   FNM0010360####  01                0       0
  4  port4\path0                   FNM0010360####  04                0       0

    Policy based path monitoring

    There are many situations in which a host can lose one or more paths to storage. If

    the problem is isolated to that one host, it might go unnoticed until an upgrade to

    VPLEX or when a SAN event occurs that causes the remaining paths to go offline,

    such as a switch failure, or routine switch maintenance. This can lead to poor

    performance or, worse yet, a data unavailability event, which can seriously affect

    your business. To prevent this loss-of-access event from happening, many users have

    found it useful to implement automated path monitoring using path management

    software like EMC Powerpath or Veritas DMP or to create custom scripts that issue

    path status commands and then parse the output for specific key words that then

    trigger further script action.
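As a sketch of the kind of custom script described above, the following hypothetical example runs powermt display dev=all, counts paths reported as alive versus dead, and flags hosts that have lost a path. The exact output format varies by Powerpath version and platform, so the parsing keywords here are assumptions to be adapted to your environment.

# Hypothetical path-count watchdog built around "powermt display dev=all".
# Parsing keywords are assumptions; adjust to the powermt output on your platform.
import subprocess
import sys

def count_paths():
    """Return (alive, dead) path counts parsed from powermt output."""
    output = subprocess.run(["powermt", "display", "dev=all"],
                            capture_output=True, text=True, check=True).stdout
    alive = dead = 0
    for line in output.splitlines():
        fields = line.split()
        # Path rows carry a State column; the per-device "state=alive;" summary line
        # is a single token and is therefore not double counted here.
        if "alive" in fields:
            alive += 1
        elif "dead" in fields:
            dead += 1
    return alive, dead

if __name__ == "__main__":
    alive, dead = count_paths()
    if dead:
        # Hook in your own notification (e-mail, syslog, etc.) here.
        print(f"WARNING: {dead} dead path(s) detected ({alive} alive)")
        sys.exit(1)
    print(f"OK: {alive} paths alive, none dead")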

    For EMC Powerpath you can turn on path latency monitoring and define a threshold

    to simply stop using a specific path.

    Example 2 –  Automated latency monitoring with Powerpath 

    powermt set path_latency_monitor=on|off

    powermt set path_latency_threshold=

    It is also possible to set an autorestore policy with Powerpath so that any paths that

    drop offline are brought back online if they are healthy.

    Example 3 –  Auto restore paths 

    powermt set periodic_autorestore=on|off

Each of these commands can provide hosts with self-monitoring and self-recovery, providing the greatest resiliency and availability possible for each host. These commands can be combined with a scheduler, such as cron, and a notification system, such as e-mail, to notify SAN administrators and system administrators if the number of paths to the system changes.

For Veritas DMP there are recovery settings that control how often a path will be retried after failure. If these are not the default settings on your hosts, you should set the following on any hosts using DMP:


    Example 4 - DMP Tuning Parameters

vxdmpadm setattr enclosure emc-vplex0 recoveryoption=throttle iotimeout=30
vxdmpadm setattr enclosure emc-vplex0 dmp_lun_retry_timeout=30

The values shown in Example 4 specify a 30-second retry period for handling transient errors. When all paths to a disk fail (such as during a VPLEX NDU), there may be certain paths that have a temporary failure and are likely to be restored soon. If IOs are not retried for a non-zero period of time, the IO may be failed by the application layer. The DMP tunable dmp_lun_retry_timeout can be used for more robust handling of such transient errors. If the tunable is set to a non-zero value, IOs to a disk with all failed paths will be retried until the specified dmp_lun_retry_timeout interval elapses or until the IO succeeds on one of the paths, whichever happens first. The default value of the tunable is 0, which means that the paths are probed only once.

    VPLEX Real-time GUI Performance Monitoring Stats

    The Unisphere for VPLEX UI contains several key performance statistics for host

    performance and overall health. They can be found on the Performance

    Dashboard tab and can be added to the default performance charts that are

    displayed. Using the data provided, the VPLEX administrator can quickly determine

the source of performance problems within an environment. Figure 10 below shows

    the performance data included in the GeoSynchrony 5.1 version of VPLEX.

    Figure 10 VPLEX Real-time Performance Data 

    Unisphere for VPLEX Performance Data Details

•  Back-end Latency – time in microseconds for IO to complete with the physical storage frames.

•  CPU Utilization – % busy of the VPLEX directors in each engine. 50% or less is considered ideal.

•  Front-end Aborts – SCSI aborts received from hosts connected to VPLEX front-end ports. 0 is ideal.

•  Front-end Bandwidth – total IO as measured in MB per second from hosts to VPLEX.

•  Front-end Latency – time in microseconds for IO to complete between VPLEX and hosts. Very dependent on back-end array latency.


•  Front-end Throughput – total IO as measured in IOs per second.

•  Rebuild Status – completion status of local and remote device rebuild jobs.

•  Subpage Writes – number of writes that are < 4KB. This statistic has taken on a very diminished importance for VPLEX Local and Metro systems running GeoSynchrony 5.0.1 and later code. For VPLEX Geo, this is still a very relevant metric.

•  WAN Link Usage – IO between VPLEX clusters as measured in MB per second. This chart can be further subdivided into system, rebuild, and distributed volume write activity.

•  WAN Link Performance – IO between VPLEX clusters as measured in IOs per second.

    Figure 11 UniSphere for VPLEX Performance Dashboard

Figure 11 shows the VPLEX Performance Dashboard, which provides continuous real-time data for 10 key performance metrics over a continuously updated 5-minute window. Each of the charts can be added, moved, or removed from the display to meet a wide variety of monitoring needs.


    Remote Monitoring and Scripting

VPLEX has a RESTful API and supports SNMP monitoring via third-party SNMP monitoring tools. The VPLEX MIB is available on the VPLEX Management Server in the following directory:

/opt/emc/VPlex/mibs

Today there is a limited set of performance categories available via SNMP. Using the REST API to access VPLEX allows virtually any command that can be run locally on a VPLEX system to be run remotely. This enables integration with Microsoft PowerShell and with VMware vCOPS. Refer to the VPLEX 5.1 Administrators Guide, Performance Monitoring chapter, for more details.
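As a rough illustration of remote scripting against the management server, the sketch below issues an HTTPS GET with Python's requests library. The base URI, context path, and authentication headers shown are assumptions for illustration only; consult the VPLEX Element Manager API documentation for the actual endpoints and authentication scheme supported by your release.

# Hypothetical example of querying the VPLEX management server over HTTPS.
# The URI layout and headers below are assumptions; verify them against the
# VPLEX Element Manager API documentation for your GeoSynchrony release.
import requests

MGMT_SERVER = "10.0.0.1"           # placeholder management server address
CONTEXT = "/vplex/clusters"        # assumed context path, for illustration only

response = requests.get(
    f"https://{MGMT_SERVER}{CONTEXT}",
    headers={"Username": "service", "Password": "password"},  # assumed auth scheme
    verify=False,                  # management servers commonly use self-signed certificates
    timeout=30,
)
response.raise_for_status()
print(response.json())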

    Watch4Net

Comprehensive historical and trending data, along with custom dashboard views, will be provided for VPLEX by Watch4Net. Watch4Net support is expected shortly after this document is published.

    Perpetual Logs

VPLEX maintains a perpetual log of over 50 different performance statistics on the VPLEX management server. There are 10 of these files for each VPLEX director, and they roll at the 10MB mark. The perpetual log files contain comma-separated data that can be easily imported into MS Excel for aggregation, reporting, and historical trending analysis. The files are located in the /var/log/VPlex/cli directory, as shown below:

service@RD-GEO-2-1:/var/log/VPlex/cli> ll | grep PERPETUAL
-rw-r--r-- 1 service users  3374442 2012-11-17 05:41 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log
-rw-r--r-- 1 service users 10485855 2012-11-14 18:38 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.1
-rw-r--r-- 1 service users 10485864 2012-09-10 01:25 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.10
-rw-r--r-- 1 service users 10486060 2012-11-07 03:33 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.2
-rw-r--r-- 1 service users 10485825 2012-10-30 12:14 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.3
-rw-r--r-- 1 service users 10485922 2012-10-22 21:38 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.4
-rw-r--r-- 1 service users 10486009 2012-10-15 12:43 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.5
-rw-r--r-- 1 service users 10486000 2012-10-08 05:20 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.6
-rw-r--r-- 1 service users 10486298 2012-10-01 03:31 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.7
-rw-r--r-- 1 service users 10486207 2012-09-24 02:24 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.8
-rw-r--r-- 1 service users 10485969 2012-09-17 00:24 director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log.9
-rw-r--r-- 1 service users  2467450 2012-11-17 05:41 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log
-rw-r--r-- 1 service users 10485770 2012-11-15 11:39 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.1
-rw-r--r-- 1 service users 10486183 2012-09-07 09:02 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.10
-rw-r--r-- 1 service users 10485816 2012-11-08 01:10 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.2
-rw-r--r-- 1 service users 10485977 2012-10-31 13:49 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.3
-rw-r--r-- 1 service users 10486275 2012-10-24 01:25 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.4
-rw-r--r-- 1 service users 10485793 2012-10-16 07:27 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.5
-rw-r--r-- 1 service users 10486230 2012-10-08 07:56 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.6
-rw-r--r-- 1 service users 10485762 2012-09-30 05:37 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.7
-rw-r--r-- 1 service users 10485807 2012-09-22 01:53 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.8
-rw-r--r-- 1 service users 10486077 2012-09-14 14:31 director-1-1-B_PERPETUAL_vplex_sys_perf_mon.log.9
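Because the perpetual logs are plain comma-separated text, they can also be processed programmatically rather than through Excel. The snippet below is a minimal sketch that assumes the first line of each file is a header row; the actual column names depend on the GeoSynchrony release, so inspect your own files and adjust the field names accordingly.

# Sketch: summarizing one perpetual log file with the Python standard library.
# Column names are assumptions; inspect the header row of your own files first.
import csv

LOG = "/var/log/VPlex/cli/director-1-1-A_PERPETUAL_vplex_sys_perf_mon.log"

with open(LOG, newline="") as f:
    reader = csv.DictReader(f)      # first row is treated as the header
    rows = list(reader)

print("samples:", len(rows))
print("columns:", reader.fieldnames[:10], "...")

# Example: average a numeric column if it exists (the column name is hypothetical).
col = "fe-ops total (counts/s)"
values = [float(r[col]) for r in rows if r.get(col) not in (None, "", "n/a")]
if values:
    print(f"average {col}: {sum(values) / len(values):.1f}")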

    Benchmarking Applications, Tools and Utilities

    There are good benchmarking applications out there, and there are not-so-good

    ones. Testers use tools they can trust. The following section reviews several


    benchmarking tools that are useful (and not so useful) when testing VPLEX

    performance in your environment.

    Good benchmarks

    IOMeter

IOMeter is one of the most popular public domain benchmarking tools among storage vendors, and is primarily a Windows-based tool. It is available from http://www.iometer.org. In the Benchmarking section of this document we

    provide some examples of IOMeter settings that are used to simulate specific

    workloads for testing.

    The popularity of IOMeter holds true at EMC. Many internal teams, including the

    VPLEX Performance Engineering team use IOMeter and are familiar with its behavior,

    input parameters, and output. That being said, the IO patterns, queue depths, and

other tunables can be misused and distorted. It is important to maintain healthy skepticism about any benchmark numbers you see until you know the full details of the settings and overall testing parameters.

    Warning: It's not recommended to run the IO client (dynamo) on Linux. Dynamo does

    not appear to function completely as expected. It's best to use Windows clients with

    Dynamo.

    IOZone

IOZone has broad operating system support, but it is primarily a file-system-based benchmark. It is available for free from http://www.iozone.org.


    iorate

    Initially implemented by EMC, iorate has been released to the public as open source.

    Available for free from http://iorate.org/ 

    fio

fio is an I/O tool meant to be used both for benchmarking and for stress/hardware verification. It has support for 13 different types of I/O engines, I/O priorities (for newer Linux kernels), rate-limited I/O, forked or threaded jobs, and much more. It can work

    on block devices as well as files. fio is a tool that will spawn a number of threads or

    processes doing a particular type of I/O action as specified by the user. The typical

    use of fio is to write a job file matching the I/O load one wants to simulate. Available

    for free from http://freecode.com/projects/fio  

    Additional info: http://linux.die.net/man/1/fio 
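As an illustration of the job-file approach, the sketch below generates a small OLTP-like random-read job file and launches fio against it. The device path, block size, queue depth, and run time are placeholders to adjust for your environment; the job is read-only so it does not modify the target, but always double-check the device path before running any benchmark.

# Sketch: build a small fio job file and run it. All parameters are illustrative only.
import subprocess
import textwrap

job = textwrap.dedent("""\
    [global]
    ioengine=libaio
    direct=1
    runtime=60
    time_based=1
    group_reporting=1

    ; OLTP-like random-read job; read-only so the target device is not modified
    [oltp-like-randread]
    ; placeholder device, change before running
    filename=/dev/sdX
    rw=randread
    bs=4k
    iodepth=16
    numjobs=4
""")

with open("oltp_randread.fio", "w") as f:
    f.write(job)

# Requires fio to be installed and the placeholder device to be replaced.
subprocess.run(["fio", "oltp_randread.fio"], check=True)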

    Poor benchmarks

In general, any single-outstanding-I/O or filesystem-focused benchmark is not a good choice.

    Unix dd test

dd is completely single-threaded and issues only a single outstanding I/O, so the dreaded dd test cannot generate the concurrency needed to exercise a storage platform.

    Bonnie

    Bonnie was designed to test UNIX file systems and is over 20 years old.

Bst5 or "Bart's stuff test"
Bst5 is single outstanding I/O. It is available from http://www.nu2.nu/bst/

    File copy commands


    These are single threaded and single outstanding I/O. They use a host memory file

    cache, so it is not known when or if a particular file IO hits storage. It is also not clear

    what I/O size the filesystem will happen to choose and so it might be reading and

    writing with inefficient IO sizes. In theory, a multiple file copy benchmark could be

    constructed; however it requires careful parallelism and multiple independent source

    and target locations.

    It is best to separate reads and writes in performance testing. For example, a slow

    performing read source device could penalize a fast write target device. The entire

    copy test would show up as slow. Without detailed metrics into the read and write

    times (not always gathered in a simple "how long did it take" file copy test), the

    wrong conclusions can easily be drawn about the storage solution.
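If a file-based test must be constructed anyway, the phases should at least be parallelized and timed separately, as the sketch below illustrates. The source files, target directory, file sizes, and thread count are placeholders, and host file-cache effects still need to be considered when interpreting the results.

# Sketch: timing reads and writes separately, with parallelism, instead of a single
# "how long did the copy take" test. Paths, sizes, and worker counts are placeholders.
import concurrent.futures
import os
import time

SOURCES = [f"/mnt/src/file{i}.dat" for i in range(8)]   # hypothetical source files
TARGET_DIR = "/mnt/dst"                                  # hypothetical target location
CHUNK = 128 * 1024                                       # 128KB chunks

def read_file(path):
    with open(path, "rb") as f:
        while f.read(CHUNK):
            pass

def write_file(path, size_bytes):
    with open(path, "wb") as f:
        for _ in range(size_bytes // CHUNK):
            f.write(b"\0" * CHUNK)
        f.flush()
        os.fsync(f.fileno())        # force the data out of the host file cache

def timed(label, fn, args_list):
    start = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(lambda args: fn(*args), args_list))
    print(f"{label}: {time.time() - start:.2f} s")

timed("parallel reads", read_file, [(p,) for p in SOURCES])
timed("parallel writes", write_file,
      [(os.path.join(TARGET_DIR, f"out{i}.dat"), 256 * 1024 * 1024) for i in range(8)])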

    Note: See Section 8: Benchmarking  for specific testing recommendations and

    example results.

    Benchmarking Applications

    The list of possible application level benchmarking programs is numerous. Some that

are fairly well known and understood are:

• Microsoft Exchange - JetStress
• Microsoft SQL Server - SQLIO
• Oracle - SwingBench, DataPump, or export/import commands
• VMware - VMbench, VMmark - virtual machine benchmarking tools

These particular benchmarking applications are potentially one step closer to a production application environment; however, like all artificially crafted benchmarks, they suffer from the fact that at the end of the day they are likely not representative of your environment.

    Engage EMC's application experts when you are interested in a specific application

benchmark. We stress that these benchmarks also exercise more of the application and host IO stack, and so they may not be representative of the underlying storage devices and can be affected by many factors outside the storage layer.

    Application Testing

    Testing with the actual application is the best way to measure storage performance.

A production-like environment that can stress storage limits is desirable.

Measure the performance of different solutions:

• Compare OLTP response times.
• Compare batch run times.
• Compare sustained streaming rates.

    Operating system and application tools can help monitor storage performance.


    Production Testing

Ultimately, there must be a level of trust in the solution before deploying it in your production environment. When you are considering moving an application into production, there are both risks and rewards.

Risk vs. Reward:

• Risk: taking an unsupported, well-traveled evaluation unit and putting it in a production environment could compromise application availability and expose unexpected system problems.
• Reward: sometimes this is the only way to know for certain that storage performance is acceptable for an application.

    In order to minimize the risk side of the equation, consider a staged approach

whereby non-business-critical applications are virtualized with VPLEX first. This is similar to the approach recommended by VMware in the early stages of host

    virtualization. Go for the low hanging fruit first and then closely monitor the

    performance throughout the process.


    Section 4: Application Performance Considerations

    When gathering data for planning from the application side, it is important to first

    consider the workload type for the application. If multiple applications or workload

types will share the system, you need to know the workload type of each application; if the applications are mixed (both transaction-based and throughput-based), you also need to determine which workload is the most critical. Many environments have a mix of transaction-based and throughput-based workloads;

    generally, the transaction performance is considered the most critical. However, in

    some environments, for example, a backup media server environment, the streaming

    high throughput workload of the backup itself is the critical part of the operation. The

    backup database, although a transaction-centered workload, is a less critical

    workload.

So, what are the traits of transaction-based and high throughput applications? In the following sections, we explain these traits in more detail.

High Transaction environments

    Applications that use high transaction workloads are better known as Online

    Transaction Processing (OLTP) systems. Examples of these systems are database

    servers and mail servers. If you have a database, you tune the server type

    parameters, as well as the database’s logical devices, to meet the needs of the

    database application. If the host server has a secondary role of performing nightly

    backups for the business, you may choose to use a different set of logical devices,

    which are tuned for high throughput for the best backup performance.

    As mentioned in the introduction, you can expect to see a high number of

    transactions and a fairly small IO size in OLTP environments. Different databases use

different IO sizes for their logs, and these logs vary from vendor to vendor. In all cases, the logs are generally heavily write-oriented workloads. For table spaces, most

    databases use between a 4 KB and a 16 KB IO size. In certain applications, larger

    chunks (for example, 64 KB) will be moved to host application cache memory for

    processing. VPLEX currently has a fixed 4KB page size and IO in this size range is not

    appreciably impacted by the introduction of VPLEX.

    Understanding how your application is going to handle its IO is critical to laying out

    the data properly at the storage layer. In many cases, the table space is generally a

    large file made up of small blocks of data records. The records are normally

accessed using small IOs of a random nature, which can result in a high cache miss ratio. It is important to ensure the backend storage array is able to keep up with the IOPS requirement as well.

    Another point to consider is whether the typical IO is a read or a write. In many OLTP

    environments, there is generally a mix of about 70% reads and 30% writes. However,

    the transaction logs of a database application have a much higher write ratio and,


    therefore, perform better if they are isolated onto dedicated storage volumes.

VPLEX's large read cache benefits the read portion of this sort of workload, but the log volumes will likely not benefit from cache and therefore need underlying storage volumes that can keep pace with the write workload.
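One simplified way to reason about this is to treat read response time as a weighted average of cache-hit and cache-miss service times, while writes (such as log traffic) always travel to the back end. The sketch below uses purely illustrative latencies and hit rates, not measured VPLEX figures.

# Sketch: weighted-average read latency for cache-unfriendly vs. cache-friendly reads.
# All latencies and hit rates are illustrative placeholders, not VPLEX measurements.
def avg_read_latency_ms(hit_rate, cache_hit_ms, backend_read_ms):
    return hit_rate * cache_hit_ms + (1.0 - hit_rate) * backend_read_ms

table_space = avg_read_latency_ms(hit_rate=0.30, cache_hit_ms=0.5, backend_read_ms=8.0)
hot_reads   = avg_read_latency_ms(hit_rate=0.90, cache_hit_ms=0.5, backend_read_ms=8.0)

print(f"random table-space reads (low hit rate): {table_space:.2f} ms")
print(f"cache-friendly reads (high hit rate):    {hot_reads:.2f} ms")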

    High Throughput environments

With high throughput workloads, you have fewer transactions, but much larger IO per transaction. IO sizes of 128 KB or greater are normal, and these IOs are generally

    sequential in nature. Applications that typify this type of workload are imaging,

    video servers, seismic processing, high performance computing (HPC), and backup

    servers.

When running applications that use larger I/O sizes, it is important to be aware of the extra IO impact that VPLEX will add as a result of breaking up write IOs that are larger

    than 128KB. For example, a single 1MB host write would require VPLEX to do 8 x 128KB

    writes out to the backend storage frame. When practical, maximum host and

application IO size and allocation units for high throughput systems should be set to 128KB or less. An increase in the maximum back-end write size to 1MB is expected in the next major VPLEX code release.
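A quick way to estimate the extra back-end operations caused by this splitting is shown below; the 128KB limit is taken from the text above, and the host I/O sizes are simply examples.

# Sketch: back-end write count when VPLEX splits host writes larger than 128KB.
import math

MAX_BE_WRITE_KB = 128          # current maximum back-end write size noted above

def backend_writes(host_io_kb):
    return math.ceil(host_io_kb / MAX_BE_WRITE_KB)

for size_kb in (64, 128, 256, 1024):
    print(f"{size_kb:>5} KB host write -> {backend_writes(size_kb)} back-end write(s)")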

    Best practice: Database table spaces, journals, and logs should not be placed on

    virtual volumes that reside on extents from the same backend storage volume.

    VPLEX Device Geometry

    The typical consumption methodology for back-end storage for the majority of

    applications is to create 1:1 mapped (physical:virtual) devices. If striping (raid-0) or

concatenated (raid-c) geometries are used, then VPLEX devices should be constructed using storage volumes of similar raid protection and performance

    characteristics. This general purpose rule is applicable to most VPLEX backend

    storage configurations and simplifies the device geometry decision for storage

    administrators. This type of physical storage consumption model enables the

    continued use of array based snap, clones, and remote replication technologies (for

example, MirrorView, SnapView, SRDF, and TimeFinder).

    It is also important to consider where the failure domains are in the backend storage

    frames and take them into consideration when creating complex device

    geometries. In this context, we define a failure domain  as the set of storage

    elements that will be affected by the loss of a single storage component. We

strongly advise against the creation of VPLEX devices consisting of striped or concatenated extents across different backend storage frames. Using different sets

    of back-end storage frames makes the failure domain wider and makes it more

    susceptible to being affected by a single failure. This can also unbalance the I/O,

    and will limit the performance of those striped volumes to the slowest back-end


    device. It is acceptable to use striped (raid-0) volumes with applications and storage

    frames that do not already stripe their data across physical disks.


    Section 5: Back-end Performance Considerations

    Storage Considerations

    It is of great importance that the selected storage subsystem model is able to support

    the required IO workload. Besides availability concerns, adequate performance

must be ensured to meet the requirements of the applications. This includes evaluating the physical drive type (EFD, FC, SATA) used and whether the internal architecture of the storage subsystem is sufficient. For example, high speed Fibre Channel 15K rpm drives or Enterprise Flash drives are typically selected for use with

    transaction-based (OLTP) workloads. As for the subsystem architecture, newer

    generations of storage subsystems have larger internal caches, higher bandwidth

    busses, and more powerful storage controllers.

    Storage Array Block Size

Today VPLEX supports communication with back-end storage arrays that advertise a 512 byte block size. Within VPLEX, the block-size parameter that you see for a storage-volume is not the underlying storage array's supported block size, but rather the 4KB block size that VPLEX associates with the volume. Every volume reported by VPLEX today will show a 4KB block size. This has the implications for host-to-VPLEX IO size that were discussed in Section 4.

    Note: VPLEX can and does read/write to back-end arrays at I/O sizes as

    small as 512 bytes as of GeoSynchrony 5.0. 

    SAN Architecture for Storage Array Connectivity

For back-end (storage) connectivity, the recommended SAN topology consists of redundant (A/B) fabrics. Though EMC does support the use of direct storage to

    VPLEX connectivity, this practice is extremely limited in terms of cost efficiency,

    flexibility and scalability. Direct connect is intended for proof of concept, test,

    development, and / or specific sites that only have a single storage frame. Direct

    connect allows for backend connectivity while reducing the number of required

    switch ports, but as mentioned earlier, the sacrifices in terms of scale and flexibility

    make this a fairly uncommon connectivity scheme. Sites with multiple arrays, existing

    SAN fabrics, or large implementations should plan to utilize dual redundant SAN

    connectivity as it provides the most robust overall solution.

Note: Direct connect applies only to backend connectivity. Front-end (direct host to VPLEX) connectivity is not supported.


    Active/Active Arrays

    With Active/Active storage platforms such as EMC VMAX and Symmetrix, Hitachi VSP,

IBM XIV, and HP 3PAR, each director in a VPLEX cluster must have a minimum of two

    paths to every local back-end storage array and to every storage volume presented

    to VPLEX. Each VPLEX director requires physical connections to the back-end

    storage across dual fabrics. Each director is required to have redundant paths to

    every back-end storage array across both fabrics. Otherwise this would create a

    single point of failure at the director level that could lead to rebuilds that

    continuously start/restart and never finish. This is referred to as asymmetric backend

    visibility. This is detrimental when VPLEX is mirroring across local devices (RAID-1) or

    across Distributed Devices (Distributed RAID-1).

    Each storage array should have redundant controllers connected to dual fabrics,

with each VPLEX director having a minimum of two ports connected to the back-

    end storage arrays through the dual fabrics (required).

    VPLEX allows a maximum of 4 back-end paths per director to a given LUN. This is

    considered optimal because each director will load balance across the four paths to

the storage volume. It is a maximum because using more paths to any given storage volume (each path is an Initiator, Target, LUN, or ITL, nexus) could lead to an excess of ITL nexuses per storage volume, resulting in the inability to claim or work with the device.

    Exceeding 4 paths per storage volume per director can lead to elongated backend

path failure resolution, NDU pre-check failures, and decreased scalability.

High quantities of storage volumes (i.e., 1000+ storage volumes) or entire arrays provisioned to VPLEX should be divided up into appropriately sized groups (i.e.,

    masking views or storage groups) and presented from the array to VPLEX via groups

    of four array ports per VPLEX director so as not to exceed the four active paths per

    VPLEX director limitation. As an example, following the rule of four active paths per

storage volume per director (also referred to as ITLs), a four engine VPLEX cluster could have each director connected to four array ports dedicated to that director.

    In other words, a quad engine VPLEX cluster would have the ability to connect to 32

    ports on a single array for access to a single device presented through all 32 ports

    and still meet the connectivity rules of 4 ITL’s per director. This can be accomplished

    using only two ports per backend I/O module leaving the other two ports for access

    to another set of volumes over the same or different array ports.
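The arithmetic behind this example can be written out as follows; the directors-per-engine and paths-per-director values come from the connectivity rules above, and the rest is simple multiplication.

# Sketch: ITL (Initiator-Target-LUN) counts implied by the 4-paths-per-director rule.
DIRECTORS_PER_ENGINE = 2
PATHS_PER_DIRECTOR_PER_VOLUME = 4      # VPLEX maximum per storage volume per director

for engines in (1, 2, 4):
    directors = engines * DIRECTORS_PER_ENGINE
    itls_per_volume = directors * PATHS_PER_DIRECTOR_PER_VOLUME
    print(f"{engines}-engine cluster: {directors} directors, "
          f"{itls_per_volume} ITLs (array paths) per storage volume")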

    Appropriateness would be judged based on the planned total IO workload for the

    group of LUNs and the limitations of the physical storage array. For example, storage

    arrays often have limits around the number of LUNs per storage port, storage group,

    or masking view they can have.

    Maximum performance, environment wide, is achieved by balancing IO workload

across the maximum number of ports on an array while staying within the ITL limits.

    Performance is not based on a single host but the overall impact of all resources


    being utilized. Proper balancing of all available resources provides the best overall

    performance.

    Storage Best Practices: Create separate port groups within the storage frame for

    each of the logical path groups that have been established. Spread each group

    of four ports across storage array engines for redundancy. Mask devices to allow

    access to the appropriate VPLEX initiators for both port groups.

    Figure 12 shows the physical connectivity from a quad-engine to a hex-engine VMAX

    array.

    Figure 12 –  Active/Active Storage to VPLEX Connectivity


    Similar considerations should apply to other active/active arrays as well as ALUA

    arrays. Follow the array best practices for all arrays including third party arrays.

The devices should be provisioned in such a way as to create "digestible" chunks

    and provisioned for access through specific FA ports. The devices within this device

    grouping should restrict access to four specific FA ports for a VPLEX A director port

    group and a different set of FA ports for a VPLEX B director port group.

    The VPLEX initiators (backend ports) on a single director should spread across engines

    to increase HA and redundancy. The array should be configured into initiator groups

    such that each VPLEX director acts as a single host per four paths.

    This could mean four physical paths or four logical paths per VPLEX director

    depending on port availability and whether or not VPLEX is attached to dual fabrics

    or multiple fabrics in excess of two.

For the example above, the following basic limits apply on the VMAX:

• Initiator Groups (IG) (HBAs): max of 32 WWNs per IG; max of 8192 IGs on a VMAX; set port flags on the IG; an individual WWN can only belong to 1 IG.
• Cascaded Initiator Groups have other IGs (rather than WWNs) as members.
• Port Groups (PG) (FA ports): max of 512 PGs; the ACLX flag must be enabled on the port; ports may belong to more than 1 PG.
• Storage Groups (SG) (LUNs / Symm Devs): max of 4096 Symm Devs per SG; a Symm Dev may belong to more than 1 SG; max of 8192 SGs on a VMAX.
• A Masking View consists of an Initiator Group, a Port Group, and a Storage Group.

    We have divided the backend ports of the VPLEX into two groups allowing us to

    create four masking views on the VMAX. Ports FC00 and FC01 for both directors are

zoned to two FAs each on the array. The WWNs of these ports are the members of the first Initiator Group and will be part of Masking View 1. The Initiator Group created with this group of WWNs will become the member of a second Initiator Group, which will in turn become a member of a second Masking View. This is called Cascading Initiator Groups. This was repeated for ports FC02 and FC03, placing them in Masking

    Views 3 and 4. This is only one example of attaching to the VMAX and other

    possibilities are allowed as long as the rules are followed.

    VPLEX virtual volumes should be added to masking views containing initiators from a

    director A and  initiators from a director B. This translates to a single host with two

    initiators connected to dual fabrics and having four paths into two VPLEX directors.

VPLEX would access that host's storage volumes via eight FAs on the array through two VPLEX directors (an A director and a B director). The VPLEX A director and B director each see four different FAs across at least two VMAX engines if available. This is an optimal configuration that spreads a single host's I/O over the maximum number of array ports. Additional hosts will attach to different pairs of VPLEX directors in a dual-engine or quad-engine VPLEX cluster. This will help spread the overall environment I/O workload over more switches, VPLEX, and array resources. This


    would allow for the greatest possible balancing of all resources resulting in the best

    possible environment performance.

    Figure 13 ITL per Storage Volume

    Figure 13 shows the ITLs per Storage Volume. In this example the VPLEX Cluster is a

    single engine and is connected to an active/active array with four paths per Storage

Volume per Director, giving us a total of eight logical paths. The Show ITLs button displays the ports on the VPLEX director from which the paths originate and which FA

    they are connected to.

    Active/ Passive Arrays

When using a storage array that operates in an active-passive model, each director

    needs to have logical (zoning and masking) and physical connectivity to both the

    active and the passive storage controller. This ensures that VPLEX does not lose

access to storage volumes if the active controller should fail or is restarted.

    Additionally, arrays like the CLARiiON® have limitations on the size of initiator or

    storage groups. It may be necessary to have multiple storage groups to

    accommodate provisioning storage to the VPLEX. Follow the logical and physical

    connectivity guidelines described earlier in this section.
