
SRDF Topology Discussion Document for Deutsche Bank



James Ridley / Jan Jedynak, EMC Corporation, Version 1.2



Contents

- Introduction
- Executive Overview
- SRDF Overview
  - What is SRDF?
  - Why SRDF?
  - How does SRDF work?
  - SRDF Storage Protocol used by Deutsche Bank
  - Where SRDF is used at Deutsche Bank
- DWDM Technology
  - Overview
  - Nortel Networks OPTera Metro
- Latency
  - SRDF induced delays
  - Example
- Recommendations for Handling High Activity Data
- Types of applications that may not be suited to synchronous replication
- SRDF best practices in use at Deutsche Bank
- Alternative strategies
  - 1) SRDF Semi-Synchronous mode
  - 2) SRDF Adaptive Copy mode
  - 3) SRDF Multi-Hop mode
  - 4) Oracle8I Automated Standby Database
- Conclusion



Introduction

Recent engineering work by EMC has validated the Nortel Optera DWDM (Dense Wavelength Division Multiplexer) for use with EMC SRDF at distances of up to 200KM. This document explains which applications can benefit from this extended distance mirroring and which cannot. It also offers alternative solutions to protect data where the application cannot support extended distance mirroring.



Executive Overview

EMC Engineering has recently validated the Nortel Optera DWDM for use with EMC SRDF up to 200KM.

DWDM technology allows for the ‘packaging up’ of multiple SRDF links onto a smaller number of physical telecomms fibre cables, thus reducing the number of telecomms fibre cables required without reducing bandwidth or efficiency. This leads to greatly reduced costs.

The Nortel Optera DWDM is not the DWDM currently selected by Deutsche Bank ‘New World’.

It is envisaged that this 200KM distance will be increased very shortly after further validation by EMC engineering.

EMC SRDF works by forwarding write IO received from a host by one Symmetrix on to a second, remote Symmetrix. This is done transparently to the host, which only ‘sees’ a slightly slower write IO.

During normal non-BCP operation, only write IO is sent to the remote Symmetrix. As a general rule of thumb, an application does 90-95% reads and only 5-10% writes.

In order to calculate the additional latency we have to add the fixed ‘overhead’ of writing to 2 Symmetrix units, as opposed to 1, plus a variable value that depends on the distance to be replicated. The variable value is governed by the distance and, ultimately, by the speed of light in fibre, and it is not envisaged that EMC engineering will be able to improve on this in the near future. The DWDM units and Connectrix switch units required have negligible overhead. This overhead applies to every write IO.

Applications with heavy write IO, for example during batch runs, may experience ‘IO Queuing’ as write IOs queue to be sent across the SRDF link to the second Symmetrix.

Various host best practices can greatly reduce the potential for IO Queuing, and these are in use at Deutsche Bank.

SRDF can operate in a number of modes, and these modes can be interactively switched on a very granular basis. These different modes can again greatly reduce or even eliminate IO Queuing.

In addition, SRDF in conjunction with EMC TimeFinder can be architected into various ‘multi-hop’ and ‘time sync enabled’ solutions.

In summary, SRDF can be extended across greater distances than previously possible using DWDM technology, but this extra distance is not without cost to the efficiency of the applications using it. The service levels required and the IO profile of the application must be examined in detail to see if it is practical to use SRDF over extended distances in synchronous mode. If the application is not suitable for synchronous mode SRDF over the distance, there are other architected solutions available which may provide the required level of protection.



SRDF Overview

What is SRDF?

SRDF generates a mirror image of the data at the logical volume level in one or more remote Symmetrix systems. These remote volumes can be made addressable to remote hosts via software commands. SRDF Synchronous mode (which is the default mode of operation at Deutsche Bank) was first developed for Disaster Recovery within the customer’s campus. SRDF Adaptive Copy modes were later developed to support long distance bulk data transfers for data centre relocations and content replication. Technology has evolved to support Wide Area Networking (WAN) and multiple transports, thus increasing distance and throughput for a wider variety of applications of SRDF. Additional customer uses for SRDF include remote data warehousing, remote test beds, remote report generation, remote backup and workload sharing between hosts at the same or geographically remote sites.

Why SRDF?

SRDF is deployed in several key areas, delivering real benefits by allowing companies to maintain access to data, so that revenue producing or supporting applications continue to serve business functions. SRDF can be used in several key areas including, but not limited to:

- Business continuance: business applications continue running despite possible disk failures.
- Disaster recovery: data recovery at the disaster recovery site in minutes rather than days.
- Data centre migrations: application outage reduced to minutes instead of hours.
- Work load migrations: similar to data centre migrations; especially useful for minimizing outages during preventative maintenance of hardware or software, or even data centre powerdowns.
- Shortening or eliminating backup windows: eliminate the backup window by utilizing SRDF’s second data copy.

How does SRDF work?

SRDF works in 3 different modes: synchronous, semi-synchronous, and adaptive copy.

- Synchronous. Data on the source (R1) and target (R2) volumes are always fully synchronized at the completion of an I/O sequence.

- Semi-synchronous. Data on remotely mirrored volumes are always synchronized between the source (R1) and the target (R2) prior to initiating the next write operation to these volumes.

- Adaptive copy. Adaptive Copy modes transfer data from the source (R1) volume to the target (R2) volume and do not wait for receipt acknowledgment and synchronization to occur.



SRDF writes are from cache to cache: data is written from local Symmetrix cache to remote Symmetrix cache over the SRDF link, and in synchronous mode the production Symmetrix waits for an acknowledgement from the remote Symmetrix before the write IO is treated as complete.
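To make the practical difference between these modes concrete, the short sketch below models the host-visible write service time in each mode. It is a rough illustration only: the timing constants, the two-round-trip assumption and the function names are assumptions for this document, not EMC implementation details.

```python
# Rough, illustrative model of host-visible write latency per SRDF mode.
# All constants are assumptions for illustration only, not EMC figures.

LOCAL_CACHE_WRITE_MS = 0.5   # assumed time to land a write in the local Symmetrix cache
ONE_WAY_LINK_MS = 0.5        # assumed one-way propagation delay on the SRDF link

def host_write_response_ms(mode: str, round_trips: int = 2) -> float:
    """Approximate time before the host sees ending status for one write IO."""
    remote_delay_ms = round_trips * (2 * ONE_WAY_LINK_MS)
    if mode == "synchronous":
        # Ending status is withheld until the remote Symmetrix cache acknowledges.
        return LOCAL_CACHE_WRITE_MS + remote_delay_ms
    if mode == "semi-synchronous":
        # Ending status returns after the local cache write; the remote transfer
        # must complete before the *next* write to the same volume is accepted.
        return LOCAL_CACHE_WRITE_MS
    if mode == "adaptive-copy":
        # Ending status returns immediately; data is queued and sent later, so the
        # R1 and R2 volumes may be several tracks out of synchronization.
        return LOCAL_CACHE_WRITE_MS
    raise ValueError(f"unknown SRDF mode: {mode}")

for m in ("synchronous", "semi-synchronous", "adaptive-copy"):
    print(f"{m:17} -> ~{host_write_response_ms(m):.1f} ms host-visible write time")
```

The choice between these modes underpins the alternative strategies discussed later in this document.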

SRDF Storage Protocol used by Deutsche Bank

SRDF at Deutsche Bank uses a storage protocol based upon either the ESCON™ or Fibre Channel FC-4 specifications to remotely mirror data between Symmetrix units. The host attachment, I/O protocol, and disk data structures required by each host are independent of the SRDF operation between Symmetrix units. All existing production implementations at Deutsche Bank use ESCON, though all future implementations, including the new datacentre at Hayes, will use Fibre Channel.

The benefits of SRDF over Fibre Channel Point-to-Point include increased SRDF throughput for all host types and increased connectivity options for Open Systems. In addition, Fibre Channel maintains a peer-to-peer relationship as opposed to the ESCON channel and control unit relationship used at the ESCON RA director level. This increases the flexibility of SRDF in cases where it is desired to have primary and secondary volumes located at each side of the SRDF link.

Where SRDF is used at Deutsche Bank

SRDF is deployed between all the major MERs in the London campus, in point-to-point configurations.



DWDM Technology

Overview

Dense Wavelength Division Multiplexing (DWDM) is a process in which multiple individual channels of data are carried at different wavelengths over one pair of fiber links. This contrasts with conventional fiber optic systems, in which just one channel is carried over a fiber pair.

For EMC customers this means that multiple SRDF channels and server channels can be transferred over one pair of fiber links along with traditional network traffic! This is especially important in locations where fiber links are at a premium. For example, a customer may be leasing fiber, so the more traffic they can run over a single link, the more cost effective the solution. With today’s technology, the capacity of a single pair of fiber strands is virtually unlimited. The limitation comes from the DWDM itself. Optical to electrical transfers for switching and channel protection are required and limit the input traffic per channel.

SRDF over Fibre Channel does not currently support direct connections between RF directors using WDM or DWDM unit port connections, due to performance limitations and the relatively variable latencies of such links over long distances.

DWDM units, however, are supported for SRDF traffic via ISL connections using Fibre Channel switches such as the Connectrix family of Fibre Channel switches.

Nortel Networks OPTera Metro

High capacity is inherent in Nortel Networks’ OPTera Metro DWDM (Dense Wavelength Division Multiplexing) solution. Each wavelength can support up to 2.5Gb/s, while 32 or more such wavelengths can be multiplexed onto a single fiber. The resulting aggregate supports capacities of 80Gb/s, providing high capacity trunks between network elements.



Nortel Networks OPTera Metro provides the ability to route wavelengths, and therefore has the same survivability capabilities as current TDM rings when deployed in a ring topology. OPTera Metro provides a reliable DWDM platform for enterprises with large-scale connectivity requirements. OPTera’s transparent capabilities enable these enterprises to control the cost and management requirements of connectivity, ensure network integrity, increase network robustness, and easily accommodate emerging communications protocols.

Features and Benefits

• Support of SONET/SDH and non-SONET/SDH interfaces
• Protocol and bit-rate independence
• 32 protected wavelengths, 64 unprotected wavelengths
• Per-wavelength flexible protection switching
• Scalable from 16 Mbps to 2.5 Gbps per wavelength
• Point-to-point and survivable ring up to 120km
• In-band, per-wavelength Optical Service Channel
• Point-and-click GUI management system
• Open systems management platform
• NEBS and ETSI compliant



Latency

SRDF induced delays

Synchronous or even semi-synchronous mirroring of data can impact customer workloads. The impact on any given workload will vary according to:

- The blocksize of the data being remote mirrored
- The distance over which the remote mirroring is being done
- The remote mirroring mode used (e.g. synchronous, semi-synchronous, adaptive copy)
- The type of connection between the source and target Symmetrix units
- The arrival rate of the write IOs at the source Symmetrix

The degree to which a customer workload is impacted by delays induced by SRDF mirroring will vary not only with the amount of the delay, but also with the nature of the workload. Some workloads will not be impacted, because the components with extended response times are critical only for recovery. Other workloads could be severely impacted if the affected component is on the critical path for end user transaction response time (e.g. an increase in response time to the online Redo logs in an Oracle environment will invariably cause end user transaction response time to degrade).

In order to approximate the amount of delay likely to be introduced by SRDF’ing the data for any given workload, one should:

- Determine the type of SRDF implementation that is likely to be installed.

- Calculate the propagation delay induced by the link: multiply the round trip link distance in kilometres by 0.005 msec/km, and then by 3 if campus ESCON is to be used, by 1 if a telco link (e.g. T3, ATM, etc) is to be used, or by 2 for SRDF over Fibre Channel. To this it will be necessary to add an allowance for protocol time within both the source and target Symmetrix, as well as allowances for delays induced by protocol converters, network equipment, etc.

- Add the approximated SRDF link delay times to the current or anticipated non SRDF’ed IO response times.

- Determine the likely impact on the customer workload, remembering that the impact will inevitably follow Little’s Law1.

1 Little’s Law is the basis upon which a lot of queuing theory is built. In general terms, Little’s Law relates the average queue length (Q) to the arrival rate of transactions (a) and the average response time (R). Specifically, Little’s Law states:

Q = a * R. Consequently, it can be seen that any increase in IO response time may well cause a significant blowout in the queue length within the application, which may or may not be supportable from a customer business perspective.
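As a rough worked sketch of the estimation steps above: the 0.005 msec/km figure, the per-link-type multipliers and the Little's Law relationship are taken from the text, while the function names, parameters and any overhead allowances are illustrative assumptions.

```python
# Illustrative sketch of the SRDF latency estimation steps described above.
# The 0.005 msec/km figure and the per-link-type multipliers (x3 campus ESCON,
# x1 telco, x2 Fibre Channel) come from the text; everything else is an assumption.

LINK_MULTIPLIER = {"campus_escon": 3, "telco": 1, "fibre_channel": 2}

def srdf_link_delay_ms(one_way_km: float, link_type: str) -> float:
    """Propagation delay per write IO: round-trip km x 0.005 msec/km x link multiplier."""
    round_trip_km = 2 * one_way_km
    return round_trip_km * 0.005 * LINK_MULTIPLIER[link_type]

def estimated_write_response_ms(baseline_ms: float, one_way_km: float,
                                link_type: str = "fibre_channel",
                                other_overheads_ms: float = 0.0) -> float:
    """Current (non-SRDF'ed) response time plus the SRDF link delay plus any allowance
    for Symmetrix protocol time, protocol converters, network equipment, etc."""
    return baseline_ms + srdf_link_delay_ms(one_way_km, link_type) + other_overheads_ms

def littles_law_queue_length(arrivals_per_sec: float, response_ms: float) -> float:
    """Q = a * R, with the response time converted to seconds."""
    return arrivals_per_sec * (response_ms / 1000.0)
```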



Example

This document concentrates on SRDF over Fibre Channel. Write IO is transmitted using SCSI over Fibre Channel, and according to the SCSI protocol every IO to be transmitted actually requires 2 round trips: the first carries the SCSI command word (for SRDF this will be WRITE), to which the remote Symmetrix returns an acknowledgement; the second carries the actual data, followed by the acknowledgement from the remote Symmetrix that the data has been written to cache and confirmed. This leads to the ×2 propagation delay described above.

[Figure omitted: host response time without SRDF (Baseline) and the overhead of running SRDF over zero distance (Campus), for 4K and 27K blocksizes; the chart shows values of 2.1ms and 3.9ms.]

Working through a 4K blocksize example, we have a 2.0ms host response time at zero distance. Add to this a 100KM separation – the approximate distance from London to Milton Keynes. With two round trips per write, the propagation delay is (100KM + 100KM + 100KM + 100KM) * 0.005 = 2.0ms, giving a total of approximately 4ms response time per write IO.
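The same arithmetic, as a minimal self-contained snippet:

```python
# Reproducing the 4K blocksize example above: two round trips over 100KM each way.
baseline_ms = 2.0                                   # host response with SRDF at zero distance
link_delay_ms = (100 + 100 + 100 + 100) * 0.005     # = 2.0 ms of propagation delay
print(baseline_ms + link_delay_ms)                  # -> 4.0 ms per write IO
```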

Heavy write activity on 1 volume may mean that IOs are queued waiting for the previous IO to be acknowledged by the remote Symmetrix, and so you may see IO elongation, with IOs queuing behind other IOs (see Little’s Law above).

Note: there is no significant latency through switches or DWDMs.




Recommendations for Handling High Activity Data

As a general rule of thumb, and depending on the nature of the application being supported, the distance over which the data is to be remote mirrored, etc, in order to ensure acceptable overall IO response times it is desirable that no single logical volume involved in a remote mirroring relationship be required to handle more than 100 write IOs/sec at 200KM. This figure is derived from the maximum number of IOs that a logical volume can sustain at that distance (4K blocksize – max 175 write IOs per second; 27K blocksize – max 125 write IOs per second). It must be remembered that only 1 IO for a given volume can be in the SRDF ‘pipe’ at a time, though IOs for different volumes can be in the ‘pipe’ at the same time.
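These per-volume ceilings follow roughly from the per-IO service time: with only one write per volume outstanding on the SRDF link, a volume can sustain at most 1000ms divided by its write service time. The sketch below is a hedged approximation; the zero-distance baselines are taken from the chart in the Example section, and the results approximate rather than exactly reproduce the figures quoted above.

```python
# Rough upper bound on write IOs/sec for a single logical volume at a given distance,
# given that only one write per volume can be outstanding on the SRDF link at a time.
# Baseline figures are assumptions taken from the zero-distance (Campus) chart earlier.

def max_write_iops_per_volume(baseline_ms: float, one_way_km: float) -> float:
    link_delay_ms = (2 * one_way_km) * 0.005 * 2   # round-trip km x 0.005 msec/km x 2 (Fibre Channel)
    service_time_ms = baseline_ms + link_delay_ms
    return 1000.0 / service_time_ms

print(max_write_iops_per_volume(baseline_ms=2.1, one_way_km=200))  # ~164 write IOs/sec (4K)
print(max_write_iops_per_volume(baseline_ms=3.9, one_way_km=200))  # ~127 write IOs/sec (27K)
```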

In order to reduce the IO rate to any given logical volume to this sort of level, it may be necessary to implement some of the following.

- Wherever possible, high activity data should be spread over as many logical volumes as possible, so as to reduce the overall IO rate per volume, i.e. host level striping.

- If possible, increase host level buffering and blocksizes so as to reduce the number of IOs done by the application.

- When dealing with high activity IO caused by large, single address space tasks (e.g. database control regions, etc), it may be necessary to break the tasks into multiple smaller tasks, so as to reduce the amount of data generated on a per region basis to more manageable levels. This is a non-trivial task, as it may have significant impact on the customer’s application architecture, and will require significant involvement from customer personnel such as Data Base Analysts, etc.

- If necessary, re-design the application so as to achieve the desired IO rate on a per volume basis.



Types of applications that may not be suited to synchronous replication.

1) Database applications which exhibit very high transaction throughput and therefore a high number of log writes.

2) Database applications that have a high transaction rate and perform an excessive number of Consistency Point operations (perhaps as a result of frequent log switch operations).

3) Applications which exhibit high volumes of I/O writes.

4) Applications that are highly sensitive to synchronous write I/O performance (non-buffered synchronous writes).

5) Any highly time-bound, write intensive application process where any elongation of write I/O would impact application performance.



SRDF best practices in use at Deutsche Bank

Various best practices can reduce the impact of IO Queuing and IO elongation.

The simplest is to make sure that all filesystems are built on host level striped volumes. The reason for this is that the SRDF 'pipe' or queue can only have 1 IO for a Symmetrix volume going across it at any time. The pipe can contain more than 1 IO, but not for the same Symmetrix volume. By creating a striped volume set at the host level you get 2 immediate effects when the host writes an IO. If we were to write IOs to a striped filesystem spread over 4 Symmetrix volumes then the 2 benefits would be:

1) the host knows it is writing to a striped set and issues more IOs to the disk subsystem, as it knows it is actually writing to 4 volumes

2) more IOs can go across the SRDF 'pipe' to the remote Symmetrix, as the IOs are destined for 4 Symmetrix volumes rather than just 1. This reduces queuing for the pipe.

Host level LVM striping is being used as a best practice by nearly all projects based on EMC Symmetrix.
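A rough sketch of why striping helps: with only one write per Symmetrix volume allowed in the SRDF 'pipe' at a time, spreading a filesystem over N volumes allows up to N writes in flight, multiplying the sustainable write rate. The per-write service time used below is an assumption for illustration only.

```python
# Illustrative only: aggregate write rate for a filesystem striped over N Symmetrix
# volumes, assuming one outstanding SRDF write per volume and a fixed per-write
# service time. The 6.0 ms figure is an assumption, not a measurement.

def filesystem_write_iops(n_volumes: int, per_write_service_ms: float) -> float:
    per_volume_iops = 1000.0 / per_write_service_ms   # one IO per volume in the 'pipe' at a time
    return n_volumes * per_volume_iops

print(filesystem_write_iops(1, 6.0))   # ~167 writes/sec on a single volume
print(filesystem_write_iops(4, 6.0))   # ~667 writes/sec striped across 4 volumes
```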



Alternative strategies

The latency overhead can also be masked from the user if an alternative replication strategy is adopted, namely Semi-Synchronous or Multi-Hop replication.

Another strategy would be combining the benefits of SRDF with an Oracle automated standby database. This solution requires only that the online redo logs be synchronously replicated, thus drastically reducing communication needs.

The following strategies could help alleviate latency overhead with SRDF deployed over extended distances.

1) SRDF Semi-Synchronous mode

This is used primarily in extended distance environments. In this mode of operation, data on the remotely mirrored volumes are always synchronized between the source (R1) volume and the target (R2) volume prior to initiating the next write operation to these volumes. The sequence of operations is:

1. An I/O write is received from the host/server into the cache of the source.
2. An ending status is presented to the host/server.
3. The I/O is transmitted to the cache of the target.
4. A receipt acknowledgment is provided by the target back to the cache of the source.

Semi-Synchronous mode masks the impact of distance in the general case, because it allows read operations while write operations are in transit. SRDF uses a first-in, first-out queue.



2) SRDF Adaptive Copy mode

SRDF Adaptive Copy mode is used primarily for data migrations and data centre moves. This operational mode is not recommended for use when mirroring for disaster recovery.

SRDF Adaptive Copy mode allows the source (R1) volumes and target (R2) volumes to be a few or many I/Os out of synchronization. The number of tracks out of synchronization (skew) is user selectable.

There are two types of adaptive copy: Write Pending mode and Disk mode. The sequence of operations is:

1. An I/O write is received from the host/server into the cache of the source Symmetrix.
2. The I/O is acknowledged as completed to the host/server.
3. The I/O is placed in the SRDF queue.
4. The I/O is de-staged from cache to the source (R1) volume, and an issue request is sent to the SRDF link.
5. The I/O is transmitted to the cache of the target.
6. A receipt acknowledgment is provided by the target back to the cache of the source.

Adaptive Copy Write Pending mode allows the transmission to take place before the data is de-staged from cache to the R1 disk volumes.

Adaptive Copy Disk mode de-stages the data from the cache to the R1 volume and then keeps track-level information as to what data is owed to the remote side so that information can be subsequently sent a track at a time.

SRDF Adaptive Copy mode is used primarily for data migrations, data center moves, and in conjunction with SRDF over Internet Protocol (IP) links. This mode of operation also can be used in an SRDF Multi-Hop configuration to mirror TimeFinder Business Continuance Volumes (BCVs)/R1 changed tracks between the intermediate target site and the final (Multi-Hop) target site.

N.B. The threshold for how far out of sync the volumes are allowed to be is selectable by the user with the “skew” command.



3) SRDF Multi-Hop mode

TimeFinder software works by configuring multiple, independently addressable online Business Continuance Volumes (BCVs) for information storage. The BCV is a Symmetrix device with special attributes created when the Symmetrix is configured. It can function either as an additional mirror to a Symmetrix logical volume or as an independent, host-addressable volume. Establishing BCV devices as mirror images of active production volumes allows you to run multiple simultaneous business continuance tasks in parallel. The principal device, known as the standard device, remains online for regular Symmetrix operation from the original production server. Each BCV contains a unique host address, making it accessible to a separate backup/recovery server. When you establish a BCV as a mirror of a standard device, that relationship is known as a BCV pair. The BCV is temporarily inaccessible to its host until you split the BCV pair.

The multi-hop restart solution is applicable when you want zero data loss in the event of a disaster at the local site. Zero data loss means that the state of the data at the Hop 2 restart site (after being propagated from the Hop 1 bunker site) is the same as it was at the local source site at the beginning of a rolling disaster.

Automated replication with the BCVs at Hop 2 is applicable if you want a zero data loss solution but cannot risk the loss of both the local source site and the Hop 1 bunker site at the same time. With this configuration, there are two possible disaster restart scenarios:

- If only the local source site is lost, the result is zero data loss at the Hop 2 restart site.

- If both the local source site and the Hop 1 bunker site are lost, the result is a DBMS restartable copy at the Hop 2 restart site with controlled data loss. The amount of data loss will be a function of the replicate copy cycle time between the Hop 1 bunker site and the Hop 2 restart site.



[Diagram omitted: SRDF Multi-Hop configuration – the Local EMC Symmetrix (R1) replicates to a Hop 1 Symmetrix (R2, with a BCV acting as R1), which in turn replicates to a Hop 2 Symmetrix (R2 with BCV).]

SRDF Multi-Hop is another approach to the issues introduced by distance-based latency. Here, TimeFinder is used to create a point-in-time BCV of the production volume. SRDF Multi-Hop then treats the BCV as an R1, or source, device. Its R2 target is at the other end of the link. In Multi-Hop scenarios, the links between the first location and the intermediate location are run synchronously. The TimeFinder software then performs the splits described above. The links between the intermediate site and the distant site usually run in Adaptive Copy mode because of the latency issues. Multi-Hop is the best of both worlds: fully synchronous for performance between sites A and B, but Adaptive Copy to keep line costs down between B and C, the disaster recovery site.



4) Oracle8I Automated Standby Database

The automated standby database is one of the prime solutions to ensure business continuity after a disaster. It achieves this with reduced inter-site traffic by shipping only Archived Redo Logs. In the event of a disaster, a standby database can take over the processing and data serving responsibility from the primary database, providing near continuous database availability. The Oracle8I automated standby database and SRDF provide the means to create and automatically maintain one or more copies of a Production database to protect against disasters. A standby database is initially created by copying, or cloning, the Production database at a remote site. Archived Redo Logs are copied by SRDF to the remote site. The standby database begins managed recovery when the next archived log generated by the Primary database is applied in managed recovery mode.


[Diagram omitted: Primary DB with On-Line Redo Logs and Archived Redo Logs; the Archived Redo Logs are copied over the SRDF link to the Failover DB site, where the logs are applied to the standby database.]


Conclusion

EMC Engineering has validated the Nortel Optera DWDM for use with EMC SRDF up to 200KM in a point-to-point configuration.

For Deutsche Bank to replicate data in a Synchronous copy mode between sites, careful consideration must be given as to whether the nature and characteristics of the application are suited to a Synchronous copy mode configuration, or whether the application user response times will be adversely affected by the latency issues described in this document.

If an application or its components exhibit high I/O writes or high transaction rates, then alternative SRDF replication modes should be considered to avoid these latency issues.
