Performance Analysis and Architecture of Enterprise Storage · 2005-03-23


© 2005 EMC Corporation. All rights reserved.

Performance Analysis and Architecture of Enterprise Storage

Enterprise Storage Systems
CSG389: Special Topics in Computer Systems

Peter Lauterbach
Performance Engineering Manager

Agenda

■ Scalability
■ Workloads
■ Storage Array Interconnects
■ Cache Management
■ Basic I/O operations
■ Read and Write Optimizations
■ Load Balancing
■ Advanced Topics

Overview

■ How do you manage half a million I/O operations per second across more than 5,000 volumes on 576 disk drives (74 TB), in a system that weighs as much as a Ford F-150 pickup truck, without dropping a bit?
■ Enterprise Storage Subsystems are SMP operating systems.

Performance basics

■ Objectives of performance analysis
  – Benchmarking
    • Determine the performance of individual components and of the system
  – Bottleneck analysis
    • Spot trouble, bottlenecks, or hot spots
  – Once hot spots and bottlenecks are discovered and basic component performance is known, one can change the system behavior with the goal of eliminating them
    • Changing the behavior can mean many things:
      – Change workload characteristics
      – Change system behavior by replacing/adding components, changing algorithms, policies …
      – Tune performance to optimize for certain types of workloads
■ Benchmarks
  – Microbenchmarks determine the basic performance of individual components
    • Small controlled tests
  – System-level benchmarks
    • Designed to simulate the effects on the whole system
■ Tuning for bandwidth vs. latency
  – Which metric matters when (ask the class)

Scalability of a storage system

■ IOps vs. MBps
■ Bandwidth
■ I/Os
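The two scalability metrics are linked by the I/O size: bandwidth equals the I/O rate times the bytes moved per I/O. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and 1 MB is assumed to mean 10**6 bytes.

```python
def bandwidth_mb_per_s(iops: float, io_size_bytes: int) -> float:
    """Bandwidth implied by an I/O rate at a fixed I/O size (1 MB = 10**6 bytes)."""
    return iops * io_size_bytes / 1_000_000


def iops_for_bandwidth(mb_per_s: float, io_size_bytes: int) -> float:
    """I/O rate needed to sustain a bandwidth at a fixed I/O size."""
    return mb_per_s * 1_000_000 / io_size_bytes


if __name__ == "__main__":
    # The same 10,000 IOps means very different bandwidth at different I/O sizes.
    for size in (512, 4096, 32768, 1_048_576):
        print(size, bandwidth_mb_per_s(10_000, size))
```

Small I/Os stress the per-operation path (IOps), while large I/Os stress the data path (MBps), which is why a system is characterized on both axes.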

Front-End Envelope Tests (microbenchmarking)

Test Description (Meta Name):
1. One FA processor - One LUN (baseline)
2. One FA processor - Five LUNs
3. One FA processor - All LUNs
4. One FA - One LUN per port
5. One FA - Five LUNs per port
6. One FA - All LUNs per port
7. All FAs - One LUN per port
8. All FAs - Five LUNs per port
9. All FAs - All LUNs per port

■ Controlled experimentation
  – Workload is the same for all microbenchmarks
  – Successive tests change only one variable at a time

FA – single point of access to the disk array; a Fibre Channel controller with two ports (processors)

LUN – logical volume consisting of two slices from different disks

Workload characterization: see next slide

Front-end Envelope Scaling

[Chart: Random Read Hits, IOps at 512-byte I/Os. X axis: tests 1 to 9; Y axis: 0 to 60 (units not shown).]

Scalability of a storage system

[Chart: Write Performance. X axis: object size, 1 KB to 20 MB. Y axes: throughput in objects/s (0 to 300) and bandwidth in MB/s (0 to 100).]

Workloads are made of a variety of IO types and IO sizes

OLTP1 Simulation - To synthesize a medium workload of an OLTP database application, the following I/O access ratios and patterns were executed:

IO Profile          Block Size   Alignment   % of Total
Random Read Hit     4K           4K          40%
Random Read Miss    4K           4K          24%
Random Write        4K           4K          16%
Sequential Read     4K           4K          10%
Sequential Write    4K           4K          10%
(50 sequential I/Os were executed at each occurrence)

Percentages obtained by profiling (observing the behavior of) a “typical” system running an OLTP workload.
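A mix like OLTP1 can be turned into a synthetic request stream by weighted sampling. The following is a minimal sketch, not the tool used for the slides: the names `OLTP1_MIX`, `read_write_split`, and `sample_profiles` are hypothetical, and the weights are read directly from the table above.

```python
import random

# (profile name, is_read, percent of total), taken from the OLTP1 table.
OLTP1_MIX = [
    ("random_read_hit", True, 40),
    ("random_read_miss", True, 24),
    ("random_write", False, 16),
    ("sequential_read", True, 10),
    ("sequential_write", False, 10),
]


def read_write_split(mix):
    """Return (read percent, write percent) for a workload mix."""
    reads = sum(pct for _, is_read, pct in mix if is_read)
    writes = sum(pct for _, is_read, pct in mix if not is_read)
    return reads, writes


def sample_profiles(n, seed=0):
    """Draw n I/O profile names with probability proportional to the mix."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    names = [name for name, _, _ in OLTP1_MIX]
    weights = [pct for _, _, pct in OLTP1_MIX]
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    print(read_write_split(OLTP1_MIX))
    print(sample_profiles(10))
```

The split shows that this mix is read-dominated (74% reads, 26% writes), and that more than half of the reads are expected to be served from cache.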

Understanding your workloads

■ Database systems workloads
  – Two major types of workloads
    • OLTP (on-line transaction processing)
      – For workload characteristics, see the previous slide
    • DSS (decision support systems)
      – Give the breakdown, if possible
  – Other types
    • Multi-dimensional queries for geographic/scientific data
      – OLAP
    • Exercise for the class: ask a few people to point out interesting patterns in the workload descriptions
      – R/W ratios
      – What is being exercised the most (back end vs. cache)?
      – What types of optimizations would the students suggest?
■ Sequential access -> prefetching (discussed in class before)
■ What about writes and background write-back? (Again, discussed in class with respect to file systems)

Understanding your workloads

Type          I/O size             Random or Sequential   Hit Rate      Examples
OLTP          Small (2K – 16K)     Random                 High (>80%)   TPC-C, MS Exchange, Sybase, Oracle
DSS           Large (32K – 1MB)    Sequential             High          TPC-D, Oracle, Red Brick
Parallel DB   Large (1MB)          Random                 Low           Teradata, Informix XPS


Catozzi J. and Rabinovici S., "Operating System Extensions for the Teradata Parallel VLDB", VLDB, 2001, 679-682, http://www.vldb.org/conf/2001/P679.pdf

Teradata – Massively Parallel Database

Understanding your workloads

■ Traces
  – Capturing traces and replay
  – Trace simulators
  – Trace libraries
  – Trace tools
  – Trace example
    • I/O size, address, inter-arrival time, read vs. write ratios, sequentiality

Understanding your workloads – Trace Tools

■ Solaris Trace Normal Form
■ Low Level Linux Disk Trace Patch - LL_TRACE
■ Linux Trace Toolkit - LTT
■ Sysinternals DiskMon

Understanding your workloads - Traces

Timestamp (SS:mS or mS:uS)   Read/Write   FE Port   Volume   Starting Address   I/O Size (blocks)
0.243035                     Read         14a       0161     58367072           16
0.243113                     Read         4b        0091     45906176           32
0.243664                     Write        14a       0161     58367056           32
0.243711                     Read         4b        018D     6248672            16
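The characteristics named on the trace slide (inter-arrival time, read vs. write ratio, sequentiality) can be derived from records like the four above. This is a minimal sketch under stated assumptions: the whitespace-separated field layout follows the column labels, timestamps are treated as plain decimal numbers, and every name (`TraceRecord`, `parse_trace`, and so on) is hypothetical rather than part of any trace tool.

```python
from dataclasses import dataclass


@dataclass
class TraceRecord:
    timestamp: float
    op: str            # "Read" or "Write"
    port: str          # front-end port
    volume: str
    start_block: int
    size_blocks: int


# The four sample records from the slide.
TRACE = """\
0.243035 Read 14a 0161 58367072 16
0.243113 Read 4b 0091 45906176 32
0.243664 Write 14a 0161 58367056 32
0.243711 Read 4b 018D 6248672 16
"""


def parse_trace(text):
    """Parse whitespace-separated trace lines into TraceRecord objects."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        ts, op, port, volume, start, size = line.split()
        records.append(TraceRecord(float(ts), op, port, volume, int(start), int(size)))
    return records


def inter_arrival_times(records):
    """Time gaps between consecutive I/Os, in timestamp units."""
    return [b.timestamp - a.timestamp for a, b in zip(records, records[1:])]


def read_fraction(records):
    """Share of I/Os that are reads."""
    return sum(r.op == "Read" for r in records) / len(records)


def sequential_fraction(records):
    """Share of I/Os that start where the previous I/O to the same volume ended."""
    next_block = {}
    hits = 0
    for r in records:
        if next_block.get(r.volume) == r.start_block:
            hits += 1
        next_block[r.volume] = r.start_block + r.size_blocks
    return hits / len(records)


if __name__ == "__main__":
    recs = parse_trace(TRACE)
    print(inter_arrival_times(recs), read_fraction(recs), sequential_fraction(recs))
```

On this tiny sample three of four I/Os are reads and none continues a previous I/O on the same volume, so the sample would be classified as random.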

Locality of Reference for Random, Non-Sequential I/Os

[Chart: volume address (GB) vs. elapsed time (seconds).]

Patterns with Random I/Os

[Chart, two panels: volume address (GB), 0.6 to 0.9, vs. elapsed time (seconds), 390 to 400.]

Balance in a system

■ Number of drives
■ Number and speed of CPUs
■ Number and speed of channels
■ Number and speed of internal buses

Bus Interconnect

Bus interconnect – Symmetrix 8000

[Diagram labels: directors, bus, disks]

Cluster Interconnect
IBM Shark DS8000

Switch Interconnect

Switch Interconnect
Hitachi Lightning

Matrix interconnect

EMC Symmetrix DMX-3000

Hardware interface – Data pipes

Front-end protocol and data handling

■ Mezzanine card
  – Dedicated CPU handles protocol specifics
    • iSCSI (recall the complexity from previous lectures)
    • ESCON/FICON (IBM-proprietary protocol)
    • Fibre Channel – instead of a CPU, there is a protocol-specific ASIC
■ Processor
  – Dedicated CPU for DMA-like activities
    • More complex than DMA in a PC
■ Describe the “message inbox” etc.

Hardware interface - Control vs. Data paths

Internal Bus transfers

Bus xfers         I/O type
One transfer      Random Read Hit
Two transfers     Seq. Read, Random Read Miss
Three transfers   Delayed Writes

Cache Management

■ Pipes and pools
  – Similarity to file systems and databases.

The Metamorphosis of an I/O
How different software layers change I/O attributes

[Diagram: layer stack; recoverable box labels: HARDWARE, LVM, Storage]

■ Application performs an I/O
■ Database changes I/O subject to db_block_size, multiblock_read_count, and other parameters
■ File System changes I/O to conform to File System block size, fragment size, or memory page size
■ LVM breaks up according to stripe width; I/O parameter settings
■ PowerPath affects I/O path, sometimes size
■ Driver may break up I/O – max_phys/other parameters.
■ Hardware delivers the I/O as requested – heh, heh, heh.

Cache Management

■ Cache Types

Symmetrix DMX Series
Industry’s Broadest High-End Storage Family

                    DMX800         DMX1000 / DMX1000-P         DMX2000 / DMX2000-P         DMX3000
Drives              60–120         144                         288                         576
Capacity (raw)      8.75–17.5 TB   21 TB                       42 TB                       84 TB
Capacity (RAID 5)   7.6–15.3 TB    18.4 TB                     36.8 TB                     73.5 TB
Drive channels      8–16 x 2 Gb    16 x 2 Gb, 32 x 2 Gb (P2)   32 x 2 Gb, 64 x 2 Gb (P2)   64 x 2 Gb
Cache directors     Two            Two to four                 Four to eight               Four to eight
Maximum cache       64 GB          128 GB                      256 GB                      256 GB

Cache Management – Cache Types, Big and small

Cache Management - Levels of caching

Level               Access time   Scaled (1 CPU cycle = 1 second)
CPU cycle (2 GHz)   0.5 ns        1 second
registers           2 ns          4 seconds
L1 cache            6 ns          12 seconds
L2 cache            12 ns         24 seconds
main memory         60 ns         2 minutes
disk (cache hit)    1-2 ms        about 23 to 46 days
disk (cache miss)   6-9 ms        about 139 to 208 days
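The scaled column follows from a single rule: stretch time so that one 0.5 ns CPU cycle lasts one second, then apply the same factor to every access time. The sketch below reproduces that arithmetic; the helper names are hypothetical. On this rule, millisecond-scale disk accesses work out to days or months of scaled time.

```python
CPU_CYCLE_NS = 0.5            # one cycle at 2 GHz
SCALE = 1.0 / CPU_CYCLE_NS    # scaled seconds per real nanosecond


def scaled_seconds(access_time_ns):
    """Scaled duration, in seconds, of an access that really takes access_time_ns."""
    return access_time_ns * SCALE


def humanize(seconds):
    """Render a duration in the largest unit that fits."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"


if __name__ == "__main__":
    levels = [
        ("registers", 2),
        ("L1 cache", 6),
        ("L2 cache", 12),
        ("main memory", 60),
        ("disk, cache hit (1 ms)", 1_000_000),
        ("disk, cache miss (6 ms)", 6_000_000),
    ]
    for name, ns in levels:
        print(f"{name}: {humanize(scaled_seconds(ns))}")
```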

Symmetrix Units of Measure

Sectors: eight 512-byte blocks in one sector (4K)
  block 1, block 2, ..., block 8
  CRC is calculated per sector

Tracks: eight 4K sectors in one track (32K)
  s1, s2, ..., s8

Cylinders (virtual cylinder): fifteen 32K tracks in one cylinder (480K)
  t1, t2, ..., t15

For historical reasons, we use disk terminology. Naturally, the sizes no longer correspond to the disk’s physical characteristics.
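The unit sizes compose by multiplication, so a block number can be split into cylinder, track, sector, and block indices with repeated division. This is an illustrative sketch only: the `locate` helper and the 0-based numbering are assumptions, not the array's internal addressing scheme.

```python
BLOCK_BYTES = 512
BLOCKS_PER_SECTOR = 8
SECTORS_PER_TRACK = 8
TRACKS_PER_CYLINDER = 15

SECTOR_BYTES = BLOCK_BYTES * BLOCKS_PER_SECTOR        # 4K
TRACK_BYTES = SECTOR_BYTES * SECTORS_PER_TRACK        # 32K
CYLINDER_BYTES = TRACK_BYTES * TRACKS_PER_CYLINDER    # 480K


def locate(block_number):
    """Map a 0-based 512-byte block number to (cylinder, track, sector, block)."""
    blocks_per_track = BLOCKS_PER_SECTOR * SECTORS_PER_TRACK      # 64
    blocks_per_cylinder = blocks_per_track * TRACKS_PER_CYLINDER  # 960
    cylinder, rest = divmod(block_number, blocks_per_cylinder)
    track, rest = divmod(rest, blocks_per_track)
    sector, block = divmod(rest, BLOCKS_PER_SECTOR)
    return cylinder, track, sector, block


if __name__ == "__main__":
    print(SECTOR_BYTES, TRACK_BYTES, CYLINDER_BYTES)
    print(locate(1000))
```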

Alignment - Cylinder

■ Meta Volumes
■ A 64K I/O that starts on the last track of the stripe width (x) will overlap into the next metadevice member.

[Diagram: stripe width x. Member 1 (Physical Drive #1): Cylinder 1, Cylinder 2. Member 2 (Physical Drive #2): Cylinder 1.]
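Whether an I/O spills onto the next metadevice member can be checked by comparing the stripe chunk of its first byte with that of its last byte. The sketch below assumes a simple round-robin striping model; `members_touched` is a hypothetical helper, and the stripe width of 30 tracks (two cylinders) used in the example is an assumption for illustration, not a documented default.

```python
TRACK_BYTES = 32 * 1024


def members_touched(start_byte, length_bytes, stripe_tracks, member_count):
    """Return the sorted member indices that an I/O touches.

    Assumes the metavolume is striped round-robin across members in chunks
    of stripe_tracks tracks each.
    """
    stripe_bytes = stripe_tracks * TRACK_BYTES
    first_chunk = start_byte // stripe_bytes
    last_chunk = (start_byte + length_bytes - 1) // stripe_bytes
    return sorted({chunk % member_count for chunk in range(first_chunk, last_chunk + 1)})


if __name__ == "__main__":
    stripe_tracks = 30  # hypothetical stripe width of two cylinders
    # A 64K I/O starting on the last track of the stripe touches two members.
    print(members_touched(29 * TRACK_BYTES, 64 * 1024, stripe_tracks, 2))
    # The same I/O started one track earlier stays on one member.
    print(members_touched(28 * TRACK_BYTES, 64 * 1024, stripe_tracks, 2))
```

An I/O that touches two members costs two back-end operations instead of one, which is the motivation for aligning host I/O to the stripe boundary.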

Cache Management – single and multi-LRU

Cache Management - Slab based allocation

Gorman M., "Understanding the Linux Virtual Memory Manager", Prentice Hall, p 124, 2004

Older Symmetrix generation – Cache consumed by volume metadata

[Diagram: memory boards 1 to 4; regions labeled Data cache slots, Global Data and mailbox, Track Tables]

Newer Symmetrix Configuration – Cache consumed by volume metadata

[Diagram: memory boards 1 to 8; regions labeled Global Data Area, Track Tables, Data Cache slots]

Cache Management

■ Cache eviction

Cache Management – Power Failure and Vaulting

■ Random Write 5 MBps, Sequential Writes 35-50 MBps

[Diagram labels: Global Cache, Data; eight further labels are not recoverable]

Inter-process communication

■ One CPU director
  – Fibre channel
■ Two CPU director
  – FICON and ESCON

Task Scheduling

■ When Everything is Important, Nothing Is Important
■ When is something idle
■ Normal tasks
  – Read and write I/Os
■ High priority tasks
■ Background tasks
  – Disk scrubbing, anticipatory reads, memory scrubbing
■ Probability matrix

Basic I/O Operations

■ Locality of Reference / Temporal locality
■ Random Read Hit
■ Random Write Hit
■ Sequential Reads
■ Sequential Writes
■ Random Read
■ Random Writes

Read Hit

1) Host sends read request to FA
2) Data in cache is sent to host (transfer begins within 1-2 ms)

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]


Read Miss

1) Host sends read request to FA
2) Data miss in cache
3) DA sends read request to drive
4) DA pipes data from drive into global cache (drive responds within 6-12 ms)
5) Data in cache is sent to host

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]
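Because a hit is served from cache in 1-2 ms while a miss waits 6-12 ms for the drive, the average read response time is a hit-rate-weighted blend of the two. The sketch below shows that calculation; the function name is hypothetical, and the 1.5 ms and 9 ms figures in the example are assumed midpoints of the quoted ranges, not measurements.

```python
def mean_read_response_ms(hit_rate, hit_ms, miss_ms):
    """Average read response time for a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


if __name__ == "__main__":
    # Assumed midpoints: 1.5 ms for a hit, 9 ms for a miss.
    for hit_rate in (0.0, 0.5, 0.8, 0.95, 1.0):
        print(hit_rate, round(mean_read_response_ms(hit_rate, 1.5, 9.0), 2))
```

Under these assumed numbers, raising the hit rate from 80% to 95% roughly halves the part of the average that comes from misses, which is why cache management and prefetching receive so much attention.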


Sequential Read (before prefetch starts)

1) Host sends read request to FA
2) Data miss in cache
3) DA sends read request to drive
4) DA pipes data from drive into global cache (drive responds within 6-12 ms)
5) Data in cache is sent to host

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]

Sequential Read (after prefetch starts)

1) DA starts a prefetch job to keep global cache filled for future front-end requests
2) Host sends read request to FA
3) Data hit in cache
4) Data in cache is sent to host (data transfer begins within 1-2 ms)

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]


Write Hit

1) Host sends write request to FA
2) FA finds an available slot and pipes data from host into global cache
3) Status is sent to host (2-3 ms)
4) DA destages write-pending slots based upon load

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]

Delayed Write

1) Host sends write request to FA
2) FA cannot find an available data slot (device or system write ceiling)
3) DA is furiously destaging cache slots to make them available for front-end writes
4) FA finds an available cache slot via polling
5) FA reconnects to pipe host data into global cache (8-20 ms)

[Diagram labels: FA, RA, DA, Global Cache, SRDF, R2; Front-end (DDC), Logical Volume (LVDC), Physical disk (FDC)]

Algorithmic optimizations

■ Global vs. local optimizations

Read optimizations

■ Reads before writes
■ Pre-fetching
  – Multiple sequential read streams look random
  – Avoiding cache pollution
■ Mirror policies
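The remark that multiple sequential read streams look random can be made concrete with a toy detector that remembers where each recent stream is expected to continue. This is a hedged sketch of the general idea, not the prefetch algorithm used in any array; `count_sequential_hits` and `max_streams` are hypothetical names.

```python
def count_sequential_hits(requests, max_streams=8):
    """Count requests that continue one of the tracked sequential streams.

    requests is an iterable of (start_block, size_blocks) pairs. The detector
    remembers up to max_streams expected next-start addresses, oldest first.
    """
    expected = []
    hits = 0
    for start, size in requests:
        if start in expected:
            expected.remove(start)   # this stream advances
            hits += 1
        elif len(expected) == max_streams:
            expected.pop(0)          # forget the oldest tracked stream
        expected.append(start + size)
    return hits


if __name__ == "__main__":
    # Two sequential streams, interleaved as they would arrive at the array.
    interleaved = [(0, 16), (1000, 16), (16, 16), (1016, 16), (32, 16), (1032, 16)]
    print(count_sequential_hits(interleaved, max_streams=8))
    print(count_sequential_hits(interleaved, max_streams=1))
```

With room for several streams the detector recognizes four of the six requests as sequential; tracking only the last request recognizes none, so the same workload appears random and no prefetch would be triggered.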

Symmetrix Behaviors – Dynamic Mirror Service Policy

■ DMSP looks at multiple statistics
  – Physical drive
  – DA director
■ DMSP solves for:
  – Load balancing
  – Seek minimization

Symmetrix Behaviors – DMSP with 4 GB hypers

[Diagram: three layouts of mirrors M1 and M2 on physical drives P1 and P2, titled "4 Hyper M1 Only", "4 Hyper Interleaved", and "4 Hyper Mixed Example" (only on odd middle hyper)]

Symmetrix Behaviors – Mirror Service Policy

Random Read Miss (one hyper active; block size not recoverable)

MSP           2 GB range (iops)   4 GB range (iops)
Interleaved   245                 230
M1 Only       169                 161
Mixed         173                 243

Symmetrix Behaviors – Write Pending Limit

[Diagram: cache slot memory, 0% to 100%]

■ System write pending limit: 80% of cache slot memory (TASK 2)
■ DA high priority destage: 40% (50% of 80%)
■ Dynamic device write pending limit: up to 3 times the default value (TASK 8)
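The thresholds are simple proportions of the cache slot count, so they can be computed for any configuration. The sketch below is an illustration of the percentages quoted on the slide; the function name, the dictionary keys, and the example slot counts are hypothetical.

```python
SYSTEM_WP_PERCENT = 80       # system write pending limit, percent of cache slots
HIGH_PRIORITY_PERCENT = 50   # high priority destage starts at 50% of the system limit
MAX_DEVICE_MULTIPLIER = 3    # dynamic device limit can grow to 3x its default


def write_pending_thresholds(cache_slots, default_device_limit):
    """Return the slot counts at which each write pending threshold is reached."""
    system_limit = cache_slots * SYSTEM_WP_PERCENT // 100
    return {
        "system_limit": system_limit,
        "high_priority_destage": system_limit * HIGH_PRIORITY_PERCENT // 100,
        "max_device_limit": default_device_limit * MAX_DEVICE_MULTIPLIER,
    }


if __name__ == "__main__":
    # Hypothetical array with 1,000,000 cache slots and a 5,000-slot device default.
    print(write_pending_thresholds(1_000_000, 5_000))
```

Writes to a device become delayed once that device reaches its own ceiling, and writes to every device become delayed once the system limit is reached.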

Symmetrix Behaviors – Dynamic Device Write Ceiling

1000 I/Ops for 15 seconds = 15,000 slots

[Chart: Seq. Write 32768. X axis: elapsed time in seconds (0 to 80); Y axis: 20% to 100%; phases labeled Fast Write and Delayed Fast Write.]

Symmetrix Behaviors – System Write Pending Ceiling

(9728 - 3000) IOps for 45 seconds = 320,000 slots

[Chart: Seq. Write 32768. X axis: elapsed time in seconds (0 to 80); Y axis: IOps, 25% to 100%; phases labeled Fast Write and Delayed Fast Write; marker: High Priority Destaging 50% limit.]

System Write Ceiling
Write pending ceiling problems

[Chart: response time in ms (0 to 90) vs. I/Os per second (0 to 14,000); curves labeled "bottlenecked on 16 active drives" and "no bottleneck - all 96 drives".]

Write optimizations

■ Write merging

Load Balancing - I/O Optimization without PowerPath

[Diagram labels: Host Application(s), Request, Host Bus Adapter (HBA)]

Load Balancing - I/O Optimization with PowerPath

[Diagram labels: Host Application(s), PowerPath, Request, Host Bus Adapter (HBA), Storage Array]

Load Balancing - Metavolumes

Load Balancing - Optimizer

Quality of service

■ Favorite LUNs
■ Service Level Agreements

Benchmarking methodologies

■ lmbench
■ SPECFS
■ Characterization
■ Workload replay
■ Scalability of the driver

Queuing

■ Little’s Law
■ Throttling web servers.

The Language of Performance Analysis: Little’s Law

How much time is spent on each operation is VERY interesting.

Just how many things happen per second is NOT interesting!

[Diagram labels: throughput is completions per unit time (e.g. MB/s); service time; response time; average queue length]
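Little's Law ties the three quantities in the diagram together: average queue length equals throughput times response time (N = X × R). The sketch below applies it in both directions; the function names and example figures are hypothetical.

```python
def average_queue_length(throughput_per_s, response_time_s):
    """Little's Law: N = X * R (requests in the system on average)."""
    return throughput_per_s * response_time_s


def response_time_s(queue_length, throughput_per_s):
    """Little's Law rearranged: R = N / X."""
    return queue_length / throughput_per_s


if __name__ == "__main__":
    # 5,000 I/Os per second at 2 ms each keeps about 10 I/Os in flight.
    print(average_queue_length(5_000, 0.002))
    # 32 outstanding I/Os at 8,000 I/Os per second implies 4 ms per I/O.
    print(response_time_s(32, 8_000))
```

This is why throughput alone says little: the same I/O rate can hide very different response times, depending on how many requests are kept outstanding.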

Modeling

Analytical Models

■ Easy to construct
■ Low calculation times
■ Reasonably accurate (~10%)
■ Can run many combinations before deciding which discrete simulations to build

Discrete Simulation

■ Very accurate (~1-3%)
■ Every cycle and operation is accounted for
■ Many hours of simulation to generate several seconds of runtime.

Problems to solve

■ Similar to z/OS Workload Manager
■ Real-time monitoring and analysis
■ Scaling up to even greater powers of storage


References

■ Neil Gunther, Analyzing Computer Systems Performance: With Perl: PDQ, Springer, 2004
■ Raj Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley, 1991
■ Edward Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, 1983
■ Daniel A. Menasce, Lawrence W. Dowdy, Virgilio A.F. Almeida, Performance by Design: Computer Capacity Planning By Example, Prentice Hall PTR, 2004
■ Adrian Cockcroft and Richard Pettit, Sun Performance and Tuning: Java and the Internet (2nd Edition), Sun Microsystems Press, Prentice Hall PTR, 1998