Transcript of: A Practical Introduction to Storage Disk Subsystem Performance (V1)
© 2010 IBM Corporation
IBM Power Systems and Storage Symposium, Wiesbaden, Germany – May 10-12, 2010
A practical Introduction to Disk Storage System Performance
Gero Schmidt, ATS System Storage, IBM European Storage Competence Center
Agenda
• Disk Storage System Selection & Specs
• Application I/O & Workload Characteristics
• Hard Disk Drive (HDD) Basics – It's all mechanical
• HDD Performance & Capacity Aspects (SATA vs FC)
• RAID Level Considerations (RAID-5 / RAID-6 / RAID-10)
• New Trends & Directions: Solid State Drive (SSD)
• Basic Principles for Planning Logical Configurations
• Performance Data Collection and Analysis
IBM System Storage Disk Subsystems – Making a Choice
Selecting a storage subsystem:
entry-level, midrange or enterprise class
support for host systems and interfaces
overall capacity & growth considerations
overall box performance
advanced features and copy services
price, costs / TCO, footprint, etc.
needs to meet client & application requirements
Subsystem performance:
overall I/O processing capability
overall bandwidth
choosing the right number and type of disk drives

[Product positioning graphic: DS3000 (entry-level) – DS5000 (midrange) – DS6000 / DS8000 / XIV (enterprise)]
Storage Subsystem Specs – DS4000 Data Rate
max. throughput may be achieved with a relatively low no. of disk drives
subsystem architecture: frontend / backend bandwidth capabilities are key
SATA may be considered for applications requiring high sequential throughput
Note: Results as of 6-26-2006. Source of information from Engenio and not confirmed by IBM. Performance results achieved under ideal circumstances in a benchmark
test environment. Actual customer results will vary based on configuration and infrastructure components.
The number of drives used for MB/s performance does not reflect an optimized test config. The number of drives required could be lower/higher.
Storage Subsystem Specs – DS4000 I/O Rate
Note: Results as of 6-26-2006. Source of information from Engenio and not confirmed by IBM. Performance results achieved under ideal circumstances in a benchmark
test environment. Actual customer results will vary based on configuration and infrastructure components.
Drives were "short-stroked" to optimize for IOps performance. Real-life configurations may require more drives to achieve the numbers listed.
max. IOps performance requires a high no. of fast FC/SAS disk drives
subsystem architecture: I/O processing capability >> disk drives' IOps capability
SATA is not a good fit for enterprise-class applications requiring transaction performance
Storage Performance Council (SPC) - Benchmarks
The Storage Performance Council (SPC) is a vendor-neutral standards body focused on the
storage industry. It has created the first industry-standard performance benchmark targeted at
the needs and concerns of the storage industry. From component-level evaluation to the
measurement of complete distributed storage systems, SPC benchmarks will provide a
rigorous, audited and reliable measure of performance.
http://www.storageperformance.org
Application I/O – An Overview

Avg. access time for an I/O operation:
CPU cycle < 0.000001 ms
MEMORY < 0.001 ms
DISK (HDD) < 10 ms

[Diagram: the I/O path from application to storage – Application → File Systems → Volume Manager → Device Drivers (server memory) → SAN / interconnect (FC, iSCSI, IB, SAS, SATA, SCSI) → storage subsystem cache → physical HDDs]

Storage subsystem:
Cache hit: ~ 1 ms
Physical HDD: ~ 5...15 ms

Disk access is 'SLOW' compared to CPU and MEMORY: access to memory is >10000 times faster than disk access!
Application I/O performance: efficient memory usage is key!
Storage I/O performance: proper data placement is key!
Application I/O – On a typical System Time Scale

CPU: 1 ns (1 GHz) = 0.000000001 s
MEMORY: 100 ns = 0.000000100 s
DISK: 10 ms = 0.010000000 s

Scaled so that 1 CPU cycle := 1 second:
CPU: 1 second
MEMORY: 1:40 minutes
DISK: 115.74 days – 'SLOW'
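The scaling above can be reproduced in a few lines of Python (a sketch; the latency figures are the round numbers from the slide):

```python
# Scale real latencies so that one CPU cycle (1 ns) corresponds to one second.
latencies_ns = {
    "CPU cycle": 1,          # 1 GHz clock -> 1 ns per cycle
    "MEMORY": 100,           # ~100 ns main-memory access
    "DISK": 10_000_000,      # 10 ms random disk I/O = 10^7 ns
}

for name, ns in latencies_ns.items():
    scaled_s = ns / latencies_ns["CPU cycle"]  # seconds on the human scale
    if scaled_s < 60:
        print(f"{name}: {scaled_s:.0f} s")
    elif scaled_s < 86_400:
        print(f"{name}: {scaled_s / 60:.1f} min")
    else:
        print(f"{name}: {scaled_s / 86_400:.2f} days")
# -> CPU cycle: 1 s, MEMORY: 1.7 min, DISK: 115.74 days
```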
Application I/O – Where does it come from?

Transaction Processing
• A single end-user is capable of initiating only a moderate number of transactions with a limited amount of data changes per minute
• Thousands of end-users can initiate thousands of transactions and generate high I/O rates with only low data rates
• End-users are directly affected by the application response time
• People's work time is expensive
• Excellent overall response time of the application is business critical and requires low I/O response times at high I/O rates

Batch Jobs
• A single batch job can already generate a considerable amount of disk I/O operations in terms of I/O rate and data rate
• Multiple batch jobs can create a huge amount of disk activity
• Batch jobs should not interact with end-user transactions and are typically run outside end-user business hours
• Time frames for batch jobs even during nights / weekends are limited
• Overall job runtime is critical and mostly dependent on the achieved overall data rate
Application I/O – Workload Characteristics

Transaction processing workloads – I/O rate in IO/s (IOps)
! Time to data is critical
! Dependent on number and type of disk drives
• typical for transaction processing workloads with random, small-block I/O requests, e.g. OLTP – online transaction processing, databases, mail servers – the majority of enterprise applications
• avg. I/O response time is most important here (RT < 10ms is a good initial choice)
• number and speed of disk drives is essential (e.g. 73GB15k FC drives as best choice)
• SATA disk drives not generally recommended; high-speed FC/SAS/SCSI disk drives preferred
• balanced system configuration and volume layout is key to utilize all disk spindles

Throughput dependent workloads – data rate in MB/s (MBps)
! Data transfer rate enables performance
! Dependent on internal controller bandwidth
• typical for throughput dependent workloads with sequential, large-block I/O requests, e.g. HPC, seismic processing, data mining, streaming video applications, large file access, backup/restore, batch jobs
• avg. I/O response time is less important (high overall throughput required)
• bandwidth requirements (no. of adapters and host ports, link speed) must be met
• not necessarily a high number of disk drives required
• SATA disk drives may be a suitable choice
• balanced system configuration and volume layout is important to utilize full system bandwidth
Application I/O – Workload Performance Characteristics
Basic workload performance characteristics:
• I/O rate [IOps] (transactions) or data rate [MBps] (throughput)
• Random access or sequential access workload pattern
• Read:write ratio (percentage of read:write I/O requests, e.g. 70:30)
• Average I/O request size (average I/O transfer size or block size, e.g. 8kB for Oracle DB, 64kB or larger for streaming applications, 256kB for TSM)

Additional workload performance characteristics:
• Read cache hit ratio (percentage of read cache hits)
• Average response time (RT) requirements (e.g. RT < 10ms)
Hard Disk Drive (HDD) Basics – It's all mechanical...
• Read / write cache hits are in the range of ~ 1ms
• Physical disk I/O operations are in the range of > 5ms because mechanical components such as head movements and spinning disks are involved
• Each hard disk drive (HDD) can only process a limited no. of I/O operations per second, mainly determined by:
– Average Seek Time [ms] (head movement to the required track)
– Rotational Latency [ms] (disk platter spinning until the first sector addressed passes under the r/w heads; avg. time = half a rotation)
– Transfer Time [ms] (read/write data sectors, 1 sector = 512 Byte)

[Timeline: Start → Seek Time → Rotational Latency → Transfer Time]
Simple IOps Calculation per Hard Disk Drive (HDD)

Avg. Seek Time = see manufacturer specs (typical: 4-10ms)
Rotational Latency = ½ × (60000 / RPM) [ms] (typical: 2-4ms)
Transfer Time = 1000 × sectors × sector size / avg. Transfer Rate [ms]
(typically << 1ms for small I/O request sizes < 16kB)

IOps ≈ 1000 / (Avg. Seek Time + Rotational Latency + Transfer Time) [ms]
Manufacturer Specs for Hard Disk Drives
Source: www.seagate.com (2008)
This is just an example for getting a view on typical disk drive characteristics. The chosen disk types above do not necessarily represent
the characteristics of the disk drive modules used in IBM System Storage systems.
A single disk drive is only capable of processing a limited number of I/O operations per second!

Example Random IOps Calculation per Hard Disk Drive

Disk Drive      | Speed     | Rotational Latency | Avg. Seek Time | IOps
FC 146GB15k     | 15000 rpm | 2 ms               | 4 ms           | 167
FC 146GB10k     | 10000 rpm | 3 ms               | 5 ms           | 125
SATA2 500GB7.2k | 7200 rpm  | 4.2 ms             | 9 ms           | 76

Rules of Thumb – Random IOps/HDD (conservative estimate to start with):
• FC 15k DDM: ~160 IOps
• FC 10k DDM: ~120 IOps
• SATA2 7.2k DDM: ~75 IOps
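The table values follow directly from the two mechanical delays (a sketch; transfer time is neglected, as it is well under 1 ms for small requests):

```python
def random_iops(rpm, avg_seek_ms):
    """Theoretical random IOps of one HDD from rotational speed and seek time."""
    rotational_latency_ms = 0.5 * 60_000 / rpm   # half a rotation, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms                # I/O operations per second

for label, rpm, seek in [("FC 15k", 15000, 4.0),
                         ("FC 10k", 10000, 5.0),
                         ("SATA 7.2k", 7200, 9.0)]:
    print(f"{label}: {random_iops(rpm, seek):.0f} IOps")
# -> FC 15k: 167 IOps, FC 10k: 125 IOps, SATA 7.2k: 76 IOps
```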
Efforts to improve HDD Performance
Efforts to reduce HDD access times (mechanical delays)
Disk Drive: Introduce Command Queuing and Re-Ordering of I/Os
SATA: NCQ (Native Command Queuing)
SCSI: TCQ (Tagged Command Queuing)
Disk Drive Usage: 'Short Stroking' of HDDs
Disk Subsystem: Subsystem Cache
Intelligent Cache Page Replacement & Prefetching Algorithms
Standard: LRU (least recently used) / LFU (least frequently used)
IBM System Storage DS8000 - Advanced Caching Algorithms
2004 – ARC (Adaptive Replacement Cache)
2007 – AMP (Adaptive Multi-stream Prefetching)
2009 – IWC (Intelligent Write Caching)
IBM Almaden Research Center – Storage Systems Caching Technologies
http://www.almaden.ibm.com/storagesystems/projects/arc/technologies/
Seek latency optimization
Caching / Cache Hits
Increase HDD Performance - Command Queuing
Tagged Command Queuing (TCQ, SCSI-2) & Native Command Queuing (NCQ, SATA2)
further improves disk drive random access performance by re-ordering the I/O commands, so that workloads experience seek times considerably shorter than the nominal seek times
Queue Depth: SATA2 (NCQ): 32 in-flight commands, SCSI (TCQ): 2^64 in-flight commands
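The benefit of re-ordering can be illustrated with a toy model (the track numbers are hypothetical; real NCQ/TCQ implementations also account for rotational position, not just seek distance):

```python
def total_seek_distance(start_track, queue):
    """Sum of head movements when serving requests in the given order."""
    distance, pos = 0, start_track
    for track in queue:
        distance += abs(track - pos)
        pos = track
    return distance

pending = [980, 12, 950, 30, 900, 55]   # hypothetical queued track numbers
fifo = total_seek_distance(500, pending)

# Elevator-style reorder: serve all requests above the head going up,
# then all requests below the head going down.
up = sorted(t for t in pending if t >= 500)
down = sorted((t for t in pending if t < 500), reverse=True)
reordered = total_seek_distance(500, up + down)

print(fifo, reordered)   # the reordered queue moves the head far less
```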
Increase HDD Performance - Short Stroking
Short Stroking:
Approach to achieve maximum possible performance from an HDD by limiting the overall head movement and thus minimizing the average seek time.

Implementation:
– Use only a small portion of the overall capacity
– Use tracks on the outer edge with higher data density

Disadvantage:
– Typically a large number of HDDs involved
– Only a small portion of the storage capacity used

Typical usage:
Applications with high access densities (IOps/GB) that require high random I/O rates at low response times but with only a comparatively small amount of data.
Increase HDD Performance - Subsystem Cache
Disk Subsystem Cache
– Read cache hits
– Write cache hits / write-behind
– Sequential prefetch algorithms

Intelligent Cache Page Replacement & Prefetch Algorithms
– What data should be stored in cache, based upon the recent access and frequency needs of the hosts (LRU/LFU)?
– Determine what data in cache can be removed to accommodate newer data.
– Predictive algorithms to anticipate data prior to a host request and load it into cache.
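The standard LRU policy mentioned above can be sketched in a few lines (a toy page cache for illustration, not the DS8000's actual caching algorithms):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used page cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()          # key -> data, oldest first

    def access(self, key, loader):
        """Return (data, hit?) for a page, fetching on a miss via loader()."""
        if key in self.pages:               # cache hit: mark most recently used
            self.pages.move_to_end(key)
            return self.pages[key], True
        data = loader(key)                  # cache miss: fetch from "disk"
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data, False

cache = LRUCache(2)
read = lambda k: f"block-{k}"
print(cache.access("A", read)[1])  # False (miss)
print(cache.access("B", read)[1])  # False (miss)
print(cache.access("A", read)[1])  # True  (hit)
cache.access("C", read)            # evicts B, the least recently used page
print(cache.access("B", read)[1])  # False (B was evicted)
```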
Sample Random IOps Calculation with Seek Times reduced to ⅓

Disk Drive      | Speed     | Rotational Latency | Avg. Seek Time | ⅓ Seek Time | IOps (red. seek)
FC 146GB15k     | 15000 rpm | 2 ms               | 4 ms           | 4/3 ms      | 300
FC 146GB10k     | 10000 rpm | 3 ms               | 5 ms           | 5/3 ms      | 214
SATA2 500GB7.2k | 7200 rpm  | 4.2 ms             | 9 ms           | 9/3 ms      | 138

Even with reduced average seek times you cannot expect more than a few hundred random I/O operations per second from a single HDD.

So a single HDD can only process a limited number of random IOps, with average access times in the typical range of 5...15ms, due to the mechanical delays associated with spinning disks (HDDs).
Storage Disk Subsystem – Typical I/O Rate & Response Time Relation

[Chart: Response Time [ms] (0-30) versus Total I/O [IO/s] (0-11000). The curve stays flat at a few ms over most of the range, then rises steeply as the subsystem approaches saturation. Annotation: +/- 10% change in I/O rate.]
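The hockey-stick shape of such curves can be approximated with a simple open queueing model. The sketch below uses an M/M/1-style formula with assumed numbers (5 ms service time, ~11000 IOps saturation point); it is an illustration of the shape, not a fit of the measured chart:

```python
def response_time_ms(io_rate, max_rate, service_ms=5.0):
    """Approximate response time as utilization approaches saturation (M/M/1)."""
    utilization = io_rate / max_rate
    if utilization >= 1.0:
        return float("inf")        # past saturation: the queue grows without bound
    return service_ms / (1.0 - utilization)

# Near the knee, a small change in I/O rate produces a large change
# in response time -- matching the slide's +/- 10% annotation.
for rate in (2000, 6000, 9000, 10500):
    print(f"{rate} IOps -> {response_time_ms(rate, 11000):.1f} ms")
```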
Subsystem Sizing – Meeting Performance and Capacity Requirements

Capacity:
– number of disk drives to meet capacity requirements
– only a low no. of large-capacity disks required to meet capacity needs

Performance:
– number and speed of disk drives (spindles) to meet IOps requirements
– high no. of fast, low-capacity drives required to meet performance needs

Cost:
[Diagram: cost rises with the number of drives (performance, IOps) and falls with drive capacity (GB). 146GB15k drives are an excellent trade-off between performance and capacity needs.]
© 2010 IBM Corporation
A practical Introduction to Disk Storage System Performance
29 IBM Power Systems and Storage Symposium, Wiesbaden, Germany – May 10-12, 20102010-09-13
Subsystem Sizing – Meeting Performance and Capacity Requirements

Application: Capacity 1000GB; Performance 1000 IOps (1.0 IOps/GB)

FC:   7x 146GB15k FC (160 IOps/HDD; 15W)   → 1120 IOps, 1022 GB, 105 W
SATA: 1x 1TB 7.2k SATA (75 IOps/HDD; 9.8W) → 75 IOps, 1000 GB, 9.8 W
SATA: 14x 1TB 7.2k SATA (75 IOps/HDD; 9.8W) → 1050 IOps, 14000 GB (!), 137.2 W
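The drive counts above come from taking the maximum of the capacity-driven and performance-driven counts. A sketch using the rule-of-thumb figures from the earlier IOps slide:

```python
import math

def drives_needed(cap_gb, iops, drive_cap_gb, drive_iops):
    """Number of drives needed to satisfy both capacity and IOps requirements."""
    by_capacity = math.ceil(cap_gb / drive_cap_gb)
    by_performance = math.ceil(iops / drive_iops)
    return max(by_capacity, by_performance)

# Application: 1000 GB and 1000 IOps (1.0 IOps/GB)
print(drives_needed(1000, 1000, 146, 160))   # FC 146GB15k  -> 7 drives
print(drives_needed(1000, 1000, 1000, 75))   # SATA 1TB7.2k -> 14 drives
```

With FC the count is capacity- and performance-balanced; with SATA the performance requirement forces 14 drives and leaves 13 TB of excess capacity.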
Average Access Density over recent Years

Access density [IOps/GB] = IOps / GB

Access density is a measure of I/O throughput per unit of usable storage capacity (backstore). The primary use of access density is to identify a range on a response time curve to give the typical response time expected by the average customer, based on the amount of total usable storage in their environment. The average industry value for access density in the year 2005 is thought to be approximately 0.7 I/Os per second per GB. Year-to-year industry data is incomplete, but the value has been decreasing as companies acquire usable storage faster than they access it.

Source: IBM data, other consultants
(hot data = high access density; cold data = low access density)
Average Access Density – Customer Distribution

[Chart: cumulative customer percentage (0-100%) versus access density (0.01-2.00 IOps/GB), with markers for the access-density capability of typical drive types, from cold to hot data: SATA 1TB 7.2k, SATA 500GB 7.2k, FC 300GB 15k, FC 146GB 15k, FC 73GB 15k.]

Average access density ~ 0.7 IO/sec/GB (2005)
Note: Chart is based on a survey of 58 customers in 2005.
SATA vs FC – HDD Performance Positioning

[Chart: SATA 7.2k delivers up to 80% of FC 15k performance for sequential workloads, but only around 45% of FC 15k for random workloads.]

Fibre Channel (FC) disk drives / Serial Attached SCSI (SAS)
Offer the highest enterprise-class performance, reliability, and availability for business-critical applications requiring high I/O transaction performance.

Serial Advanced Technology Attachment (SATA) disk drives
Price-attractive alternative to the enterprise-class FC drives for near-line applications, with lower production costs and larger capacities but also lower specifications (e.g. rotational speeds, data rates, seek times).

SATA vs. FC Drive Positioning & Considerations
Sequential workloads: SATA drives perform quite well, with only about a 20% reduction in throughput compared to FC drives.
Random workloads: SATA drive transaction performance is considerably below FC drives, and their use in environments with critical online transaction workloads and lowest response times is not generally recommended!

SATA drives typically are very well suited for various fixed-content, data archival, reference data, and near-line applications that require large amounts of data at low cost, e.g. bandwidth / streaming applications, audio/video streaming, surveillance data, seismic data, medical imaging or secondary storage. They can also be a reasonable choice for business-critical applications in selected environments with less critical IOPS performance requirements (e.g. low access densities).
RAID Level Comparison - RAID5 vs RAID10
RAID5
• cost-effective with regard to performance and usable capacity (87.5% usable capacity for 7+P)
• provides fault tolerance for one disk drive failure
• data is striped across all drives in the array with the parity being distributed across all the drives
• A single random small-block write operation typically causes a "RAID5 write penalty", initiating four I/O operations to the disk back-end: reading the old data and the old parity block before finally writing the new data and the new parity block (this is a worst-case scenario – it may take fewer operations when writing partial or even full stripes, depending on the I/Os in cache).
• On modern disk systems write operations are generally cached by the storage subsystem and thus handled asynchronously so that RAID5 write penalties are generally shielded from the users in terms of disk response time. However, with steady and heavy random write workloads, the cache destages to the back-end may still become a limiting factor so that either more disks or a RAID10 configuration might be required to provide sufficient disk back-end write performance.
RAID10
• best choice for fault-tolerant, write-sensitive environments at the cost of 50% usable capacity
• can tolerate at least one, and in most cases even multiple disk failures.
• data is striped across several disks and the first set of disk drives is mirrored to an identical set.
• each write operation initiates two write operations at the disk back-end
RAID5 – Writing a single data block

RAID5 (7+P) array: data is striped across all drives, with parity distributed across the drives.

RAID5 Read-Modify-Write: the RAID5 write penalty
Worst case: one front-end write operation requires four disk operations on the array:
(1) read old data, (2) read old parity, [MODIFY: perform the XOR calculation in cache], (3) write new data, (4) write new parity.
RAID5 – Writing a full stripe

RAID5 (7+P) array – Full Stripe Write
Especially with large I/O transfer sizes or sequential workloads, full stripe writes can be accomplished with RAID5, where the parity can be calculated on the fly in cache, without the need to read any old data from the array prior to the write operation.
RAID5 vs RAID10 – Backend I/O rate calculation example

Example for a typical 70:30:50 random, small-block application workload
(read:write ratio = 70:30; read cache hit ratio = 50%)

Sustained front-end I/O rate: 1000 IOps (70:30:50)
Sustained back-end I/O rate: 1550 IOps RAID5 vs 950 IOps RAID10

RAID5: 1000 logical random IOps
– 700 reads × 50% read cache misses = 350 back-end reads
– 300 writes × 4 (write penalty: read old data/parity, write new data/parity) = 1200 back-end reads & writes
– a total of 1550 physical IOps on the disks at the physical back-end

RAID10: 1000 logical random IOps
– 700 reads × 50% read cache misses = 350 back-end reads
– 300 writes × 2 (two mirrored writes) = 600 back-end writes
– a total of 950 physical IOps on the disks at the physical back-end

RAID10 already outperforms RAID5 in a typical 70:30:50 workload.
!!! Consider using RAID10 if the random write percentage is higher than 35% !!!
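The calculation generalizes to any workload mix and RAID level (a sketch; the write penalties are the worst-case small-block figures from these slides: 4 for RAID5, 2 for RAID10, 6 for RAID6):

```python
def backend_iops(front_iops, read_pct, read_hit_pct, write_penalty):
    """Physical disk IOps behind a random small-block front-end workload.

    write_penalty: 4 for RAID5, 2 for RAID10, 6 for RAID6 (worst-case writes).
    """
    reads = front_iops * read_pct * (1 - read_hit_pct)   # only cache misses hit disk
    writes = front_iops * (1 - read_pct) * write_penalty # each write is amplified
    return reads + writes

# 1000 IOps front-end, 70:30 read:write ratio, 50% read cache hit ratio
print(f"RAID5:  {backend_iops(1000, 0.70, 0.50, 4):.0f} IOps")   # -> 1550
print(f"RAID10: {backend_iops(1000, 0.70, 0.50, 2):.0f} IOps")   # -> 950
```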
RAID5 vs RAID10 – Performance summary
RAID5 vs RAID10 - Performance
RAID5 and RAID10 basically deliver a comparable performance for read operations.
RAID5 typically performs better than RAID10 for large block sequential writes.
RAID10 always performs better than RAID5 for small block random writes.
RAID5 vs RAID10 - Selection
RAID5 is a good choice for most environments requiring high availability and fewer writes than reads (e.g. multi-user environments with transaction database applications and a high read activity).
RAID10 should be considered for fault-tolerant and performance-critical, write-sensitive transaction processing environments with a high random write percentage above 35%.
RAID level | Random Read | Random Write | Sequential Read | Sequential Write | Capacity (8 DDMs)
RAID5      | +           | o            | +               | +                | 87.5%
RAID10     | +           | +            | +               | o                | 50.0%
RAID6 – Overview

RAID Level Comparison:

RAID level    | Reliability (#erasures) | Space efficiency | Write penalty (disk ops)
RAID-5, 7+P   | 1                       | 87.5%            | 4
RAID-10, 4+4  | at least 1              | 50%              | 2
RAID-6, 6+P+Q | 2                       | 75%              | 6

RAID6: dual-parity RAID
– DS8000: 5+P+Q+S or 6+P+Q arrays (using a modified EVENODD code)
– Survives 2 "erasures":
  • 2 drive failures
  • 1 drive failure plus a medium error, such as during a rebuild (especially with large-capacity drives)
– Like RAID5, parity is distributed in stripes, with the parity blocks in a different place in each stripe
– RAID6 does have a higher performance penalty on write operations than RAID5 due to the additional parity calculations.
DS8000 – Single Rank RAID Performance (1/2)

[Charts: DS8000 R4.0 single-rank measurements; no IWC, full stroke.]
DS8000 – Single Rank RAID Performance (2/2)

[Charts: RAID5 vs RAID6 vs RAID10 single-rank measurements; DS8000 R4.0, no IWC, full stroke.]
Processing Capabilities and Disk Performance over 50 years

1956 IBM RAMAC (1st disk drive): 5 MB storage, 1200 RPM, data transfer rate 8800 characters per second

2010 Enterprise FC Hard Disk Drive (HDD): 600GB storage capacity, 15000 RPM, data transfer rate 122 to 204 MB/s

Last 50 years of HDD technology:
HDD RPM: 12.5x
HDD Capacity: 120 000x

[Chart: operations per second over time – CPU processing capability (0.1 MHz → 4 GHz) has grown far faster than HDD performance, leaving a widening performance gap. New: SSD drives (STEC-inc).]
What are solid-state drives?
• Semiconductor (NAND flash, non-volatile)
• No mechanical read/write interface, no rotating parts:
i.e. no seek time or rotational delays
• Electronically erasable medium
• Random access storage
• Capable of driving tens of thousands of IOps
with response times less than 1ms
• Absence of mechanical moving parts makes SSDs
significantly more reliable than HDDs
• Wear issues are overcome through over-provisioning
and intelligent controller algorithms (Wear-Levelling)
Application benefits
Increased performance for transactional applications with high random IO rates (IOps):
Online Banking / ATM / Currency Trading, Point-of-Sale Transactions / Processing, Real-time data mining
Solid state disks in DS8000 offer a new higher performance option for enterprise applications.
Best suited for cache-unfriendly data with high access densities (IOps/GB) requiring low response times
Additional benefit of lower energy consumption, cooling and space requirements (data center footprint)
New Trends & Directions - Solid State Drives (SSD)
Solid State Drive (SSD) – DS8000 R4.2 Single Rank Performance

[Charts: single RAID5 rank – random read, random I/O, and sequential I/O; SSD vs HDD.]

Random I/O: SSDs >> HDDs
Sequential I/O: SSDs ~ HDDs
RAID5 write penalty (1:4 back-end ops)
SSDs show exceptionally low response times

Source: IBM Whitepaper, IBM System Storage DS8000 with SSDs – An In-Depth Look at SSD Performance in the DS8000,
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101466
Solid State Drive (SSD) - Tiered Storage Concepts
Solid State Drive technology remains more expensive than traditional spinning disks, so the two
technologies will coexist in hybrid configurations for several years.
Tiered storage is an approach of utilizing different types of storage throughout the storage infrastructure.
Using the right mix of tier 0, 1, and 2 drives will provide optimal performance at the minimum cost, power,
cooling and space usage.
Data Placement is key! To maximize the benefit of SSDs it is important to analyze application workloads and
only place data which requires high access densities (IOps/GB) and low response times on them.
IBM System Storage DS8000 with SSDs - An In-Depth Look at SSD Performance in the DS8000
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101466
Driving Business Value on Power Systems with Solid State Drives
ftp://ftp.software.ibm.com/common/ssi/sa/wh/n/pow03025usen/POW03025USEN.PDF
Tier 0 – Solid State Drives (SSD): highest performance and cost/GB
Tier 1 – 15k RPM HDDs (FC/SAS): high performance, lower cost/GB
Tier 2 – 7200 RPM HDDs (SATA): lowest performance and cost/GB
The challenges with SSDs
SSDs are considerably more expensive than
traditional disks
Without optimization tools, clients have been over-provisioning them
And administrators spend too much time
monitoring, reporting, and tuning tiers
Inefficient use of a very expensive asset is difficult to justify
Result: Many clients feel they can't afford solid-state storage yet
Solid-state drives (SSDs) offer significantly improved performance compared to
mechanical disk drives... but it takes more than just supporting SSDs in a disk
subsystem for clients to achieve the full benefit:
Task: Optimizing data placement across tiers of drives with different price and
performance attributes can help clients operate at peak price/performance.
Implementing this type of optimization is a three-step process:
(1) Data performance information must be collected.
(2) Information must be analyzed to determine optimal data placement.
(3) Data must be relocated to the optimal tier.
Solution: With DS8700 R5.1 IBM introduced IBM System Storage Easy Tier
which automates data placement throughout the DS8700 disk pool (including
multiple drive tiers) to intelligently align the system with current workload
requirements. This includes the ability for the system to automatically and
nondisruptively relocate sub-volume data (at the extent level) across drive tiers,
and the ability to manually relocate full volumes or merge extent pools. Easy
Tier enables smart data placement and optimizes SSD deployments with
minimal costs. The additional Storage Tier Advisor Tool provides guidance
for SSD capacity planning based on existing client workloads on the DS8700.
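The collect/analyze/relocate loop described above can be sketched in miniature. All names here are illustrative only, not an Easy Tier interface; the real product does this per extent inside the DS8700:

```python
# Minimal sketch of the three-step optimization loop (collect, analyze, relocate).
def plan_placement(extent_iops, ssd_extents):
    """Rank extents by measured IOps (step 2) and assign the hottest ones
    to the SSD tier, the rest to HDD (step 3 would then migrate them)."""
    ranked = sorted(extent_iops, key=extent_iops.get, reverse=True)
    return {e: ("SSD" if i < ssd_extents else "HDD") for i, e in enumerate(ranked)}

# Step 1 (collection) yields per-extent I/O counters, e.g.:
heat = {"ext0": 1200, "ext1": 15, "ext2": 900, "ext3": 40}
print(plan_placement(heat, ssd_extents=2))
# Hot extents ext0 and ext2 land on SSD; cold extents stay on HDD.
```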
IBM System Storage DS8700 R5.1 Announcement Letter (Easy Tier)
http://www.ibm.com/common/ssi/rep_ca/5/877/ENUSZG10-0125/ENUSZG10-0125.PDF
IBM Redpaper: IBM System Storage DS8700 Easy Tier
http://www.redbooks.ibm.com/abstracts/redp4667.html?Open
IBM DS8700 R5.1 Solid-State Storage Optimization with Easy Tier
Easy Tier optimizes SSD deployments by balancing performance AND cost requirements
Easy Tier delivers the full promise of SSD performance while balancing the costs associated
with over-provisioning this expensive resource
IBM Easy Tier LUN heatmap: "Slower, inexpensive" – "Just Right" – "Fast, expensive"
Smart data placement with Easy Tier: SPC-1 (SATA/SSD)
First ever Storage Performance Council (SPC-1) benchmark
submission with SATA and SSD technology
Source:
Storage Performance Council, April 2010: http://www.storageperformance.org/results/benchmark_results_spc1#a00092
IBM Whitepaper, May 2010: IBM® System Storage™ DS8700™ Performance with Easy Tier®, http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101675
Chart: Throughput (IO/s) vs. time (0:00 to 18:00) with Easy Tier enabled – an IOPS increase of over 3x.
System configuration: 16x SSD + 96x 1TB SATA
Smart data placement with Easy Tier: SPC-1 (SATA/SSD) – SSD + SATA + Easy Tier config vs. FC 15K HDD config
Chart: Response Time (ms, 0.00–15.00) vs. Throughput (IO/s, 0–60000) – 192 FC HDD (dual frames) vs. 96 SATA + 16 SSD (single frame). The SSD + SATA + Easy Tier configuration improves response time in the range of ordinary use.
Smart data placement with Easy Tier: SPC-1 Backend I/O Migration
Chart: % Capacity migrated (0–6) vs. % Backend I/O migrated (0–90) – only a few percent of the capacity needs to migrate to capture the bulk of the backend I/O.
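The shape of such a migration curve follows from workload skew. A toy calculation with a synthetic, assumed extent heat distribution:

```python
# Sketch: why migrating a few percent of capacity can capture most backend I/O.
# Given per-extent I/O counts (skewed, as real workloads typically are), compute
# the cumulative share of I/O covered by the hottest x% of capacity.
def io_captured(extent_ios, capacity_pct):
    ranked = sorted(extent_ios, reverse=True)
    n = max(1, round(len(ranked) * capacity_pct / 100.0))
    return 100.0 * sum(ranked[:n]) / sum(ranked)

# 100 equally sized extents with a heavily skewed I/O distribution (assumed):
ios = [1000] * 5 + [10] * 95
print(f"hottest 5% of capacity -> {io_captured(ios, 5):.0f}% of backend I/O")
```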
Agenda
• Disk Storage System Selection & Specs
• Application I/O & Workload Characteristics
• Hard Disk Drive (HDD) Basics – It's all mechanical
• HDD Performance & Capacity Aspects (SATA vs FC)
• RAID Level Considerations (RAID-5 / RAID-6 / RAID-10)
• New Trends & Directions: Solid State Drive (SSD)
• Basic Principles for Planning Logical Configurations
• Performance Data Collection and Analysis
Logical Configuration - Basic Principles
Three major principles for the logical configuration to optimize storage subsystem performance:
(1) Workload isolation (e.g. on extent pool and array level)
– dedicate a subset of hardware resources to a high-priority workload in order to reduce the impact of less important workloads (protect the loved ones) and meet given service level agreements (SLAs)
– limit low-priority workloads which tend to fully utilize given resources to only a subset of hardware resources in order to avoid impacting other, more important workloads (isolate the badly behaving ones)
– provides guaranteed availability of the dedicated hardware resources, but also limits the isolated workload to only a subset of the total subsystem resources and overall subsystem performance
(2) Workload resource sharing
– multiple workloads share a common set of subsystem hardware resources, such as arrays, adapters and ports
– a single workload can now utilize more subsystem resources and achieve higher performance than with a smaller set of dedicated resources, provided the workloads do not contend with each other
– a good approach when workload information is not available, when workloads do not try to consume all the hardware resources available, or when workloads peak at different times
(3) Workload spreading
– the most important principle of performance optimization; applies to both isolated workloads and resource-sharing workloads
– simply means using all available resources of the storage subsystem in a balanced manner by spreading the workload evenly across all available resources dedicated to that workload, e.g. arrays, controllers, disk adapters, host adapters and host ports
– host-level striping and multi-pathing software may further help to spread workloads evenly
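The spreading principle can be illustrated with a simple round-robin placement sketch (names are illustrative, not a DS8000 interface):

```python
# Sketch of workload spreading: distribute new volumes round-robin across all
# available arrays instead of filling one array first, so backend load stays
# balanced across arrays (and, by extension, controllers and adapters).
from itertools import cycle

def spread_volumes(volumes, arrays):
    placement = {}
    for vol, arr in zip(volumes, cycle(arrays)):
        placement.setdefault(arr, []).append(vol)
    return placement

vols = [f"vol{i}" for i in range(8)]
print(spread_volumes(vols, ["A0", "A1", "A2", "A3"]))
# Each of the 4 arrays receives 2 of the 8 volumes -> balanced backend load.
```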
Logical Configuration – DS8000 Examples
Agenda
• Disk Storage System Selection & Specs
• Application I/O & Workload Characteristics
• Hard Disk Drive (HDD) Basics – It's all mechanical
• HDD Performance & Capacity Aspects (SATA vs FC)
• RAID Level Considerations (RAID-5 / RAID-6 / RAID-10)
• New Trends & Directions: Solid State Drive (SSD)
• Basic Principles for Planning Logical Configurations
• Performance Data Collection and Analysis
Analyzing Disk Subsystem I/O Performance
Questions to ask when a performance problem occurs:
 What exactly is considered to perform poorly? Which application, server, volumes?
 Is there a detailed description of the performance problem and environment available?
 What is the actual business impact of the performance problem?
 What was the first occurrence of the problem, and were there any changes in the environment?
 When does the problem typically occur, e.g. during daily business hours or nightly batch runs?
 What facts indicate that the performance problem is related to the storage subsystem?
 What would be the criteria for the problem to be considered solved? Any expectations?
Data to collect and analyze:
description & config of the architecture (application – server – SAN – storage)
application characteristics, logical and physical volume layout (usage, mapping server/storage)
I/O performance data collection during problem occurrence on server and storage subsystem:
(a) Server Performance Data Collection:
AIX     # iostat -D [interval] [no. of intervals]
        # filemon -o fmon.log -O lv,pv; sleep 60; trcstop
Linux   # iostat -x [interval] [no. of intervals]
Windows # perfmon → GUI, then select Physical Disk Counters
(b) Storage Subsystem Performance Data Collection:
DS3k/DS4k/DS5k (SMcli), XIV (XCLI), DS6k/DS8k and other (TPC for Disk)
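Once the server-side data is collected, a first-pass summary is easy to script. The sketch below parses a simplified, assumed `iostat -x` sample by its header names, so it tolerates the column variations between sysstat versions:

```python
# Sketch: summarize a Linux `iostat -x` sample. Column positions are looked up
# from the header line rather than hard-coded. The sample text is illustrative.
raw_sample = """Device:  r/s   w/s  avgqu-sz  await  %util
sda     120.0  40.0      2.10   6.50  55.0
sdb      10.0   5.0      0.05   1.20   4.0"""

lines = [l.split() for l in raw_sample.strip().splitlines()]
header, rows = lines[0], lines[1:]
for row in rows:
    stats = dict(zip(header[1:], map(float, row[1:])))
    iops = stats["r/s"] + stats["w/s"]
    print(f"{row[0]}: {iops:.0f} IOps, await {stats['await']:.1f} ms, "
          f"util {stats['%util']:.0f}%")
```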
DS3000/4000/5000 Performance Monitor
• only counters for quantity of processed I/Os up to current point in time
• no counters for quality of processed I/Os as, for example, I/O service times
• additional host system performance statistics required for I/O response times
DS3000/4000/5000 Performance Data Collection
SMcli script for continuous performance data collection over given time frame:
on error stop;
set performanceMonitor interval=60 iterations=1440;
upload storageSubsystem file="c:\perf01.txt" content=performanceStats;
>smcli [IP-Addr. Ctr.A] [IP-Addr. Ctr.B] -f perfmon.scr
Performing syntax check...
Syntax check complete.
Executing script...
Script execution complete.
SMcli completed successfully.
Always collect the performance statistics together with the latest subsystem profile to document the actual subsystem configuration used during data collection (script file: perfmon.scr).
DS3000/4000/5000 Performance Data Collection Example
"Performance Monitor Statistics for Storage Subsystem: DS4700_PFE1 -Date/Time: 12.02.08 10:29:13 - Polling interval in seconds: 20"
"Storage Subsystems ","Total IOs ","Read Percentage ","Cache Hit Percentage ","Current KB/second ","Maximum KB/second ","Current IO/second ","Maximum IO/second"
"Capture Iteration: 1","","","","","","",""
"Date/Time: 12.02.08 10:29:14","","","","","","",""
"CONTROLLER IN SLOT A","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
"Logical Drive Data_1","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
"Logical Drive Data_3","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
[...]
"CONTROLLER IN SLOT B","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
"Logical Drive Data_2","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
"Logical Drive Data_4","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
[...]
"STORAGE SUBSYSTEM TOTALS","0.0","0.0","0.0","0.0","0.0","0.0","0.0"
[...]
Example of performance statistics file collected on DS4000 with v7.xx firmware
For more information about how to collect and process these DS4000 performance statistics please see:
How to collect performance statistics on IBM DS3000 and DS4000 subsystems (on IBM Techdocs)
IBMers http://w3.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD103963
IBM BPs http://partners.boulder.ibm.com/src/atsmastr.nsf/WebIndex/TD103963
(same format as DS3000/DS5000 performance statistics)
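A quick way to find the busiest volumes in such a statistics file is to sort by the "Maximum IO/second" column. The sketch below uses a small synthetic excerpt in the same field order as the header shown above (the values are invented for illustration):

```python
# Sketch: reduce DS3000/4000/5000 performance statistics records to the busiest
# logical drives. Per the header, field 7 (0-based) is "Maximum IO/second".
import csv, io

data = io.StringIO(
    '"Logical Drive Data_1","5400","70","30","1200","2500","90","450"\n'
    '"Logical Drive Data_2","800","60","40","300","700","15","60"\n'
    '"Logical Drive Data_3","9100","80","25","2100","4100","150","820"\n'
)
rows = [(r[0], float(r[7])) for r in csv.reader(data)]
for name, max_iops in sorted(rows, key=lambda r: r[1], reverse=True)[:2]:
    print(f"{name}: max {max_iops:.0f} IO/s")
```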
DS3000/4000/5000 – Performance Data Analysis
Subsystem total IOps / MBps (average / peak)
Controller A and B total IOps / MBps
Identify busiest volumes
Identify busiest arrays
Verify if
- Array/volume configuration
- RAID level
- Disk type
is appropriate for the workload
Verify if workload distribution
is balanced across all arrays
and both controllers
Evaluate response times with
appropriate Disk Magic models
XIV – XIVGUI Performance Data Collection
XIV – XCLI Performance Data Collection
XCLI (one command line):
>xcli -m IPADDR -u USER -p PASSWD -s -y
statistics_get start=2009-10-07.11:00 count=300
interval=1 resolution_unit=minute > C:\xiv_20091007.csv
DS6000/DS8000 – DSCLI Performance Metrics Examples

dscli> showfbvol -metrics 2000
Date/Time: 24. April 2007 14:32:15 CEST IBM DSCLI Version: 5.2.2.224 DS: IBM.2107-7503461
ID 2000
Date 04/24/2007 14:30:25 CEST
normrdrqts 17
normrdhits 5
normwritereq 121050
normwritehits 121050
seqreadreqs 0
seqreadhits 0
seqwritereq 151127
seqwritehits 151127
cachfwrreqs 0
cachfwrhits 0
cachfwreqs 0
cachfwhits 0
inbcachload 0
bypasscach 0
DASDtrans 29
seqDASDtrans 0
cachetrans 33315
NVSspadel 0
normwriteops 0
seqwriteops 0
reccachemis 2
qwriteprots 0
CKDirtrkac 0
CKDirtrkhits 0
cachspdelay 0
timelowifact 0
phread 25
phwrite 33420
phbyteread 5
phbytewrite 2082
recmoreads 2
sfiletrkreads 0
contamwrts 0
PPRCtrks 0
NVSspallo 272177
timephread 28
timephwrite 40138
byteread 0
bytewrit 8508
timeread 4
timewrite 4061
dscli> showrank -metrics r2
Date/Time: 24. April 2007 14:37:43 CEST IBM DSCLI Version: 5.2.2.224 DS: IBM.2107-7503461
ID R2
Date 04/24/2007 14:35:53 CEST
byteread 587183
bytewrit 287002
Reads 1176760
Writes 315629
timeread 2509716
timewrite 392892
dscli> showioport -metrics I0001
Date/Time: 24. April 2007 14:41:47 CEST IBM DSCLI Version: 5.2.2.224 DS: IBM.2107-7503461
ID I0001
Date 04/24/2007 14:39:56 CEST
byteread (FICON/ESCON) 0
bytewrit (FICON/ESCON) 0
Reads (FICON/ESCON) 0
Writes (FICON/ESCON) 0
timeread (FICON/ESCON) 0
timewrite (FICON/ESCON) 0
bytewrit (PPRC) 0
byteread (PPRC) 0
Writes (PPRC) 0
Reads (PPRC) 0
timewrite (PPRC) 0
timeread (PPRC) 0
byteread (SCSI) 56586
bytewrit (SCSI) 454426
Reads (SCSI) 414404
Writes (SCSI) 4906333
timeread (SCSI) 2849
timewrite (SCSI) 111272
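All of these DSCLI metrics are cumulative counters, so rates and response times come from the delta between two samples. A sketch (the millisecond unit for the time counters is an assumption here; verify it against the DS8000 documentation for your code level before relying on absolute values):

```python
# Sketch: derive an average response time over an interval from two cumulative
# DSCLI counter samples (delta time counter / delta operation counter).
def avg_response_ms(sample1, sample2, ops_key, time_key):
    ops = sample2[ops_key] - sample1[ops_key]
    if ops == 0:
        return 0.0
    return (sample2[time_key] - sample1[time_key]) / ops

# Two hypothetical `showrank -metrics` samples taken 60 seconds apart:
t0 = {"Reads": 1176760, "timeread": 2509716}
t1 = {"Reads": 1181760, "timeread": 2534716}
print(f"avg read RT over interval: {avg_response_ms(t0, t1, 'Reads', 'timeread'):.1f} ms")
```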
TPC for Disk – Subsystem Performance Monitoring
The IBM Tivoli Storage Productivity Center (TPC) is a suite of storage infrastructure management tools that centralizes, simplifies and automates storage tasks associated with storage systems, Storage Area Networks (SAN), replication services and capacity management.
IBM Tivoli Storage Productivity Center for Disk (TPC for Disk) is an optional component of TPC designed to manage multiple SAN storage devices and to monitor the performance of SMI-S compliant storage subsystems from a single user interface.
IBM Tivoli Storage Productivity Center Standard Edition includes three components of the TPC suite as one bundle at a single price: TPC for Data, Fabric and Disk.
New customers with IBM System Storage Productivity Center (SSPC), which includes the pre-installed (but separately purchased) IBM Tivoli Storage Productivity Center Basic Edition, only need to purchase the additional "TPC for Disk" component to be able to collect performance statistics from their supported IBM storage subsystems.
TPC for Disk is the official IBM product for clients requiring performance monitoring of their IBM storage subsystems (e.g. DS4k, DS5k, DS6k, DS8k, SVC, ESS, 3584 Tape, ...)
TPC V4.1 introduces Tivoli Common Reporting (TCR) & BIRT (Business Intelligence Reporting Tools) for creating customized reports from TPC database
TPC for Disk – Subsystem Performance Reports
Select to initiate the report creation
3
1
2
4
TPC for Disk – Subsystem Performance Reports
Select for creating a chart
TPC for Disk – Subsystem Performance Reports
TPC for Disk – Export Subsystem Performance Reports
Select to export performance data as a CSV output file using the 'File > Export Data' dialog
TPC for Disk – Analyzing Reports in a Spreadsheet
TPC for Disk – Reports of Interest by Subsystem
ESS, DS6000 and DS8000:
By Storage Subsystem
By Controller
By Array
By Volume
By Port
SAN Volume Controller:
By Storage Subsystem
By IO Group
By Node
By Managed Disk Group
By Volume
By Managed Disk
By Port
DS4000 and other supported SMI-S compliant storage subsystems:
By Storage Subsystem
By Volume
By Port
Some reports may give more or less data, depending on the exact level of SMI-S compliance of the vendor-supplied CIM agents.
Don't forget to export a complete set of reports for the subsystem of interest, e.g. for a DS8000:
20080131-75APNK1-subsystem.csv,
20080131-75APNK1-controller.csv,
20080131-75APNK1-ports.csv,
20080131-75APNK1-arrays.csv,
20080131-75APNK1-volumes.csv
Limit the reports to a representative time frame, as the amount of data, especially for the volume report, can be extremely large!
TPC for Disk – How to start with Performance Monitoring

Simply start monitoring and thus understanding the current workload patterns (workload range and workload profile) developing over the day/week/month under normal operating conditions, where no end-user complaints are present. Develop an understanding of the expected behaviour. I/O rates and response times may vary considerably from hour to hour or day to day simply due to varying application loads, business times and changes in the workload profile. You may even experience times with high I/O rates and extremely low response times (e.g. high cache hit ratios) as well as times with only moderate I/O rates but higher response times (e.g. lower cache hit ratios), still without being of any concern. Appropriate thresholds for I/O rates and response times can be derived from these statistics based on particular application and business requirements.
Regularly collect selected data sets for historical reference and do projections of workload trends. Evaluate trends in I/O rate and response time and plan for growth accordingly. Typically response times increase with increasing I/O rates. Historical performance data is the best source for performance and capacity planning.
Watch for any imbalance of the overall workload distribution across the subsystem resources. Avoid single resources from becoming overloaded (hot spots). Redistribute workload if needed.
When end-user performance complaints arise simply compare current and historical data and look for appropriate changes in the workload that may lead to performance impacts.
Additional performance metrics may help to better understand the workload profile behind the changes in I/O rates and response times:
Read:Write ratio
Read Cache Hit Percentage [%]
avg. Read/Write/Overall Transfer Size [kB] per I/O operation
These metrics are required for appropriate Disk Magic models and performance evaluations.
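These profile metrics can be derived directly from the read/write I/O and data rates available in exported reports; a sketch with illustrative input values:

```python
# Sketch: derive workload-profile metrics (read:write ratio, read cache hit
# percentage, average transfer sizes) from raw read/write I/O and data rates.
def workload_profile(read_iops, write_iops, read_mbps, write_mbps, read_hits_iops):
    total_iops = read_iops + write_iops
    return {
        "read:write ratio":  round(read_iops / write_iops, 2),
        "read cache hit %":  round(100.0 * read_hits_iops / read_iops, 1),
        "avg read xfer kB":  round(1024.0 * read_mbps / read_iops, 1),
        "avg write xfer kB": round(1024.0 * write_mbps / write_iops, 1),
        "avg xfer kB":       round(1024.0 * (read_mbps + write_mbps) / total_iops, 1),
    }

print(workload_profile(read_iops=3000, write_iops=1000,
                       read_mbps=24, write_mbps=16, read_hits_iops=2400))
```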
TPC for Disk – Basic Performance Metrics

There are lots of performance metrics available. Which ones are best to start with?
Most important metrics for a storage subsystem are:
I/O Rate: number of I/O operations per second [IOps or IO/s]
Response Time (RT): average service time per I/O operation in milliseconds [ms]
These metrics are typically available
for read operations, write operations and the total number of processed I/O operations
on subsystem, controller, port, array, volume, I/O group, node, mdisk & mdisk group level
Basic performance statistics to look at for storage subsystems are in principle:
front-end I/O statistics on subsystem level for overview of system overall workload
front-end I/O statistics on volume level for selected critical applications / host systems
backend I/O statistics on array level (i.e. on the physical disk level / spindles)
General thresholds for front-end statistics are difficult to provide, because
I/O rate thresholds depend on workload profile and subsystem capabilities
RT thresholds depend on application, customer requirements, business hours
Additional metric is Data Rate: throughput in megabytes per second [MBps]
on subsystem level for overview of overall throughput
on port level together with Port RT for overview of port and I/O adapter utilization
In general, there do not exist typical values or fixed thresholds for all performance metrics as they typically strongly depend on the nature of the workload:
Online Transaction Processing (OLTP) workloads (e.g. database)
- small transfer sizes (4kB...16kB) with high I/O rates
- low front-end response times around 5ms commonly expected
Backup, batch or sequential-like workloads
- large transfer sizes (32kB...256kB) with low I/O rates but high data rates
- high front-end response times even up to 30ms still can be acceptable
Subsystem level front-end metrics (subsystem total average):
- Overall Response Time < 10ms
Array level back-end metrics (physical disk access):
- Back-end Read Response Time < 25ms
- Disk Utilization Percentage << 80%
- I/O rate: depends on RAID level, workload profile, number and speed of DDMs; considered very busy with I/O rates near or above 1000 IOps (DS8000/DS6000)
Volume level front-end metrics (I/O performance as experienced by the host systems):
- Overall Response Time < 15ms (depends on application requirements and workload)
- Write-cache Delay Percentage < 3% (typically should be 0%)
TPC for Disk – Basic Guidelines for DS8000
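Such rules of thumb lend themselves to a simple automated check over exported metrics. The sketch below mirrors the threshold values above; the metric names are illustrative, not TPC column names:

```python
# Sketch: flag exported DS8000 metrics that exceed the rule-of-thumb guidelines.
THRESHOLDS = {
    "subsystem_overall_rt_ms":  10.0,  # subsystem front-end overall RT
    "array_backend_read_rt_ms": 25.0,  # array back-end read RT
    "array_disk_util_pct":      80.0,  # disk utilization (should be well below)
    "volume_overall_rt_ms":     15.0,  # volume front-end overall RT
    "volume_wcache_delay_pct":   3.0,  # write-cache delay (typically 0%)
}

def check(metrics):
    return [f"{key} = {metrics[key]} exceeds {limit}"
            for key, limit in THRESHOLDS.items()
            if metrics.get(key, 0) > limit]

measured = {"subsystem_overall_rt_ms": 6.2, "array_backend_read_rt_ms": 31.0,
            "array_disk_util_pct": 85.0, "volume_overall_rt_ms": 9.8,
            "volume_wcache_delay_pct": 0.0}
for warning in check(measured):
    print("WARNING:", warning)
```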
TPC – Customized reports with BIRT
Two BIRT components: Report Designer and Report Runtime Engine
Diagram (TPC V4.1): the Eclipse-based Report Designer (chart designer, Eclipse DTP, ODA, custom designers) produces an XML report design; the Report Engine (generation services, data transformation services, presentation services, charting engine) renders it against the data into a report document, with HTML and CSV output.
Redbook: IBM Tivoli Storage Productivity Center V4.1 Release Guide
http://www.redbooks.ibm.com/redpieces/abstracts/sg247725.html
Chapter 10, Customized Reporting through Tivoli Common Reporting (TCR) / BIRT
Storage Competence at the Mainz Location

IBM Germany's fourth largest location offers you a broad portfolio of IBM System Storage services.

IBM Dynamic Infrastructure Leadership Center for Information Infrastructure
• Business, channel & skill enablement & training
• DI education & briefings
• Demos & showcases
• IT transformation roadmaps & workshops
• BP certification

IBM European Storage Competence Center & Systems Lab Europe / IBM Executive Briefing Center & TMCC
• Business, channel & skill enablement & training
• Customer and group briefings
• Product & SW demos
• Integrated solution demos
• Exhibition support & organization

IBM STG Europe Storage Software Development
• Software development: storage & tape, Linux, mainframe, file systems

Services
• Business, channel & skill enablement & training
• End-to-end client support
• Workshops
• Solution design
• Lab services
• Customer relationship management
IBM System Storage Solutions Center of Excellence

We offer technical support from the planning phase through well after installation.

Our Services
• Client briefings & education
• Systems Lab Services & Training
• Customized workshops
• System Storage demos
• Advanced technical support
• Solution design
• Proof of concepts
• Benchmarks
• Product field engineering

Our Expertise
• Skilled technical storage experts covering the whole IBM System Storage portfolio
• Information Infrastructure: compliance, availability, retention, security
• HW / SW & performance

Our Systems Lab Europe
• 1500 sqm lab space
• IBM & heterogeneous hardware
Disclaimer

Copyright © 2010 by International Business Machines Corporation.
No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation. Product data has been
reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information could include technical
inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or programs(s) at any time without notice.
Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services
available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this document is not intended to state or
imply that only that program product may be used. Any functionally equivalent program that does not infringe IBM's intellectual property rights may
be used instead. It is the user's responsibility to evaluate and verify the operation of any non-IBM product, program or service.
The performance information contained in this document was derived under specific operating and environmental conditions. The results obtained by
any party implementing the products and/or services described in this document will depend on a number of factors specific to such party‟s operating
environment and may vary significantly. IBM makes no representation that these results can be expected in any implementation of such products
and/or services. Accordingly, IBM does not provide any representations, assurances, guarantees, or warranties regarding performance.
THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM
EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.
IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and conditions of the agreements (e.g.,
IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not
responsible for the performance or interoperability of any non-IBM products discussed herein.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or
copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY, 10504-1785, U.S.A.
Trademarks
The following are trademarks of the International Business Machines Corporation in the United States, other countries, or both.
* All other products may be trademarks or registered trademarks of their respective companies.
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
The following are trademarks or registered trademarks of other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
LSI is a trademark or registered trademark of LSI Corporation.
AS/400®, e business(logo)®, eServer, FICON, IBM®, IBM (logo)®, iSeries®, OS/390®, pSeries®, RS/6000®, S/30, VM/ESA®, VSE/ESA, WebSphere®, xSeries®, z/OS®, zSeries®, z/VM®, System i, System i5, System p, System p5, System x, System z, System z9®, BladeCenter®, System Storage, System Storage DS®, TotalStorage®
For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml
Not all common law marks used by IBM are listed on this page. Failure of a mark to appear does not mean that IBM does not use the mark, nor does it mean that the product is not actively marketed or is not significant within its relevant market.
Those trademarks followed by ® are registered trademarks of IBM in the United States; all others are trademarks or common law marks of IBM in the United States.