Exchange 2003 Storage Design Brad Carter Rapid Response Engineer EMEA Exchange Centre of Excellence.
Date posted: 21-Dec-2015
Exchange 2003 Storage Design Brad Carter
Rapid Response Engineer
EMEA Exchange Centre of Excellence
Welcome to this TechNet Event
We would like to bring your attention to the key elements of the TechNet programme, the central information and community resource for IT professionals in the UK:
- FREE bi-weekly technical newsletter
- FREE regular technical events hosted across the UK
- FREE weekly UK & US led technical webcasts
- FREE comprehensive technical web site
- Monthly CD / DVD subscription with the latest technical tools & resources
- FREE quarterly technical magazine
To subscribe to the newsletter or just to find out more, please visit www.microsoft.com/uk/technet or speak to a Microsoft representative during the break.
Purpose of Session
– Introduce Exchange Storage Concepts
– Improve understanding of Storage Design Best Practices
– Validate design and monitor performance
– Objective: To provide attendees with enough knowledge to ensure they deploy Exchange 2003 optimally on their chosen storage platform.
Overview: Session Structure and Content
Introduction
Disk I/O and Exchange Server
Best Practices for Optimizing your Storage Architecture
Storage Design
- Establishing your Disk I/O Requirements
- Establishing your Disk Capacity Requirements
Using Jetstress to verify Storage Subsystem Performance & Reliability
Basic Primary Partitions
Key Recommendations Summary
PERF Counters
Examples
Introduction
Exchange Server 2003 is a disk-intensive application that requires a fast, reliable disk subsystem to function correctly.
Storage subsystem bottlenecks cause more performance problems than any other server-side component (for example, CPU or RAM), and they are a common source of critical situations (CritSits).
A poorly designed disk subsystem will deliver very poor performance for your users. Specifically, your disk subsystem is performing poorly if it is experiencing:
- Average read and write latencies over 20 ms for database drives
- MSExchangeIS/RPC Average Latency above 50 ms
High disk latency = High RPC latency
High RPC latency = Slow performance
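The two thresholds on this slide can be turned into a simple check. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes you already have averaged counter values (in milliseconds) collected by some other means.

```python
# Thresholds quoted on the slide; everything else here is an illustrative assumption.
DB_DISK_LATENCY_LIMIT_MS = 20.0  # average read/write latency on database drives
RPC_LATENCY_LIMIT_MS = 50.0      # MSExchangeIS RPC Average Latency


def disk_subsystem_warnings(avg_read_ms, avg_write_ms, rpc_avg_latency_ms):
    """Return a list of warnings for any value above the slide's thresholds."""
    warnings = []
    if avg_read_ms > DB_DISK_LATENCY_LIMIT_MS:
        warnings.append(f"database read latency {avg_read_ms} ms is over 20 ms")
    if avg_write_ms > DB_DISK_LATENCY_LIMIT_MS:
        warnings.append(f"database write latency {avg_write_ms} ms is over 20 ms")
    if rpc_avg_latency_ms > RPC_LATENCY_LIMIT_MS:
        warnings.append(f"RPC average latency {rpc_avg_latency_ms} ms is over 50 ms")
    return warnings


# Hypothetical sample values, not measurements from the presentation.
for message in disk_subsystem_warnings(25.0, 8.0, 60.0):
    print(message)
```

An empty list means none of the three values crossed the limits given above.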
Disk I/O and Exchange Server
Every time data is read from or written to Exchange, disk I/O is generated.
Exchange Data Components (.EDB, .STM, Logs)
Component: I/O Pattern
Jet database (.edb file):
- Read from and written to at random
- 4 KB page size
Streaming database (.stm file):
- Normally read from and written to sequentially
- Variable page size that averages 8 KB in production
Note: There are significant numbers of seek operations, so the I/O pattern is neither entirely random nor entirely sequential.
Transaction log files (.log files):
- 100 percent sequential writes during normal operations
- 100 percent sequential reads during recovery operations
- Writes vary in size from 512 bytes to the log buffer size
Best Practices for Optimizing your Storage Architecture
Database Files
- .edb and .stm file placement
- fast random access speeds
Content Indexing Files
- Never place content indexing files on the same disk as the page file (although that is the default location).
- These are random-access files and should be placed on the same volume as the databases
Transaction Log Files
- The most latency-sensitive drive for write performance
- Sequential write pattern
- Instance placement (in terms of ESE)
Best Practices for Optimizing your Storage Architecture (cont.)
SMTP Queue
- Should never be on any spindle that performs another function (due to very different I/O patterns).
Page File
- Place your page file on separate spindles.
- If you lose the disk with the page file, the server will experience a stop error.
MTA Queue
- MTA queues should never reside on log or database volumes.
- If your server handles a significant amount of SMTP and/or MTA traffic, you should provide a separate set of spindles for the SMTP and MTA queues (mailbox servers, bridgeheads, etc.).
Best Practices for Optimizing your Storage Architecture (cont.)
Setting up and Configuring the Storage System:
All devices on the storage system must be listed on the Windows Hardware Compatibility List (WHCL). HCL Web site: http://go.microsoft.com/fwlink/?LinkId=23194.
Exchange should be run only against storage that is certified for Windows.
For clustered deployments, the storage must also be cluster certified (geo-cluster/multi-cluster certified where applicable).
Drivers and firmware must be up to date:
- Server BIOS/firmware
- SCSI/Array Controller firmware and driver
- Fibre Channel Host Bus Adapter (HBA) firmware and driver
- Fibre Channel switch/hub firmware
- SAN (Storage Area Network) enclosure Operating System/Microcode/firmware
- Hard disk firmware
Verify that the HBA/SAN specific configuration is set correctly. HBAs use registry keys to customize the configuration to a specific SAN platform (for example, Queue Depth and Queue Target).
Sharing sequential and random I/O within the same disk group can have a significant impact:
– It results in excessive latency, for example when Exchange mail stores share disks with backup content or transaction logs.
Use Diskpar to align disks.
Best Practices for Optimizing your Storage Architecture (cont.)
Setting up and Configuring the Storage System:
Segment Size
- The stripe (segment) size specifies the size of the segment written to each disk in a RAID array.
Controller Cache
- If your controller allows you to configure the cache page size, set it to 4K pages to accommodate Exchange.
- Set the cache to 100% write cache.
- Make sure the cache is battery-backed.
Storage Design
Required information
– Total IOPS required (IOPS/mailbox x # Mailboxes)
– Read/Write Ratio
– Disk capabilities (10K, 15K, 72GB, 146GB, 300GB?)
– Disk transfer capability of the storage enclosure (consider throughput with failed components)
– Backup window (understand the workloads)
– Restore window
– Near immediate restore is viable with VSS
– Don’t forget OLM and recovery utilities as part of your design considerations
– Always design for performance first, then capacity
Storage Design … Peak IO Requirements
Profiling
– Utilize existing infrastructure to determine peak IO requirements
– Use Windows System Performance Monitor to trend the following counters:
– Disk Transfers/sec
– Disk Reads/sec
– Disk Writes/sec
– Trend during the peak period
– Monday is typically the busiest day at Microsoft.
Storage Design: I/O Profile
[Figure: disk transfers/sec over a six-hour peak period on a production server]
This image is a six-hour peak period representation of disk activity on a production server in Microsoft supporting 4700 mailboxes.
The image is based on a 10 second sample rate for a period of over 6 hours.
The average rate defined is ~4300 transfers/sec for the six hours.
The rate we use for profiling is between 10am and 12pm, which is below ~5000 transfers/sec.
Mailbox IOPS is defined by dividing the average peak IO by total mailboxes.
5000 / 4700 = ~1.06
We profile at 1.2 IOPS per mailbox for this server based on historical data.
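The profiling arithmetic on this slide can be sketched as follows. The function names are hypothetical; the inputs are the figures quoted on the slide (a peak of roughly 5000 transfers/sec across 4700 mailboxes, and a design profile of 1.2 IOPS per mailbox).

```python
def iops_per_mailbox(peak_transfers_per_sec, mailboxes):
    """Measured per-mailbox I/O rate: peak disk transfers divided by mailbox count."""
    return peak_transfers_per_sec / mailboxes


def design_iops(mailboxes, profile_iops_per_mailbox):
    """Total host IOPS to design for, using the chosen (padded) profile value."""
    return mailboxes * profile_iops_per_mailbox


measured = iops_per_mailbox(5000, 4700)
print(f"measured profile: {measured:.2f} IOPS per mailbox")
print(f"design target at 1.2 IOPS per mailbox: {design_iops(4700, 1.2):.0f} IOPS")
```

Profiling slightly above the measured value, as the slide does, leaves headroom for growth and for peaks that the sampling window missed.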
Peak IO Read Write Mix (R:W)
The read write mix profile in Microsoft is typically based on a 2:1 R:W ratio.
Within the MSIT deployment this is not a critical value for our design methodology as we utilize RAID 10 for our production devices.
Customers considering storage requirements for Exchange with the intent of using RAID 5 should trend their peak period R:W mix.
RAID 5 has a significant write penalty and disk allocations will vary substantially based on R:W mix.
Storage Design … Select RAID and Disk Type
RAID 10 or RAID 5
– Different Write Penalties (WP)
– Very different performance profiles under heavy load
36GB, 72GB or 146GB
– Disks are getting larger, performance is not changing
– 300GB will be available soon !!
10K or 15K RPM

Disk speed    IO measured at the host    IO measured behind the controller
              (transfers/sec per disk)   (transfers/sec per disk)
10K           ~100                       ~130
15K           ~150                       ~180

Table represents throughput based on 80% capacity utilization under a 4K random load delivering below 20ms latencies.
These values can be controller specific so testing is required.
[Figure: Jetstress test on 12 10K disks, measured at the host and behind the controller]
– Disk IO measured at the host using Jetstress: 12 10K disks delivering 1200 transfers, ~100 transfers/disk
– Disk IO measured behind the controller during the same Jetstress test (transfers at the disk using backend tools): ~130 transfers/disk
Storage Design … Select Correct RAID Type
Consider Microsoft’s scenario:
– 4000 mailboxes per server
– 200MB limits for most users, some exceptions
– 1.2 IOPS (from trending profile)
– 2:1 Read / Write mix
– Deleted item retention of 3 days
– Fluff Factor of 1.4 (overhead for provisioning mailbox storage)
Simple Math:
\[
\text{disks} = \frac{(\text{IOPS} \times \text{read ratio}) + \text{RAID penalty} \times (\text{IOPS} \times \text{write ratio})}{\text{spindle speed at the controller}}
\]
\[
\frac{(4800 \times 0.66) + 2 \times (4800 \times 0.34)}{130 \text{ or } 180} = \frac{6432}{130 \text{ or } 180}
\]
10K disk = 6432 / 130 (disk capability) = ~49 disks
15K disk = 6432 / 180 = ~35 disks
Select Correct RAID Type
Consider requirement behind the controller
– Use “130” for 10K and “180” for 15K disks
– Assume 80% capacity utilization on disk
RAID 10
4000 x 1.2 = 4800 transfers/sec
3200 Reads + (1600 Writes x 2 (write penalty)) = 6400
10K disk = 6400 / 130 (disk capability) = ~49 disks – round down to 48
15K disk = 6400 / 180 (disk capability) = ~35 disks – round down to 34
RAID 5
4000 x 1.2 = 4800 transfers/sec
3200 Reads + (1600 Writes x 4 (write penalty: 2 reads and 2 writes per host write)) = 9600
10K disk = 9600 / 130 (disk capability) = ~73 disks
15K disk = 9600 / 180 (disk capability) = ~53 disks
RAID 10 is the obvious choice for internal deployment. Some controllers do a better job handling RAID 5 than others by reducing the impact of the write penalty with effective caching.
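The spindle-count arithmetic above can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the function names are hypothetical, and the inputs (4800 host IOPS, a 2:1 read/write mix, write penalties of 2 and 4, and 130 or 180 transfers/sec per disk behind the controller) are the figures from the slides. The slides truncate the result; rounding up is shown as well because it is the more conservative choice.

```python
import math


def backend_iops(host_iops, read_parts, write_parts, write_penalty):
    """Disk-level I/O load once the RAID write penalty is applied."""
    total_parts = read_parts + write_parts
    reads = host_iops * read_parts / total_parts
    writes = host_iops * write_parts / total_parts
    return reads + write_penalty * writes


def disks_required(host_iops, read_parts, write_parts, write_penalty, per_disk_iops):
    """Unrounded number of spindles needed to carry the back-end load."""
    return backend_iops(host_iops, read_parts, write_parts, write_penalty) / per_disk_iops


HOST_IOPS = 4800  # 4000 mailboxes x 1.2 IOPS per mailbox

for raid, penalty in (("RAID 10", 2), ("RAID 5", 4)):
    for speed, per_disk in (("10K", 130), ("15K", 180)):
        raw = disks_required(HOST_IOPS, 2, 1, penalty, per_disk)
        print(f"{raid} {speed}: {raw:.1f} disks "
              f"(truncated {math.floor(raw)}, rounded up {math.ceil(raw)})")
```

For RAID 10 the final count also needs to be an even number, since disks are consumed in mirrored pairs.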
Ensure Cache is Enabled
The white line deviation represents the change in IO capability as measured at the host when write cache is disabled on the storage controller.
In this case a reduced throughput of ~400 transfers/sec with a corresponding impact on read and write latency characteristics.
Write latency was below 2 ms prior to disabling the cache and rose to just under ~16 ms when disabled.
When designing for Exchange you should try to achieve the best level of latency for both reads and writes.
Be Careful with RAID 5
This is a representation of total disk transfers on an optimized RAID 5 configuration.
System sustaining averages of 1600 transfers/sec until a drive failed and forced a rebuild to a hot-spare.
System lost 400 transfers or 33% capability !!
This representation is extracted using an array performance analyzer to show the impact during rebuild.
Rebuild took over 3 hours on a 146GB disk.
Allocate more disks than required to cater for a failed disk.
Do not rebuild during peak time, to minimize impact on users.
Note: Some storage enclosures will vary on rebuild time.
Understand Queuedepth
Representation of a system running Jetstress displaying Disk Transfers, Read and Write latency.
Host is connected via a single 2Gb FCA using the StorPort driver.
Gradual reduction in sustainable throughput with corresponding impact in latency as a result of reducing Queuedepth.
Queuedepth can throttle IO back at the host, making the result look like storage contention.
Use JETSTRESS
Use Jetstress to validate storage capability
http://www.microsoft.com/downloads/thankyou.aspx?FamilyID=94b9810b-670e-433a-b5ef-b47054595e9c&displaylang=en
Determine total system throughput under normal operational conditions
Determine capability and resilience of system during simulated component failure.
– Host Bus Adapter (HBA) failures
– Array controller failures
– Etc.
Ensure peak activity is viable under all conditions. Many customers fail to plan for these occurrences.
Testing Categories (Performance & Long Haul)
Importance of Database sizes
JETSTRESS UI - Interface
Jetstress set to simulate 4000 mailboxes at 1.2 IOPS per mailbox
Databases can take an extended time to create!
Create databases based on the expected profile. In the example using:
4 - Storage Groups (SG)
40GB - Databases (edb)
6 – edb’s per SG
System will generate 4K random read/write activity to simulate an Exchange load for predicting storage capability
Short & Long Stroking
[Figure: disk platter showing the outer track, inner track, data plateau, and spindle arm & head]
Outer track will produce a higher level of performance
Inner track will produce a lower level of performance
JETSTRESS with Full Stroke
• Create test databases that are sized to represent what you will have in production
• Jetstress UI does a good job on this today; beware of the command-line (CMD) version
• Using ~70%+ of actual disk capacity results in a more realistic performance characteristic
• In this example 48 10K disks provided over 4800 transfers/sec with 40GB EDBs
• Expected IO is perceived to be ~100 IOPS per disk at the host
Full stroke seek time: the time it takes to seek over all tracks of a disk, i.e., from the innermost to the outermost or vice versa
JETSTRESS with Short Stroke
• Original Jetstress recommendation suggested using 1/20th of expected database size for testing storage design
• Using a small amount of physical disk capacity can result in better performance than expected
• In this example 48 10K disks provided over 10,000 transfers/sec using 4GB EDBs
• Expected IO is perceived to be ~200 IOPS per disk at the host. This is ~100% more throughput than can be expected when the disk is ~80% utilized from a capacity perspective.
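To see how much a short-stroke test can overstate capability, compare the per-disk rates from the two Jetstress examples. This is only a back-of-the-envelope sketch; the function names are hypothetical and the inputs are the slide figures (48 disks, roughly 4800 transfers/sec with 40GB EDBs, roughly 10,000 transfers/sec with 4GB EDBs).

```python
def per_disk_iops(total_transfers_per_sec, disk_count):
    """Host-level transfers per second attributed to each spindle."""
    return total_transfers_per_sec / disk_count


def overstatement(short_stroke_rate, full_stroke_rate):
    """Fraction by which the short-stroke result exceeds the full-stroke result."""
    return (short_stroke_rate - full_stroke_rate) / full_stroke_rate


full_stroke = per_disk_iops(4800, 48)     # production-sized databases
short_stroke = per_disk_iops(10000, 48)   # databases at a fraction of production size

print(f"full stroke:  {full_stroke:.0f} IOPS per disk")
print(f"short stroke: {short_stroke:.0f} IOPS per disk")
print(f"overstatement: {overstatement(short_stroke, full_stroke):.0%}")
```

A design sized from the short-stroke number would have about half the spindles it needs once the disks fill up.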
Basic Primary Partitions
Master Boot Record (MBR) creates an alignment offset
– Utilize Diskpar from the Windows 2000 Resource Kit
– Provides ~10% improvement in sustainable throughput when corrected
– This will not resolve excessive latencies
– Data destructive
– Creates RAW partition
– Assign drive letter or mount point and then format in “Disk Manager”
[Screenshots: Diskpar –i and Diskpar –s output]
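The alignment problem comes down to whether the partition's starting byte offset is a multiple of the page or segment size the array works in. The sketch below is an illustration under stated assumptions: 512-byte sectors, a default starting offset of 63 sectors for MBR basic partitions on systems of this era, and example boundaries of 4 KB and 64 KB. None of these values come from the presentation, so check them against your own storage.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors


def partition_offset_bytes(starting_sector):
    """Byte offset at which the partition's data begins."""
    return starting_sector * SECTOR_BYTES


def is_aligned(starting_sector, boundary_bytes):
    """True when the partition starts on a multiple of the given boundary."""
    return partition_offset_bytes(starting_sector) % boundary_bytes == 0


# Assumed default: the partition starts at sector 63, i.e. 32,256 bytes in.
print(is_aligned(63, 4096))     # misaligned against a 4 KB boundary
# Hypothetical corrected offsets chosen for illustration.
print(is_aligned(64, 4096))     # 32,768 bytes, aligned to 4 KB
print(is_aligned(128, 65536))   # 65,536 bytes, aligned to 64 KB
```

When the offset is misaligned, some I/Os straddle two segments and cost the array extra work, which is consistent with the modest throughput gain the slide attributes to correcting it.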
Key Recommendations Summary
- Understand user IO profiles
- Select the correct RAID type, disk capacity and speed
- Design for performance, and then capacity
- Isolate Exchange disks: keep other applications away (shared storage should be avoided)
- Dedicated disk spindles for data !!
- Dedicated disk spindles for logs !!
- Align your disks using Diskpar (this won't correct a bad design)
- Validate storage design using Jetstress
- Scale mailboxes for performance efficiency with strong consideration towards maintaining a backup and restore SLA
The PERF Counters to watch
Physical Disk->Average Disk sec/Read->Instances
Physical Disk->Average Disk sec/Write->Instances
Physical Disk->Current Disk Queue Length->Instances
Physical Disk->Disk Bytes/Sec->Instances
Physical Disk->Disk Writes/Sec->Instances
Physical Disk->Disk Reads/Sec->Instances
Physical Disk->Disk Transfers/Sec->Instances
Physical Disk->Average Disk Bytes/Transfer->Instances
Database\Log Record Stalls/sec
– Log Record Stalls/sec is the number of log records per second that cannot be added to the log buffers because the buffers are full. The average for this value should be below 25. There shouldn’t be spikes (maximum values) higher than 250.
Database\Log Threads Waiting
– Log Threads Waiting is the number of threads waiting for their data to be written to the log in order to complete an update of the database
Under MSExchangeIS there are RPC counters that are extremely useful for understanding user latencies:
RPC Requests – number of operations currently being processed by the store
RPC operations/sec – number of incoming operations
RPC Average Latency – average of latency of the last 100 RPC operations (sliding window)
RPC Num. of Slow Packets – number of packets in the past 1024 that have latencies longer than 2 seconds
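The Log Record Stalls/sec guidance above (average below 25, no spikes higher than 250) is easy to apply to a series of collected samples. A minimal sketch, assuming the samples have already been exported from Performance Monitor into a list of numbers; the function name and the sample data are hypothetical.

```python
def log_record_stalls_ok(samples, average_limit=25, spike_limit=250):
    """Check Log Record Stalls/sec samples against the guidance on the slide.

    The average must stay below average_limit and no single sample may be
    higher than spike_limit.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    average = sum(samples) / len(samples)
    return average < average_limit and max(samples) <= spike_limit


# Hypothetical sample sets for illustration only.
quiet_server = [0, 2, 5, 1, 0, 3]
spiky_server = [0] * 99 + [260]  # low average, but one spike above 250

print(log_record_stalls_ok(quiet_server))
print(log_record_stalls_ok(spiky_server))
```

Checking the maximum separately matters because a long quiet period can hide a spike when only the average is examined.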
Data Device Baseline
Flat response times for read and write activity independent of I/O load
A ROCK’IN System!
Questions and Answers
© 2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Data in this presentation is current as of its publish date (see title slide).
Links
Optimizing Storage for Exchange Server 2003
http://www.microsoft.com/technet/prodtechnol/exchange/2003/library/optimizestorage.mspx
Exchange Development Team Blog (storage)
http://blogs.msdn.com/exchange/archive/2004/10/11/240868.aspx
Troubleshooting Microsoft Exchange Server 2003 Performance
http://www.microsoft.com/downloads/details.aspx?FamilyID=8679F6BD-7FF0-41F5-BDD0-C09019409FC0&displaylang=en
Exchange Best Practices Analyzer
http://www.microsoft.com/exchange/downloads/2003/exbpa/default.asp
TechNet
http://www.microsoft.com/uk/technet