Delivering Performance on Sun: System Tuning
Technical White Paper
Greg Schmitz and Allan Esclamado
© 1999 Sun Microsystems, Inc.
901 San Antonio Road, Palo Alto, California 94303 U.S.A.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 and FAR 52.227-19.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.
TRADEMARKS
Sun, Sun Microsystems, the Sun logo, Solaris, NFS, Ultra, UltraComputing, Ultra Enterprise, XGL, XIL, Creator, Creator3D, SunVTS, and OpenWindows are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
THIS PUBLICATION IS PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
THIS PUBLICATION COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES
ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN
NEW EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE IMPROVEMENTS AND/OR
CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS PUBLICATION AT ANY TIME.
Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Tuning the Solaris Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Reconfiguring the Solaris Kernel. . . . . . . . . . . . . . . . . . . . . . . . . 4
3. Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Memory Paging and Swapping . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Processes and Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Managing Swap Space: The swap Command . . . . . . . . . . . . . 13
Process Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Address Space Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4. Tuning Disk Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Disk Performance Specifications . . . . . . . . . . . . . . . . . . . . . . . . . 17
Disk Interface Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Disk Activity and Performance: iostat . . . . . . . . . . . . . . . . . . . 32
Managing Disk Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Volume Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5. File System Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Unix File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Tmpfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
VERITAS File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Tuning File Systems: tunefs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6. Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Selecting the Right Network Technology . . . . . . . . . . . . . . . . . . 44
Ensuring Data Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Evaluating System Performance . . . . . . . . . . . . . . . . . . . . . . . . . 51
Reducing Network Congestion . . . . . . . . . . . . . . . . . . . . . . . . . 51
7. Summary of Guidelines for System Performance Evaluation 61
A. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
1. Introduction
This document focuses on techniques for tuning the Sun computing environment for maximum performance. It is aimed at system administrators and sophisticated end users who do their own system administration. Each chapter concentrates on a different subsystem of the computing environment and the specific things that can be done to increase performance. The subsystem topics are presented in the order recommended for tuning analysis and optimization:
Chapter 2, Tuning the Solaris Kernel, identifies the parameters that can be tuned to ensure the Solaris operating environment operates at peak efficiency.
Chapter 3, Memory, discusses the impact of paging and swapping schemes, and identifies tools administrators can use to effectively monitor and manage memory and swap space.
Chapter 4, Tuning Disk Subsystems, focuses on the key performance criteria, important disk technologies, and data layout schemes that ensure high performance I/O. Sun tools that can aid in the detection, isolation, and correction of performance-related issues are also discussed.
Chapter 5, File System Tuning, looks at the high performance file systems available from Sun, as well as the tools and techniques that can be employed to ensure that these file systems continue to operate at peak efficiency.
Chapter 6 focuses on the networking technologies available today, as well as the issues surrounding choosing a networking technology, ensuring data integrity, evaluating network performance, and reducing network congestion. Tools available to aid these efforts on Sun systems are also discussed.
Chapter 7 highlights the major functions of the Solaris kernel, and discusses the key tables and environment variables that need to be tuned in order to ensure it operates efficiently.
Other Performance White Papers and Sun Documentation
This paper, as well as companion documents, can be found at http://www.sun.com on the World Wide Web. These documents include:
Delivering Performance on Sun: Optimizing Applications for Solaris helps software developers construct high performance applications with the use of proper compiler flags. It also discusses the development of high performance multithreaded applications.
Delivering Performance on Sun: Tools for High Performance Distributed Computing addresses performance issues for distributed computing platforms and the compute-intensive applications they run.
In addition, all Sun documentation can be found at http://docs.sun.com.
2. Tuning the Solaris Kernel
The Solaris operating system, called the kernel, is loaded into memory when the system boots. The kernel manages all of the physical resources of the computing system. Applications interface with the kernel through a series of system calls.
Key kernel functions include:
Implementing many types of file systems (UFS, NFS, tmpfs, etc.) and permitting processes to create, read, write, and access these files.
Managing daemons, system processes which manage system resources.
Moving large groups of information, called pages, to and from the disk into main memory as needed. This concept is known as virtual memory.
Keeping track of all active processes and deciding which one gets to run next.
Managing device drivers, the programs that control physical devices like disk drive subsystems.
Managing network services and IPC facilities.
Providing system management functions, such as booting and error handling.
Providing other functions that make system resources available to processes.
Reconfiguring the Solaris Kernel
The standard Solaris kernel is dynamic, supporting all devices, pseudo-devices, and options by automatically loading modules when they are needed. Unlike previous versions of SunOS, there is no need to manually configure the Solaris kernel.
The kernel needs to be reconfigured after hardware has been installed, and a reconfiguration reboot (boot -r) performed, before the new hardware can be recognized and used by the system. For some configuration changes it is sufficient to run the drvconfig command followed by the disks or tapes command to recognize devices recently added to the system.
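As a minimal sketch, a reconfiguration can be requested either from the OpenBoot PROM prompt or from a running system; the device discovery commands are those named above:

    ok boot -r                  (reconfiguration reboot from the PROM prompt)

    # touch /reconfigure        (flag a reconfiguration for the next boot,
    # init 6                     then reboot)

    # drvconfig                 (rebuild the /devices tree)
    # disks                     (create /dev entries for newly attached disks)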
The /etc/system File
The /etc/system file houses a wide variety of system configuration and tuning parameters. Administrators can tune many of the parameters contained in /etc/system to increase system performance, including buffer cache, file system, STREAMS, interprocess communication, memory management, and other parameters:
bufhwm, the maximum size for buffer cache memory usage, in KB
ufs_ninode, the maximum size of the inode table
ncsize, the size of the directory name lookup cache
nstrpush, the maximum number of STREAMS pushes allowed
strmsgsz, the maximum size for a STREAMS message a user can create
strctlsz, the maximum size of the ctl part of a message
strthresh, the maximum amount of dynamic memory the STREAMS subsystem can consume, in bytes
sadcnt, the number of sad devices
msginfo_msgmap, the number of entries in the message map
msginfo_msgmax, the maximum message size
msginfo_msgmnb, the maximum bytes on queue
msginfo_msgmni, the number of message queue identifiers
msginfo_msgssz, the segment size of a message (should be a multiple of the word size)
msginfo_msgtql, the number of system message headers
msginfo_msgseg, the number of message segments (must be < 32768)
seminfo_semmap, the number of entries in the semaphore map
seminfo_semmni, the number of semaphore identifiers
seminfo_semmns, the number of semaphores in the system
seminfo_semmnu, the number of processes using the undo facility
seminfo_semmsl, the maximum number of semaphores, per id
seminfo_semopm, the maximum number of operations, per semaphore call
seminfo_semume, the maximum number of undo structures per process
seminfo_semvmx, the semaphore maximum value
seminfo_semaem, the maximum value for adjustment on exit
shminfo_shmmax, the maximum shared memory segment size (set shmsys:shminfo_shmmax = 0xffffffff, or set shmsys:shminfo_shmmax = 4294967295)
shminfo_shmmin, the minimum shared memory segment size
shminfo_shmmni, the number of shared memory identifiers
shminfo_shmseg, the number of segments, per process
lotsfree, if freemem drops below lotsfree, the system starts to steal pages from processes
tune_t_fsflushr, the rate at which fsflush is run, in seconds
tune_t_minarmem, the minimum available resident (not swappable) memory needed to avoid deadlock, in pages
tune_t_minasmem, the minimum available swappable memory needed to avoid deadlock, in pages
tune_t_flckrec, the maximum number of active frlocks
lwp_default_stksize, the size of the kernel stack for lwps. This value should not be adjusted unless there is a kernel stack overflow. lwp_default_stksize is expressed in bytes and must be a multiple of PAGESIZE bytes.
npty, the total number of 4.0 or 4.1 pseudo-ttys configured
pt_cnt, the total number of 5.7 pseudo-ttys configured
priority_paging, enabling the system to place a boundary around the file cache, ensuring that file system I/O does not cause application paging
consistent_coloring, identifies the method used to improve hit ratios for the CPU external cache: 0 (uses various virtual address bits), 1 (physical address is set to virtual address), 2 (bin hopping), 6 (SBSD page coloring scheme)
Administrators can modify these kernel parameters while the system is running via the adb command, or by editing the /etc/system file and rebooting the system.
More information on /etc/system tunable parameters can be found in the System Administration Guide, Volume II manual located at http://docs.sun.com on the World Wide Web.
Kernel Tables
A variety of kernel tables can be tuned by system administrators to enhance the performance of the Solaris operating environment. Table 2-1 identifies these tables and the relevant environment variables they control, as well as suggested settings for each.
Table 2-1 System administrators can tune key Solaris kernel tables

                                  Variable     Default Setting
    Process Table                 max_nprocs   20 + 16 * maxusers
    Per User Process Limit        maxuprc      max_nprocs - 5
    Inode Cache                   ufs_ninode   max_nprocs + 16 + maxusers + 64
    Quota Table                   ndquot       (maxusers * NMOUNT) / 4 + max_nprocs
    Directory Name Lookup Cache   ncsize       max_nprocs + 16 + maxusers + 64
Many of the traditional Unix kernel limits have been removed over the years. Solaris now dynamically sets limits that increase on large memory configurations. Traditional tables that no longer exist or need to be tuned include the text table, open file table, callout table, and clist table.
More information on configuring kernel tables can be found in Sun Performance and Tuning, Java and the Internet, Chapter 14: Kernel Algorithms and Tuning.
maxusers Kernel Parameter
Over time, the Solaris kernel has been modified to enable it to scale according to the hardware capabilities of the workload it is running. By self-configuring and tuning, Solaris reduces administrative burdens and dynamically adapts to changing business needs. While earlier versions needed to be hand-tuned, each successive version of Solaris 2 has converted hand-adjusted values into adaptively changing limits.
One kernel parameter that remains an important factor in system performance is the maxusers environment variable. As evidenced in Table 2-1, many Solaris kernel parameters are defined relative to maxusers, rendering it perhaps the most critical variable administrators can tune. Indeed, maxusers
determines whether or not the Solaris kernel has sufficient room to accommodate users comfortably. (It is important to note that maxusers is not a limit on the number of simultaneous users supported by the system.)
Increasing the maxusers parameter increases the size of the Solaris kernel, and consequently decreases the amount of memory available. Administrators should, therefore, follow a few simple guidelines to ensure the kernel is tuned effectively for users of the system:
The maxusers variable is automatically set by Solaris according to the amount of physical memory available in the system. The automatic minimum value for maxusers is 8, and the automatic maximum is 1024.
The maxusers parameter is set to a little less than the amount of RAM in the system, in megabytes.
For NFS servers with small amounts of memory (under 512 MB), it can be beneficial to increase the DNLC and inode cache limits.
There is no need to tune in most cases, and only a small impact on performance can be expected from any kernel tuning.
Additional information on the impact of the maxusers kernel parameter can be found in Sun Performance and Tuning, Java and the Internet, Chapter 14: Kernel Algorithms and Tuning.
3. Memory
Memory Paging and Swapping
When memory becomes tight, the kernel begins to employ techniques known as paging and swapping. Paging occurs when the kernel puts part of an executing program out to disk, enabling the pages it occupied to be used for other purposes. When the process needs to access the page of memory that was moved to disk, a page fault occurs, forcing the process to wait until the page is copied back into memory. This scheme fosters the efficient use of memory, as only the parts of a program that are in use at the time the program is running need to be in memory and available for use.
Swapping occurs when the kernel stores an entire currently inactive process to disk, enabling the pages it occupied to be used by other executing programs. To continue, the process is paged back in gradually. Swapping is a part of the normal housekeeping performed by the Solaris kernel.
Paging and Swapping Parameters
The Solaris kernel utilizes several values stored in the /etc/system file to determine when to start paging and swapping, actions that begin and end when the memory system crosses a threshold set as part of the kernel paging algorithm:
lotsfree, the amount of memory required by the page daemon. If memory drops below this limit, the page daemon starts looking for pages to reuse.
desfree, the level at which swapping begins.
minfree, the minimum memory level acceptable to the system.
slowscan, the number of pages scanned per second when lotsfree is reached.
fastscan, the number of pages scanned per second when minfree is reached.
handspreadpages, the distance between the hand that sets a reference flag and the hand that clears it. The two hands keep track of which pages of memory have been accessed recently. If the pages have not been accessed in a certain amount of time, they are paged out.
maxpgio, the maximum number of page I/O operations per second the system will schedule. maxpgio is typically set to 40, assuming there is little swap space on multiple devices. If system swap space is on fast and/or multiple disks, maxpgio should be increased to take advantage of the higher performance devices.
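As an illustrative sketch only (sensible values depend on memory size and workload, and the kernel computes reasonable defaults at boot), these thresholds can be overridden in /etc/system:

    * /etc/system excerpt -- illustrative paging thresholds, in pages
    set lotsfree = 512
    set desfree = 256
    set minfree = 128
    set maxpgio = 120    * raised for swap spread across several fast disks

Measure paging behavior with vmstat before and after any such change.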
More information on paging in the Solaris kernel can be found in Sun Performance and Tuning, Java and the Internet, Chapter 13: RAM and Virtual Memory. Information on the /etc/system file can be found in the System Administration Guide, Volume II located at http://docs.sun.com on the World Wide Web.
Processes and Paging
Understanding the number of processes running on a system and how they are affected by paging is essential to overall performance. System administrators should routinely monitor processes and learn how they are utilizing system resources. To aid this effort, Sun has provided several tools for administrators as part of the Solaris operating environment: vmstat, swap, sar, ps, and Find Process. Each tool can be used in a variety of ways, including determining memory, paging, and swapping measurements.
Memory Measurements
The first step in evaluating processes is determining the paging and swapping activity occurring on the system. Both the vmstat and sar commands can be used to monitor paging activity. While sar is better for information logging purposes, vmstat tends to provide more detailed information. Correlations exist between the output of these tools, as discussed below.
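Both tools accept a sampling interval and an optional count, so a simple monitoring session might look like the following sketch (the intervals shown are arbitrary):

    # vmstat 5           (report memory, paging, and CPU statistics every 5 seconds)
    # sar -r 5 10        (sample free memory and free swap 10 times, 5 seconds apart)
    # sar -g 5 10        (sample page-out and page scanner activity)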
Swap space (vmstat swap, sar -r freeswap, and swap -s)
Each command describes the amount of available swap space, although in different forms. The vmstat swap command identifies the amount of available swap space in KB, while sar -r freeswap reports in 512-byte blocks. The swap -s command shows available swap space in addition to other metrics. Administrators should note that when available swap space is exhausted, the system will be unable to use additional memory.
Free memory (vmstat free and sar -r freemem)
Administrators can determine the amount of free memory: the pages of RAM that are ready immediately to be used when a process starts execution or needs more memory. The vmstat free command reports free memory in KB, while the sar -r freemem command reports it in pages.
Reclaims (vmstat re)
The number of pages reclaimed from the free memory list are called reclaims. Pages that have been taken from a process but not yet reused by another can be reclaimed, thereby avoiding the full page fault that would typically result. The vmstat re command can be used to identify reclaimed memory.
Page Fault Measurements
Identifying when page faults are occurring is an important step in evaluating
system performance.
Attaches to existing pages (vmstat at and sar -p atch)
The vmstat at and sar -p atch commands measure the number of attaches to shared memory pages.
Pages paged in (vmstat pi and sar -p pgin, ppgin)
The vmstat pi command reports the number of KB/second, while the sar command reports the number of page faults and the number of pages paged in by swap space for file system read operations. Administrators should note that the file system block size is 8 KB. Consequently, there may be two pages, or 8 KB, paged in per page fault on machines with 4 KB pages. Also noteworthy is the fact that UltraSPARC machines employ 8 KB pages; almost all other CPUs utilize 4 KB pages.
Minor faults (vmstat mf and sar -p vflt)
Minor page faults are caused by address space or hardware address translation faults that can be resolved without performing a page-in operation. Such operations are fast and require little CPU utilization, eliminating the need for a process to have to stop and wait.
Other fault types (sar -p pflt, slock, vmstat -s copy-on-write, zero fill)
Many other types of page fault conditions are possible, and can be identified using the above commands. More information on these and other fault conditions can be found in the corresponding man pages, as well as Sun Performance and Tuning, Java and the Internet, Chapter 13: RAM and Virtual Memory.
Page-out Measurements
The vmstat and sar commands provide information on the amount of data paged out to the swap space or file system:
Pages paged out (vmstat po and sar -g pgout, ppgout)
The vmstat po command reports the number of KB/second paged out, while sar reports the number of page-outs and the number of pages paged out to swap or file system space. Administrators should note that the clustering of swap space writes can lead to a large number of pages written per page-out.
Pages freed (vmstat fr and sar -g pgfree)
The Solaris operating environment keeps track of the pages freed: the rate at which memory is returned to the free list. The vmstat fr command reports the amount of memory freed in KB per second, while the sar command reports the number of pages freed per second.
Short-term memory deficit (vmstat de)
Deficit is a paging parameter that provides a measure of hysteresis for the page scanner when there is a period of high memory demand. If the value returned by the vmstat de command is non-zero, then memory was being consumed quickly and extra free memory will be reclaimed in anticipation of its impending use.
Page daemon scanning rate (vmstat sr and sar -g pgscan)
The vmstat sr and sar -g pgscan commands can be used to determine the number of pages scanned by the page daemon as it looks for pages used infrequently. If the page daemon scanning rate stays above 200 pages per second for long periods of time, then a memory shortage is likely.
Pages freed by the inode cache
Statistics related to the inode cache used by the Unix file system (UFS) are also indicators of system performance. In particular, ufs_ipf measures UFS inode cache reuse, a feature that can cause pages to be freed when an inactive inode is freed. ufs_ipf is the number of inodes with pages freed as a percentage of the total number of inodes freed.
Swapping Measurements
The sar and vmstat commands also supply system swapping information.
Pages swapped in (vmstat -S si and sar -w swpin, bswin)
The vmstat -S si command reports the number of KB/second swapped in, while the sar -w swpin command reports the number of swap-in operations. The sar -w bswin command reports the number of 512-byte blocks swapped in.
Pages swapped out (vmstat -S so and sar -w swpot, bswot)
The vmstat -S so command reports the number of KB/second swapped out, while the sar -w swpot command reports the number of swap-out operations, and the sar -w bswot command reports the number of 512-byte blocks swapped out.
More information on analyzing paging statistics can be found in Sun Performance and Tuning, Java and the Internet, Chapter 13.
Managing Swap Space: The swap Command
The efficient operation of a system relies on the effective utilization of both memory and swap space. System administrators must ensure the system has sufficient swap resources so the paging and swapping activities performed by the kernel can operate smoothly. Sun provides the swap utility in the Solaris operating environment to facilitate the addition, deletion, and monitoring of the swap areas used by the memory manager.
Adding a swap area
The -a option of the swap command can be used by system administrators to add a swap area to the system. A swap file must be specified, such as /dev/dsk/c0t0d0s1, as well as a length for the swap area in 512-byte blocks. A swap area must be at least 16 blocks in length.
Deleting a swap area
The -d option of the swap command can be used by system administrators to delete a swap area from the system. Once deleted, blocks will no longer be allocated from this area. In addition, all swap blocks previously in use in this swap area will have been moved to other swap areas.
Listing swap area status
The -l option of the swap command can be used by system administrators to list the status of all swap areas on the system. Several pertinent pieces of information are reported, including: the path name for the swap area, the major/minor device number if a block special device is employed, the length of the swap area, and the available free space.
Printing summary information
The -s option of the swap command can be used by system administrators to print summary information for all swap areas on the system. Total swap space usage and availability is provided. System administrators can determine the total amount of swap space allocated, the amount not currently allocated but reserved for future use, the total amount that is either allocated or reserved, and the amount available for future reservation and allocation.
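The following sketch shows these options together; the slice and file names are illustrative:

    # swap -a /dev/dsk/c0t0d0s1        (add a disk slice as a swap area)
    # mkfile 256m /export/swapfile     (create a 256 MB swap file...)
    # swap -a /export/swapfile         (...and add it to the swap pool)
    # swap -l                          (list each swap area and its free space)
    # swap -s                          (print the allocation summary)
    # swap -d /export/swapfile         (delete a swap area)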
Distributing Swap Space
To achieve peak system performance, system administrators must distribute swap space so it can be utilized effectively. Following a few simple guidelines can dramatically improve system performance:
Place swap partitions or swap files on as many different disks as possible.
Place swap partitions or swap files on the fastest disks in the system.
If more than one disk is available per controller, limit the swap space to only one of those disks.
Although swapping to remote (NFS) file systems is permitted, this practice should be avoided if at all possible. Swapping to remote file systems not only degrades swap performance, it also increases network traffic.
Avoid placing swap files on file systems that are suspected of excess fragmentation or are almost full. If possible, swap files should be placed in a dedicated disk partition to eliminate fragmentation concerns.
More information on managing swap space can be found in the swap(1M) man page.
Process Manager
The new Process Manager in Solaris 7 enables administrators to quickly obtain process-related information in the CDE environment (Figure 3-1). The Process Manager enables administrators to:
Show user, system, or all processes
Sort processes by resource consumption
Suspend, resume, or kill processes
Diagnose sick processes and contact their owning user via Address Manager
More information on the Process Manager can be found in the Solaris operating environment administrative guides located at http://docs.sun.com on the World Wide Web.
Obtaining Process Information: ps
System administrators need to obtain information about active processes in the system. The ps command aids this effort by reporting the process ID, terminal identifier, cumulative execution time, and the command name by default. More information, including processes owned by a specific user, can be obtained by utilizing the options specified in the ps(1M) man page.
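For example (a sketch; the -e and -o options are standard, and the column list shown is one possible choice):

    # ps -ef                           (full listing of every active process)
    # ps -eo pid,user,vsz,rss,args     (add per-process virtual and resident sizes)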
Address Space Map
The /usr/proc/bin/pmap command can be used to print the address space map for each running process. Furthermore, the new -x option available in the Solaris 7 operating environment prints resident/shared/private mapping details, providing more information than ever before.
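A brief illustration (the process ID shown is hypothetical):

    # /usr/proc/bin/pmap -x 1234       (print resident/shared/private detail
                                        for each mapping of process 1234)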
More information on the pmap command can be found in the proc(1M) man page.
4. Tuning Disk Subsystems
This chapter first describes options for selecting disk subsystems which may improve system performance. Following is a description of the tools and utilities for analyzing the performance of these disk subsystems.
It is not unusual for storage capacity requirements to grow by as much as 100% per year. Coincident with this growth is an increasing need to ensure ready access to data whenever it is needed. System administrators know that raw capacity and performance do not stand alone: both scalability and high availability are important practical requirements for ensuring economical and reliable operation and information access. Indeed, disk subsystems must be configured and tuned to match the performance, availability, and cost requirements of users.
Focusing on disk subsystems, this chapter describes key performance criteria, introduces important disk technologies, and describes Sun tools that can aid in the detection, isolation, and correction of performance-related issues. References for more detailed tuning information are provided throughout.
Disk Performance Specifications
A number of factors are helpful in determining the relative performance of disk subsystems to one another:
Disk capacity, the amount of data that can be stored on the disk or in the disk subsystem.
Seek time, the amount of time it takes to move the disk drive's heads from one track to another, and is a product of the performance of the actuator. Disk drives typically spend ten times more time seeking than transferring data, making seek time perhaps the most important indicator of a disk's performance. Some disk vendors quote the time it takes to traverse the entire disk; others cite the time it takes to traverse one-third of the disk.
Rotational speed, the speed at which the disk spins. After the head has moved to the appropriate track, the disk must rotate until the data is underneath the head. The disk's rotational speed determines its latency, the amount of time it takes to set up for a data transfer (see the worked example after this list).
Data transfer rate, the amount of data that can be transferred from the disk to the host per unit time. Both theoretical maximum and typical transfer rates are often quoted. Data transfer rate is quoted in Kilobytes/second, Megabytes/second, or Gigabytes/second.
Zone bit recording, a technique which enables data to be recorded at variable densities throughout the disk, thereby providing higher capacities. This method breaks the surface of the disk into different zones, each with a different number of bytes per track, enabling the bit density to remain constant across the disk's surface. Drives with this feature offer improved transfer rate and capacity.
Bus bandwidth, the amount of data that can be handled by the system bus. SCSI bus bandwidth can be made faster and wider.
Controller, a device which coordinates read and write requests between the host and the disk drive.
Buffer size, the amount of cache employed on the disk controller. Some drives utilize separate read and write buffers; others use a unified cache.
Mean Time Between Failure (MTBF), the average time between failures.
Mean Time to Data Loss (MTDL), the average time a disk subsystem can operate before another failure causes data to be lost.
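As a worked example of rotational latency: a 7,200 RPM drive completes one revolution in 60/7200 seconds, or about 8.3 ms. On average the target sector is half a revolution away, so the expected latency is roughly 4.2 ms; a 10,000 RPM drive cuts this to about 3 ms.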
More information on these and related topics can be found in Sun Performance
and Tuning, Java and the Internet, Chapter 8: Disks, and Configuration and Capacity
Planning for Solaris Servers, Chapter 7: Storage Subsystem Architecture.
Disk Interface Standards
Table 4-1 highlights the key characteristics of the various SCSI standards.
Table 4-1 The SCSI protocols provide a wide variety of performance characteristics

                              SCSI               Fast SCSI           Fast and Wide SCSI  Ultra SCSI
    Asynchronous/Synchronous  Asynchronous or    Synchronous         Synchronous         Synchronous
                              Synchronous,       Differential        Differential        Differential
                              Single-ended
    Transfer Rate             5 MB/second        10 MB/second        20 MB/second        40 MB/second
    Number of Pins            50 pins            50 pins             68 pins             68 pins
    Width                     8 bits             8 bits              16 bits             16 bits
    Maximum Cable Length      6 m                3 m synchronous,    3 m synchronous,    3 m synchronous,
                                                 25 m differential   25 m differential   12 m differential
    Number of Targets         7                  7                   15                  15
More information on SCSI can be found in Configuration and Capacity Planning
for Solaris Servers, Chapter 7: Storage Subsystem Architecture.
Fibre Channel
Fibre Channel is a high performance interconnect standard designed for bidirectional, point-to-point communications between high performance workstations, peripherals, and large host systems. It offers a variety of benefits over other link-level protocols:
Efficiency and high performance
Scalability
Operation over long distances
Easy to cable
Supports popular high level protocols
Supports switched networks
Secure from RF snooping
Invulnerable to RF interference
Theory of Operation
Channels are a generic class of communications technology designed to move data from one point to another with high speed and minimum latency. For speed, functions such as error correction and retry-on-busy are typically implemented in hardware. Because they do not have sophisticated software-based routing and switching capabilities, channels typically operate in environments where every device on the channel is defined and known in advance.
At its most basic, Fibre Channel is just a high speed serial connection between two buffers. Indifferent to the format of the data, Fibre Channel only provides the error checking and necessary signaling to ensure the reliable movement of data between the source and destination buffers.
Fibre Channel employs a mechanism known as a fabric to establish connections between ports. The fabric, like the telephone network, has all the needed intelligence to make routing decisions. The only concern of a port is to ensure its proper connection to the fabric. Each fabric has an address space that allows up to 16 million addresses. A fabric can be simple, or a mix of switches and hubs that connect a complex and widely dispersed network of ports.
Reliability
There are no limits on the size of a transfer between applications using Fibre Channel. Each sequence is broken into frames, the frame size being negotiated by the ports and the fabric. A 32-bit cyclic redundancy check ensures that transmission errors can be reliably detected.
Fibre Channel Arbitrated Loop
An important enhancement to Fibre Channel has been the development of Fibre Channel Arbitrated Loop (FC-AL), developed specifically to meet the needs of storage interconnects. Employing a simple loop topology, FC-AL can support both simple configurations and sophisticated arrangements of hubs, switches, servers, and storage systems (Figure 4-1). Furthermore, by using SCSI protocols over the much faster, more robust Fibre Channel link, FC-AL provides higher levels of performance without requiring expensive and complex changes to existing device drivers.
Figure 4-1 FC-AL's loop topology can support both simple and complex configurations.
Key Benefits of FC-AL
FC-AL offers a host of benefits over other protocols, enabling storage systems based on it to foster the creation of new applications that take full advantage of some impressive specifications:
Gigabit bandwidth
FC-AL is capable of data transfer rates of up to 100 MB/second (dual ported), with 400 MB/second envisioned for the future, far faster than SCSI, Serial Storage Architecture (SSA), or P1394 (Firewire).
Suitability for networks
In addition to performance, Fibre Channel is ideal for building networks. Employing hubs and switches similar to those used in networks, Fibre Channel enables complex arrangements of storage and systems to be connected together in highly scalable, highly available networks, or fabrics.
Use of existing SCSI protocols
FC-AL allows SCSI command packets to be sent over a high speed physical medium, reducing software and firmware costs and minimizing impact on existing software.
Node addressability far better than SCSI
With the ability to support up to 126 FC-AL devices on a single host adaptor, cost and implementation complexity is greatly reduced. Using optical fiber media, a single FC-AL loop can support nodes with a separation of up to ten kilometers from the host.
Unique device naming convention
FC-AL employs unique, 64-bit world wide names for devices. Disk drives, host adapters, and interface boards are identified by a world wide name, enabling improved diagnostics and software support.
Greatly simplified wiring and cabling requirements
Because Fibre Channel is a largely optical, simple serial protocol, electrical interference and expensive cabling are much less of an issue than with the complex parallel data paths used by SCSI.
In addition, FC-AL implementations support redundant data paths, hot pluggable components, multiple host connections, and dual ported drives: features that 15-year-old SCSI technology was never intended to support.
Configuration Flexibility
Fibre Channel Arbitrated Loop enables storage systems to implement several loop configurations, each designed to address varying levels of performance and availability. Utilizing bypass circuits and multiplexors, FC-AL based storage devices like the Sun StorEdge A5000 permit the construction of multiple loops within a single enclosure, improving both performance and availability (Figure 4-2).
More information on Fibre Channel technology can be found in Sun Performance and Tuning, Java and the Internet, Chapter 8: Disks, and Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage Subsystem Architecture, and in the white papers listed in the References section of this document.
Figure 4-2 FC-AL's loop topology supports both single and dual loop configurations.
RAID
Redundant Arrays of Inexpensive Disks, or RAID systems, were first formally defined by researchers at Berkeley in 1987. The most important part of their idea was to combine the power of small, inexpensive drives to achieve performance equal to a single large, expensive disk. Because RAID required multiple spindles, features were also needed that could protect against the data loss that could be caused by the failure of any one of the drives. These original safeguards subsequently became an important feature in their own right, with many users purchasing RAID systems just for their availability.
Today, disk arrays are sold as an intelligently managed collection of disk drives organized to optimize performance for a particular set of tasks. All RAID systems achieve higher performance and/or reliability by replicating or spanning data across multiple disks. Exactly how this is done has profound effects on subsystem performance, cost, and reliability.
Two metrics are used in assessing the performance of disk subsystems:
Transfer Rate
Transfer rate is the speed (measured in MB/second) with which a subsystem can move data through its controller. In RAID systems, read and write transfer rate performance can vary considerably, and is particularly valuable for applications that have to move large amounts of data quickly, like document imaging or digital video.
I/O Operations per Second (IOPS)
IOPS are a measure of the ability of a storage system to handle multiple, independent I/O requests in a certain period of time. RAID systems with high transfer rates do not always have good transaction rate performance. Database and transaction processing systems are examples of applications that typically require high I/O rate performance.
Recovery After a Failure
When a disk in a protected RAID system does fail, the array continues to operate in reduced mode. In reduced mode, array performance may be lowered, depending on the RAID level being used and whether reads or writes predominate. Upon encountering a failure, the simplest RAID systems inform the administrator of the problem, leaving them to organize the replacement of failed components and to begin the restoration of normal operations. The most sophisticated systems can automatically begin reconstructing data to a reserve drive called a hot spare. Hot sparing allows the array to automatically return to full performance quickly, and permits administrators to defer maintenance to a more convenient time. Some systems even allow defective drives to be replaced without interrupting processing or removing power, but such hot swapping is sometimes less useful than a system that can instantly begin recovery using a hot spare.
Popular RAID Levels
Can RAID subsystems actually eliminate the I/O bottleneck and provide very high levels of performance? In some circumstances, RAID configurations can dramatically improve performance, and in others may actually impair it. To accommodate the various kinds of demands put on storage systems, different levels of RAID have been defined, each with their own special characteristics.
Five RAID levels, numbered 1 through 5, were defined by the Berkeley researchers. Since that time, a few more have been added, some simply variations of the original five. The four RAID levels offered in the Sun StorEdge array product family, levels 0, 1, 5, and 0+1, are discussed here.
Level 0 - Striping
Striping is an industry term for breaking a data stream up and placing it across multiple disks in equal-sized chunks, or stripe blocks. Each chunk is sequentially written to successive drives in the array. The set of sequential chunks that begins with the first drive and ends with the last drive forms the stripe (Figure 4-3). Array management software running on the host, on a special controller inside the array, or some combination of both, is responsible for making the array look like a normal, or virtual, disk to the operating system and applications.
Figure 4-3 RAID 0 breaks data into equal sized chunks of data in sets that form a stripe. In this figure, the stripe width is three, with chunks 1, 2, and 3 forming a single stripe.
Each disk in a stripe is generally assumed to be on its own independent data channel, allowing the transfer rate of a RAID 0 implementation to approach the sum of the transfer rates of each of the drives. (Presuming that the intervening software and hardware has sufficient throughput.)
RAID 0 can also provide excellent IOPS performance, because data is spread across many independent spindles and actuators, helping to balance the I/O load. Optimizing RAID 0 for transfer rate or IOPS performance requires administrators to make configuration decisions about the relationship of the sizes of the chunks in a stripe and the average size of disk transactions.
Level 1 - Mirroring
Used for a long time in fault tolerant applications, this type of array is also called shadowing or mirroring. Designed to optimize data availability rather than speed, RAID 1 duplicates each write transaction to one or more mirror disks (Figure 4-4).
Mirrored systems can be used to improve the IOPS performance of read operations, because the least busy drive can be selected to service requests. Because both drives must be involved in write operations, write performance may be slightly worse than an independent drive.
Figure 4-4 RAID 1, or mirroring, provides data redundancy by recording data multiple times on independent spindles.
Level 5 - Striping with Interleaved Parity
RAID 5 exploits the performance gains of striping while mitigating its vulnerability to data loss by adding error correction information (called parity) to the data.
RAID 5 allows independent access to individual drives in a group with striped data. If the data is interleaved across the group of drives in blocks equal in size or larger than most I/O requests, then drives can respond independently to requests, and even to multiple requests simultaneously. This has the effect of allowing RAID 5 to have very good random read performance. Conversely, if the block sizes in a stripe are smaller than the average size of an I/O request, forcing multiple drives to work together, RAID 5 systems can also enjoy good sequential performance.
Figure 4-5 RAID 5 interleaves parity and data across the stripe to improve performance.
With RAID 5, both the parity information and data are interleaved across the array in a cyclic pattern (Figure 4-5). If one of the disks fails, a simple algorithm allows the missing data to be recovered from the remaining disks. RAID 5 is a good compromise for those needing good random performance without incurring the expense of mirroring.
RAID 5's failing comes from its poor performance on small writes. When a single-block write command is issued to a RAID 5 configuration, the system must first complete four I/O operations (reading the old data and old parity, then writing the new data and new parity) and two parity calculations in a read-modify-write sequence, degrading random write performance.
Cached RAID 5
Using a non-volatile cache in the disk controller, read-modify-write operations can be optimized to use full width stripes, resulting in very good write performance. This is often known as hardware RAID 5 or controller-based RAID. Sun's StorEdge A1000, A3000, and A7000 subsystems perform high speed cached RAID 5 operations for write-intensive workloads.
RAID 0+1 - Striping plus Mirroring
Useful configurations have been created by combining two existing RAID levels. One of the more popular is known as RAID 0+1, which combines the reliability of mirroring (RAID 1) with the performance of striping (RAID 0) (Figure 4-6).
The reliability of a striped and mirrored system is excellent because of the high redundancy afforded by mirroring. RAID 0+1 systems can tolerate the failure of one or more disks, and continue to deliver data with virtually no performance degradation, unlike RAID 5. RAID 0+1 systems do carry, however, the higher cost of mirrored systems, as data requiring protection needs twice the disk space of simple independent spindles.
Some subsystems also support RAID 1+0, in which data is first mirrored and then striped. This scheme can withstand more individual disk failures and maintain data availability.
Figure 4-6 RAID 0+1 combines mirroring and striping to achieve excellent availability and performance.
Optimizing the RAID Configuration
The various RAID levels have distinctive strengths, and therefore tend to be best for certain kinds of applications (Table 4-2). Database management, for example, requires good random performance, and independent spindles are the best solution. If reliability is a concern, however, mirroring (RAID 1) can give excellent performance, but is expensive for large data sets. RAID 5 offers slightly worse throughput, but with good redundancy and economy. Clearly, finding the right solution requires a good understanding of the I/O load.
Table 4-2 RAID levels provide varying degrees of availability with corresponding trade-offs in performance and cost

    RAID 0 (Striping)
        Strengths:  Improves I/O performance
        Weaknesses: No redundancy; a single disk failure causes all data to be lost
    RAID 1 (Mirroring)
        Strengths:  Improves data availability; may improve read performance
        Weaknesses: Expensive; potentially decreased write performance
    RAID 0+1 (Striping Plus Mirroring)
        Strengths:  Improves data availability without sacrificing performance
        Weaknesses: Expensive
    RAID 5 (Striping Plus Distributed Parity)
        Strengths:  Improves data availability; error correction information; lower cost than mirroring
        Weaknesses: Survives only a single disk failure; poor small write performance
Interlace Size
The interlace size specifies the chunk size: the amount of data written to each disk before moving on to the next component in a stripe. Optimally setting the interlace size can help ensure the RAID configuration performs well for both read and write operations. RAID 0 and RAID 5 configurations require different settings for the interlace size, depending on the type of I/O:
In striped configurations experiencing a high volume of random access I/O, the interlace size should be set large relative to the size of a typical I/O request.
In striped configurations subject to mostly sequential access I/O, the interlace size should be set small relative to the size of the typical I/O request. One typical strategy to determine the interlace size is to divide the typical I/O size by the number of disks in the stripe (Figure 4-7).
Figure 4-7 Setting the interlace size in striped configurations fosters higher sequential performance

    Interlace_Size = I/O_Size / Number_of_components
RAID 5 configurations perform best when write requests align with the stripe and the I/O is a multiple of the interlace size. Indeed, full stripe writes can be utilized if the size of a typical I/O request equals the interlace size multiplied by one less than the number of components in the stripe (Figure 4-8).
Figure 4-8 Setting the interlace size in RAID 5 configurations promotes faster write operations

    I/O_Size = Interlace_Size * (Number_of_components - 1)
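As a worked example (the stripe geometry here is hypothetical): a four-disk RAID 0 stripe serving typical 128 KB sequential requests would use an interlace size of 128 KB / 4 = 32 KB. Conversely, a RAID 5 set with five components and a 32 KB interlace achieves full stripe writes when requests are 32 KB * (5 - 1) = 128 KB.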
More information on configuring and optimizing RAID systems can be found
in Sun Performance and Tuning, Java and the Internet, Chapter 8: Disks, and
Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage
Subsystem Architecture.
Host-Based RAID versus Controller-Based RAID
In recent years, much has been said about the implementation options available for RAID subsystems, and contention has arisen over the perceived benefits and drawbacks of both host-based RAID and controller-based RAID devices. While traditional controller-based RAID subsystems employing SCSI technology were long seen as the answer to higher performance by minimizing the amount of traffic flowing between the host and peripheral, the advent of 100 MB/second Fibre Channel is bridging the gap. Utilizing a faster transport medium like Fibre Channel, the performance of host-based RAID storage systems competes with, and in many cases surpasses, their controller-based RAID counterparts. In addition, the lower cost and higher degree of flexibility offered with host-based RAID solutions like the Sun StorEdge A5000 ensures that users have access to the best price/performance configurations available.
Disk Activity and Performance: iostat
Two factors that impact overall system throughput are CPU utilization and disk I/O activity. If there are insufficient CPU resources to process data and execute tasks, data will not be streamed to devices and performance will suffer. In addition, it is possible for disks to become saturated, rendering them incapable of handling higher loads of traffic.
The iostat utility bundled in the Solaris operating environment helps system administrators determine disk activity patterns and performance. In order to iteratively report disk I/O activity as well as CPU utilization, the Solaris kernel maintains a series of counters:
The number of reads and writes per disk
The number of bytes read and bytes written per disk
Time stamps
The residence time and cumulative residence length product for each queue
These statistics enable iostat to produce highly accurate measures of throughput, utilization, queue lengths, transaction rates, and service times (Figure 4-9).
Figure 4-9 The iostat utility reports I/O statistics which can be analyzed to determine CPU and disk bottlenecks.
    # iostat -xp
                              extended device statistics
    device    r/s   w/s   kr/s   kw/s  wait  actv  svc_t  %w  %b
    dad0      0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    dad0,a    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    dad0,b    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    dad0,c    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    dad1      0.0   0.2    0.1    1.4   0.0   0.0   61.7   0   0
    dad1,a    0.0   0.2    0.1    1.4   0.0   0.0   61.7   0   0
    dad1,b    0.0   0.0    0.0    0.0   0.0   0.0    7.9   0   0
    dad1,c    0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    fd0       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    sd2       0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
    nfs1      0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0

More information on iostat can be found in the iostat(1M) man page.
Managing Disk Space
Data availability depends on disk drives being able to read and write data at a given time. System administrators must ensure that adequate disk and file system space exists for users and applications. Quotas can be used to constrain users from utilizing too much disk space.
Determining Free Disk Blocks and Files: df
To determine the number of free disk blocks and files on the disks in the system, administrators can employ the df command included in the Solaris operating environment. The df command displays the amount of disk space occupied by currently mounted file systems, the amount of used and available space, as well as the amount of each file system's capacity that has been used (Figure 4-10).
Figure 4-10 The df utility reports free disk blocks and files, enabling administrators to quickly isolate capacity problems that can lead to decreased performance and availability.

    # df -ak
    Filesystem              kbytes     used    avail  capacity  Mounted on
    /proc                        0        0        0      0%    /proc
    /dev/dsk/c0t1d0s0      3875742   816409  3020576     22%    /
    fd                           0        0        0      0%    /dev/fd
    swap                    637832    55312   582520      9%    /tmp
    auto.shared                  0        0        0      0%    /shared
    auto.scde                    0        0        0      0%    /scde
    auto.src                     0        0        0      0%    /src
    -hosts                       0        0        0      0%    /net
    auto.doe                     0        0        0      0%    /doe
    auto.ws                      0        0        0      0%    /ws
    auto_home                    0        0        0      0%    /home
    -xfn                         0        0        0      0%    /xfn
    nevermore:vold(pid236)       0        0        0      0%    /vol

More information on df can be found in the df(1M) man page.
Determining Disk Usage: du
The ability to ensure that disks do not completely fill aids both performance and availability. System administrators can obtain a quick gauge of disk usage by executing the du command. The du command recursively reports the number of kilobytes contained in all files and directories (Figure 4-11). The size of the file space allocated to a directory is reported as the sum of the total space allocated to the files contained in the directory, as well as the space allocated to the directory itself. All file sizes are reported in, and rounded to, the nearest 512-byte unit.
Figure 4-11 The du utility describes the amount of disk space allocated to files and directories.

# du -as /tmp
306128 /tmp

More information on the du command can be found in the du(1M) man page.
System Activity Reporter: sar
The Solaris operating environment includes a system activity reporter (sar) that samples cumulative activity counters at specified intervals. The sar command can also be used to determine storage-related information, including:
- Disk activity levels for system disks (-d option)
- Buffer cache activity (-b option)
- File access system routine usage (-a option)
- Process status (-v option)
- Average queue length (-q option)
More information on the sar command can be found in the sar(1M) man page.
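For example, a representative invocation (the interval and count here are arbitrary) that samples disk activity every 5 seconds for 10 intervals:

# sar -d 5 10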
Volume Management
The combination of disk drives and intelligent software to manage them is commonly referred to as a storage management system. With increasing numbers of disks and a higher demand for availability in mission-critical environments, storage management systems have a number of requirements:
Reliability
Organizations need assurance that the data they depend on is accurate. Storage management systems must implement the features necessary to ensure the accurate storage and retrieval of data.
Data and system availability
Many organizations run 7x24. In these environments, it is imperative for data to be available on demand. For many, unavailable data can mean a significant loss in income. When a system fails, data must be accessible to an alternate host. In the event of a disk failure, some means must exist for the data stored on that disk to be made available to users.
Performance
Storage management systems must provide the performance required by today's demanding applications and environments.
Management of large amounts of data
Users are generating more data than ever before. Organizations need a method for organizing disks and the data on them such that data reliability and availability are maintained without sacrificing performance.
Simplified administration
Managing a large number of disks can be daunting. Storage management systems must provide tools to ease this task.
Application integration
A storage management system must fit seamlessly into existing customer server, database, and OLTP environments.
Sun provides two tools to aid these efforts: the Sun StorEdge Volume Manager and RAID Manager. More information on Sun volume management solutions can be found at http://www.sun.com/storage/products.html on the World Wide Web.
5. File System Tuning
The Solaris operating environment supports several file system types, each designed to provide high performance to users. These file systems include the standard Unix File System (UFS), a RAM disk file system (Tmpfs), and the VERITAS File System (VxFS) discussed below. Two additional file systems, CacheFS and NFS, are geared toward enhancing the performance of network operations and are discussed in chapter 7 of this document.
Unix File System
The Unix File System (UFS) is the standard file system used to manage disk drives in the Solaris operating environment. Based on the Berkeley Fast File System, UFS is designed for high performance. Its tree-like structure, managed by a structure known as an inode, enables fast access to directories and files. With UFS, file data that is read is cached in RAM. If RAM becomes full, cached data is flushed to disk and retrieved when needed. Consequently, the more RAM in the system, the better the overall file system performance.
UFS and Transaction Logging
When unscheduled system downtime occurs, such as a system panic or power failure, partially completed updates to the file system are possible. If partial updates exist, the file system can be left in an inconsistent state. To fix these inconsistencies, UFS file systems are checked at boot time (with fsck and quotacheck) and repaired, if possible. Because multiple and complex
directory hierarchies often exist, and because fsck must traverse each and every hierarchy to ensure its integrity and consistency, the fsck process is often time consuming.
The combination of Solaris 2.4 and the Solstice DiskSuite storage management tool offers a new feature called UFS logging, sometimes referred to as journalling. UFS logging takes the (logically) synchronous nature of updating a file system and makes it asynchronous. Updates to a file system are not made directly to the disk or partition containing the file system. Instead, user data is written to the device containing the file system, but the file system disk structures are not modified directly; changes to them are logged instead.
Updates are applied to the file system structure at key times to ensure consistency. Changes made to the file system by unfinished system calls are discarded, ensuring that the file system is in a consistent state. This means that logging file systems do not have to be checked at boot time, speeding the reboot process.
System administrators should consider the following when configuring UFS file systems with transaction logging:
- The absolute minimum size for a transaction log is 1 MB of disk space. A recommended minimum is 1 MB of log space for every 1 GB of file system space.
- The larger the log, the more simultaneous file system operations per second. As a general rule, administrators should plan on 1 MB of log space for every 100 MB of file system space being logged.
- Small file systems, or primarily read-intensive file systems, will not benefit from transaction logging. However, file systems experiencing a significant number of updates should be logged. A sketch of how such a log is attached appears after this list.
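As a minimal sketch of attaching a log with Solstice DiskSuite (the metadevice names are hypothetical), a trans metadevice pairs a master device holding the file system with a separate logging device:

# metainit d10 -t d11 d12

Here d11 is the master (data) device and d12 the log; the file system is then mounted through the trans device d10. See the DiskSuite documentation cited below for the full procedure.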
More information about UFS transaction logging can be found in the Solstice DiskSuite User's Guide and Solstice DiskSuite Reference Guide, available at http://www.docs.sun.com on the World Wide Web.
Configuring UFS File Systems for Maximum Performance
The advent of very large disk drives and RAID subsystems is leading to ever increasing file system sizes. Users of large file systems need to reset a number of UFS parameters to ensure efficient disk utilization and I/O performance:
inode density
When UFS was introduced, most files were very small, averaging only 1 to 2 KB in size. Today, files are approaching 64 KB or larger. To ensure the file system utilizes space efficiently, system administrators should set the inode density to one inode per 8 KB of file space.
minfree
Disks have grown tremendously since the minfree=10% rule was set back in 1980. At that time, 10 percent of a disk ranged from 2.5 MB to 5 MB. Today, 10 percent of a typical disk is several hundred megabytes. For large file systems, or those using disk drives larger than 2 GB, minfree should be set to 1 percent to reduce file system overhead and wasted disk space. On large file systems, minfree is automatically reduced (Solaris 2.6 and later).
maxcontig
The file system cluster size, or maxcontig, specifies the maximum number of blocks belonging to a single file that can be allocated contiguously before inserting a rotational delay. Performance may be improved if the file system I/O cluster size is an integral multiple of the stripe width of file systems on RAID 0 or RAID 5 volumes.
cylinders per cylinder group
Volumes and file systems that utilize more than 8 GB of disk space may need to increase the size of a cylinder group from 16 cylinders per cylinder group to as many as 256. A sketch of setting these parameters appears below.
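As an illustrative sketch (the device name is hypothetical; the options are those documented in newfs(1M)), all four parameters can be set when a large file system is created:

# newfs -i 8192 -m 1 -c 256 -C 32 /dev/rdsk/c1t0d0s6

This requests one inode per 8 KB (-i), 1 percent minfree (-m), 256 cylinders per cylinder group (-c), and a maxcontig of 32 blocks (-C); the maxcontig value shown is arbitrary and should be matched to the stripe width of the underlying volume.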
More information on configuring UFS file systems can be found in...?
Tmpfs
The Solaris operating environment includes support for Tmpfs, a temporary file system that utilizes memory to cache writes scheduled to go to disk. System administrators can improve system performance by making extensive use of Tmpfs to store short-lived files. This scheme enables temporary files to be written and accessed without the typical performance impact associated with writing information to disk.
A system is a candidate for Tmpfs usage if:
- It runs applications that create temporary, short-lived files. (Systems that run compilers are a good example of this type of environment.)
- It has a lot of memory and swap space.
- It employs slow disks.
Since many applications already employ the /tmp directory on Solaris systems for temporary files, it is a natural candidate for Tmpfs. System administrators should note, however, that any directory can utilize Tmpfs.
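A minimal sketch of mounting an additional Tmpfs file system (the mount point and size limit are illustrative; options per mount_tmpfs(1M)):

# mkdir /export/scratch
# mount -F tmpfs -o size=512m swap /export/scratch

The equivalent /etc/vfstab entry makes the mount persist across reboots:

swap - /export/scratch tmpfs - yes size=512m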
More information on Tmpfs can be found in...?
VERITAS File System
The VERITAS File System is a high-performance, fast-recovery, enterprise file system. Typically employed with very large mass storage subsystems from Sun, the VERITAS File System offers a host of features and benefits to users, including:
Enterprise file system
The VERITAS File System is designed as an enterprise file system, providing superior file system performance in certain applications and increased availability over the standard UNIX file system (UFS).
Availability
High availability is an essential requirement of enterprises running mission-critical applications. The VERITAS File System provides fast recovery after system crashes, decreasing system downtime and increasing data and application availability. Administrative operations such as file system backup, resizing, and defragmentation can occur on-line. Administrators no longer need to take the file system off-line to perform these common operations, increasing file system availability and making administration more convenient.
High performance
Using a more efficient extent-based allocation scheme, hashed directories, as well as layout and caching modifications, the VERITAS File System provides the high I/O performance demanded by large enterprise databases and other applications.
Extent-Based Allocation
The VERITAS File System allocates space in contiguous segments, or extents, instead of small fixed-size disk blocks. This can accelerate sequential I/O by reducing seek and disk rotation times. Performance is also improved as fewer I/O operations are needed to read and write large amounts of data.
Extent sizes can be chosen automatically by the VERITAS File System based on the I/O pattern of the file, or can be explicitly selected by the administrator.
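As a hedged illustration of explicit selection (the file name and sizes are hypothetical; the utility is VxFS's setext(1M)), an administrator might preallocate a database file with a fixed extent size and a space reservation:

# setext -e 2048 -r 102400 /vxfs/db/datafile01

Here -e fixes the extent size and -r reserves space for the file, both expressed in file system blocks.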
More information on the VERITAS File System can be found at
http://www.sun.com/storage/software/veritas.html on the World Wide Web.
Tuning File Systems: tunefs
The layout of a file system can have a dramatic effect on I/O performance. When a file system is over 90 percent full, system administrators should optimize it for time, ensuring higher throughput. The tunefs utility is designed to change the dynamic parameters of a file system that affect layout policies.
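A brief sketch (the device name is hypothetical; options per tunefs(1M)) of switching a nearly full file system's optimization strategy to time, and of lowering minfree on an existing file system without rebuilding it:

# tunefs -o time /dev/rdsk/c1t0d0s6
# tunefs -m 1 /dev/rdsk/c1t0d0s6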
More information on tunefs can be found in the tunefs(1M) man page.
6. Networking
The last decade has seen a ten-thousand-fold increase in computer performance. This increase in power, along with advances in operating systems and graphical desktop environments, has spurred the development of an array of new software tools and technologies. These new capabilities in software have in turn pushed computing resources to their limit, causing users to demand higher performance hardware. Networks are developing through a similar technology- and demand-driven cycle.
Networks must not only keep pace with technological advancements, they must ensure that sufficient resources are available to meet the demands of increasingly sophisticated applications. To meet these needs, networks must perform well and respond to client requests as needs dictate.
When a network server is unable to respond to a client's request, the client typically retransmits the request a number of times. Each retransmission induces additional system overhead and generates more network traffic. System administrators can mitigate the extent of excessive retransmissions by improving three key conditions: data integrity, system performance, and network congestion.
Selecting the Right Network Technology
Local Area Networks
Local Area Networks, or LANs, provide a means to connect systems within a limited distance, facilitating information and resource sharing in a multivendor environment. While adequate for small, centralized environments, existing LAN technologies are quickly becoming a bottleneck in larger distributed computing environments.
In the future, with resources as likely to be remote as local, networks must be able to accommodate the rapid movement of large amounts of data. With escalating needs for collaboration, organizations are demanding advanced functionality, such as video multicasting, time-synchronous applications, and integration with wide area networks. Enterprises require high speed networks (100 Megabits/second or above) but will not accept expensive changes to their existing hardware, software, and training. To keep up with the expanding demands of computing, Sun believes that three technologies have promise: Ethernet, FDDI, and ATM.
Ethernet
Conventional Ethernet was designed when it was ambitious for an affordable departmental system to deliver 1 SPECmark. Provided as a standard feature or low-cost option with nearly every workstation, 10 Megabit/second Ethernet (Ethernet or Ethernet 10) is a ubiquitous, inexpensive alternative which is simple to deploy. Its popularity has made possible the economies of scale that come from volume manufacturing, which in turn has made Ethernet inexpensive.
Although it has extensive capabilities, such as ease of connectivity and a stable installed base, Ethernet is inadequate for today's ambitious requirements (Figure 7-2). Its inability to scale to more than a few hundred users, or to easily accommodate bandwidth-intensive applications, makes it difficult to use in many environments.
Because Ethernet is collision-based, only one device can transmit at a given time. Should two devices attempt to transmit simultaneously, their data packets will collide, resulting in lower aggregate throughput. A 10BaseT Ethernet network is considered saturated at 35 percent utilization.
Consequently, approximately 500 KB of traffic per second can saturate a 10BaseT LAN. The use of switches and routers can minimize collisions and improve LAN performance by breaking the network into pieces and isolating traffic to smaller segments.
Figure 7-1 Today, networking technology is characterized by inadequate performance, complex interconnections, and multiple protocols
Fiber Distributed Data Interface
Primarily used to satisfy the needs of data communication users, the Fiber Distributed Data Interface (FDDI) has become a standard for networks requiring 100 Megabit/second transmission speeds. Providing media access control, data encoding and decoding, as well as defining drivers for fiber-optic components and a multilayered network management scheme, FDDI often is used as the backbone for lower speed networks such as Ethernet 10.
Although a good investment when speed is required, FDDI has limitations. It is more expensive to deploy than Ethernet, it offers limited bandwidth, and its smaller installed base makes it more expensive than other networking technologies. Nevertheless, for users that require standardization, a high degree of interoperability, flexibility, and robustness, FDDI is one of the most mature high speed network technologies available today.
[Figure 7-1 artwork: a 10 Mbps campus backbone connects bridge/routers and servers to 10BaseT hubs, each hub serving a workgroup of 10-20 workstations at n x 10 Mbps]
Fast Ethernet
For users needing a lower-cost alternative to FDDI, Fast Ethernet (100 Megabit/second Ethernet) may be the networking technology of choice. Fast Ethernet is an extension to the Ethernet 10 standard, supporting a wide range of applications with increased bandwidth. Fast Ethernet maintains compatibility with the installed base of wiring currently employed for Ethernet. It has the ability to automatically sense 10 Megabit/second or 100 Megabit/second operation and can adjust to either speed. Fast Ethernet ensures maximum compatibility with existing environments and applications, providing an investment-preserving migration path from conventional Ethernet.
Gigabit Ethernet
The transition of faster interfaces, such as Fast Ethernet, to a widespread desktop technology has created another problem for network administrators. As the number of devices entering the network at 100 Mbps grows, the demand for higher speed at the backbone and server level also increases. Organizations are finding they need to increase the level of network performance to support computing power demands.
Expected to become the solution of choice for network administrators, Gigabit Ethernet provides a raw data bandwidth of 1000 Mbps (1 Gigabit per second) aimed at relieving the increased network congestion experienced at the backbone and server levels. Not just a high performance networking technology, Gigabit Ethernet also gives administrators the flexibility they need: new full-duplex operating modes for switch-to-switch and switch-to-server connections, and half-duplex modes for shared connections using repeaters and the carrier sense multiple access/collision detection (CSMA/CD) access method.
More information on Gigabit Ethernet and its application can be found in the Sun and Gigabit Ethernet white paper located on the World Wide Web at http://www.sun.com/products-n-solutions/hw/networking/whitepapers/gigabitwp.html.
Asynchronous Transfer Mode
Perhaps the most promising new network technology is Asynchronous Transfer Mode (ATM). Based on cell switching technology, ATM uses small, fixed-size cells and inexpensive, high-bandwidth, low-latency switches. With
data rates from 51 Megabits per second to over 2 Gigabits per second, ATM
offers the performance needed by the most demanding interactive applications
(Figure 7-2).
Figure 7-2 Advanced networking technologies have performance far exceeding that of Ethernet, the most commonly used network today
ATM is independent of the speed of the line on which it runs, making it adaptable to most wiring technologies. ATM can run full duplex, permitting the simultaneous transmission and receipt of data without interruption. Its flexibility and low latency allow real-time applications like video and audio to be supported along with data services.
A scalable technology offering a deterministic quality of service, ATM allows the network to be tailored to the application. A global, mobile, scalable, and application-transparent protocol, ATM allows the mixture of data, voice, and video. With its increased performance and ability to support advanced applications, ATM holds the promise of being the premier network solution for the rest of the decade and beyond (Figure 7-3).
[Figure 7-2 chart: transfer rate in Mbits/sec by technology, on an axis running from 1 to 2500; Ethernet at 10 Mbps, Fast Ethernet at 100 Mbps, FDDI at 100 Mbps, and ATM at 51 Mbps to 2.5 Gbps]
Figure 7-3 ATM has many characteristics that make it the preferred solution for most future networking requirements
[Figure 7-3 content: data segmented in fixed 53-byte cells; circuit-oriented for easy routing; transparent to data type; multimedia support; real-time capable; low latency; dynamic bandwidth; scalable bandwidth; large volume, low cost; wide vendor acceptance; star topology; full bandwidth to each user; easy network maintenance; seamless integration between local and wide-area networks]
Wide Area Networks
Wide Area Networks, or WANs, provide a means to connect systems which are distributed over a large geographic area, facilitating information and resource sharing. WANs must be low-cost, high speed, and easily integrated with LANs. Three technologies are emerging as leading WAN contenders: Frame Relay, Integrated Services Digital Network (ISDN), and Asynchronous Transfer Mode (ATM).
Frame Relay
Frame Relay is employed to efficiently transfer information with minimal delay. Designed to handle data communication with large, bursty data packets, it uses a transmission channel only when there is data present. This offers significant advantages over LAN technologies that tie up lines for the entire duration of a session. Frame Relay's ability to efficiently provide such on-demand transmission has garnered considerable attention. However, its inability to provide consistent performance, its imprecise specifications for
congestion avoidance and recovery mechanisms, and its inefficiency in
transmitting voice and video, preclude Frame Relay from being a multimedia
WAN solution.
Integrated Services Digital Network
Integrated Services Digital Network (ISDN) emerged through the need for a single integrated structure that efficiently accommodated a variety of services. The most widely used network technology, ISDN has addressed the growing needs of digital data communications in a variety of industries. The ability to transport information digitally allows a single line to transmit up to 1.5 Megabits per second while providing superior quality of service over analog
tr