Managing Serviceguard A.11.20 HP Part Number: 689162-001 Published: March 2012



Legal Notices

Copyright 1995-2012 Hewlett-Packard Development Company, L.P.

Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Java is a U.S. trademark of Oracle Corporation. Linux is a U.S. registered trademark of Linus Torvalds. MS-DOS, Microsoft, Windows, Windows NT, and Windows XP are U.S. registered trademarks of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. UNIX is a registered trademark in the United States and other countries, licensed exclusively through The Open Group.

Contents

Publishing History
Preface

1 Serviceguard at a Glance
What is Serviceguard?
Failover
About Veritas CFS and CVM from Symantec
Using Serviceguard Manager
Monitoring Clusters with Serviceguard Manager
Administering Clusters with Serviceguard Manager
Configuring Clusters with Serviceguard Manager
Starting Serviceguard Manager
Using SAM
What are the Distributed Systems Administration Utilities?
A Roadmap for Configuring Clusters and Packages

2 Understanding Serviceguard Hardware Configurations
Redundancy of Cluster Components
Redundant Network Components
Rules and Restrictions
Redundant Ethernet Configuration
Cross-Subnet Configurations
Configuration Tasks
Restrictions
For More Information
Replacing Failed Network Cards
Redundant Disk Storage
Supported Disk Interfaces
Using iSCSI LUNs as Shared Storage
Data Protection
Disk Mirroring
Disk Arrays using RAID Levels and Multiple Data Paths
About Multipathing
Monitoring LVM Disks Through Generic Resources Monitoring Service
Monitoring LVM Disks Through Event Monitoring Service
Monitoring VxVM and CVM Disks
Replacing Failed Disk Mechanisms
Replacing Failed I/O Cards
Sample SCSI Disk Configurations
Sample Fibre Channel Disk Configuration
Redundant Power Supplies
Larger Clusters
Active/Standby Model
Point to Point Connections to Storage Devices

3 Understanding Serviceguard Software Components
Serviceguard Architecture
Serviceguard Daemons
Configuration Daemon: cmclconfd
Cluster Daemon: cmcld
File Management Daemon: cmfileassistd
Syslog Log Daemon: cmlogd
Cluster Logical Volume Manager Daemon: cmlvmd
Persistent Reservation Daemon: cmprd
Cluster Object Manager Daemon: cmomd
Cluster SNMP Agent Daemon: cmsnmpd
Generic Resource Assistant Daemon: cmresourced
Service Assistant Daemon: cmserviced
Quorum Server Daemon: qs
Network Manager Daemon: cmnetd
Lock LUN Daemon: cmdisklockd
Utility Daemon: cmlockd
Cluster WBEM Agent Daemon: cmwbemd
Proxy Daemon: cmproxyd
CFS Components
How the Cluster Manager Works
Configuring the Cluster
Heartbeat Messages
Manual Startup of Entire Cluster
Automatic Cluster Startup
Dynamic Cluster Re-formation
Cluster Quorum to Prevent Split-Brain Syndrome
Cluster Lock
Lock Requirements
Use of a Lock LUN or LVM Lock Disk as the Cluster Lock
Single Lock Disk or LUN
Dual Lock Disk
Use of the Quorum Server as the Cluster Lock
No Cluster Lock
What Happens when You Change the Quorum Configuration Online
How the Package Manager Works
Package Types
Non-failover Packages
Failover Packages
Configuring Failover Packages
Deciding When and Where to Run and Halt Failover Packages
Failover Packages Switching Behavior
Failover Policy
Automatic Rotating Standby
Failback Policy
Using Older Package Configuration Files
Using the Generic Resources Monitoring Service
Using the Event Monitoring Service
Using the EMS HA Monitors
How Packages Run
What Makes a Package Run?
Before the Control Script Starts
During Run Script Execution
Normal and Abnormal Exits from the Run Script
Service Startup with cmrunserv
While Services are Running
When a Service, Subnet, or Monitored Resource Fails, or a Dependency is Not Met
When a Package is Halted with a Command
During Halt Script Execution
Normal and Abnormal Exits from the Halt Script
Package Control Script Error and Exit Conditions
How the Network Manager Works
Stationary and Relocatable IP Addresses
Types of IP Addresses
Adding and Deleting Relocatable IP Addresses
Load Sharing
Monitoring LAN Interfaces and Detecting Failure: Link Level
Local Switching
Switching Back to Primary LAN Interfaces after Local Switching
Remote Switching
Address Resolution Messages after Switching on the Same Subnet
Monitoring LAN Interfaces and Detecting Failure: IP Level
Reasons To Use IP Monitoring
How the IP Monitor Works
Failure and Recovery Detection Times
Constraints and Limitations
Reporting Link-Level and IP-Level Failures
Example 1: If Local Switching is Configured
Example 2: If There Is No Local Switching
Automatic Port Aggregation
VLAN Configurations
What is VLAN?
Support for HP-UX VLAN
Configuration Restrictions
Additional Heartbeat Requirements
Volume Managers for Data Storage
Types of Redundant Storage
About Device File Names (Device Special Files)
Examples of Mirrored Storage
Examples of Storage on Disk Arrays
Types of Volume Manager
HP-UX Logical Volume Manager (LVM)
Veritas Volume Manager (VxVM)
Propagation of Disk Groups in VxVM
Package Startup Time with VxVM
Veritas Cluster Volume Manager (CVM)
Cluster Startup Time with CVM
Propagation of Disk Groups with CVM
Redundant Heartbeat Subnets
Comparison of Volume Managers
iSCSI Storage and Persistent Reservations
Rules and Limitations
How Persistent Reservations Work
Applying a Package
Running a Package
Distributing Persistent Reservation Keys on Extended Volume Groups
Example
Responses to Failures
System Reset When a Node Fails
What Happens when a Node Times Out
Example
Responses to Hardware Failures
Responses to Package and Service Failures
Responses to Package and Generic Resources Failures
Service Restarts
Network Communication Failure


4 Planning and Documenting an HA Cluster
General Planning
Serviceguard Memory Requirements
Planning for Expansion
Hardware Planning
SPU Information
Network Information
LAN Information
Setting SCSI Addresses for the Largest Expected Cluster Size
Disk I/O Information
Hardware Configuration Worksheet
Power Supply Planning
Power Supply Configuration Worksheet
Cluster Lock Planning
Cluster Lock Disk and Re-formation Time
Planning for Expansion
Using a Quorum Server
Quorum Server Worksheet
LVM Planning
Using Generic Resources to Monitor Volume Groups
Using EMS to Monitor Volume Groups
LVM Worksheet
CVM and VxVM Planning
CVM and VxVM Worksheet
Cluster Configuration Planning
About Cluster-wide Device Special Files (cDSFs)
Points To Note
Where cDSFs Reside
Limitations of cDSFs
LVM Commands and cDSFs
About Easy Deployment
Advantages of Easy Deployment
Limitations of Easy Deployment
Heartbeat Subnet and Cluster Re-formation Time
About Hostname Address Families: IPv4-Only, IPv6-Only, and Mixed Mode
What Is IPv4-Only Mode?
What Is IPv6-Only Mode?
Rules and Restrictions for IPv6-Only Mode
Recommendations for IPv6-Only Mode
What Is Mixed Mode?
Rules and Restrictions for Mixed Mode
Cluster Configuration Parameters
Cluster Configuration: Next Step
Package Configuration Planning
Logical Volume and File System Planning
Planning Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS)
CVM 4.1 and later without CFS
CVM 4.1 and later with CFS
About the Volume Monitor
Using the Volume Monitor
Command Syntax
Examples
Scope of Monitoring
Planning for NFS-mounted File Systems
Planning for Expansion
Choosing Switching and Failover Behavior
Parameters for Configuring Generic Resources
Configuring a Generic Resource
Getting and Setting the Status/Value of a Simple/Extended Generic Resource
Using Serviceguard Command to Get the Status/Value of a Simple/Extended Generic Resource
Using Serviceguard Command to Set the Status/Value of a Simple/Extended Generic Resource
Online Reconfiguration of Generic Resources
Parameters for Configuring EMS Resources
About Package Dependencies
Simple Dependencies
Rules for Simple Dependencies
Guidelines for Simple Dependencies
Extended Dependencies
Rules for Exclusionary Dependencies
Rules for different_node and any_node Dependencies
What Happens when a Package Fails
For More Information
About Package Weights
Package Weights and Node Capacities
Configuring Weights and Capacities
Simple Method
Example 1
Points to Keep in Mind
Comprehensive Method
Defining Capacities
Defining Weights
Rules and Guidelines
For More Information
How Package Weights Interact with Package Priorities and Dependencies
Example 1
Example 2
About External Scripts
Using Serviceguard Commands in an External Script
Determining Why a Package Has Shut Down
last_halt_failed
About Cross-Subnet Failover
Implications for Application Deployment
Configuring a Package to Fail Over across Subnets: Example
Configuring node_name
Configuring monitored_subnet_access
Configuring ip_subnet_node
Configuring a Package: Next Steps
Planning for Changes in Cluster Size

5 Building an HA Cluster Configuration........................................................162
Preparing Your Systems ........................................................................................................162 Installing and Updating Serviceguard ................................................................................162 Where Serviceguard Files Are Kept....................................................................................162 Creating Cluster-wide Device Special Files (cDSFs)................................................................163 Before You Start..........................................................................................................163 Creating cDSFs for a Group of Nodes...........................................................................163 Adding a Node to a cDSF Group.................................................................................164

Removing a Node from a cDSF Group...........................................................................165 Displaying the cDSF Configuration................................................................................165 Migrating Existing LVM Cluster Storage to cDSFs.............................................................165 Using Easy Deployment....................................................................................................165 Before You Start..........................................................................................................165 Using Easy Deployment Commands to Configure the Cluster.............................................166 Configuring Root-Level Access............................................................................................170 Allowing Root Access to an Unconfigured Node.............................................................170 Ensuring that the Root User on Another Node Is Recognized.............................................171 About identd..........................................................................................................171 Configuring Name Resolution............................................................................................172 Safeguarding against Loss of Name Resolution Services...................................................173 Ensuring Consistency of Kernel Configuration .....................................................................174 Enabling the Network Time Protocol ..................................................................................174 Tuning Network and Kernel Parameters...............................................................................175 Creating Mirrors of Root Logical Volumes............................................................................176 Choosing Cluster Lock Disks..............................................................................................177 Backing 
Up Cluster Lock Disk Information ......................................................................177 Setting Up a Lock LUN......................................................................................................177 Creating a Disk Partition on an HP Integrity System..........................................................178 Defining the Lock LUN.................................................................................................179 Excluding Devices from Probing.........................................................................................179 Setting Up and Running the Quorum Server........................................................................180 Creating the Storage Infrastructure and file systems with LVM, VxVM and CVM........................180 Creating a Storage Infrastructure with LVM..........................................................................180 Using the Generic Resources Disk Monitor......................................................................181 Using the EMS Disk Monitor.........................................................................................182 Using Mirrored Individual Data Disks.............................................................................182 Creating Volume Groups.........................................................................................182 Creating Logical Volumes.............................................................................................183 Setting Logical Volume Timeouts...............................................................................183 Creating File Systems...................................................................................................184 Distributing Volume Groups to Other Nodes...................................................................184 Deactivating the Volume Group................................................................................184 Distributing the 
Volume Group..................................................................................185 Making Physical Volume Group Files Consistent..............................................................186 Creating Additional Volume Groups..............................................................................186 Creating a Storage Infrastructure with VxVM.......................................................................187 Converting Disks from LVM to VxVM..............................................................................187 Initializing Disks for VxVM...........................................................................................187 Initializing Disks Previously Used by LVM........................................................................187 Creating Disk Groups..................................................................................................188 Creating Logical Volumes.............................................................................................188 Creating File Systems...................................................................................................188 Deporting Disk Groups................................................................................................189 Re-Importing Disk Groups.............................................................................................189 Clearimport at System Reboot Time................................................................................189 Configuring the Cluster .........................................................................................................189 cmquerycl Options...........................................................................................................190 Speeding up the Process..............................................................................................190 Specifying the Address Family for the Cluster 
Hostnames..................................................190 Specifying the Address Family for the Heartbeat .............................................................191 Specifying the Cluster Lock...........................................................................................191 Generating a Network Template File..............................................................................191 Full Network Probing...................................................................................................192

Specifying a Lock Disk......................................................................................................192 Specifying a Lock LUN......................................................................................................193 Specifying a Quorum Server.............................................................................................193 Obtaining Cross-Subnet Information...................................................................................194 Identifying Heartbeat Subnets............................................................................................195 Specifying Maximum Number of Configured Packages ........................................................196 Modifying the MEMBER_TIMEOUT Parameter......................................................................196 Controlling Access to the Cluster........................................................................................196 A Note about Terminology...........................................................................................196 How Access Roles Work..............................................................................................196 Levels of Access..........................................................................................................197 Setting up Access-Control Policies..................................................................................198 Role Conflicts.........................................................................................................200 Package versus Cluster Roles.........................................................................................200 Adding Volume Groups....................................................................................................201 Verifying the Cluster Configuration ....................................................................................201 Distributing the Binary Configuration File 
...........................................................................202 Storing Volume Group and Cluster Lock Configuration Data .............................................202 Creating a Storage Infrastructure with Veritas Cluster File System (CFS)....................................203 Modular CFS packages v/s Legacy CFS packages..........................................................203 Preparing the Cluster and the System Multi-node Package.................................................205 Creating the Disk Groups.............................................................................................206 Creating the Disk Group Cluster Packages......................................................................207 Creating Volumes........................................................................................................208 Managing Disk Groups and Mount Points Using Modular Packages...................................208 Creating Modular Disk Group and Mount Point Packages............................................209 Parallel Activation of Disk Groups and Parallel Mounting of Mount Points.......................211 Creating Modular Checkpoint and Snapshot Packages for CFS....................................211 Online reconfiguration of modular CFS package parameters.............................................214 Example of online addition and deletion of a disk group and a mount point from a modular CFS package.........................................................................................................216 Guidelines for Migrating from Legacy CFS Package to Modular CFS Package.....................217 Managing Disk Groups and Mount Points Using Legacy Packages.....................................220 Creating a File System and Mount Point Package........................................................220 Creating Checkpoint and Snapshot Packages for CFS.................................................221 Creating the 
Storage Infrastructure with Veritas Cluster Volume Manager (CVM).......................223 Initializing the Veritas Volume Manager ........................................................................224 Preparing the Cluster for Use with CVM ........................................................................224 Identifying the Master Node.........................................................................................224 Initializing Disks for CVM.............................................................................................224 Creating Disk Groups..................................................................................................225 Mirror Detachment Policies with CVM........................................................................225 Creating Volumes .......................................................................................................225 Adding Disk Groups to the Package Configuration .........................................................225 Using DSAU during Configuration......................................................................................226 Managing the Running Cluster...............................................................................................226 Checking Cluster Operation with Serviceguard Manager......................................................226 Checking Cluster Operation with Serviceguard Commands...................................................226 Preventing Automatic Activation of LVM Volume Groups .......................................................227 Setting up Autostart Features ............................................................................................227 Changing the System Message .........................................................................................228 Managing a Single-Node Cluster......................................................................................228 Single-Node 
Operation...............................................................................................229 Disabling identd..............................................................................................................229 Deleting the Cluster Configuration .....................................................................................229

6 Configuring Packages and Their Services ..................................................231
Choosing Package Modules...................................................................................................231 Types of Package: Failover, Multi-Node, System Multi-Node...................................................232 Differences between Failover and Multi-Node Packages........................................................233 Package Modules and Parameters......................................................................................233 Base Package Modules................................................................................................234 Optional Package Modules..........................................................................................235 Package Parameter Explanations...................................................................................236 package_name......................................................................................................237 module_name........................................................................................................237 module_version......................................................................................................237 package_type........................................................................................................237 package_description...............................................................................................237 node_name...........................................................................................................238 auto_run...............................................................................................................238 node_fail_fast_enabled...........................................................................................238 run_script_timeout...................................................................................................239 
halt_script_timeout..................................................................................................239 successor_halt_timeout............................................................................................240 script_log_file.........................................................................................................240 operation_sequence................................................................................................240 log_level...............................................................................................................240 failover_policy.......................................................................................................240 failback_policy.......................................................................................................241 priority..................................................................................................................241 dependency_name.................................................................................................241 dependency_condition............................................................................................242 dependency_location..............................................................................................243 weight_name, weight_value.....................................................................................243 local_lan_failover_allowed......................................................................................244 monitored_subnet...................................................................................................244 monitored_subnet_access........................................................................................244 cluster_interconnect_subnet......................................................................................244 
ip_subnet..............................................................................................................245 ip_subnet_node .....................................................................................................246 ip_address............................................................................................................246 service_name.........................................................................................................246 service_cmd...........................................................................................................247 service_restart........................................................................................................247 service_fail_fast_enabled.........................................................................................247 service_halt_timeout................................................................................................247 generic_resource_name...........................................................................................247 generic_resource_evaluation_type.............................................................................248 generic_resource_up_criteria....................................................................................248 resource_name.......................................................................................................249 resource_polling_interval.........................................................................................249 resource_start.........................................................................................................249 resource_up_value..................................................................................................250 concurrent_vgchange_operations..............................................................................250 
enable_threaded_vgchange.....................................................................................250 vgchange_cmd......................................................................................................251 cvm_activation_cmd................................................................................................251 vxvol_cmd.............................................................................................................251 vg........................................................................................................................251

cvm_dg.................................................................................................................252 vxvm_dg...............................................................................................................252 vxvm_dg_retry.......................................................................................................252 deactivation_retry_count..........................................................................................252 kill_processes_accessing_raw_devices ......................................................................252 File system parameters............................................................................................252 concurrent_fsck_operations......................................................................................253 concurrent_mount_and_umount_operations................................................................253 fs_mount_retry_count..............................................................................................253 fs_umount_retry_count ............................................................................................253 fs_name................................................................................................................253 fs_server................................................................................................................254 fs_directory............................................................................................................254 fs_type..................................................................................................................254 fs_mount_opt.........................................................................................................254 fs_umount_opt........................................................................................................254 
fs_fsck_opt.............................................................................................................254 pev_.....................................................................................................................255 external_pre_script..................................................................................................255 external_script........................................................................................................255 user_name.............................................................................................................255 user_host...............................................................................................................256 user_role...............................................................................................................256 Additional Parameters Used Only by Legacy Packages................................................256 Generating the Package Configuration File..............................................................................256 Before You Start...............................................................................................................256 cmmakepkg Examples......................................................................................................256 Next Step.......................................................................................................................257 Editing the Configuration File.................................................................................................258 Verifying and Applying the Package Configuration....................................................................261 Adding the Package to the Cluster..........................................................................................262 How Control Scripts Manage VxVM Disk Groups.....................................................................262

7 Cluster and Package Maintenance............................................................264
Reviewing Cluster and Package Status.....................................................................................264 Reviewing Cluster and Package Status with the cmviewcl Command........................................264 Viewing Dependencies.................................................................................................264 Viewing CFS Multi-Node Information ............................................................................265 Types of Cluster and Package States .............................................................................265 Cluster Status ........................................................................................................265 Node Status and State ...........................................................................................265 Package Status and State.........................................................................................265 Package Switching Attributes....................................................................................267 Service Status .......................................................................................................267 Generic Resource Status..........................................................................................267 Network Status.......................................................................................................267 Failover and Failback Policies...................................................................................268 Examples of Cluster and Package States ........................................................................268 Normal Running Status............................................................................................268 Quorum Server Status.............................................................................................269 CFS Package Status ...............................................................................................269 Status After Halting a Package.................................................................................270 Status After Moving the Package to Another Node......................................................271 Status After Auto Run is Enabled...............................................................................272 Status After Halting a Node.....................................................................................273

Viewing Information about Unowned Packages...........................................................273 Viewing Information about System Multi-Node Packages..............................................273 Checking Status of the Cluster File System (CFS)..........................................................274 Status of the Packages in a Cluster File System............................................................275 Status of CFS Modular Disk Group and Mount Point Packages......................................275 Status of Legacy CVM Disk Group Packages..............................................................276 Status of Legacy CFS Mount Point Packages...............................................................276 Checking the Cluster Configuration and Components............................................................276 Checking Cluster Components......................................................................................277 Setting up Periodic Cluster Verification...........................................................................279 Example................................................................................................................280 Limitations..................................................................................................................280 Managing the Cluster and Nodes ..........................................................................................280 Starting the Cluster When all Nodes are Down ...................................................................281 Using Serviceguard Commands to Start the Cluster..........................................................281 Adding Previously Configured Nodes to a Running Cluster....................................................281 Using Serviceguard Commands to Add Previously Configured Nodes to a Running Cluster 
................................................................................................................................281 Removing Nodes from Participation in a Running Cluster.......................................................282 Halting the Entire Cluster ..................................................................................................282 Automatically Restarting the Cluster ...................................................................................282 Halting a Node or the Cluster while Keeping Packages Running............................................282 What You Can Do......................................................................................................283 Rules and Restrictions...................................................................................................283 Additional Points To Note.............................................................................................285 Halting a Node and Detaching its Packages...................................................................286 Halting a Detached Package........................................................................................286 Halting the Cluster and Detaching its Packages...............................................................286 Example: Halting the Cluster for Maintenance on the Heartbeat Subnets............................287 Managing Packages and Services .........................................................................................287 Starting a Package ..........................................................................................................287 Starting a Package that Has Dependencies.....................................................................287 Using Serviceguard Commands to Start a Package..........................................................288 Starting the Special-Purpose CVM and CFS Packages..................................................288 
Halting a Package ..........................................................................................................288 Halting a Package that Has Dependencies.....................................................................288 Using Serviceguard Commands to Halt a Package .........................................................288 Handling Failures During Package Halt..........................................................................289 Moving a Failover Package ..............................................................................................290 Using Serviceguard Commands to Move a Running Failover Package................................290 Changing Package Switching Behavior ..............................................................................290 Changing Package Switching with Serviceguard Commands.............................................290 Maintaining a Package: Maintenance Mode.......................................................................291 Characteristics of a Package Running in Maintenance Mode or Partial-Startup Maintenance Mode .......................................................................................................................291 Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode ...........292 Dependency Rules for a Package in Maintenance Mode or Partial-Startup Maintenance Mode ..................................................................................................................293 Performing Maintenance Using Maintenance Mode.........................................................293 Procedure..............................................................................................................293 Performing Maintenance Using Partial-Startup Maintenance Mode.....................................294 Procedure..............................................................................................................294 Excluding 
Modules in Partial-Startup Maintenance Mode.............................................294 Reconfiguring a Cluster.........................................................................................................295 Previewing the Effect of Cluster Changes.............................................................................296

What You Can Preview................................................................................................297 Using Preview mode for Commands and in Serviceguard Manager...................................297 Using cmeval..............................................................................................................298 Updating the Cluster Lock Configuration.............................................................................298 Updating the Cluster Lock Disk Configuration Online.......................................................299 Updating the Cluster Lock LUN Configuration Online.......................................................299 Reconfiguring a Halted Cluster .........................................................................................299 Reconfiguring a Running Cluster........................................................................................300 Adding Nodes to the Cluster While the Cluster is Running ...............................................300 Removing Nodes from the Cluster while the Cluster Is Running ..........................................300 Changing the Cluster Networking Configuration while the Cluster Is Running......................301 What You Can Do..................................................................................................301 What You Must Keep in Mind..................................................................................302 Example: Adding a Heartbeat LAN..........................................................................303 Example: Deleting a Subnet Used by a Package.........................................................304 Removing a LAN or VLAN Interface from a Node.......................................................304 Changing the LVM Configuration while the Cluster is Running ..........................................305 Changing the VxVM or CVM Storage Configuration 
.......................................................305 Changing MAX_CONFIGURED_PACKAGES..................................................................306 Configuring a Legacy Package...............................................................................................306 Creating the Legacy Package Configuration .......................................................................306 Configuring a Package in Stages...................................................................................307 Editing the Package Configuration File...........................................................................307 Creating the Package Control Script...................................................................................309 Customizing the Package Control Script .........................................................................310 Adding Customer Defined Functions to the Package Control Script ....................................311 Adding Serviceguard Commands in Customer Defined Functions .................................311 Support for Additional Products.....................................................................................311 Verifying the Package Configuration...................................................................................312 Distributing the Configuration............................................................................................312 Distributing the Configuration And Control Script with Serviceguard Manager.....................312 Copying Package Control Scripts with HP-UX commands..................................................312 Distributing the Binary Cluster Configuration File with HP-UX Commands ...........................313 Configuring Cross-Subnet Failover......................................................................................313 Configuring node_name..............................................................................................313 
Configuring monitored_subnet_access............................................................................314 Creating Subnet-Specific Package Control Scripts............................................................314 Control-script entries for nodeA and nodeB................................................................314 Control-script entries for nodeC and nodeD................................................................314 Reconfiguring a Package.......................................................................................................314 Migrating a Legacy Package to a Modular Package.............................................................315 Reconfiguring a Package on a Running Cluster ...................................................................315 Renaming or Replacing an External Script Used by a Running Package..............................316 Reconfiguring a Package on a Halted Cluster .....................................................................316 Adding a Package to a Running Cluster..............................................................................317 Deleting a Package from a Running Cluster ........................................................................317 Resetting the Service Restart Counter..................................................................................318 Allowable Package States During Reconfiguration ...............................................................318 Changes that Will Trigger Warnings..............................................................................324 Responding to Cluster Events .................................................................................................324 Single-Node Operation ........................................................................................................324 Disabling 
Serviceguard.........................................................................................................325 Removing Serviceguard from a System.....................................................................................325


8 Troubleshooting Your Cluster....................................................................326 Testing Cluster Operation ......................................................................................................326 Start the Cluster using Serviceguard Manager.....................................................................326 Testing the Package Manager ...........................................................................................326 Testing the Cluster Manager .............................................................................................327 Testing the Network Manager ..........................................................................................327 Monitoring Hardware ...........................................................................................................327 Using System Fault Management Service.............................................................................328 Using Event Monitoring Service..........................................................................................328 Using EMS (Event Monitoring Service) Hardware Monitors....................................................328 Hardware Monitors and Persistence Requests.......................................................................328 Using HP ISEE (HP Instant Support Enterprise Edition)............................................................329 Replacing Disks....................................................................................................................329 Replacing a Faulty Array Mechanism..................................................................................329 Replacing a Faulty Mechanism in an HA Enclosure..............................................................329 Replacing a Lock Disk.......................................................................................................330 Replacing a Lock 
LUN......................................................................................................330 Online Hardware Maintenance with In-line SCSI Terminator .................................................331 Replacing I/O Cards............................................................................................................331 Replacing SCSI Host Bus Adapters.....................................................................................331 Revoking Persistent Reservations after a Failure.........................................................................332 Examples........................................................................................................................332 Replacing LAN or Fibre Channel Cards...................................................................................332 Offline Replacement.........................................................................................................333 Online Replacement.........................................................................................................333 After Replacing the Card..................................................................................................333 Replacing a Failed Quorum Server System...............................................................................333 Troubleshooting Approaches .................................................................................................334 Reviewing Package IP Addresses .......................................................................................335 Reviewing the System Log File ...........................................................................................335 Sample System Log Entries ...........................................................................................335 Reviewing Object Manager Log Files .................................................................................336 Reviewing Serviceguard 
Manager Log Files ........................................................................336 Reviewing the System Multi-node Package Files....................................................................336 Reviewing Configuration Files ...........................................................................................336 Reviewing the Package Control Script ................................................................................336 Using the cmcheckconf Command......................................................................................337 Reviewing the LAN Configuration ......................................................................................337 Solving Problems .................................................................................................................337 Serviceguard Command Hangs.........................................................................................338 Networking and Security Configuration Errors.....................................................................338 Cluster Re-formations Caused by Temporary Conditions........................................................338 Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low.................................338 System Administration Errors .............................................................................................339 Package Control Script Hangs or Failures ......................................................................340 Problems with Cluster File System (CFS)...............................................................................341 Problems with VxVM Disk Groups......................................................................................342 Force Import and Deport After Node Failure...................................................................342 Package Movement Errors 
................................................................................................342 Node and Network Failures .............................................................................................343 Troubleshooting the Quorum Server....................................................................................343 Authorization File Problems...........................................................................................343 Timeout Problems........................................................................................................343 Messages...................................................................................................................344

A Enterprise Cluster Master Toolkit ..............................................................345 B Designing Highly Available Cluster Applications ........................................346 Automating Application Operation ........................................................................................346 Insulate Users from Outages .............................................................................................347 Define Application Startup and Shutdown ..........................................................................347 Controlling the Speed of Application Failover ..........................................................................347 Replicate Non-Data File Systems .......................................................................................348 Use Raw Volumes ...........................................................................................................348 Evaluate the Use of JFS ....................................................................................................348 Minimize Data Loss .........................................................................................................348 Minimize the Use and Amount of Memory-Based Data ....................................................348 Keep Logs Small ........................................................................................................348 Eliminate Need for Local Data .....................................................................................348 Use Restartable Transactions .............................................................................................349 Use Checkpoints .............................................................................................................349 Balance Checkpoint Frequency with Performance ...........................................................349 Design for Multiple Servers 
..............................................................................................349 Design for Replicated Data Sites .......................................................................................350 Designing Applications to Run on Multiple Systems ..................................................................350 Avoid Node-Specific Information .......................................................................................350 Obtain Enough IP Addresses .......................................................................................351 Allow Multiple Instances on Same System ......................................................................351 Avoid Using SPU IDs or MAC Addresses ............................................................................351 Assign Unique Names to Applications ...............................................................................351 Use DNS ..................................................................................................................351 Use uname(2) With Care .................................................................................................352 Bind to a Fixed Port .........................................................................................................352 Bind to Relocatable IP Addresses ......................................................................................352 Call bind() before connect() .........................................................................................353 Give Each Application its Own Volume Group ....................................................................353 Use Multiple Destinations for SNA Applications ..................................................................353 Avoid File Locking ...........................................................................................................353 Using a Relocatable Address as the Source Address for an Application that is 
Bound to INADDR_ANY.....................................................................................................................354 Restoring Client Connections .................................................................................................355 Handling Application Failures ...............................................................................................356 Create Applications to be Failure Tolerant ..........................................................................356 Be Able to Monitor Applications .......................................................................................356 Minimizing Planned Downtime ..............................................................................................357 Reducing Time Needed for Application Upgrades and Patches .............................................357 Provide for Rolling Upgrades .......................................................................................357 Do Not Change the Data Layout Between Releases .........................................................357 Providing Online Application Reconfiguration .....................................................................358 Documenting Maintenance Operations ..............................................................................358

C Integrating HA Applications with Serviceguard..........................................359 Checklist for Integrating HA Applications ................................................................................359 Defining Baseline Application Behavior on a Single System ..................................................359 Integrating HA Applications in Multiple Systems ..................................................................360 Testing the Cluster ...........................................................................................................361

D Software Upgrades ...............................................................................362 Special Considerations for Upgrade to Serviceguard A.11.20.....................................................362 Special Considerations for Upgrade to Serviceguard A.11.19.......................................................362 How To Tell when the Cluster Re-formation Is Complete.........................................................363

Types of Upgrade.................................................................................................................363 Rolling Upgrade..............................................................................................................363 Rolling Upgrade Using DRD..............................................................................................363 Restrictions for DRD Upgrades...........................................................................................364 Non-Rolling Upgrade.......................................................................................................364 Non-Rolling Upgrade Using DRD.......................................................................................364 Migration with Cold Install................................................................................................364 Guidelines for Rolling Upgrade..............................................................................................364 Performing a Rolling Upgrade................................................................................................365 Limitations of Rolling Upgrades .........................................................................................365 Before You Start...............................................................................................................366 Running the Rolling Upgrade.............................................................................................366 Keeping Kernels Consistent................................................................................................366 Migrating cmclnodelist entries from A.11.15 or earlier............................................................366 Performing a Rolling Upgrade Using DRD................................................................................367 Before You 
Start...............................................................................................................367 Running the Rolling Upgrade Using DRD.............................................................................367 Example of a Rolling Upgrade ..............................................................................................368 Step 1. ..........................................................................................................................368 Step 2. ..........................................................................................................................369 Step 3. ..........................................................................................................................369 Step 4. ..........................................................................................................................370 Step 5. ..........................................................................................................................370 Guidelines for Non-Rolling Upgrade.......................................................................................371 Migrating Cluster Lock PV Device File Names......................................................................371 Other Considerations.......................................................................................................371 Performing a Non-Rolling Upgrade.........................................................................................371 Limitations of Non-Rolling Upgrades...................................................................................371 Steps for Non-Rolling Upgrades.........................................................................................371 Performing a Non-Rolling Upgrade Using DRD.........................................................................372 Limitations of Non-Rolling Upgrades using 
DRD...................................................................372 Steps for a Non-Rolling Upgrade Using DRD.......................................................................372 Guidelines for Migrating a Cluster with Cold Install...................................................................373 Checklist for Migration.....................................................................................................373

E Blank Planning Worksheets......................................................................374 Worksheet for Hardware Planning..........................................................................................374 Power Supply Worksheet.......................................................................................................374 Quorum Server Worksheet.....................................................................................................375 LVM Volume Group and Physical Volume Worksheet.................................................................375 VxVM Disk Group and Disk Worksheet ..................................................................................376 Cluster Configuration Worksheet.............................................................................................376 Package Configuration Worksheet..........................................................................................377

F Migrating from LVM to VxVM Data Storage ...............................................379 Loading VxVM.....................................................................................................................379 Migrating Volume Groups......................................................................................................379 Customizing Packages for VxVM............................................................................................380 Customizing Packages for CVM.......................................................................