Oracle Clusterware 11gR2
UKOUG TEBS 2010
Frits Hoogland
Tuesday, December 7, 2010
Who am I?
Frits Hoogland
–Working with Oracle products since 1996
–Working with VX Company since 2009
Interests
–Databases
–Application servers
–Operating systems
–Web techniques, TCP/IP, network security
–Technical security, performance
Blog: http://fritshoogland.wordpress.com
Email: [email protected]
Oracle ACE Director
OakTable member
Agenda
Oracle Restart
Hardware requirements
Software requirements
Shutdown
Instance check by clusterware
Listener check by clusterware
Startup standalone
init.ohasd
Processes standalone
Startup clustered
Processes clustered
Votedisk
ASM & Clusterware
Stability
Q & A
This is an investigation into:
–Oracle Clusterware
–Oracle Clusterware 11.2.0.1 aka 11gR2 - 32 bit
–Oracle Enterprise Linux 5.4 - 32 bit
–Using: VMWare Fusion version 2.0.6 (196839)
Clusterware - Oracle Restart
Standalone install of clusterware
Starts the database and listener automatically
–Feature is called ‘Oracle Restart’
–Clusterware needs to be installed in a separate home
–Free!
Hardware requirements
Memory requirements
–Documentation (http://docs.oracle.com):
- Grid/cluster: 1.5G
- Grid/standalone: 1.0G
–Installer:
- Grid/cluster: 1.5G
- Grid/standalone: 1.5G
Hardware requirements
Network requirements
–Grid/standalone:
- 1 network interface
- Hostname resolvable
–Grid/cluster:
- 2 network interfaces
- Hostnames resolvable
- hostname(s), vip, SCAN
- SSH equivalence (*)
Software requirements
Operating system
–Use ‘yum’
- Needs internet access.
- See http://public-yum.oracle.com
- Free service (no license required!)
–Install ‘oracle-validated’ package
- # yum install oracle-validated
Shutdown
Clusterware integrates with the stop/start system

$ ls -l /etc/rc.d/init.d/*ohasd
-rwxr-xr-x 1 root root 3105 Mar 12 13:01 /etc/rc.d/init.d/init.ohasd
-rwxr-xr-x 1 root root 2616 Mar 12 13:01 /etc/rc.d/init.d/ohasd

Huh? Two control scripts for the HA daemon?
Shutdown
Close investigation of the comments reveals some information:

$ head -6 /etc/rc.d/init.d/init.ohasd
#!/bin/sh
#
# Copyright (c) 2001, 2009, Oracle and/or its affiliates. All rights reserved.
#
# init.ohasd - Control script for the Oracle HA services daemon
# This script is invoked by the init system
Shutdown
Close investigation of the comments reveals some information:

$ head -6 /etc/rc.d/init.d/ohasd
#!/bin/sh
#
# Copyright (c) 2001, 2009, Oracle and/or its affiliates. All rights reserved.
#
# ohasd.sbs - Control script for the Oracle HA Services daemon
# This script is invoked by the rc system.
Shutdown
init.ohasd - init

$ tail -1 /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null

ohasd - rc

$ /sbin/chkconfig --list ohasd
service ohasd does not support chkconfig
$ /sbin/chkconfig --list sendmail
sendmail 0:off 1:off 2:on 3:on 4:on 5:on 6:off
$ /sbin/chkconfig --list doesnotexist
error reading information on service doesnotexist: No such file or directory

Huh?
Shutdown
It doesn’t use Red Hat’s specific stop/start implementation. It starts, though:

$ find /etc/rc.d -name "S*ohasd"
../rc3.d/S96ohasd
../rc5.d/S96ohasd

And it is configured to stop:

$ find /etc/rc.d -name "K*ohasd"
../rc0.d/K19ohasd
../rc6.d/K19ohasd
../rc2.d/K19ohasd
../rc4.d/K19ohasd
../rc1.d/K19ohasd

....mind the ‘is configured’.....
Shutdown
But it will never stop...
–Leading to a crash of the clusterware and its services(!) on shutdown

Red Hat’s stop/start implementation requires:
–A lock file
–In /var/lock/subsys/
–Otherwise the service is considered not started
–And will not be stopped as a result of that
–Lock-filename = name of stop/start script

Identified as bug: 8740030
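
The lock-file rule can be sketched in a few lines of shell. This is a simplified illustration modelled loosely on the check that the rc stop phase performs, not a copy of Red Hat's script; the function name `services_to_stop` and its two directory parameters are made up for this example so it can be tried on scratch directories.

```shell
#!/bin/sh
# Sketch (assumption: simplified model of the rc stop phase).
# A K* script only counts as "to be stopped" when a lock file with the
# same service name exists in the lock directory.
# Usage: services_to_stop RC_DIR LOCK_DIR
services_to_stop() {
    rc_dir=$1
    lock_dir=$2
    for script in "$rc_dir"/K*; do
        [ -e "$script" ] || continue      # directory holds no K* scripts
        name=${script##*/}                # e.g. K19ohasd
        subsys=${name#K??}                # strip "K" and two digits: ohasd
        if [ -f "$lock_dir/$subsys" ]; then
            echo "$subsys"                # lock file present: would be stopped
        fi
    done
}
```

Called as `services_to_stop /etc/rc.d/rc0.d /var/lock/subsys` on the system from the slides, this would presumably not list ohasd, since no /var/lock/subsys/ohasd is created when the clusterware starts.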
Example stop/start script
prog=smartd
pidfile=/var/lock/subsys/smartd

start()
{
    echo -n $"Starting $prog: "
    daemon $SMARTD_BIN $smartd_opts
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch $pidfile
    return $RETVAL
}

stop()
{
    echo -n $"Shutting down $prog: "
    killproc $SMARTD_BIN
    RETVAL=$?
    echo
    rm -f $pidfile
    return $RETVAL
}
Shutdown
Is this really a problem?
Instance check by clusterware
Applies to database instance and ASM instance
–CRS resource types:
- ora.asm.type
- ora.database.type
Checked every 1 second:

$ crsctl status resource ora.testdb.db -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=1
$ crsctl status resource ora.asm -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=1
Instance check by clusterware
Two checks are done:
1. Check for pmon background process
- Via Linux’ proc filesystem: /proc/<PID>/stat file
2. Check for instance status via ‘health check file’
- File: $ORACLE_HOME/dbs/hc_sid.dat
This means:
–There is a negligible impact on the instance
–Whilst a detailed status is known
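
To illustrate why the first check is so cheap, here is a minimal sketch of reading a process state from /proc/<PID>/stat. It is not the agent's actual code; the function name `process_state` and the optional second argument (an alternative proc root, handy for trying it against a fake directory) are assumptions of this example.

```shell
#!/bin/sh
# Sketch (assumption: illustrative only, not what the clusterware agent runs).
# Prints the state letter (R, S, D, Z, ...) of a process, or fails if the
# stat file cannot be read, which is what a vanished pmon would look like.
# Usage: process_state PID [PROC_ROOT]
process_state() {
    pid=$1
    proc_root=${2:-/proc}
    stat_file="$proc_root/$pid/stat"
    [ -r "$stat_file" ] || return 1
    read -r line < "$stat_file" || [ -n "$line" ] || return 1
    # Layout is "pid (comm) state ..."; comm may contain spaces, so cut
    # everything up to the last ") " before taking the first field.
    rest=${line##*") "}
    echo "${rest%% *}"
}
```

A single small file read per check is all that is needed, which fits the claim that the impact on the instance is negligible.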
Instance check by clusterware
Because clusterware reads hc_sid.dat
–It knows if stop/start is user initiated
–Modifies resource target status accordingly
Listener check by clusterware
Applies to listener
–CRS resource type:
- ora.listener.type
Checked every 60 seconds:

$ crsctl status res ora.LISTENER.lsnr -f | grep ^CHECK_INTERVAL
CHECK_INTERVAL=60
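
The `grep ^CHECK_INTERVAL` approach prints the whole NAME=VALUE line. When only the value is wanted, for example to compare intervals across resources, a small helper along these lines could be used. The function name `resource_attr` is hypothetical; it only assumes the NAME=VALUE line format shown in the output above.

```shell
#!/bin/sh
# Sketch (assumption: input is NAME=VALUE lines as printed by
# "crsctl status resource <name> -f").
# Usage: crsctl status resource ora.LISTENER.lsnr -f | resource_attr CHECK_INTERVAL
resource_attr() {
    # Match the attribute name exactly, strip "NAME=", print the rest.
    awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}
```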
Listener check by clusterware
One check is done:
1. The command ‘lsnrctl status’ is issued
- Return code of ‘lsnrctl status’ command is used
Listener notifies clusterware of start and stop
–Listener.log: ‘Listener completed notification to CRS on start/stop’
–Done through socket
- ‘/var/tmp/.oracle/sCRSD_UI_SOCKET’
–Clusterware modifies resource target accordingly
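
A check that relies purely on a command's return code can be sketched as follows. This is a generic illustration, not the agent's implementation; the function name `check_by_exit_code` and the ONLINE/OFFLINE wording of its output are assumptions of this example.

```shell
#!/bin/sh
# Sketch (assumption: generic exit-code check, not Oracle's agent code).
# Runs the given command, discards its output and maps the exit status
# to a state word.
# Usage: check_by_exit_code lsnrctl status    (assumes lsnrctl is on PATH)
check_by_exit_code() {
    if "$@" >/dev/null 2>&1; then
        echo ONLINE
    else
        echo OFFLINE
    fi
}
```

The design is simple but coarse: the output of the command is ignored entirely, so only what the command itself reports through its exit status can be detected.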
Startup standalone
(Clusterware Administration and Deployment guide 11gr2, 1 Introduction, Overview)
init.ohasd
init
- /etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
/etc/rc.d/init.d/init.ohasd
1. ohasdrun readable? (/etc/oracle/scls_scr/<host>/<usr>/ohasdrun)
2. ohasdrun = restart? (no: continue at 6)
3. `crsctl check has`
4. != CRS-4638? sleep 10
5. CRS-4638? wait for message from pipe
6. (re)make pipe: /var/tmp/.oracle/npohasd (pipe)
7. ohasdrun readable?
8. ohasdrun = reboot? echo "restart" > ohasdrun; wait for message from pipe
9. ohasdrun = restart? ohasd restart &; wait for message from pipe
10. ohasdrun = stop?
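
The branch on the contents of the ohasdrun file (steps 7 to 10) can be sketched as a small dispatcher. This is a deliberately simplified model and not Oracle's init.ohasd script: the function name `next_action` and the action labels it prints are invented for illustration, and it only names the action instead of performing it.

```shell
#!/bin/sh
# Sketch (assumption: simplified model of steps 7-10; labels are made up).
# Reads a state file such as ohasdrun and prints which branch would be taken.
# Usage: next_action STATE_FILE
next_action() {
    state_file=$1
    [ -r "$state_file" ] || { echo "unreadable"; return 1; }
    read -r state < "$state_file" || [ -n "$state" ] || state=""
    case $state in
        reboot)  echo "write-restart-then-wait" ;;  # step 8
        restart) echo "start-ohasd-then-wait" ;;    # step 9
        stop)    echo "stop" ;;                     # step 10, no action shown
        *)       echo "unknown" ;;
    esac
}
```

Keeping the desired state in a plain file means the respawned script can decide what to do each time init restarts it, without needing any running daemon to ask.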
Startup standalone
[Diagram, built up over several slides: standalone startup sequence. Legend (process owner): root, oracle, grid]
1. init
- /etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
2. /etc/rc.d/init.d/init.ohasd
- Log: /var/log/messages
3. rc: /etc/rc.d/init.d/ohasd
- /etc/oracle/scls_scr/<host>/<usr>/ohasdstr: enable|disable
4. GRID_HOME/bin/ohasd.bin
- Log: GH/log/<host>/ohasd/ohasd.log
5. Agents started by ohasd.bin:
- GRID_HOME/bin/oraagent.bin (log: GH/log/<host>/agent/ohasd/oraagent_oracle/oraagent_oracle.log)
- GRID_HOME/bin/orarootagent.bin (log: GH/log/<host>/agent/ohasd/orarootagent_oracle/orarootagent_oracle.log)
- GRID_HOME/bin/cssdagent (log: GH/log/<host>/agent/ohasd/oracssdagent_oracle/oracssdagent_oracle.log)
6. GRID_HOME/bin/ocssd.bin (started via cssdagent)
- Log: GH/log/<host>/cssd/ocssd.log
7. GRID_HOME/bin/diskmon.bin (started via orarootagent.bin)
- Log: GH/log/<host>/diskmon/diskmon.log
8. Resources (via oraagent.bin): ASM resource, database resource, listener resource
Processes - standalone
Any cluster process can be killed or crashed.
–Without influencing the resources it protects.
Except ocssd.bin
–cluster synchronisation services daemon
–Documentation:
The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure results in Oracle Clusterware restarting the node.
Processes - standalone
1. killall -9 ocssd.bin
2. ASM instance terminates
Unix process pid: 27506, image: [email protected] (GMON)
2010-03-26 11:06:03.152: [ CSSCLNT]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
2010-03-26 11:06:03.152: [ CSSCLNT]clssgsGroupGetStatus: communications failed (0/3/-1)
2010-03-26 11:06:03.152: [ CSSCLNT]clssgsGroupGetStatus: returning 8
kgxgnpstat: received ABORT event from CLSSGroup services Error [NM abort event ] @ 28019:1125
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
GMON (ospid: 27506): terminating the instance due to error 29702
3. Database instances which use ASM terminate
Unix process pid: 27588, image: [email protected] (ASMB)
NOTE: ASMB terminating
error 15064 detected in background process
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
ASMB (ospid: 27588): terminating the instance due to error 15064
4. oraagent detects ASM instance termination
- ASM resource status is set to FAILED
5. oraagent detects db instance termination
- db resource status is set to FAILED
- Diskgroup resource status is set to FAILED
6. orarootagent detects diskmon termination
7. orarootagent cleans state and starts diskmon
8. ohasd restarts cssdagent
- (new) cssdagent sets cssd resource state to OFFLINE
9. cssdagent starts cssd
- cssd resource state is set to ONLINE
10. ASM instance is started & set to ONLINE
11. Diskgroup resource is started & set to ONLINE
12. Database instances are started & set to ONLINE
Startup clustered
[Diagram, built up over several slides: clustered startup sequence. Legend (process owner): root, oracle, grid]
1. init
- /etc/inittab: h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
2. /etc/rc.d/init.d/init.ohasd
- Log: /var/log/messages
3. rc: /etc/rc.d/init.d/ohasd
- /etc/oracle/scls_scr/<host>/<usr>/ohasdstr: enable|disable
4. GRID_HOME/bin/ohasd.bin
- Log: GH/log/<host>/ohasd/ohasd.log
5. Agents started by ohasd.bin:
- GRID_HOME/bin/oraagent.bin (log: GH/log/<host>/agent/ohasd/oraagent_grid/oraagent_grid.log)
- GRID_HOME/bin/orarootagent.bin (log: GH/log/<host>/agent/ohasd/orarootagent_root/orarootagent_root.log)
- GRID_HOME/bin/cssdagent (log: GH/log/<host>/agent/ohasd/oracssdagent_root/oracssdagent_root.log)
- GRID_HOME/bin/cssdmonitor (log: GH/log/<host>/agent/ohasd/oracssdmonitor_root/oracssdmonitor_root.log)
6. Started via GRID_HOME/bin/oraagent.bin:
- mdnsd.bin (log: GH/log/<host>/mdnsd/mdnsd.log)
- gpnp.bin (log: GH/log/<host>/gpnpd/gpnpd.log)
- gipcd.bin (log: GH/log/<host>/gipcd/gipcd.log)
- evmd.bin (log: GH/log/<host>/evmd/evmd.log)
- evmlogger.bin (log: GH/evm/log/<hostvip>_evmlog.<date>)
- ASM resource
7. Started via GRID_HOME/bin/cssdagent:
- cssd.bin (log: GH/log/<host>/cssd/cssd.log)
8. Started via GRID_HOME/bin/orarootagent.bin:
- diskmon.bin (log: GH/log/<host>/diskmon/diskmon.log)
- octssd.bin (log: GH/log/<host>/ctssd/octssd.log)
- crsd.bin (log: GH/log/<host>/crsd/crsd.log)
9. Agents started by crsd.bin:
- oraagent.bin (log: GH/log/<host>/agent/crsd/oraagent_grid/oraagent_grid.log)
- oraagent.bin (log: GH/log/<host>/agent/crsd/oraagent_oracle/oraagent_oracle.log)
- orarootagent.bin (log: GH/log/<host>/agent/crsd/orarootagent_root/orarootagent_root.log)
10. Resources handled by the crsd agents:
- One oraagent.bin: listener, listener_scan1, ons, eons (java), ASM (check)
- The other oraagent.bin: database
- orarootagent.bin: net1, <host-vip>, scan1
Processes - clustered
Any cluster process can be killed or crashed.
–Without influencing the resources it protects.
Except ocssd.bin
–cluster synchronisation services daemon
In clustered mode, ocssd.bin’s death results in node reboot.
Votedisk
Function: registration of cluster membership
Votedisks are shared
With version 11.2 only CFS or ASM are supported votedisk storage*
–Raw devices supported for upgrade
* A votedisk can be placed on NFS: http://www.oracle.com/technology/products/database/clusterware/pdf/grid_infra_thirdvoteonnfs.pdf
Votedisk
This is how the clusterware detects the ASM votedisks:

$ kfed read /dev/mapper/vg00-lvasm
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
...
kfdhdb.vfstart: 352 ; 0x0ec: 0x00000160
kfdhdb.vfend: 384 ; 0x0f0: 0x00000180
...
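
To pick just the two votefile fields out of the long kfed listing, something like the following could be used. It is a sketch that assumes the "field: value ; offset: hex" layout shown above; the function name `votefile_extent` is made up for this example.

```shell
#!/bin/sh
# Sketch (assumption: kfed output layout as shown on the slide).
# Reads "kfed read <device>" output on stdin and prints "vfstart vfend".
# Fails when either field is absent from the input.
# Usage: kfed read /dev/mapper/vg00-lvasm | votefile_extent
votefile_extent() {
    awk '
        $1 == "kfdhdb.vfstart:" { start = $2 }
        $1 == "kfdhdb.vfend:"   { end = $2 }
        END {
            if (start != "" && end != "") print start, end
            else exit 1
        }
    '
}
```

Presumably a disk that does not hold a votefile reports 0 for both fields; that reading is an assumption here and worth verifying on a test system.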
Votedisk
½ plus 1 rule: Quorum
The number of votedisks in a diskgroup is limited by the diskgroup redundancy level:

Redundancy Level    Number of votedisks
External            1
Normal              3
High                5
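
The ½ plus 1 rule is plain integer arithmetic, worked out here for the three redundancy levels from the table. The function name `majority` is just a label chosen for this sketch.

```shell
#!/bin/sh
# Sketch: smallest number of votedisks that is more than half of N,
# using integer division (N / 2 rounds down).
majority() {
    echo $(( $1 / 2 + 1 ))
}

for n in 1 3 5; do
    echo "$n votedisks: majority is $(majority "$n")"
done
```

With 3 votedisks the majority is 2, so one votedisk may be lost; with 5 it is 3, so two may be lost. With a single votedisk there is no headroom at all.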
ASM & Clusterware
ASM startup settings are in:
–$ORACLE_HOME/gpnp/<hostname>/profiles/peer/profile.xml
–File is ‘signed’

<orcl:ASM-Profile id="asm" DiscoveryString="/dev/iscsi" SPFile="+DATA/oel5u4-cluster/asmparameterfile/registry.253.718585107"/>
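
For a quick read-only look at these settings, the attributes can be pulled out of the file with sed. This is a text-matching sketch rather than real XML parsing: it assumes the ASM-Profile element sits on one line, as in the excerpt above, and the function name `asm_profile_attr` is invented for this example. Since it only reads the file, the signature is not touched.

```shell
#!/bin/sh
# Sketch (assumption: the <orcl:ASM-Profile .../> element is on a single line).
# Prints the value of one attribute of the ASM-Profile element.
# Usage: asm_profile_attr DiscoveryString profile.xml
asm_profile_attr() {
    sed -n 's/.*<orcl:ASM-Profile[^>]*[[:space:]]'"$1"'="\([^"]*\)".*/\1/p' "$2"
}
```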
ASM & Clusterware
It’s possible to manipulate profile.xml:
–Unsign:
$ gpnptool unsign -p=profile.xml -ovr -o=profile.xml
–Adjust
–Sign again:
$ gpnptool sign -p=profile.xml -ovr -o=profile.xml -w=file:$ORACLE_HOME/gpnp/wallets/peer -rmws
Stability
11gR2 Clusterware is reported to be stable
Bugs/issues known to me:
–8740030 OHASD STOP DOES NOT GET EXECUTED DURING SYSTEM SHUTDOWN
–9251136 INSTANCE WILL NOT USE HUGEPAGE IF STARTED BY SRVCTL
–Difficulties with database versions < 11
–(kfod / Jeff Hunter’s 11gr2 install doc.)
Q & A