Recovering OCR & Vote Disks

  • Lab: 25

    Date: 2014-Feb-05

    Created by: William Muñoz Rodas

    Email: [email protected]

    OCR and Voting Disk Recovery Procedure

    The steps below describe how to recover an Oracle RAC 11g Release 2 cluster after the OCR and voting disks have been lost.

    OCR backup

    The first thing to do after installing Oracle Clusterware and Oracle Database RAC is to make sure that the OCR backup is also copied to external media.

    OCR status before the exercise

    To check the status of the OCR, run the following command:

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/ocrcheck
        Status of Oracle Cluster Registry is as follows :
            Version                  : 3
            Total space (kbytes)     : 157848
            Used space (kbytes)      : 3784
            Available space (kbytes) : 154064
            ID                       : 3299366
            Device/File Name         : +OCR
                                       Device/File integrity check succeeded
                                       Device/File not configured
                                       Device/File not configured
                                       Device/File not configured
                                       Device/File not configured
            Cluster registry integrity check succeeded
            Logical corruption check succeeded
        [root@rh6-112-rac1 ~]#
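
    The two lines that matter in the ocrcheck output are the cluster registry integrity check and the logical corruption check. A minimal sketch of how a script could test for both is shown here; the function name ocr_is_healthy is hypothetical, and the sample text is an excerpt of the output above.

```shell
# Succeeds only if both integrity verdicts appear in ocrcheck output read from stdin.
# ocr_is_healthy is a hypothetical helper name, not part of Oracle Clusterware.
ocr_is_healthy() {
    # Buffer stdin so it can be searched twice.
    ocr_out=$(cat)
    printf '%s\n' "$ocr_out" | grep -q 'Cluster registry integrity check succeeded' &&
    printf '%s\n' "$ocr_out" | grep -q 'Logical corruption check succeeded'
}

# Excerpt of the ocrcheck output recorded in this lab.
sample='Device/File Name : +OCR
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded'

if printf '%s\n' "$sample" | ocr_is_healthy; then
    echo "OCR healthy"
else
    echo "OCR needs attention"
fi
```

    On a cluster node the same function could be fed live output, for example with /oracle/app/grid/11.2.0.4/bin/ocrcheck | ocr_is_healthy, run as root as in the session above.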

    Location of the OCR backups

    Node 1

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/ocrconfig -showbackup
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup00.ocr
        rh6-112-rac1 2013/08/02 18:56:44 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup01.ocr
        rh6-112-rac1 2013/02/13 11:51:10 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup02.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/day.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/week.ocr
        rh6-112-rac2 2013/08/29 19:59:52 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup_20130829_195952.ocr
        [root@rh6-112-rac1 ~]#

    Node 2

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/ocrconfig -showbackup
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup00.ocr
        rh6-112-rac1 2013/08/02 18:56:44 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup01.ocr
        rh6-112-rac1 2013/02/13 11:51:10 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup02.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/day.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/week.ocr
        rh6-112-rac2 2013/08/29 19:59:52 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup_20130829_195952.ocr
        [root@rh6-112-rac2 ~]#

    The backup timestamps show that no backup has been taken for a long time. This happens on this RAC because the cluster is not kept running, so the scheduled automatic backups do not run as expected.
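
    The age of the newest backup can also be computed rather than read by eye. The sketch below assumes the ocrconfig -showbackup output has one backup per line with the date in the second field, and that GNU date (with its -d option) is available, as it is on Red Hat Linux. The function names newest_backup_date and days_between are hypothetical.

```shell
# Print the most recent YYYY/MM/DD date found in the second field of
# `ocrconfig -showbackup` output read from stdin (hypothetical helper).
newest_backup_date() {
    awk '$2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ {
             # YYYY/MM/DD sorts correctly as a plain string.
             if (($2 "") > (max "")) max = $2
         }
         END { if (max != "") print max }'
}

# Print the whole days between two YYYY/MM/DD dates ($1 older, $2 newer).
# Assumption: GNU date, because of the -d option.
days_between() {
    from=$(date -u -d "$(printf '%s' "$1" | tr / -)" +%s) || return 1
    to=$(date -u -d "$(printf '%s' "$2" | tr / -)" +%s) || return 1
    echo $(( ($to - $from) / 86400 ))
}

# Two of the lines recorded in this lab, used as sample input.
sample='rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup00.ocr
rh6-112-rac2 2013/08/29 19:59:52 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup_20130829_195952.ocr'

newest=$(printf '%s\n' "$sample" | newest_backup_date)
echo "newest backup: $newest, $(days_between "$newest" 2014/02/05) days before 2014/02/05"
```

    For the listing above this reports that the newest backup, dated 2013/11/30, was 67 days old on the day of the lab.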

    Take a manual backup

    To take a manual backup of the OCR, run the following command:

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/ocrconfig -manualbackup
        rh6-112-rac1 2014/02/05 16:08:23 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup_20140205_160823.ocr
        rh6-112-rac2 2013/08/29 19:59:52 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup_20130829_195952.ocr
        [root@rh6-112-rac1 ~]#

    Verify that the backup was taken

    To verify that the manual backup completed correctly, run the following command:

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/ocrconfig -showbackup
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup00.ocr
        rh6-112-rac1 2013/08/02 18:56:44 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup01.ocr
        rh6-112-rac1 2013/02/13 11:51:10 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup02.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/day.ocr
        rh6-112-rac1 2013/11/30 12:47:09 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/week.ocr
        rh6-112-rac1 2014/02/05 16:08:23 /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup_20140205_160823.ocr
        rh6-112-rac2 2013/08/29 19:59:52 /oracle/app/grid/11.2.0.3/cdata/rh6-112-scan/backup_20130829_195952.ocr
        [root@rh6-112-rac1 ~]#
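
    Since the backup should also exist on external media, the copy step could look roughly like the sketch below. The function name copy_ocr_backup and the destination directory are hypothetical; the demo uses temporary files so that it runs anywhere, and the checksum comparison only confirms that the copy matches the source.

```shell
# Copy one OCR backup file into a destination directory and verify the copy
# by comparing checksums. copy_ocr_backup is a hypothetical helper name.
#   $1 = backup file   $2 = destination directory (e.g. an external mount)
copy_ocr_backup() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "missing backup: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/" || return 1
    src_sum=$(cksum < "$src")
    dst_sum=$(cksum < "$dest_dir/$(basename "$src")")
    [ "$src_sum" = "$dst_sum" ]
}

# Demo with throwaway files standing in for the real backup and mount point.
work=$(mktemp -d)
printf 'dummy OCR backup contents\n' > "$work/backup_demo.ocr"
if copy_ocr_backup "$work/backup_demo.ocr" "$work/external"; then
    echo "copy verified"
else
    echo "copy failed"
fi
rm -rf "$work"
```

    In this lab the source file would be /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup_20140205_160823.ocr; the destination depends on the site.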

  • Identify the disks that belong to the OCR disk group

    Connect to the ASM instance and identify the disks that belong to the OCR disk group. In a production environment this information should be in the initial installation document.

        SQL> col path for a25
        SQL> select GROUP_NUMBER, DISK_NUMBER, LABEL, PATH, NAME from v$asm_disk where name like 'OCR%' order by PATH;
        GROUP_NUMBER DISK_NUMBER LABEL    PATH                      NAME
        ------------ ----------- -------- ------------------------- ----------
                   4           0          /dev/asm-disk1            OCR_0000
                   4           3          /dev/asm-disk10           OCR_0003
                   4           1          /dev/asm-disk8            OCR_0001
                   4           2          /dev/asm-disk9            OCR_0002
        SQL>

    Initial documentation of the disk configuration

        Linux device   ASM device        VirtualBox device ID
        /dev/sda       N/A               1ATA_VBOX_HARDDISK_VBd4680125-6f8d8c97
        /dev/sdb       /dev/asm-disk1    1ATA_VBOX_HARDDISK_VB3943247b-f8e55cbe
        /dev/sdc       /dev/asm-disk2    1ATA_VBOX_HARDDISK_VB628e47d7-e28b8f09
        /dev/sdd       /dev/asm-disk3    1ATA_VBOX_HARDDISK_VB89c5e2c2-621e8052
        /dev/sde       /dev/asm-disk4    1ATA_VBOX_HARDDISK_VB68036111-7239a139
        /dev/sdf       /dev/asm-disk5    1ATA_VBOX_HARDDISK_VBd43f3407-91b4f8e6
        /dev/sdg       /dev/asm-disk6    1ATA_VBOX_HARDDISK_VB2679427e-16395935
        /dev/sdh       /dev/asm-disk7    1ATA_VBOX_HARDDISK_VB3e10734e-20faed6d
        /dev/sdi       N/A               1ATA_VBOX_HARDDISK_VB36da5687-77269fb7
        /dev/sdj       /dev/asm-disk8    1ATA_VBOX_HARDDISK_VBa6fa1a99-724b260f
        /dev/sdk       /dev/asm-disk9    1ATA_VBOX_HARDDISK_VBeec1a7ac-f1bb7085
        /dev/sdl       /dev/asm-disk10   1ATA_VBOX_HARDDISK_VB63a1a136-6a6a0ff8
        /dev/sdm       /dev/asm-disk11   1ATA_VBOX_HARDDISK_VB82e67e5e-752fdc5d

    The following Linux command shows the ID that VirtualBox assigned to a disk:

        [root@rh6-112-rac1 dev]# /sbin/scsi_id -g -u -d /dev/sdb
        1ATA_VBOX_HARDDISK_VB3943247b-f8e55cbe
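
    To document every disk in one pass, the same scsi_id call can be wrapped in a loop. This is only a sketch: the function name print_disk_ids and the SCSI_ID_CMD variable are hypothetical, and the default path /sbin/scsi_id is taken from the session above and may differ on other distributions.

```shell
# Path of the scsi_id binary; overridable because its location varies by distribution.
: "${SCSI_ID_CMD:=/sbin/scsi_id}"

# Print "<device> <id>" for each device path given as an argument.
# print_disk_ids is a hypothetical helper name.
print_disk_ids() {
    for dev in "$@"; do
        # Same options as in the lab session: -g -u -d <device>.
        id=$("$SCSI_ID_CMD" -g -u -d "$dev" 2>/dev/null) || id="(unavailable)"
        printf '%s %s\n' "$dev" "$id"
    done
}
```

    As root on a cluster node, print_disk_ids /dev/sd[a-m] would list the devices from the table above; the output can then be pasted into the installation document.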

    Damage the disks that belong to the OCR disk group (simulating a crash)

    I ran this step only to simulate a failure; do not do this in a production environment. The following commands overwrite the disk headers. Once they have run, the RAC should stop working correctly:

        [root@rh6-112-rac1 ~]# dd if=/dev/zero of=/dev/sdb bs=1024 count=1000
        1000+0 records in
        1000+0 records out
        1024000 bytes (1.0 MB) copied, 0.114191 s, 9.0 MB/s
        [root@rh6-112-rac1 ~]# dd if=/dev/zero of=/dev/sdl bs=1024 count=1000
        1000+0 records in
        1000+0 records out
        1024000 bytes (1.0 MB) copied, 0.105007 s, 9.8 MB/s
        [root@rh6-112-rac1 ~]# dd if=/dev/zero of=/dev/sdj bs=1024 count=1000
        1000+0 records in
        1000+0 records out
        1024000 bytes (1.0 MB) copied, 0.125585 s, 8.2 MB/s
        [root@rh6-112-rac1 ~]# dd if=/dev/zero of=/dev/sdk bs=1024 count=1000
        1000+0 records in
        1000+0 records out
        1024000 bytes (1.0 MB) copied, 0.106918 s, 9.6 MB/s
        [root@rh6-112-rac1 ~]#

    Stop the services

    Node 1

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl stop crs
        CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.crsd' on 'rh6-112-rac1'
        CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh6-112-rac1'
        CRS-2672: Attempting to start 'httpd-vip' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.ACFS.dg' on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.registry.acfs' on 'rh6-112-rac1'
        CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rh6-112-rac1'
        CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac1' succeeded
        CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rh6-112-rac1'
        CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh6-112-rac1' succeeded
        CRS-2673: Attempting to stop 'ora.rh6-112-rac1.vip' on 'rh6-112-rac1'
        CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac1' succeeded
        CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rh6-112-rac1'
        CRS-2677: Stop of 'ora.registry.acfs' on 'rh6-112-rac1' succeeded
        CRS-2677: Stop of 'ora.rh6-112-rac1.vip' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.rh6-112-rac1.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.DATA.dg' on 'rh6-112-rac1' succeeded
        CRS-2677: Stop of 'ora.scan3.vip' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.scan3.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.ACFS.dg' on 'rh6-112-rac1' succeeded
        CRS-2677: Stop of 'ora.scan1.vip' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.scan1.vip' on 'rh6-112-rac2'
        CRS-2676: Start of 'httpd-vip' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'tomcat' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.scan3.vip' on 'rh6-112-rac2' succeeded
        CRS-2672: Attempting to start 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac2'
        CRS-2676: Start of 'ora.rh6-112-rac1.vip' on 'rh6-112-rac2' succeeded
        CRS-2676: Start of 'ora.scan1.vip' on 'rh6-112-rac2' succeeded
        CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac2'
        CRS-2675: Stop of 'tomcat' on 'rh6-112-rac1' failed
        CRS-2676: Start of 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.OCR.dg' on 'rh6-112-rac1' succeeded
        CRS-2673: Attempting to stop 'ora.asm' on 'rh6-112-rac1'
        CRS-2677: Stop of 'ora.asm' on 'rh6-112-rac1' succeeded
        CRS-2799: Failed to shut down resource 'tomcat' on 'rh6-112-rac1'
        CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'rh6-112-rac1' has failed
        CRS-5022: Stop of resource "ora.crsd" failed: current state is "UNKNOWN"
        CRS-2675: Stop of 'ora.crsd' on 'rh6-112-rac1' failed
        CRS-2799: Failed to shut down resource 'ora.crsd' on 'rh6-112-rac1'
        CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'rh6-112-rac1' has failed
        CRS-4687: Shutdown command has completed with errors.
        CRS-4000: Command Stop failed, or completed with errors.
        [root@rh6-112-rac1 ~]#

    Node 2

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl stop crs
        CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.crsd' on 'rh6-112-rac2'
        CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.ACFS.dg' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.OCR.dg' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.registry.acfs' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.rh6-112-rac1.vip' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'httpd-vip' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.LISTENER_SCAN2.lsnr' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.LISTENER_SCAN3.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.scan3.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.LISTENER_SCAN2.lsnr' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.scan2.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.rh6-112-rac1.vip' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.scan1.vip' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.scan3.vip' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.registry.acfs' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.DATA.dg' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.ACFS.dg' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.scan2.vip' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'httpd-vip' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.rh6-112-rac2.vip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.rh6-112-rac2.vip' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.OCR.dg' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.asm' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.asm' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.ons' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.ons' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.net1.network' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.net1.network' on 'rh6-112-rac2' succeeded
        CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh6-112-rac2' has completed
        CRS-2677: Stop of 'ora.crsd' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.ctssd' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.evmd' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.asm' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rh6-112-rac2'
        CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.evmd' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.mdnsd' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.ctssd' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.drivers.acfs' on 'rh6-112-rac2' succeeded
        CRS-2677: Stop of 'ora.asm' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.cssd' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.cssd' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.crf' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.crf' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.gipcd' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.gipcd' on 'rh6-112-rac2' succeeded
        CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh6-112-rac2'
        CRS-2677: Stop of 'ora.gpnpd' on 'rh6-112-rac2' succeeded
        CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh6-112-rac2' has completed
        CRS-4133: Oracle High Availability Services has been stopped.
        [root@rh6-112-rac2 ~]#

    Configure the services so that they do not start when the system reboots

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl disable crs
        CRS-4621: Oracle High Availability Services autostart is disabled.
        [root@rh6-112-rac1 ~]#

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl disable crs
        CRS-4621: Oracle High Availability Services autostart is disabled.
        [root@rh6-112-rac2 ~]#
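
    Before rebooting it can be worth confirming the autostart setting on each node. The sketch below classifies a message by its CRS-4621 or CRS-4622 code, both of which appear in this lab. The function name crs_autostart_state is hypothetical, and feeding it from crsctl config crs assumes that command reports the same messages.

```shell
# Classify an autostart message read from stdin as enabled, disabled or unknown.
# crs_autostart_state is a hypothetical helper name.
crs_autostart_state() {
    msg=$(cat)
    case $msg in
        *CRS-4622*) echo enabled ;;
        *CRS-4621*) echo disabled ;;
        *) echo unknown; return 1 ;;
    esac
}

# Message recorded above after running crsctl disable crs.
echo 'CRS-4621: Oracle High Availability Services autostart is disabled.' | crs_autostart_state
```

    On a node the input would come from the Clusterware command itself, for example /oracle/app/grid/11.2.0.4/bin/crsctl config crs | crs_autostart_state.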

    Reboot the nodes

    Node 1

        [root@rh6-112-rac1 ~]# reboot
        [root@rh6-112-rac1 ~]#
        Broadcast message from [email protected] (/dev/pts/0) at 17:27 ...
        The system is going down for reboot NOW!

    Node 2

        [root@rh6-112-rac2 ~]# reboot
        Broadcast message from [email protected] (/dev/pts/0) at 17:27 ...
        The system is going down for reboot NOW!

    Try to start the cluster services on both nodes

    Node 1

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl start crs
        CRS-4123: Oracle High Availability Services has been started.
        [root@rh6-112-rac1 ~]#

    Node 2

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl start crs
        CRS-4123: Oracle High Availability Services has been started.
        [root@rh6-112-rac2 ~]#

    Logs on node 1

        [root@rh6-112-rac1 crsd]# cd /oracle/app/grid/11.2.0.4/log/rh6-112-rac1/crsd
        [root@rh6-112-rac1 crsd]# tail -f crsd.log
        2014-02-05 17:27:24.378: [ CSSCLNT][2129614608]clssgsGroupGetStatus: returning 8
        2014-02-05 17:27:24.378: [ CRSEVT][2129614608] Error in clssgsgrpstat rc =8
        2014-02-05 17:27:24.382: [ CSSCLNT][2124883728]clsssRecvMsg: got a disconnect from the server while waiting for message type 1
        2014-02-05 17:27:24.382: [ CSSCLNT][2124883728]clssgsGroupGetStatus: communications failed (0/3/-1)
        2014-02-05 17:27:24.382: [ CSSCLNT][2124883728]clssgsGroupGetStatus: returning 8
        2014-02-05 17:27:24.382: [ CRSCCL][2124883728]Daemon exiting due to error in clssgsgrpstat retval = 8

    Logs on node 2

        [root@rh6-112-rac2 crsd]# tail -f crsd.log
        2014-02-05 17:25:06.631: [ CRSD][702093072]{2:44209:353} Exiting on request of the Policy Engine...
        2014-02-05 17:25:06.632: [ CRSD][702093072]{2:44209:353} Done.
        2014-02-05 17:25:06.681: [ CRSCOMM][721004304] IpcL: connection to member 3 has been removed
        2014-02-05 17:25:06.681: [CLSFRAME][721004304] Removing IPC Member:{Relative|Node:0|Process:3|Type:3}
        2014-02-05 17:25:06.681: [CLSFRAME][721004304] Disconnected from AGENT process: {Relative|Node:0|Process:3|Type:3}
        2014-02-05 17:25:06.682: [ AGFW][714700560]{2:44209:362} Agfw Proxy Server received process disconnected notification, count=1
        2014-02-05 17:25:06.682: [ AGFW][714700560]{2:44209:362} /oracle/app/grid/11.2.0.4/bin/oraagent_grid disconnected.
        2014-02-05 17:25:06.682: [ AGFW][714700560]{2:44209:362} Agent /oracle/app/grid/11.2.0.4/bin/oraagent_grid[3837] stopped!
        2014-02-05 17:25:06.682: [ CRSCOMM][714700560]{2:44209:362} IpcL: removeConnection: Member 3 does not exist in pending connections.

    Stop the services and reboot the nodes

    Node 1

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl stop crs
        CRS-2796: The command may not proceed when Cluster Ready Services is not running
        CRS-4687: Shutdown command has completed with errors.
        CRS-4000: Command Stop failed, or completed with errors.
        [root@rh6-112-rac1 ~]#
        [root@rh6-112-rac1 ~]# reboot
        [root@rh6-112-rac1 ~]#
        Broadcast message from [email protected] (/dev/pts/0) at 17:36 ...
        The system is going down for reboot NOW!

    Node 2

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl stop crs
        CRS-2796: The command may not proceed when Cluster Ready Services is not running
        CRS-4687: Shutdown command has completed with errors.
        CRS-4000: Command Stop failed, or completed with errors.
        [root@rh6-112-rac2 ~]# reboot
        [root@rh6-112-rac2 ~]#
        Broadcast message from [email protected] (/dev/pts/0) at 17:37 ...
        The system is going down for reboot NOW!

    Reconfigure the damaged disks

    Repeat the same procedure for each disk that belonged to the OCR disk group, or for the new disks assigned to replace them. In this example it must be done for the disks /dev/sdb, /dev/sdl, /dev/sdj and /dev/sdk.

        [root@rh6-112-rac1 ~]# fdisk /dev/sdb
        Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
        Building a new DOS disklabel with disk identifier 0x9d683600.
        Changes will remain in memory only, until you decide to write them.
        After that, of course, the previous content won't be recoverable.
        Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
        WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
                 switch off the mode (command 'c') and change display units to
                 sectors (command 'u').
        Command (m for help): n
        Command action
           e   extended
           p   primary partition (1-4)
        p
        Partition number (1-4): 1
        First cylinder (1-261, default 1):
        Using default value 1
        Last cylinder, +cylinders or +size{K,M,G} (1-261, default 261):
        Using default value 261
        Command (m for help): w
        The partition table has been altered!
        Calling ioctl() to re-read partition table.
        Syncing disks.
        [root@rh6-112-rac1 ~]# ls -lrt /dev/sdb*
        brw-rw---- 1 root disk 8, 16 Feb 5 19:28 /dev/sdb
        [root@rh6-112-rac1 ~]#

    Verify that udev recognizes the disks

        [root@rh6-112-rac1 ~]# ls -lrt /dev/asm*
        brw-rw---- 1 grid dba 8, 113 Feb 5 19:39 /dev/asm-disk7
        brw-rw---- 1 grid dba 8, 65 Feb 5 19:39 /dev/asm-disk4
        brw-rw---- 1 grid dba 8, 97 Feb 5 19:39 /dev/asm-disk6
        brw-rw---- 1 grid dba 8, 81 Feb 5 19:39 /dev/asm-disk5
        brw-rw---- 1 grid dba 8, 33 Feb 5 19:39 /dev/asm-disk2
        brw-rw---- 1 grid dba 8, 49 Feb 5 19:39 /dev/asm-disk3
        brw-rw---- 1 grid dba 8, 161 Feb 5 19:39 /dev/asm-disk9
        brw-rw---- 1 grid dba 8, 145 Feb 5 19:39 /dev/asm-disk8
        brw-rw---- 1 grid dba 8, 177 Feb 5 19:39 /dev/asm-disk10
        brw-rw---- 1 grid dba 8, 17 Feb 5 19:39 /dev/asm-disk1
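
    With many disks it is easy to overlook a missing device in the listing. A rough sketch of an automated check follows; the function name missing_asm_disks is hypothetical, and the sample listing contains only the four OCR devices from the output above.

```shell
# Print every expected device path that does not appear in a listing.
#   stdin: a listing such as the output of `ls -l /dev/asm*`
#   args:  the device paths that should be present
# missing_asm_disks is a hypothetical helper name.
missing_asm_disks() {
    listing=$(cat)
    for want in "$@"; do
        # The device path is the last field of each `ls -l` line.
        printf '%s\n' "$listing" |
            awk -v w="$want" '$NF == w { found = 1 } END { exit !found }' ||
            printf '%s\n' "$want"
    done
}

# The four OCR devices as they appear in the listing recorded in this lab.
sample_listing='brw-rw---- 1 grid dba 8, 161 Feb 5 19:39 /dev/asm-disk9
brw-rw---- 1 grid dba 8, 145 Feb 5 19:39 /dev/asm-disk8
brw-rw---- 1 grid dba 8, 177 Feb 5 19:39 /dev/asm-disk10
brw-rw---- 1 grid dba 8, 17 Feb 5 19:39 /dev/asm-disk1'

missing=$(printf '%s\n' "$sample_listing" |
    missing_asm_disks /dev/asm-disk1 /dev/asm-disk8 /dev/asm-disk9 /dev/asm-disk10)
if [ -z "$missing" ]; then
    echo "all OCR devices present"
else
    echo "missing: $missing"
fi
```

    Matching on the whole last field avoids confusing /dev/asm-disk1 with /dev/asm-disk10, which a plain substring search would do.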

    Start the cluster services in exclusive mode

    Start the cluster services in exclusive mode. The -nocrs option is available from Oracle 11.2.0.2 onward, and from that version it is required so that the stack does not try to start the ora.crsd service. It is very important to specify this option: if ora.crsd fails to start, it takes down the ora.cluster_interconnect.haip resource, which in turn brings down ASM.

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl start crs -excl -nocrs
        CRS-4123: Oracle High Availability Services has been started.
        CRS-2672: Attempting to start 'ora.mdnsd' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.mdnsd' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.gpnpd' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.gpnpd' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rh6-112-rac1'
        CRS-2672: Attempting to start 'ora.gipcd' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.cssdmonitor' on 'rh6-112-rac1' succeeded
        CRS-2676: Start of 'ora.gipcd' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.cssd' on 'rh6-112-rac1'
        CRS-2672: Attempting to start 'ora.diskmon' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.diskmon' on 'rh6-112-rac1' succeeded
        CRS-2676: Start of 'ora.cssd' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rh6-112-rac1'
        CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rh6-112-rac1'
        CRS-2672: Attempting to start 'ora.ctssd' on 'rh6-112-rac1'
        CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.ctssd' on 'rh6-112-rac1' succeeded
        CRS-2676: Start of 'ora.drivers.acfs' on 'rh6-112-rac1' succeeded
        CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rh6-112-rac1' succeeded
        CRS-2679: Attempting to clean 'ora.asm' on 'rh6-112-rac1'
        CRS-2681: Clean of 'ora.asm' on 'rh6-112-rac1' succeeded
        CRS-2672: Attempting to start 'ora.asm' on 'rh6-112-rac1'
        CRS-2676: Start of 'ora.asm' on 'rh6-112-rac1' succeeded
        [root@rh6-112-rac1 ~]#

    Check whether the ASM instance is up

        [root@rh6-112-rac1 ~]# su - grid
        [grid@rh6-112-rac1 ~]$ ps -fea |grep pmon
        grid 3290 1 0 19:23 ? 00:00:00 asm_pmon_+ASM1
        grid 3704 3671 0 19:25 pts/0 00:00:00 grep pmon
        [grid@rh6-112-rac1 ~]$

    Connect to the ASM instance and create the OCR disk group

        [grid@rh6-112-rac1 ~]$ sqlplus / as sysasm
        SQL*Plus: Release 11.2.0.4.0 Production on Wed Feb 5 19:26:06 2014
        Copyright (c) 1982, 2013, Oracle. All rights reserved.
        Connected to:
        Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
        With the Real Application Clusters and Automatic Storage Management options
        SQL> create diskgroup OCR external redundancy disk '/dev/asm-disk1','/dev/asm-disk10','/dev/asm-disk8','/dev/asm-disk9' attribute 'compatible.asm'='11.2.0.0.0','compatible.rdbms'='11.2.0.0.0';
        Diskgroup created.

    Create the ASM server parameter file (spfile) from a prepared pfile

        SQL> create spfile=+OCR from pfile='/home/grid/asm_pfile.ora';
        File created.
        SQL>

    Contents of the asm_pfile.ora file

        *.asm_diskgroups='OCR'
        *.asm_diskstring='/dev/asm-disk*'
        *.asm_power_limit=1
        *.diagnostic_dest='/oracle/grid'
        *.instance_type='asm'
        *.memory_max_target=1153433600
        *.processes=300
        *.remote_login_passwordfile='EXCLUSIVE'
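
    If no copy of the pfile survives, it has to be written by hand before the spfile can be created. One possible way to do that from the shell is sketched below, using the parameter values listed in this lab; the function name write_asm_pfile is hypothetical, and the values would need adjusting for any other cluster.

```shell
# Write the ASM pfile used in this lab to the path given as $1.
# write_asm_pfile is a hypothetical helper name; the parameter values are
# the ones listed for asm_pfile.ora in this document.
write_asm_pfile() {
    # The quoted 'EOF' keeps the shell from expanding * or quotes in the body.
    cat > "$1" <<'EOF'
*.asm_diskgroups='OCR'
*.asm_diskstring='/dev/asm-disk*'
*.asm_power_limit=1
*.diagnostic_dest='/oracle/grid'
*.instance_type='asm'
*.memory_max_target=1153433600
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
EOF
}

# Demo: write to a temporary file and show the result.
demo_pfile=$(mktemp)
write_asm_pfile "$demo_pfile"
cat "$demo_pfile"
rm -f "$demo_pfile"
```

    In the lab session the target path was /home/grid/asm_pfile.ora, owned by the grid user.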

    Shut down and start the ASM instance

        SQL> shutdown
        ASM diskgroups volume disabled
        ASM diskgroups dismounted
        ASM instance shutdown
        SQL> startup
        ASM instance started
        Total System Global Area 1152450560 bytes
        Fixed Size                  2252584 bytes
        Variable Size            1116643544 bytes
        ASM Cache                  33554432 bytes
        ASM diskgroups mounted
        ASM diskgroups volume enabled
        SQL> show parameter spfile
        NAME                                 TYPE        VALUE
        ------------------------------------ ----------- ------------------------------
        spfile                               string      +OCR/rh6-112-scan/asmparameter
                                                         file/registry.253.838757119

  • Restore the OCR backup

    Restore the OCR from the manual backup taken earlier.

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/ocrconfig -restore /oracle/app/grid/11.2.0.4/cdata/rh6-112-scan/backup_20140205_160823.ocr

    Restore the voting disk

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl replace votedisk +OCR
        Successful addition of voting disk 33756983cd414f93bfa663bb62e2fa5c.
        Successfully replaced voting disk group with +OCR.
        CRS-4266: Voting file(s) successfully replaced
        [root@rh6-112-rac1 ~]#

    Enable CRS autostart at operating system boot

        [root@rh6-112-rac1 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl enable crs
        CRS-4622: Oracle High Availability Services autostart is enabled.

    Reboot the system.

    Start node 2

    Enable CRS autostart at operating system boot:

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl enable crs
        CRS-4622: Oracle High Availability Services autostart is enabled.

    Start CRS:

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl start crs
        CRS-4123: Oracle High Availability Services has been started.
        [root@rh6-112-rac2 ~]#

  • Verify that the services started correctly

    Node 1

        [root@rh6-112-rac1 crsd]# /oracle/app/grid/11.2.0.4/bin/crsctl stat res -t
        --------------------------------------------------------------------------------
        NAME           TARGET  STATE        SERVER                   STATE_DETAILS
        --------------------------------------------------------------------------------
        Local Resources
        --------------------------------------------------------------------------------
        ora.ACFS.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.DATA.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.LISTENER.lsnr
                       ONLINE  ONLINE       rh6-112-rac1
                       OFFLINE OFFLINE      rh6-112-rac2
        ora.OCR.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.asm
                       ONLINE  ONLINE       rh6-112-rac1             Started
                       ONLINE  ONLINE       rh6-112-rac2             Started
        ora.gsd
                       OFFLINE OFFLINE      rh6-112-rac1
                       OFFLINE OFFLINE      rh6-112-rac2
        ora.net1.network
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.ons
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.registry.acfs
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        --------------------------------------------------------------------------------
        Cluster Resources
        --------------------------------------------------------------------------------
        httpd-vip
              1        OFFLINE OFFLINE
        ora.LISTENER_SCAN1.lsnr
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.LISTENER_SCAN2.lsnr
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.LISTENER_SCAN3.lsnr
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.cvu
              1        OFFLINE OFFLINE
        ora.oc4j
              1        OFFLINE OFFLINE
        ora.rac.db
              1        OFFLINE OFFLINE                               Instance Shutdown
              2        OFFLINE OFFLINE                               Instance Shutdown
        ora.rac.ventas.svc
              1        OFFLINE OFFLINE
        ora.rh6-112-rac1.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.rh6-112-rac2.vip
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.scan1.vip
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.scan2.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.scan3.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        tomcat
              1        ONLINE  ONLINE       rh6-112-rac1
        [root@rh6-112-rac1 crsd]#

    Node 2

        [root@rh6-112-rac2 ~]# /oracle/app/grid/11.2.0.4/bin/crsctl stat res -t
        --------------------------------------------------------------------------------
        NAME           TARGET  STATE        SERVER                   STATE_DETAILS
        --------------------------------------------------------------------------------
        Local Resources
        --------------------------------------------------------------------------------
        ora.ACFS.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.DATA.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.LISTENER.lsnr
                       ONLINE  ONLINE       rh6-112-rac1
                       OFFLINE OFFLINE      rh6-112-rac2
        ora.OCR.dg
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.asm
                       ONLINE  ONLINE       rh6-112-rac1             Started
                       ONLINE  ONLINE       rh6-112-rac2             Started
        ora.gsd
                       OFFLINE OFFLINE      rh6-112-rac1
                       OFFLINE OFFLINE      rh6-112-rac2
        ora.net1.network
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.ons
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        ora.registry.acfs
                       ONLINE  ONLINE       rh6-112-rac1
                       ONLINE  ONLINE       rh6-112-rac2
        --------------------------------------------------------------------------------
        Cluster Resources
        --------------------------------------------------------------------------------
        httpd-vip
              1        OFFLINE OFFLINE
        ora.LISTENER_SCAN1.lsnr
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.LISTENER_SCAN2.lsnr
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.LISTENER_SCAN3.lsnr
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.cvu
              1        OFFLINE OFFLINE
        ora.oc4j
              1        OFFLINE OFFLINE
        ora.rac.db
              1        OFFLINE OFFLINE                               Instance Shutdown
              2        OFFLINE OFFLINE                               Instance Shutdown
        ora.rac.ventas.svc
              1        OFFLINE OFFLINE
        ora.rh6-112-rac1.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.rh6-112-rac2.vip
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.scan1.vip
              1        ONLINE  ONLINE       rh6-112-rac2
        ora.scan2.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        ora.scan3.vip
              1        ONLINE  ONLINE       rh6-112-rac1
        tomcat
              1        ONLINE  ONLINE       rh6-112-rac1
        [root@rh6-112-rac2 ~]#

  • Recommendations

    Include the OCR and voting disk backup in the daily backup policy.

    Every time a new disk group is added to the system, create a copy of the ASM startup parameter file.

    Keep documentation of the assigned disks and of how the operating system recognized them.

    Know your system very well.