RAC Troubleshooting and Diagnosability Sangam2016
![Page 1: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/1.jpg)
Copyright © 2016, Oracle and/or its affiliates. All rights reserved.
Troubleshooting and Diagnosing Oracle RAC in the Private Cloud
Sandesh Rao, Senior Director, RAC Development
![Page 2: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/2.jpg)
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
![Page 3: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/3.jpg)
Agenda
• Architectural Overview
• Troubleshooting Scenarios
• Proactive and Reactive tools
• Q&A
![Page 4: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/4.jpg)
Oracle Grid Infrastructure
Grid Infrastructure Overview
• Oracle Clusterware is required for 11gR2+ RAC databases
• Oracle Clusterware can manage non-RAC database resources using agents
• Oracle Clusterware can manage HA for any business-critical application with the agent infrastructure, also called XAG
  – Oracle publishes Bundled Agents for some non-RAC DB resources
    • SAP, GoldenGate, Siebel, Apache, ..
![Page 5: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/5.jpg)
Grid Infrastructure Overview
• Grid Infrastructure is the name for the combination of:
  – Oracle Cluster Ready Services (CRS)
  – Oracle Automatic Storage Management (ASM)
• The Grid Home contains the software for both products
• CRS can also be Standalone for ASM and/or Oracle Restart
• CRS can run by itself or in combination with other vendor clusterware
• Grid Home and RDBMS home must be installed in different locations
  – The installer locks the Grid Home path by setting root permissions.
![Page 6: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/6.jpg)
Grid Infrastructure Overview
• CRS requires shared Oracle Cluster Registry (OCR) and Voting files
  – Must be in ASM or CFS
  – OCR backed up every 4 hours automatically to GIHOME/cdata
  – Kept 4, 8, 12 hours, 1 day, 1 week
  – Restored with ocrconfig
  – Voting file backed up into OCR at each change
  – Voting file restored with crsctl
![Page 7: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/7.jpg)
Grid Infrastructure Overview
• For the network, CRS requires
  – One or multiple high-speed, low-latency, redundant private networks for inter-node communications
  – Think of the interconnect as a memory backplane for the cluster
  – Should be a separate physical network or a managed converged network
  – VLANs are supported
  – Used for:
    • Clusterware messaging
    • RDBMS messaging and block transfer
    • ASM messaging
    • HANFS for block traffic
![Page 8: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/8.jpg)
Grid Infrastructure Overview
• Only one set of Clusterware daemons can run on each node
• The CRS stack is spawned from the Oracle HA Services Daemon (ohasd)
• On Unix, ohasd runs out of inittab with respawn
• A node can be evicted when deemed unhealthy
  – May require a reboot, but at least a CRS stack restart (rebootless restart)
  – IPMI integration, or diskmon in case of Exadata
• CRS provides Cluster Time Synchronization services
  – Always runs, but in observer mode if ntpd is configured
![Page 9: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/9.jpg)
Grid Infrastructure Processes
11.2+ Agents change everything.
• Multi-threaded daemons
• Manage multiple resources and types
• Implement entry points for multiple resource types
  – Start, stop, check, clean, fail
• oraagent, orarootagent, application agent, script agent, cssdagent
• Single process started from init on Unix (ohasd)
• Diagram below shows all core resources
![Page 10: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/10.jpg)
Grid Infrastructure Processes
[Diagram: Grid Infrastructure process startup hierarchy, with tiers labelled Level 0, Level 1, Level 2a, Level 2b, Level 3, Level 4a and Level 4b]
![Page 11: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/11.jpg)
Grid Infrastructure Processes: Init Scripts
• /etc/init.d/ohasd (location O/S dependent)
  – RC script with "start" and "stop" actions
  – Initiates Oracle Clusterware autostart
  – Control file coordinates with CRSCTL
• /etc/init.d/init.ohasd (location O/S dependent)
  – OHASD Framework Script runs from init/upstart
  – Control file coordinates with CRSCTL
  – Named pipe syncs with OHASD
![Page 12: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/12.jpg)
Grid Infrastructure Processes
• Level 1: OHASD spawns:
  – cssdagent - Agent responsible for spawning CSSD
  – orarootagent - Agent responsible for managing all root-owned ohasd resources
  – oraagent - Agent responsible for managing all oracle-owned ohasd resources
  – cssdmonitor - Monitors CSSD and node health (along with the cssdagent)
• Level 2a: OHASD rootagent spawns:
  – CRSD - Primary daemon responsible for managing cluster resources
  – CTSSD - Cluster Time Synchronization Services Daemon
  – Diskmon (Exadata)
  – ACFS (ASM Cluster File System) Drivers
![Page 13: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/13.jpg)
Grid Infrastructure Processes
• Level 2b: OHASD oraagent spawns:
  – MDNSD - Multicast DNS daemon
  – GIPCD - Grid IPC Daemon
  – GPNPD - Grid Plug and Play Daemon
  – EVMD - Event Monitor Daemon
  – ASM - ASM instance started here as may be required by CRSD
• Level 3: CRSD spawns:
  – orarootagent - Agent responsible for managing all root-owned crsd resources
  – oraagent - Agent responsible for managing all non-root-owned crsd resources
    • One is spawned for every user that has CRS resources to manage
![Page 14: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/14.jpg)
Grid Infrastructure Processes
• Level 4: CRSD oraagent spawns:
  – ASM Resource - ASM instance(s) resource (proxy resource)
  – Diskgroup - Used for managing/monitoring ASM diskgroups
  – DB Resource - Used for monitoring and managing the DB and instances
  – SCAN Listener - Listener for the single client access name, listening on the SCAN VIP
  – Listener - Node listener listening on the Node VIP
  – Services - Used for monitoring and managing services
  – ONS - Oracle Notification Service
  – eONS - Enhanced Oracle Notification Service (pre 11.2.0.2)
  – GSD - For 9i backward compatibility
  – GNS (optional) - Grid Naming Service, performs name resolution
Startup Sequence
![Page 15: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/15.jpg)
Troubleshooting Scenarios: Cluster Startup Problem Triage (11.2+)
[Flowchart: Cluster Startup Diagnostic Flow. The branch layout did not survive extraction; the checks and outcomes named in the chart are listed below.]
• Check 1: is ohasd running?
  – ps -ef | grep init.ohasd
  – ps -ef | grep ohasd.bin
  – If not running: review crsctl config crs, ohasd.log and OLR permissions, and compare against a reference system
• Check 2: is the rest of the stack running?
  – ps -ef | grep cssdagent
  – ps -ef | grep ocssd.bin
  – ps -ef | grep orarootagent
  – ps -ef | grep ctssd.bin
  – ps -ef | grep crsd.bin
  – ps -ef | grep cssdmonitor
  – ps -ef | grep oraagent
  – ps -ef | grep ora.asm
  – ps -ef | grep gpnpd.bin
  – ps -ef | grep mdnsd.bin
  – ps -ef | grep evmd.bin
  – crsctl check crs
  – crsctl check cluster
  – If not running: review ohasd.log, the agent logs and the process logs
• Outcomes at each "Obvious?" decision: engage the sysadmin team, or run TFA Collector and engage Oracle Support together with the sysadmin team
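The process checks in this flow can be scripted. The following is a minimal sketch, not an Oracle tool: it takes captured `ps -ef` output and reports which of the daemons named on this slide do not appear. The function name, the sample output and the `/u01/app/grid` path are illustrative assumptions.

```python
# Daemon names taken from the ps checks listed on this slide.
EXPECTED_DAEMONS = [
    "ohasd.bin", "cssdagent", "cssdmonitor", "ocssd.bin", "orarootagent",
    "oraagent", "ctssd.bin", "crsd.bin", "gpnpd.bin", "mdnsd.bin", "evmd.bin",
]


def missing_daemons(ps_output: str) -> list[str]:
    """Return expected daemon names that appear in no line of the ps output."""
    # Drop the grep processes themselves, as a manual 'ps | grep' reader would.
    lines = [ln for ln in ps_output.splitlines() if "grep" not in ln]
    return [d for d in EXPECTED_DAEMONS if not any(d in ln for ln in lines)]


# Hypothetical ps -ef capture from a node where only part of the stack is up.
SAMPLE = """\
root  2101  1  0 10:00 ?  00:00:05 /u01/app/grid/bin/ohasd.bin reboot
root  2250  1  0 10:00 ?  00:00:01 /u01/app/grid/bin/orarootagent.bin
grid  2260  1  0 10:00 ?  00:00:01 /u01/app/grid/bin/oraagent.bin
root  2270  1  0 10:00 ?  00:00:01 /u01/app/grid/bin/cssdagent
"""

if __name__ == "__main__":
    print(missing_daemons(SAMPLE))
```

In this sample ohasd and its agents are present but ocssd.bin is not, which points the investigation at the CSS layer and its logs rather than at ohasd itself.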
![Page 16: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/16.jpg)
Troubleshooting Scenarios: Cluster Startup Problem Triage
• Multicast Domain Name Service Daemon (mDNS(d))
  – Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.
  – Uses multicast for cache updates on service advertisement arrival/departure.
  – Advertises/serves on all found node interfaces.
  – Log is GI_HOME/log/<node>/mdnsd/mdnsd.log
![Page 17: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/17.jpg)
Troubleshooting Scenarios: Cluster Startup Problem Triage
<?xml version="1.0" encoding="UTF-8"?>
<gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" ProfileSequence="6" ClusterUId="b1eec1fcdd355f2bbf7910ce9cc4a228" ClusterName="staij-cluster" PALocation="">
<gpnp:Network-Profile><gpnp:HostNetwork id="gen" HostName="*"><gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/><gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile>
<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
<orcl:ASM-Profile id="asm" DiscoveryString="" SPFile="+SYSTEM/staij-cluster/asmparameterfile/registry.253.693925293"/>
<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>x1H9LWjyNyMn6BsOykHhMvxnP8U=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>N+20jG4=</ds:SignatureValue></ds:Signature>
</gpnp:GPnP-Profile>
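When triaging a startup problem it is often useful to confirm which adapter the GPnP profile marks as the cluster interconnect. The sketch below is an illustration only: it parses a trimmed, well-formed stand-in for the profile on this slide (the signature block is left out, and the function name is an assumption) using Python's standard XML parser.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the profile on this slide.
GPNP_NS = {"gpnp": "http://www.grid-pnp.org/2005/11/gpnp-profile"}

# Trimmed stand-in for the slide's profile; signature and ASM/CSS entries omitted.
PROFILE = """\
<gpnp:GPnP-Profile Version="1.0"
    xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile"
    ClusterName="staij-cluster">
  <gpnp:Network-Profile>
    <gpnp:HostNetwork id="gen" HostName="*">
      <gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" Use="public"/>
      <gpnp:Network id="net2" IP="192.168.2.0" Adapter="eth1" Use="cluster_interconnect"/>
    </gpnp:HostNetwork>
  </gpnp:Network-Profile>
</gpnp:GPnP-Profile>
"""


def networks_by_use(profile_xml: str) -> dict[str, tuple[str, str]]:
    """Map each network's Use attribute to its (Adapter, IP) pair."""
    root = ET.fromstring(profile_xml)
    return {
        net.get("Use"): (net.get("Adapter"), net.get("IP"))
        for net in root.iterfind(".//gpnp:Network", GPNP_NS)
    }


if __name__ == "__main__":
    for use, (adapter, subnet) in networks_by_use(PROFILE).items():
        print(f"{use}: {adapter} on {subnet}")
```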
![Page 18: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/18.jpg)
Troubleshooting Scenarios: Cluster Startup Problem Triage
• cssdagent and monitor
  – Same functionality in both agent and monitor
  – Functionality of several pre-11.2 daemons consolidated in both
    • OPROCD – system hang
    • OMON – oracle clusterware monitor
    • VMON – vendor clusterware monitor
  – Run realtime with locked down memory, like CSSD
  – Provides enhanced stability and diagnosability
  – Logs are
    • GI_HOME/log/<node>/agent/oracssdagent_root/oracssdagent_root.log
    • GI_HOME/log/<node>/agent/oracssdmonitor_root/oracssdmonitor_root.log
    • 12.1 – ORACLE_BASE/diag/node/agent/..
![Page 19: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/19.jpg)
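Troubleshooting Scenarios: Node Evictions
[Flowchart: Node Eviction Diagnostic Flow. The branch layout did not survive extraction; the decision points, references and outcomes named in the chart are listed below.]
• Starting point: eviction scenario; review the cluster alert log and ocssd.log
• NHB (network heartbeat) missing? References: 1050693.1, 1534949.1, 1546004.1. Outcome: engage networking team
• DHB (disk heartbeat) missing? References: 1549428.1, 1466639.1. Outcome: engage storage team
• Fenced? Resource starvation? References: 1531223.1, 1328466.1, system log. Check free memory, CPU load and node response. Outcome: engage sysadmin team
• At each "Obvious?" decision: if yes, engage the appropriate team; if not, or if the problem is not resolved, run TFA Collector and engage Oracle Support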
![Page 20: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/20.jpg)
Missing Network Heartbeat (1)
• ocssd.log from node 1
• ===> sending network heartbeats to other nodes. Normally, this message is output once every 5 messages (seconds)
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:20.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> The network heartbeat is not received from node 2 (drrac2) for 15 consecutive seconds.
• ===> This means that 15 network heartbeats are missing and is the first warning (50% threshold).
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds
• 2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) is impending reconfig, flag 132108, misstime 15480
• ===> continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:25.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> 75% threshold of missing network heartbeat is reached. This is the second warning.
• 2016-08-13 17:00:29.833: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 75% heartbeat fatal, removal in 7.500 seconds
![Page 21: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/21.jpg)
Missing Network Heartbeat (2)
• ===> continuing to send the network heartbeats and log messages once every 5 messages
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:30.023: [CSSD][4096109472]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> continuing to send the network heartbeats, but the message is logged after 4 messages
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:34.021: [CSSD][4096109472]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Last warning shows that the 90% threshold of the missing network heartbeat is reached.
• ===> The eviction will occur in 2.49 seconds.
• 2016-08-13 17:00:34.841: [CSSD][4106599328]clssnmPollingThread: node drrac2 (2) at 90% heartbeat fatal, removal in 2.490 seconds, seedhbimpd 1
• ===> Eviction of node 2 (drrac2) started
• 2016-08-13 17:00:37.337: [CSSD][4106599328]clssnmPollingThread: Removal started for node drrac2 (2), flags 0x2040c, state 3, wt4c 0
• ===> This shows that node 2 is actively updating the voting disks
• 2016-08-13 17:00:37.340: [CSSD][4085619616]clssnmCheckSplit: Node 2, drrac2, is alive, DHB (1281744040, 1396854) more than disk timeout of 27000 after the last NHB (1281744011, 1367154)
![Page 22: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/22.jpg)
Missing Network Heartbeat (3)
• ===> Evicting node 2 (drrac2)
• 2016-08-13 17:00:37.340: [CSSD][4085619616](:CSSNM00007:)clssnmrEvict: Evicting node 2, drrac2, from the cluster in incarnation 169934272, node birth incarnation 169934271, death incarnation 169934272, stateflags 0x24000
• ===> Reconfigured the cluster without node 2
• 2016-08-13 17:01:07.705: [CSSD][4043389856]clssgmCMReconfig: reconfiguration successful, incarnation 169934272 with 1 nodes, local node number 1, master node number 1
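The warning percentages in the node 1 log are fractions of a single heartbeat timeout, and that timeout can be recovered from the log itself: the time already missed (misstime, in milliseconds) plus the time left before removal. The sketch below works this out from the two 50% lines; the regular expressions and function name are assumptions made for illustration, and the patterns tolerate both the spaced and the space-stripped form of the log text.

```python
import re

# Patterns tolerate missing whitespace, since extracted logs may have lost it.
WARN = re.compile(r"at\s*(\d+)%\s*heartbeat\s*fatal,\s*removal\s*in\s*([\d.]+)\s*seconds")
MISS = re.compile(r"misstime\s*(\d+)")

# The two 50% lines from node 1's ocssd.log, with whitespace restored.
LOG = [
    "2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: "
    "node drrac2 (2) at 50% heartbeat fatal, removal in 14.520 seconds",
    "2016-08-13 17:00:22.818: [CSSD][4106599328]clssnmPollingThread: "
    "node drrac2 (2) is impending reconfig, flag 132108, misstime 15480",
]


def implied_timeout_seconds(lines):
    """Time already missed (ms) plus time remaining (s) gives the full timeout."""
    remaining = missed_ms = None
    for line in lines:
        if (m := WARN.search(line)):
            remaining = float(m.group(2))
        if (m := MISS.search(line)):
            missed_ms = int(m.group(1))
    if remaining is None or missed_ms is None:
        raise ValueError("need both a warning line and a misstime line")
    return round(missed_ms / 1000 + remaining, 3)


timeout = implied_timeout_seconds(LOG)
# Seconds of silence at which each warning level is crossed.
thresholds = {pct: round(timeout * pct / 100, 3) for pct in (50, 75, 90)}

if __name__ == "__main__":
    print(timeout, thresholds)
```

For this log the sum is 15.480 s + 14.520 s, so the 50%, 75% and 90% warnings correspond to roughly 15, 22.5 and 27 seconds without a heartbeat. The logged "removal in" values are slightly below the ideal figures because each warning is written shortly after its threshold is crossed.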
![Page 23: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/23.jpg)
Missing Network Heartbeat (4)
• ocssd.log from node 2:
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:26.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> First warning of reaching the 50% threshold of missing network heartbeats
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 50% heartbeat fatal, removal in 14.540 seconds
• 2016-08-13 17:00:26.213: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) is impending reconfig, flag 394254, misstime 15460
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:31.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Second warning of reaching the 75% threshold of missing network heartbeats
• 2016-08-13 17:00:33.227: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 75% heartbeat fatal, removal in 7.470 seconds
![Page 24: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/24.jpg)
Missing Network Heartbeat (5)
• ===> Logging the message to indicate 4 network heartbeats are sent
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:35.009: [CSSD][4062550944]clssnmSendingThread: sent 4 status msgs to all nodes
• ===> Third warning of reaching the 90% threshold of missing network heartbeats
• 2016-08-13 17:00:38.236: [CSSD][4073040800]clssnmPollingThread: node drrac1 (1) at 90% heartbeat fatal, removal in 2.460 seconds, seedhbimpd 1
• ===> Logging the message to indicate 5 network heartbeats are sent to other nodes
• 2016-08-13 17:00:40.008: [CSSD][4062550944]clssnmSendingThread: sending status msg to all nodes
• 2016-08-13 17:00:40.009: [CSSD][4062550944]clssnmSendingThread: sent 5 status msgs to all nodes
• ===> Eviction started for node 1 (drrac1)
• 2016-08-13 17:00:40.702: [CSSD][4073040800]clssnmPollingThread: Removal started for node drrac1 (1), flags 0x6040e, state 3, wt4c 0
• ===> Node 1 is actively updating the voting disk, so this is a split brain condition
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckSplit: Node 1, drrac1, is alive, DHB (1281744036, 1243744) more than disk timeout of 27000 after the last NHB (1281744007, 1214144)
• 2016-08-13 17:00:40.706: [CSSD][4052061088]clssnmCheckDskInfo: My cohort: 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssnmCheckDskInfo: Surviving cohort: 1
![Page 25: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/25.jpg)
Missing Network Heartbeat (6)
• ===> Node 2 is aborting itself to resolve the split brain and ensure the cluster integrity
• 2016-08-13 17:00:40.707: [CSSD][4052061088](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 1 nodes with leader 2, drrac2, is smaller than cohort of 1 nodes led by node 1, drrac1, based on map type 2
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
• 2016-08-13 17:00:40.707: [CSSD][4052061088]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread
• 2016-08-13 17:00:40.707: [CSSD][4052061088]###################################
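The split-brain decision in this log can be expressed as a small rule. The sketch below assumes, based only on what this log shows, that the larger cohort survives and that a tie goes to the cohort holding the lowest node number (here both cohorts have one node and node 1's cohort survives). It is an illustration of that rule, not the CSSD implementation, and other releases may weigh additional factors.

```python
def surviving_cohort(cohorts):
    """Pick the surviving cohort from sets of node numbers.

    Assumed rule: most nodes wins; on a tie, the cohort containing
    the lowest node number wins.
    """
    return max(cohorts, key=lambda members: (len(members), -min(members)))


def should_abort(local_node, cohorts):
    """A node aborts itself when it is not part of the surviving cohort."""
    return local_node not in surviving_cohort(cohorts)


if __name__ == "__main__":
    # The situation in the log: node 1 and node 2 each form a cohort of one.
    split = [{1}, {2}]
    print(surviving_cohort(split), should_abort(2, split))
```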
![Page 26: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/26.jpg)
Missing Network Heartbeat (7)
• Observations
  1. Both nodes reported missing heartbeats at the same time
  2. Both nodes sent heartbeats to other nodes all the time
  3. Node 2 aborted itself to resolve split brain
• Conclusion
  1. This is likely a network problem; engage the network team
  2. Check OSWatcher output (netstat and traceroute)
     1. Configure the private.net file, which is not configured by default
  3. Check CHMOS
  4. Check system log
![Page 27: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/27.jpg)
Voting Disk Access Problem (1)
ocssd.log:
===> The first error indicating that it could not read the voting disk; this is the first message to indicate a problem accessing the voting disk
2016-08-13 18:31:19.787: [SKGFD][4131736480]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 721425
Additional information: -1)
)
2016-08-13 18:31:19.787: [CSSD][4131736480](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 529 of /dev/sdb8
2016-08-13 18:31:19.802: [CSSD][4131736480]clssnmvDiskAvailabilityChange: voting file /dev/sdb8 now offline
![Page 28: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/28.jpg)
Voting Disk Access Problem (2)
====> The error message that shows a problem accessing the voting disk repeats once every 4 seconds
2016-08-13 18:31:23.782: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:31:23.782: [SKGFD][150477728]Handle 0xf43fc6c8 from lib :UFS:: for disk :/dev/sdb8:
2016-08-13 18:31:23.782: [CLSF][150477728]Opened hdl:0xf4365708 for dev:/dev/sdb8:
2016-08-13 18:31:23.787: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:31:23.787: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
![Page 29: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/29.jpg)
Voting Disk Access Problem (3)
====> The last error that shows a problem accessing the voting disk.
====> Note that the last message is 200 seconds after the first message
====> because the long disktimeout is 200 seconds
2016-08-13 18:34:37.423: [CSSD][150477728]clssnmvDiskOpen: Opening /dev/sdb8
2016-08-13 18:34:37.423: [CLSF][150477728]Opened hdl:0xf4336530 for dev:/dev/sdb8:
2016-08-13 18:34:37.429: [SKGFD][150477728]ERROR: -9(Error 27072, OS Error (Linux Error: 5: Input/output error
Additional information: 4
Additional information: 720913
Additional information: -1)
)
2016-08-13 18:34:37.429: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8
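The interval between the first and last read errors can be checked directly from the log timestamps. This is a minimal sketch that assumes the timestamps are in the `YYYY-MM-DD HH:MM:SS.mmm` form (with the space between date and time restored); the helper name is an illustrative assumption.

```python
from datetime import datetime


def log_time(line: str) -> datetime:
    """Parse the leading 23-character timestamp of an ocssd.log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


# First and last clssnmvReadBlocks failures from the voting disk example.
FIRST = "2016-08-13 18:31:19.787: [CSSD][4131736480](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 529 of /dev/sdb8"
LAST = "2016-08-13 18:34:37.429: [CSSD][150477728](:CSSNM00060:)clssnmvReadBlocks: read failed at offset 17 of /dev/sdb8"

elapsed = (log_time(LAST) - log_time(FIRST)).total_seconds()

if __name__ == "__main__":
    print(f"{elapsed:.3f} seconds between first and last read failure")
```

The measured gap is a little under the 200-second long disk timeout, which is consistent with the errors repeating every few seconds until the timeout expires.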
![Page 30: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/30.jpg)
Voting Disk Access Problem (4)
====> This message shows that ocssd.bin tried accessing the voting disk for 200 seconds
2016-08-13 18:34:38.205: [CSSD][4110736288](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for 200880 ms for voting file /dev/sdb8)
====> ocssd.bin aborts itself with an error message that the majority of voting disks are not available. In this case, there was only one voting disk, but if three voting disks were available, as long as two voting disks are accessible, ocssd.bin will not abort.
2016-08-13 18:34:38.206: [CSSD][4110736288](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 1 configured voting disks available, need 1
2016-08-13 18:34:38.206: [CSSD][4110736288]###################################
2016-08-13 18:34:38.206: [CSSD][4110736288]clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
2016-08-13 18:34:38.206: [CSSD][4110736288]###################################
• Conclusion
  – The voting disk was not available; engage the storage team
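The "0 of 1 configured voting disks available, need 1" message and the three-disk remark both follow from a strict-majority rule. The following is a small sketch of that arithmetic; the function names are illustrative and not taken from Clusterware.

```python
def votes_needed(configured: int) -> int:
    """Strict majority of the configured voting disks."""
    return configured // 2 + 1


def css_survives(available: int, configured: int) -> bool:
    """True when enough voting disks are reachable for CSSD to keep running."""
    return available >= votes_needed(configured)


if __name__ == "__main__":
    for available, configured in [(0, 1), (2, 3), (1, 3)]:
        verdict = "survives" if css_survives(available, configured) else "aborts"
        print(f"{available} of {configured} available, "
              f"need {votes_needed(configured)}: {verdict}")
```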
![Page 31: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/31.jpg)
Troubleshooting Scenarios: Node Eviction Triage
• Time synchronisation issue
• Cluster Time Synchronisation Services daemon
  – Provides time management in a cluster for Oracle
• Observer mode when vendor time synchronisation s/w is found
  – Logs time difference to the CRS alert log
• Active mode when no vendor time sync s/w is found
![Page 32: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/32.jpg)
Troubleshooting Scenarios: Node Eviction Triage
• Cluster Ready Services Daemon
  – The CRSD daemon is primarily responsible for maintaining the availability of application resources, such as database instances. CRSD is responsible for starting and stopping these resources, relocating them when required to another node in the event of failure, and maintaining the resource profiles in the OCR (Oracle Cluster Registry). In addition, CRSD is responsible for overseeing the caching of the OCR for faster access, and also backing up the OCR.
  – Log file is GI_HOME/log/<node>/crsd/crsd.log
    • Rotation policy 10-50M
    • Retention policy 10 logs
    • Dynamic in 12.1 and can be changed
![Page 33: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/33.jpg)
Troubleshooting Scenarios: Node Eviction Triage
• CRSD oraagent
  – CRSD's oraagent manages
    • all database, instance, service and diskgroup resources
    • node listeners
    • SCAN listeners, and ONS
  – If the Grid Infrastructure owner is different from the RDBMS home owner, then you would have 2 oraagents, each running as one of the installation owners. The database and service resources would be managed by the RDBMS home owner and other resources by the Grid Infrastructure home owner.
  – Log file is
    • GI_HOME/log/<node>/agent/crsd/oraagent_<user>/oraagent_<user>.log
![Page 34: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/34.jpg)
Troubleshooting Scenarios: Node Eviction Triage
• CRSD orarootagent
  – CRSD's rootagent manages
    • GNS and its VIP
    • Node VIP
    • SCAN VIP
    • network resources
  – Log file is
    • GI_HOME/log/<node>/agent/crsd/orarootagent_root/orarootagent_root.log
![Page 35: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/35.jpg)
Troubleshooting Scenarios: Node Eviction Triage
• Agent return codes
  – Check entry must return one of the following return codes:
    • ONLINE
    • UNPLANNED_OFFLINE
      – Target=online, may be recovered or failed over
    • PLANNED_OFFLINE
    • UNKNOWN
      – Cannot determine; if previously online or partial, then monitor
    • PARTIAL
      – Some of a resource's services are available. Instance up but not open.
    • FAILED
      – Requires clean action
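The check return codes and the follow-up each one implies can be modelled as an enumeration. The sketch below is only a reading aid for this slide: the state names come from the slide, while the function and the follow-up labels it returns are hypothetical and are not values from the Clusterware agent framework.

```python
from enum import Enum


class CheckState(Enum):
    """Return codes a check entry point may report, as listed on this slide."""
    ONLINE = "ONLINE"
    UNPLANNED_OFFLINE = "UNPLANNED_OFFLINE"
    PLANNED_OFFLINE = "PLANNED_OFFLINE"
    UNKNOWN = "UNKNOWN"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"


def follow_up(state: CheckState, target_online: bool = True) -> str:
    """Return an illustrative label for what happens after a check result."""
    if state is CheckState.FAILED:
        return "clean"                # slide: FAILED requires the clean action
    if state is CheckState.UNPLANNED_OFFLINE and target_online:
        return "recover-or-failover"  # slide: may be recovered or failed over
    if state is CheckState.UNKNOWN:
        return "keep-monitoring"      # slide: cannot determine, then monitor
    return "none"


if __name__ == "__main__":
    for state in CheckState:
        print(f"{state.value}: {follow_up(state)}")
```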
![Page 36: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/36.jpg)
Troubleshooting Scenarios: Automatic Diagnostic Repository (ADR)
§ Important logs and traces
§ 11.2 – Databases only use ADR
  • Grid Infrastructure files in $GI_HOME/log/<node_name>/<component_name>
    – $GI_HOME/log/myHost/cssd
    – $GI_HOME/log/myHost/alertmyHost.log
§ 12.1 – Grid Infrastructure and Database use ADR
§ Different locations for Grid Infrastructure and Databases
§ Grid Infrastructure
  • Alert.log, cssd.log, crsd.log, etc.
§ Databases
  • Alert.log, background process traces, foreground process traces
![Page 37: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/37.jpg)
• What if issues were detected before they had an impact?
• What if you were notified with a specific diagnosis and corrective actions?
• What if resource bottlenecks threatening SLAs were identified early?
• What if bottlenecks could be automatically relieved just in time?
• What if database hangs and node reboots could be eliminated?
Oracle's Database and Clusterware Tools
[Diagram: Cluster Verification Utility, ORAchk, EXAchk, Cluster Health Monitor, Cluster Health Advisor, Trace File Analyzer, Quality of Service Management, Hang Manager, Memory Guard]
![Page 38: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/38.jpg)
Oracle EXAchk/ORAchk (Proactive)
Lightweight & non-intrusive Oracle Stack Health Checks
– Automated risk identification and proactive notification before business is impacted
– Health Checks based on the most impactful reoccurring problems across the Oracle customer base
– Runs in your environment; no need to send anything to Oracle
– Scheduled email Health Check reports
– Findings can be integrated into other tools of choice
[Diagram: Oracle EXAchk (Engineered Systems) and Oracle ORAchk (Non-Engineered Systems) built on a Common Framework]
![Page 39: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/39.jpg)
Roll Out & Maintain EXAchk
1. Included in the base image and the latest OEDA
2. Download the latest version from My Oracle Support (install < 1 min): 1070954.1
3. Auto update when a later version is available
![Page 40: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/40.jpg)
Installation
1. Download orachk.zip to your local machine from MOS Note 1268927.2
2. Transfer it to a directory on the target system
3. Unzip orachk.zip
   o As owner of the Oracle database or grid home
![Page 41: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/41.jpg)
Profiles (EXAchk)
• Profiles provide a logical grouping of checks which are about similar topics
• Run only checks in a specific profile
  ./exachk -profile <profile>
• Run everything except checks in a specific profile
  ./exachk -excludeprofile <profile>

| Profile | Description |
| --- | --- |
| asm | ASM Checks |
| avdf | Audit Vault Configuration checks |
| clusterware | Oracle clusterware checks |
| control_VM | Checks only for Control VM (ec1-vm, ovmm, db, pc1, pc2). No cross node checks |
| corroborate | Exadata checks that need further review by the user to determine pass or fail |
| dba | DBA Checks |
| ebs | Oracle E-Business Suite checks |
| eci_healthchecks | Enterprise Cloud Infrastructure Healthchecks |
| ecs_healthchecks | Enterprise Cloud System Healthchecks |
| goldengate | Oracle GoldenGate checks |
| hardware | Hardware specific checks for Oracle Engineered systems |
| maa | Maximum Availability Architecture Checks |
| ovn | Oracle Virtual Networking |
| platinum | Platinum certification checks |
| preinstall | Pre-installation checks |
| prepatch | Checks to execute before patching |
| security | Security checks |
| solaris_cluster | Solaris Cluster Checks |
| storage | Oracle Storage Server Checks |
| switch | Infiniband switch checks |
| sysadmin | Sysadmin checks |
| user_defined_checks | Run user defined checks from user_defined_checks.xml |
![Page 42: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/42.jpg)
Profiles
• Profiles provide a logical grouping of checks that cover similar topics
• Run only the checks in a specific profile:
./orachk -profile <profile>
• Run everything except the checks in a specific profile:
./orachk -excludeprofile <profile>

| Profile | Description |
|---|---|
| asm | ASM checks |
| bi_middleware | Oracle Business Intelligence checks |
| clusterware | Oracle Clusterware checks |
| dba | DBA checks |
| ebs | Oracle E-Business Suite checks |
| emagent | Cloud Control agent checks |
| emoms | Cloud Control management server |
| em | Cloud Control checks |
| goldengate | Oracle GoldenGate checks |
| hardware | Hardware-specific checks for Oracle Engineered Systems |
| oam | Oracle Access Manager checks |
| oim | Oracle Identity Manager checks |
| oud | Oracle Unified Directory server checks |
| ovn | Oracle Virtual Networking |
| peoplesoft | PeopleSoft best practices |
| preinstall | Pre-installation checks |
| prepatch | Checks to execute before patching |
| security | Security checks |
| siebel | Siebel checks |
| solaris_cluster | Solaris Cluster checks |
| storage | Oracle Storage Server checks |
| switch | InfiniBand switch checks |
| sysadmin | Sysadmin checks |
| user_defined_checks | Run user-defined checks from user_defined_checks.xml |
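The two profile options shown on the EXAchk and ORAchk slides can be wrapped in a small helper. This is a minimal sketch, not part of either tool: the function name `healthcheck_cmd` and its `only`/`except` keywords are hypothetical, and the helper only prints the command line (dry run) using the `-profile` and `-excludeprofile` options from the slides.

```shell
# Dry-run sketch: build a health-check command line for one profile.
# Only the -profile and -excludeprofile options from the slides are used.
healthcheck_cmd() {
    tool=${1:-}      # orachk or exachk
    mode=${2:-}      # "only" -> -profile, "except" -> -excludeprofile
    profile=${3:-}   # a profile name from the slides, e.g. dba

    case $tool in
        orachk|exachk) ;;
        *) echo "unknown tool: $tool" >&2; return 1 ;;
    esac
    case $mode in
        only)   opt=-profile ;;
        except) opt=-excludeprofile ;;
        *) echo "mode must be 'only' or 'except'" >&2; return 1 ;;
    esac
    if [ -z "$profile" ]; then
        echo "profile name required" >&2
        return 1
    fi

    echo "./$tool $opt $profile"
}

# Example: DBA checks only with ORAchk; everything but switch checks with EXAchk
healthcheck_cmd orachk only dba
healthcheck_cmd exachk except switch
```

The helper does not verify that a profile exists for the chosen tool (for example, `platinum` appears only in the EXAchk list), so check the name against the relevant slide first.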
![Page 43: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/43.jpg)
Enterprise Manager Integration
• Check results integrated into the EM compliance framework via plugin
• View results in native EM compliance dashboards
• Related checks grouped into compliance standards
• View targets checked, violations & average score
• Drill down into a compliance standard to see individual check results
• View breakdown by target
![Page 44: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/44.jpg)
Enterprise Manager Plugin Prerequisites
• Integration is via the Enterprise Manager ORAchk Healthchecks plugin, with the following support:

| Hardware Type | Supported By Plugin |
|---|---|
| Exadata (physical configuration only) | Yes |
| Exadata (virtual configuration) | No |
| Recovery Appliance | Yes |
| Exalogic (physical configuration) | Yes |
| Exalogic (virtualized configuration) | Yes |
| Oracle SuperCluster | No |
| Oracle Private Cloud Machine | No |

• The following prerequisites must be met before you can deploy the plug-in:
  o Verify that your Engineered Systems hardware and software are at the supported level as described in Supported Hardware and Software Versions
  o All Engineered System plug-ins should be deployed
  o InfiniBand switches and storage cells should be Enterprise Manager-managed targets for the respective engineered system
  o The Expect package should be installed on the hosts
![Page 45: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/45.jpg)
JSON Output to Integrate with Kibana, Elasticsearch etc.
![Page 46: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/46.jpg)
Oracle Health Check Collection Manager Dashboard
![Page 47: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/47.jpg)
Oracle Stack Coverage
• Engineered Systems
  • Oracle Exadata Database Machine
  • Oracle SuperCluster
  • Oracle Private Cloud Appliance
  • Oracle Database Appliance
  • Oracle Big Data Appliance
  • Oracle Exalogic Elastic Cloud
  • Oracle Exalytics In-Memory Machine
  • Oracle Zero Data Loss Recovery Appliance
  • Oracle ZFS Storage Appliance
• ASR
• Systems
  • Oracle Solaris
  • Cross stack checks
  • Solaris Cluster
  • OVN
• Oracle Database
  • Standalone Database
  • Grid Infrastructure & RAC
  • Maximum Availability Architecture (MAA) Scorecard
  • Upgrade Readiness Validation
  • GoldenGate
• Enterprise Manager Cloud Control
  • Repository
  • Agent
  • OMS
• Middleware
  • Application Continuity
  • Oracle Identity and Access Management Suite (Oracle IAM)
• E-Business Suite
  • Oracle Payables
  • Oracle Workflow
  • Oracle Purchasing
  • Oracle Order Management
  • Oracle Process Manufacturing
  • Oracle Receivables
  • Oracle Fixed Assets
  • Oracle HCM
  • Oracle CRM
  • Oracle Project Billing
• Siebel
  • Database best practices
• PeopleSoft
  • Database best practices
• SAP
  • Exadata best practices
![Page 48: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/48.jpg)
Cluster Health Monitor (CHM)
Generates Diagnostic Metrics View of Cluster and Databases
• Always on - Enabled by default
• Provides detailed OS resource metrics
• Assists node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping)
• New CSV output for ease of analysis
[Diagram: four osysmond processes, each supplying OS data, with ologgerd (master) and the 12c Grid Infrastructure Management Repository (GIMR)]
![Page 49: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/49.jpg)
Cluster Health Monitor (CHM)
oclumon CLI or Full Integration with EM Cloud Control
![Page 50: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/50.jpg)
Why TFA? (Proactive and reactive)
• Provides one interface for all diagnostic needs
• Collects data across the cluster and consolidates it in one place
• Collects all relevant diagnostic data at the time of the problem, with only what is needed to diagnose the problem
• Reduces the time required to obtain diagnostic data, which saves your business money
![Page 51: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/51.jpg)
Supported Platforms and Versions
• All major operating systems are supported
  – Linux (OEL, RedHat, SUSE, Itanium & zLinux)
  – Oracle Solaris (SPARC & x86-64)
  – AIX
  – HP-UX (Itanium & PA-RISC)
  – Windows
• All Oracle Database & Grid versions 10.2+ are supported
• You probably already have TFA installed, as it is included with:
  – Oracle Grid Infrastructure:
    • 11.2.0.4+
    • 12.1.0.2+
    • 12.2.0.1+
  – Oracle Database:
    • 12.2.0.1+
• Also available from Doc 1513912.2
![Page 52: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/52.jpg)
Monitoring By TFA & Automated Collections
A significant problem occurs in Oracle Grid Infrastructure & Database(s). TFA then works with the DBA(s)/SysAdmin(s) as follows:
1. Automatically detect the event
2. Collect & package relevant diagnostics
3. Notify the relevant DBA and/or SysAdmin by email
4. Upload the collection to Oracle Support for further help
![Page 53: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/53.jpg)
Collect
• Trim & collect all important log files updated in the past 12 hours:
tfactl diagcollect
  • Collections are stored in the repository directory
  • Change the diagcollect time frame with -since <n>h|d
• Collect a problem-specific Service Request Data Collection (SRDC):
tfactl diagcollect -srdc ora600
  • For the list of SRDC collection types, use tfactl diagcollect -srdc help
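The time-frame option on this slide takes the form `-since <n>h|d`. A minimal sketch of a wrapper that validates that form before building the command follows. The function name `diagcollect_cmd` is hypothetical and the helper only prints the command (dry run); the only tfactl syntax it uses is `tfactl diagcollect` and `-since` as shown on the slide.

```shell
# Dry-run sketch: build a tfactl diagcollect command with an optional time frame.
diagcollect_cmd() {
    since=${1:-}   # optional time frame such as 4h or 2d; empty keeps the default

    if [ -z "$since" ]; then
        echo "tfactl diagcollect"
        return 0
    fi
    # Accept only <digits>h or <digits>d, the form shown on the slide
    if ! printf '%s\n' "$since" | grep -Eq '^[0-9]+[hd]$'; then
        echo "invalid time frame: $since (expected <n>h or <n>d)" >&2
        return 1
    fi
    echo "tfactl diagcollect -since $since"
}

# Example: default collection, then the last 4 hours, then the last 2 days
diagcollect_cmd
diagcollect_cmd 4h
diagcollect_cmd 2d
```

Rejecting a malformed value up front avoids starting a cluster-wide collection with an unintended time window.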
![Page 54: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/54.jpg)
TFA dbglevel profiles
• Example
  – tfactl dbglevel -set node_eviction
  – would be used to enhance diagnostics when node evictions are being investigated, and would perform the following operations internally:
    • crsctl set log css "CSSD=4"
    • crsctl set log css "CSSDNMC=4"
    • crsctl set log css "CLSF=4"
    • crsctl set log css "CSSDGMCC=4"
    • crsctl set log css "CSSDGMPC=4"
• To revert to the original or default logging levels, the following command
  – tfactl dbglevel -unset node_eviction
  – would perform the following operations internally:
    • crsctl set log css "CSSD=2"
    • crsctl set log css "CSSDNMC=2"
    • crsctl set log css "CLSF=0"
    • crsctl set log css "CSSDGMCC=2"
    • crsctl set log css "CSSDGMPC=2"
• Setting the logging levels in this way provides a degree of automation and simplification
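To make the expansion on this slide concrete, the sketch below prints the `crsctl` commands that the slide says the `node_eviction` profile performs for `-set` and `-unset`. It is an illustration only, not TFA's implementation: the variable `NODE_EVICTION_LEVELS` and the function `node_eviction_plan` are hypothetical names, and the component levels are copied from the slide.

```shell
# Sketch of what the node_eviction profile expands to, per the slide.
# Each entry is component:debug_level:default_level.
NODE_EVICTION_LEVELS="CSSD:4:2 CSSDNMC:4:2 CLSF:4:0 CSSDGMCC:4:2 CSSDGMPC:4:2"

node_eviction_plan() {
    action=${1:-}   # "set" raises the levels, "unset" restores the defaults
    case $action in
        set|unset) ;;
        *) echo "usage: node_eviction_plan set|unset" >&2; return 1 ;;
    esac
    for entry in $NODE_EVICTION_LEVELS; do
        component=${entry%%:*}   # text before the first colon
        rest=${entry#*:}         # "debug:default"
        if [ "$action" = set ]; then
            level=${rest%%:*}
        else
            level=${rest#*:}
        fi
        # Print only; nothing is executed against the cluster
        echo "crsctl set log css \"$component=$level\""
    done
}

# Example: show both directions
node_eviction_plan set
node_eviction_plan unset
```

Keeping the debug and default levels side by side in one table is the design point: a single profile name then covers both raising and restoring five trace levels.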
![Page 55: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/55.jpg)
Log Management
• TFA will be the log management interface for the software stack
  – Rotate logs
  – Archive logs
  – Purge old logs
• Intelligent log management based on an understanding of what is in the logs and what is still important
[Diagram: TFA Log Management applied to all logs across the software stack, with Rotate, Archive, and Purge actions triggered through prediction or user input]
![Page 56: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/56.jpg)
Set Email Notification Addresses
• TFA can send email notification when significant problems are detected
• To set the notification email for any problem detected:
tfactl set notificationAddress=[email protected]
• To set the notification email for specific ORACLE_HOMEs, include the OS owner:
tfactl set notificationAddress=oracle:[email protected]
[Diagram: a significant problem occurs in Oracle Grid Infrastructure & Database(s); TFA then (1) automatically detects the event, (2) collects & packages relevant diagnostics, (3) notifies the relevant DBA and/or SysAdmin by email, and (4) uploads the collection to Oracle Support for further help]
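The two forms of the notification setting on this slide differ only in the optional OS-owner prefix. A minimal sketch of a helper that builds either form is shown below. The function name `notification_cmd` and the address `dba@example.com` are placeholders (the addresses on the slide were lost in extraction), and the helper prints the command rather than running it.

```shell
# Dry-run sketch: build a tfactl notificationAddress setting.
notification_cmd() {
    address=${1:-}   # email address to notify (placeholder in the example)
    owner=${2:-}     # optional OS owner of the ORACLE_HOME, e.g. oracle

    # Minimal sanity check: something@something
    case $address in
        ?*@?*) ;;
        *) echo "invalid email address: $address" >&2; return 1 ;;
    esac

    if [ -n "$owner" ]; then
        echo "tfactl set notificationAddress=$owner:$address"
    else
        echo "tfactl set notificationAddress=$address"
    fi
}

# Example: notify for any problem, then only for homes owned by "oracle"
notification_cmd dba@example.com
notification_cmd dba@example.com oracle
```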
![Page 57: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/57.jpg)
Analyze
• Analyze all important recent log entries:
tfactl analyze -since 1d
• Search recent log entries:
tfactl analyze -search "ora-00600" -since 8h
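The two analyze invocations on this slide share one shape: an optional search string plus a time window. The sketch below builds either form. The function name `analyze_cmd` is hypothetical, the helper accepts only the `h` and `d` units that appear on the slide, and it prints the command instead of running it.

```shell
# Dry-run sketch: build a tfactl analyze command line.
analyze_cmd() {
    since=${1:-}     # time window such as 1d or 8h (required here)
    pattern=${2:-}   # optional search string such as ora-00600

    if ! printf '%s\n' "$since" | grep -Eq '^[0-9]+[hd]$'; then
        echo "invalid time window: $since (expected <n>h or <n>d)" >&2
        return 1
    fi

    if [ -n "$pattern" ]; then
        echo "tfactl analyze -search \"$pattern\" -since $since"
    else
        echo "tfactl analyze -since $since"
    fi
}

# Example: the two commands from the slide
analyze_cmd 1d
analyze_cmd 8h ora-00600
```

The search string is emitted in plain ASCII double quotes; typographic quotes pasted from a slide would be passed to the tool as part of the pattern.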
![Page 58: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/58.jpg)
Analyze
• TFA includes all key database support tools
• tfactl provides a single interface to them all
Most of these support tools are only available in the My Oracle Support download; they are not included in the base Grid or Database install.

| Tool | Description | Details (MOS Doc ID) |
|---|---|---|
| ORAchk | Oracle stack health checks on non-engineered systems | 1268927.2 |
| EXAchk | Oracle stack health checks on Engineered Systems | 1070954.1 |
| oswatcher | Collect and archive OS metrics, useful for instance/node evictions & performance issues | 301137.1 |
| procwatcher | Automate & capture database performance diagnostics & session-level hangs | 459694.1 |
| oratop | Near real-time database monitoring | 1500864.1 |
| sqlt | Capture SQL trace data useful for tuning | 215187.1 |
| alertsummary | Provides a summary of events for one or more database or ASM alert files from all nodes | |
| ls | Lists all files TFA knows about for a given file name pattern across all nodes | |

| Tool | Description |
|---|---|
| pstack | Generate the process stack for specified processes across all nodes |
| grep | Search alert or trace files with a given database and file name pattern, for a search string |
| summary | High-level summary of the configuration |
| vi | Open alert or trace files for viewing a given database and file name pattern in the vi editor |
| tail | Run a tail on alert or trace files for a given database and file name pattern |
| param | Show all database and OS parameters that match a specified pattern |
| dbglevel | Set and unset multiple CRS trace levels with one command |
| history | Show the shell history for the tfactl shell |
| changes | Report any noted changes in the system setup over a given time period. This includes database parameters, OS parameters, patches applied etc. |
![Page 59: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/59.jpg)
Incident Based Collections with SRDC
• Use srdc <incident type>:
tfactl srdc ora4030

| Incident Type | Description |
|---|---|
| ora4030 | For ORA-04030 errors |
| ora4031 | For ORA-04031 errors |
| dbperf | For basic db performance problems |
| ora600 | For ORA-00600 errors |
| ora700 | For ORA-00700 errors |
| ora7445 | For ORA-07445 errors |

• To specify the SID, use -sid <oracle sid>
• To specify the database, use -db <dbname>
• To specify the incident date & time, use -inc_date <YYYY-MM-DD> -inc_time <HH:MM:SS>
• To upload directly to the SR, use -sr <SR#>
tfactl srdc ora4030 -sid orcl -db RDBMS121 \
  -inc_date 2016-06-15 -inc_time 02:48:23 \
  -sr 3-123456789
• For dbperf, use these parameters to specify the good & bad performance periods to compare:

| Parameter | Description |
|---|---|
| perf_base_sd | Start date for a good performance period |
| perf_base_st | Start time for a good performance period |
| perf_base_ed | End date for a good performance period |
| perf_base_et | End time for a good performance period |
| perf_comp_sd | Start date for a bad performance period |
| perf_comp_st | Start time for a bad performance period |
| perf_comp_ed | End date for a bad performance period |
| perf_comp_et | End time for a bad performance period |

tfactl srdc dbperf -db RDBMS121 \
  -perf_base_sd 2016-06-15 -perf_base_st 01:30:00 \
  -perf_base_ed 2016-06-15 -perf_base_et 02:00:00 \
  -perf_comp_sd 2016-06-16 -perf_comp_st 09:30:00 \
  -perf_comp_ed 2016-06-16 -perf_comp_et 10:00:00
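The dbperf collection takes eight date and time parameters, which are easy to mix up on the command line. The sketch below assembles them from nine positional arguments in a fixed order. The function name `dbperf_srdc_cmd` is hypothetical, the parameter names are the ones listed on this slide, and the helper prints the command rather than running it. It checks only the argument count, not the date or time formats.

```shell
# Dry-run sketch: assemble a tfactl srdc dbperf command.
# Arguments: db, then good-period start date/time and end date/time,
# then bad-period start date/time and end date/time.
dbperf_srdc_cmd() {
    if [ "$#" -ne 9 ]; then
        echo "usage: dbperf_srdc_cmd <db> <base_sd> <base_st> <base_ed> <base_et> <comp_sd> <comp_st> <comp_ed> <comp_et>" >&2
        return 1
    fi
    # echo joins its arguments with single spaces
    echo "tfactl srdc dbperf -db $1" \
         "-perf_base_sd $2 -perf_base_st $3" \
         "-perf_base_ed $4 -perf_base_et $5" \
         "-perf_comp_sd $6 -perf_comp_st $7" \
         "-perf_comp_ed $8 -perf_comp_et $9"
}

# Example: the good and bad periods used on the slide
dbperf_srdc_cmd RDBMS121 \
    2016-06-15 01:30:00 2016-06-15 02:00:00 \
    2016-06-16 09:30:00 2016-06-16 10:00:00
```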
![Page 60: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/60.jpg)
Cluster Health Advisor (CHA)*
Discovers Potential Cluster & DB Problems - Notifies with Corrective Actions
• Always on - Enabled by default
• Detects node and database performance problems
• Provides early-warning alerts and corrective action
• Supports on-site calibration to improve sensitivity
• Integrated into EMCC Incident Manager and notifications
• Standalone interactive GUI tool
[Diagram: ochad with inputs OS Data (CHM) and DB Data, a Node Health Prognostics Engine, a Database Health Prognostics Engine, and the GIMR]
*Requires and included with a RAC or R1N license
![Page 61: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/61.jpg)
Oracle 12c Hang Manager
Autonomously Preserves Database Availability and Performance
• Always on - Enabled by default
• Reliably detects database hangs and deadlocks
• Autonomously resolves them
• Supports QoS Performance Classes, Ranks and Policies to maintain SLAs
• Logs all detections and resolutions
• New SQL interface to configure sensitivity (Normal/High) and trace file sizes
[Diagram: DIA0 hang-resolution flow. Labels: Session, EVALUATE, DETECT, ANALYZE, "Hung?", VERIFY, Victim, QoS Policy]
![Page 62: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/62.jpg)
Oracle 12c Hang Manager
Full Resolution Dump Trace File and DB Alert Log Audit Reports

Dump file …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
Oracle Database 12c Enterprise Edition Release 12.2.0.0.0 - 64bit Beta
With the Partitioning, Real Application Clusters, OLAP, Advanced Analytics
and Real Application Testing options
Build label: RDBMS_MAIN_LINUX.X64_151013
ORACLE_HOME: …/3775268204/oracle
System name: Linux
Node name: slc05kyr
Release: 2.6.39-400.211.1.el6uek.x86_64
Version: #1 SMP Fri Nov 15 13:39:16 PST 2013
Machine: x86_64
VM name: Xen Version: 3.4 (PVM)
Instance name: hm62
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 12656, image: oracle@slc05kyr (DIA0)

*** 2015-10-13T16:47:59.541509+17:00
*** SESSION ID:(96.41299) 2015-10-13T16:47:59.541519+17:00
*** CLIENT ID:() 2015-10-13T16:47:59.541529+17:00
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-13T16:47:59.541538+17:00
*** MODULE NAME:() 2015-10-13T16:47:59.541547+17:00
*** ACTION NAME:() 2015-10-13T16:47:59.541556+17:00
*** CLIENT DRIVER:() 2015-10-13T16:47:59.541565+17:00

2015-10-13T16:47:59.435039+17:00
Errors in file /oracle/log/diag/rdbms/hm6/hm6/trace/hm6_dia0_12433.trc (incident=7353):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm6/incident/incdir_7353/hm6_dia0_12433_i7353.trc
2015-10-13T16:47:59.506775+17:00
DIA0 requesting termination of session sid:40 with serial # 43179 (ospid:13031) on instance 2
     due to a GLOBAL, HIGH confidence hang with ID=1.
     Hang Resolution Reason: Automatic hang resolution was performed to free a
     significant number of affected sessions.
DIA0: Examine the alert log on instance 2 for session termination status of hang with ID=1.

In the alert log on the instance local to the session (instance 2 in this case), we see the following:

2015-10-13T16:47:59.538673+17:00
Errors in file …/diag/rdbms/hm6/hm62/trace/hm62_dia0_12656.trc (incident=5753):
ORA-32701: Possible hangs up to hang ID=1 detected
Incident details in: …/diag/rdbms/hm6/hm62/incident/incdir_5753/hm62_dia0_12656_i5753.trc
2015-10-13T16:48:04.222661+17:00
DIA0 terminating blocker (ospid: 13031 sid: 40 ser#: 43179) of hang with ID = 1
     requested by master DIA0 process on instance 1
     Hang Resolution Reason: Automatic hang resolution was performed to free a
     significant number of affected sessions.
     by terminating session sid:40 with serial # 43179 (ospid:13031)
![Page 63: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/63.jpg)
Oracle 12c Domain Services Cluster (DSC)
Deploys with Minimum Footprint and Maximum Manageability
• Hosts framework as services
• Reduces local resource footprint
• Centralizes management
• Speeds deployment and patching
• Optional shared storage
• Supports multiple versions and platforms going forward
[Diagram: an Oracle Cluster Domain in which Database Member Clusters and Application Member Clusters surround the Oracle Domain Services Cluster, labeled with these services:]
• Management Repository Service
• Trace File Analyzer Receiver
• ORAchk Collection Service
• Grid Names Service
• Storage Services
• Rapid Home Provisioning Service
![Page 64: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/64.jpg)
Oracle Cluster Domain
[Diagram: an Oracle Cluster Domain joined by a private network, with SAN and NAS storage. Labels:]
• Oracle Domain Services Cluster
  • Mgmt Repository (GIMR) Service
  • ASM Service
  • IO Service
  • ACFS Services
  • Shared ASM
  • Additional Optional Services
  • Rapid Home Provisioning (RHP) Service
• Database Member Cluster: uses ASM Service
• Database Member Cluster: uses IO & ASM Service of DSC
• Database Member Cluster: uses local ASM
• Application Member Cluster: GI only
![Page 65: RAC Troubleshooting and Diagnosability Sangam2016](https://reader031.fdocuments.net/reader031/viewer/2022020113/5886f1141a28abba528b6bc3/html5/thumbnails/65.jpg)