E.S. 3par Basic Training
Transcript of E.S. 3par Basic Training
August 2021
by Carlo Curiale, SME
OVERVIEW
This is an introductory training course on HP 3PAR systems currently supported by Park Place Technologies. By the end of this course, you should be able to:
• Identify the different systems
• Understand which logs to request
• Read and interpret hard drive logs
• Read alerts generated by REM and ParkView
• Determine correct hard drive part numbers
TABLE OF CONTENTS
• What is a 3par?
• Terminology of Important Hardware Parts and Components
• Terminology of Important Logical Components
• 3PAR Model Naming Conventions
• Replacing 3PAR Components
• Basic Logs to Request
• Hard Drive Logs to Request
• How to Read Log Output
• How to Interpret 3par Alerts
• How to Interpret ParkView Alerts
• How to Determine Drive Part Number
• Action Plan Example
WHAT IS A 3PAR?
A 3par is a disk-based enterprise-class storage array built around a controller/shelf design. Built with the ability to customize the level of redundancy desired by the end user, 3par arrays are highly resilient, scalable, and are built to withstand multiple component failures. The 3par brand was purchased by HP in 2010 and has since been developed into several different 3par product tiers.
[Figure: 3par 7000/8000 series and 10000 (V-Class) systems]
TERMINOLOGY OF IMPORTANT HARDWARE PARTS AND COMPONENTS
Disk
A disk, or drive, is the individual physical device where data is stored on the array. Backline will typically refer to the physical location of a disk by its cage position, notated as <cage>:<magazine>:<slot> (ex. – 3:4:2)
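Because the cage position always has three numeric fields, it can be split mechanically. The Python sketch below uses a hypothetical `parse_cage_position` helper (not part of any 3PAR tool) to turn a string such as 3:4:2 into its cage, magazine, and slot numbers, rejecting anything that does not have exactly three numeric fields.

```python
from typing import NamedTuple


class CagePosition(NamedTuple):
    """A disk location in <cage>:<magazine>:<slot> notation."""
    cage: int
    magazine: int
    slot: int


def parse_cage_position(text: str) -> CagePosition:
    """Split a cage position such as "3:4:2" into its three numeric fields."""
    parts = text.strip().split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a <cage>:<magazine>:<slot> position: {text!r}")
    cage, magazine, slot = (int(p) for p in parts)
    return CagePosition(cage, magazine, slot)


pos = parse_cage_position("3:4:2")
print(f"Cage {pos.cage}, Magazine {pos.magazine}, Slot {pos.slot}")
# prints "Cage 3, Magazine 4, Slot 2"
```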
Magazine
Equivalent to a drive “sled”, a disk magazine holds disks within the drive cage. Depending on the 3Par model, magazines hold either one or four disks.
Cage
Equivalent to a disk “shelf”, a cage contains the magazines that contain the drives for the array.
Node
Equivalent to a “controller”, nodes run the operating system that controls and manages the array. The nodes also serve as the interface between the storage array and any hosts that attach to it, and they act as an interface between enclosures. A basic 3Par array contains at least two redundant nodes but can contain up to eight, depending on the model.
Service Processor (SP)
A service processor (abbreviated as SP) is a 1U server that sits in the same rack as the 3Par. Customers can opt to use a virtual SP instead of a physical SP. The SP is responsible for monitoring the array and sending alerts and notifications for it, as well as providing a route for CLI access to the array itself. The SP is standalone: it is not required for normal array management, and its failure has no impact on array I/O.
[Figure: drive cage and magazine layouts for T-Class and 10000 (V-Class), 7000/8000 LFF, 7000/8000 SFF, and F-Class systems; labels: Cage, Magazine, Drives, positions 0 1 2 3]
TERMINOLOGY OF IMPORTANT LOGICAL COMPONENTS
Physical Disk (PD) ID
A physical disk ID (PD ID) is a logical ID number that is dynamically assigned to each unique disk that has been installed in a 3Par. A PD ID number does not indicate any physical location of a disk within an array, unlike the cage position.
Chunklets and Logical Disks (LDs)
A chunklet is the smallest measurable unit in which data is stored on a 3Par. A single chunklet represents either 256MB or 1GB of actual data, depending on the 3Par model. Chunklets are stored and dynamically relocated across all PDs in the array; groups of chunklets make up logical disks (LDs), which in turn make up virtual volumes.
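As a rough illustration of what the two chunklet sizes mean, the hypothetical helper below divides a drive's raw capacity by the chunklet size. It ignores formatting overhead and any space the system reserves, so it gives an upper bound rather than the count a real array reports; the 614,400 MB drive is an assumed example, not a value from a 3PAR log.

```python
def approx_chunklets(drive_capacity_mb: int, chunklet_mb: int) -> int:
    """Rough upper bound on the number of whole chunklets a drive can hold.

    Ignores formatting overhead and reserved space, so real counts are lower.
    """
    if chunklet_mb not in (256, 1024):
        raise ValueError("3PAR chunklets are either 256 MB or 1 GB (1024 MB)")
    return drive_capacity_mb // chunklet_mb


# A hypothetical drive with 614,400 MB (600 x 1024 MB) of raw capacity:
print(approx_chunklets(614_400, 256))   # 2400 chunklets on a 256 MB-chunklet model
print(approx_chunklets(614_400, 1024))  # 600 chunklets on a 1 GB-chunklet model
```

The same drive holds four times as many chunklets on a 256MB-chunklet model as on a 1GB-chunklet model.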
Virtual Volumes (VVs) and Common Provisioning Groups (CPGs)
Virtual Volumes (or VVs) are the larger blocks of storage that are connected to and used by hosts. Virtual volumes are stored within common provisioning groups (or CPGs), which are the equivalent of storage “pools” of the physical disk space that stores data.
3PAR MODEL NAMING CONVENTIONS
The model number of the 3PAR StoreServ system can give an idea of the configuration.
The first digit is the Series:
F-Class, T-Class, 10000(V-Class), 7000, 8000, 20000
Note: All 20000 series systems will be handled by Backline AEG, as it is a new product.
The second digit is the number of Nodes:
2-8, depending on the Series
(Note: A 4 or 8 Node system can be initially configured with fewer Nodes for future expansion, so a 7400 can possibly contain 2 Nodes instead of 4.)
The third digit specifies the type of Drives:
0 all spinning, 4 mix of SSD and spinning, 5 all SSD
(Note: This is used more commonly on 7000/8000 and newer, not as much on F, T, and V-Class.)
The fourth digit is always 0
Note: 7000 series systems have Converged versions, indicated with a “c” at the end of the model number.
Examples:
8450 (Model: 8000, Nodes: 4, Drive type: all SSD drives)
7440c (Model: 7000, Nodes: 4, Drive type: mixed, Converged)
T800 (Model: T-Class, Nodes: 8, Drive type: all spinning drives)
V800 / 10800 (Model: 10000 (V-Class), Nodes: 8, Drive type: all spinning drives)
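The digit rules can be applied positionally. The hypothetical `decode_model` helper below is a sketch that assumes F-, T-, and V-Class names follow the same letter-plus-three-digits pattern as the T800 and V800 examples, and that 10000 and 20000 models use five characters. It reports the node digit as a maximum, because a system can be configured with fewer nodes than its model number suggests.

```python
from dataclasses import dataclass

# Series codes from the naming conventions above.
SINGLE_CHAR_SERIES = {"F": "F-Class", "T": "T-Class", "V": "10000 (V-Class)",
                      "7": "7000", "8": "8000"}
TWO_DIGIT_SERIES = {"10": "10000 (V-Class)", "20": "20000"}
DRIVE_TYPES = {"0": "all spinning", "4": "mix of SSD and spinning", "5": "all SSD"}


@dataclass(frozen=True)
class ModelInfo:
    series: str
    max_nodes: int      # installed nodes may be fewer (e.g. a 2-node 7400)
    drive_type: str
    converged: bool


def decode_model(model: str) -> ModelInfo:
    """Decode a 3PAR model name such as "7440c", "8450", "T800" or "10800"."""
    name = model.strip().upper()
    converged = name.endswith("C")
    if converged:
        name = name[:-1]
    if len(name) == 5 and name[:2] in TWO_DIGIT_SERIES:
        series, rest = TWO_DIGIT_SERIES[name[:2]], name[2:]
    elif len(name) == 4 and name[0] in SINGLE_CHAR_SERIES:
        series, rest = SINGLE_CHAR_SERIES[name[0]], name[1:]
    else:
        raise ValueError(f"unrecognized 3PAR model name: {model!r}")
    nodes, drives, last = rest  # the three remaining characters
    if not nodes.isdigit() or last != "0":
        raise ValueError(f"unexpected node or trailing digit in {model!r}")
    return ModelInfo(series, int(nodes), DRIVE_TYPES.get(drives, "unknown"), converged)


for example in ("8450", "7440c", "T800", "10800"):
    print(example, decode_model(example))
```

Any model that decodes to the 20000 series still goes to Backline AEG.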
REPLACING 3PAR COMPONENTS
There are few hot-swappable parts on a 3PAR system. Most parts are hot pluggable and require a user to log into the Service Processor to run pre-checks and prepare the system for service through the CLI. The commands differ depending on the type of system being serviced. Post-checks are also done to ensure the service was completed properly. This is especially true when replacing hard drives.
F, T, and (10000)V-Class drives are hot pluggable. The T-Class and (10000)V-Class drive cages contain magazines with 4 drives in each magazine. Before a magazine can be removed to replace a failed drive, a “servicemag” command must be run to prepare it.
This involves vacating the data on the magazine's drives into spare space. Once the failed drive is replaced and the “servicemag” is resumed, the system relocates the data back to the magazine's 4 (now healthy) drives. There are no spare drives, just spare space dedicated on each drive.
***Best practice is to prepare and replace no more than 2 drives per visit.
7000 and 8000 series systems use hot-swappable drives, and multiple drives can be replaced once the required checks confirm that the drives are ready for replacement.
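The replacement rules above can be summarized as a small decision function. This is a sketch with a hypothetical `replacement_notes` helper; it assumes the two-drive-per-visit limit applies to the servicemag-based F, T, and 10000 (V-Class) systems, and it borrows the multiple-failure rule from the T/V-Class Action Plan template in this course.

```python
from typing import List

SERVICEMAG_SERIES = {"F-Class", "T-Class", "10000 (V-Class)"}
AUTO_SERVICEMAG_SERIES = {"7000", "8000"}
MAX_DRIVES_PER_VISIT = 2  # best-practice limit stated in this course


def replacement_notes(series: str, failed_drives: int) -> List[str]:
    """Summarize the drive-replacement rules for one system (assumed rules)."""
    if failed_drives < 1:
        raise ValueError("failed_drives must be at least 1")
    if series in SERVICEMAG_SERIES:
        notes = ["Hot pluggable: run servicemag to vacate the magazine before removing it."]
        if failed_drives > 1:
            notes.append("Multiple failures: contact PPT 3Par support before running any servicemags.")
        if failed_drives > MAX_DRIVES_PER_VISIT:
            visits = -(-failed_drives // MAX_DRIVES_PER_VISIT)  # ceiling division
            notes.append(f"Limit {MAX_DRIVES_PER_VISIT} drives per visit: plan {visits} visits.")
        return notes
    if series in AUTO_SERVICEMAG_SERIES:
        return ["Hot swappable: confirm the automatic servicemag with 'servicemag status'."]
    return ["Series not covered here: engage Backline AEG."]


for note in replacement_notes("T-Class", 3):
    print(note)
```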
BASIC LOGS TO REQUEST
Zendesk Macro Logs
Basic System Info
showsys: Displays basic system info such as system type and serial number
showversion -a -b: Displays system OS levels
showinventory: Displays system installed components
showalert -n: Shows new system alerts
checkhealth: Displays basic health information of the system
Batteries
showbattery: Displays basic battery info for all installed batteries
shownodeenv: Displays system environmental conditions
BASIC LOGS TO REQUEST CONT.
Cage IO Modules or Degraded Cages
showcage -i: Displays cage installed components
showcage -d: Displays detailed cage information
showalert -n: Displays new system alerts
Failed Node
shownode: Displays node status
showeeprom: Displays last known boot status
showeeprom -dead <node#>: Displays last known boot status of failed node
showalert -all: Displays all system alerts
BASIC LOGS TO REQUEST CONT.
Power Supply
shownode -ps (for node PSU issues): Displays node power supplies
showcage -d cage<#> (for cage PSU issues): Displays cage power supplies
showcage -d: Request if the cage# is not known
InSplore
This will only be requested by Backline AEG if needed; the file is very large.
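The command lists above map naturally onto a lookup table. The sketch below copies those commands into a dictionary keyed by hypothetical issue names (`basic`, `battery`, `cage`, `node`, `node_psu`, `cage_psu` are labels invented here) and fills in node or cage numbers when they are known, falling back to plain `showcage -d` when the cage number is not known.

```python
from typing import Dict, List, Optional

# Command lists copied from the Zendesk macro tables above; keys are hypothetical.
LOGS_BY_ISSUE: Dict[str, List[str]] = {
    "basic": ["showsys", "showversion -a -b", "showinventory", "showalert -n", "checkhealth"],
    "battery": ["showbattery", "shownodeenv"],
    "cage": ["showcage -i", "showcage -d", "showalert -n"],
    "node": ["shownode", "showeeprom", "showeeprom -dead <node#>", "showalert -all"],
    "node_psu": ["shownode -ps"],
    "cage_psu": ["showcage -d cage<#>"],
}


def logs_to_request(issue: str, node: Optional[int] = None, cage: Optional[int] = None) -> List[str]:
    """Return the commands to request for an issue, filling in known IDs."""
    commands = []
    for cmd in LOGS_BY_ISSUE[issue]:
        if node is not None:
            cmd = cmd.replace("<node#>", str(node))
        if "cage<#>" in cmd:
            # Fall back to plain "showcage -d" when the cage number is not known.
            cmd = cmd.replace(" cage<#>", f" cage{cage}" if cage is not None else "")
        commands.append(cmd)
    return commands


print(logs_to_request("node", node=1))
print(logs_to_request("cage_psu"))
```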
HARD DRIVE LOGS TO REQUEST
Engineering Support’s primary focus is gathering and reading hard drive logs to create Action Plans. All other issues go to Backline AEG once logs are received. Drives must be in a Failed status, not Degraded, to be replaced.
The required Zendesk Macro logs to request for hard drive failures are as follows:
Disk Failure
showsys
showpd -failed -degraded -i
servicemag status
showversion
showcage
showsys: Provides system information (serial number, system name, system type)
showpd -failed -degraded -i: Shows the currently failed and degraded drives, as well as the model number needed to determine the drive part number. The drive must be in a Failed status for replacement.
servicemag status: Shows whether the drive has been prepped and is ready for replacement. For 7000/8000 series, the system almost always runs the servicemag automatically to prep the failed drive for replacement.
HARD DRIVE LOGS TO REQUEST CONT.
showversion: Shows the current OS version in order to determine alternate part numbers if necessary. Always engage Backline if an alternate part is requested.
showcage: Used to determine whether the cage is LFF (Large Form Factor) or SFF (Small Form Factor). Only required for 7000 and 8000 systems.
HOW TO READ LOG OUTPUT
The following is a breakdown of the information once logs are received.
showsys
cli% showsys
--------------(MB)----------------
ID ------Name------ ----Model---- -Serial- Nodes Master ClusterLED TotalCap AllocCap FreeCap FailedCap
25148 ParkPlaceMRO7400 HPE_3PAR 7400 1625148 2 0 Off 1671168 1154048 517120 0
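Because the Model column can contain a space ("HPE_3PAR 7400"), a showsys row is easiest to parse from both ends. The hypothetical `parse_showsys_row` helper below assumes the exact column layout shown in this example, including the ClusterLED column (which may not appear on every OS version), and a Name with no spaces.

```python
from typing import NamedTuple


class ShowsysRow(NamedTuple):
    system_id: int
    name: str
    model: str
    serial: str
    nodes: int
    master: int
    cluster_led: str
    total_cap_mb: int
    alloc_cap_mb: int
    free_cap_mb: int
    failed_cap_mb: int


def parse_showsys_row(line: str) -> ShowsysRow:
    """Parse a showsys data row, reading fixed columns from both ends.

    Everything between Name and the last eight columns is treated as the model.
    """
    f = line.split()
    if len(f) < 11:
        raise ValueError(f"unexpected showsys row: {line!r}")
    return ShowsysRow(
        system_id=int(f[0]), name=f[1], model=" ".join(f[2:-8]), serial=f[-8],
        nodes=int(f[-7]), master=int(f[-6]), cluster_led=f[-5],
        total_cap_mb=int(f[-4]), alloc_cap_mb=int(f[-3]),
        free_cap_mb=int(f[-2]), failed_cap_mb=int(f[-1]),
    )


row = parse_showsys_row("25148 ParkPlaceMRO7400 HPE_3PAR 7400 1625148 2 0 Off 1671168 1154048 517120 0")
print(row.model, row.serial, row.nodes)  # HPE_3PAR 7400 1625148 2
```

In this example, Nodes is 2 even though the model is a 7400, which matches the naming-convention note that a 4-node model can be configured with fewer nodes. AllocCap plus FreeCap equals TotalCap (1154048 + 517120 = 1671168 MB).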
showpd -failed -degraded -i
cli% showpd -failed -degraded -i
Id CagePos State ----Node_WWN---- --MFR-- -----Model------ -Serial- -FW_Rev- Protocol MediaType -----AdmissionTime-----
47 2:1:3 failed 2000B452539B56AA SEAGATE SEGLE0600GBFC15K 6SL5L89E 3P01 FC Magnetic 2013-07-08 10:49:03 MST
Here we can see that PDID 47 has failed. The drive must be in a failed status for replacement.
The location is shown under CagePos: 2:1:3 (Cage 2, Magazine 1, Drive 3). If there is a “?” next to the failed drive, contact Backline.
Note: For 7000/8000 series systems, the drive (slot) field of the location will always be 0, because 3par considers the magazine location to be the drive.
The model of the drive is SEGLE0600GBFC15K; this is what is used to determine the drive part number needed.
The serial number of the drive is 6SL5L89E.
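The checks just described (a “?” means contact Backline, only a failed drive is replaceable, then note the location, model, and serial) can be sketched as follows. The `parse_showpd_i_row` and `triage` helpers are hypothetical; the parser assumes the column order of the example above and that AdmissionTime is the only column containing spaces.

```python
from typing import NamedTuple


class PhysicalDisk(NamedTuple):
    pd_id: int
    cage_pos: str
    state: str
    model: str
    serial: str


def parse_showpd_i_row(line: str) -> PhysicalDisk:
    """Parse one data row of `showpd -failed -degraded -i` output."""
    # AdmissionTime, the last column, contains spaces, so stop after 10 splits.
    fields = line.split(maxsplit=10)
    if len(fields) != 11:
        raise ValueError(f"unexpected showpd -i row: {line!r}")
    return PhysicalDisk(int(fields[0]), fields[1], fields[2], fields[5], fields[6])


def triage(line: str) -> str:
    """Apply the replacement checks to one showpd row."""
    if "?" in line:
        return "Contact Backline: '?' shown next to the drive"
    pd = parse_showpd_i_row(line)
    if pd.state != "failed":
        return f"PD {pd.pd_id} is {pd.state}, not failed: not ready for replacement"
    return f"PD {pd.pd_id} at {pd.cage_pos} failed: look up model {pd.model} (SN {pd.serial})"


ROW = "47 2:1:3 failed 2000B452539B56AA SEAGATE SEGLE0600GBFC15K 6SL5L89E 3P01 FC Magnetic 2013-07-08 10:49:03 MST"
print(triage(ROW))
```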
servicemag status
cli% servicemag status
No servicemag operations logged
This drive has not been vacated and is NOT ready for replacement.
HOW TO READ LOG OUTPUT CONT.
cli% servicemag status
Cage 14, magazine 0:
The magazine was successfully brought offline by a servicemag start command.
The command completed at Tue Jul 6 16:02:44 2021.
servicemag start -wait -pdid 272 -- Succeeded
On most F-Class, T-Class, and 10000(V-Class) systems, the servicemag will not be run until the customer initiates it, or, in the few cases where AEG has remote access to the system, until AEG runs it. The line servicemag start -wait -pdid <pdid#> -- Succeeded means the drive is vacated and ready to replace.
For 7000/8000 systems, the servicemag will be run automatically when a drive failure occurs.
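The readiness check comes down to finding the "servicemag start ... -pdid <pdid#> -- Succeeded" line. The sketch below uses a hypothetical `vacated_pdids` helper built on Python's `re` module. It assumes that line appears on a single line of output, as in the example above; any other status (no operations logged, in progress, failed) yields no PD IDs and should be treated as not ready.

```python
import re
from typing import List

# One line such as "servicemag start -wait -pdid 272 -- Succeeded".
_START_SUCCEEDED = re.compile(r"servicemag start\b.*?-pdid\s+(\d+)\s+--\s+Succeeded")


def vacated_pdids(status_output: str) -> List[int]:
    """Return PD IDs whose `servicemag start` is reported as Succeeded."""
    return [int(pdid) for pdid in _START_SUCCEEDED.findall(status_output)]


NOT_STARTED = "cli% servicemag status\nNo servicemag operations logged\n"
COMPLETED = (
    "cli% servicemag status\n"
    "Cage 14, magazine 0:\n"
    "The magazine was successfully brought offline by a servicemag start command.\n"
    "The command completed at Tue Jul 6 16:02:44 2021.\n"
    "servicemag start -wait -pdid 272 -- Succeeded\n"
)
print(vacated_pdids(NOT_STARTED))  # []
print(vacated_pdids(COMPLETED))    # [272]
```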
showversion
cli% showversion
Release version 3.2.2 (MU6)
Patches: P99,P107,P119,P131,P135,P139,P149,P154,P160,P162,P165,P167
Component Name Version
CLI Server 3.2.2 (P165)
CLI Client 3.2.2
System Manager 3.2.2 (P165)
Kernel 3.2.2 (MU6)
TPD Kernel Code 3.2.2 (MU6)
TPD Kernel Patch 3.2.2 (P165)
This is used only when an alternate drive part number is requested; such cases must be moved to Backline to determine compatibility.
HOW TO READ LOG OUTPUT CONT.
showcage
cli% showcage
Id Name LoopA Pos.A LoopB Pos.B Drives Temp RevA RevB Model FormFactor
0 cage0 1:0:1 0 0:0:1 0 4 26-27 4082 4082 DCN1 SFF
Use the number of the cage containing the failed disk, taken from the showpd -failed -degraded -i output, to determine whether the drive needed is LFF (Large Form Factor) or SFF (Small Form Factor).
Example: The drive in 0:3:0 (cage 0, mag 3, slot 0) has failed, so we search the showcage output for the form factor of cage 0. Remember, the drive slot will always be 0 in 7000 and 8000 series systems.
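The lookup from cage number to form factor can be sketched with a hypothetical `cage_form_factors` helper. It finds the header line that starts with Id, locates the FormFactor column by name, and assumes every column value is a single whitespace-free token, as in the example above.

```python
from typing import Dict


def cage_form_factors(showcage_output: str) -> Dict[int, str]:
    """Map cage ID -> FormFactor column from `showcage` output."""
    rows = [line.split() for line in showcage_output.splitlines() if line.strip()]
    try:
        header_at = next(i for i, cols in enumerate(rows) if cols[0] == "Id")
    except StopIteration:
        raise ValueError("no showcage header line found") from None
    header = rows[header_at]
    id_col, ff_col = header.index("Id"), header.index("FormFactor")
    result = {}
    for cols in rows[header_at + 1:]:
        # Keep only rows with one value per header column and a numeric cage ID.
        if len(cols) == len(header) and cols[id_col].isdigit():
            result[int(cols[id_col])] = cols[ff_col]
    return result


SAMPLE = """\
cli% showcage
Id Name LoopA Pos.A LoopB Pos.B Drives Temp RevA RevB Model FormFactor
0 cage0 1:0:1 0 0:0:1 0 4 26-27 4082 4082 DCN1 SFF
"""
failed_cage = int("0:3:0".split(":")[0])  # cage number from the failed drive's CagePos
print(cage_form_factors(SAMPLE)[failed_cage])  # SFF
```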
HOW TO INTERPRET 3PAR ALERTS
The alert shown was generated by the 3par system. It only provides enough information to determine when the alert was generated and the physical location of the failed drive.
It does not provide enough information to determine the drive type or whether the drive is ready to be replaced.
Therefore, the hard drive logs provided by the customer, or a ParkView alert, are needed to determine the drive part number and the state of the drive.
Alert example
1203986: 3PAR-1203986-V:(Major)PD213|comp_sw_cage_sled285885908123649 COMP_STA
Customer notification from HP SP12927 Realtime Alert Process
Notification id: P21537
Notify time: 2020/03/29 01:04:40.00 (User, -0600 MDT)
Installed machine: 3PAR INSERV 1203986
Site: 1, Customer
Event urgency: alert
Event count: 1
Event location: Site
Event time: 2020/03/29 00:04:40.00 (-0500 CDT)
Event description: 3PAR INSERV Component state change
Abstract:(Major)PD213|comp_sw_cage_sled285885908123649 COMP_STATE failed. Physical Disk v
Text:
Event id: 86388062 Node 1 Cust Alert - Yes, Svc Alert - Yes Severity: Major
Event time: Sun Mar 29 00:04:40 2020
Event type: Component state change
Alert ID: 998 MsgID: 600fa
Component: Physical Disk 119 Magazine 285885908123649
Short Dsc: Magazine 3:0:3, Physical Disk 119 Failed
Event String: Magazine 3:0:3, Physical Disk 119 Failed (Vacated {0x45}, InvalidMedia {0x98}, Failed Hardware {0x99})
TPD level for InServ 1203986 is 3.1.2.592
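What this alert does give (the failed drive's position and PD number, plus the TPD level) can be pulled out with two regular expressions. The hypothetical `locate_failed_disk` helper below assumes the "Magazine <pos>, Physical Disk <id> Failed" wording shown in the Short Dsc and Event String lines. There is no drive model to extract, which is why logs or a ParkView alert are still needed.

```python
import re
from typing import NamedTuple, Optional


class AlertLocation(NamedTuple):
    cage_pos: str
    pd_id: int
    tpd_level: Optional[str]


def locate_failed_disk(alert_text: str) -> AlertLocation:
    """Pull the failed drive's position and PD ID from a 3PAR alert email."""
    m = re.search(r"Magazine (\d+:\d+:\d+), Physical Disk (\d+) Failed", alert_text)
    if m is None:
        raise ValueError("no 'Magazine <pos>, Physical Disk <id> Failed' text found")
    tpd = re.search(r"TPD level for InServ \S+ is (\S+)", alert_text)
    return AlertLocation(m.group(1), int(m.group(2)), tpd.group(1) if tpd else None)


SAMPLE = (
    "Short Dsc: Magazine 3:0:3, Physical Disk 119 Failed\n"
    "TPD level for InServ 1203986 is 3.1.2.592\n"
)
print(locate_failed_disk(SAMPLE))
```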
HOW TO INTERPRET PARKVIEW ALERTS
The alert shown was generated by ParkView. The alert contains enough information to determine the drive location, the drive model, and the drive's failed status.
The drive PDID appears at the end of the PATROL Object ID line; in this example, -id-56 indicates PDID 56.
With this information, it is not necessary to ask the customer for logs. Once the drive part number is identified, the Action Plan can be created.
Authorized by: user bu_centralpark_prod
Creation Date: 2021-07-20 1:45:47 PM
Physical Disk problem on 192.168.2.152 with 3:1:0 (HITACHI - 900 GB). This physical disk is in critical/unrecoverable state. Reported status: Error.
Hardware Health Report (Tue Jul 20 13:42:42 2021)======================
Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
On Host : cd7d9d33-4f0a-e711-ad88-00155d059602 (192.168.2.152)
On TrueSight Device: cd7d9d33-4f0a-e711-ad88-00155d059602
PATROL Object ID : /MS_HW_PHYSICALDISK/MS_HW_HP3PARhdfcd7d9d33-4f0a-e711-ad88-00155d059602_5000CCA0579897B3-id-56
Internal Device ID : 5000CCA0579897B3-id-56
Connector Used : MS_HW_HP3PAR.hdf
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Size : 900 GB
This Object Is Attached to:
Storage: cage3 (HP M6710/7000 Encl)
Type: Enclosure
Serial Number: ECMCBA1TF3U78L
Alternative Part Number: QR490A
Serial Number: MXN3045188
cd7d9d33-4f0a-e711-ad88-00155d059602
============================================================
Parameter: Status (Currently in ALARM State)
------------------------------------------------------------
Current Value: 2 (Failed) - Error
Unit : 0 = OK ; 1 = Degraded ; 2 = Failed
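Since every needed value sits on its own labeled line, a ParkView alert can be reduced to the fields used for the Action Plan. The hypothetical `parse_parkview_alert` helper below is a sketch that assumes one field per line, as laid out above, and relies on the disk's own Serial Number line appearing before the enclosure's in the "This Object Is Attached to" section.

```python
import re
from typing import Dict, Union

STATUS_UNITS = {0: "OK", 1: "Degraded", 2: "Failed"}  # from the alert's "Unit" line


def parse_parkview_alert(text: str) -> Dict[str, Union[str, int]]:
    """Extract drive location, PDID, model, serial, and status from a ParkView alert."""

    def field(pattern: str) -> str:
        # re.search returns the first match, i.e. the disk's own fields, which
        # precede the enclosure's "Serial Number" in the "Attached to" section.
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError(f"field not found: {pattern}")
        return match.group(1)

    return {
        "cage_pos": field(r"^Monitored Object\s*:\s*(\d+:\d+:\d+)"),
        "pd_id": int(field(r"^PATROL Object ID\s*:\s*\S*-id-(\d+)")),
        "model": field(r"^Model\s*:\s*(\S+)"),
        "serial": field(r"^Serial Number\s*:\s*(\S+)"),
        "status": STATUS_UNITS.get(int(field(r"^Current Value:\s*(\d+)")), "unknown"),
    }


SAMPLE = """\
Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
PATROL Object ID : /MS_HW_PHYSICALDISK/MS_HW_HP3PARhdfcd7d9d33-4f0a-e711-ad88-00155d059602_5000CCA0579897B3-id-56
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Storage: cage3 (HP M6710/7000 Encl)
Serial Number: ECMCBA1TF3U78L
Current Value: 2 (Failed) - Error
"""
print(parse_parkview_alert(SAMPLE))
```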
HOW TO DETERMINE DRIVE PART NUMBER
Using the provided log output or the ParkView alert, locate and copy the drive model number.
cli% showpd -failed -degraded -i
Id CagePos State ----Node_WWN---- --MFR-- -----Model------ -Serial- -FW_Rev- Protocol MediaType -----AdmissionTime-----
56 3:1:0 failed 2000B452539B56AA HITACHI HCBRE0900GBAS10K KXJPXJMX 3P01 FC Magnetic 2013-07-08 10:49:03 MST
ParkView Alert
Monitored Object : 3:1:0 (HITACHI - 900 GB)
Type : Physical Disk
On Host : cd7d9d33-4f0a-e711-ad88-00155d059602 (192.168.2.152)
On TrueSight Device: cd7d9d33-4f0a-e711-ad88-00155d059602
PATROL Object ID : /MS_HW_PHYSICALDISK/MS_HW_HP3PARhdfcd7d9d33-4f0a-e711-ad88-00155d059602_5000CCA0579897B3-id-56
Internal Device ID : 5000CCA0579897B3-id-56
Connector Used : MS_HW_HP3PAR.hdf
Model : HCBRE0900GBAS10K
Serial Number : KXJPXJMX
Size : 900 GB
Use the {3PAR} Drive Parts Matrix for Engineering Support - KB2200011 to locate the correct drive.
HOW TO DETERMINE DRIVE PART NUMBER CONT.
Navigate to the correct system type in the Drive Parts Matrix for Engineering Support, then use the drive model number copied from the logs or ParkView alert. Here we can see the required drive part number for HCBRE0900GBAS10K is 697389-001.
It is important to verify that the drive model is being looked up under the correct system tab. The same drive model number can be used on other 3par systems, but the drives are not compatible across the different systems, and installing a drive intended for a different system can cause some systems to crash.
It is also important to note whether the cage is SFF or LFF. For example, drive model HSSC0480S5xnNMRI is offered in both versions. The requested showcage log states the form factor of the cage containing the failed drive.
Lastly, drives are replaced like for like. If the model number is not listed in the Drive Parts Matrix for Engineering Support, or if there are questions about using an alternate part number, Backline must be engaged to determine compatibility.
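The lookup rules (correct system tab, correct form factor, like for like, otherwise engage Backline) can be modeled as a dictionary keyed on all three values. This is a sketch: `PARTS_MATRIX` and `lookup_part` are hypothetical, and the single entry is the one used in this course's Action Plan example, not an extract of KB2200011, which remains the authoritative source.

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt of the Drive Parts Matrix. Key: (system tab, drive model,
# form factor or None when the tab has no SFF/LFF split). Always check KB2200011.
PARTS_MATRIX: Dict[Tuple[str, str, Optional[str]], str] = {
    ("10000 (V-Class)", "SEGLE0600GBFC15K", None): "657889-001",
}


def lookup_part(system_tab: str, model: str, form_factor: Optional[str] = None) -> str:
    """Like-for-like lookup; anything not listed is escalated to Backline."""
    part = PARTS_MATRIX.get((system_tab, model.strip(), form_factor))
    return part if part is not None else "Not listed: engage Backline to determine compatibility"


print(lookup_part("10000 (V-Class)", "SEGLE0600GBFC15K"))
print(lookup_part("7000", "SEGLE0600GBFC15K", "LFF"))
```

Keying on the system tab means the same model number under a different system never matches by accident, which mirrors the compatibility warning above.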
ACTION PLAN EXAMPLE
HP 3Par T and V(10000) Class Failed Disk Drive - Action Plan
***** ACTION PLAN *****
Created by Team: Engineering Support
*** Current Issue ***
Failed Disk Drive ID 236 4:7:3
Evidence of the issue:
Customer provided logs
*** Resolution steps ***
Parts needed: Qty=1 657889-001
Detailed steps to resolve:
Replace Failed Physical Disk ID 236 Cage 4 Mag 7 Disk 3 SN 6SL5L89E
If there are MULTIPLE disk failures, please contact PPT 3Par support for review before running any servicemags (or advising the customer to start any servicemags)
1. Confirm with Customer the servicemag has been run and completed, and the disk is ready for replacement.
**Servicemag MUST be run and confirmed completed prior to disk replacement**
To check the status to see if a servicemag process has been initiated:
servicemag status
Have the customer run the following command to initiate the servicemag procedure if it has not already been done (this process can take multiple hours to complete):
servicemag start -pdid 236
Using the output gathered, we can then create the Action Plan. The AP template is completed by filling in the blanks with this information.
From the logs or alert we have determined the failed drive is on a 10000(V-Class).
The failed drive PDID is 236
The location of the failure is 4:7:3
The model number is SEGLE0600GBFC15K
The serial number is 6SL5L89E
Servicemag has not been run
Part number is 657889-001, as determined by searching for the model number on the 10000(V-Class) tab in the Drive Parts Matrix for Engineering Support
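Filling in the blanks is mechanical once these values are known. The sketch below uses a hypothetical `DriveFailure` record and `action_plan_lines` helper to produce the disk-specific lines of the T/V-Class template, adding the servicemag start command only when the servicemag has not been run. It covers only those lines, not the full template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DriveFailure:
    pd_id: int
    cage_pos: str      # <cage>:<magazine>:<slot>
    serial: str
    part_number: str   # from the Drive Parts Matrix
    servicemag_done: bool


def action_plan_lines(f: DriveFailure) -> List[str]:
    """Fill in the disk-specific blanks of the T/V-Class Action Plan template."""
    cage, mag, disk = f.cage_pos.split(":")
    lines = [
        "*** Current Issue ***",
        f"Failed Disk Drive ID {f.pd_id} {f.cage_pos}",
        "*** Resolution steps ***",
        f"Parts needed: Qty=1 {f.part_number}",
        f"Replace Failed Physical Disk ID {f.pd_id} Cage {cage} Mag {mag} Disk {disk} SN {f.serial}",
    ]
    if not f.servicemag_done:
        # The customer still has to start the servicemag before the disk is replaced.
        lines.append(f"servicemag start -pdid {f.pd_id}")
    return lines


example = DriveFailure(236, "4:7:3", "6SL5L89E", "657889-001", servicemag_done=False)
print("\n".join(action_plan_lines(example)))
```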
ACTION PLAN EXAMPLE CONT.
To monitor the status and confirm the servicemag has completed successfully:
servicemag status -d
2. Once parts are available and the disk is ready, FE schedules access.
3. Bring an internet capable laptop with a usb-serial dongle.
4. Connect to the Service Processor using the red crossover cable into the laptop NIC port and the INT port on the SP.
5. Set laptop IP to: 10.255.155.49 Subnet: 255.255.255.248
6. Browse to: 10.255.155.54
7. Login with username: spvar
8. Passwords are version dependent (contact support).
9. On the SPOCC homepage, click on “Support” in the left hand column.
10. Under “Action”, click on “Guided Maintenance”.
11. Under “Drive Cage”, click on “Disk Drive”.
12. Follow the prompts to replace the disk and initiate the resume rebuild.
13. If the initial output shows more than one failed disk, contact support.
14. The original PD# will still show as failed until the rebuild completes; it will then be automatically removed from the system reporting console.
Special instructions: Servicemag MUST be run and confirmed completed prior to disk replacement
Supporting documents:
3Par - Connecting to the Service Processor
http://centralpark/Service%20Delivery/Knowledge%20Base/3Par/Service%20Guides/3Par%20-%20Connecting%20to%20the%20Service%20Processor.pdf
3Par T and V-Class FE Reference Slides - Disks
http://centralpark/Service%20Delivery/Knowledge%20Base/3Par/Service%20Guides/3Par%20T%20and%20V-Class%20FE%20Reference%20Slides%20-%20Disks.pdf
*** Timeline ***
Next steps: ES to assign FE, FE to order part and schedule with the customer
Next owner: Engineering Support
***** END OF ACTION PLAN *****
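The laptop settings in steps 5 and 6 only work if the laptop address and the SP address fall in the same subnet. A 255.255.255.248 mask is a /29, which leaves six usable host addresses (.49 through .54), so both addresses given fit. The hypothetical `same_subnet` helper below checks this with Python's standard `ipaddress` module; it is a sanity check for the settings, not part of the SP procedure.

```python
import ipaddress


def same_subnet(laptop_ip: str, netmask: str, target_ip: str) -> bool:
    """True if target_ip is inside the laptop interface's subnet."""
    interface = ipaddress.ip_interface(f"{laptop_ip}/{netmask}")
    return ipaddress.ip_address(target_ip) in interface.network


print(ipaddress.ip_interface("10.255.155.49/255.255.255.248").network)  # 10.255.155.48/29
print(same_subnet("10.255.155.49", "255.255.255.248", "10.255.155.54"))  # True
```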
Thank You
KB LINK IN ZENDESK GUIDE
• https://parkplacetechhelp.zendesk.com/hc/en-us/articles/4404540969113