01 BSC6900&BSC6910 WCDMA R15 Troubleshooting Student Book ISSUE 1.00



Revision Record

Course Code: OWC403831
Product: BSC6900&BSC6910
Product Version: 15.0
Course Version: ISSUE 1.00

Developer/Modifier: Wang Chi
Time: 2013-11
Approver:
New/Update: New


Contents

Course Introduction
1 BAM Problems
  1.1 OMU Service Abnormality
  1.2 Synchronization Failure of the Active and Standby OMUs
  1.3 Case Study
2 Equipment Problems
  2.1 Clock-Related Alarms
  2.2 MSP Switch Fault
  2.3 Reset Fault of Interface Boards
  2.4 Case Study
3 WCDMA SCCP Connection Establishment
  3.1 Low Establishment Rate of SCCP Connection
  3.2 Case Study
4 Measurement Result Loss (BSC/M2000)
  4.1 Measurement Result Loss
  4.2 Case Study
5 Start Failure of New Subracks
  5.1 Start Failure of New Subracks
  5.2 Case Study
6 Basic Service Problems
  6.1 Service Setup Failure
  6.2 HSPA Service Setup Failure
  6.3 Case Study
7 Inter-RAT Handover and Relocation Problems
  7.1 PS Relocation Failure
  7.2 Inter-RAT Handover Failure
  7.3 Case Study


Issue 01 (2012-08-24) Huawei Proprietary and Confidential. Copyright Huawei Technologies Co., Ltd.

Course Introduction

Preface
If the BSC is working improperly, some network KPIs will deteriorate. This course describes how to locate and troubleshoot BSC faults.

Introduction
The course contents are as follows:
- Describes BAM Problems of the BSC
- Describes Equipment Problems of the BSC
- Describes SCCP Connection Establishment Problems of the BSC
- Describes Measurement Result Loss Problems of the BSC
- Describes Start Failure of New Subracks Problems of the BSC
- Describes Basic Service Problems of the BSC
- Describes Inter-Frequency Hard Handover Failure Problems of the BSC

Objective
Task objectives:
- To analyze possible causes of clock faults from different aspects and preliminarily locate clock faults by using clock maintenance and test methods
- To locate and troubleshoot common BSC/BTS clock faults in the network according to the troubleshooting procedure
Knowledge and skill objectives:
- To learn about the types of BSC/BTS clock source and clock signal flow direction
- To learn about the BSC/BTS clock status transition
- To learn about clock-related concepts

Course Scheduling
The following table lists the course scheduling.

Course                                             Duration (Minutes)
1 BAM Problems                                     60
2 Equipment Problems                               60
3 WCDMA SCCP Connection Establishment Problems     120
4 Measurement Result Loss Problems                 120
5 Start Failure of New Subracks Problems           120
6 Basic Service Problems                           120
7 Inter-Frequency Hard Handover Failure            120


1 BAM Problems

1.1 OMU Service Abnormality

Description
The OMU is running abnormally. OMU processes are stopped and the ALM-20707 OMU Process Abort alarm is generated.

Possible Fault Causes
Possible fault causes are as follows:
- The OMU runs abnormally.
- The internal and external networks run abnormally.
- The OMU sub-modules run abnormally.

Troubleshooting Method
Step 1 Run the command DSP OMU to check whether the active and standby OMUs are normal (see the Operational state fields in the following output).

Step 2 Check whether the internal and external networks are normal (see the virtual IP state and link state fields in the following output).
- If the internal network is abnormal, check whether the OMU is correctly installed and whether the SCU is working normally.
- If the external network is abnormal, check whether the external network cables are correctly connected.

+++ MBSC_Huawei 2013-10-12 15:53:55
O&M #57526
%%DSP OMU:;%%
RETCODE = 0 Execution succeeded.

OMU server state
----------------
Subrack No. = 0
Slot No. = 21
Computer name = omu-1
Internal network fixed IP = 80.168.3.50
External network fixed IP = 10.161.105.209
Backup network IP = 192.168.3.50
Operational state = Active normal
Version = V900R013ENGC01SPC500

Subrack No. = 0
Slot No. = 23
Computer name = omu-2
Internal network fixed IP = 80.168.3.60
External network fixed IP = 10.161.105.210
Backup network IP = 192.168.3.60
Operational state = Standby normal
Version = V900R013ENGC01SPC500
(Number of results = 1)

Other state
-----------
Internal network virtual IP = 80.168.3.40
External network virtual IP = 10.161.105.208
Internal network virtual IP state = Normal
External network virtual IP state = Normal
Data-sync state = Data synchronization is successful
Internal network link state = Normal
External network link state = Normal
Backup network link state = Normal
(Number of results = 1)

--- END
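The checks in Step 1 and Step 2 amount to reading the state fields of the DSP OMU report. The following is a minimal sketch of that reading, assuming the report has been saved as text with one "field = value" pair per line. The set of healthy values is taken only from the sample report above and may be incomplete, and the value "Faulty" in the sample input is made up for illustration.

```python
import re

# Values treated as healthy. Assumption: copied from the sample report only.
HEALTHY = {
    "Active normal",
    "Standby normal",
    "Normal",
    "Data synchronization is successful",
}

FIELD = re.compile(r"^\s*(.+?)\s*=\s*(.+?)\s*$")


def abnormal_states(report: str) -> list[tuple[str, str]]:
    """Return (field, value) pairs whose field name ends in 'state'
    and whose value is not in the healthy set."""
    problems = []
    for line in report.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key.lower().endswith("state") and value not in HEALTHY:
            problems.append((key, value))
    return problems


# Hypothetical excerpt; "Faulty" is an invented value for the demonstration.
sample = """\
Operational state = Active normal
Operational state = Standby normal
Data-sync state = Data synchronization is successful
Internal network link state = Normal
External network link state = Faulty
"""

print(abnormal_states(sample))
```

Any field the sketch reports points to the branch of Step 2 to follow (internal network or external network).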

Step 3 Run the command DSP OMUMODULE to check whether the modules of the active and standby OMUs are normal (see the State column in the following output).

+++ MBSC_Huawei 2013-03-12 15:55:36
O&M #57554
%%DSP OMUMODULE:;%%
RETCODE = 0 Execution succeeded.

Active OMU module state
----------------
Service name    State      Startup type
host_gate       Started    Automatic
ems_gate        Started    Automatic
authority       Started    Automatic
configure       Started    Automatic
maintain        Started    Automatic
stat            Started    Automatic
alarm           Started    Automatic
software        Started    Automatic
ftp_server      Started    Automatic
sntp            Started    Automatic
btsom           Started    Automatic
ems_agent       Started    Automatic
omu_manager     Started    Automatic
weblmt          Started    Automatic
cfa             Started    Automatic
debug_log       Started    Automatic
(Number of results = 16)

Standby BAM modules state
-------------------------
Service name    State      Startup type
host_gate       Stopped    Disabled
ems_gate        Stopped    Disabled
authority       Stopped    Disabled
configure       Stopped    Disabled
maintain        Stopped    Disabled
stat            Stopped    Disabled
alarm           Stopped    Disabled
software        Started    Automatic
ftp_server      Started    Automatic
sntp            Started    Automatic
btsom           Stopped    Disabled
ems_agent       Stopped    Disabled
omu_manager     Started    Automatic
weblmt          Stopped    Disabled
cfa             Stopped    Disabled
debug_log       Stopped    Disabled
(Number of results = 16)

2 reports in total
--- END

Note: If the OMU is in UMTS Only (UO) mode, the BTSOM service does not exist.

Step 4 Run the command RST OMUMODULE to reset the abnormal modules.
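To pick the modules that are candidates for RST OMUMODULE, the DSP OMUMODULE table can be scanned row by row. The sketch below assumes rows of the form "service state startup-type" and treats a module as abnormal when its startup type is Automatic but its state is not Started. That rule is an assumption drawn from the sample output, where Stopped/Disabled modules on the standby OMU are normal. The "Stopped Automatic" row in the sample input is invented.

```python
def stopped_automatic_services(rows: str) -> list[str]:
    """Return services that should start automatically but are not started."""
    abnormal = []
    for line in rows.splitlines():
        parts = line.split()
        if len(parts) != 3:
            # Skips the header line, separators and result counters.
            continue
        name, state, startup = parts
        if startup == "Automatic" and state != "Started":
            abnormal.append(name)
    return abnormal


# Hypothetical excerpt: the 'stat' row is altered to show a failure.
sample_rows = """\
Service name State Startup type
host_gate Started Automatic
stat Stopped Automatic
ftp_server Started Automatic
btsom Stopped Disabled
"""

print(stopped_automatic_services(sample_rows))
```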

1.2 Synchronization Failure of the Active and Standby OMUs

Fault Description
The ALM-20704 OMU Data Synchronization Failure alarm is generated on the RNC.

Possible Fault Causes
Possible fault causes are as follows:
- The link between the active and standby OMUs is broken.
- The disk space of the active or the standby OMU is insufficient.
- The FTP Server module of the active OMU runs abnormally.
- A switchover was performed between the active and standby OMUs when the fault occurred.
- The database system or the operating system of the active/standby OMU server runs abnormally.

Troubleshooting Method
Step 1 Run the command DSP OMU to query the status of the active and standby OMUs.
- Check whether data synchronization is normal and whether the FE port connection is normal.
- Check whether the standby OMU starts normally. If the standby OMU does not start normally, start the standby OMU.
For example, if data synchronization fails after a sudden switchover between the active and standby OMUs, you can run the command DSP OMU and find the direct cause in the Data-sync state field of the result.

Step 2 If the network is normal, check whether the ALM-20701 OMU Failure Switchover alarm is generated. If the alarm is generated, do as follows:
1) Use the troubleshooting method of the ALM-20701 OMU Failure Switchover alarm to clear the alarm.
2) If the standby OMU is faulty and cannot be switched back, the onsite personnel must check whether the data is the latest and whether the data on the OMU and host is consistent. If the data is consistent, clear the alarm manually, run the command STP DATASYNC to stop data synchronization, and run the command STR DATASYNC to enable data synchronization again.

1.3 Case Study

Fault Description
- 20736 Data Inconsistency between OMU and Host
- 22832 Board Switchover

Symptom
The customer reported that "20736 Data Inconsistency between OMU and Host" is generated on the BSC6900.

Possible Fault Causes
In normal scenarios, one OMU provides data for several service boards and interface boards. Therefore, online data configuration involves the sequence and strategy of making data take effect on the OMU and on the service boards (writing data to the databases of the boards). Online data configuration operations fall into two types: ADD/MOD/SET operations and RMV operations.
- In ADD/MOD/SET operations, the configuration data takes effect on the OMU first and then on the service boards.
- In RMV operations, the configuration data takes effect on the service boards first and then on the OMU. If the configuration data fails to take effect on any service board, it cannot take effect on the OMU.
When the OMU works in active/standby mode, the data changed on the active OMU through online configuration is immediately synchronized to the standby OMU to ensure consistency. When the SCU, XPU, and SPU work in active/standby mode, the data changed on the active SCU, XPU, and SPU through online configuration is immediately synchronized to the standby SCU, XPU, and SPU to ensure consistency.

Common Inconsistency Scenarios

Data inconsistency caused by online configuration failure.
The strategy described above means that, in ADD/MOD/SET operations, if configuration data fails to take effect on one service board (including failure of message sending), data in the OMU is updated while data in that service board is not. Therefore, data inconsistency occurs. In RMV operations, if configuration data fails to take effect on one service board, data in the other service boards where the data takes effect successfully is updated while data in the OMU is not. Therefore, data inconsistency occurs. In conclusion, whenever online data configuration fails, inconsistency between FAM data and BAM data may occur.
The causes of online data configuration failure are as follows:
1) Data check rules on FAM and BAM are different. Data that meets the check rules of BAM fails to meet the check rules of FAM.
2) In the online data configuration process, if the boards on FAM are faulty, absent, starting, or loading data, FAM cannot process the configuration requests correctly.
3) In the online data configuration process, FAM fails to receive the configuration requests, or the OMU fails to receive the response from FAM, due to communication problems.

Data inconsistency caused by the active/standby switchover of SCUs, XPUs, or SPUs.
If data in the active SCU, XPU, or SPU is updated while data in the standby SCU, XPU, or SPU is not, data inconsistency between FAM data and BAM data occurs after the active/standby switchover of SCUs, XPUs, or SPUs.
The causes of data inconsistency between the active and standby SCU, XPU, or SPU are as follows:
1) After online data configuration, the active and standby SCUs, XPUs, or SPUs are switched over before the latest data on the active board is synchronized to the standby board.
2) After online data configuration, an abnormal data backup channel prevents the latest data on the active SCU, XPU, or SPU from being synchronized to the standby board. Then, the active and standby SCUs, XPUs, or SPUs are switched over.
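The two ordering rules explain why a single failed board leaves a different kind of inconsistency for each operation type. The following toy model is a sketch of that reasoning only, not of the OMU software. The board names and the function names are hypothetical.

```python
def apply_online_config(operation: str, board_ok: dict[str, bool]) -> dict[str, bool]:
    """Return, per node, whether the new data took effect.

    operation: "ADD", "MOD", "SET" or "RMV".
    board_ok:  board name -> True if the board accepts the change.
    """
    updated: dict[str, bool] = {}
    if operation in {"ADD", "MOD", "SET"}:
        # OMU first, then the service boards.
        updated["OMU"] = True
        for board, ok in board_ok.items():
            updated[board] = ok
    elif operation == "RMV":
        # Service boards first; the OMU follows only if every board succeeded.
        for board, ok in board_ok.items():
            updated[board] = ok
        updated["OMU"] = all(board_ok.values())
    else:
        raise ValueError(f"unknown operation: {operation}")
    return updated


def is_consistent(updated: dict[str, bool]) -> bool:
    """Data is consistent when every node is in the same state."""
    return len(set(updated.values())) == 1


boards = {"SPU-0": True, "SPU-1": False}  # hypothetical: SPU-1 rejects the change
print(apply_online_config("ADD", boards))
print(apply_online_config("RMV", boards))
```

In the ADD case the OMU holds data that SPU-1 lacks. In the RMV case SPU-0 has removed data that the OMU still holds. Both outcomes match the scenarios described above.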

Troubleshooting Method
Step 1 According to the alarm logs, Data Inconsistency between OMU and Host appeared after an SCU board switchover. The operation logs show that some commands were executed while the SCU board was switching over. In this case, data in the OMU is updated while data in that service board is not, so BAM data and FAM data become inconsistent and alarm 20736 is reported.

Step 2 Check the operation logs.

Suggestions and Summary
Alarm 20736 Data Inconsistency between OMU and Host appeared because some commands were executed while the SCU was being reset, so data in the OMU was updated while data in that service board was not.
- If the SCU board was reset manually, wait until the board has recovered before performing any operation, to avoid inconsistency in the board database.
- If the SCU board resets or switches over frequently by itself, check whether the board is abnormal. If the fault is confirmed, replace the board.


2 Equipment Problems

2.1 Clock-Related Alarms

Fault Description
The following clock-related alarms are generated:
- ALM-20205 System Clock Reference Source Unavailable
- ALM-20206 Current System Clock Reference Source Status Abnormal
- ALM-20207 Failure in Locking System Clock Source

Troubleshooting Procedure
Step 1 Clock unavailability may be due to the following causes:
- Input signals of the reference clock are missing.
- The frequency offset of input signals of the reference clock is abnormal (the frequency offset exceeds the pull-in range).
- The board is faulty.
Step 2 Make the preliminary analysis according to the prompt of the alarm help.
Step 3 Run the DSP CLK command to query and save the clock status of the board.
Step 4 Run the COL LOG command to obtain the server logs, alarm information, and operation logs.
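The second cause in Step 1 compares a frequency offset with a pull-in range. The offset is normally expressed in parts per million (ppm) of the nominal frequency. The sketch below shows only that arithmetic. The 2.048 MHz nominal frequency and the 4.6 ppm limit are example values chosen for illustration, not figures from this course or from the product documentation.

```python
def frequency_offset_ppm(measured_hz: float, nominal_hz: float) -> float:
    """Relative frequency offset in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def exceeds_pull_in(measured_hz: float, nominal_hz: float, pull_in_ppm: float) -> bool:
    """True if the absolute offset is larger than the pull-in range."""
    return abs(frequency_offset_ppm(measured_hz, nominal_hz)) > pull_in_ppm


NOMINAL_HZ = 2_048_000.0   # example reference frequency (assumption)
PULL_IN_PPM = 4.6          # example limit (assumption); use the documented value

measured = 2_048_020.48    # hypothetical measurement
print(round(frequency_offset_ppm(measured, NOMINAL_HZ), 3))
print(exceeds_pull_in(measured, NOMINAL_HZ, PULL_IN_PPM))
```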

2.2 MSP Switch Fault

Description
In normal cases, a Multiplex Section Protection (MSP) switch (EVT-22862) does not affect the services when the BSC6900 is running. In abnormal cases, however, the services are interrupted for more than three seconds.

Possible Fault Causes
Possible fault causes are as follows:
- The optical interface reports an alarm.
- The MSP switch at the peer end is not ON.
- The optical cables are incorrectly connected.
- The protection modes at the two ends of the MSP are different.
- The configuration parameters at the two ends of the MSP are inconsistent.

Troubleshooting Method
Step 1 Check the configuration and connection.
- If the backup mode at both ends is not the same, modify the local or the peer end backup mode to the 1+1 or 1:1 mode.
- If the restoration mode at both ends is not the same, modify the local or the peer end restoration mode to revertive or non-revertive. In the 1:1 backup mode, only the non-restorable mode is optional.
- If the switching mode at both ends is not the same, modify the local or the peer end switching mode to single-ended switching or dual-ended switching. Huawei RNC products do not support single-ended switching in the 1:1 backup mode.
- If the MSP is deactivated, activate the local or the peer MSP.
- If the working channel and the protection channel are inversely connected, reconnect the optical fiber.
- If one end is locked for protection, cancel the protection lock at the local or the peer end.
Step 2 Check optical interface alarms.
- Check whether a fault alarm related to the optical interface is generated. When SF alarms (such as the LOS, LOF, MS_RDI, and MS_AIS fault alarms related to the optical interface and the APS fault alarms related to the standby board) are generated, the RNC automatically triggers switching.
- Check whether the MSP switching alarm corresponds to the optical interface switching alarm.
Step 3 Check whether the MSP switching alarm is generated due to a local or a peer end fault. If the alarm is generated due to a peer end fault, check the peer end devices.
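Most of Step 1 is a field-by-field comparison of the MSP settings at the two ends. A minimal sketch of that comparison follows. The dictionary keys are names invented for this example and do not correspond to MML parameter names. The values would come from the configuration queried on each device.

```python
# Hypothetical field names for the settings compared in Step 1.
MSP_FIELDS = ("backup_mode", "restoration_mode", "switching_mode", "activated")


def msp_mismatches(local: dict, peer: dict) -> list[str]:
    """Return the MSP settings that differ between the local and peer ends."""
    return [field for field in MSP_FIELDS if local.get(field) != peer.get(field)]


local_end = {
    "backup_mode": "1+1",
    "restoration_mode": "non-revertive",
    "switching_mode": "dual-ended",
    "activated": True,
}
peer_end = {
    "backup_mode": "1+1",
    "restoration_mode": "non-revertive",
    "switching_mode": "single-ended",
    "activated": True,
}

print(msp_mismatches(local_end, peer_end))
```

Each field in the result maps to one bullet of Step 1. An empty result means the configuration checks pass and the fiber connection and alarms should be examined next.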

2.3 Reset Fault of Interface Boards

Description
The interface board reset alarm (ALM-20241) occurs together with a switchover between the active and standby boards when the BSC6900 is running, and the services are not affected. If a standby board is not configured, the services carried by the board are interrupted for a short period.

Possible Fault Causes
Possible fault causes are the execution of MML reset commands and hardware/software faults.

Troubleshooting Method
- Check the reason for board unavailability in the alarm information. If the reason indicates a manual reset, confirm the manual reset in the corresponding operation log file. To troubleshoot faults with other reasons, see the Alarm Online Help.
- If the hardware is faulty, replace the board.
- If the software is faulty for an unknown reason, run the MML command DSP RSTREASON to check the cause of the board reset and save the displayed information. Then, check the relevant log files.
- If the problem persists, collect logs according to the following fault information checklist for further analysis.

2.4 Case Study

Fault Description
- MSP Multiplex Section K1/K2 Mismatch
- MTP3 link faulty

Symptom
When the cable is removed from the active slot, the standby card cannot become active and the services go down. The BSC6000 is connected to the Ericsson MGW directly through patch cords.

Possible Fault Causes
Configuration analysis
- The A interface board OIUa in slot 14 of subrack 0 is configured with one 2 Mbit/s No.7 signaling link.
- The switchover mode for the BSC optical port is configured as 1+1 non-revertive bidirectional mode.
Alarm analysis
- The BSC received the MSP Multiplex Section K1/K2 Mismatch alarm when the fiber was removed from the active board.
K-bytes log analysis:
35 K_DIRECTION H'0(0) 2012-08-13 12:13:50
36 SF_DETECTED H'1(1) 2012-08-15 18:14:31
37 K_SENDS H'D100(53504) 2012-08-15 18:14:31   (the BSC sends a switch request)
38 K_DIRECTION H'0(0) 2012-08-15 18:14:31
39 K1K2TIMER_START H'0(0) 2012-08-15 18:14:31
40 STATE_TRANSITION H'D102(53506) 2012-08-15 18:14:31
41 K1K2TIMER_STOP H'0(0) 2012-08-15 18:14:36
42 SF_CLEARS H'1(1) 2012-08-15 18:16:20
43 K_RECEIVED H'0(0) 2012-08-15 18:16:20   (the peer end has no response and sends nothing back to the BSC)
44 K_DIRECTION H'1(1) 2012-08-15 18:16:20
When the BSC sends the switch request, the peer end does not respond.
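The K_SENDS and K_RECEIVED values in the log can be read as APS K-bytes. The sketch below decodes the K1 byte, assuming that the logged 16-bit value carries K1 in the high byte and K2 in the low byte (an assumption about this log format) and using the linear MSP request codes of ITU-T G.841 (bits 1 to 4 of K1 are the request, bits 5 to 8 the channel number).

```python
# K1 request codes for linear multiplex section protection (ITU-T G.841).
K1_REQUESTS = {
    0b1111: "Lockout of protection",
    0b1110: "Forced switch",
    0b1101: "Signal fail, high priority",
    0b1100: "Signal fail, low priority",
    0b1011: "Signal degrade, high priority",
    0b1010: "Signal degrade, low priority",
    0b1000: "Manual switch",
    0b0110: "Wait to restore",
    0b0100: "Exercise",
    0b0010: "Reverse request",
    0b0001: "Do not revert",
    0b0000: "No request",
}


def decode_k1(k_word: int) -> tuple[str, int]:
    """Decode the request type and channel number from a logged K1/K2 word.

    Assumption: K1 is the high byte of the 16-bit value shown in the log.
    """
    k1 = (k_word >> 8) & 0xFF
    request = K1_REQUESTS.get(k1 >> 4, "Unused code")
    channel = k1 & 0x0F
    return request, channel


print(decode_k1(0xD100))  # value sent by the BSC (log entry 37)
print(decode_k1(0x0000))  # value received from the peer (log entry 43)
```

Under these assumptions, the BSC sends a signal-fail request for channel 1, and the value received from the peer carries no request, which agrees with the log annotations.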

Troubleshooting Method
When the Multiplex Section K1/K2 Mismatch alarm is reported, it normally means that the MSP configurations at the two ends are not the same. Check whether the APS configuration of the peer end is the same as that of the BSC end.
The check of the peer end shows that the APS switchover is configured in 1+1 bidirectional switchover mode for the OIUa on the BSC and in 1+1 unidirectional switchover mode for the interface board on the Ericsson MGW. Therefore, the BSC cannot communicate with the MGW by using the K-bytes. As a result, the OIUa does not switch after the interface board on the MGW switches, and the OIUa switchover fails.
This problem occurred when a Huawei BSC was connected to an Ericsson MGW over the A interface. The Ericsson MGWs support only unidirectional APS switchover. The Huawei BSC does not support bidirectional selective receiving for data on the signaling plane. Therefore, if the interface board processes signaling, it cannot be configured as unidirectional switchover.

Suggestions and Summary
- Connect the Huawei BSC and the Ericsson MGW by using optical transmission equipment.
  In BM/TC combined mode, Signaling System No. 7 (SS7) signaling links are configured for the A interface board, which supports only the bidirectional switchover. However, the interface board on the MGW supports only the unidirectional switchover. After the interface boards on both ends are connected through optical transmission equipment, the A interface board on the BSC connects to the optical transmission equipment in APS bidirectional mode, and the optical transmission equipment connects to the MGW in APS unidirectional mode.
  Requirements on optical transmission equipment:
  1) The optical port works in SDH mode.
  2) The equipment supports the 1+1 APS (MSP) function.
  3) The equipment supports both unidirectional and bidirectional switchover modes.
- Configure a bidirectional-switchover-capable Ericsson MGW.
  If the MGW supports the bidirectional switchover, the BSC and the MGW can be connected by using APS.
- Use the OIUa to bear data flows on the user plane and the EIUa to bear signaling streams on the signaling plane.
  In BM/TC combined mode, the OIUa over the A interface bears only data flows on the user plane, supporting both unidirectional and bidirectional APS switchover modes. With APS, you can connect the A interface board on the BSC to the interface board on the MGW in unidirectional switchover mode.

3 WCDMA SCCP Connection Establishment

3.1 Low Establishment Rate of SCCP Connection

Fault Symptom
The success rate of SCCP connection establishment is low.
The success rate of SCCP connection establishment = SCCP.Tx.Con.Succ / SCCP.Tx.Con.Req.
Impact: The message sending and receiving functions of the Iu and Iur interfaces are abnormal. As a result, the location update, short message sending, and call establishment may fail.

Cause Analysis
The general causes for such faults include:
- Transmission faults
- Congestion on the Core Network (CN) side
- Incorrect configuration on the CN side
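The success rate is a plain ratio of two counters. The sketch below computes it from counter values exported for one measurement period. The function name is illustrative and the sample numbers are made up.

```python
from typing import Optional


def sccp_setup_success_rate(con_succ: int, con_req: int) -> Optional[float]:
    """SCCP.Tx.Con.Succ / SCCP.Tx.Con.Req, or None when no request was sent."""
    if con_req == 0:
        return None  # the rate is undefined for an idle period
    return con_succ / con_req


rate = sccp_setup_success_rate(con_succ=5247, con_req=10000)  # hypothetical counters
print(f"{rate:.2%}")
```

Returning None for an idle period avoids reporting a misleading 0% when no connection request was made.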

Troubleshooting Method
Step 1 Analyze performance counters such as the number of SCCP establishment requests and the number of successful SCCP connection establishment attempts to better understand the SCCP connection establishment condition.
Step 2 Analyze the possible causes of the abnormal SCCP connection establishment. If the Connection Request (CR) is sent but the Connection Confirm (CC) is not received, go to step 3. If the Connection Refused (CREF) is received from the peer SCCP after the CR is sent, go to step 4.
Step 3 Analyze why the local SCCP fails to receive the CC from the peer SCCP after sending the CR.
Step 4 The local SCCP receives the CREF from the peer SCCP after sending the CR.
Step 5 If the problem is not located, collect the fault information and send it to Huawei Customer Service Center for analysis.
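The branching in Step 2 can be written as a small decision function. This is a sketch of the procedure only. The inputs are the numbers of CRs sent, CCs received and CREFs received in the analysed period, and the parameter names are illustrative rather than counter names.

```python
def next_step(cr_sent: int, cc_received: int, cref_received: int) -> str:
    """Choose the branch of the troubleshooting method to follow."""
    if cc_received >= cr_sent:
        return "normal"  # every CR was confirmed
    if cref_received > 0:
        return "step 4"  # the peer SCCP refuses connections
    return "step 3"      # CRs are lost or the peer does not answer


print(next_step(cr_sent=1000, cc_received=600, cref_received=400))
print(next_step(cr_sent=1000, cc_received=950, cref_received=0))
```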

Troubleshooting Flow
Step 1 Analyze performance indicators such as the number of SCCP establishment requests and the number of successful SCCP connection establishments to further understand the low success rate of SCCP connection establishment.
- For the Iu interface, view the SCCP performance indicators of the Iu interface:
  VS.IU.SCCP.Tx.Con.Req
  VS.IU.SCCP.Tx.Con.Succ
- For the Iur interface, view the SCCP performance indicators of the Iur interface.

Step 2 Analyze the possible causes of the abnormal SCCP connection establishment.
If the number of CCs is smaller than the number of CRs, the SCCP connection establishment is abnormal. In this case, check whether CREFs are counted in the SCCP performance indicators of the abnormal original signaling point.
SCCP connection establishment is measured per original signaling point. The measurement covers the Iu and Iur interfaces:
  OS.SCCP.CREF.Tx
  OS.SCCP.CREF.Rx
- If the total number of received CREFs (OS.SCCP.CREF.Rx) is 0, the local SCCP sends the CR but does not receive the CC from the peer SCCP. In this case, go to step 3.
- If the total number of received CREFs (OS.SCCP.CREF.Rx) is not 0, the original signaling point sends the CR and receives the CREF from the peer SCCP. In this case, go to step 4.

Step 3 Analyze why the local SCCP fails to receive the CC from the peer SCCP after sending the CR.
Enable the SCCP trace. If the CR does not exist, go to 1). If the CR exists but the CC is not received, go to 2).
1) The local SCCP fails to send the CR.
   The cause may be congestion or intermittent disconnection in the bottom-layer SAAL link or SCTP. Therefore, some CR messages are not sent out.
2) The peer SCCP does not return the CC.
   If the transmission layer (SAAL link or SCTP) is normal and the SCCP trace shows that the CR is sent out, the peer SCCP does not return the CC message. In this case, locate the problem with the peer SCCP.

Step 4 The local SCCP receives the CREF from the peer SCCP after sending the CR. (Conduct root cause analysis and troubleshooting with the peer SCCP.)
Trace the SCCP and IU messages on the Iu interface, or the SCCP and IUR messages on the Iur interface. View the cause value carried in the CREF found in the SCCP trace. The following figure shows the cause value.

Step 5 Analyze the cause of the CREF according to the SCCP signaling on the Iu interface.
Sub-step 1: View the connect-id of DLRN in the CREF message of SCCP.

Sub-step 2: Locate the corresponding CR according to the connect-id, that is, to locate the connect-id of OLRN. The user-data in the CR is the initial UE message.

Locate the corresponding initial UE message according to user-data. The following two figures show that the bytes of user data are the same as those of the initial UE message. The following figure shows the bytes of the CR user data.

The following figure shows the bytes of the initial UE message.

Sub-step 3: View the service area code (SAC) or routing area code (RAC) based on the initial UE message.

If the fault occurs in the same SACs or RACs, the CN does not configure the SAC or RAC, or such configuration is incorrect. In this case, instruct the CN to solve the problem. If the fault is not located, collect the fault information and send it to Huawei R&D engineers for analysis.
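Sub-steps 1 to 3 pair each CREF with the CR it answers and then look for areas that repeat. The destination local reference of a CREF equals the source local reference of the corresponding CR, so the pairing is a dictionary lookup. The sketch below assumes the trace has already been decoded into simple records. The record layout and the sample values are invented for illustration.

```python
from collections import Counter


def refused_by_rac(crs: list[dict], crefs: list[dict]) -> Counter:
    """Count refused connections per routing area code (RAC).

    crs:   decoded CR records with 'slr' (source local reference) and 'rac'
           (taken from the initial UE message carried as user data).
    crefs: decoded CREF records with 'dlr' (destination local reference).
    """
    rac_by_reference = {cr["slr"]: cr["rac"] for cr in crs}
    return Counter(
        rac_by_reference[cref["dlr"]]
        for cref in crefs
        if cref["dlr"] in rac_by_reference
    )


# Hypothetical decoded trace records.
crs = [
    {"slr": 0x0101, "rac": 0xF9},
    {"slr": 0x0102, "rac": 0x01},
    {"slr": 0x0103, "rac": 0xF9},
]
crefs = [{"dlr": 0x0101}, {"dlr": 0x0103}]

for rac, count in refused_by_rac(crs, crefs).most_common():
    print(f"RAC 0x{rac:02X}: {count} refused")
```

If one RAC or SAC dominates the tally, the configuration of that area on the CN side is the first thing to check.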

3.2 Case Study

Fault Description
A node has a low success rate of SCCP connection establishment.

Symptom
The RAC is configured incorrectly on the SGSN side.

Troubleshooting Method
Step 1 Confirm that the success rate of SCCP connection establishment is low. The SCCP Tx success rate of a CN Node ID on the radio network controller (RNC) is between 52.47% and 77.95%.

Step 2 Analyze why the success rate of SCCP connection establishment is low. An analysis of performance indicators shows that CREF messages cause the low success rate of SCCP connection establishment.

Step 3 Analyze the cause of failure based on the SCCP signaling. According to the SCCP signaling on the Iu interface, the CREF message from the CN shows that the cause is "access failure", which means that the service layer (RANAP) rejects the CR.

Step 4 Further analyze the cause of failure according to the IU message RANAP_DIRECT_TRANSFER on the Iu interface. The optional parameter 2 (Para2) in the CREF message in the previous figure is actually the RANAP_DIRECT_TRANSFER message. View the RANAP_DIRECT_TRANSFER message. The refusal cause "Routing-area-update-reject" is carried.

The refusal cause in some of the messages is "Service reject".

Therefore, the cause is that the CN rejects the SCCP connection requests from the RNC, and the rejection is associated with routing areas. The following figure shows that the failures occur during routing area updates from other routing areas to RAC 0xF9.

Conclusion
The CN returns the CREF after receiving the CR. The causes of failure are "Routing-area-update-reject" and "Service reject", and the failures occur in RAC 0xF9. According to the communication with the CN, the low success rate of SCCP connection establishment is caused by incorrect RAC configuration. After the configuration is corrected, the problem is solved.

4 Measurement Result Loss (BSC/M2000)

4.1 Measurement Result Loss

Fault Description
The EVT-22806 Subsystem Measurement Result File Loss event is generated on the OMU, or the RNC measurement results cannot be queried on the M2000.

Possible Fault Causes
Possible fault causes are as follows:
- The link between the OMU subsystem and the M2000 is broken.
- The FTP Server module of the OMU subsystem runs abnormally.
- The performance module of the OMU subsystem runs abnormally.
- A switchover between the active and standby OMUs is performed while the traffic statistics file is generated.
- The M2000 cannot be connected to the RNC because of the firewall settings.
- The board of the measurement object runs abnormally.
- The OMU version does not match that of the host board.

Troubleshooting Method
Step 1 If the BSC6900 measurement results cannot be queried on the M2000, check whether the result file of the queried time period exists in the NE directory on the M2000.
- If the file exists, disconnect the RNC from the M2000, reconnect it, and check whether the problem is solved. If the problem persists, contact the M2000 technical support personnel for assistance.
- If the result file of the queried time period does not exist in the NE directory on the M2000, check whether the results of all NEs or of only one NE cannot be queried.
  - If the results of all NEs cannot be queried, check whether the OMU is connected to the M2000 (ping the physical and virtual IP addresses of the OMU for the external network on the M2000), reset the FTP client/server on the M2000 side, and contact the M2000 technical support personnel for assistance.
  - If the results of one NE cannot be queried, check in the operation logs whether the measurement results are successfully uploaded. If the measurement results are successfully uploaded, contact the M2000 technical support personnel for assistance. An example of a successful upload record:
    [ Y2010M02D03H14N57S28], [ Y2010M02D03H14N57S28]
    [/*95248*/ULD MEASRST:FN="A20100203.1425+0800-1430+0800_EMS-SHORTPERIOD.mrf.bz2",DSTF="/export/home/sysm/ftproot/pm/ne.3221229568.3221233664.3221282840/A20100203.1425+0800-1430+0800_EMS-SHORTPERIOD.mrf.bz2",IP="172.29.47.193",USR="ftpuser",PWD="*****";], [Execution succeeded.]
    If the measurement results fail to be uploaded, go to Step 2.

Step 2 Check whether the measurement results can be found through the D:\mbsc\bam\version_a\ftp\MeasResult path on the NE.
(1) If the measurement results can be found through the path, check whether the NE acts as the FTP server.
- If the NE acts as the FTP server, check whether the corresponding result file can be found through the D:\mbsc\bam\common\ems path.
  - If the result file can be found, check whether the network is normal (ping the physical and virtual IP addresses of the OMU for the external network on the M2000). If the network is normal, contact the M2000 technical support personnel for assistance.
  - If the result file cannot be found through the D:\mbsc\bam\common\ems path, run the command DSP OMUMODULE to check whether the ftp_server status (ftp_server Started Automatic) or the stat service status (stat Started Automatic) is normal. If the service is abnormal, restart the corresponding module and check whether the service is recovered.
- If the NE does not act as the FTP server, run the command DSP OMUMODULE on the LMT to check whether the stat service status (stat Started Automatic) is normal. If the service is abnormal, restart the corresponding module and check whether the service is recovered.
Provide the logs according to the following fault information checklist.
(2) If the result file cannot be found through the \bam\common\ MeasResult path, run the command DSP OMUMODULE to check whether the stat service status (stat Started Automatic) is normal. If the service is abnormal, restart the corresponding module and check whether the service is recovered.

Step 3 If the BSC6900 reports the EVT-22806 Subsystem Measurement Result File Loss event, handle the alarm according to the alarm online help.
- Run the command DSP OMUMODULE to check whether the ftp_server status (ftp_server Started Automatic) or the stat service status (stat Started Automatic) is normal. If the service is abnormal, restart the corresponding module and check whether the service is recovered.
- If the ftp_server is normal, check whether the corresponding board or subsystem is abnormal according to the EVT-22806 Subsystem Measurement Result File Loss event. If the board or subsystem is abnormal, perform troubleshooting according to the relevant guide. If the board or subsystem is normal, go to Step 4.

Step 4 If the problem persists, collect logs according to the following fault information checklist for further analysis.
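Checking whether "the result file of the queried time period" exists means mapping file names to time periods. The sketch below parses a name of the form seen in the ULD MEASRST record above. The layout (date, start time with UTC offset, end time with UTC offset, measurement name) is inferred from that single example, so other naming variants may exist.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from one example file name; treat it as an assumption.
NAME = re.compile(
    r"^A(?P<date>\d{8})\.(?P<start>\d{4}[+-]\d{4})-(?P<end>\d{4}[+-]\d{4})"
    r"_(?P<unit>.+)\.mrf\.bz2$"
)


def parse_result_file_name(file_name: str) -> tuple[datetime, datetime, str]:
    """Return (period start, period end, measurement name) for a result file."""
    match = NAME.match(file_name)
    if match is None:
        raise ValueError(f"unexpected result file name: {file_name}")
    start = datetime.strptime(match["date"] + match["start"], "%Y%m%d%H%M%z")
    end = datetime.strptime(match["date"] + match["end"], "%Y%m%d%H%M%z")
    if end <= start:
        end += timedelta(days=1)  # the period crosses midnight
    return start, end, match["unit"]


start, end, unit = parse_result_file_name(
    "A20100203.1425+0800-1430+0800_EMS-SHORTPERIOD.mrf.bz2"
)
print(start.isoformat(), end.isoformat(), unit)
```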

4.2 Case Study

Fault Description
Data is lost randomly on the M2000. The RSSI Performance Stat of the RSSIPerformanceMeasurement indicates that data is randomly lost every day.

Symptom
Loss of the M2000 performance data.

Possible Fault Causes
- The database on the BAM or the M2000 is lost. As a result, the query result indicates that the data is incorrect.
- Data is partially lost for other reasons when the M2000 fetches data from the BAM.

Troubleshooting Method
After the RSSI Performance Stat measurement is synchronized, the RSSI data of the object is only partially collected, so the problem is not solved completely. The measurement unit must be synchronized manually several times before the lost data is completely restored. This indicates that the BAM data is normal and that the problem may result from the transmission.
Failures also occur when the trace is synchronized, and the data is obtained only after reconnection. The FTP service on the NE side is abnormal from time to time. As a result, some result files fail to be parsed. This is the cause of the loss of result files.
Onsite engineers confirm that the network transmission is unstable.
Temporary mitigation: Synchronize the M2000 data manually.
Root solution: Solve the transmission problem.

Suggestions and Summary
For data loss on the M2000, perform manual synchronization to locate the fault. To synchronize data, do as follows:
Telnet to the M2000 server and obtain the FDN by running the following commands:
root@BladeMaster# isql -Usa -Pemsems -SSYB
1> use omcdb
2> go
1> select fdn, imap_name from BSCNE
2> go
Download the result files matching BTS3900_CARRIER* from the NE to the directory /opt/OMC/var/med/ne+ (NE fdn) /PmModule/ftp/queryfiles.
Locate the measurement unit with the traffic statistics loss in the Performance menu. Click Sync and select the corresponding time segment to perform the synchronization. The M2000 analyzes the traffic statistic files in /opt/OMC/var/med/ne+ (NE fdn) /PmModule/ftp/queryfiles.
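Before clicking Sync, it helps to know which time segments actually lack data. The following is a minimal sketch of that bookkeeping, assuming that the start times of the periods already collected are available as a list and that the measurement period is 30 minutes. Both the function name missing_periods and the 30-minute default are assumptions for illustration, not values taken from the M2000.

```python
from datetime import datetime, timedelta


def missing_periods(start, end, collected, period_minutes=30):
    """List the period start times in [start, end) with no collected result."""
    step = timedelta(minutes=period_minutes)
    have = set(collected)
    gaps = []
    t = start
    while t < end:
        if t not in have:
            gaps.append(t)
        t += step
    return gaps


day = datetime(2013, 11, 1)
collected = [day, day + timedelta(minutes=60), day + timedelta(minutes=90)]
gaps = missing_periods(day, day + timedelta(hours=2), collected)
print([t.strftime("%H:%M") for t in gaps])  # ['00:30']
```

Each returned start time corresponds to a time segment that would be selected for manual synchronization.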

Start Failure of New Subracks
Start Failure of New Subracks
Fault Description
The newly added or newly configured subracks fail to start.

Possible Fault Causes
Possible fault causes are as follows:
Subracks are not set to effective mode.
The internal IP address of the OMU is incorrectly configured.
The Gigabit Ethernet (GE) ports of the SCU boards are not enabled.
Subracks fail to be loaded due to faults in the OMU Ethernet card.

Troubleshooting Method
Incorrect Subrack Mode.
Check the effective mode of subracks on the LMT, as shown in the following figure.

If subracks are not in effective mode, run the command SET CFGDATAEFFECTIVE to set the configuration mode to effective mode.
Incorrect Internal IP Address of the OMU.
Check whether the internal IP addresses of the onsite OMU are correct according to the IP addresses listed in the OMU Administration Guide.

If the internal IP addresses of the onsite OMU are incorrect, change the IP addresses according to the preceding figure.
GE Ports of the SCU Boards Not Enabled.
Run the command LST SCUPORT, and check whether the port on the SCU panel in subrack 0 is enabled. If the GE ports of the SCU boards are not enabled, run the command SET SCUPORT to set the panel ports of the SCU boards in subrack 0 to OPEN.
Fault in the OMU Ethernet Card.
Check whether the ALM-20708 OMU Hardware Fault alarm exists. Then, check whether the internal Ethernet card is faulty and whether the alarm can be cleared. Replace the OMU, and check whether the internal OMU Ethernet card is still faulty.
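The four checks above can be read as an ordered checklist. The sketch below encodes that order. The SubrackChecks fields and the next_action function are hypothetical names introduced for illustration, and the inputs are assumed to have been gathered manually on the LMT rather than read from any real interface.

```python
from dataclasses import dataclass


@dataclass
class SubrackChecks:
    effective_mode: bool      # subrack is in effective mode
    omu_internal_ip_ok: bool  # OMU internal IP addresses match the guide
    scu_ge_ports_open: bool   # SCU panel ports in subrack 0 are enabled
    omu_hw_alarm: bool        # ALM-20708 OMU Hardware Fault is present


def next_action(c):
    """Return the first corrective action suggested by the checklist order."""
    if not c.effective_mode:
        return "SET CFGDATAEFFECTIVE"
    if not c.omu_internal_ip_ok:
        return "Correct the OMU internal IP addresses"
    if not c.scu_ge_ports_open:
        return "SET SCUPORT (set subrack 0 panel ports to OPEN)"
    if c.omu_hw_alarm:
        return "Check the internal OMU Ethernet card"
    return "No listed cause matches"


print(next_action(SubrackChecks(False, True, True, False)))  # SET CFGDATAEFFECTIVE
```

Checking the subrack mode first is a design choice of this sketch; it is the cheapest check and the first one listed in the text.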

3.4 Case Study
Fault Description
During the expansion of a new BSC6900, the new EPS subrack is configured, but the EPS subrack fails to load the data.

Symptom
The RAC is configured incorrectly on the SGSN side.

Possible Fault Causes
The data configuration is incorrect.
The DIP switch setting is incorrect.
The cable connection is incorrect.

Troubleshooting Method
Step 1 Check the configuration data. Run the command LST SCUPORT to make sure that the corresponding SCU port is open. The port is normal.
Step 2 Check the DIP switch setting of the new EPS subrack. The setting is also normal.
Step 3 Check whether the cable connection of the MAC switching subsystem is normal. The connection is normal.
Step 4 Reset the subrack to load the data again. The problem persists.
Step 5 Check the mode of the new EPS subrack. The subrack is in offline mode (INEFFECTIVE MODE). Change the subrack mode to online mode by running the command SET CFGDATAEFFECTIVE, and reset the new subrack again. The problem is solved.

4 Basic Service Problems
4.1 Service Setup Failure
Fault Description
CS/PS services cannot be set up after the cell is set up. Users cannot use the AMR voice service or the PS dial-up service.

Possible Fault Causes
Emergency faults are as follows:
User complaints indicate that the call quality and network access of many users in a live network are affected.
KPIs, such as the call drop rate and the access success rate, deteriorate so severely that the call quality and network access of a large number of users in a live network are affected.
The services in one subsystem, one subrack, or the entire RNC are affected.
All services on one interface board are affected.
Possible causes for non-emergency faults are as follows:
Cell activation failure
Power congestion
Code resource congestion
Transmission congestion
CE congestion
DSP soft failure

Troubleshooting Method
If the fault is not an emergency service problem, you can infer that only some users cannot make calls or access the network. To rectify the fault, do as follows:
Step 1 Check whether the cell status is normal, namely, whether cell setup is normal, whether the cell is available, and whether the cell uplink or downlink traffic is congested. Run the following commands to query the cell status:
DSP UCELL
DSP UCELLCHK
If the cell is not activated, run the command ACT UCELL to activate the cell. If the cell activation fails, run the command DSP UCELL to query the failure cause.

If the cell is successfully set up, run the command LST UCELLACCESSSTRICT to check whether the cell or the user AC is barred. If the cell is barred, unbar the cell.

Run the command DSP UCELLCHK to check whether the cell is congested. Congestion types include power congestion, code resource congestion, transmission congestion, and CE congestion.

Power congestion:
Power congestion is classified into uplink power congestion and downlink power congestion. Usually, algorithm 1 is used as the downlink connection admission control (CAC) algorithm, whereas the uplink CAC algorithm is turned off.
If the access failure is due to uplink power congestion, run the command DSP UCELLCHK to check whether the RTWP is within the valid range (-105.5 dBm to -96.5 dBm). Run the command LST UCELLCAC to check whether the auto-adaptive background noise update switch is ON. If the auto-adaptive background noise update switch is ON, deactivate the cell, and then re-activate the cell to check whether the RTWP is normal.

If the auto-adaptive background noise update switch is OFF, turn off the uplink CAC switch or reset the background noise. You can use the following command to turn off the uplink CAC switch:
MOD UCELLALGOSWITCH: NBMUlCacAlgoSelSwitch=ALGORITHM_OFF;
To reset the background noise, check the average RTWP value (vs-meanrtwp) in the traffic statistics, and set the background noise to the minimum vs-meanrtwp value. The command is as follows:
MOD UCELLCAC: BackgroundNoise=61;
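The BackgroundNoise parameter is an index rather than a dBm figure, so the minimum vs-meanrtwp value must be converted before it is entered. The sketch below assumes the relation noise floor (dBm) = -112 + BackgroundNoise x 0.1. The negative offset is an assumption, chosen because it gives a noise floor of about -106 dBm for the value 61. The function names are hypothetical.

```python
def noise_floor_dbm(background_noise):
    """Convert a BackgroundNoise parameter value to a noise floor in dBm.

    Assumes noise floor = -112 + BackgroundNoise x 0.1 (offset sign assumed).
    """
    return -112 + background_noise * 0.1


def background_noise_param(rtwp_dbm):
    """Convert a measured RTWP in dBm to the nearest BackgroundNoise value."""
    return round((rtwp_dbm + 112) / 0.1)


print(round(noise_floor_dbm(61), 1))   # -105.9
print(background_noise_param(-105.9))  # 61
```

For example, a minimum vs-meanrtwp of -104.3 dBm would map to a BackgroundNoise value of 77 under the same assumption.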

The parameter value depends on the site requirements. By default, it is set to 61.
Actually configured noise floor (dBm) = -112 + BackgroundNoise x 0.1
If the access failure is due to downlink power congestion, run the command DSP UCELLCHK to check whether the latest TCP value is normal (