1 EXAKT SKF Phase 1, Session 3, Part 1 Statistical concepts.


2

Basic data analysis

3

Chronological error detection

Types of errors detected:
1. More than one CM inspection acquired at the same date/time and working age.
2. An inspection having a working age lower than that of an inspection at a previous date/time.
3. Histories with no inspections.
4. Missing ending events (see above).
5. Missing beginning events.
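The first three checks can be sketched in Python. This is a hypothetical illustration, not EXAKT's implementation. The `Inspection` record, the helper names, and passing histories as `(Ident, HN)` pairs are all assumptions made for the example. Checks 4 and 5 are omitted because they depend on the event codes in the Events table.

```python
from datetime import datetime
from typing import NamedTuple


class Inspection(NamedTuple):
    """Hypothetical inspection record: one CM sample."""
    ident: str
    date: datetime
    working_age: float


def chronological_errors(inspections: list[Inspection]) -> list[str]:
    """Flag error types 1 (duplicate inspection) and 2 (decreasing working age)."""
    by_ident: dict[str, list[Inspection]] = {}
    for rec in inspections:
        by_ident.setdefault(rec.ident, []).append(rec)

    errors: list[str] = []
    for ident, recs in by_ident.items():
        recs.sort(key=lambda r: r.date)  # chronological order within one Ident
        for prev, cur in zip(recs, recs[1:]):
            if cur.date == prev.date and cur.working_age == prev.working_age:
                errors.append(f"{ident}: more than one inspection at {cur.date}")
            elif cur.working_age < prev.working_age:
                errors.append(f"{ident}: working age decreases at {cur.date}")
    return errors


def histories_without_inspections(histories: set[tuple[str, int]],
                                  inspected: set[tuple[str, int]]) -> list[tuple[str, int]]:
    """Error type 3: (Ident, HN) histories that have no inspection records."""
    return sorted(histories - inspected)
```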

4

Descriptive statistics – one variable

1. Mean
2. Standard deviation
3. Coefficient of variation
4. Kurtosis
5. Skewness
6. Minimum, Maximum
7. Range
8. Median
9. IntQ (Inter-Quartile) range
10. Distribution (Percentiles)

Select variables for analysis

5

Set conditions for variables:
1. Exclude record
2. Trimming
3. Percentiles
4. Histogram parameters

6

One variable calculations

Measures of central tendency, dispersion and shape:

1. Mean: M = (x1 + x2 + ... + xn)/n
2. Standard deviation: SD
3. Coefficient of variation: V = SD*100%/|M| if M≠0, and not defined if M=0
4. Kurtosis
5. Skewness
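The moment-based statistics listed above can be sketched in Python. The slide's own formulas are not recoverable, so the conventions here are assumptions. The sample standard deviation divides by n - 1. Skewness and kurtosis use the plain moment ratios m3/m2^1.5 and m4/m2^2 - 3 (excess kurtosis), and EXAKT may apply bias corrections. Only the coefficient of variation follows the formula given above.

```python
import math
from typing import Optional


def moment_stats(values: list[float]) -> dict[str, Optional[float]]:
    """Mean, SD, CV, skewness and excess kurtosis (needs 2+ distinct values)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    sd = math.sqrt(sum(d * d for d in dev) / (n - 1))  # sample SD with n - 1 (assumed)
    cv = sd * 100.0 / abs(mean) if mean != 0 else None  # V = SD*100%/|M|; undefined for M = 0
    m2 = sum(d ** 2 for d in dev) / n  # central moments about the mean
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv": cv,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }
```

For example, `moment_stats([2, 4, 4, 4, 5, 5, 7, 9])` gives a mean of 5 and a positive skewness, reflecting the longer right tail.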

Measures of position: median, quartiles and the inter-quartile range, percentiles, minimum and maximum.
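A matching sketch for the measures of position follows. Percentiles are interpolated linearly between order statistics. This is one common convention but an assumption here, because statistical packages differ in how they interpolate, so values may not match EXAKT's report exactly for small samples.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def position_stats(values: list[float]) -> dict[str, float]:
    """Minimum, maximum, range, median, quartiles and inter-quartile range."""
    q1, median, q3 = (percentile(values, p) for p in (25, 50, 75))
    lo, hi = min(values), max(values)
    return {"min": lo, "max": hi, "range": hi - lo, "median": median,
            "q1": q1, "q3": q3, "iqr": q3 - q1}
```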

7

Report

8

Descriptive statistics – two variables

9

Report

10

Data transformations

To apply a statistical function to a group of records, your program should (as above) contain:
1. the name of the output table (MeansVal) where the output variables resulting from the calculation will be stored,
2. the name of the input table (Inspections) to which the transformation is applied,
3. ([Ident]), which means that records are grouped per Ident: all records with the same Ident are placed in the same group, and the transformations are applied separately to each group,
4. definitions of the fields in the output table, using transformation functions (expressions), for example:

AvgT_Gas_Stg3=Avg(T_Gas_Stg3), etc.

11

Other statistical operations

• Skewness(var) calculates the skewness (a statistical measure of the symmetry of the distribution),

• Kurtosis(var) calculates the kurtosis (a statistical measure of the shape of the distribution, also useful in vibration analysis),

• NonNull(var) gives the number of values in the group that are defined, i.e. not missing (zero values are counted),

• Count(condition for var) counts the number of records in the group for which var fulfills the condition.
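The difference between NonNull and Count can be illustrated in Python, with a missing value represented as None. The function names are hypothetical. Treating missing values inside Count as not counted is an assumption.

```python
from typing import Callable, Optional


def non_null(values: list[Optional[float]]) -> int:
    """Like NonNull(var): number of defined values; zeros count, missing (None) do not."""
    return sum(1 for v in values if v is not None)


def count_if(values: list[Optional[float]], condition: Callable[[float], bool]) -> int:
    """Like Count(condition for var): defined values for which the condition holds."""
    return sum(1 for v in values if v is not None and condition(v))
```

With `fe = [0.0, None, 600.0, 450.0]`, `non_null(fe)` counts the zero reading but not the missing sample, while `count_if(fe, lambda v: v > 500)` counts only the 600 reading.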

12

Example transformation programs

FeGt1000 = Inspections[Ident] Where (WorkingAge>1000)(FeAvR=Avg(Fe);FeSDR=StDev(Fe);FeVrC=StDev(Fe)/Avg(Fe)*100;CountBigFe=Count(Fe>500))

will produce the table FeGt1000 from grouped records of the Inspections table, providing some “iron characteristics” of each individual equipment unit. Each group consists of the records with the same Ident whose WorkingAge is greater than 1000. For every group, the following are calculated and stored as separate output fields: the average Fe value (FeAvR), its standard deviation (FeSDR), its coefficient of variation in % (FeVrC), and the number of records in the group with Fe greater than 500 (CountBigFe).
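For readers more familiar with general-purpose languages, the FeGt1000 program corresponds roughly to the following Python sketch over a list of dictionaries. Representing records as dictionaries and using the sample standard deviation for StDev are assumptions. In practice, EXAKT evaluates the program itself.

```python
import statistics
from collections import defaultdict


def fe_gt_1000(inspections: list[dict]) -> dict[str, dict]:
    """Rough equivalent of the FeGt1000 program (hypothetical helper)."""
    groups: dict[str, list[float]] = defaultdict(list)
    for rec in inspections:
        if rec["WorkingAge"] > 1000:                 # Where (WorkingAge>1000)
            groups[rec["Ident"]].append(rec["Fe"])   # Inspections[Ident]: one group per Ident
    table = {}
    for ident, fe in groups.items():
        avg = statistics.mean(fe)
        sd = statistics.stdev(fe) if len(fe) > 1 else None  # sample SD needs 2+ values
        table[ident] = {
            "FeAvR": avg,
            "FeSDR": sd,
            "FeVrC": sd / avg * 100 if sd is not None and avg != 0 else None,
            "CountBigFe": sum(1 for x in fe if x > 500),  # Count(Fe>500)
        }
    return table
```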

13

Another EXAKT program:

Stat03 = Inspections[Ident] WHERE ((Ident="5506L") OR (Ident="5506R") OR (Ident="5507L"))(avFe = Avg(Fe);rangeFe = Max(Fe) - Min(Fe))

will produce the table Stat03 with the two columns avFe and rangeFe, but only for three idents, specified in the “WHERE” condition.

14

Another EXAKT program:

Stats04 = Events[Ident] Where (Event="ES")(WaES = Avg(WorkingAge))

will calculate the average working age of suspended histories (ended by ES) with the same Ident (information about the WorkingAge and suspension is in the Events table). If there is only one history for an Ident, it will give its working age. The result from this program is obviously useful only if all histories have WorkingAge starting from 0 (zero) (this problem can be solved using history specific transformations, see next section).

Stats04 = Events[] Where(Event="ES")(WaES = Avg(WorkingAge)) will calculate the average of all working ages in the Events table at which histories ended by ES. Here "[]" (nothing specified within the brackets) means that all records from the table are included in one group (no particular grouping is specified). The Where expression adds the condition that the event must be ES. This program will produce one single number, because all records are now in one group.
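The difference between Events[Ident] and Events[] can be sketched in Python. The grouping key decides the groups, and a constant key puts every selected record into a single group. The record layout and helper name are assumptions.

```python
from collections import defaultdict


def avg_working_age_es(events: list[dict], per_ident: bool = True) -> dict:
    """Average WorkingAge of ES events, per Ident ([Ident]) or over one group ([])."""
    groups: dict = defaultdict(list)
    for ev in events:
        if ev["Event"] == "ES":                          # Where (Event="ES")
            key = ev["Ident"] if per_ident else "all"   # [] -> a single group
            groups[key].append(ev["WorkingAge"])
    return {key: sum(ages) / len(ages) for key, ages in groups.items()}
```

With per_ident=True the result has one average per Ident. With per_ident=False it has a single entry, the one number described above.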

15

History specific transformations

// OutputVarScript from Tutorial 3
PrevSed1=Sed-Diff(Sed); //transforms the value of the sediment into its previous value
CorrSed1=Sed*(Sed>0)+(Sed=0)*PrevSed1; //use the actual value for Sed>0, otherwise use the previous Sed
PrevSed2=CorrSed1-Diff(CorrSed1); //transforms CorrSed1 into its previous value
CorrSed=CorrSed1*(CorrSed1>0)+(CorrSed1=0)*PrevSed2; //gets rid of any remaining 0’s (two consecutive missed tests)
// general transformations
LogSed=Log(1+CorrSed); LogFe=Log(1+Fe); //simple general transformations
CorrSi=Si*(Si<>900)+1.2*Fe*(Si=900); //convert all Si = 900 to 1.2 x Fe

PrevSed1=Sed-Diff(Sed);

Diff(Sed) calculates the difference between the current and the previous value of the variable Sed. With this transformation we create a new column, "PrevSed1", of transformed values. Each new value equals the current value of Sed minus the change in Sed since the previous oil sample, which is simply the previous value of Sed. We had a reason for making this transformation, as is apparent on the next line:

CorrSed1=Sed*(Sed>0)+(Sed=0)*PrevSed1;

The logical expressions (Sed>0) and (Sed=0) evaluate to 1 or 0. As a result, CorrSed1 keeps Sed when it is positive and replaces a zero (missed) sediment test with the previous value. Using this technique, we deal reasonably with missing sediment tests.
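The sediment correction can be mirrored in Python for one history, with the logical expressions evaluating to 1 or 0 as in the script. One assumption is labelled in the code. How Diff treats the first record of a history is not stated here, so the sketch returns 0 there, which makes PrevSed1 equal to Sed on that record.

```python
def diff(xs: list[float]) -> list[float]:
    """Diff(var) within one history: current minus previous value.

    ASSUMPTION: the first record of the history gets 0.
    """
    return [0.0] + [b - a for a, b in zip(xs, xs[1:])]


def correct_sediment(sed: list[float]) -> list[float]:
    """CorrSed for one chronologically ordered history (booleans act as 1/0)."""
    prev1 = [s - d for s, d in zip(sed, diff(sed))]                      # PrevSed1
    corr1 = [s * (s > 0) + (s == 0) * p for s, p in zip(sed, prev1)]     # CorrSed1
    prev2 = [c - d for c, d in zip(corr1, diff(corr1))]                  # PrevSed2
    return [c * (c > 0) + (c == 0) * p for c, p in zip(corr1, prev2)]   # CorrSed
```

With `sed = [0.2, 0.0, 0.0, 0.3]` both consecutive missed tests are filled with 0.2. A run of three zeros would still leave one zero, because the script only looks two samples back.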

16

History specific transformations

Have two purposes:
1. To include new (transformed) variables in the list of diagnostic variables as an input for modeling, and
2. To create a table which includes transformed variables to be used for purposes other than an input for modeling, for example for statistical analyses.

17

History specific transformations

History specific transformations require:
1. Chronologically ordered data, and
2. The history number, an identification of separate histories.

Histories need not be identified in the input data tables. EXAKT will use the installation and removal dates from the Events table to define and apply the variable HN (history number) to each history.

All records in the first history of a component at a given location will have HN = 1, in the second history HN = 2, and so on.

Ident and HN together uniquely identify each history. The HN variable will be calculated and included in the intermediate data tables: C_Inspections, C_Events, and History.
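The following is a simplified sketch of how a history number could be derived for one Ident, where HN is the number of installations on or before the inspection date. This is an assumption for illustration. EXAKT also uses the removal dates, which this sketch ignores. An HN of 0 here would flag an inspection dated before the first installation.

```python
from bisect import bisect_right
from datetime import date


def assign_hn(install_dates: list[date], inspection_dates: list[date]) -> list[int]:
    """HN for each inspection of one Ident (hypothetical helper)."""
    installs = sorted(install_dates)
    # bisect_right counts the installations dated on or before each inspection
    return [bisect_right(installs, d) for d in inspection_dates]
```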

18

The basic history transformation functions are:

1. First(var), Last(var)
2. Diff(var), Rate(var, timevar), RateD(var, timevar)
3. Cum(var), CumRate(var, timevar), NonDecr(var)
4. SmoothAve(var, timevar, const), SmoothLWAve(var, timevar, const)
5. SmoothQWAve(var, timevar, const), SmoothLin(var, timevar, const)
6. SmoothLWLin(var, timevar, const), SmoothQWLin(var, timevar, const)
7. Smooth(var, const)

Example: Oil age

HWAge = WorkingAge-First(WorkingAge);
OilAge = HWAge-NonDecr(HWAge*(P = precedence for OC))

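NonDecr(var) gives the largest value in the history, up to and including the current record. HWAge is the working age measured from the start of the history. The product HWAge*(...) is nonzero only at oil changes (OC), so its NonDecr value is the HWAge at the most recent oil change. Subtracting it from HWAge therefore gives the age of the oil.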

[Figure: HWage versus t, with points labelled B and OC]
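The oil-age transformation can be sketched in Python. A hypothetical list of booleans, `oil_change`, stands in for the oil-change condition written as P on the slide. First and NonDecr are implemented as described above.

```python
def first(xs: list[float]) -> list[float]:
    """First(var): the first value of the history, repeated for every record."""
    return [xs[0]] * len(xs)


def non_decr(xs: list[float]) -> list[float]:
    """NonDecr(var): largest value so far, up to and including the current record."""
    out, best = [], float("-inf")
    for x in xs:
        best = max(best, x)
        out.append(best)
    return out


def oil_age(working_age: list[float], oil_change: list[bool]) -> list[float]:
    """OilAge = HWAge - NonDecr(HWAge * oil_change) for one history."""
    hw_age = [w - f for w, f in zip(working_age, first(working_age))]  # HWAge
    at_oc = [h * oc for h, oc in zip(hw_age, oil_change)]  # HWAge at oil changes, 0 elsewhere
    return [h - m for h, m in zip(hw_age, non_decr(at_oc))]  # age since the last oil change
```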

19

Getting rid of the drooping artifact of Tutorial 4

leakSmooth0=Smooth(LeakRate,WorkingAge,3);

The smoothing of LeakRate uses a smoothing window of three time units. Each value of LeakRate is transformed into the value it would have on a linear regression line fitted to all the points in the 3-time-unit window, and the result is stored in leakSmooth0.

However, there is a problem. Near the end of the history, the regression line is fitted to zeros that fall within the 3-time-unit window but beyond the end of the data. This causes the transformed values to decrease artificially. We have to do something about this. The next transformation, which produces leakSmooth, solves this problem.
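The droop can be reproduced with a small Python sketch of windowed linear-regression smoothing. The exact window rule of EXAKT's Smooth is not given here, so the details are assumptions. The window is centred on each point (±1.5 time units for a 3-unit window), and the series is padded with zeros after the last record at the same spacing.

```python
def local_linear(ts: list[float], ys: list[float], t: float, half_width: float) -> float:
    """Value at t of a least-squares line fitted to the points within ±half_width of t."""
    pts = [(x, y) for x, y in zip(ts, ys) if abs(x - t) <= half_width]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:  # a single point: no slope can be fitted
        return my
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return my + sxy / sxx * (t - mx)


def smooth_zero_padded(ts: list[float], ys: list[float], window: float, step: float) -> list[float]:
    """ASSUMPTION: zeros are appended past the last record, as in the droop described above."""
    half = window / 2
    pad_ts, pad_ys = list(ts), list(ys)
    t = ts[-1] + step
    while t <= ts[-1] + half:
        pad_ts.append(t)
        pad_ys.append(0.0)
        t += step
    return [local_linear(pad_ts, pad_ys, t, half) for t in ts]
```

For a constant leak rate of 5.0 recorded at t = 0, 1, ..., 10, the interior values stay at 5.0 but the last value drops to about 3.33, which is the artificial droop.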

20

Getting rid of the drooping artifact of Tutorial 4

leakSmooth=leakSmooth0*(Last(WorkingAge)-WorkingAge>=3) +NonDecr(leakSmooth0)* ((3-(Last(WorkingAge)-WorkingAge))*.01+1)*(Last(WorkingAge)-WorkingAge<3);

The first term is operative for all points that are located 3 time units or more back from the last point in the history. The second term applies to all points within 3 time units of the last point in the history. It may be simplified as:

NonDecr(leakSmooth0)*Factor*(Last(WorkingAge)-WorkingAge<3);

It reads: the largest value of leakSmooth0 so far, multiplied by some factor, where we are less than three time units from the end. The factor is ((3-(Last(WorkingAge)-WorkingAge))*.01+1). Its values are:
at the end: (3 - 0)*.01 + 1 = 1.03
1 back from the end: (3 - 1)*.01 + 1 = 1.02
2 back from the end: (3 - 2)*.01 + 1 = 1.01
3 or more back from the end: the second term is switched off and the first term, leakSmooth0, applies.

This guarantees that leakSmooth will not droop: near the end it is the running maximum of leakSmooth0 multiplied by a factor that grows slightly toward the last point.
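The correction can be sketched in Python on a list of already smoothed values. The window of 3 and the 0.01 increment are the constants from the leakSmooth expression. The function name and the example input are hypothetical.

```python
def remove_droop(ts: list[float], smoothed: list[float],
                 window: float = 3.0, increment: float = 0.01) -> list[float]:
    """Mirror of the leakSmooth expression for one history (hypothetical helper)."""
    last = ts[-1]                       # Last(WorkingAge)
    out, running_max = [], float("-inf")
    for t, s in zip(ts, smoothed):
        running_max = max(running_max, s)      # NonDecr(leakSmooth0)
        gap = last - t                         # Last(WorkingAge) - WorkingAge
        if gap >= window:
            out.append(s)                      # first term: keep leakSmooth0
        else:
            factor = (window - gap) * increment + 1
            out.append(running_max * factor)   # second term: running max times factor
    return out
```

Applied to a drooping series such as ten values of 5.0 followed by 3.33, the last three points become 5.05, 5.10 and 5.15 instead of falling.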

21

The statistical model

consists of:

1. the PHM, and

2. the transition probability model

22

The PHM

• γi = 0 means that the related covariate does not have any significant influence on the hazard function.

• γi ≠ 0 means that the related covariate has a significant influence on the hazard function.

• β = 1 indicates that age as a variable does not influence the hazard function.

• β ≠ 1 means that time directly contributes to the hazard function. In particular, β > 1 means that time contributes to the deterioration of an item, and β < 1 means that time contributes to its improvement (a decreasing hazard function). Note that in the latter case it does not necessarily mean that the item will improve with working age, because the covariate values can still indicate an overall deterioration.

• The scale parameter η scales the time axis. If equipment deteriorates or does not improve with time (β ≥ 1), then η gives roughly the average time to failure if the items stay in good condition (covariates have no effect). With influential covariates, η no longer gives the average time to failure and can differ substantially from it. In this case there is no simple formula for the average remaining life, but the program calculates it (RULE).
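For reference, the example model that follows is an instance of the Weibull PHM. Its hazard combines a Weibull baseline in working age t with an exponential covariate term. The general form, with m covariates Z_i(t), is:

```latex
h\big(t, Z(t)\big) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}
\exp\left(\sum_{i=1}^{m} \gamma_i Z_i(t)\right)
```

With all γi = 0 it reduces to an ordinary Weibull hazard, and with β = 1 the baseline term is the constant 1/η, in line with the bullets above.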

23

An example:

$$h(t) = \frac{4.166}{43560}\left(\frac{t}{43560}\right)^{4.166-1} e^{0.1467\,Z_1(t) + 0.012\,Z_2(t)}$$

The shape parameter β = 4.166 indicates a strong influence of working age on deterioration (β ≥ 2 can be considered a strong influence).

Covariate parameters γ1= 0.1467 and γ2= 0.012. A small value for a parameter does not necessarily mean the corresponding covariate will have a small effect on the hazard function. A formal statistical test, applied by EXAKT, is needed to judge whether a covariate parameter is significantly different from zero.

Scale parameter value η = 43560h is much higher than the average life, calculated from the data as about 6500h. This shows a strong influence of the covariates on the equipment's actual average life. The formula for the average life when accounting for significant covariates is not simple. EXAKT performs the calculation of (conditional) average life from the model parameters and current condition.
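The fitted example model can be evaluated directly. The Python sketch below only restates the equation with the slide's parameter values. The covariate values used with it are hypothetical.

```python
import math

# Parameter values from the example model above
BETA = 4.166             # shape parameter
ETA = 43560.0            # scale parameter (hours)
GAMMA = (0.1467, 0.012)  # covariate parameters gamma_1, gamma_2


def hazard(t: float, z: tuple[float, float]) -> float:
    """h(t) = (beta/eta) * (t/eta)**(beta - 1) * exp(gamma_1*Z1 + gamma_2*Z2)."""
    baseline = (BETA / ETA) * (t / ETA) ** (BETA - 1)
    return baseline * math.exp(sum(g * zi for g, zi in zip(GAMMA, z)))
```

The ratio `hazard(t, z) / hazard(t, (0.0, 0.0))` equals exp(0.1467·Z1 + 0.012·Z2) at any age t. Because of this, a covariate with a small parameter such as γ2 can still change the hazard noticeably when its values are large, as noted above.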

24

PHM summary

“Shape = 1 tested” indicates that, for the shape parameter β, the software tested the hypothesis that β = 1.

The Wald test checks hypotheses about the estimated parameters, for example whether the difference between an assumed and an estimated parameter value is significant, and reports an appropriate p-value.

For every covariate parameter γi in the model, the hypothesis that γi = 0 was tested automatically by EXAKT using the Wald test.
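The Wald test for a single parameter can be sketched in Python. The estimate minus the hypothesized value, divided by its standard error, gives z. The statistic W = z² is then compared with a chi-square distribution with one degree of freedom, which gives the same p-value as a two-sided normal test on z. The standard error in the usage example is hypothetical, because EXAKT reports its own.

```python
import math


def wald_test(estimate: float, std_error: float, hypothesized: float = 0.0) -> tuple[float, float]:
    """Return (W, p) for H0: parameter == hypothesized."""
    z = (estimate - hypothesized) / std_error
    w = z * z                                  # Wald statistic, chi-square(1) under H0
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value
    return w, p
```

For γ1 = 0.1467 with a hypothetical standard error of 0.05, `wald_test(0.1467, 0.05)` gives a p-value well below 0.05, so γ1 = 0 would be rejected at the 5% level. The shape test uses `hypothesized=1.0`.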

25

Residual analysis

• Graphical techniques and statistical tests are provided in EXAKT for the purpose of uncovering evidence that some data points may not be well represented by the estimated model.

• The method of Cox-generalized residuals is applied to test model fit. A residual r1, r2, ... is calculated for every event time t1, t2, ... (either failure or suspension). The procedure then checks whether the residuals r1, r2, ... themselves follow a negative exponential distribution with mean 1, as would be expected if the model fits the data.
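A Cox-generalized residual is the cumulative hazard accumulated up to the event time. For the Weibull PHM this can be sketched in Python. The sketch assumes (hypothetically) that each covariate value holds from its inspection until the next one, and EXAKT's own treatment of covariates between inspections may differ.

```python
import math


def cox_snell_residual(event_time: float,
                       inspections: list[tuple[float, tuple[float, ...]]],
                       beta: float, eta: float, gamma: tuple[float, ...]) -> float:
    """Cumulative hazard up to event_time, Weibull PHM, piecewise-constant covariates.

    inspections: (time, covariates) pairs sorted by time, the first at time 0.
    """
    total = 0.0
    for k, (start, z) in enumerate(inspections):
        if start >= event_time:
            break
        end = inspections[k + 1][0] if k + 1 < len(inspections) else event_time
        end = min(end, event_time)
        baseline_increment = (end / eta) ** beta - (start / eta) ** beta
        total += baseline_increment * math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    return total
```

With no covariate effect, an event at t = η gives a residual of exactly 1, the mean of the unit exponential distribution.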

26

K-S Test

• The K-S (Kolmogorov-Smirnov) test, an overall test for evidence that some data points may not be well represented by the model, is performed automatically by EXAKT. Its hypothesis is 'model fits data'. The report of Figure 11 indicates whether this hypothesis is accepted or rejected by the K-S test.
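The K-S statistic behind this check can be sketched in Python for Cox-generalized residuals compared with a unit exponential distribution. The sketch assumes that all residuals come from failures. Suspensions produce censored residuals, which need a modified procedure not shown here. The critical value for the accept/reject decision is also left out.

```python
import math


def ks_statistic_exponential(residuals: list[float]) -> float:
    """Largest gap between the empirical CDF of the residuals and the unit exponential CDF."""
    xs = sorted(residuals)
    n = len(xs)
    d = 0.0
    for i, r in enumerate(xs, start=1):
        cdf = 1.0 - math.exp(-r)  # unit exponential CDF at the i-th smallest residual
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d
```

Residuals placed exactly at the exponential quantiles give the smallest possible statistic, 0.5/n, while residuals bunched far from those quantiles give larger values.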

27

5% Significance level

A statistical concept from hypothesis testing. It is defined as the probability of deciding to reject the null hypothesis (here, that the model fits the data) when the null hypothesis is actually true (also known as a "Type I error"). The decision is often made using the p-value: if the p-value is less than the significance level, then the null hypothesis is rejected. The smaller the p-value, the more significant the result is said to be.

Popular levels of significance are 5%, 1% and 0.1%. Results with a p-value below the significance level are informally referred to as "statistically significant". For example, if someone argues that "there's only one chance in a thousand this could have happened by coincidence", a 0.1% level of statistical significance is being implied. The lower the significance level, the stronger the evidence required to reject the null hypothesis.

A small significance level has both advantages and disadvantages. A smaller significance level gives greater confidence in the determination of significance. However, it runs a greater risk of failing to reject a false null hypothesis (also known as a "Type II error"), and so has less statistical power. The selection of a significance level inevitably involves a compromise between significance and power, and consequently between the Type I error and the Type II error.

When EXAKT indicates that a model is "Rejected at the 5% significance level" (or "Not rejected at the 5% significance level"), it means that the test was set up so that a model that really does represent the data would be wrongly rejected only 5% of the time (the Type I error rate).
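The meaning of a 5% significance level can be checked with a small simulation. When the null hypothesis is true by construction, a test at the 5% level should reject it in about 5% of repetitions. The z-test on normally distributed samples is a stand-in chosen for simplicity, not the test EXAKT applies.

```python
import math
import random


def type_i_error_rate(alpha: float = 0.05, trials: int = 20000, n: int = 10, seed: int = 1) -> float:
    """Fraction of trials in which a true null hypothesis (mean 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true: the mean really is 0
        z = sum(sample) / n * math.sqrt(n)                 # z statistic with known sigma = 1
        p = math.erfc(abs(z) / math.sqrt(2))               # two-sided p-value
        rejections += p < alpha                            # a Type I error when True
    return rejections / trials
```

Rerunning with `alpha=0.01` lowers the rejection rate to about 1%, which illustrates the trade-off described above.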

28

Derived covariates

This will significantly increase the speed of calculation.

29

Derived covariates - lags

Whenever the option "Add Derived Covariates" is used, the program saves the previous model without change. However, it includes a new command line in the EXAKT program OutputVarScript. EXAKT then creates a new submodel name and returns the user to the model building step. The newly created covariate now appears in the list of covariates available for modeling. Keep adding derived covariates in this way.
