Page 1: [IEEE 2011 IEEE Conference on Prognostics and Health Management (PHM) - Denver, CO, USA (2011.06.20-2011.06.23)] 2011 IEEE Conference on Prognostics and Health Management - Diagnosis

Diagnosis of multi -descriptor condition monitoring data

Veli Lumme

Institute of Machine Design and Operation Tampere University of Technology

Tampere, Finland veli.lumme@tut.fi

Abstract-The condition of equipment can be presented by a series of descriptors derived from the raw data. Typically a great number of descriptors are needed, and they might not be commensurable. Neural networks can be used effectively as a diagnostic tool to analyze the data for anomalies and known faults. Proper pre-processing of descriptors related to a specific machine condition offers an opportunity to automatically learn typical failure patterns and to use this experience to diagnose similar conditions in other machines operating in comparable environments. It is important to understand that the descriptors contain information not only on the type of the fault, but on its severity as well. Therefore the prognosis of failure severity can be based on experimental data instead of an imprecise statistical approach. This paper presents several patented solutions for automating the diagnostic and prognostic processes using neural networks.

Keywords-diagnosis, prognosis, neural networks, SOM

I. INTRODUCTION

Vibration data contains concealed information, which can be revealed by using a number of analysis techniques, such as spectrum analysis, envelope detection and symptom extraction, after which the information can be expressed as a series of numerical values sometimes referred to as symptom vectors. It can be said that the symptom vectors contain condensed information on the machine condition at the time of the data collection.

Other kinds of knowledge can now be used in diagnosis. We know that some symptoms, or actually syndromes, mean that some faults or other problems exist in the object under investigation. How do we know? Knowing something is the result of strong previous experience. We have evidence from previous occurrences of similar cases and can use this knowledge to interpret any closely resembling syndromes to reach the same diagnosis.

Sometimes there is no strong previous evidence available. If the syndrome somewhat resembles any of the previous cases, an assumption can be made, with some degree of uncertainty or probability, that it still belongs to the same population of syndromes and should have the same interpretation. Small changes in symptoms, which are common in an industrial environment, are allowed without a syndrome losing its interpretation. After several similar cases the uncertainty is reduced.

978-1-4244-9827-7/11/$26.00 ©2011 IEEE

In other cases, the syndrome can be significantly different from what has been experienced before. It is novel to the expert, and he cannot directly interpret it as any known machine condition. Creative intelligence is now necessary. Sometimes strong evidence becomes available only a long time after the syndrome was first detected.

Combining knowledge and intelligence becomes even more important when trying to estimate the severity of a fault or a problem. Two faults seldom develop in the same manner, even though the conditions might be almost the same. It is also typical that two faults, one perhaps being the primary cause and the other a consequence, occur at the same time. This makes the use of rules extremely difficult in an automated system. If, in addition, the monitoring is performed in a remote location, the supporting data and information that are available at the plant do not exist in the place where the diagnosis and prognosis are performed.

For definition of terms used in this paper, see references [1] and [2].

II. CLASSIFIER

A. Training a classifier

The purpose of classification is to locate the class in which a data sample finds the closest resemblance to similar samples. In order for the classifier to work, it must first be trained with known data samples. As a part of this process a number of classes will be created. Each class represents a generalization of the data samples used to train it.

During the learning process a weight vector will be generated for each class. The weight vector in a hyperspace will reside at the geometric centre of the samples used for training and will be used to find the closest class in a testing process. The classes will also have boundaries based on the samples used.

An example of a two-dimensional classifier is given in Figure 1, which shows the forming of three classes with their geometric centers (shown as circles) and boundaries. The three solid lines in the example give the class boundaries that have been formed using the shortest Euclidean distance method. These boundaries can be used to define the best matching class for any new data sample. More precise class boundaries can be


formed based on the training data samples (shown as solid rectangles). These boundaries are shown as dotted rectangles.
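The training step described above, computing a geometric centre (weight vector) and per-variable boundaries for each class and then classifying by shortest Euclidean distance, can be sketched as a minimal nearest-centroid classifier. This is an illustrative reconstruction, not the paper's implementation; the function names and data layout are assumptions.

```python
import math

def train_classes(samples_by_class):
    """For each class, compute a weight vector (geometric centre of its
    training samples) plus per-dimension min/max boundaries."""
    classes = {}
    for label, samples in samples_by_class.items():
        dims = len(samples[0])
        centre = [sum(s[d] for s in samples) / len(samples) for d in range(dims)]
        lo = [min(s[d] for s in samples) for d in range(dims)]
        hi = [max(s[d] for s in samples) for d in range(dims)]
        classes[label] = {"centre": centre, "min": lo, "max": hi}
    return classes

def best_matching_class(classes, sample):
    """Return the label of the class whose centre is nearest in the
    shortest-Euclidean-distance sense."""
    def dist(centre):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(centre, sample)))
    return min(classes, key=lambda lbl: dist(classes[lbl]["centre"]))
```

For example, with two well-separated 2-D classes, a new sample near one cluster is assigned to that cluster's class.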


Figure 1. Neuron weight factors and boundaries

B. Testing with a classifier

After training, a classifier can be used directly to interpret any new data samples. The best matching class (neuron) shall first be located. If the individual symptoms of the data sample fall within the class boundaries, the sample is considered known, and the interpretation shall be the same as for the other members (training samples) of the class. The sample might fall just outside the boundaries but could perhaps still be considered to belong to the same class; the class boundaries should then be updated accordingly. If the data sample does not appear to belong to any of the classes, it is considered to represent an unknown or novelty state.


Figure 2. Testing of a classifier

The classifier gives information on which class the new data sample belongs to and whether it has any novel features. In case the class has been calibrated, the same interpretation is automatically also given to the data sample.

C. Re-training of a classifier

In a novelty case there are usually several data samples available that can be used to retrain the classifier by appending new classes. After retraining, the classifier can identify and diagnose new similar samples.


Figure 3. Re-training a classifier

D. Characteristics of a self-organizing map


The self-organizing map displays the neurons (classes) so that neurons whose weight vectors are nearly the same are organized close to each other [3]. The neurons with the greatest deviation are organized in opposite corners of the map.

The map is a two-dimensional presentation of multi-dimensional data in hyperspace. Typically the system attempts to use all of the neurons efficiently, which means that the data is scattered over the whole area of the map. Depending on the data there might be clusters of neurons, which can be identified by the similarity of their weight factors. At first glance it may appear that the map is fully occupied after training with the initial data. However, when the classifier is re-trained with new data, clusters of closely neighbouring (resembling) neurons will be organized so that a smaller number of neurons is occupied. The appended data from a novelty condition will thus make room for new classes on the map. Theoretically this means that the boundaries of the old classes will become broader.

Some neurons have not learned from any of the data samples directly. They are typically displayed with a white background and should not be used for testing purposes. These untrained neurons tend to form a border between clusters of neurons deviating significantly from each other.


Each neuron will have a weight factor and a minimum and maximum value for each variable. The weight factors shall be used in a testing process to locate the best matching neuron for a new sample. The minima and maxima shall be used to test whether the variables of the new sample fit within the boundaries of the neuron. If so, the new sample is considered to belong to the same group of data samples used for training. If any of the variables falls outside the boundaries, the new sample has novelty features.
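The per-variable boundary test described above can be sketched as follows. This is an illustrative reconstruction; the dictionary representation of a neuron is an assumption.

```python
def is_novel(neuron, sample):
    """A sample has novelty features if any of its variables falls
    outside the neuron's per-variable min/max boundaries, which were
    learned from the training samples."""
    return any(v < lo or v > hi
               for v, lo, hi in zip(sample, neuron["min"], neuron["max"]))
```

A sample inside all boundaries belongs to the neuron's group; one variable outside suffices to flag novelty.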

A tree-structured SOM (TS-SOM) has been developed to search for the best matching neurons more quickly. It consists of a selectable number of neurons in powers of four (1, 4, 16, 64, etc.). In the tree structure the top neuron has four children, each of which again has four children, and so on. The search is accomplished by traversing the tree-structured classifier hierarchically from the top down, both during the training and the testing processes.


Figure 4. A Tree-Structured Self-Organizing Map (TS-SOM)

For example, on a map with 64 neurons a brute-force search needs 64 comparisons to find the best matching neuron (BMN). When using the hierarchical search, we first locate the BMN on the second layer, which takes four comparisons. We then locate the BMN among its children on the third layer, which takes another four. Finally we search the lowest layer, bringing the total to only 12 comparisons. This efficiency is emphasized during the training process, which involves several iterative search loops.

III. DATA PRE-PROCESSING

A. Challenges

There are several general challenges that need to be considered in data analysis before inputting the data in the classification. These include missing or erroneous data, missing feature values, irrelevant data, random values, outliers etc. The data analysis may lead to wrong conclusions, if these aspects are not recognized and handled properly.

An additional concern is the presentation of features in two or more different units. In condition monitoring applications we may wish to analyze vibration velocity and acceleration values, or vibration and process values, simultaneously. Some feature values may vary between 0.1 and 5 and others between 20 and 100. Without data pre-processing the former values will probably lose their significance in the analysis. In order to take advantage of all feature values, the data should be normalized.

By definition a symptom is a perception made by means of features (descriptors), which may indicate the presence of one or more faults with a certain probability. In other words, if a feature value stays stable, it does not indicate the presence of a fault and therefore is not a symptom.

One of the goals of this project is to generalize the data collected from different objects. This would allow using the data related to a specific fault in a single machine to benefit other machines in comparable conditions. Machines are individual, and the feature values may not be the same even if the conditions are equivalent. Use of symptom values instead of feature values will likely solve this problem.

B. Missing data

In an ideal situation a complete set of data, including all faults in various stages of progression for all machines, is available. This way the system could be trained to identify all potential states. In practice this is impossible, and fault data in particular will typically be missing in the beginning. It is also more than likely that not all of the normal states can be experienced during the initial training period. Due to the ability to retrain the system, this is not really a major problem. Whenever data that was initially missing is tested with the system, it will appear as a novelty and as such will draw attention.

C. Erroneous data

Errors in data might be generated during data acquisition and collection. Erroneous data is difficult to handle; in fact the neural network system as such has no means to detect errors in the data. The errors may, however, cause novelty observations and would then be investigated closely. Data values should be evaluated during pre-processing, and any abnormal set of data should be discarded from further processing. In a condition monitoring application the changes in data values between successive measurements are typically small; even during a fault progression the feature values change slowly. If a significant change or deviation is detected, a re-measurement could perhaps be taken. If the change is not permanent, the data should be discarded.

D. Missing feature values

Some of the feature values might be missing from the data set for various reasons. It is not desirable to discard the whole data set because of this. These values should not be replaced by zero, which is a definite value; this would result in errors during data analysis. The system allows inserting a Not a Number (NaN) value if a feature value is not available. A NaN value causes no operations to be performed for the variable.
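One way to honour the "no operations for NaN" rule during classification is a distance measure that simply skips missing dimensions. This is a hedged sketch of the idea, not the system's actual implementation:

```python
import math

def nan_aware_distance(a, b):
    """Euclidean distance that ignores any dimension where either value
    is NaN, so a missing feature contributes nothing to the result."""
    pairs = [(x, y) for x, y in zip(a, b)
             if not (math.isnan(x) or math.isnan(y))]
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))
```

With this measure, replacing a missing value by NaN instead of zero leaves the remaining variables' contributions unchanged.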

E. Outliers

Sometimes a collected value looks random; it might deviate significantly from successive values taken in comparable situations in the same position. In such a case, the measurement should be re-taken or the value replaced by NaN.

F. Imprecise values

The values may be imprecise for several reasons. Some of the values may suffer from poor dynamic range. This causes low amplitude values to fluctuate relatively strongly from the


average value. A small change in the actual value causes a significant change in the collected value. This may result in unexpected classification results depending on the type of normalization.

G. Irrelevant data

In some cases the collected data may include samples that are not representative of the current condition of a machine. In particular this happens at low speeds. The system can handle irrelevant data, but neurons in the classifier will be unnecessarily occupied.

H. Normalization

The purpose of normalization is to isolate statistical error in measured data by making it commensurable. Normalization refers to the division of multiple sets of data by a common variable in order to negate that variable's effect on the data, thus allowing the underlying characteristics of the data sets to be compared. This brings data on different scales to a common scale.

It offers several benefits:

• values in different units are made commensurable

• values with different ranges of change are equalized

• symptoms are extracted from descriptor values

• data from various sources is generalized

Berthold & Hand advise using the standard score method with principal component analysis [4]. In statistics, a standard score indicates how many standard deviations an observation or datum is above or below the mean. It is a dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. The data can be normalized with a standard score by carrying out the following steps.

Centre each variable by subtracting the mean of each variable to give

z'_ij = x_ij - x̄_j (1)

Divide each element by the standard deviation of the variable

z_ij = z'_ij / s_j (2)

The standard score method was tested with SOM and found successful. The method in fact makes the data commensurable by being dimensionless and by equalizing the ranges of change. With the centring of the variables the original feature values x_ij turn into symptom values z_ij.
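The centring and scaling steps amount to standard-score (z-score) normalization applied per variable, which can be sketched as follows (illustrative; a column-wise data layout and population standard deviation are assumed):

```python
import math

def standard_score(columns):
    """Normalize each variable (column) to zero mean and unit standard
    deviation, turning feature values x_ij into symptom values z_ij."""
    normalized = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        normalized.append([(x - mean) / std for x in col])
    return normalized
```

After this transformation, a variable ranging 0.1 to 5 and one ranging 20 to 100 contribute on an equal footing to the classifier.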

IV. GENERALIZATION

The estimation of the condition of an individual machine is difficult because the reference data are generally deficient. The amount of existing measured data is scant, especially in the initial phase, and such data are not available for all operational states of the machine. Therefore the determination of a deviation cannot be based on historical data; general information has to be utilized instead. Consequently, the determination of a deviation is uncertain and inaccurate.

The absence of empirical data is a special disadvantage in the identification of faults. In practice, identification of faults is based on known symptom rules, which have been widely published. In many cases the rules are of a general nature and are not based on the measured values obtained from the machine in question or on symptoms extracted from them. Finnish patent FI 102857 discloses a method whereby a system can be made to learn from measurement results [5]. However, the problem is that in order to perform a fault diagnosis, an individual machine must first experience all the faults that the system is expected to identify. This is not possible in practice.

This is one of the reasons why no viable remote diagnostics systems have been developed. Even where systems of this type do exist, they are only capable of solving simple diagnosing tasks based on a single symptom; they are unable to handle syndromes.

European patent 1292812 discloses a method to eliminate the above mentioned disadvantages. Measurements are performed on a maximal number of preferably but not necessarily identical machines or machines of substantially the same type to obtain quantities descriptive of the operation and condition of machines.

The essential point about the use and functionality of the patented method is that the database should be as large as possible and should contain characteristic vectors descriptive of the different operational states of the machines in question over as large an area of application as possible.

V. PROBABILITY OF A FAILURE

There are several points that should be taken into consideration when the probability of a failure is evaluated. This is equally important when using a neural classifier for fault mode and severity assessment.

First, it is a common assumption that the probability of a failure is proportional to the magnitude of the descriptors. In many cases this might be true, but more often the fault progression can be seen as a change in the symptom distribution, i.e. in the syndrome. Depending on the fault mode, some symptom values may increase, stay steady or decrease in magnitude. Therefore the estimation of the probability of a failure cannot rely on the magnitude of a single symptom only.

Also, the relationship between the severity of a fault and the symptom magnitude is often not linear, and usually this relationship is not known. Even if the failure has happened before and the symptoms were recorded in detail, the next occurrence of the same failure mode might not give the same


symptom values. It is therefore often considered satisfactory if we can estimate the probability of a failure at a certain time. This can typically be expressed as the probability that a machine will operate without failure for a given period.

Some additional uncertainty is added when we take into consideration the ability of the vibration analyst to diagnose the symptoms as fault modes and severities. The symptoms might have been noticeable, but the analyst could not interpret them. He might be able to diagnose the fault mode but not the severity. If the fault has been misdiagnosed once and was not verified, it could be misinterpreted again.

Another problem is introduced, when two or more fault modes appear simultaneously. Some of the symptoms might be a result of a root cause, such as imbalance or misalignment, while other symptoms might be related to the consequential fault modes, such as a bearing defect.

A classifier relies on the weight vectors that have been defined using data samples with syndromes. In order for a classifier to give a plain-text diagnosis and severity as an output, it needs to be calibrated. This means that all classes that were trained using data samples from a known condition should be made identifiable. Typically the training samples used to train a single class would have the same interpretation. If not, the symptom extraction might have been imperfect; in other words, the symptom extraction chosen could not differentiate between two (or more) fault modes and should be improved. If this is not possible, the interpretation of a new sample falling into this particular class would be that, with some probability, the new sample represents any one or a combination of the possible fault modes.

The probability of a failure is very difficult to define as an absolute value. Fuzzy logic may be useful here. Let's assume that the classifier has been calibrated and verified correctly, so that each class has been given a correct interpretation. This should include the fault mode or modes, if there are several in a single class, and the severity. The description of a fault mode can in many cases be deterministic, but for severity, fuzzy groups may have to be used. The calibration should be verified using strong evidence, such as confirming the fault mode and severity by other methods.

When interpreting a new sample, a classifier looks for a best matching neuron, which obviously will always be found. When comparing the new sample with the samples used to train the class, an assessment can be made of its membership within the class. If the new sample falls close to the geometric centre of the training samples, the membership is high. If, on the other hand, the sample is further away from the centre than any of the training samples, the membership is low. Given that the classifier has been accurately calibrated, the confidence of the diagnosis would then be correspondingly high or low.

The TS-SOM algorithm used gives the weighting factors for each descriptor. In addition, the maximum distance of the farthest training sample from the centre will also be calculated. If a new sample's distance from the centre is close to this maximum distance, the confidence of the diagnosis is reduced.

Membership functions are typically used in fuzzy logic, but let's define a useful function for classification. We may take a linear approach by defining that we have a maximum (100 percent) membership (confidence of diagnosis) at the median and a defined (for instance 50 percent) membership at the maximum distance. This can be expressed as follows:

μ = max(1 - c · s / s_max, 0) (3)

where μ is the membership value, c is the membership constant, s is the new sample's distance from the median and s_max is the distance of the furthest training sample from the median.

The calculated distances are Euclidean distances from the median and therefore always positive. When s is zero, i.e. the sample is at the median, the membership value will be 1 (100 percent). When s is s_max, the function will return 1 - c as the membership value. For any distance greater than s_max/c, the membership will be zero. A value of 0.5 is recommended for the membership constant; this results in a 50 percent membership at s_max and zero membership at distances longer than two times s_max.

A linear model might not be ideal, because it results in reasonably high membership values at long distances beyond the maximum and in relatively low membership values close to the median. A better solution would be a sigmoid function, which offers an "s"-shaped membership descriptor. A traditional sigmoid function can be expressed as follows:

f(t) = 1 / (1 + e^(-t)) (4)

In order for the sigmoid function to produce the desired response in this case, it has to be modified slightly. Let's assume again that a 50 % membership is reached at the maximum distance, and let's adjust the parameter t as follows:

t = c · (s_max - s) / s_max (5)

where c is a skew factor, s is the distance from the median and s_max is the maximum distance from the median.

A skew factor of 5 yields the membership function given in Figure 5. The function gives 99 % membership at the centre, 50 % at the maximum distance and 1 % at two times the maximum distance.
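The two membership functions discussed above, the linear form with membership constant c and the skewed sigmoid, can be sketched as follows. This is an illustrative reconstruction based on the behaviour described in the text (100 % membership at the median, 50 % at the maximum distance); the function names are assumptions.

```python
import math

def linear_membership(s, s_max, c=0.5):
    """Linear membership: 1 at the median (s = 0), 1 - c at s_max,
    and 0 for any distance beyond s_max / c."""
    return max(1.0 - c * s / s_max, 0.0)

def sigmoid_membership(s, s_max, c=5.0):
    """Sigmoid membership with skew factor c: about 99 % at the median,
    exactly 50 % at s_max, and about 1 % at twice s_max for c = 5."""
    t = c * (s_max - s) / s_max
    return 1.0 / (1.0 + math.exp(-t))
```

The sigmoid variant keeps the confidence high for samples near the median and drops it quickly only around the maximum distance, matching the "s"-shaped descriptor described above.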



Figure 5. Membership function

During this experiment it became obvious that some additional methods should perhaps be applied to determine the confidence level from the membership function. For instance, if several data sets have been used to train a neuron, the confidence level should be higher than in a case where only one or a few data sets were used. On the other hand, one should always pay attention to cases where the best matching neuron is calibrated as representing a severe failure mode, even if the sample contains novelty characteristics.

VI. EXPERIMENTS WITH DATA

A. General notes

The machine condition data consisted of results from lubrication, temperature and vibration measurements, but the main emphasis was on the vibration data. The rotation speed of the machine was used as an explanatory variable only, not as a feature.

The vibration data consisted of 126 descriptors extracted from a vibration spectrum. The data sets typically consist of approximately 1000 samples taken 90 minutes apart over a total period of 60 days. The data was collected when the machine was operating in a normal condition and in a known fault condition.

Several sets of data were available for the same machine or similar machines in the same location. These data sets related to different machine conditions were combined and handled as one data set. The following figures illustrate examples of the SOM generated from various data sets.

B. Data analysis

The map in Figure 6 is based on 1766 data samples. It can be seen that a great number of training samples fall in the neurons in the top left corner; the number of training samples is shown in brackets. These samples are related to the low speed data, which is irrelevant to the machine condition.

The map also shows several distinguishable clusters of neurons. These can be easily identified as particular conditions of a machine. Successive data samples seem to fall in the same clusters, which is to be expected. Knowing that there are two separate fault conditions (bearing failures) in the data set, it is fairly easy to follow the progression of these faults on the map.

Note that in a real application the progression of a fault will first show differently. The first occurrence of a fault symptom will cause a novelty condition, because a fully matching neuron cannot be found on the map. Upon re-training, the map is reconstructed to include neurons for the fault condition.
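The classification and novelty detection step described above can be sketched as follows. This is a minimal illustration in our own notation, not the patented implementation; the distance measure and threshold handling are assumptions for the sake of the example.

```python
def classify(sample, weights, novelty_threshold):
    """Locate the best matching neuron (smallest Euclidean distance
    between the sample and a neuron's weight vector) and flag the
    sample as a novelty when even the closest neuron lies beyond
    the threshold."""
    def dist(w):
        return sum((s - x) ** 2 for s, x in zip(sample, w)) ** 0.5
    bmu = min(range(len(weights)), key=lambda i: dist(weights[i]))
    return bmu, dist(weights[bmu]) > novelty_threshold

# Two-neuron toy map: a nearby sample matches, a distant one
# raises a novelty that would trigger data collection for re-training.
weights = [[0.0, 0.0], [1.0, 1.0]]
print(classify([0.1, 0.0], weights, 0.5))  # (0, False)
print(classify([3.0, 3.0], weights, 0.5))  # (1, True)
```

Samples flagged as novelties would be saved and added to the training data, after which the map is re-organized.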

Figure 6. Vibration data - Symptom map

The two faults show slightly differently on the map. In both cases the fault progresses through the neurons in the red circle close to the bottom left corner. In fact both failure paths go through neuron #195 (on 20.01 and 01.02). It can be concluded that the symptom patterns bear a great resemblance to each other. It can also be concluded that one failure progression proceeds to the very bottom left corner and is therefore more severe than the other. These observations give strong evidence that two data sets can be combined to form a generalized basis for a common classifier.


Figure 7. Vibration data - Symptom map with low speed data excluded


Figure 7 shows a map of the same data with all data samples related to low speed discarded. This can be seen as a smaller number of neurons in the top left corner of the map. It can also be noticed that the clusters are no longer as distinguishable as in the previous figure. This is not a problem, because the system operates on the classes, not on the clusters.

The data from the first turbine consisted of 4369 data sets (events) and from the second turbine of 2433 data sets. The initial training data consisted of 1952 samples from the period between September 1st and December 30th, 2008. Before using the classifier, the data was pre-processed as discussed before.

This experiment aims to demonstrate and explain the normal process of initial training, novelty detection and re­training. During this process methods to improve the application, automation and visualization of the classification are presented. At the same time the document attempts to highlight some of the problems and concerns that could be encountered.

Figure 8 illustrates the classification of data on a two-dimensional symptom map after initial training. The data appears to be somewhat unevenly distributed on the map. The classes in the bottom right corner represent the data sets with very low symptom values. The opposite corner represents data sets with higher values.

Our attention is drawn to neuron #85, which appears to be separated from the other neurons. The weights of this neuron are presented in Figure 9. The values are extremely high. The neuron was trained with a single sample ("18.12.2008 16:30") and would therefore probably be considered an outlier in a practical application. Out of curiosity, let's examine this neuron more closely.

The neuron and the data sample used to train it are clearly distinguishable from the other samples. An expert should now evaluate whether these symptoms indicate a problem in the machine condition or are the result of a measurement or calculation error. The other samples from the same date fall into neurons quite far from neuron #85, which supports the second option. However, for demonstration purposes this sample is left in the training data, even though in a practical application it would probably be discarded. The phenomenon related to this data is handled as Case 1 and neuron #85 is calibrated accordingly. For future purposes the weights of neuron #85 were saved.

In the beginning of the year 2009, after the training session, new data sets are classified against the map. During the first three weeks all data sets are classified into neurons other than #85. There were several violations of symptom boundaries, but in all cases these were related to values below the neuron minima derived from the training samples. This is most probably caused by the large number of symptom values (126) in a single data set. In this particular case the data from the first three weeks after initial training was not saved and no re-training was performed.
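The boundary check underlying these detections can be illustrated with a short sketch. The per-neuron minima and maxima are assumed to be derived from the training samples that hit the neuron; the function name and return format are our own.

```python
def boundary_violations(sample, neuron_min, neuron_max):
    """Return (symptom index, kind) for every symptom value that
    falls outside the min/max envelope derived from the neuron's
    training samples."""
    violations = []
    for i, (v, lo, hi) in enumerate(zip(sample, neuron_min, neuron_max)):
        if v < lo:
            violations.append((i, "below_min"))
        elif v > hi:
            violations.append((i, "above_max"))
    return violations

# Symptom 1 exceeds the maximum, symptom 2 falls below the minimum.
print(boundary_violations([0.5, 2.0, -1.0],
                          [0.0, 0.0, 0.0],
                          [1.0, 1.0, 1.0]))
```

With 126 symptoms per data set, some below-minimum violations are to be expected even in a normal condition, which is why they carry less diagnostic weight than maximum violations.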

Novelties exceeding the maxima can be observed when entering week 04 of 2009. The same progression can be seen in week 05. Because of this, the data from these two weeks was saved and added to the initial training data. Throughout this experiment all data in a period is added to the training data; in a practical application it would be advisable to save only the novelty data. The classifier was re-organized and is shown in Figure 10.


Figure 8. Symptom map after initial training


Figure 9. Symptom weights for neuron #85


Figure 10. Symptom map after re-training including weeks 04-05


Using the weights of neuron #85 on the original map, the best matching neuron was again located at neuron #85 on the new map. Obviously the weights of this neuron are the same as after the initial training, because there is only one hit in the neuron.

After re-organization of the map, some neurons in the top left corner now represent different kinds of data sets. The weight values in neurons #90 and #97 are now considerably higher than before. Using human expertise one might conclude that these neurons represent an early failure mode. By analyzing the symptom weights it can be seen that the distributions of symptoms (syndromes) are significantly different, suggesting that they represent different failure modes. For future reference these cases are identified as Cases 21 and 22 and the neuron weights are saved.

The testing was continued on weeks 06 to 07. A further increase in symptom values was observed as a large number of new novelties appeared. None of the data sets were classified into any of the calibrated neurons (#85, #90 or #97), but there were several hits (with novelty characteristics) in the neighboring neurons #101 to #108. This suggests that these neurons should have been calibrated after the previous training process. This would have been obvious if all neurons had been analyzed separately. The process could also be automated by using a weight norm calculation, which gives the following sequence for the neurons: #85, #111, #97, #101, #102, #90, #103, #105, etc. Calculation of the neuron distances can also be useful in determining which neurons belong to the same cluster. Obviously neighboring neurons form a cluster as long as the distance between them is short. On the other hand, distinguishing between two modes is rather difficult. It should be understood that fuzzy rules apply when calibrating a neuron. Using expertise and the distance between two neurons, a certain syndrome could for instance be interpreted as an early failure mode, while a slightly different syndrome is interpreted as a normal mode. One has to draw a line somewhere.
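The weight norm ranking and the neuron distance calculation mentioned above can be sketched as follows. The Euclidean norm and distance are assumptions on our part; the paper does not specify which norm was used.

```python
def weight_norm(w):
    """Euclidean norm of a neuron's weight vector."""
    return sum(x * x for x in w) ** 0.5

def rank_by_norm(weights):
    """Neuron indices ordered by descending weight norm, so the
    strongest syndromes surface first for calibration review."""
    return sorted(range(len(weights)), key=lambda i: -weight_norm(weights[i]))

def neuron_distance(wa, wb):
    """Euclidean distance between two neurons' weight vectors,
    usable as a simple criterion for grouping neurons into clusters."""
    return sum((a - b) ** 2 for a, b in zip(wa, wb)) ** 0.5

# Three toy neurons with norms 1, 5 and 2: the ranking surfaces
# the highest-norm neuron first.
print(rank_by_norm([[1, 0], [3, 4], [0, 2]]))  # [1, 2, 0]
print(neuron_distance([0, 0], [3, 4]))         # 5.0
```

Neighboring neurons whose mutual distance falls below an expert-chosen limit would then be treated as one cluster and calibrated together.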

The clustering approach might offer some benefits in visualizing the relationships between neurons. However, the map should be understood as a dynamic tool that changes after each re-training, of which there may be many in the case of progressive failure modes, such as in this study. Because of the excessive number of novelty states, the data from weeks 06 and 07 was saved and the map was re-organized. The classifier after the training is shown in Figure 11.

Some interesting observations can be made. First, the new neurons trained by the novelty data pushed Case 1 from the top left corner (#85) towards the bottom left corner (#247). Six data sets were now used to train neuron #247. Even though the data sets are not close to each other, they are nevertheless the closest ones. As a result, not all of the training samples are close to the final weight values of the neuron; in fact the maximum distance within the neuron is quite high. The situation would be different if more data sets similar to the last one were available. It is now obvious that this outlier is not a good source of information and should have been discarded in the first place, even though it was taken during a failure mode.
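The "maximum distance within the neuron" used above to expose the outlier can be computed with a short sketch. The function name and the choice of Euclidean distance are our own illustration.

```python
def max_intra_neuron_distance(samples, neuron_weights):
    """Largest Euclidean distance between the neuron's final weight
    vector and any training sample assigned to it. A high value
    hints that the neuron was stretched to cover an outlier."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(dist(s, neuron_weights) for s in samples)

# One sample sits on the weight vector, the other far away:
# the large result flags a poorly supported neuron.
print(max_intra_neuron_distance([[0, 0], [6, 8]], [0, 0]))  # 10.0
```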


Figure 11. Symptom map after re-training including weeks 06-07

The map after removal of the outlier is shown in Figure 12. It can be noted that new neurons (#85, #86, #87, etc.) related to the latest novelty data have appeared.


Figure 12. Symptom map after removal of the outlier

As an example, the weight values of neuron #85 are given in Figure 13. An expert might conclude that this neuron represents data from a severe failure mode. Eleven data sets were used to train neuron #85: the first four from week 06 and the remaining seven from week 07.



Figure 13. Weights for neuron #85

Let's calibrate these neurons as Cases 31, 32 and 33 respectively; we are then ready to continue testing. Again, when testing data sets from weeks 08 and 09, many novelties are observed. This would be expected, especially if the failure mode is progressing. There are, however, only a few hits to the calibrated orange neurons and none to the red neurons. It can be concluded that during the fault progression new novelty characteristics have developed in the syndromes.


Figure 14. Symptom map including weeks 08-09

Testing was continued in week 10. Almost all data sets hit neuron #120, which has the maximum norm on the map. The samples do not seem to fit within the neuron minima and maxima and therefore have novelty characteristics. Especially in weeks 13 and 14 the data sets violate the neuron boundaries heavily. In a practical application this would cause several novelty detections with indications of a severe failure mode. This phenomenon ends in week 15, when the machine appears to have been stopped until week 19. Figure 15 summarizes the testing process from week 01 to week 21.

In a case like this it is extremely valuable to collect data and train the classifier to recognize the progression of the failure mode. An expert should attempt to identify the failure modes either directly from the data or from the symptom weights after re-training. All data from weeks 10 to 14 has been added to the previous data and the map has been re-organized.

It can be seen that the most recent data sets, which hit neuron #120 on the previous map, now hit a wide range of neurons in the top left corner. The highest norm is now with neuron #85 and its neighbors; they thus represent the utmost condition before the machine stop. We have already made a preliminary estimate of the failure severity during the fault progression. At this stage we should re-analyze the symptom weights and calibrate the map carefully.


Figure 15. Summary of testing process during weeks 01 to 21

Testing is continued from week 19 on. On a few occasions the neuron maximum for a certain symptom is exceeded causing novelty detection. In most cases, however, the values of the tested data sets are within the neuron limits. None of the calibrated neurons are hit by new data sets. In a practical application this would mean that the machine is in a good condition.

From week 31 on, measurement results have been received from another, similar turbine. The results, obtained using the same classifier, are shown in Figure 16.

On 14.09.2009 there are several hits to neuron #157, which is calibrated as Case 22. The symptoms do not, however, fit into the neuron boundaries. Several symptoms violate the maximum values.

The syndromes indicate the possibility of an early stage failure, but the magnitude of symptoms suggests that it could be a more severe problem. Even if the confidence level of the diagnosis (membership in the neuron) is very low, this case should be taken seriously. Let's add the data from week 37 into the training data and re-organize the map.

The data sets previously interpreted as Case 22 were now used to train neuron #163. Case 22 now resides in neuron #154, which now has the highest norm on the map. An expert should evaluate where an optimal boundary between the severe and early stage neurons would be. For instance, it seems that neurons #31, #32, #33 and #41 would rather represent data in an early stage failure mode. This is indicated using orange circles. The data from the following weeks has been classified using this approach.


Figure 16. Summary of testing process during weeks 31 to 51


There are a few other hits on weeks 41 and 42 to calibrated (red) neurons. These could be understood as pre-warnings before the major changes on weeks 46 to 48. During these weeks there are several hits to neuron #85. The data sets exceed the symptom boundaries by far and therefore can be understood to represent even more serious failure modes than experienced before.


Figure 17. Symptom map after week 37


VII. CONCLUSIONS

It is possible to re-train a classifier at any time. The user might wish to wait until an event has progressed through all failure modes, or might wish to re-train whenever new novelty data is available. The re-organization of a map is not time consuming.

Upon re-training one should consider methods to limit the amount of data. In a practical application there would always be a lot of data from normal modes.

Whenever a failure mode is observed, all relevant data sets should be saved so that the fault progression can be identified. When calibrating the neurons, the interpretation should state the mode and severity of a fault. The severity could be expressed in fuzzy terms, such as "low severity", "medium severity" or "high severity".

Novelty detection needs to be carefully designed. When 126 symptoms are in use, it is more than likely that each data set is unique, so traditional novelty detection would by nature produce a lot of novelty cases. For instance, a symptom value falling below the neuron minimum should probably not raise a novelty detection, and minor crossings of the maximum could perhaps be allowed.
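A tolerant novelty test along these lines can be sketched as follows. The 5% relative tolerance is an arbitrary illustrative choice, and the sketch assumes non-negative symptom values; the paper does not prescribe a specific rule.

```python
def is_novelty(sample, neuron_max, tolerance=0.05):
    """Tolerant novelty test: values below the neuron minimum are
    ignored entirely, and a small relative overshoot of the maximum
    (here 5%) is allowed before a novelty is raised. Assumes
    non-negative symptom values."""
    return any(v > hi * (1 + tolerance) for v, hi in zip(sample, neuron_max))

print(is_novelty([1.04], [1.0]))  # False: within the 5% tolerance
print(is_novelty([1.10], [1.0]))  # True: clear maximum violation
print(is_novelty([-5.0], [1.0]))  # False: below-minimum values ignored
```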

It appears that training data from one wind turbine can be used to classify data from another wind turbine. There will be unique characteristics, but it seems that a fault on the second turbine could be identified by a classifier trained with data from the first turbine.

REFERENCES

[1] ISO 13372, "Condition Monitoring and Diagnostics of Machines - Vocabulary", 1st ed., 2004.

[2] ISO 2041, "Mechanical Vibration, Shock and Condition Monitoring - Vocabulary", 3rd ed., 2009.

[3] T. Kohonen, "Self-Organizing Maps", 3rd ed., 2000.

[4] M. R. Berthold, D. J. Hand, "Intelligent Data Analysis", 1st ed., 1999.

[5] FI 102857, "Self Learning Method in the Condition Monitoring and Diagnostics of Rotating Machines", granted 26.02.1999.

[6] EP 1292812, "Method in Monitoring the Condition of Machines", granted 03.06.2009.