Using abductive machine learning for online vibration ...

253

Using abductive machine learning for onlinevibration monitoring of turbo molecularpumps

R.E. Abdel-Aal∗ and M. Raashid

Center for Applied Physical Sciences, Research

Institute, King Fahd University of Petroleum and

Minerals, Dhahran 31261, Saudi Arabia

Received 26 April 1999

Revised 13 September 1999

Turbo molecular vacuum pumps constitute a critical compo-nent in many accelerator installations, where failures can becostly in terms of both money and lost beam time. Catas-trophic failures can be averted if prior warning is giventhrough a continuous online monitoring scheme. This paperdescribes the use of modern machine learning techniques foronline monitoring of the pump condition through the mea-surement and analysis of pump vibrations. Abductive ma-chine learning is used for modeling the pump status as ‘good’or ‘bad’ using both radial and axial vibration signals mea-sured close to the pump bearing. Compared to other statisti-cal methods and neural network techniques, this approach of-fers faster and highly automated model synthesis, requiringlittle or no user intervention. Normalized 50-channel spec-tra derived from the low frequency region (0–10 kHz) of thepump vibration spectra provided data inputs for model de-velopment. Models derived by training on only 10 observa-tions predict the correct value of the logical pump status out-put with 100% accuracy for an evaluation population as largeas 500 cases. Radial vibration signals lead to simpler mod-els and smaller errors in the computed value of the statusoutput. Performance is comparable with literature data on asimilar diagnosis scheme for compressor valves using neuralnetworks.Keywords: Vibration monitoring and diagnostics, statisticalvibration analysis, turbo molecular pumps, machine learning,abductive networks

* Corresponding author: R.E. Abdel-Aal, P.O. Box 1759, KFUPM,Dhahran 31261, Saudi Arabia. Fax: + 966 3 860 4281; E-mail:[email protected].

1. Introduction

The 350 kV light ion accelerator facility [4] at KingFahd University of Petroleum and Minerals (KFUPM)employs some 15 Balzers turbo molecular vacuumpumps of various capacities to achieve a minimum vac-uum level of 1.33× 10−4 Pa. Table 1 gives a summaryof the specifications and operating conditions for a typ-ical 0.5 m3/s pump, model TPU 510, and its electronicdrive unit model TPC 300. Many of the pumps run con-tinuously for extended periods, and operational experi-ence has shown that bearing failures while the pump isrunning at full speed can completely destroy the pump.Such failures often occur without adequate warningsigns that can be detected through routine manual in-spection. Even in cases when there is a change in thepump noise, this may go unnoticed in the noisy envi-ronment of the accelerator vault or may occur after nor-mal working hours when the facility is left unattended.A turbo pump is an expensive piece of equipment, andpump failures can also be costly in terms of lost beamtime if there is a need to wait for in-house repair or for areplacement pump to be ordered from abroad. We haverecently initiated work on the development of an onlinemonitoring scheme for the accelerator pumps with theobjective of automatically detecting abnormalities inthe pump condition and warning the accelerator opera-tor in advance to avert serious failures. The importanceof continuous online monitoring for critical machineryis well established [21], since monthly or weekly man-ual measurements may not be frequent enough or con-sistent enough to detect developing problems.

Vibration analysis truth tables have been used formany years as a guide for diagnosing vibrations in ro-tating machinery, but conclusive results often requirefurther evidence [24]. Recent advances in computers,instrumentation, and signal processing techniques havemade online predictive vibration monitoring of ma-chinery available and cost-effective approach in many

Shock and Vibration 6 (1999) 253–265ISSN 1070-9622 / $8.00 1999, IOS Press. All rights reserved

254 R.E. Abdel-Aal and M. Raashid / Using abductive machine learning

Table 1

Summary of technical data and operating conditions for theBalzers model TPU 510 turbo molecular pump and its elec-tronic drive unit model TCP 300

Pump:

Nominal diameter

Inlet 0.160 m

Outlet 0.040 m

Volume flow rate for nitrogen 0.5 m3/s

Compression ratio for nitrogen 108

Theoretical ultimate pressure 10−14 bar

Speed

Rated 1000 s−1

Standby 666 s−1

Rated blade passage frequency 40000 s−1

Run-up time to 90% of rated speed 300 s

Weight 30 kg

Cooling Air

Lubricating oil Pfeiffer

Turbo oil

TL 011

Permissible ambient temperature 0–35 ◦C

Electronic Drive Unit TCP 300:

Input voltage (AC) 110–240 V

Maximum power input 300 VA

Maximum output voltage 40 V

Run-up current 7 A

Rated frequency (±2%) 1000 Hz

situations [21]. Techniques used include time domainand frequency domain analysis as well as combina-tions of both. Univariate time series analysis [29] andmultivariate linear regression methods [19] have beenemployed to model normal vibration behavior in thetime domain. Problems with the first approach includestrong nonstationarity of the vibration time series, as inthe case of reciprocating machinery. The second tech-nique suffers from difficulties in determining suitablerelevant time series that explain variations in the vi-bration data, as well as strong correlations between thevarious input time series. The two techniques requirecomplex computations and considerable user interven-tion for each analysis performed, which makes themdifficult to implement online using simple portableapparatus. Frequency domain techniques use the fre-quency spectrum of the vibration signal as a signaturefor the pump condition, e.g. [8].

A recent trend in many areas of applied sciences hasbeen to resort to a machine learning approach when arigorous algorithmic solution becomes too complex or

when the underlying relationships between inputs andoutputs are not known. With this approach, a modelis developed automatically through training on an ad-equate number of solved examples. Once the modelis synthesized, it can be used to perform fast predic-tions of outputs corresponding to new cases. A ma-jor advantage of this approach in vibration analysisis that intensive computations are now required onlyonce, i.e., during training for model synthesis, ratherthan being repeated for every vibration record analyzedduring actual monitoring. Using the model to processnew records becomes a simple and speedy operationthat can be implemented in real time using compactand portable apparatus. With the shape of the vibrationspectrum as input, the problem is reduced to that of au-tomatic pattern recognition which has been applied inmany disciplines, e.g., [28].

Numerous techniques currently exist for the devel-opment of machine learning systems [27]. These in-clude statistical pattern recognition methods such asBayesian classifiers and discriminant functions [11],artificial neural networks modeled roughly on how thehuman brain is believed to function [26], as well asmethods for the induction of decision trees [9]. Thesetechniques vary in their accuracy, complexity, compu-tational requirements during training, and their abil-ity to provide human-like explanations for their con-clusions. Such variations have led to newer techniquescombining good features from various methods. An ex-ample of such ‘hybrids’ is the AIM abductive networktool [20] which draws on statistical and multiple re-gression analysis methods as well as neural networks,resulting in a faster and more automated approach tomodel synthesis. The development of this approachfor machine learning through self-organization has fol-lowed the track of the group method of data handling(GMDH) algorithm [12] and the closely related adap-tive learning network (ALN) technique [7]. A math-ematical description of the GMDH-ALN foundationsfor AIM is given in Section 2. Previous experiencewith this approach in modeling and forecasting dailyminimum temperature [1] has indicated improved pre-diction accuracy compared to neural network models.Model synthesis is also more automated, requiring lit-tle or no user intervention and the resulting models canprovide more insight into the phenomenon being mod-eled. AIM models take the form of simple equationsthat can be easily implemented on a portable apparatus.

Artificial neural network techniques have been pro-posed as a new approach to automate vibration anal-ysis, primarily in the frequency domain [5,17,19,23].

R.E. Abdel-Aal and M. Raashid / Using abductive machine learning 255

One application diagnoses a compressor valve as goodor bad by training a back propagation network using aset of features extracted from the acoustic spectra [5].Kotani et al. [17] reports on the use of an acousticfeature extraction auto-associative neural network fol-lowed by a fault discriminating network to diagnoseeight types of compressor faults, also using acousticspectra. Time series of the vibration levels for the ro-tor of a 500 MW generator have been modeled usingradial basis function neural networks [19]. This paperdescribes the use of GMDH-based abductive machinelearning to diagnose a turbo molecular pump as goodor bad based on the vibration spectra measured usingan accelerometer. Following an overview of the ma-chine leaning approach used, the experimental set upis described and models are derived for predicting thepump status through training on a number of knowncases. Performance of these models in predicting thepump status from new spectra is described. We com-pare results obtained using both radial and axial vibra-tion signals from the pump. Results are also related todata in the literature on the performance of similar neu-ral network approaches.

2. GMDH-type modeling techniques

2.1. The classical GMDH approach

In quest for optimal objective models that lookonly at collected data representing system behavior,Ivakhnenko has proposed the GMDH algorithm in1966 [15]. He observed that physical models oftenrequire information that is not readily available andare therefore subject to many assumptions and sim-plifications which degrade the quality of the resultingmodel. Other techniques for quantitative modeling in-clude time-series analysis using various linear statis-tical methods and multivariate regression [18]. Thesetechniques have difficulties in handling nonlinearitiesin the modeled phenomena and in dealing with smalldata sets, which is the case in many environmental,ecological, and social applications [14]. Attempts toincorporate nonlinear relationships in such models re-quire the nonlinearity forms to be presumed a priori,rather than being naturally and automatically derivedfrom the data. Inclusion of postulated nonlinearitiesin this way also increases the possibility of the modelcurve-fitting noise in the data [25]. GMDH-type algo-rithms solve these problems since they automaticallydetermine the inherent structure of complex and highly

nonlinear systems and can synthesize adequate mod-els with relatively few data points. The automation ofmodel synthesis not only lessens the burden on the an-alyst but also safeguards the model generated from be-ing influenced by human biases and misjudgements.

The GMDH approach is a formalized paradigm foriterated (multi-phase) polynomial regression capableof producing a high-degree polynomial model in ef-fective predictors. The process is ‘evolutionary’ in na-ture, using initially simple (myopic) regression rela-tionships to derive more accurate representations in thenext iteration. To prevent exponential growth and limitmodel complexity, the algorithm selects relationshipswhich have good predicting powers and discards allthe others within each phase. Iteration is stopped whenthe new generation regression equations start to havepoorer prediction performance than those of the pre-vious generation, at which point the model starts tobecome overspecialized and therefore unlikely to per-form well with new data. It is seen that the algorithmhas three main elements: representation, selection, andstopping. The algorithm applies abduction heuristicsfor making decisions concerning some or all of thesethree aspects. A detailed description of the steps of theclassical GMDH algorithm can be found in [12].

To illustrate these steps for the classical GMDH ap-proach, consider an estimation data base ofne obser-vations (rows) andm + 1 columns for m independentvariables (x1,x2, . . . ,xm) and one dependent variabley. In the first iteration we assume that our predictorsare the actual input variables. The initial rough predic-tion equations are derived by taking each pair of inputvariables (xi,xj ; i, j = 1, 2,. . . ,m) together with theinput variabley and computing the quadratic regres-sion polynomial

y = A+Bxi + Cxj +Dx2i +Ex2

j + Fxixj . (1)

Each of the resultingm(m−1)/2 polynomials is eval-uated using data for the pair ofx variables used togenerate it, thus producing new estimation variables(z1, z2, . . . , zm(m−1)/2) which would be expected to de-scribey better than the original variables. The resultingz variables are screened according to some selectioncriterion and only those having good predicting powerare kept. The original GMDH algorithm employs anadditional and independent ‘selection set’ ofns obser-vations for this purpose and uses the regularity selec-tion criterion based on the root mean squared errorrkover the selection data set, where


r2k =

ns∑l=1

(yl − zkl)2

ns∑l=1

y2l

,

k = 1, 2,. . . ,m(m− 1)/2. (2)

Only those polynomials (and associatedz variables)that haverk below a prescribed limit are kept and theminimum value,rmin, obtained forrk is also saved.The selectedz variables are more effective in describ-ing and predicting y than the original input variables,and the corresponding columns represent a new database for repeating the estimation and selection steps inthe next iteration to derive a set of higher-level vari-ables. At each iterationrmin is compared with its previ-ous value and the process is continued as long asrmin

decreases or until a given complexity is reached. Anincreasingrmin is an indication of the model becomingoverly complex, thus over-fitting the estimation dataand performing poorly in predicting the new selectiondata. Keeping model complexity checked is an impor-tant aspect of GMDH algorithms which keep an eye onthe final objective of constructing the model, i.e., us-ing it with new data previously unseen during training.The best model for this purpose is that providing theshortest description for the data available [6]. Compu-tationally, the resulting GMDH model can be seen asa layered network of partial quadratic descriptor poly-nomials, each layer representing the results of an al-gorithm iteration. The single polynomial in the finallayer predicts the independent variabley in two vari-ables, which are themselves quadratics in two lower-level variables, etc. The lowest-level polynomials inthe first layer operate directly on them independent in-putx variables. Making necessary substitutions for thecomplete model, we can reach a highly complex poly-nomial (known as the Ivakhnenko polynomial) of theform:

y = a+m∑i=1

bixi +m∑i=1

m∑j=1

cijxixj

+m∑i=1

m∑j=1

m∑k=1

dijkxixjxk + · · · . (3)

It is worth pointing out a number of advantages forthe GMDH approach compared to conventional regres-sion analysis, particularly for modeling large and com-plex systems. For the modest case ofm = 10 andorderp = 8, obtaining the coefficients of the regres-

sion polynomials directly requires solving 43758 ill-conditioned linear equations simultaneously, while a 3-layer GMDH model obtains the equivalent Ivakhnenkopolynomial by repetitively solving regression equa-tions in 6 variables only (Eq. (1)). This reduces ill-conditioning effects (smaller matrices) and allowsGMDH to model complex relationships using only asmall number of data points, while with conventionalregression we need at least as many observations as co-efficients. GMDH also keeps generating new variablesby intermixing lower-level variables, thus reducing lin-ear dependence [12].

2.2. GMDH variations and the ALN approach

The original GMDH algorithm has been subject tomany variations in the methods used for estimatingthe partial descriptor functions and in the choice ofdecision rules for descriptor selection and criteria forstopping the iterations. Optimized polynomials as wellas different descriptor functions have been proposed.The external regularity criterion used for selection andstopping required the splitting of the training dataavailable into two sets which reduces the amount ofdata available for estimation. Moreover, results dependon the way the data was divided and some splittingheuristics and cluster analysis were often required [7].A number of methods have been proposed which op-erate on the whole training data set for estimation, se-lection, and stopping, thus avoiding these limitations.GMDH-related activities in the U.S.A. has been gener-ally identified as the adaptive learning network (ALN)approach (AIM being an example), which emphasizedthe use of the predicted squared error (PSE) criterionfor selection and stopping to prevent model overfitting.The PSE criterion minimizes the expected squared er-ror that would be obtained when the network is used forpredicting new data which is different from that usedduring training [7]. For example, AIM expresses thisPSE error as

PSE= FSE+ CPM

(2kn

)σ2

p, (4)

where FSE is the fitting squared error for the model onthe training data, CPM is a complexity penalty mul-tiplier selected by the user,k is the number of modelcoefficients,n is the number of samples in the trainingdata set, andσ2

p is a prior estimate for the variance forthe error obtained with the unknown model. This es-timate does not depend on the model being evaluated


and is usually taken as half the variance of the depen-dent variabley [6]. It is noted that as the model be-comes more complex, relative to the size of the train-ing set, the second term increases linearly while thefirst term decreases. Therefore, the PSE goes througha minimum at the optimum model size which strikesa balance between accuracy and simplicity or exact-ness and generality. CPM has a default value of 1 inAIM. Lower values allow more complex models whilehigher values allow simpler ones.

3. AIM abductive machine learning

AIM is a supervised inductive machine-learningtool for automatically synthesizing abductive networkmodels from a data base of input and output valueswhich represent a training set of example situations.Once synthesized by training on a training data set,the network can be queried with new input data toprovide the corresponding predicted output. Abduc-tive networks[20]combine the advantages of the neu-ral network approach with those of advanced statisti-cal methods. While the processing elements in neuralnetworks are restricted by the neuron analogy, abduc-tive networks consist of various types of more pow-erful numerical functional elements based on predic-tion performance. The network size, element types,connectivity, and coefficients for the optimum modelare automatically determined using well-proven opti-mization criteria, thus reducing the need for user in-tervention. With neural networks, the user has to ex-periment with various architectures and there are nohard and fast design rules to determine optimum val-ues for the number of hidden layers, number of neu-

rons in each layer, and the training parameters, andoften a number of combinations need to be tried insearch of the best solution. With the commonly usedstandard back propagation algorithm, training timescan be huge and there are many training parametersto adjust which may have a major effect on the re-sults [27]. The algorithm is not guaranteed to con-verge to a good solution [10], and because the methodmay be unstable and oscillate between solutions, itmay not be clear when to stop [27]. This makes ab-ductive networks easier to use and considerably re-duces the learning/development time and effort. AIMadvantages over back-propagation neural networks inforecasting the daily minimum temperature are demon-strated in [1] while large improvements in trainingspeed are reported in [20]. It should be mentioned,however, that improvements are becoming availablewhich attempt to alleviate some of the problems asso-ciated with older standard neural network paradigms.For example, using a stiff ordinary differential equa-tion solver with the back propagation algorithm re-duces the number of parameters to be adjusted, cutson training time, and improves accuracy [27]. Usingconjugate gradient methods for unconstrained opti-mization has been suggested for ensuring guaranteedand faster convergence during training [10]. A step-wise construction procedure has also been proposedfor building and training single-layer neural networkswithout the requirement for the number of neuronsin the hidden layer to be fixed a priori by educatedguesses [16].

The work reported here used AIM version 1.0 forthe Macintosh computer. AIM models take the form oflayered feed-forward abductive networks of functionalelements (nodes) [3], see Fig. 1. Elements in the first

Fig. 1. A typical AIM network structure showing various types of functional elements.


layer operate on various combinations of the indepen-dent input variables (x’s) and the single element in thefinal layer produces the predicted output for the depen-dent variabley. In addition to the functional elementsin the main layers of the network, an input layer of nor-malizers converts the input variables into an internalrepresentation asZ scores with zero mean and unityvariance[3], and an output layer of unitizers restoresthe results to the original problem space. Both the el-ement type and the combination of inputs to it fromall the previous layers are selected automatically forbest prediction performance according to the predictedsquared error (PSE) criterion[6]. The following mainfunctional elements are supported:

(i) A white element which consists of a constantplus the linear weighted sum of all outputs ofthe previous layer, i.e.,

OutputWhite =W0 +W1X1 +W2X2

+W3X3 + . . .+WnXn, (5)

where X1,X2, . . . ,Xn are the inputs to theelement andW0,W1, . . . ,Wn are the elementweights.

(ii) Single, double, and triple elements which im-plement a third-degree polynomial expressionwith all possible cross-terms for one, two, andthree inputs, respectively; for example,

OutputDouble =W0 +W1X1 +W2X2 +W3X21

+W4X22 +W5X1X2 +W6X

31

+W7X32 . (6)

The first step in solving a problem is preparinga data base of input-output training examples whichAIM uses to derive the abductive network model. Thismodel network is synthesized layer by layer until nofurther improvement in performance is possible or apreset limit on the number of layers is reached. Withineach layer, every element is computed and its perfor-mance scored for all combinations of allowed inputs.The best network structure, element types and coeffi-cients, and connectivity are all determined automati-cally by minimizing the PSE criterion. This selects themost accurate model that does not overfit the trainingdata, and therefore strikes a balance between the ac-curacy of the model in representing the training dataand its generality which allows it to fit yet unseenfuture data. In this way the model is optimized for

the actual use for which it is developed, rather thansimply fitting the training data. The user may option-ally control this trade-off between accuracy and gen-erality using the complexity penalty multiplier (CPM)parameter[6]. Larger values than the default value of1 lead to simpler models which are less accurate butare more likely to generalize well with unseen data,while lower values produce more complex networkswhich overfit the training data and may therefore de-grade prediction performance with noise. An ‘Evalu-ate’ utility allows evaluation of the resulting abductivenetwork on an independent set of data and generatesa report of the results. To obtain good AIM models,the training set should be a good representation of theproblem space. The learning task is also simplified bybreaking the problem into smaller and more manage-able assignments, and by utilizing any human knowl-edge on parameters relevant to the model in the choiceof input variables to be included in the training database.

4. Experimental setup

Figure 2 shows the experimental setup used fordata acquisition and analysis of pump vibration. Vi-bration signals are detected using a ceramic shearmode accelerometer, PCB Piezoelectronics Inc., model352B22 [22]. Table 2 gives a summary of the dynamicperformance of this tansducer. Vibration perpendicu-lar to the flat mounting surface of the sensor are inter-nally converted into shear forces. Sensitivity is low atvery low and very high frequencies, with a reasonablyflat response (±3 dB variations) over the frequencyrange 1 Hz to 16 kHz. Sensitivity increases steadilywith higher frequencies up to a mounted resonant fre-quency exceeding 32 kHz. The accelerometer responsedrops at frequencies above that resonant frequency.The sensor was adhesively mounted on the surface ofthe pump being monitored through a thin layer of petrowax. Two positions have been investigated for mount-ing the acceleration sensor as shown in Fig. 2. In posi-tion R the sensor is sensitive to radial vibrations, whilein position A the sensor registers axial vibrations (par-allel to the pump axis). One of the objectives of thiswork has been to compare vibration monitoring per-formance for these two types of vibration signals. AC-coupled signals from the built-in preamplifier of the ac-celerometer were further magnified in two timing fil-ter amplifiers type ORTEC 474 with no CR or RC tim-ing networks inserted (for a flat frequency response).


Fig. 2. Experimental setup for the acquisition and analysis of pump vibration signals.

The frequency spectrum of the unfiltered vibration sig-nal was observed on a Hewlett Packard spectrum ana-lyzer model HP 8568B which showed frequency com-ponents as high as 75 kHz with the pump running atthe full speed of 1000 s−1. It appears that the highfrequency response of the accelerometer extends be-yond the mounted resonant frequency with adequatesensitivity to allow the detection of vibration signalshaving such higher frequency components. An anti-aliasing low pass filter with 30 dBs/octave attenuationin the stop band was inserted between the two amplifierstages. The filtered signal had a peak-to-peak ampli-tude of about 4 V, and a DC offset of –4 V was appliedto this signal using an ORTEC dual sum-and-invertamplifier model ORTEC 553. This converts the bipolarsignal into a unipolar 0–8 V signal which was sampledby a LeCroy 3511 CAMAC ADC at a sampling fre-quency of about 84 kHz (sampling interval= 12µs).This sampling frequency is about five times the highend of the frequency range where the transducer re-sponse is flat within±3 dBs, see Table 2. Cut-off fre-quency on the anti-aliasing filter was selected to pre-vent aliasing effects due to the higher frequency com-ponents of the vibration signal. Vibration waveformrecords, each being 4K-samples in length, were ac-quired by a VAX 3200 workstation running the XSYSdata acquisition and analysis package [2,13]. Softwarein the VAX computed the magnitude of the fast Fouriertransform (FFT) of the sampled record and generateddata bases for training and evaluating the AIM models.These data were transfered to a Macintosh computer,connected to the VAX as a terminal, for use in the de-velopment and evaluation of AIM models for vibrationmonitoring.

Table 2

Summary of the dynamic performance data for the PCB 352B22accelerometer

Voltage sensitivity (±15%) 1.02 mV/ms−2

Frequency range (±5% variations) 5–8000 Hz

Frequency range (±10% variations) 2.5–9000 Hz

Frequency range (±3 dB variations) 1–16000 Hz

Mounted resonant frequency > 32 kHz

Measurement range ±3924 ms−2 Peak

Resolution (broadband) 0.015 ms−2 Peak

Amplitude linearity ±1 %

Transverse linearity 6 5 %

5. Data analysis

Figure 3 shows 500-sample sections of the vibrationwaveform records from a TPU 510 pump. The recordswere measured after approximately 5000 hours of op-eration (when bearing change becomes due) as wellas shortly following bearing change. The former con-dition was used to represent a ‘bad’ pump, and thelatter a ‘good’ pump. Shown in the figure are wave-forms measured in both the radial and axial positionsof the accelerometer (positions R and A in Fig. 2, re-spectively). It is noted that the good pump is gener-ally much quieter, where the vibration signals are lowerin amplitude and have predominantly lower frequencycomponents. It is also noted that signals at the radialposition of the sensor are relatively richer in frequencycomponents compared to the axial position. Figure 4shows the complete 4K-point FFT spectrum of the 4K-sample waveform record of the radial vibration sig-nal for the bad pump. Each channel increment repre-sents a frequency interval of 1/(4095× 12× 10−6) ≈20.35 Hz. The figure indicates that the majority ofthe frequency content of the sampled vibration sig-nal lies below 10 kHz. Frequency analysis in this re-


Fig. 3. 500-sample waveform records of the pump vibration signals: (a), (b) at the radial (R) position for the ‘good’ and ‘bad’ pump conditions,respectively; (c), (d) at the axial (A) position for the ‘good’ and ‘bad’ pump conditions, respectively. Sampling interval= 12µs.

gion is acknowledged as the most effective method fordetecting imbalance, misalignment, mechanical reso-nances, and looseness in rotating machinery [21]. Thisregion (channels 1 to 500 of the FFT record) was usedthroughout as the vibration signature of the pump con-dition. Figure 5 shows plots of this region of the vi-bration spectra corresponding to the cases of the fourwaveform records in Fig. 3. The fundamental compo-nent in all cases is about 1 kHz, which is the frequencycorresponding to the rated pump speed of 1000 s−1.The plots confirm observations made above on the timerecords regarding relative density of frequency compo-nents for good vs bad pumps as well as for radial vsaxial sensor positions.

Data bases for training and evaluation were de-rived from the 500-channel frequency spectra shownin Fig. 5 through a data reduction procedure, since theversion of AIM used allows a maximum of only 50 in-put parameters for use in model synthesis and evalua-

Fig. 4. FFT spectrum for a 4096-sample waveform record for the vi-bration signal at the radial (R) position and the ‘bad’ pump condi-tion. 1 channel= 20.35 Hz.


Fig. 5. FFT spectra in the low-frequency region (0–10 kHz), 1 channel= 20.35 Hz: (a), (b) at the radial (R) position for the ‘good’ and ‘bad’pump conditions, respectively; (c), (d) at the axial (A) position for the ‘good’ and ‘bad’ pump conditions, respectively.

tion. We employed a method adopted by [5] to reducethe 500-channel spectra to 50-channel spectra. Withthis method, the original 500-channel spectrumS is di-vided into 50 segments;S1,S2, . . . ,S50, each consist-ing of 10 channels. The 50-channel percentage area ra-tio spectrumR used for AIM training is derived suchthat itsjth channelRj contains the percentage ratio ofthe sum of the contents of all channels within segmentSj to the total sum of the contents of all channels in allthe 50 segments, which is the total area of the originalspectrumS, i.e.,

Rj = 100×

10∑i=1

Sji

50∑k=1

10∑i=1

Ski

. (7)

In addition to satisfying the data reduction requirementby AIM, this averaging approach is useful in reduc-

ing sensitivity of the resulting models to changes in in-dividual spectrum components which may result, forexample, from slight changes in sensor location if thespectrum data were to be applied directly to the AIMnetwork. The method also produces normalized train-ing/evaluation data such that the models are derivedand used with input data always in the same numberspace (range 0–100). Using ratios of the spectral com-ponent levels, rather than amplitudes of the spectralcomponents themselves, makes the resulting modelsmore robust against variations in vibration signal lev-els which are reflected directly in the spectrum am-plitude. Therefore, the normalized area ratio spectrumused to derive the training/evaluation data bases pro-vides a spectral signature that contains information onthe relative strength of frequency components in thevarious regions of the spectrum, while being fairly im-mune to signal amplitude variations including thosecaused by gain and offset changes.


6. AIM modeling

As a first exercise, we considered the developmentof abductive network models that continuously per-form go-no go checks on the pump condition by clas-sifying the frequency spectrum as representing a goodor a bad pump. Figure 6 shows typical 50-channel per-centage area ratio spectra for good and bad pumps us-ing signals from both the radial and axial sensor posi-tions. In each position, spectra for both good and badpumps are shown superimposed on the same plot to in-dicate the relative ease of separating the two patterns.It is seen that discrimination appears easier with sig-nals at the radial sensor position, and therefore this taskshould be achieved using simpler models. A typicaldata base record for modeling the pump status in termsof the percentage area ratio spectrum is given below:

Inputs: Output:Channel contents Logical variableof the percentagearea ratio spectrumCh1, Ch2, Ch3, . . . , Ch50 Pump status

(1 = ‘Good’, 0 = ‘Bad’)

Throughout this paper, the numbers of training andevaluation observations will be designated NT and NE,respectively. All models were developed using the de-fault CPM value of 1. Models were synthesized usinga training data base of 1000 records (NT= 1000) withboth good and bad pump status equally represented(500 records each). Structure of the resulting networksfor both the radial and axial sensor positions is shownin Fig. 7. The network for the radial case is simpler,since the ‘good’ and ‘bad’ spectrum signatures are eas-ier to differentiate as noted above. The radial networkis a single-layer network that uses only the contents ofchannel 5 to achieve the required discrimination, whilediscrimination in the axial case requires a 2-layer net-work operating on two variables (Ch5 and Ch15). Sub-stituting the equations shown in Fig. 7(a) for the vari-ous functional elements produces the following modelrelationship for the pump status at the radial sensor po-sition:

StatusRadial,NT=1000 =−0.081056+ 0.21675(Ch5)

− 0.01076(Ch5)2. (8)

The corresponding polynomial expression for the ax-ial configuration is much more complex, consistingof 28 terms with powers as high as 9 and 3 for Ch5

Fig. 6. Percentage area ratio spectra for the ‘good’ and ‘bad’ pumpconditions used for developing the AIM models: (a) at the radial (R)position, (b) at the axial (A) position.

and Ch15, respectively. Both networks were evaluatedon a mixture of 250 ‘good’ and 250 ‘bad’ new cases(NE = 500). The resulting logical value for the pumpstatus (ideally 1 for ‘Good’ and 0 for ‘Bad’) was de-rived from the real number predicted by the networkfor the status output through simple thresholding at 0.5.Table 3 lists the maximum, average, and standard de-viation of the error in the predicted pump status out-put as well as the good/bad classification accuracy forboth sensor positions. The table indicates 100% accu-racy for the AIM model in predicting the pump statusfor the 500 evaluation cases, since the error in the valueof the computed status output never exceeds 0.5. It isnoted that the maximum error is higher for the axialsensor configuration, although a lower average error isobtained.


Table 3

Summary of errors in the predicted pump status output for two sizes of the training data base, NT=1000 and NT= 10. In both cases the evaluation data base has 500 cases. Classification accuracy isfor the logical status (0, 1) obtained from computed status output by simple thresholding at 0.5

NT = 1000, NE= 500 NT= 10, NE= 500

Radial Axial Radial Axial

position position position position

Average absolute error 0.0077 0.0031 0.0602 0.0973

Standard deviation of absolute error 0.0116 0.0082 0.0804 0.1122

Maximum absolute error 0.0532 0.1072 0.3064 0.3583

Classification accuracy, % 100 100 100 100

Fig. 7. Structure of the AIM abductive network models obtained forthe pump status: (a) using radial vibration signals, (b) using axialvibration signals.

The 100% ‘good’ vs ‘bad’ classification accuracyfor the above models suggests that adequate resultsmay still be obtained with much smaller sizes for thetraining data base, i.e., lower values for NT. It wasfound that the 100% classification accuracy with thesame 500 evaluation cases used above was maintainedwith NT as small as 10 training cases (5 ‘good’ and 5‘bad’). Networks obtained in this case are much sim-pler, leading to the following linear relationships forthe pump status:

StatusRadial,NT=10 = −0.03+ 0.095(Ch5), (9)

StatusAxial,NT=10 = −0.05+ 0.0274(Ch5). (10)

Both models use only the contents of channel 5 to de-termine the pump status. The simpler models obtainedwith NT = 10 are attributed to the fact that they needto reconcile much less statistical variations in the inputparameters (channels contents of the percentage area

ratio spectrum) during training with NT= 10, as com-pared with NT= 1000. However, it is expected thatsuch models would be less robust with changes in thetraining sets than those obtained with larger values ofNT. In other words, a different model may be obtainedby training on a different set of 10 training spectra. Ta-ble 3 lists data on the error in the predicted real valuefor the pump status output at both sensor positions. Theresults indicate that, for the same network complexity,axial vibrations are associated with larger errors in thestatus output.

7. Discussion

GMDH-based abductive machine learning has beenused for diagnosing a turbo molecular pump as goodor bad as judged by low frequency vibration spectracollected in both the radial and axial directions. 100%diagnosis accuracy for a 500-case evaluation popula-tion is maintained for training data bases as small as10 cases. Similar accuracy has been reported in the lit-erature with neural networks diagnosing a compressorvalve [5], but with a larger training data base (NT= 20)and a much smaller evaluation data set (NE= 21).Results indicate that radial vibrations produce simplerand more accurate models in general, due to more dis-tinct signatures for the good and bad cases. AIM pro-vides a fast, convenient, and accurate approach to mod-eling and classifying the vibration spectra. Adequatemodels were synthesized automatically with the de-fault value of the CPM parameter without the user hav-ing to experiment with various architectures as in thecase with presently available neural network tools forwhich no design methodologies exist. Network train-ing is also expected to be faster with AIM, which con-siderably reduces development time and effort. Result-ing models in the form of a few polynomial equationsreadily reveal input variables that influence the classi-


fication and allow fast and efficient online monitoringusing simple portable apparatus.

The way ‘good’ and ‘bad’ pump conditions were de-fined here may be somewhat idealistic, since ‘good’was represented by a pump having a brand new bearingand ‘bad’ with the bearing approaching the end of itsuseful life. In practice, a pump would normally be con-sidered satisfactory for a considerable interval follow-ing bearing change. This can be taken into account byextending the spectrum measurements for the ‘good’pump over such an interval to improve the represen-tation of natural variability within the ‘good’ class.A similar approach can be applied on the other extremefor the representation of the ‘bad’ condition. How-ever, representing diverse and truly bad conditions in aunique way would be more difficult. Mayes [19] arguesthat it is not possible to model the abnormal behav-ior since it is unknown, and therefore only the normalbehavior should be modeled. Diagnosis is then madeby looking for clear departures from this model whichgo beyond acceptable normal variability, thus indicat-ing genuine changes. This approach is possible withtime series modeling using either ARIMA [29], regres-sion analysis [19], or neural networks [19]. The firsttwo techniques would be particularly easier to imple-ment, since deviations would show simply as changesin a few model coefficients. However, the categoricalclassification approach adopted here and in [5] requiresthat the class type (‘good’ or ‘bad’) is known for eachtraining or evaluation observation, which implies mod-eling abnormal behavior. In situations where a finitenumber of fault modes can be identified and modeled,the technique can be useful in diagnosing a good unitand classifying the fault type [17]. It is also possible toextend the 2-state classification method described hereto model a finite number of pump operating conditionsspanning the bearing lifetime as represented by theirvibration frequency spectra.

Work described here has utilized existing VAX-based hardware and software designed originally forthe acquisition and analysis of nuclear physics experi-mental data. This has limited the frequency range an-alyzed due to the limited sampling frequency of 84kHz. Higher frequency parts of the vibration spec-trum, presently excluded by the anti-aliasing filter, mayprove useful in identifying problems such as pittingand cracking in bearings, insufficient lubrication, shaftrubbing and pump cavitation [21]. To include suchhigher frequency components requires sampling at ahigher frequency and using an accelerometer with abroader bandwidth. The present sensor may still be

used if proper equalization is included to account forvariations in the frequency response over the widerband of interest. Future work would consider faster andmore efficient modern PC-based hardware and soft-ware platforms for sampling and analyzing the vibra-tion signals. Moreover, more uptodate versions of theAIM software are now becoming available on the PC.These factors will allow an integrated approach to theacquisition, analysis, and monitoring of the vibrationdata as well as the modeling of the vibration status forall the pumps of interest. It is worth noting that thenewer AIM versions support a larger number of the in-put parameters, which allows greater resolution for thevibration spectra that can be modeled.

Acknowledgement

This work is part of KFUPM/RI Center for AppliedPhysical Sciences project supported by the ResearchInstitute of King Fahd University of Petroleum andMinerals, Dhahran, Saudi Arabia.

References

[1] R.E. Abdel-Aal and M.A. Elhadidy, A machine-learning ap-proach to modeling and forecasting the minimum temperatureat Dhahran, Saudi Arabia,Energy – The International Journal19 (1994), 739–749.

[2] R.E. Abdel-Aal, H.A. Al-Juwair and M.A. Al-Sunaidi, A com-puterized system for beam measurements and control at theKFUPM 350 kV ion accelerator,Nuclear Instruments andMethods in Physics ResearchA301 (1991), 489–496.

[3] AIM User’s Manual, AbTech Corporation, Charlottesville, VA,USA, 1990.

[4] H.A. Al-Juwair, G. Blume, R.J. Jaarsma, C.R. Meitzler andK.H. Purser, 350 keV accelerator facility for the University ofPetroleum and Minerals,Nuclear Instruments and Methods inPhysics ResearchB24/25 (1987), 810–812.

[5] K. Arai, H. Shimodaira, Y. Sakaghchi and K. Nakano, Applica-tion of neural computation to sound analysis for valve diagno-sis, in:Proc. IEEE Int. Joint Conf. on Neural Networks, Seatle,WA, July 8–12, 1991, I: pp. 177–182.

[6] A.R. Barron, Predicted squared error – a criterion for auto-matic model selection, in:Self-Organizing Methods in Model-ing: GMDH Type Algorithms, S.J. Farlow, ed., Dekker, NewYork, 1984, pp. 87–103.

[7] R.L. Barron, A.N. Mucciardi, F.J. Cook, J.N. Craig and A.R.Barron, Adaptive learning networks: developments and ap-plications in the United States related to GMDH, in:Self-Organizing Methods in Modeling: GMDH Type Algorithms,S.J. Farlow, ed., Dekker, New York, 1984, pp. 25–65.


[8] J.D. Boyes, Noise and vibration analysis of reciprocat-ing machinery,Noise and Vibration Monitoring Worldwide(April/March 1981), 90–92.

[9] L. Breiman, J. Friedman, R. Olshen and C. Stone,Classifica-tion and Regression Trees, Wadsworth, Belmont, CA, 1984.

[10] C. Charalambous, Conjugate gradient algorithm for efficienttraining of artificial neural networks,IEE Proceedings-G139(1992), 301–310.

[11] R. Duda and P. Hart,Pattern Recognition and Scene Analysis,Wiley, New York, 1973.

[12] S.J. Farlow, The GMDH Algorithm, in:Self-Organizing Meth-ods in Modeling: GMDH Type Algorithms, S.J. Farlow, ed.,Dekker, New York, 1984, pp. 1–24.

[13] C.R. Gould, L.G. Holzsweig, S.E. King, Y.C. Lau, R.V. Pooreand N.R. Roberson, The XSYS data acquisition system at theTriangle Universities nuclear laboratory,IEEE Trans. NuclearScienceNS-28(1981), 3708–3714.

[14] S. Ikeda, Nonlinear prediction models for river flows andtyphoon precipitation by self-organizing methods, in:Self-Organizing Methods in Modeling: GMDH Type Algorithms,S.J. Farlow, ed., Dekker, New York, 1984, pp. 149–167.

[15] A.G. Ivakhnenko, Polynomial theory of complex systems,IEEE Trans. Systems, Man, and CyberneticsSMC-1 (1971),364.

[16] S. Knerr, L. Personnaz and G. Dreyfus, Single-layer learn-ing revisited: A stepwise procedure for building and training aneural network, in:Neurocomputing: Algorithms, Architecturesand Applications, F.F. Soulie and J. Herault, eds, Springer-Verlag, Berlin, 1990, pp. 41–50.

[17] M. Kotani, H. Matsumoto and T. Kanagawa, Acoustic diagno-sis for compressor with hybrid neural network, in:Proc. IEEEInt. Joint Conf. on Neural Networks, Seatle, WA, July 8–12,1991, I: pp. 251–256.

[18] J.M. Malone II, Regression without models: Directions insearch for structure, in:Self-Organizing Methods in Modeling:

GMDH Type Algorithms, S.J. Farlow, ed., Dekker, New York,1984, pp. 67–86.

[19] I.W. Mayes, Use of neural networks for on-line vibration mon-itoring, Proc. of the Institution of Mechanical Engineers208(1994), 267–274.

[20] G.J. Montgomery and K.C. Drake, Abductive networks, in:Proc. SPIE Applications of Artificial Neural Networks Conf.,Orlando, Florida, 1990, pp. 56–64.

[21] E.A. Page and C. Berggren, Improve predictive maintenancewith HFE monitoring,Hydrocarbon processing, January 1991,pp. 69–72.

[22] PCB Piezoelectronics, Inc. 3425 Walden Avenue, Depew, NY14043-2495.

[23] J.P. Peck and J. Burrows, On-line condition monitoring of ro-tating equipment using neural networks,ISA Transactions33(1994), 159–164.

[24] S.J. Richards, Vibration analysis of rotating equipment in millenvironment,Iron and Steel Engineer, June 1992, pp. 33–38.

[25] D.E. Scott and C.E. Hutchinson, An application of the GMDHalgorithm to economic modeling, in:Self-Organizing Methodsin Modeling: GMDH Type Algorithms, S.J. Farlow, ed., Dekker,1984, pp. 67–86.

[26] P.D. Wasserman,Neural Computing: Theory and Practice, VanNostrand–Reinhold, New York, 1989.

[27] S.M. Weiss and C.A. Kulikowski,Computer Systems ThatLearn., Kaufmann, Los Altos, CA, 1991.

[28] R.A. Vaughan (ed.),Pattern Recognition and Image Processingin Physics, SUSSP Publications, Edinburgh, and Adam Hilger,Bristol, 1991.

[29] Q. Zhuge, Y. Lu and S. Yang, Signature analysis for recipro-cating machinery with adaptive signal processing,MechanicalSystems and Signal Processing4 (1990), 355–365.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks


Using abductive machine learning for online vibration ...

Documents

Transcript of Using abductive machine learning for online vibration ...