Multivariate fuzzy modelling of time-series data

Master’s programme in Mechanical Engineering
Tuomas Keski-Heikkilä
Master’s Thesis 2021

Copyright ©2021 Tuomas Keski-Heikkilä

Aalto University, P.O. BOX 11000, 00076 AALTO
www.aalto.fi
Abstract of master's thesis

Author Tuomas Keski-Heikkilä
Title of thesis Multivariate fuzzy modelling of time-series data
Master programme Mechanical engineering
Thesis supervisor Prof. Kari Tammi
Thesis advisor(s) Dr Miika Valtonen and MSc Riku Ala-Laurinaho
Date 24.07.2021
Number of pages 62
Language English

Abstract
Modern industrial machines record increasing amounts of data. This data could be better utilized to improve the design of machines or their usage. Many methods exist for modelling data, but each has its limitations. For example, kernel density estimation struggles with high dimensionality, and neural networks are prone to overfitting and have poor interpretability.

In this thesis, the main objective was to develop a fuzzy logic-based method for efficient high-dimensional time-series data modelling. The goal of the method is to help a product designer or engineer interpret and utilize data collected by industrial machines. Secondarily, three potential use cases were studied for fuzzy modelling: visualizations, predictive modelling, and anomaly detection.

The fuzzy logic-based modelling method was developed by iterative prototyping. To evaluate it, data was collected from an industrial crane, Ilmatar. All the collected data were preprocessed into a combined CSV file, which was used to build a fuzzy model of Ilmatar. In addition, concepts were proposed for fuzzy logic-based predictive modelling and anomaly detection.

It was found that the execution time of the proposed fuzzy modelling method grows only linearly with increasing dimensions, which indicates that the method is scalable to high-dimensional data. In addition, with the fuzzy model, an accurate representation of the Ilmatar data could be saved with 80% less data compared to the raw measurement data. However, trade-offs between the interpretability and accuracy of the model need to be made. Another benefit of the model is the built-in aggregation that it provides.

This thesis successfully developed a fuzzy modelling method that has advantages over other widely used methods. The next step would be to apply fuzzy modelling in an actual industrial use case. In future research, the effect of various membership function shapes and the use of linguistic terms to define membership functions should be studied. In addition, the learning capabilities of the fuzzy method could potentially be further enhanced with neural networks.

Keywords Fuzzy logic, data modelling, data visualization, Kernel Density Estimation, artificial neural networks

Aalto University, P.O. Box 11000, 00076 AALTO
www.aalto.fi
Abstract of master's thesis (Diplomityön tiivistelmä)

Author Tuomas Keski-Heikkilä
Title of thesis Moniulotteisen aikasarjadatan sumea mallinnus (Multivariate fuzzy modelling of time-series data)
Master programme Mechanical engineering
Thesis supervisor Prof. Kari Tammi
Thesis advisor(s) Dr Miika Valtonen and MSc Riku Ala-Laurinaho
Date 24.07.2021
Number of pages 62
Language English

Abstract
Modern industrial machines record data to an increasing extent. This data could be put to better use in improving the machines or their usage. Many methods exist for modelling data, but each has its shortcomings. For example, kernel density estimation with a large number of variables is challenging, and neural networks overfit easily and are difficult to interpret.

The main objective of this thesis was to develop a fuzzy logic-based modelling method for multidimensional time-series data. The aim was that machine designers or engineers could use the method to better interpret and utilize data collected from machines. A secondary objective was to study three potential use cases for fuzzy modelling: visualizations, predictive modelling, and anomaly detection.

The fuzzy modelling method was developed iteratively through testing. The method was evaluated with data collected from an industrial overhead crane, Ilmatar. All the collected data were combined and preprocessed into CSV format, and the resulting file was used with the modelling method to build a fuzzy model of Ilmatar. In addition, prediction and anomaly detection with fuzzy logic were considered at the concept level.

The work showed that the time required for modelling grew only linearly as the number of variables was increased. The proposed fuzzy modelling method therefore scales well to a large number of variables. It was also found that the fuzzy model could store an accurate representation of the Ilmatar data with 80% less data than the raw data. However, modelling requires trade-offs between the interpretability and accuracy of the model. Another advantage of fuzzy modelling is the built-in way it provides to combine rows.

This thesis successfully developed a fuzzy modelling method that has advantages over other common modelling methods. The next step is to apply fuzzy modelling in a real industrial use case. Future research should examine the effect of different membership functions and the use of linguistic terms in membership functions. In addition, the learning of the fuzzy model could potentially be improved by using neural networks.

Keywords Fuzzy logic, data modelling, data visualization, kernel density estimation, neural networks

Preface
First and foremost, I want to thank my advisors Miika and Riku for their invaluable comments and encouragement in the writing of this thesis.

I would also like to thank Kari and Pauli for providing me with the opportunity to work full time on this exciting project. Thanks also to Valtteri, for providing an industry perspective, and to everyone else involved in the Machinaide project.

Lastly, I want to thank my partner, Liisa, whose support has been especially invaluable in these days of remote work and isolation amidst the Covid-19 pandemic.

Helsinki, 24.07.2021

Tuomas Keski-Heikkilä

Table of contents
1 Introduction
  1.1 Objectives and methods
  1.2 Scope and thesis structure
2 Related work
  2.1 Fuzzy logic
    2.1.1 Fuzzy logic controllers
    2.1.2 Classification and pattern recognition
    2.1.3 Fuzzy forecasting and anomaly detection
    2.1.4 Adaptive fuzzy systems
  2.2 Kernel density estimation
    2.2.1 Bandwidth
    2.2.2 Computational aspects
    2.2.3 Online density estimation
    2.2.4 KDE anomaly detection
  2.3 Artificial neural networks
    2.3.1 Predictive neural networks
    2.3.2 Anomaly detection with neural networks
    2.3.3 Neuro-fuzzy methods
3 Modelling concept
  3.1 Building the model
  3.2 Model querying and visualization
  3.3 Predictive modelling
  3.4 Anomaly detection
4 Evaluation
  4.1 Software tools
  4.2 Fuzzy model parametrization
  4.3 Ilmatar data collection
    4.3.1 Fuzzy model
  4.4 Model query and visualization example
  4.5 Model optimization
    4.5.1 Fuzzy operators
    4.5.2 Data reduction
  4.6 Execution time of fuzzy modelling
5 Discussion
  5.1 Accuracy and interpretability
  5.2 Fuzzy learning compared to related work
  5.3 High-dimensional online modelling
  5.4 Data reduction
  5.5 Future work
6 Conclusions
References

Abbreviations
ANN   Artificial Neural Network
BCV   Biased Cross-Validation
CV    Cross-Validation
CSV   Comma-Separated Values
FFT   Fast Fourier Transform
FL    Fuzzy Logic
FLC   Fuzzy Logic Controller
IoT   Internet of Things
KDE   Kernel Density Estimation
LSCV  Least-Squares Cross-Validation
MF    Membership Function
NN    Neural Network
oKDE  Online Kernel Density Estimation
PDF   Probability Density Function
PI    Plug-In
PLC   Programmable Logic Controller
RAM   Random Access Memory
ROT   Rules of Thumb
SCV   Smoothed Cross-Validation
SQL   Structured Query Language

1 Introduction
Modern industrial machines record potentially up to hundreds of different operating parameters. While large amounts of data are collected by industrial machines, a major challenge facing the industry is using this data in an optimal way to automatically find and analyze interesting patterns (Diez-Olivan, Del Ser, Galar, & Sierra, 2019). There is great potential in using data to better predict component failures because the maintenance costs for industrial machines can account for 75% of equipment lifecycle costs (O’Donovan, Leahy, Bruton, & O’Sullivan, 2015). In recent research, it has also been noted that the success of a product design is increasingly dependent on the manufacturer’s capability to manage data (Tao et al., 2019).

This thesis was funded by the Machinaide project. The Machinaide project supports new innovative concepts for accessing, searching, analyzing, and using data from multiple machines to increase machine usability and functionality. Specifically, in the case of industrial crane operating data, interest lies in developing new ways to model and analyze the collected data in order to provide insights for machine users. For this thesis, three use cases were identified for crane data modelling: a descriptive fuzzy model showing how the machine has been used, a predictive model that predicts future behavior based on historical data, and an anomaly detection model that detects abnormalities or changes in the process.

One method for modelling machine usage data is to employ density estimation. Density estimation is a statistical method for finding the underlying probability density function of a data set (Silverman, 2018). Applying density estimation to machine operating parameters provides usage distributions and information on the behavior of, and interaction between, different parameters. Density estimation methods include kernel density estimation (Gramacki, 2018) and neural density estimation (Papamakarios, 2019). These methods produce accurate density estimates but can quickly become computationally expensive as the amount of data and the number of dimensions increase.

Anomaly detection is a form of data analysis in which the goal is to identify unexpected behavior in data sets (Chandola, Banerjee, & Kumar, 2009). The unexpected data points or events are usually referred to as anomalies or outliers. Anomaly detection is widely used across many different application domains, ranging from healthcare to military surveillance. In industry, effective anomaly detection leads to increased equipment availability, product quality, and worker safety, and to reduced rework cost (Siegel, 2020). Predictive analytics, on the other hand, is the process of discovering patterns in data in order to estimate future outcomes (Larose, 2015).

Recently, Siegel (2020) compared multivariate neural network architectures with a collection of traditional machine learning techniques for anomaly detection. The datasets in the study were from an industrial testbed, and he found that the studied neural network architectures outperformed the popular traditional methods. Additionally, neural networks are often applied in predictive analytics, with applications ranging from stock market predictions and patient disease risk to predicting motor failure (Scalabrini Sampaio, Vallim Filho, Santos da Silva, & Augusto da Silva, 2019). Neural networks offer many advantages, such as the ability to fit complex nonlinear models with no prior specification of the model (Livingstone, Manallack, & Tetko, 1997). However, neural networks can suffer from overfitting and overtraining, and the interpretation of neural networks is difficult for non-experts (Livingstone et al., 1997).

Fuzzy logic systems follow a human-like decision-making process, which is a clear user-interaction advantage (Albertos, Sala, & Olivares, 1998). When compared with neural networks, fuzzy systems require less training data and explain the conclusions they reach more clearly. Using fuzzy control principles, the data can also be generalized to allow for easy aggregation, which reduces the amount of data. Fuzzy control is a widely used method in many research areas, and it has been extensively researched over the past decades. Fuzzy modelling has, for example, been applied to modelling and predicting the behavior of cast components (Tarasov, Tan, Jarfors, & Seifeddine, 2020) and to anomaly detection in industrial wireless sensor networks (Kumarage, Khalil, Tari, & Zomaya, 2013). However, to the author’s best knowledge, it has not been applied in the manner proposed in this thesis.

1.1 Objectives and methods
The primary objective of the thesis was to develop a method for building a fuzzy model from industrial equipment data. The goal was for the proposed method to store machine data more efficiently while still capturing the whole variable space. The created model could then be queried for any subset of variables. Secondarily, the following potential use cases for the modelling method were studied:

1. Visualizations of the built fuzzy model,
2. Predictive modelling,
3. Anomaly detection.

In order to achieve the research objectives, the modelling method was first developed by iterative prototyping. Java was used as the programming language, and the jfuzzylite Java library (Rada-Vilela, 2018) was used to implement fuzzy logic. Real crane data was then collected from an industrial smart crane (Ilmatar, Konecranes), shown below in Figure 1. This data was used to test the developed modelling method. In addition, visualizations of the crane data were made from the developed model. Finally, predictive modelling and anomaly detection with the fuzzy modelling method were studied.

Figure 1. Industrial crane Ilmatar.

1.2 Scope and thesis structure
This thesis focuses on the development of a fuzzy logic-based modelling method; detailed software implementation is excluded from the scope. Data from an industrial crane, Ilmatar, is used to evaluate the developed method, and the focus is on developing the modelling method for industrial use. In the related works, two closely related technologies are covered in addition to other fuzzy systems: kernel density estimation and neural networks.

The rest of the thesis is structured as follows. Chapter 2 reviews the literature on related works. Chapter 3 presents the modelling concept, and chapter 4 evaluates the modelling concept with data from Ilmatar. Chapter 5 discusses key elements of the developed modelling method, makes comparisons to related work, and presents ideas for further research. Finally, chapter 6 summarizes the work.

2 Related work
This chapter presents related works in industrial data modelling. In order to apply fuzzy logic in a modelling method, it is essential to understand the fundamentals of fuzzy logic. Therefore, section 2.1 first reviews the basics of fuzzy logic and then presents related works in the field of fuzzy logic. Kernel density estimation is a widely used density estimation method, and it was considered as a potential approach for modelling industrial equipment data. In addition, ideas from kernel density estimation were applied in the visualization of the built fuzzy models. Therefore, kernel density estimation is covered in section 2.2. Finally, section 2.3 covers artificial neural networks, another key technique in data modelling, which, much like fuzzy systems, attempt to understand real-world systems by taking in fundamental knowledge, such as measurements and observations, rather than beginning from a theory or a mathematical model (Ross, 2005).

2.1 Fuzzy logic
Fuzzy logic (FL) is a way of dealing with imprecision. It was introduced by Zadeh (1965) to better handle imprecise data. FL is a multivalued logic in which truth values anywhere between 0 and 1 are allowed, as opposed to traditional Boolean logic, in which a statement is strictly true or false. FL allows a more human-like approach to decision making, and it also makes it possible to make rational decisions based on imperfect information (Hellmann, 2001; Wolf et al., 1996; Zadeh, 2008).

Because the truth value of a proposition in fuzzy logic lies somewhere between 0 and 1, a proposition can be true and false to some degree simultaneously. Temperature is a good example of a variable that can be modelled with fuzzy logic. We can assign several linguistic values to temperature, such as very cold, cold, warm, hot, or very hot (see Figure 2 below).

Figure 2. Example of membership functions (Wolf et al., 1996)

These linguistic values can be represented with membership functions as shown in Figure 2, thus forming a fuzzy set for temperature. The degree to which a real temperature value belongs to a given linguistic value is given by the membership function. For example, in Figure 2, the temperature of 23.5 degrees Celsius is hot to a degree of 0.18 and warm to a degree of 0.79. This process of transforming real values into linguistic fuzzy values is called fuzzification (Wolf et al., 1996).

Membership functions can be of any shape; some common options are presented in Figure 3 below. In this thesis, Gaussian membership functions are predominantly used to simplify the design process. Gaussian membership functions are suitable for modelling nearly any type of variable, and they can be designed so that two to three membership functions activate simultaneously for every value. This helps in the design of the fuzzy model, especially when assigning weights to fuzzy rules.
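The fuzzification step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation (which used Java and jfuzzylite); the term centers, the common width, and the activation threshold are illustrative assumptions and are not read from Figure 2, so the resulting degrees differ from the 0.18 and 0.79 quoted above.

```python
import math


def gaussian_mf(x, center, sigma):
    """Degree (0..1) to which x belongs to a Gaussian fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


# Hypothetical linguistic values for temperature in degrees Celsius.
# The (center, sigma) pairs are illustrative assumptions, not values from Figure 2.
TEMPERATURE_TERMS = {
    "very cold": (-10.0, 7.0),
    "cold": (5.0, 7.0),
    "warm": (20.0, 7.0),
    "hot": (35.0, 7.0),
    "very hot": (50.0, 7.0),
}


def fuzzify(x, terms=TEMPERATURE_TERMS, threshold=0.05):
    """Map a crisp value to the linguistic values it activates.

    Memberships below `threshold` (an assumed cut-off) are treated as
    inactive. With the spacing above, two or three terms are active for
    any value between the outermost centers.
    """
    degrees = {name: gaussian_mf(x, c, s) for name, (c, s) in terms.items()}
    return {name: mu for name, mu in degrees.items() if mu >= threshold}


memberships = fuzzify(23.5)
for name, mu in memberships.items():
    print(f"{name}: {mu:.2f}")
```

Choosing the width relative to the spacing of the centers is what controls how many membership functions activate at once: a wider `sigma` makes more neighbouring terms overlap.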

Figure 3. Common membership function shapes

2.1.1 Fuzzy logic controllers
A common use of fuzzy logic is in fuzzy logic controllers. Fuzzy logic control has been successfully applied in many industrial applications, and it has even been applied in automatic crane operation to limit the sway of the hook (Itoh, Migita, Itoh, & Irie, 1993). It is often applied in complex control systems, such as chemical process control, but it can also be successfully applied in low-cost microcontroller control applications (Chabni, Taleb, Benbouali, & Bouthiba, 2016). In FL controllers, if-then rules are used to define the relationships between the input and output variables in a fuzzy logic system (Rada-Vilela, 2018). An example of an if-then rule could be “If temperature is hot then fan speed is high”. In this example, “temperature is hot” is the antecedent of the rule and “fan speed is high” is the consequent. The rules are always written in the form “If antecedent then consequent”. In the antecedent, multiple statements can be connected with “and” and “or”; these are called the conjunction operator and the disjunction operator, respectively. In the consequent, the statements are independent, but they can still be chained together with a symbolic “and”.

An example of a fuzzy logic controller is illustrated in Figure 4 (Rada-Vilela, 2018). The operation of a fuzzy logic controller consists of three stages: the fuzzification stage, where the input values are converted into fuzzy values; the inference stage, where the system’s fuzzy rules are activated and the consequents are aggregated into fuzzy outputs; and finally the defuzzification stage, where the fuzzy outputs are defuzzified back into crisp real values, such as the monetary value of the tip in the example of Figure 4.
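The three stages described above can be sketched in a few lines of Python. This is a simplified, hypothetical single-input controller loosely inspired by the tipping example, not the controller of Figure 4 and not jfuzzylite code: the triangular terms, the three rules, the 0 to 30 output range, and the use of minimum for rule activation, maximum for aggregation, and a numerically integrated centroid for defuzzification are all assumptions made for illustration.

```python
def triangle(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical terms: service quality on a 0..10 scale, tip in percent (0..30).
SERVICE = {"poor": (-5.0, 0.0, 5.0), "good": (0.0, 5.0, 10.0), "excellent": (5.0, 10.0, 15.0)}
TIP = {"low": (0.0, 5.0, 10.0), "medium": (10.0, 15.0, 20.0), "high": (20.0, 25.0, 30.0)}

# Rule base: "if service is <antecedent> then tip is <consequent>".
RULES = [("poor", "low"), ("good", "medium"), ("excellent", "high")]


def infer_tip(service, steps=3000):
    """Return a crisp tip for a crisp service score in the range 0..10."""
    # Stage 1, fuzzification: crisp input -> degree of each input term.
    activation = {term: triangle(service, *abc) for term, abc in SERVICE.items()}

    # Stages 2 and 3, inference and centroid defuzzification over the output range.
    numerator = 0.0
    denominator = 0.0
    for k in range(steps + 1):
        y = 30.0 * k / steps
        # Each consequent is clipped at its rule activation (min) and the
        # clipped sets are aggregated with max.
        mu = max(min(activation[ante], triangle(y, *TIP[cons])) for ante, cons in RULES)
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule was activated for this input")
    return numerator / denominator


for score in (0.0, 2.0, 5.0, 8.0, 10.0):
    print(score, round(infer_tip(score), 2))
```

Because the output terms are symmetric triangles, a single fully activated rule yields the peak of its consequent, and inputs that activate two rules yield a value between the two peaks.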

Figure 4. Example of a fuzzy logic controller (Rada-Vilela, 2018)

In the defuzzification stage, the crisp values can be calculated by integrating over the fuzzy output values; examples of this include the centroid method and the bisector method (Rada-Vilela, 2018). Another method for obtaining the crisp values is to use a weight-based defuzzifier, which uses the weights and fuzzy values to determine the crisp value. The centroid method, also known as the center of gravity (CoG) method, is the most widely used.

2.1.2 Classification and pattern recognition
In addition to control systems, fuzzy logic has typically been applied in classification and pattern recognition problems (Ross, 2005). In classification, also known as clustering, the aim is to divide a data set into homogeneous clusters. Clustering can be further divided into hard and soft clustering, see Figure 5 below. In hard clustering, a data point belongs to just one cluster, whereas in soft clustering, also known as fuzzy clustering, a data point can belong to multiple clusters with different degrees of membership.

Figure 5. Hard and soft clustering example

The hard c-means (HCM) method, suggested by James Bezdek in 1981, is a powerful method for hard clustering (Ross, 2005). In the HCM method, an objective function is used to:

1. Minimize the Euclidean distance between each data point and the center of its cluster,
2. Maximize the Euclidean distance between cluster centers.

Fuzzy c-means clustering (FCM), on the other hand, is the most common fuzzy clustering algorithm (Ross, 2005). In FCM, data points are grouped into N clusters, and every data point belongs to each cluster to a certain degree. The data points that are close to a cluster’s center point have a high degree of membership in that cluster, and the data points that lie far from the cluster center have a low degree of membership in that cluster.

The FCM algorithm consists of several steps but also focuses on minimizing an objective function (Hung & Yang, 2001). To simplify, in FCM a defined number of initial cluster centers are first randomly selected from the given dataset. The cluster centers are then updated iteratively until they are stable or the objective function converges to a local minimum. For a more complete explanation of the FCM algorithm, the reader can refer to Hung and Yang (2001).

While classification determines the structure in a set of data, pattern recognition seeks to assign new data into the classes defined in the classification process (Ross, 2005). Figure 6 below shows an overview of how a pattern recognition system might work. The classification system is taught with training data, and a feedback loop is required in case better class divisions are needed. The pattern recognition system takes in new data and assigns it into the classes provided by the classification system. This step also requires a feedback loop in case the system fails to match the data. Pattern recognition is used widely across many engineering and scientific disciplines, such as biology, computer vision, and artificial intelligence. There are countless applications for pattern recognition, such as data mining for meaningful patterns, biometric recognition for personal identification, and remote sensing to forecast crop yield (Jain, Duin, & Mao, 2000).
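To make the fuzzy c-means procedure outlined earlier in this section more concrete, the following Python sketch implements the standard textbook update equations. It is a simplified illustration under stated assumptions, not the exact procedure of Hung and Yang (2001): the fuzzifier `m = 2`, the stopping tolerance, the random seed, and the toy data are all chosen here for illustration.

```python
import math
import random


def _memberships(points, centers, m):
    """Degree of membership of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # The point coincides with one or more centers: share full membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            table.append([h / sum(hits) for h in hits])
        else:
            table.append([1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists])
    return table


def fuzzy_c_means(points, n_clusters, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Return (centers, memberships) for a list of equal-length coordinate tuples."""
    rng = random.Random(seed)
    # Initial centers are picked at random from the data set.
    centers = rng.sample(points, n_clusters)
    dims = len(points[0])
    for _ in range(max_iter):
        u = _memberships(points, centers, m)
        new_centers = []
        for i in range(n_clusters):
            # Each center is the mean of all points weighted by membership ** m.
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dims)
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers are stable
            break
    return centers, _memberships(points, centers, m)


# Toy data: two small, well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(points, n_clusters=2)
for point, row in zip(points, u):
    print(point, [round(v, 3) for v in row])
```

Unlike hard clustering, every point keeps a nonzero degree of membership in both clusters; points near a center simply have a degree close to 1 for that cluster.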

Figure 6. Classification and pattern recognition overview (Ross, 2005)

2.1.3 Fuzzy forecasting and anomaly detection
Fuzzy logic has also been employed in forecasting applications and anomaly detection. Tapio Frantti and Petri Mähönen developed a fuzzy logic advisory tool (FLAT) to forecast the demand for signal transmission products in an electronics manufacturing factory (Frantti & Mähönen, 2001). In their approach, membership functions were generated based on input data, and the fuzzy rules were designed with expert knowledge. Their model successfully provided more accurate decision-making support than earlier methods. Fuzzy logic-based systems can successfully manage complex nonlinear systems and have also been employed for predictive modelling in, for example, the casting industry (Tarasov et al., 2020). Tarasov et al. found that the accuracy of their fuzzy logic-based model was similar to that of the ANN approaches used for comparison. They concluded that the fuzzy-based model is advantageous over the ANN model especially when transparency of the model is valued.

In addition, fuzzy logic has been used for anomaly detection, especially in the field of network anomaly detection (Hamamoto et al., 2018; Kumarage et al., 2013). Hamamoto et al. employed fuzzy logic combined with genetic algorithms to predict and detect network anomalies. Kumarage et al. utilized fuzzy c-means clustering to detect malicious activities in wireless sensor networks.

2.1.4 Adaptive fuzzy systems
Fuzzy systems can also be built to automatically determine membership functions or the if-then rules. Many algorithms can be used to automate fuzzy systems, such as batch least squares, recursive least squares, the gradient method, learning from example (LFE), modified learning from example (MLFE), and the clustering method (Ross, 2005).

Vainio et al. built an adaptive fuzzy logic control system to automatically adjust Venetian blinds and ceiling lights in a smart home (Vainio, Valtonen, & Vanhala, 2008). In their approach, a learning algorithm was used to automatically learn the rules for the control system based on user routines. The context information was periodically saved and then fuzzified. The data was processed in separate sets, each containing the sensor and actuator values from a single measurement time. The sets were then used to update the fuzzy rule base. For each set, the existing rule base was searched for a matching input-value combination. If no matching rule was found, a new rule was created from the set’s input-value and output-value combinations with a small initial rule weight. If, however, a matching input-value combination was found, the output values were compared. If the output-value combination of the found rule matched that of the set, the weight of the rule was increased, and if not, the weight of the rule was decreased. Typically, more than one matching rule was found. In addition, when the weight of a rule dropped below 0 because of decrements, the rule was removed from the rule base.

Vainio et al. found that their system could successfully learn a user’s preferences and control the lighting accordingly. Additionally, they argued that the proposed system could potentially be applied to many other applications, such as heating, plumbing, air conditioning, or entertainment control systems (Vainio et al., 2008).

2.2 Kernel density estimation
Kernel smoothing is one of many data smoothing techniques, but it is the most important and widely used (Crabbe, 2013; Gramacki, 2018). Kernel Density Estimation (KDE) is covered in related works because it was initially considered as a potential modelling approach, and some of its core ideas, especially from KDE visualizations, are applied in the fuzzy logic-based method proposed in this thesis. In addition, KDE was used to evaluate the validity of the density visualizations made from the fuzzy models.

KDE is a powerful method for finding the underlying probability density function from data (Gramacki, 2018). It is a nonparametric data analysis approach, which means that no assumptions are made about the distribution of the examined data. KDE in the univariate case was first developed by Fix and Hodges in 1951, and it was published by Rosenblatt in 1956. They define the kernel density estimator as (equation 1):

$$\hat{f}(x, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \tag{1}$$

where X_1, X_2, …, X_n is a random sample of size n drawn from a common density function f, h is the bandwidth, and K is the kernel (Crabbe, 2013; Gramacki, 2018; Rosenblatt, 1956). In the univariate case, a kernel function with a specified bandwidth is centered at the location of each data point, as shown in Figure 7 below. The normalized sum of these kernel functions forms the estimate of the probability density function.
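Equation 1 translates almost directly into code. The following minimal Python sketch evaluates the univariate estimator with a Gaussian kernel; the sample values and the bandwidth are arbitrary illustrative choices, and the direct summation shown here is the naive form, without any of the computational shortcuts that practical KDE libraries use.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - X_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


# Arbitrary illustrative sample and bandwidth.
sample = [1.0, 1.5, 2.0, 4.0]
h = 0.5

for x in (1.5, 3.0, 4.0):
    print(x, round(kde(x, sample, h), 4))
```

Since each kernel integrates to one and the sum is divided by n, the estimate itself integrates to one, which is what makes it a valid probability density.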

Figure 7. Construction of a kernel density estimation with Gaussian kernels (Gramacki, 2018)

In KDE, the kernel used is usually symmetric, but recently asymmetric kernel functions have also been used in some applications (Węglarczyk, 2018). Some commonly used symmetric kernel functions are displayed in Figure 8 below.

Figure 8. Example kernel options

A normal distribution is the most commonly used kernel in KDE (Guidoum, 2015), see Figure 9.a. It has also been shown that the choice of kernel is not very important, especially with larger datasets, in contrast to the high importance of bandwidth selection (Chen, 2017; Gramacki, 2018). Figure 9 below illustrates this by showing a KDE calculated with the different kernel types presented earlier.

Figure 9. KDE with different kernels

2.2.1 Bandwidth
Bandwidth selection is a crucial task in kernel density estimation (Gramacki, 2018). The accuracy of KDE is heavily dependent on the bandwidth used. In the univariate case, the bandwidth is simply a scalar value, and it controls the amount of smoothing applied to the data. In the multivariate case, the bandwidth is a matrix that controls both the orientation and the amount of smoothing. Bandwidth is commonly denoted as h.

The problem of selecting an optimal bandwidth under given criteria can be approached in many ways. Bandwidth selectors can be divided into three main types (Gramacki, 2018):

1. Rules of thumb (ROT) methods. These use simple mathematical formulas to calculate the bandwidth. They are designed to work in many circumstances, but there is no guarantee that the resulting bandwidth is optimal.

2. Cross-validation (CV) methods. These can be further divided into at least three variants: least-squares cross-validation (LSCV), biased cross-validation (BCV) and smoothed cross-validation (SCV).

3. Plug-in (PI) methods. These are based on plugging in estimates of unknown variables in formulas for asymptotically optimal bandwidth.
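To make the first category concrete, the sketch below implements one well-known rule of thumb, Silverman's rule, for univariate data with a Gaussian kernel. It is shown only as an illustration of what a ROT selector looks like; the thesis does not state which selector, if any, was used, and the sample values are made up.

```python
import statistics

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sigma = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR collapses to zero
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-0.2)

sample = [1.0, 1.2, 2.9, 3.1, 3.3]  # made-up data
h = silverman_bandwidth(sample)
h_scaled = silverman_bandwidth([10.0 * x for x in sample])
```

The rule is cheap to compute and scales with the spread of the data, but as noted in the list above, nothing guarantees that the result is optimal for a multimodal distribution.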

With too small a bandwidth, the resulting PDF becomes jittery and individual point concentrations show clearly (Gramacki, 2018). However, with too large a bandwidth, modes can be lost. This is known as oversmoothing. Figure 10 below illustrates the difference in KDE using different bandwidths. The data was randomly generated and a Gaussian kernel was used.


Figure 10. Gaussian kernels with differing bandwidths

From the figures, we can see that with h=0.01 the estimate resembles a histogram. It shows all individual concentrations of data points but provides a poor representation of the underlying distribution. The bandwidth value h=0.1 provides a fairly good KDE, but the ideal bandwidth always depends on the criteria of the use case. The bandwidth value h=0.5 slightly oversmooths the density estimate and h=1 loses all but the main mode of the distribution. Selecting the bandwidth is not a simple task. Intensive research has been done on the subject, and every method developed for determining the "optimal" bandwidth provides the ideal bandwidth only with respect to the criteria used for creating the method (Gramacki, 2018).

Additionally, KDE can be divided into fixed and adaptive KDE, based on the method of calculating the bandwidth. In fixed KDE the bandwidth is constant across the whole evaluated grid, as in the previous examples, whereas in adaptive KDE the bandwidth varies between locations on the grid (Yuan et al., 2019). Adaptive KDE helps overcome some of the challenges in fixed KDE. For example, if data is generally sparsely spread across the evaluated grid but has a very high density in a particular part of the grid, the fixed bandwidth optimized for the whole grid may not accurately represent the dense area (Yuan et al., 2019). With adaptive KDE, the bandwidth can be smaller in the areas of the grid with higher density and larger in the more sparsely populated areas, thus giving a better representation of the probability distribution across the whole evaluated grid.


2.2.2 Computational aspects

While KDE techniques have reached maturity in research, KDE can be computationally very expensive, and recent research has focused on the computational aspects (Gramacki & Gramacki, 2017). Typically, KDE is calculated with an approximation technique called binning (Gramacki, 2018). In binning, the KDE is calculated for an equally spaced grid of evaluation points. To reach a reasonable accuracy, the number of grid points should be more than 50 for each dimension. The computational complexity of kernel density estimation is $O(mn)$, where m is the number of evaluation points and n is the number of data points. The evaluation points can be the same as the data points, which gives a computational complexity of $O(n^2)$. It is clear that the computational complexity grows rapidly with increasing sample size, and the naive evaluation of KDE is often impractical (Gramacki, 2018).

One approach to reducing the computational complexity is to combine the fast Fourier transform (FFT) with binning (Gramacki, 2018; Gramacki & Gramacki, 2017). Many other approaches have also been proposed to reduce computational times, such as Message Passing Interface (MPI) (Łukasik, 2007), Graphics Processing Units (GPUs) (Andrzejewski, Gramacki, & Gramacki, 2013) and Field-Programmable Gate Arrays (FPGAs) (Gramacki, Sawerwain, & Gramacki, 2015). Some methods are hardware-based, while others are software-based.

2.2.3 Online density estimation

Even with novel techniques and increasing computational capacity, processing large datasets in batches quickly becomes infeasible. Online kernel density estimation (oKDE) is a software-based method developed by Kristan, Leonardis, and Skočaj (2011) to allow for online bandwidth estimation and to keep the KDE's complexity low. In addition, all data may not be available in advance, or a process may need to be monitored indefinitely while continuously providing the best estimate of the data distribution. For these reasons, online density estimation may be required (Ferreira, de Matos, & Ribeiro, 2016; Kristan et al., 2011).

In oKDE the key idea is that not all the data points are stored, but instead a compressed sample distribution is maintained. Based on the compressed sample distribution an optimal bandwidth can be calculated, and the convolution between the bandwidth and the compressed model provides the KDE of the underlying distribution. The compression is maintained by a compression and revitalization scheme that they presented (Kristan et al., 2011).

In 2016 Ferreira et al. redesigned oKDE to be even more computationally efficient, numerically robust and extensible (Ferreira et al., 2016). They redesigned the software with high dimensionality in mind. The proposed approach was tested with datasets with up to 30 dimensions. They reached up to 40 times faster calculation speeds and required 90% less memory. The redesigned oKDE was released as xokde++ and it is implemented in C++.


Self-organizing maps (SOM) have also been employed for density estimation over data streams (Cao, He, & Man, 2012). Additionally, Zhou et al. developed a new concept of M-Kernels for performing density estimation on a data stream (Zhou, Cai, Wei, & Qian, 2003). Traditionally, each data point is represented with its own kernel function. In the M-Kernel method, on the other hand, the number of kernels is reduced by combining kernels into larger M-Kernels. As a simple example, if two data points have the same value, they can be represented by a single larger M-Kernel instead of two kernels. The M-Kernel is given an additional parameter, the weight 𝜌, which indicates how many kernels it represents.

More recently, the M-Kernel method has been applied in KDE-based outlier detection (Qin et al., 2019). Qin et al. developed a strategy (KELOS) to continuously detect the top-N KDE-based local outliers. In the proposed method, kernels that are close to each other are combined, similarly to the M-Kernel method. In addition, data points that have no prospect of becoming outliers are disregarded. The method proved to be 2-3 times faster for outlier detection than baseline outlier detection methods.

2.2.4 KDE anomaly detection

Density-based approaches can also be used in anomaly detection applications, even for nonlinear systems (Zhang, Lin, & Karim, 2018) or data streams (Qin et al., 2019). Zhang et al. proposed an adaptive kernel density-based approach for detecting anomalies in nonlinear systems. The proposed approach was built to assign a local outlier score to each sample, which was used to indicate how anomalous the sample was compared to other nearby samples. To enhance the system, the kernel bandwidth was adaptively set lower in high-density regions and higher in low-density regions. The proposed approach was also tested on an industrial dataset and promising results were achieved, demonstrating its applicability in a real-world application (Zhang et al., 2018).

2.3 Artificial neural networks

Artificial neural networks (ANNs), often simply called neural networks (NNs), are inspired by the brain (Da Silva et al., 2017). Neural networks can learn to solve various problems by employing different algorithms that mimic the function of biological neurons. ANNs are composed of several artificial neurons connected by synaptic connections. An artificial neuron is presented in Figure 11 below.


Figure 11. An artificial neuron (Da Silva et al., 2017)

Artificial neurons consist of seven elements, shown in the above figure (Da Silva et al., 2017):

1. Input signals (x1, x2, …, xn) that are the signals entering the neuron, coming from the sensed environment or a connected neuron.

2. Synaptic weights (w1, w2, …, wn) that are assigned to input signals to allow for quantification of the signal's significance to the neuron's functionality.

3. Linear aggregator (Σ) that combines the input signals, weighted by their respective weights, into an activation voltage.

4. Activation threshold or bias (θ) specifies a threshold that the activation voltage coming from the linear aggregator should surpass to generate an output trigger.

5. Activation potential (u) is the difference between the activation voltage coming from the linear aggregator and the activation threshold. If the activation voltage is higher than the threshold, the neuron is activated; if not, the neuron will remain deactivated.

6. Activation function (g) limits the output of the neuron within a range defined by the used activation function.

7. The output signal (y) transmits the final value produced by the neuron from a set of given input signals. The output signal can also be used as an input to a sequentially connected neuron.
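The seven elements above can be sketched as a single function. The input values, weights, threshold and the choice of the hyperbolic tangent as activation function are illustrative assumptions.

```python
import math

def neuron_output(inputs, weights, theta, g=math.tanh):
    """Compute the output signal y of one artificial neuron."""
    # Linear aggregator: weighted sum of the input signals
    activation_voltage = sum(w * x for w, x in zip(weights, inputs))
    # Activation potential u: aggregated voltage minus the threshold
    u = activation_voltage - theta
    # Activation function g limits the output range, here to (-1, 1)
    return g(u)

# Made-up signals and parameters, for illustration only
y = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, 0.1], theta=0.1)
```

The returned value can in turn be passed as one of the inputs of a neuron in the following layer.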

The neurons can be arranged together in many different structural compositions, and these are called architectures (Da Silva et al., 2017). Neural network architectures can be divided into two categories: feed-forward networks and feedback networks (Jain, Mao, & Mohiuddin, 1996). Feed-forward networks do not contain loops and are organized in layers that are unidirectionally connected. Generally, feed-forward networks are static: the output for a given input is a set of values rather than a sequence, and it does not depend on the previous states of the neurons (Jain et al., 1996). Conversely, feedback networks are dynamic systems. When an input is given and neuron outputs are calculated, the feedback loops modify the inputs to every neuron. Thus, the network enters a new state.


The learning algorithms used vary between different network architectures. Some typical architectures for these two categories are presented in Figure 12 below.

Figure 12. Neural network architectures (Jain et al., 1996)

Neural networks are typically trained with training data sets. The learning process is often supervised, meaning the output values are also known in the training data. However, for some tasks, such as preprocessing raw data to extract features, unsupervised learning can also be used (Becker & Plumbley, 1996). Based on the training data sets, the neural network parameters (synaptic weights and thresholds) are iteratively modified to teach the network to perform a specific task (Jain et al., 1996). The network is tuned so that it gives the correct output for the input values included in the training data set. Typically, the training is concluded after a specified level of accuracy is reached (Da Silva et al., 2017).

Generally, artificial neural networks consist of three parts (Da Silva et al., 2017):

1. An input layer, which receives data from the external environment, such as signals or measurements. This input data is usually normalized to the value range of the successive neurons' activation functions, to improve the numerical precision of calculations within the network.

2. Hidden layers, which consist of several neurons that account for most of the processing done in the network. The goal of the hidden layers is to learn the patterns associated with the modelled system. A typical NN architecture includes one or more hidden layers.

3. An output layer, which also consists of several neurons. The output layer is responsible for providing the final output from the neural network.

The structure of a common NN architecture, the multilayer feedforward network, is displayed in Figure 13 below. Multilayer feedforward networks can be employed for diverse problems ranging from pattern classification, process control and optimization to robotics (Da Silva et al., 2017). The example below has two hidden layers; the number of hidden layers and the number of neurons in them depend on the nature and complexity of the problem the network is applied to.

Figure 13. Multilayer feedforward network (Da Silva et al., 2017)

2.3.1 Predictive neural networks

Neural networks are one of many machine learning techniques that are currently being employed in prediction tasks (Scalabrini Sampaio et al., 2019). Scalabrini Sampaio et al. developed a multilayer perceptron NN method, much like the structure presented in Figure 13, for predicting the failure time of a motor. They found that the ANN method was superior to other common machine learning methods. In particular, the medium to long term predictions were more accurate with the ANN method. However, they noted that some weaknesses of multilayer perceptron NNs, such as the tendency to converge slowly and overfitting, should be researched further.

Another example of a challenging prediction task that can be solved with NNs is predicting the remaining useful life of a bearing (Ren et al., 2018). Ren et al. proposed a convolutional NN for the task and found that the proposed method significantly improved the prediction accuracy. A convolutional NN is a type of feedforward network commonly used in image processing. Neural networks are also employed in problems such as stock market prediction and prediction of Parkinson's disease risk (Guresen, Kayakutlu, & Daim, 2011; Sadek et al., 2019).

2.3.2 Anomaly detection with neural networks

Anomaly detection problems are also commonly solved with neural networks, which often outperform typical machine learning techniques such as isolation tree models or minimum covariance determinant models (Siegel, 2020). Staar et al. proposed a new approach to anomaly detection for industrial surface inspection (Staar, Lütjen, & Freitag, 2019). In their approach, a convolutional neural network (CNN) was trained to learn a similarity metric for the inspected surface texture by using triplet networks, a recent development in deep metric learning. Staar et al. obtained promising results, indicating that the system could even find defects that were not included in the training set (Staar et al., 2019). However, they conclude that while perfect discrimination was reached for some classes, the proposed method completely failed for others.

Autoencoder neural networks of various types have recently seen use in anomaly detection applications (Tang et al., 2020; Wang, Wang, Liu, & Qu, 2020). Autoencoder NNs employ unsupervised learning and generally consist of an input layer, one or more hidden layers and an output layer. Unlike in many other neural network configurations, the hidden layers in autoencoders are smaller than the input or output layers (Wang et al., 2020). Thus, much like the fuzzy modelling method proposed in this thesis, the autoencoder learns a compressed representation, which can be decoded when necessary. Autoencoders consist of an encoder and a decoder; the general architecture is shown in Figure 14 (Wang et al., 2020).

Figure 14. Example of an autoencoder neural network (Wang et al., 2020)

2.3.3 Neuro-fuzzy methods

Neuro-fuzzy systems have recently gained increasing popularity among researchers in various fields (Shihabudheen & Pillai, 2018). In neuro-fuzzy systems, neural networks are employed together with fuzzy systems to provide the learning capability for either selecting suitable membership functions or determining fuzzy rules (Nauck, 1997). The learning ability of NNs is combined with the fuzzy systems' capability to clearly represent information, which improves the inference capabilities of the system (Shihabudheen & Pillai, 2018). Generally, neuro-fuzzy systems utilize multilayer feedforward neural networks (Nauck, 1997). Neuro-fuzzy systems were initially applied in the domain of fuzzy control, but more recently have been applied to a wider range of domains, such as data analysis and decision support (Nauck, 1997). Other applications in recent research include nonlinear system modelling, classification, time series prediction and control (Shihabudheen & Pillai, 2018). Conversely, fuzzy techniques can also be used to speed up the learning process of a neural network. However, when the goal is to create a neural network, the approach should rather be called a fuzzy neural network (Nauck, 1997).


3 Modelling concept

In this chapter, the proposed modelling concept is presented. First, the process of building the model is covered: how the raw collected data can be fuzzified, how it is stored, and how it can be aggregated. Sections 3.2, 3.3 and 3.4 explain how the model can be queried and present the concepts for the extended potential use cases: model visualization, predictive modelling, and anomaly detection.

3.1 Building the model

Before a fuzzy model can be built from a dataset, each of the variables needs to be defined as a fuzzy set. The membership functions in the fuzzy set can be of any shape; the most important requirement is that every possible value of a variable is covered by at least one membership function. In this thesis, Gaussian membership functions were used. These membership functions are placed so that adjacent MFs intersect at an activation degree of 0.5 and membership functions two positions apart intersect at an activation degree of approximately 0.05. This is illustrated in Figure 15 below.

Figure 15. An example fuzzy set for load

The membership functions, sometimes called linguistic terms, can be given descriptive names as shown in the above figure. However, when many membership functions are used, it is easier to simply name the membership functions with ordinal numbers, starting from 0.
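The placement rule for Gaussian membership functions can be sketched as follows. The sketch assumes equally spaced centres and a common standard deviation, which the thesis does not state explicitly; the function names and the 0 to 10 range are illustrative. For a Gaussian of the form exp(-(x - c)^2 / (2σ^2)) and a centre spacing d, requiring adjacent functions to cross at 0.5 gives σ = d / (2·sqrt(2·ln 2)).

```python
import math

def gaussian_fuzzy_set(lower, upper, n_terms):
    """Return equally spaced centres and the shared sigma (assumed layout)."""
    spacing = (upper - lower) / (n_terms - 1)
    # Adjacent MFs cross at the midpoint with degree 0.5 for this sigma
    sigma = spacing / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    centres = [lower + k * spacing for k in range(n_terms)]
    return centres, sigma

def membership(x, centre, sigma):
    """Activation degree of x for a Gaussian membership function."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

centres, sigma = gaussian_fuzzy_set(0.0, 10.0, 5)

# Crossing point of two adjacent MFs (terms 0 and 1)
adjacent = membership((centres[0] + centres[1]) / 2.0, centres[0], sigma)
# Crossing point of MFs two positions apart (terms 0 and 2)
every_other = membership(centres[1], centres[0], sigma)
```

Under these assumptions the crossing of functions two positions apart comes out at 1/16, or about 0.06, which is in line with the approximate value of 0.05 given above.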


With the fuzzy engine defined, machine data can be read and stored in a fuzzy format. The pseudocode shown below in Figure 16 describes the process of forming fuzzy rules based on reading raw machine data.

FOR each new dataset N
    FOR each variable
        #find the membership functions for sensed state
        F.input = FindThreeHighestFuzzyTerms(N.values)
    FOR each of the three sets of fuzzy terms saved in F
        FOR each variable
            #Save the ordinal number of the MF
            Write value of fuzzy term in SQL table
        #Calculate the average activation degree
        weight = Avg(activation_degrees)
        #Save the weight in the database
        Write weight in SQL table

Figure 16. Pseudocode for generating fuzzy rules from data

The idea is to fuzzify each row of crisp input values into up to three fuzzy rules. To limit the number of needed rows, the same weight is given to all variables in each fuzzy rule. A few different methods can be used for this, but in the model optimization chapter (section 4.6.1) it was concluded that the most accurate way is to take the average of the individual activation degrees for each output variable. In the above example, the rules are saved in a database, but the rules can also be saved in the fuzzy rule base as if-then rules.

As an example, we may be interested in where a crane has been operated in a factory, and in addition, we might want to be able to see where alarms typically occur. This can be used as a simple example case to illustrate the formation of fuzzy rules from industrial equipment data. In this system, one defining input variable, the alarm signal, is included. In this case, we may think of the alarm signal as a simple binary signal: it can have a value of 0, meaning no alarm is active, or a value of 1, meaning an alarm is active. Alternatively, we could have 0 for no alarm and separate values for each alarm code. This simplified variable can be represented with a fuzzy set consisting of two rectangular membership functions.

On the output side, this type of system would have two variables: trolley position and bridge position. However, if the rules are saved in a database and not at any stage converted into fuzzy if-then rules, there is no distinction between input and output variables; there are only variables. In this example, the trolley position can have values ranging from 0 meters to approximately 10 meters and the bridge position can have values ranging from 0 meters to approximately 25 meters. Both variables can be represented by a fuzzy set of 5 Gaussian membership functions. For an example data point (Table 1), the fuzzy rules are determined as follows.


Table 1. An example raw data point

Alarm    TrolleyPosition (m)    BridgePosition (m)
0        3.0                    17.5

The membership functions for the bridge position are shown in Figure 17 below. For the value of 17.5 meters, we see that the highest activation degree is for membership function 3, and it is approximately 0.78. The second-highest activation degree is for membership function 2, and it is approximately 0.23. Finally, the third-highest activation degree is 0.02 for membership function 4.

Figure 17. Fuzzy set for the bridge position and the highest terms

For the trolley position (Figure 18 below), we similarly have 0.93 for membership function 1, 0.11 for membership function 2 and 0.02 for membership function 0.


Figure 18. Fuzzy set for trolley position and the highest terms

This gives the following for the three highest rules (Table 2). For the alarm data, we simply have a Boolean value of 0 or 1. For the bridge and trolley position, on the other hand, we have an ordinal integer that determines the membership function in question and an activation degree, which determines how strongly the variable belongs to the membership function.

Table 2. The three highest rules

Rule           Alarm    Bridge Position    Activation degree    Trolley Position    Activation degree
Highest        0        3                  0.78                 1                   0.93
2nd highest    0        2                  0.23                 2                   0.11
3rd highest    0        4                  0.02                 0                   0.02

Only a single weight can be given to a fuzzy rule. Therefore, the activation degrees for each row need to be combined into a single weight. In the tests completed (section 4.6.1) it was found that the most accurate way of doing this is by taking the average of the activation degrees. This is calculated in equation (2) below.

$$\frac{0.78 + 0.93}{2} = 0.855, \quad \frac{0.23 + 0.11}{2} = 0.17 \quad \text{and} \quad \frac{0.02 + 0.02}{2} = 0.02 \qquad (2)$$

These three rules are represented as fuzzy rules as follows. A fuzzy rule always begins with the keyword 'If'. The variables before the following 'then' clause are the input variables, whereas the variables after it are the output variables. The weight of the fuzzy rule is given at the end, after the 'with' clause. Each variable is given an ordinal integer, which defines which membership function the read data belongs to, and the weight value describes the combined degree of membership of the variables. The formed fuzzy rules are presented in Figure 19 below. The rules can be saved as such in the fuzzy rule base.

Figure 19. Three highest fuzzy rules

These rules can also be saved in a Structured Query Language (SQL) database as shown below in Table 3. The main distinction in the data format is that in the database, the variables do not have to be divided into input and output variables. This distinction can be made later for any specific use case if needed. In addition, if a database designed for time series data is used, timestamps of the read data can easily be included in the rules. While time can be included in fuzzy rules as a variable as well, time information can be more easily managed in a suitable database.

Table 3. The three highest rules in database format

TimeStamp                   Alarm    BridgePosition    TrolleyPosition    Weight
2020-04-04T00:34:37.928Z    0        3                 1                  0.855
2020-04-04T00:34:37.928Z    0        2                 2                  0.17
2020-04-04T00:34:37.928Z    0        4                 0                  0.02
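The worked example above can be reproduced with a short Python sketch. The activation degrees are the approximate values quoted in the text; terms whose degrees are not given there are set to 0.0 as an assumption, and the function and field names are illustrative rather than the thesis implementation.

```python
from statistics import mean

def highest_terms(activations, k=3):
    """Return the k (term number, activation degree) pairs with the highest degree."""
    ranked = sorted(enumerate(activations), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

def build_rules(crisp, fuzzy, k=3):
    """Form up to k rule rows; each row gets the average activation degree as weight."""
    ranked = {name: highest_terms(acts, k) for name, acts in fuzzy.items()}
    rules = []
    for rank in range(k):
        row = dict(crisp)  # crisp variables such as the alarm are copied as-is
        degrees = []
        for name, terms in ranked.items():
            term, degree = terms[rank]
            row[name] = term  # store only the ordinal number of the MF
            degrees.append(degree)
        row["Weight"] = round(mean(degrees), 3)
        rules.append(row)
    return rules

# Degrees for terms 0..4; unquoted terms are assumed to be 0.0
bridge = [0.0, 0.0, 0.23, 0.78, 0.02]
trolley = [0.02, 0.93, 0.11, 0.0, 0.0]

rules = build_rules({"Alarm": 0}, {"BridgePosition": bridge, "TrolleyPosition": trolley})
```

The three resulting rows correspond to the rows of Table 3 without the timestamp column.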

One of the main advantages of the modelling method is the ease of aggregation. In the modelling method, the data is already bucketed into specific ranges formed by the membership functions, as shown above. Combined with the data being saved in a database capable of handling time-series data, this provides an in-built aggregation method in the model. The data can easily be aggregated to any time step, e.g. from a 250 ms sampling interval to a 1-minute sampling interval. This means that within a specified time bucket, the matching rows can be combined and their respective weights summed together. However, when data rows are combined, the weights should be normalized periodically to prevent them from rising excessively. The model can be built from a continuous stream of measurements or from larger batches, updated periodically.
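A minimal sketch of this aggregation is given below. It assumes that rows are tuples of a timestamp, the ordinal terms and a weight, and that normalization divides by the largest weight; neither the row layout nor the normalization scheme for aggregation is specified in the thesis, and the fourth data row is made up.

```python
from collections import defaultdict
from datetime import datetime, timezone

def aggregate(rows, bucket_seconds=60):
    """Sum the weights of rows that share a time bucket and the same terms."""
    totals = defaultdict(float)
    for timestamp, terms, weight in rows:
        # Start of the time bucket, in whole seconds since the epoch
        bucket = int(timestamp.timestamp()) // bucket_seconds * bucket_seconds
        totals[(bucket, terms)] += weight
    return dict(totals)

def normalize(totals):
    """Scale weights so that the largest equals 1 (assumed scheme)."""
    peak = max(totals.values())
    return {key: weight / peak for key, weight in totals.items()}

t0 = datetime(2020, 4, 4, 0, 34, 37, 928000, tzinfo=timezone.utc)
t1 = datetime(2020, 4, 4, 0, 34, 38, 178000, tzinfo=timezone.utc)  # 250 ms later

# Terms are (Alarm, BridgePosition, TrolleyPosition)
rows = [
    (t0, (0, 3, 1), 0.855),
    (t0, (0, 2, 2), 0.17),
    (t0, (0, 4, 0), 0.02),
    (t1, (0, 3, 1), 0.80),  # made-up later sample
]

totals = aggregate(rows)    # four rows collapse into three 1-minute rows
scaled = normalize(totals)
```

In a time-series database the same grouping would typically be expressed as a query that groups by time bucket and by the term columns and sums the weight column.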


3.2 Model querying and visualization

The first targeted use case in this thesis was the visualization of the built model. This section describes the process of querying the model and extracting visualizations from the fuzzy data. The overview of the process from raw data to fuzzy visualizations is illustrated in Figure 20 below.

Figure 20. Overview of the modelling process, from raw data to the visualization

First, the descriptive fuzzy model is formed in the database (or alternatively in the fuzzy engine as if-then rules), as explained in section 3.1. Extracting the visualizations is simpler when the data is saved in the database. With a database, the data to be visualized can be queried with simple SQL queries, and the data can be aggregated already in the database. Any number of variables can be queried and extracted from the fuzzy model, thus giving a subset of the whole modelled variable space. From the extracted data rows, the matching rules are then combined and their weights summed. The combined rows then need to be matched with the defined membership functions in the fuzzy sets. This means that for each variable, the membership function that matches the ordinal number given in the rule needs to be found. From the membership functions, the locations (mean values) and standard deviations are matched with the fuzzy data rows. The fuzzy data rows define the weight of each membership function. The last step is to iteratively draw each of the membership functions with the corresponding weights. An example visualization is shown in Figure 21 below.
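The matching and drawing steps can be sketched for a single variable as follows. The centres and the standard deviation are assumed values for a bridge position fuzzy set with five equally spaced Gaussian membership functions, the weights are those of the example rules in section 3.1, and the actual plotting call is left out.

```python
import math

def weighted_curves(rows, centres, sigma, grid):
    """Return one curve per row: the matching Gaussian MF scaled by the row weight."""
    curves = []
    for term, weight in rows:
        centre = centres[term]  # match the ordinal number to its MF location
        curves.append([
            weight * math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))
            for x in grid
        ])
    return curves

# Assumed fuzzy set for the bridge position (0 to 25 m, five terms)
centres = [0.0, 6.25, 12.5, 18.75, 25.0]
sigma = 2.65

# (ordinal number of the MF, combined weight) for the queried variable
rows = [(3, 0.855), (2, 0.17), (4, 0.02)]

grid = [i * 0.25 for i in range(101)]  # 0 to 25 m in 0.25 m steps
curves = weighted_curves(rows, centres, sigma, grid)
```

Each curve can then be drawn with any plotting library; for two queried variables the same idea extends to a two-dimensional grid.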

Figure 21. Example visualization

3.3 Predictive modelling

The second use case targeted in this thesis was predictive modelling. The process of predictive modelling can be divided into two steps. In the first step, a predictive model needs to be created and trained, and in the second step, the built predictive model can be used to iteratively predict further into the future. To generate a predictive model, the fuzzy engine needs to be defined so that the current state of the machine is modelled on the input side and the predicted next state after a certain time step is modelled on the output side. This is illustrated in Figure 22 below.

Figure 22. Example predictive fuzzy rule


The input side is after the "If" clause, and the output side is after the "then" clause. In the above example, only the bridge position variable is included, but multiple variables can be included and chained together with "and" clauses. The pseudocode for learning these fuzzy rules for a predictive model is presented in Figure 23 below.

FOR each new dataset N
    FOR each input variable
        #Save the sensed input state
        F.input = FindThreeHighestFuzzyTerms(N.input-values)
    FOR each output variable
        #Save the sensed output state
        F.output = FindThreeHighestFuzzyTerms(N.output-values)
    FOR each term saved in F
        #search ruleblock for matching input
        FOR each rule R in ruleblock
            IF R.input = F.input then
                #matching rule was found
                IF R.output = F.output then
                    #increase the weight of matching rule
                    IF R.weight < 100 then
                        INCREMENT R.weight
                    #normalize if rules have risen too high
                    ELSE FOR each rule Q in ruleblock
                        Q.weight = Q.weight/Max(ruleblock weights)
                #Found rule steers output wrong, decrement weight
                ELSE DECREMENT R.weight
                    IF R.weight < 0
                        Remove rule R
        IF no matching rule R was found from ruleblock then
            add a new rule newR where newR.input = F.input and newR.output = F.output and
            newR.weight = 0.01 * Avg(F.activation_degrees)

Figure 23. Pseudocode for learning the predictive model

The core idea for the learning algorithm was adopted from (Vainio et al., 2008). In the algorithm, the three highest terms are first searched and saved for both the input side and the output side. After this, the existing rule base is looped through to look for rules with matching input. Once a matching rule is found, if the output side also matches and the weight is not already more than 100, the weight of this rule is incremented by a specific amount, 0.01 for example. In this learning algorithm, the weights could potentially rise to thousands, and therefore it makes sense to periodically normalize the weights. In the pseudocode, all the weights are normalized between 0 and 1 every time a rule rises above 100. If the output does not match the found rule, the weight is decremented instead. In this case, if the weight drops below zero and the rule no longer has any effect on the system, the rule is removed completely from the rule base. After all the existing rules are checked, a new rule is created if no matching rule was found. This new rule is given a low starting weight based on the average activation degrees of the variables. An example of a few fuzzy rules learned for predicting a linearly growing signal is shown below in Figure 24. In the example, the system was trained to predict the difference to the previous value in the next time step, given the current difference value.
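The learning step of Figure 23 can be sketched in Python for a single set of fuzzy terms. The rule base is represented here as a dictionary from (input terms, output terms) to weight, which is an assumption made for illustration. Following the comments in the pseudocode, a rule is treated as matching when its input side matches, so a new rule is added only when no rule with the same input exists.

```python
def learn(rule_base, f_input, f_output, degrees, step=0.01, cap=100.0):
    """Update the rule base with one observed (input, output) pair of fuzzy terms."""
    matched = False
    for key in list(rule_base):  # iterate over a copy so rules can be removed
        r_input, r_output = key
        if r_input != f_input:
            continue
        matched = True
        if r_output == f_output:
            if rule_base[key] < cap:
                rule_base[key] += step  # reinforce the matching rule
            else:
                # Weights have risen too high: scale all of them into [0, 1]
                peak = max(rule_base.values())
                for other in rule_base:
                    rule_base[other] /= peak
        else:
            rule_base[key] -= step  # rule steers the output wrong
            if rule_base[key] < 0:
                del rule_base[key]
    if not matched:
        # Low starting weight based on the average activation degree
        rule_base[(f_input, f_output)] = step * sum(degrees) / len(degrees)
    return rule_base

rules = {}
learn(rules, (1,), (2,), [0.8, 0.6])   # creates a rule with weight 0.007
first_weight = rules[((1,), (2,))]
learn(rules, (1,), (2,), [0.8, 0.6])   # same observation: weight rises by 0.01
learn(rules, (1,), (3,), [0.9])        # contradicting output: weight falls
learn(rules, (1,), (3,), [0.9])        # weight drops below zero, rule is removed
```

In the full algorithm this function would be called once for each of the three sets of highest terms found for a new data row.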

Figure 24. A few examples of predictive model fuzzy rules

In the second step, the built predictive model is used to iteratively predict forward. The pseudocode for this process is shown in Figure 25 below.

    # predict n time steps forward
    i = 0
    WHILE i < n
        IF first data point:
            # set current machine state as input
            FOR all input variables
                Engine.getInputVariable.setValue(machine state)
        ELSE:
            # set previous fuzzy engine output as new input
            FOR all input variables
                Engine.getInputVariable.setValue(previous output)
        # process the fuzzy engine to get the output after the time step
        Engine.process()
        # save the output values in a list
        FOR all output variables
            List.append(Engine.getOutputVariable.getValue())
        i++
    # write all predicted values to a CSV file

Figure 25. Pseudocode for iteratively predicting forward using the formed predictive model

The first step is to give the latest machine state, or the state from which the prediction is made, to the fuzzy engine. Based on the learned fuzzy rules in the predictive model, the fuzzy engine can then be used to calculate the predicted next value. This new value is saved and then used as the input value in the next iteration of the while loop. This process can be repeated as many times as wanted, but for most systems the prediction accuracy worsens the further ahead the prediction extends.

Industrial equipment data can be divided into cumulative data, which is not constrained to a specific range of values, and data that is constrained to a specific range. For a predictive fuzzy model to function on a cumulative data signal, it cannot use absolute values as inputs and outputs, because the signal is not limited to a range of values. Therefore, the predictions and iterations need to be modelled with differences to the previous value. An example prediction of a cumulative signal, with some noise included, is shown in Figure 26 below. The model was able to find the underlying trend of the cumulative signal relatively well.
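The difference-based iteration for a cumulative signal can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: `predict_next_diff` is a hypothetical stand-in for the fuzzy engine, and the constant-growth predictor is only a toy example.

```python
from typing import Callable, List


def predict_forward(last_value: float, last_diff: float,
                    predict_next_diff: Callable[[float], float],
                    n_steps: int) -> List[float]:
    """Iteratively predict n_steps values of a cumulative signal."""
    predictions = []
    value, diff = last_value, last_diff
    for _ in range(n_steps):
        # The model maps the current difference to the next difference.
        diff = predict_next_diff(diff)
        # Add the predicted difference to recover the absolute value.
        value += diff
        predictions.append(value)
    return predictions


def constant_growth(diff: float) -> float:
    """Hypothetical stand-in for the fuzzy engine: the difference stays the same."""
    return diff


forecast = predict_forward(10.0, 0.5, constant_growth, 4)
```

Because only differences pass through the model, the inputs stay within a bounded range even though the signal itself keeps growing.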

Figure 26. Prediction of a cumulative signal

The second type of signal, such as a periodic signal (a sine wave, for example), is constrained to a constant range. Therefore, for these signals, the absolute values can be used. However, in a sine wave, the direction of movement also needs to be known for any given point. Therefore, to predict a sine wave, the system was taught with the absolute values and the change from the last known point. It was found that it is sufficient to track whether the change was positive or negative. Similarly, a slope could be used in combination with the absolute value. The prediction results for a sine wave are shown in Figure 27 below.


Figure 27. Predicting a smooth sine wave signal

The prediction works relatively well for a smooth signal. However, for a periodic signal, the error grows the further ahead it is predicted. Similarly, the system works with two signals predicted simultaneously, a sine wave and a cosine wave. This is illustrated in Figure 28 below.

Figure 28. Predicting two signals: A sine wave and a cosine wave

The system can learn and predict periodic signals relatively accurately. However, if noise is introduced in the system, the prediction becomes unreliable. This is because when predicting just a single time step forward, with noise, the system might learn to move in the opposite direction for a time step, and then continue again in the proper direction. See Figure 29 below.

Figure 29. Predicting sine and cosine signals with noise included

This problem can be solved by predicting multiple time steps forward at once. The predictive model can be taught, for example, to predict 1, 3 and 10 time steps forward. This increases the robustness of the predictive system, and noise is no longer a problem. This is illustrated in Figure 30 below.

Figure 30. Predicting noisy sine and cosine signals with multiple time step learning
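One way to prepare training data for multiple time step learning is to pair each value with the values 1, 3 and 10 steps later. The sketch below only shows how such pairs could be formed; how the thesis combines the horizons inside the fuzzy engine is not specified here, so the function name and data layout are assumptions.

```python
from typing import Dict, List, Tuple


def horizon_pairs(signal: List[float],
                  horizons: Tuple[int, ...] = (1, 3, 10)
                  ) -> Dict[int, List[Tuple[float, float]]]:
    """Return (current value, value h steps later) pairs for each horizon h."""
    pairs = {}
    for h in horizons:
        # The last h samples have no target h steps ahead, so they are skipped.
        pairs[h] = [(signal[t], signal[t + h])
                    for t in range(len(signal) - h)]
    return pairs


ramp = [float(t) for t in range(12)]   # toy signal, not measured data
training = horizon_pairs(ramp)
```

Longer horizons average over several noisy steps, which is presumably why they stabilise the learned direction of movement.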


3.4 Anomaly detection

The third targeted use case in this thesis was anomaly detection. Three potential methods for detecting anomalies with a fuzzy system were identified in this thesis.

Method 1: Based on density

The first method for detecting anomalies using a fuzzy system is through the weights of the fuzzy rules. For any given machine state, the learned fuzzy rule base can be searched for matching states. These states can be aggregated together and their weights summed, and the resulting sum can be used to quantify how anomalous the state is. The sum is low if the machine state is rare, and high if it is a common machine state. Thus, for every membership function in the fuzzy set, the resulting aggregated weight defines how common it is for a value to fall within the membership function. When a value falls into a membership function with a low aggregated weight, the value can be flagged as an anomaly.

Method 2: Based on predictive modelling

Anomaly detection can be done similarly to predictive modelling with fuzzy systems. In anomaly detection, instead of predicting future behavior, the current state can be predicted based on the prediction model that was built from historical data. The predictive model can be given the previous state of the machine, and the model then predicts the current state of the machine. This prediction can then be compared to the actual current state of the machine, which is also known. The difference between these values determines how anomalous the current state is with respect to each parameter. As in predictive models, it is important in this type of anomaly detection system that the model is adaptive, learns continuously and emphasizes recent behavior over past behavior. This can be implemented by periodically reducing the weight of every learned fuzzy rule, in addition to the methods presented in the predictive learning algorithm earlier. This way, older rules are reduced more times than newer ones, and thus the newer rules are emphasized in the predictions.

Method 3: Based on expert knowledge

With a fuzzy system, an anomaly detection model can also be built based on expert knowledge. Fuzzy if-then rules can be implemented from known relationships. For example, an engineer might design a fuzzy rule such as the rule shown in Figure 31 below.

Figure 31. A fuzzy if-then rule for anomaly detection, that is based on expert knowledge


This is by far the simplest system for anomaly detection. With this method, it is easy to detect and quantify obvious faults in the system. The engineer can design and tune the rules to accommodate a specific need. If, for example, the system needs to detect overheating, a rule such as the one described above can be implemented.
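The density-based idea of Method 1 can be sketched in Python as follows. The rule representation (a tuple of membership function indices plus a weight), the example rules and the threshold value are all assumptions made for illustration.

```python
from typing import List, Tuple

# (membership function index per variable, weight); hypothetical layout
Rule = Tuple[Tuple[int, ...], float]


def density_score(rules: List[Rule], state: Tuple[int, ...]) -> float:
    """Sum the weights of all learned rules that match the fuzzified state."""
    return sum(weight for antecedent, weight in rules if antecedent == state)


def is_anomaly(rules: List[Rule], state: Tuple[int, ...],
               threshold: float) -> bool:
    """Flag the state when its aggregated weight falls below the threshold."""
    return density_score(rules, state) < threshold


# Toy rule base, not taken from the Ilmatar model.
rules = [((6, 2), 0.85), ((6, 2), 0.60), ((7, 2), 0.22), ((9, 9), 0.02)]
common = density_score(rules, (6, 2))
rare = is_anomaly(rules, (9, 9), threshold=0.1)
unseen = is_anomaly(rules, (1, 1), threshold=0.1)
```

A state that was never observed has an aggregated weight of zero and is therefore always flagged, which matches the intent of treating rare states as anomalies.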


4 Evaluation

This chapter presents the evaluation process and results of the fuzzy logic modelling approach. First, the software tools used to build a prototype of the modelling concept are discussed in section 4.1. In section 4.2 the process of initializing a fuzzy model is covered. Before any data can be fed to the fuzzy system, fuzzy parameters need to be defined, including the membership functions of fuzzy sets for each variable, and the fuzzy operators to be used. In order to evaluate the developed modelling method, test data needs to be available. Industrial crane Ilmatar was used as an example industrial machine in this thesis. Section 4.3 describes the process of collecting data and the Ilmatar model built from the data. In section 4.4 the process of extracting a visualization from the Ilmatar model is presented through an example. In section 4.5 the built model is optimized: in the first subsection the effect of varying fuzzy operators is examined, and in the second subsection the effects of reducing data are studied. Finally, in section 4.6 the execution time of the fuzzy modelling process is evaluated.

4.1 Software tools

To ease the implementation of fuzzy logic, an existing fuzzy logic library was used. In this thesis the FuzzyLite library was chosen because it is a free and open-source library, provides all the needed features, enables accurate results, has competitive performance and, most importantly, has well-documented source code. The FuzzyLite libraries are available for the C++ and Java programming languages, as fuzzylite and jfuzzylite respectively. Due to the author’s programming language preference, the Java version, jfuzzylite, was used. Jfuzzylite was released in March 2017, and it is an object-oriented, dependency-free library that supports multiple platforms (Windows, Linux, iOS, Ros). For a more complete review of the jfuzzylite software library, the reader can refer to (Rada-Vilela, 2018).

FuzzyLite author Rada-Vilela has also released a graphical user interface tool, QtFuzzyLite 6, which was used to simplify the fuzzy control engine design process. With QtFuzzyLite 6 the membership functions and other settings can be easily defined and visually tested. The graphical user interface consists of three major elements: the input variables, the output variables and the rule block. Each of these components of the fuzzy engine can be configured in the user interface. An overview of the user interface is presented in Figure 32 below. The fuzzy engine can be saved in the FuzzyLite Language format and easily imported into Java code using the import function provided with jfuzzylite.


Figure 32. QtFuzzyLite 6 graphical user interface

Java was used as the programming language for the modelling implementation, but some data preprocessing and visualization was also done in Python. In addition, TimescaleDB was used as a database to save the fuzzy if-then rules. TimescaleDB offers a PostgreSQL database for time-series data. It is an open-source relational database and offers full SQL support (Timescale, 2021). An instance of TimescaleDB was run locally in a Docker container. Docker is a platform for developing, shipping and running applications; in this case, it was simply used to run an instance of TimescaleDB (Docker, 2021). Finally, SQL Workbench/J was used to query the database in the development phase. SQL Workbench/J provides a more user-friendly interface for running SQL queries and for graphically viewing and modifying data tables (SQLWorkbench, 2021). The used software tools are all open source, except for QtFuzzyLite 6, for which a license was bought to ease the fuzzy engine development.

4.2 Fuzzy model parametrization

The first step in the implementation of a fuzzy-based method is defining the fuzzy engine. In the engine, the membership functions need to be defined for each modelled machine variable. Jfuzzylite supports a wide array of membership function shapes, but for this thesis, Gaussian membership functions were used. These membership functions are placed so that adjacent MFs intersect at an activation degree of 0.5 and so that every second membership function intersects at an activation degree of approximately 0.05. This is illustrated in Figure 33 below.


Figure 33. An example set of membership functions for the load variable

For the input variables in a fuzzy system, only the range and number of membership functions need to be defined in the engine. For output variables, the aggregation function and defuzzifier also need to be determined. The ranges were obtained from the machine specifications, and accuracy testing was conducted to determine the ideal number of membership functions and the best aggregation and defuzzification methods. These tests are explained in detail in section 4.5.1. It was found that, generally, normalized sum should be used for aggregation and the centroid method for defuzzification. The number of membership functions that should be used depends heavily on the range of the variable and the desired accuracy. An example fuzzylite definition of an output variable is shown below in Figure 34.

Figure 34. Definition of an output variable in the fuzzylite language
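The placement of Gaussian membership functions described in this section can be sketched in Python. The sketch assumes evenly spaced centres with the first and last centre on the range limits, which is an assumption rather than a detail confirmed by the text; the range 0 to 1 is a hypothetical example. Requiring adjacent functions to cross at 0.5 fixes the standard deviation, and with that value every second function crosses at 1/16, which is close to the approximate 0.05 quoted above.

```python
import math
from typing import List, Tuple


def gaussian_mfs(lo: float, hi: float, count: int) -> List[Tuple[float, float]]:
    """Return (centre, sigma) for evenly spaced Gaussian MFs on [lo, hi]."""
    spacing = (hi - lo) / (count - 1)
    # Adjacent MFs meet halfway between centres; solving
    # exp(-(spacing / 2)^2 / (2 sigma^2)) = 0.5 for sigma gives:
    sigma = spacing / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [(lo + k * spacing, sigma) for k in range(count)]


def membership(x: float, centre: float, sigma: float) -> float:
    """Activation degree of x in a Gaussian membership function."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


mfs = gaussian_mfs(0.0, 1.0, 5)   # hypothetical variable range
(c0, s), (c1, _), (c2, _) = mfs[:3]
adjacent_crossing = membership((c0 + c1) / 2, c0, s)
second_crossing = membership((c0 + c2) / 2, c0, s)
```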


In addition, settings for the rule block of the engine need to be defined. The rule block has four parameters (Rada-Vilela, 2018):

1. Conjunction method, which is required if the rule block antecedents have ‘and’ connectives. Determines how the propositions are joined when the ‘and’ connective is used.

2. Disjunction method, which is required if the rule block antecedents have ‘or’ connectives. Determines how the propositions are joined when the ‘or’ connective is used. Irrelevant in this thesis, because ‘or’ connectives were not used in the proposed modelling method.

3. Implication method, which utilizes the activation degree of a rule to modulate the terms in the consequent of the rule.

4. Activation method, which determines how rules are activated in the rule block. The used method, proportional, means that activation degrees for all rules are computed and normalized so that the sum of activation degrees is 1.0.

During the development it was noticed that for rule block conjunction, the minimum method is the best option; for implication, the algebraic product method should be chosen; and for activation, the proportional method is the best option. The ideal configuration for the output variables and the rule block is collected in Table 4 below.

Table 4. Output variable and rule block configuration in fuzzy engine

Output variables
Aggregation       Defuzzification
Normalized sum    Centroid

Rule block
Conjunction   Disjunction   Implication         Activation
Minimum       -             Algebraic product   Proportional
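Two of the settings in Table 4 are simple enough to illustrate directly. The following Python sketch shows minimum conjunction and proportional activation on made-up activation degrees; it is not jfuzzylite code, and the function names are hypothetical.

```python
from typing import List


def minimum_conjunction(degrees: List[float]) -> float:
    """A rule is only as active as its weakest 'and' proposition."""
    return min(degrees)


def proportional_activation(activations: List[float]) -> List[float]:
    """Normalise activation degrees so that they sum to 1.0."""
    total = sum(activations)
    if total == 0:
        return list(activations)
    return [a / total for a in activations]


# Three hypothetical rules, each with two antecedent propositions.
raw = [minimum_conjunction([0.9, 0.8]),
       minimum_conjunction([0.8, 1.0]),
       minimum_conjunction([0.4, 0.7])]
normalised = proportional_activation(raw)
```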

When using jfuzzylite, the engine can be saved and exported in FuzzyLite Language (fll) format (Rada-Vilela, n.d.). This format is simpler than state-of-the-art formats, such as Fuzzy Control Language (fcl) or Fuzzy Inference System (fis), while also providing added functionality. The definition of the fll engine format is shown below in Figure 35.


Figure 35. FLL definition (Rada-Vilela, n.d.)

From the fll format, the fuzzy engine can easily be imported into Java code with the import functions included in the jfuzzylite software library.

4.3 Ilmatar data collection

Machine data for this thesis was collected from the industrial crane Ilmatar, which is located in the Aalto industrial internet campus laboratory hall. Ilmatar is an overhead crane that is equipped with numerous smart features and a remote monitoring service with IoT connectivity. Autiosalo (2018) has written an article on the Ilmatar platform, which can be referred to for more complete information about Ilmatar.

Ilmatar continuously uploads limited data of its usage into MindSphere. All this data was downloaded from MindSphere in comma-separated values (CSV) file format for the timeframe 1.1.2020-1.1.2021. The data set includes 18 variables such as position data, load data and condition monitoring data. This data is, however, stored with relatively long reading cycles, ranging from 1 second to 60 seconds. In addition, the data set does not include all the interesting variables that Ilmatar uploads.

Therefore, to get a more complete data set straight from the PLC, multiple test runs were performed with the Ilmatar crane. Data from the test runs were logged directly through the OPC UA server with UaExpert software. The OPC UA server provides access to 96 variables, which were all saved with a 250-millisecond sampling interval. To simulate a crane operating process, varying loads were carried from point A to point B and back, following an L-shaped path. The path followed in the test runs is illustrated in Figure 36 below.

Figure 36. The path followed in the test runs

Eight test cycles were completed, each consisting of 5 trips from point A to B and back. The loads available in the laboratory were 165 kg, 500 kg and 1000 kg, see Figure 37 below.

Figure 37. Weights from left to right respectively: 165 kg, 500 kg and 1000 kg

Two test cycles were first completed without any weight. The first 5 trips were done with sway control on, and then 5 with sway control off. Similar test cycles were completed with all the available loads. The loads were lowered to the ground at both points, A and B.

Data preprocessing for the Ilmatar datasets was done in Python. Python was selected as the programming language for all data manipulation tasks in this thesis because it has easy-to-use tools such as pandas for scientific and numeric computing. Pandas is an open-source data analysis and manipulation tool for the Python programming language (Pandas 2021).

UaExpert records each variable independently, and it saves a new value only when it is different from the last. In addition, it records Ilmatar variables in 10 separate CSV files, and instead of columns for variables, each data row includes only a primary key indicating which variable the value belongs to. Firstly, 15 interesting variables were collected from the separate CSV files into one combined CSV file. Then the data was sorted into combined rows based on their timestamps, and each parameter was given its own column. In the resulting dataset, there were many data rows where most of the variables had NULL values. To remedy this, the dataset needed to be padded. Each NULL value was replaced with the last known value for the variable. After this, the remaining NULL values were replaced with the next known value. The 15 collected variables are listed in Table 5 below.

Table 5. Variables logged from the OPC UA server

Variable               Data type   Description
SourceTimeStamp        datetime    Timestamp from the data source (Ilmatar)
Alarm                  Bool        1 = at least one alarm active
BridgeSpeedFeedback    float64     Bridge speed in %
TrolleySpeedFeedback   float64     Trolley speed in %
HoistSpeedFeedback     float64     Hoist speed in %
BridgePosition         float64     Bridge position in meters
TrolleyPosition        float64     Trolley position in meters
HoistPosition          float64     Hoist position in meters
BridgeMotorTorque      float64     Bridge motor torque in %, positive when motoring and negative when generating
TrolleyMotorTorque     float64     Trolley motor torque in %, positive when motoring and negative when generating
HoistMotorTorque       float64     Hoist motor torque in %, positive when motoring and negative when generating
BridgeRopeAngle        float64     Calibrated rope angle in bridge movement direction in radians
TrolleyRopeAngle       float64     Calibrated rope angle in trolley movement direction in radians
LoadTare               float64     Tared load in tonnes
Cycle                  Int64       Integer describing which of the 8 cycles the data point belongs to
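The padding step used in preprocessing, where each missing value takes the last known value and any remaining leading gaps take the next known value, can be sketched with the Python standard library as follows. The thesis used pandas for this work; the plain-list version below is only an illustration, and the sample column is made up.

```python
from typing import List, Optional


def pad(column: List[Optional[float]]) -> List[Optional[float]]:
    """Forward-fill missing values, then back-fill any leading gaps."""
    filled = list(column)
    # Forward fill: carry the last known value forward.
    for i in range(1, len(filled)):
        if filled[i] is None:
            filled[i] = filled[i - 1]
    # Backward fill: leading gaps take the next known value.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled


padded = pad([None, 16.5, None, None, 22.1, None])
```

The order matters: forward filling first means a value is only taken from the future when no earlier reading exists for that variable.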

A few example rows of the resulting CSV data set are shown in Table 6 below. The complete data set consisted of 31303 rows of data with 15 variables each.

Table 6. A few example rows of data

Date                               Alarm   BridgePosition   …   HoistMotorTorque   TrolleyRopeAngle
2020-09-03 15:27:13+00:00          0       16.575           …   0                  0.00429468
2020-09-03 15:56:21.250000+00:00   0       22.191           …   -0.5               -0.0352965
2020-09-03 16:38:09.500000+00:00   0       22.535           …   2                  -0.0916061
2020-09-03 16:38:30+00:00          1       22.923           …   0                  0.000906

4.3.1 Fuzzy model

Using the collected CSV data set, a fuzzy model for Ilmatar could be formed as explained in section 3.1. The data table containing the complete Ilmatar model contains 92629 rows of fuzzy data in total, before employing any data reduction techniques. Some example fuzzy data rows are shown in Table 7.


Table 7. A few example rows of the fuzzy Ilmatar model

Time                       alarm   bridge position   …   hoist torque   trolley rope angle   weight
2020-04-04T00:34:37.928Z   0       6                 …   5              5                    0.85
2020-04-04T00:34:37.928Z   0       7                 …   4              6                    0.22
2020-04-04T00:34:37.928Z   0       5                 …   0              4                    0.02

4.4 Model query and visualization example

With the complete model built, any subset of the model can be queried and visualized. With the SQL query “SELECT * FROM IlmatarModel”, we get the full table with all of its contents. To get a visualization of the bridge position with respect to the trolley position, we can select only those variables, aggregate the matching rows and sum their weights together with the following SQL query.

    SELECT bridgeposition, trolleyposition, SUM(weight) AS sum_weight
    FROM IlmatarModel
    GROUP BY bridgeposition, trolleyposition
    ORDER BY sum_weight DESC

With the “SELECT” clause, the included variables are defined. In this example, we include the bridge position, the trolley position and the sum of the weights of the aggregated rows. The sum is calculated using the in-built “SUM” function. The “FROM” clause defines the data table that is queried, and the “GROUP BY” clause defines how the rows are aggregated together. In this case, every combination of bridge and trolley positions is given its own row. Finally, the “ORDER BY” clause orders the resulting rows from highest “sum_weight” to lowest. The first 8 rows of the resulting table are shown in Table 8 below. The number of rows was reduced to 44 from 92629.


Table 8. First 8 of the aggregated rows

BridgePosition   TrolleyPosition   Sum_weight
6                2                 6528.83
9                9                 5445.53
7                2                 2354.26
9                2                 1999.09
8                2                 1864.23
7                3                 1697.4
9                6                 1472.33
9                8                 1269.16
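The aggregation query of this section can be tried out on a small scale with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for the TimescaleDB instance used in the thesis, and the inserted rows are toy values, not data from the Ilmatar model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IlmatarModel "
             "(bridgeposition INTEGER, trolleyposition INTEGER, weight REAL)")
# Hypothetical fuzzy rows for illustration only.
conn.executemany("INSERT INTO IlmatarModel VALUES (?, ?, ?)",
                 [(6, 2, 0.5), (6, 2, 0.25), (7, 2, 0.25), (9, 9, 0.125)])
# Same structure as the query in the text: group matching position
# pairs, sum their weights and order from strongest to weakest.
rows = conn.execute(
    "SELECT bridgeposition, trolleyposition, SUM(weight) AS sum_weight "
    "FROM IlmatarModel "
    "GROUP BY bridgeposition, trolleyposition "
    "ORDER BY sum_weight DESC").fetchall()
conn.close()
```

The four inserted rows collapse into three aggregated rows, in the same way that the full model collapses from 92629 rows to 44.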

Each of the rows represents a 2-dimensional distribution. In order to draw these distributions, the ordinal numbers are matched with the corresponding MFs in the fuzzy engine. From the fuzzy engine the standard deviation, location and weight are saved and given to the software used for visualization. In this thesis, Matlab was used for these visualizations. To form the visualization in Matlab, first, the parameters need to be read. After this, the axes for the visualized parameters are defined with equal spacing and all the combinations are saved in new variables. Then all the rows of parameters are looped through. For each set of parameters, the Gaussian probability density function is calculated, multiplied by the weight included in the parameters and saved in a new variable. After this process is completed for all the parameter rows, the resulting sum of all of the probability density functions can be visualized, for example, as a surf or contour plot. The full example Matlab script used for visualization is provided in Figure 38 below.


Figure 38. Matlab script to draw 2-dimensional visualizations

The resulting visualizations are presented in Figure 39 below.
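The procedure described for the Matlab script (read the parameters, build evenly spaced axes, then sum the weighted Gaussian densities of all rows) could look roughly as follows in Python. This is a sketch under assumptions: the two axes are treated as independent, so each 2-dimensional density is the product of two 1-dimensional Gaussians, and the parameter rows are hypothetical rather than taken from the fuzzy engine.

```python
import math
from typing import List, Tuple

# (centre_x, sigma_x, centre_y, sigma_y, weight); hypothetical values
Row = Tuple[float, float, float, float, float]


def gauss_pdf(x: float, mu: float, sigma: float) -> float:
    """1-dimensional Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def axis(lo: float, hi: float, n: int) -> List[float]:
    """Evenly spaced axis values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def density_grid(rows: List[Row], xs: List[float],
                 ys: List[float]) -> List[List[float]]:
    """Sum the weighted 2-D Gaussian densities of all rows on an x-y grid."""
    grid = [[0.0 for _ in xs] for _ in ys]
    for cx, sx, cy, sy, w in rows:
        for j, y in enumerate(ys):
            for i, x in enumerate(xs):
                grid[j][i] += w * gauss_pdf(x, cx, sx) * gauss_pdf(y, cy, sy)
    return grid


xs = axis(0.0, 10.0, 21)
ys = axis(0.0, 10.0, 21)
rows = [(3.0, 1.0, 2.0, 1.0, 5.0), (8.0, 1.0, 7.0, 1.0, 1.0)]
grid = density_grid(rows, xs, ys)
```

The resulting `grid` can be passed to any plotting tool that draws contour or surface plots from a 2-dimensional array.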


Figure 39. Two Matlab visualizations: Contour and surf plot

4.5 Model optimization

The fuzzy model was further optimized by evaluating the impact of using various fuzzy operators and by testing how the model changes when the amount of data is reduced.

4.5.1 Fuzzy operators

The choice of conjunction and defuzzification operators, and the number of membership functions for each variable, have a considerable impact on the accuracy of the saved fuzzy rules. Therefore, in this section, some evaluation results are presented, which show how much error is caused by each option. Conjunction, in this case, refers to how the varying activation degrees are combined into a single weight for every fuzzy rule. The evaluated options are the minimum method, the average method and the product method. With the minimum method, the lowest activation degree is assigned as the weight for a given fuzzy rule. With the average method, the average of the activation degrees is taken as the weight, and with the product method, the product of the activation degrees is assigned as the weight. The evaluated choices for the defuzzification operator, on the other hand, were the Mean of Maximum, Centroid and Bisector.

In order to evaluate the accuracy of the combinations with a varying number of membership functions, 500 data points were each fuzzified into the three highest rules with every combination of conjunction and defuzzification operators, and with 5, 10, 20, 30, 50 and 100 membership functions for each variable. The variables included in this evaluation test were bridge position, trolley position, hoist position and load tare. The evaluation process is illustrated in Figure 40 below.


Figure 40. Accuracy evaluation flow chart

First, the raw crisp input data points are fuzzified in the fuzzy engine, using a conjunction operator and a defined number of membership functions. The fuzzified data are then defuzzified back into crisp output data. The output is then compared with the original input values, which gives the error that the fuzzification process caused individually for each of the 500 data points. This error data can then be plotted as a cumulative distribution function in order to evaluate it. An example plot is shown in Figure 41 below. This same process is completed 54 times, once for every combination of the conjunction, membership function and defuzzification options (3 × 6 × 3 = 54).

Figure 41. Example cumulative error distribution functions with the average conjunction, 5 membership functions and bisector defuzzification

To compare the plots with all of the 54 combinations of settings, the 95th percentile error can be used. From each of the 54 evaluation tests, the 95th percentile error was tabulated for each of the variables. A few example rows of the collected results are shown in Table 9 below.


Table 9. Evaluation test results

Evaluated settings                         Results: 95th percentile error
MF (pcs)   Conjunction   Defuzzification   BridgePos error 95% (m)   TrolleyPos error 95% (m)   HoistPos error 95% (m)   LoadTare error 95% (t)   Abs error sum
5          Average       MoM               3.086                     1.091                      0.343                    0.412                    4.932
5          Minimum       MoM               3.089                     1.095                      0.358                    0.412                    4.954
5          Product       MoM               3.074                     1.060                      0.339                    0.402                    4.875
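The evaluation loop (fuzzify, defuzzify, compare, take the 95th percentile of the error) can be illustrated with a simplified Python sketch. It is not the jfuzzylite pipeline used in the thesis: the defuzzification here is a weighted mean of the three strongest membership function centres, the percentile uses the nearest-rank definition, and the 0 to 30 m range and sample points are assumptions.

```python
import math
from typing import List


def round_trip(x: float, centres: List[float], sigma: float) -> float:
    """Fuzzify x, keep the three strongest MFs, defuzzify by weighted mean."""
    degrees = [(math.exp(-0.5 * ((x - c) / sigma) ** 2), c) for c in centres]
    top3 = sorted(degrees, reverse=True)[:3]
    return sum(d * c for d, c in top3) / sum(d for d, _ in top3)


def percentile_95(errors: List[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(errors)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def p95_error(n_mfs: int, samples: List[float], lo: float, hi: float) -> float:
    """95th percentile round-trip error for n_mfs evenly spaced Gaussian MFs."""
    spacing = (hi - lo) / (n_mfs - 1)
    # Sigma chosen so that adjacent MFs cross at an activation degree of 0.5.
    sigma = spacing / (2 * math.sqrt(2 * math.log(2)))
    centres = [lo + k * spacing for k in range(n_mfs)]
    return percentile_95([abs(round_trip(x, centres, sigma) - x) for x in samples])


samples = [30.0 * k / 499 for k in range(500)]   # hypothetical 0-30 m range
coarse = p95_error(5, samples, 0.0, 30.0)
fine = p95_error(50, samples, 0.0, 30.0)
```

Comparing `coarse` and `fine` reproduces the qualitative trend of the evaluation: more membership functions give a smaller round-trip error.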

Calculating the absolute sum of the errors for each combination of settings and plotting it with respect to the number of membership functions for each of the conjunction and defuzzification combinations gives the following plot (Figure 42).

Figure 42. Sums of errors with different conjunction and defuzzification operators

From the results, it can be seen that the centroid defuzzification method combined with average conjunction gives the smallest error in the fuzzification process. This type of plot can also be used to determine how many membership functions should be used for a given variable. If, for example, one wanted to fuzzify bridge position and keep the error below 0.25 m, 30 or more membership functions should be used. For the same error with the trolley position, one would only need 10 membership functions. Figure 43 below shows the results with the ideal fuzzy operators.


Figure 43. Errors with centroid defuzzification and average conjunction

4.5.2 Data reduction

In order to evaluate the effect of removing the weakest fuzzy rules of the model, first, the KDE visualization of the bridge and trolley positions from a data set collected from Ilmatar was calculated. The kernel density estimate was implemented in Python using the Seaborn library (Seaborn, 2021). A scatter plot of the raw data and the respective kernel density estimate is shown in Figure 44 below.

Figure 44. Raw data and its kernel density estimate.

This visualization was used as a reference for evaluating the corresponding density estimates formed from the fuzzy model that was created from the same raw data. First, for the fuzzy model, a visualization was produced with the Python Seaborn library. In order to visualize the fuzzy model in Python in a similar way, the fuzzy data rows needed to be transformed into a set of data points. To do this, for each of the fuzzy data rows, 50 points were randomly distributed on the membership functions defined in the fuzzy data rows, using the respective standard deviations. The same visualization was completed with all the rules, with 75% of the rules, 50% of the rules, 25% of the rules, 10% of the rules and 5% of the rules. The number of rules was reduced so that the rules with the strongest weights remain. The resulting visualizations are shown in Figure 45 below.

Figure 45. Kernel density estimates from the fuzzy model

From the results, it can be seen that the resulting kernel density estimate matches the real KDE visualization with only 25% of the strongest rules remaining. With 10% of the rules remaining, the visualization becomes distorted. With the Ilmatar data, 20% of the strongest rules seems to be enough to accurately model the process. The raw data set contained 3214 rows of data. For this data set, a fuzzy model was created as explained in section 3.1. The full fuzzy model contained 8165 data rows, but this could be reduced to 1633 rows without losing relevant information, by removing the rules with the weakest weights. Additionally, for the visualization directly from raw data points, all the 3214 data points were needed. For the visualization from the fuzzy model, the rules could be aggregated and only 83 rows of data were needed for the visualization tool. The data sizes were roughly compared in CSV format, and the results are shown in Table 10 below. Similar information can be saved as a fuzzy model with approximately 80% less data.

Table 10. File size comparison

                         Signals (pcs)   Rows (pcs)   File size in .csv format (kB)
Data modelling:
Raw data                 15              3214         352
Fuzzy model              16              1633         68
Visualization:
Raw data visualization   2               3214         44
Fuzzy visualization      3               83           2
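The pruning step used in this section, keeping only a fraction of the rules with the strongest weights, can be sketched in Python as follows. The rule layout and the example rules are assumptions for illustration; applied to the 8165 rules of the model discussed above, a fraction of 0.2 would leave 1633 rules.

```python
from typing import List, Tuple

# (membership function index per variable, weight); hypothetical layout
Rule = Tuple[Tuple[int, ...], float]


def keep_strongest(rules: List[Rule], fraction: float) -> List[Rule]:
    """Keep the given fraction of rules with the highest weights."""
    n_keep = max(1, round(len(rules) * fraction))
    return sorted(rules, key=lambda rule: rule[1], reverse=True)[:n_keep]


# Toy rule base, not taken from the Ilmatar model.
rules = [((6, 2), 0.85), ((7, 2), 0.22), ((5, 2), 0.02),
         ((9, 9), 0.40), ((8, 2), 0.10)]
pruned = keep_strongest(rules, 0.4)
```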

The number of rows needed in the fuzzy method could be further reduced by employing aggregation already in the modelling phase.

4.6 Execution time of fuzzy modelling

To test how the fuzzy modelling method scales with increasing numbers of variables, the execution time of the modelling process was evaluated with a varying number of variables. The full Ilmatar data set of 31303 individual data points was fuzzified and stored in a database as fuzzy if-then rules. The elapsed time was recorded, and the process was repeated three times for each number of variables. The tests were completed on a system with the following specifications:

- Operating system: Windows 10
- Processor: Intel® Core™ i5-8400 @ 2.8 GHz
- Installed Random Access Memory (RAM): 16.0 GB

The results are shown in Table 11 below.


Table 11. Execution time of fuzzy modelling with varying number of variables

Execution time in milliseconds (ms)
Variables (pcs)   1st run   2nd run   3rd run   Average    Average per data point
17                467679    482144    461166    470329.7   15.025
14                462137    465684    473161    466994     14.919
11                464554    466452    473864    468290     14.960
8                 460981    461411    468287    463559.7   14.809
5                 461447    457719    462108    460424.7   14.709

The difference in the execution time between 5 and 17 variables is only approximately 2.1%. Increasing the number of variables has very little effect on the execution time of the fuzzification and storage of data. If the average execution time is plotted with respect to the number of variables, the elapsed time increases approximately linearly with increasing dimension (Figure 46).

Figure 46. Elapsed time per data point with respect to the number of variables


As the relationship between the number of variables and the execution time seems to be linear, regression analysis can easily be used to estimate how long it would take to process data points containing even a hundred variables. The coefficient of determination of the regression analysis is 0.872, implying a clear correlation between the elapsed time per data point and the number of variables. It should be noted, however, that the variation between test runs was significant.
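The regression can be reproduced from the "Average per data point" column of Table 11 with a few lines of Python. The sketch below fits an ordinary least-squares line and extrapolates to 100 variables, giving roughly 17 ms per data point; since this lies far outside the measured range of 5 to 17 variables, the estimate should be treated with caution.

```python
from typing import List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y = a + b * x; returns (a, b, R squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)


variables = [5, 8, 11, 14, 17]
ms_per_point = [14.709, 14.809, 14.960, 14.919, 15.025]   # from Table 11
a, b, r2 = linear_fit(variables, ms_per_point)
# Extrapolated time per data point (ms) for 100 variables.
estimate_100 = a + b * 100
```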


5 Discussion

The developed fuzzy modelling method can build a descriptive fuzzy model from industrial machine data with any number of variables and can do so efficiently. Due to the in-built aggregation method and other data reduction methods, usage information of Ilmatar could be expressed with a smaller file size through fuzzy modelling. The in-built aggregation is a key advantage of the proposed method, along with interpretability and the capability to handle high-dimensional data sets. This chapter discusses the key aspects of the proposed modelling method and compares the method to related works in sections 5.1 through 5.4. Finally, in section 5.5 ideas for future work are presented.

5.1 Accuracy and interpretability

In the proposed fuzzy modelling method, tradeoffs between interpretability and accuracy must be made (Table 12). Much like in neuro-fuzzy systems, the fewer fuzzy rules and membership functions are used, the better the interpretability of the system, but conversely, the worse the accuracy becomes (Shihabudheen & Pillai, 2018). It can also be concluded that excess variables should be avoided, because having unnecessary variables leads to worse interpretability and accuracy in the fuzzy modelling method.

Table 12. Tradeoffs between interpretability, accuracy, and performance

                                 Interpretability             Accuracy
Variables                        Fewer variables the better   Fewer variables the better
Number of rules                  Fewer rules the better       More rules the better
Number of membership functions   Fewer MFs the better         More MFs the better

The accuracy of the fuzzy modelling method was evaluated to some extent in section 4.5. It was found that while the fuzzy modelling method does introduce inaccuracy, relatively accurate results can be achieved by choosing an appropriate number of membership functions and by keeping the number of variables reasonable. In addition, the modelling method is precise in the sense that it is repeatable: with a given set of input data, the resulting fuzzy model is always the same. The proposed fuzzy modelling method is likely best suited to applications where interpretability is valued, because interpretability is a clear advantage of fuzzy systems over neural network systems, which are hard for non-experts to understand (Livingstone et al., 1997).

Visualizations are made from the fuzzy model in much the same way as in KDE. In the fuzzy method, the membership functions defined by fuzzy rules are drawn with their respective weights, whereas in KDE, a kernel is assigned to each data point and the sum of these kernels is visualized. The same function shapes can be used both for MFs in fuzzy systems and for kernels in KDE. The choice of bandwidth in these visualizations is critical. For KDE, there are many methods for optimizing the bandwidth, and the bandwidth can also be adaptive, as discussed in chapter 2 (Yuan et al., 2019). In the proposed fuzzy modelling method, the bandwidth is fixed and needs to be defined in the design phase for each variable. Therefore, the accuracy of visualizations in the proposed method is somewhat more limited than in KDE.

5.2 Fuzzy learning compared to related work

Learning in the fuzzy modelling method is limited to creating and removing rules and adjusting their weights. The learning concept is similar to the learning proposed by Vainio et al. (2008), who employed an adaptive fuzzy system to control the Venetian blinds and lighting in a smart home. Their proposed system learned the rules by sensing the environment and creating matching rules to control the light level according to the user’s preferences. In their method, only the strongest membership functions were used for creating rules, whereas in the method proposed in this thesis, the three highest membership functions are used to create the fuzzy rules. In addition, Vainio et al. created new rules only if a similar rule was not already present. If a matching rule was present, its weight was either increased or decreased depending on whether it controlled the system desirably. In the fuzzy modelling method proposed in this thesis, three rules were created for every measurement to preserve precise time information in the fuzzy model. The rules were combined only when the system was aggregated to a longer time frame or when rules were visualized.

In the existing literature, the learning capability of fuzzy systems is sometimes improved by employing neural networks in either selecting membership functions or determining the fuzzy rules (Nauck, 1997). Essentially, neuro-fuzzy systems combine the learning capabilities of neural networks with the interpretability of fuzzy inference (Shihabudheen & Pillai, 2018).

5.3 High-dimensional online modelling

It was found that the fuzzy modelling method scales well with increasing dimensions. The processing time increased linearly with increasing dimensions, and the time increased only by 2.2% from a dataset of 5 variables to a dataset with 17 variables. The regression analysis suggests that even a very large number of variables can realistically be fuzzified and stored as fuzzy if-then rules in real time. Processing a single data point with 100 variables is estimated to take approximately 17 milliseconds, and with 1000 variables the estimated processing time is still only 39 milliseconds. However, if the data is processed in very large batches, the processing time of an update in the database can become a factor to be considered. The rise in the computation time with increasing dimensions is so small that in practice it can be considered constant.
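As a rough consistency check, the two extrapolated figures quoted above are enough to recover the straight line they imply. The sketch below assumes that both figures lie exactly on the fitted regression line, which is only approximately true because they are rounded; the function name `line_through` is illustrative.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Figures quoted in the text: (number of variables, milliseconds per data point).
slope, intercept = line_through((100, 17.0), (1000, 39.0))

# Assumption: the 5- and 17-variable cases follow the same line.
t5 = slope * 5 + intercept
t17 = slope * 17 + intercept
increase_pct = (t17 - t5) / t5 * 100.0

print(f"implied slope: {slope:.4f} ms per variable")
print(f"implied fixed cost: {intercept:.2f} ms per data point")
print(f"implied increase from 5 to 17 variables: {increase_pct:.1f} %")
```

Under this assumption the implied cost is roughly 0.024 ms per additional variable on top of a fixed cost of about 14.6 ms, and the implied increase from 5 to 17 variables is about 2%, in line with the measured 2.2%. The fixed cost dominates, which is why the processing time can be treated as nearly constant in practice.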


The online kernel density estimation method (oKDE), in comparison, needed 70 milliseconds for updating a distribution with a new observation, averaged over datasets containing 4-30 dimensions (Kristan et al., 2011). The improved version of online kernel density estimation, xokde++, however, is computationally 3 to 10 times faster than oKDE, and with diagonal covariances the speedup is further enhanced to range from 11 to 40 (Ferreira et al., 2016). Xokde++ was evaluated with a far more powerful computer setup than oKDE: an Intel Xeon processor, 48 GB of RAM and the Linux openSUSE 13.1 operating system, while oKDE was evaluated with a standard 2 GHz CPU and 2 GB of RAM; thus the comparison is not very accurate. While the proposed fuzzy modelling method is not directly comparable to these methods, the method can be considered competitive in computational performance with these state-of-the-art kernel density estimation methods and is likely faster in very high dimensions.

While high dimensionality is not an issue in the modelling of data, in the data analysis phase the curse of dimensionality is an issue, much like in kernel density estimation (Crabbe, 2013). The variable space quickly grows large with increasing dimensions, and the number of samples needed grows exponentially with increasing dimension (Crabbe, 2013). In the fuzzy modelling method, even with 10 variables all modelled with 10 membership functions, the number of possible MF combinations in a fuzzy rule grows to 10^10. Therefore, it is highly unlikely for a fuzzy rule to ever repeat, and thus the number of rules needed to model the system cannot be reduced without eliminating dimensions. In addition, the variable space would be very sparsely populated.

5.4 Data reduction

In the fuzzy modelling method, before any data reduction techniques are employed, the number of rows needed is approximately tripled, because for each data point the three highest rules are learned in the system. However, by using the in-built aggregation made possible by the fuzzy modelling, the rows can be aggregated to longer time periods, thus heavily reducing the number of needed rows. In addition, it was noted that the weakest rules can be forgotten without losing relevant information, at least with the studied Ilmatar data set. It was found that only 20% of the rules were needed to accurately portray the usage of the industrial crane.

Aggregating fuzzy rules together is much like the M-Kernel method used in kernel density estimation to reduce the number of kernels. In the M-Kernel method, the kernels located next to each other are combined into larger M-kernels, each of which is given a new parameter (weight) to describe how many kernels it represents (Qin et al., 2019). The process is quite similar in aggregating fuzzy rules: matching rules are combined, and their respective weights are summed together. Data reduction techniques are also present in neural networks. For example, autoencoder neural networks learn a compressed model of the modelled system (Wang et al., 2020).
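The two data reduction steps described in this section, aggregating matching rules over a longer time period and forgetting the weakest rules, can be sketched as follows. This is a minimal illustration under stated assumptions and not the thesis implementation: the rule layout (timestamp, tuple of membership-function labels, weight), the label names, the example weights, and the function names `aggregate_rules` and `keep_strongest` are all hypothetical.

```python
import math
from collections import defaultdict


def aggregate_rules(rules, period_s):
    """Combine matching rules within each time bucket by summing their weights.

    rules: iterable of (timestamp_s, antecedent, weight), where antecedent is a
    tuple holding one membership-function label per variable (assumed layout).
    """
    totals = defaultdict(float)
    for timestamp_s, antecedent, weight in rules:
        bucket_start = (timestamp_s // period_s) * period_s
        totals[(bucket_start, antecedent)] += weight
    return [(bucket, ante, w) for (bucket, ante), w in sorted(totals.items())]


def keep_strongest(rules, fraction):
    """Keep the given fraction of rules with the largest weights (at least one)."""
    n_keep = max(1, math.ceil(len(rules) * fraction))
    return sorted(rules, key=lambda rule: rule[2], reverse=True)[:n_keep]


# Hypothetical rules: three per measurement, two measurements one second apart.
rules = [
    (0, ("load_low", "speed_mid"), 0.5),
    (0, ("load_low", "speed_high"), 0.375),
    (0, ("load_mid", "speed_mid"), 0.125),
    (1, ("load_low", "speed_mid"), 0.75),
    (1, ("load_low", "speed_high"), 0.125),
    (1, ("load_mid", "speed_mid"), 0.125),
]

aggregated = aggregate_rules(rules, period_s=10)   # six rows collapse to three
strongest = keep_strongest(aggregated, fraction=0.2)
print(aggregated)
print(strongest)
```

Summing weights keeps the aggregated rule proportional to how often and how strongly it fired, in the same spirit as the M-kernel weight. The 20% fraction mirrors the figure reported for the Ilmatar data and would need to be re-evaluated for other data sets.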


5.5 Future work

The proposed modelling method is competitive when the accuracy of the system is not the most critical criterion and when the interpretability of the model is valued. In addition, the method scales well with high dimensional datasets. However, the modelling method was tested with limited data from the industrial crane Ilmatar, which is a research crane and not actively used. The proposed method should be tested with data from an industrial machine in active use in a process to verify the potential of the method in a practical use case. Additionally, while concepts were presented for fuzzy predictive modelling and anomaly detection, the evaluation of these concepts in a real use case was left outside the scope of this thesis.

The modelling method could be employed to provide a review service for the users of the modelled machine. The model could be used to evaluate, for example, how well the machine is fitted for the customer’s needs and the process. Additionally, a potential use case could be in sales; data from similar processes could be compared in the sales process. The predictive and anomaly detection capabilities could be beneficial in, for example, predicting production or detecting changes in the process. Often, the machine manufacturer is an expert in the processes the machine is used in, so this type of fuzzy model could enable new consultative business.

In the future, the interpretability of the proposed method could be improved by using linguistic terms to define the membership functions of variables. This could allow for an intuitive user interface for the model. With linguistic terms, the user could be allowed to use natural language to query the model. This would enhance the usability of the model, and the data would be searchable with minimal training. Additionally, the impact of different shapes of membership functions was not studied in this thesis; the model could be further optimized by studying the effect of varying membership function shapes. Finally, the learning capabilities of the method could potentially be improved by employing neural network techniques in the process of learning the rules that best represent the modelled process or in automatically determining membership functions for variables.


6 Conclusions

This thesis aimed to develop a fuzzy logic-based modelling method for time-series data and examined how it could be applied in several potential use cases. A modelling concept for building and visualizing a fuzzy model was successfully developed and evaluated with industrial crane data. In addition, concepts for predictive modelling and anomaly detection with fuzzy-based technology were presented. The modelling concept was developed through iterative prototyping, and in order to test the concept with real machine data and further improve it, data was collected from test runs with an industrial crane (Ilmatar).

In the proposed modelling method, fuzzy sets are defined for each variable, and they are used to fuzzify and store the data as fuzzy if-then rules. The system can learn by adjusting existing rule weights and by adding or removing rules. A key benefit of the proposed method is that it provides an in-built aggregation method. The formed if-then rules generalize the data, thus allowing matching rules to be combined for any desired time interval. Interpretability is also a key advantage of the proposed fuzzy modelling method; the fuzzy logic inference is transparent, and interpretability can be further increased by employing linguistic terms in fuzzy rules.

When the proposed method was evaluated with Ilmatar data, it was found that the accuracy of the model was heavily dependent on the number of membership functions for the variables, thus forcing the designer to make a trade-off between the interpretability and accuracy of the model. In addition, it was found that with the Ilmatar data set, the strongest 20% of the rules were enough to accurately portray the process, implying that the data can be heavily reduced without losing much relevant information. The size of the fuzzy model can be reduced by periodically eliminating the weakest rules, or by utilizing the in-built aggregation over a longer period than the sampling interval of data collection. Finally, it was found that the execution time grows only linearly with an increasing number of variables, thus indicating that the proposed method scales well.

Future research is needed to test the proposed method with an online data stream and varying membership function shapes. In addition, the interpretability of the system could be improved by employing linguistic terms in membership functions, and the learning capabilities could potentially be extended by using neural networks in conjunction with the proposed fuzzy method. This thesis showed that the proposed modelling method has potential and has advantages over other commonly used modelling techniques. The next step is to apply the developed fuzzy modelling method to a real process.


References

Albertos, P., Sala, A., & Olivares, M. (1998). Fuzzy logic controllers. Advantages and drawbacks. VIII International Congress of Automatic Control, 3, 833–844.
Andrzejewski, W., Gramacki, A., & Gramacki, J. (2013). Graphics processing units in acceleration of bandwidth selection for kernel density estimation. International Journal of Applied Mathematics and Computer Science, 23(4), 869.
Autiosalo, J. (2018). Platform for industrial internet and digital twin focused education, research, and innovation: Ilmatar the overhead crane. IEEE World Forum on Internet of Things, WF-IoT 2018 - Proceedings, 2018-Janua, 241–244. https://doi.org/10.1109/WF-IoT.2018.8355217
Becker, S., & Plumbley, M. (1996). Unsupervised neural network learning procedures for feature extraction and classification. Applied Intelligence, 6(3), 185–203.
Cao, Y., He, H., & Man, H. (2012). SOMKE: Kernel density estimation over data streams by sequences of self-organizing maps. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1254–1268. https://doi.org/10.1109/TNNLS.2012.2201167
Chabni, F., Taleb, R., Benbouali, A., & Bouthiba, M. A. (2016). The application of fuzzy control in water tank level using Arduino. International Journal of Advanced Computer Science and Applications, 7(4), 261–265.
Chen, Y. C. (2017). A tutorial on kernel density estimation and recent advances. Biostatistics and Epidemiology, 1(1), 161–187. https://doi.org/10.1080/24709360.2017.1396742
Crabbe, J. J. (2013). Handling the curse of dimensionality in multivariate kernel density estimation. Oklahoma State University.
Da Silva, I. N., Spatti, D. H., Flauzino, R. A., Liboni, L. H. B., & dos Reis Alves, S. F. (2017). Artificial neural network architectures and training processes. In Artificial neural networks (pp. 21–28). Springer.
Diez-Olivan, A., Del Ser, J., Galar, D., & Sierra, B. (2019). Data fusion and machine learning for industrial prognosis: Trends and perspectives towards Industry 4.0. Information Fusion, 50(September 2018), 92–111. https://doi.org/10.1016/j.inffus.2018.10.005
Docker. (2021). DockerDocs. Retrieved from https://docs.docker.com/get-started/overview/. Accessed 23.7.2021.
Ferreira, J., de Matos, D. M., & Ribeiro, R. (2016). Fast and Extensible Online Multivariate Kernel Density Estimation. 1–17. Retrieved from http://arxiv.org/abs/1606.02608
Frantti, T., & Mähönen, P. (2001). Fuzzy logic-based forecasting model. Engineering Applications of Artificial Intelligence, 14(2), 189–201. https://doi.org/10.1016/S0952-1976(00)00076-2
Gramacki, A. (2018). Nonparametric Kernel Density Estimation and Its Computational Aspects. Cham: Springer International Publishing AG.
Gramacki, A., & Gramacki, J. (2017). FFT-based fast bandwidth selector for multivariate kernel density estimation. Computational Statistics and Data Analysis, 106, 27–45. https://doi.org/10.1016/j.csda.2016.09.001
Gramacki, A., Sawerwain, M., & Gramacki, J. (2015). FPGA-based bandwidth selection for kernel density estimation using high level synthesis approach. ArXiv Preprint ArXiv:1505.02100.
Guidoum, A. C. (2015). Kernel estimator and bandwidth selection for density and its derivatives. Department of Probabilities and Statistics, University of Science and Technology, Houari Boumediene, Algeria.
Guresen, E., Kayakutlu, G., & Daim, T. U. (2011). Using artificial neural network models in stock market index prediction. Expert Systems with Applications, 38(8), 10389–10397.
Hamamoto, A. H., Carvalho, L. F., Sampaio, L. D. H., Abrão, T., & Proença, M. L. (2018). Network Anomaly Detection System using Genetic Algorithm and Fuzzy Logic. Expert Systems with Applications, 92, 390–402. https://doi.org/10.1016/j.eswa.2017.09.013
Hellmann, M. (2001). Fuzzy logic introduction. Université de Rennes, 1.
Hung, M.-C., & Yang, D.-L. (2001). An efficient fuzzy c-means clustering algorithm. Proceedings 2001 IEEE International Conference on Data Mining, 225–232. IEEE.
Itoh, O., Migita, H., Itoh, J., & Irie, Y. (1993). Application of fuzzy control to automatic crane operation. Proceedings of IECON ’93 - 19th Annual Conference of IEEE Industrial Electronics, 161–164 vol.1. https://doi.org/10.1109/IECON.1993.339088
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4–37.
Jain, A. K., Mao, J., & Mohiuddin, K. M. (1996). Artificial neural networks: A tutorial. Computer, 29(3), 31–44.
Kristan, M., Leonardis, A., & Skočaj, D. (2011). Multivariate online kernel density estimation with Gaussian kernels. Pattern Recognition, 44(10–11), 2630–2642. https://doi.org/10.1016/j.patcog.2011.03.019
Kumarage, H., Khalil, I., Tari, Z., & Zomaya, A. (2013). Distributed anomaly detection for industrial wireless sensor networks based on fuzzy data modelling. Journal of Parallel and Distributed Computing, 73(6), 790–806. https://doi.org/10.1016/j.jpdc.2013.02.004
Larose, D. T. (2015). Data mining and predictive analytics. John Wiley & Sons.
Livingstone, D. J., Manallack, D. T., & Tetko, I. V. (1997). Data modelling with neural networks: advantages and limitations. Journal of Computer-Aided Molecular Design, 11(2), 135–142.
Łukasik, S. (2007). Parallel computing of kernel density estimates with MPI. International Conference on Computational Science, 726–733. Springer.
Nauck, D. (1997). Neuro-fuzzy systems: review and prospects. In Proceedings of Fifth European Congress on Intelligent Techniques and Soft Computing (EUFIT’97), 1044–1053. Citeseer.
O’Donovan, P., Leahy, K., Bruton, K., & O’Sullivan, D. T. J. (2015). An industrial big data pipeline for data-driven analytics maintenance applications in large-scale smart manufacturing facilities. Journal of Big Data, 2(1), 1–26. https://doi.org/10.1186/s40537-015-0034-z
Papamakarios, G. (2019). Neural Density Estimation and Likelihood-free Inference. ArXiv, (April).
Qin, X., Cao, L., Rundensteiner, E. A., & Madden, S. (2019). Scalable kernel density estimation-based local outlier detection over large data streams. Advances in Database Technology - EDBT, 2019-March, 421–432. https://doi.org/10.5441/002/edbt.2019.37
Rada-Vilela, J. (n.d.). FuzzyLite Language. Retrieved from https://www.fuzzylite.com/fll-fld/. Accessed 21.5.2021.
Rada-Vilela, J. (2018). The FuzzyLite libraries for fuzzy logic control.
Ren, L., Sun, Y., Wang, H., & Zhang, L. (2018). Prediction of bearing remaining useful life with deep convolution neural network. IEEE Access, 6, 13041–13049.
Ross, T. J. (2005). Fuzzy logic with engineering applications. John Wiley & Sons.
Sadek, R. M., Mohammed, S. A., Abunbehan, A. R. K., Ghattas, A. K. H. A., Badawi, M. R., Mortaja, M. N., … Abu-Naser, S. S. (2019). Parkinson’s Disease Prediction Using Artificial Neural Network.
Scalabrini Sampaio, G., Vallim Filho, A. R. de A., Santos da Silva, L., & Augusto da Silva, L. (2019). Prediction of motor failure time using an artificial neural network. Sensors, 19(19), 4342.
Seaborn. (2021). Seaborn: statistical data visualization. Retrieved from https://seaborn.pydata.org/index.html. Accessed 23.7.2021.
Shihabudheen, K. V, & Pillai, G. N. (2018). Recent advances in neuro-fuzzy system: A survey. Knowledge-Based Systems, 152, 136–162. https://doi.org/10.1016/j.knosys.2018.04.014
Siegel, B. (2020). Industrial anomaly detection: A comparison of unsupervised neural network architectures. IEEE Sensors Letters, 4(8), 1–4.
Silverman, B. W. (2018). Density estimation for statistics and data analysis. Routledge.
SQLWorkbench. (2021). SQLWorkbench. Retrieved from https://www.sql-workbench.eu/index.html. Accessed 23.7.2021.
Staar, B., Lütjen, M., & Freitag, M. (2019). Anomaly detection with convolutional neural networks for industrial surface inspection. Procedia CIRP, 79, 484–489.
Tang, T.-W., Kuo, W.-H., Lan, J.-H., Ding, C.-F., Hsu, H., & Young, H.-T. (2020). Anomaly detection neural network with dual auto-encoders GAN and its industrial inspection applications. Sensors, 20(12), 3336.
Tao, F., Sui, F., Liu, A., Qi, Q., Zhang, M., Song, B., … Nee, A. Y. C. (2019). Digital twin-driven product design framework. International Journal of Production Research, 57(12), 3935–3953.
Tarasov, V., Tan, H., Jarfors, A. E. W., & Seifeddine, S. (2020). Fuzzy logic-based modelling of yield strength of as-cast A356 alloy. Neural Computing and Applications, 32(10), 5833–5844. https://doi.org/10.1007/s00521-019-04056-5
Timescale. (2021). TimescaleDocs. Retrieved from https://docs.timescale.com/timescaledb/latest/. Accessed 23.7.2021.
Vainio, A. M., Valtonen, M., & Vanhala, J. (2008). Proactive fuzzy control and adaptation methods for smart homes. IEEE Intelligent Systems, 23(2), 42–49. https://doi.org/10.1109/MIS.2008.33
Wang, C., Wang, B., Liu, H., & Qu, H. (2020). Anomaly Detection for Industrial Control System Based on Autoencoder Neural Network. Wireless Communications and Mobile Computing, 2020, 8897926. https://doi.org/10.1155/2020/8897926
Węglarczyk, S. (2018). Kernel density estimation and its application. ITM Web of Conferences, 23. EDP Sciences.
Wolf, T., Gutmann, B., Weber, H., Ferré-Borrull, J., Bosch, S., & Vallmitjana, S. (1996). Application of fuzzy-rule-based postprocessing to correlation methods in pattern recognition. Applied Optics, 35(35), 6955. https://doi.org/10.1364/ao.35.006955
Yuan, K., Cheng, X., Gui, Z., Li, F., & Wu, H. (2019). A quad-tree-based fast and adaptive Kernel Density Estimation algorithm for heat-map generation. 8816. https://doi.org/10.1080/13658816.2018.1555831
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, (8), 338–353.
Zadeh, L. A. (2008). Is there a need for fuzzy logic? Information Sciences, 178(13), 2751–2779. https://doi.org/10.1016/j.ins.2008.02.012
Zhang, L., Lin, J., & Karim, R. (2018). Adaptive kernel density-based anomaly detection for nonlinear systems. Knowledge-Based Systems, 139, 50–63. https://doi.org/10.1016/j.knosys.2017.10.009
Zhou, A., Cai, Z., Wei, L., & Qian, W. (2003). M-kernel merging: Towards density estimation over data streams. Proceedings - 8th International Conference on Database Systems for Advanced Applications, DASFAA 2003, 285–292. https://doi.org/10.1109/DASFAA.2003.1192393