Hindawi Computational Intelligence and Neuroscience, Volume 2018, Article ID 5078268, 8 pages, https://doi.org/10.1155/2018/5078268

Research Article
A Novel Feature Selection Method Based on Extreme Learning Machine and Fractional-Order Darwinian PSO

Yuan-Yuan Wang,1,2 Huan Zhang,1,2 Chen-Hui Qiu,1,2 and Shun-Ren Xia1,2

1 Key Laboratory of Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, China
2 Zhejiang Provincial Key Laboratory of Cardio-Cerebral Vascular Detection Technology and Medicinal Effectiveness Appraisal, Hangzhou, China

Correspondence should be addressed to Shun-Ren Xia; shunren_xia@163.com

Received 26 January 2018; Revised 12 March 2018; Accepted 27 March 2018; Published 6 May 2018

Academic Editor: Pedro Antonio Gutierrez

Copyright © 2018 Yuan-Yuan Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a novel feature selection approach for regression problems based on the extreme learning machine (ELM) and fractional-order Darwinian particle swarm optimization (FODPSO). The proposed method constructs a fitness function from the mean square error (MSE) obtained by ELM, and the optimal solution of the fitness function is searched by FODPSO, an improved particle swarm optimization. To evaluate the performance of the proposed method, comparative experiments with related methods are conducted on seven public datasets. Across the seven datasets, the proposed method obtains six of the lowest MSE values among all the compared methods. The experimental results demonstrate that the proposed method either achieves a lower MSE with a feature subset of the same size or requires a smaller feature subset for a similar MSE.

1. Introduction

In the field of artificial intelligence, more and more variables or features are involved. An excessive set of features may lead to lower computational accuracy, slower speed, and additional memory occupation. Feature selection is used to choose smaller but sufficient feature subsets that improve, or at least do not significantly harm, predictive accuracy. Many studies have been conducted to optimize feature selection [1–4]. As far as we know, there are two key components in a search-based feature selection process: the learning algorithm and the optimization algorithm.

Various learning algorithms can be involved in this process. Classical methods such as the K-nearest neighbors algorithm [5] and the generalized regression neural network [6] were adopted for their simplicity and generality. More sophisticated algorithms are needed to predict complicated data well. The support vector machine (SVM) is one of the most popular nonlinear learning algorithms and has been widely used in feature selection [7–11]. The extreme learning machine (ELM) is one of the most popular single hidden layer feedforward networks (SLFN) [12]. It possesses faster calculation speed and better generalization ability than traditional learning methods [13, 14], which highlights the advantage of employing ELM in feature selection, as reported in several studies [15–17].

In order to better locate optimal feature subsets, an efficient global search technique is needed. Particle swarm optimization (PSO) [18, 19] is an extremely simple yet fundamentally effective optimization algorithm and has produced encouraging results in feature selection [7, 20, 21]. Xue et al. considered feature selection as a multiobjective optimization problem [5] and first applied multiobjective PSO [22, 23] to feature selection. Improved PSO variants such as the hybridization of GA and PSO [9], micro-GA embedded PSO [24], and fractional-order Darwinian particle swarm optimization (FODPSO) [10] were introduced and achieved good performance in feature selection.

Training speed and optimization ability are two essential elements of feature selection. In this paper, we propose a novel feature selection method that employs ELM as the learning algorithm and FODPSO as the optimization algorithm. The proposed method is compared with an SVM-based feature selection method in terms of the training speed of the learning algorithm, and with a traditional PSO-based feature selection method in terms of the searching ability of the optimization algorithm. It is also compared with a few well-known feature selection methods. All comparisons are conducted on seven public regression datasets.



[Figure 1: Schematic of the extreme learning machine: input layer X, hidden layer H, and output layer Y; ω are the input weights, b the hidden-layer thresholds, G the activation function, and β the output weights.]


The remainder of the paper is organized as follows. Section 2 presents technical details of the proposed method. Section 3 reports the comparative experiments on the seven datasets. Section 4 concludes our work.

2. Proposed Method

2.1. Learning Algorithm: Extreme Learning Machine (ELM). The schematic of the ELM structure is depicted in Figure 1, where ω denotes the weights connecting the input layer to the hidden layer and β denotes the weights connecting the hidden layer to the output layer; b is the threshold of the hidden layer, and G is the nonlinear piecewise continuous activation function, which could be a sigmoid, RBF, Fourier, or similar function. H represents the hidden layer output matrix, X is the input, and Y is the expected output. Let Ŷ be the real output; the ELM network chooses appropriate parameters to make Y and Ŷ as close to each other as possible:

$$\min \lVert Y - \hat{Y} \rVert = \min \lVert Y - H\beta \rVert \quad (1)$$

H is the hidden layer output matrix, computed from ω and b as in (2), in which Ñ denotes the number of hidden layer nodes and N denotes the dimension of the input X:

$$H = G(\omega X + b) = \begin{bmatrix} g(\omega_1 \cdot x_1 + b_1) & \cdots & g(\omega_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(\omega_1 \cdot x_N + b_1) & \cdots & g(\omega_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \quad (2)$$

As rigorously proven in [13], for any randomly chosen ω and b, H can always be full-rank if the activation function G is infinitely differentiable in any interval. As a general rule, one needs to find appropriate solutions of ω, b, and β to train a regular network. However, given an infinitely differentiable activation function, the continuous output can be approximated through randomly generated hidden layer neurons, provided enough tunable hidden neurons could estimate the output, as proven by universal approximation theory [24, 25]. Thus, in ELM the only parameter that needs to be settled is β; ω and b can be generated randomly.

By minimizing the norm in (1), ELM obtains the analytical solution

$$\beta = H^{\dagger} Y \quad (3)$$

where H† is the Moore–Penrose pseudoinverse of the matrix H. The ELM network tends to reach not only the smallest training error but also the smallest norm of weights, which indicates that ELM possesses good generalization ability.
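To make the training step concrete, here is a minimal sketch of this procedure in Python/NumPy (the paper's experiments were implemented in MATLAB; function and variable names and the uniform initialization range are our assumptions, while the sigmoid activation and the 150 hidden neurons follow the settings in Section 3.2):

```python
import numpy as np

def elm_train(X, Y, n_hidden=150, seed=0):
    """Train a basic ELM: random input weights and thresholds,
    analytic output weights via the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    omega = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                    # random hidden thresholds
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))                   # sigmoid hidden output matrix, eq. (2)
    beta = np.linalg.pinv(H) @ Y                                 # beta = pinv(H) Y, eq. (3)
    return omega, b, beta

def elm_predict(X, omega, b, beta):
    """Forward pass: hidden layer output times the learned output weights."""
    H = 1.0 / (1.0 + np.exp(-(X @ omega + b)))
    return H @ beta
```

Because β has a closed form, training reduces to a single pseudoinverse, which is the source of ELM's speed advantage discussed in Section 3.3.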

2.2. Optimization Algorithm: Fractional-Order Darwinian Particle Swarm Optimization (FODPSO). Particle swarm optimization (PSO) is a population-inspired metaheuristic algorithm [18, 19]. PSO is an effective evolutionary algorithm that searches for the optimum using a population of individuals, where the population is called a "swarm" and the individuals are called "particles." During the evolutionary process, each particle updates its moving direction according to its own best position (pbest) and the best position of the whole population (gbest), formulated as follows:

$$V_i(t+1) = \omega V_i(t) + c_1 r_1 \left(P_i - X_i(t)\right) + c_2 r_2 \left(P_g - X_i(t)\right) \quad (4)$$

$$X_i(t+1) = X_i(t) + V_i(t+1) \quad (5)$$

where X_i = (X_i^1, X_i^2, ..., X_i^D) is the position of particle i in the D-dimensional search space, V_i is its moving velocity, P_i denotes the cognition part called pbest, and P_g represents the social part called gbest [18]; ω, c, and r denote the inertia weight, the learning factors, and random numbers, respectively. The searching process terminates when the number of generations reaches a predefined value.
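As an illustration, one PSO generation implementing (4) and (5) can be sketched as follows (illustrative Python, not the paper's MATLAB code; c1 = c2 = 2 follows Section 3.2, while the inertia weight value is our assumption):

```python
import numpy as np

def pso_step(X, V, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=None):
    """One PSO generation. X, V, pbest: (n_particles, D); gbest: (D,).
    Velocities follow eq. (4); positions follow eq. (5)."""
    rng = rng or np.random.default_rng()
    r1 = rng.random(X.shape)  # fresh random numbers per particle and dimension
    r2 = rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    return X + V, V
```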

[Figure 2: Procedure of the proposed methodology. Initialize parameters for FODPSO → select features whose corresponding θ > 0 → calculate the fitness value of each particle by ELM → record pbest and gbest → update velocity and position of each particle as in (8) and (5) → decide whether to kill or spawn swarms in DPSO → select new feature subsets; repeat until reaching the maximum generation, then test the selected features on the testing set.]


Darwinian particle swarm optimization (DPSO) simulates natural selection over a collection of many swarms [25]. Each swarm individually behaves like an ordinary PSO, and all swarms run simultaneously in case one becomes trapped in a local optimum. The DPSO algorithm spawns a particle or extends a swarm's life when the swarm reaches a better optimum; otherwise, it deletes a particle or reduces the swarm's life. DPSO has been proven superior to the original PSO in preventing premature convergence to local optima [25].
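The natural selection bookkeeping might be organized as in the sketch below; this is our own minimal reading of the spawn/delete rule, and the stagnation threshold is an illustrative assumption, not a value from [25]:

```python
from dataclasses import dataclass, field

@dataclass
class Swarm:
    particles: list = field(default_factory=list)  # particle states of this swarm
    stagnation: int = 0                            # generations without improvement
    alive: bool = True

def darwinian_update(swarm: Swarm, improved: bool, max_stagnation: int = 10) -> None:
    """Reward a swarm whose gbest improved; punish one that stagnates."""
    if improved:
        swarm.stagnation = 0
        swarm.particles.append(object())  # spawn a new (placeholder) particle
    else:
        swarm.stagnation += 1
        if swarm.particles:
            swarm.particles.pop()         # delete a particle
        if swarm.stagnation > max_stagnation or not swarm.particles:
            swarm.alive = False           # the swarm's life runs out
```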

Fractional-order particle swarm optimization (FOPSO) introduces fractional calculus to model the particles' trajectories, which offers a means of controlling the convergence of the algorithm [26]. The velocity function in (4) is rearranged with ω = 1, namely,

$$V_i(t+1) - V_i(t) = c_1 r_1 \left(P_i - X_i(t)\right) + c_2 r_2 \left(P_g - X_i(t)\right) \quad (6)$$

The left side of (6) can be seen as the discrete version of the derivative of the velocity, D^α[v(t+1)], with order α = 1. The discrete-time implementation of the Grünwald–Letnikov derivative is introduced and expressed as

$$D^{\alpha}[v_t] = \frac{1}{T^{\alpha}} \sum_{k=0}^{r} \frac{(-1)^k \, \Gamma(\alpha + 1) \, v(t - kT)}{\Gamma(k+1) \, \Gamma(\alpha - k + 1)} \quad (7)$$

where T is the sample period and r is the truncation order. Substituting (7) into (6) with r = 4 yields

$$\begin{aligned} V_i(t+1) = {} & \alpha V_i(t) + \frac{\alpha}{2} V_i(t-1) + \frac{\alpha(1-\alpha)}{6} V_i(t-2) \\ & + \frac{\alpha(1-\alpha)(2-\alpha)}{24} V_i(t-3) \\ & + c_1 r_1 \left(P_i - X_i(t)\right) + c_2 r_2 \left(P_g - X_i(t)\right) \end{aligned} \quad (8)$$

Employing (8) to update each particle's velocity in DPSO generates a new algorithm, fractional-order Darwinian particle swarm optimization (FODPSO) [27, 28]. Different values of α control the convergence speed of the optimization process. The literature [27] shows that FODPSO outperforms FOPSO and DPSO in searching for the global optimum.
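The fractional velocity update in (8) keeps the last four velocities of each particle; a sketch follows (our naming and history-buffer layout are assumptions):

```python
import numpy as np

def fodpso_velocity(V_hist, X, pbest, gbest, alpha, c1=2.0, c2=2.0, rng=None):
    """Eq. (8). V_hist = [V(t), V(t-1), V(t-2), V(t-3)], each of shape (n, D)."""
    rng = rng or np.random.default_rng()
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V_new = (alpha * V_hist[0]
             + alpha / 2.0 * V_hist[1]
             + alpha * (1.0 - alpha) / 6.0 * V_hist[2]
             + alpha * (1.0 - alpha) * (2.0 - alpha) / 24.0 * V_hist[3]
             + c1 * r1 * (pbest - X)
             + c2 * r2 * (gbest - X))
    return V_new, [V_new] + V_hist[:3]  # shift the four-step memory window
```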

2.3. Procedure of ELM_FODPSO. Each feature is assigned a parameter θ within the interval [−1, 1]. The i-th feature is selected when its corresponding θ_i is greater than 0; otherwise the feature is abandoned. Assuming the features lie in an N-dimensional space, N variables are involved in the FODPSO optimization process. The procedure of ELM_FODPSO is depicted in Figure 2.
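Putting the pieces together, the wrapper fitness evaluated for each particle could be sketched as below, reusing the elm_train/elm_predict helpers from Section 2.1 (a hedged illustration; the cross-validation of Section 3.2 is collapsed here into one train/validation split):

```python
import numpy as np

def fitness(theta, X_train, Y_train, X_val, Y_val):
    """Decode a particle position theta in [-1, 1]^N into a feature mask and
    score the subset by the ELM validation MSE (the quantity FODPSO minimizes)."""
    mask = theta > 0                  # keep the i-th feature iff theta_i > 0
    if not mask.any():
        return np.inf                 # guard (our addition): empty subsets get the worst score
    omega, b, beta = elm_train(X_train[:, mask], Y_train)
    pred = elm_predict(X_val[:, mask], omega, b, beta)
    return np.mean((Y_val - pred) ** 2)
```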

3. Results and Discussions

3.1. Comparative Methods. Four methods, ELM_PSO [15], ELM_FS [29], SVM_FODPSO [10], and RReliefF [30], are used for comparison. All of the code used in this study is implemented in MATLAB 8.1.0 (The MathWorks, Natick, MA, USA) on a desktop computer with an eight-core Pentium CPU (4 GHz) and 32 GB memory.


Table 1: Information about datasets and comparative methods. A1, A2, A3, A4, and A5 represent ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO, respectively.

Label | Dataset        | Number of instances | Number of features | Comparative methods
D1    | Poland         | 1370                | 30                 | A1, A2, A3, A4, A5
D2    | Diabetes       | 442                 | 10                 | A1, A2, A3, A4, A5
D3    | Santa Fe Laser | 10081               | 12                 | A1, A2, A3, A4, A5
D4    | Anthrokids     | 1019                | 53                 | A1, A2, A3, A4, A5
D5    | Housing        | 4177                | 8                  | A1, A3, A4, A5
D6    | Abalone        | 506                 | 13                 | A1, A3, A4, A5
D7    | Cpusmall       | 8192                | 12                 | A1, A3, A4, A5

[Figure 3: Convergence analysis of the seven datasets D1–D7: normalized fitness value versus generation (0–200).]


3.2. Datasets and Parameter Settings. Seven public regression datasets are adopted: four mentioned in [29], where ELM_FS is used as a comparative method, and an additional three from [31]. Information about the seven datasets and the methods involved in the comparisons is shown in Table 1. Only the datasets adopted in [29] can be tested by their feature selection paths; thus D5, D6, and D7 in Table 1 are tested by the four methods other than ELM_FS.

Each dataset is split into a training set and a testing set: 70% of the instances are used for training, unless otherwise specified, and the rest for testing. During the training process, each particle carries a vector of feature coefficients θ ∈ [−1, 1]. The number of hidden layer neurons is set to 150, with a sigmoid activation. 10-fold cross-validation is performed to obtain a relatively stable MSE.
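As a sketch, the 10-fold cross-validated MSE for a candidate feature subset could be computed as follows (assuming the elm_train/elm_predict helpers above; the fold-splitting details are our assumption):

```python
import numpy as np

def cv_mse(X, Y, n_folds=10, seed=0):
    """10-fold cross-validation MSE of an ELM on one candidate feature subset."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        omega, b, beta = elm_train(X[train], Y[train])
        pred = elm_predict(X[val], omega, b, beta)
        errors.append(np.mean((Y[val] - pred) ** 2))
    return float(np.mean(errors))
```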

For the FODPSO searching process, the parameters are set as follows. The fractional order α is scheduled over the iterations as

$$\alpha = 0.8 - 0.4 \times \frac{t}{M}, \quad t = 0, 1, \ldots, M \quad (9)$$

where M denotes the maximal number of iterations and equals 200; a larger α increases the convergence speed in the early stage of the iterations. The number of swarms and the population size are set to 5 and 10, respectively, and c1 and c2 in (8) are both initialized to 2. We run FODPSO 30 independent times to obtain relatively stable results. Parameters for ELM_PSO, ELM_FS, SVM_FODPSO, and RReliefF are set following the respective original literature.
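The schedule in (9) is a one-line helper:

```python
def alpha_schedule(t, M=200):
    """Eq. (9): decay the fractional order linearly from 0.8 to 0.4 over M iterations."""
    return 0.8 - 0.4 * t / M
```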

[Figure 4: The evaluation results of Dataset 1: mean square error versus number of selected features for ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO.]



The convergence rate is analyzed to ensure that the algorithm converges within 200 generations. The median fitness evolution of the globally best particle is used for the convergence analysis depicted in Figure 3. To observe the convergence of the seven datasets more clearly in one figure, Figure 3 plots the normalized fitness value, calculated as follows:

$$f_{\text{Normalized}} = \frac{\text{MSE}_{\text{selected features}}}{\text{MSE}_{\text{all features}}} \quad (10)$$

3.3. Comparative Experiments. On the testing set, the MSE acquired by ELM is utilized to evaluate the performance of the four comparative methods. For all methods, the minimal MSE is recorded if more than one feature subset exists at the same feature scale. The MSEs for D1–D7 are depicted in Figures 4–10, respectively. The x-axis represents the increasing number of selected features, while the y-axis represents the minimum MSE value calculated with the features selected by each method at that scale. Feature selection aims at selecting smaller feature subsets that obtain similar or lower MSE; thus, in Figures 4–10, the closer a curve gets to the lower left corner of the plot, the better the method performs.


Table 2: Running time of SVM and ELM on the seven datasets.

Running time (s) | D1    | D2    | D3    | D4    | D5    | D6    | D7
SVM              | 0.021 | 0.002 | 0.612 | 0.016 | 0.093 | 0.045 | 0.245
ELM              | 0.018 | 0.009 | 0.056 | 0.013 | 0.027 | 0.010 | 0.051

Table 3: Minimum MSE values and the corresponding number of selected features (format: MSE | number of features; * marks the lowest MSE per dataset).

Dataset | ELM_PSO   | ELM_FS    | SVM_FODPSO | RReliefF  | ELM_FODPSO | All features
D1      | 0.0983|8  | 0.0806|27 | 0.0804|14  | 0.0804|26 | 0.0791|11* | 0.0820|30
D2      | 0.2844|9  | 0.2003|1  | 0.2919|9   | 0.2003|1  | 0.1982|1*  | 0.3172|10
D3      | 0.0099|5* | 0.0160|11 | 0.0106|7   | 0.0108|6  | 0.0098|5*  | 0.0171|12
D4      | 0.0157|8  | 0.0157|9  | 0.0253|20  | 0.0238|18 | 0.0156|7*  | 0.0437|53
D5      | 0.0838|8* | n/a       | 0.0853|7   | 0.0838|8* | 0.0841|6   | 0.0838|8
D6      | 0.0827|10 | n/a       | 0.0981|7   | 0.1292|1  | 0.0819|9*  | 0.1502|13
D7      | 0.0339|9  | n/a       | 0.0343|6   | 0.0355|12 | 0.0336|8*  | 0.0355|12

[Figure 5: The evaluation results of Dataset 2: mean square error versus number of selected features for ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO.]


ELM_FODPSO and SVM_FODPSO adopt the same optimization algorithm yet employ ELM and SVM as the learning algorithm, respectively. For each dataset, the training time of ELM and SVM is obtained by running each of them 30 times within the two methods; the averaged training times on the seven datasets are recorded in Table 2. It is observed that ELM trains faster on six of the seven datasets. Compared with SVM, the single hidden layer and the analytical solution make ELM more efficient. The faster speed of ELM highlights its use in feature selection, because many iterative evaluations are involved in FODPSO.

[Figure 6: The evaluation results of Dataset 3: mean square error versus number of selected features for ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO.]


ELM_FODPSO, ELM_PSO, and ELM_FS adopt the same learning algorithm yet employ FODPSO, PSO, and gradient descent search as the optimization algorithm, respectively. For D1, D2, and D3, ELM_FODPSO and ELM_PSO perform better than ELM_FS: the former two acquire lower MSE than ELM_FS at similar feature scales. For D4, the three methods achieve comparable performance.

Table 3 shows the minimum MSE values acquired by the five methods and the corresponding numbers of selected features, separated by a vertical bar. The last column reports the MSE value calculated with all features and the total number of features. The lowest MSE values on each dataset are marked with an asterisk. Among all datasets, ELM_FODPSO obtains six lowest MSE values, ELM_PSO obtains two, and RReliefF obtains one. For D3, ELM_FODPSO and ELM_PSO get comparable MSE values with the same feature subset; therefore 0.0099 and 0.0098 are both labeled as lowest MSE values. For D5, ELM_PSO and RReliefF get the lowest MSE, 0.0838, using all 8 features, while ELM_FODPSO gets a similar MSE, 0.0841, with only 6 features.


[Figure 7: The evaluation results of Dataset 4: mean square error versus number of selected features for ELM_PSO, ELM_FS, SVM_FODPSO, RReliefF, and ELM_FODPSO.]

[Figure 8: The evaluation results of Dataset 5: mean square error versus number of selected features for ELM_PSO, SVM_FODPSO, RReliefF, and ELM_FODPSO.]


[Figure 9: The evaluation results of Dataset 6: mean square error versus number of selected features for ELM_PSO, SVM_FODPSO, RReliefF, and ELM_FODPSO.]

[Figure 10: The evaluation results of Dataset 7: mean square error versus number of selected features for ELM_PSO, SVM_FODPSO, RReliefF, and ELM_FODPSO.]

4. Conclusions

Feature selection techniques have been widely studied and are commonly used in machine learning. The proposed method consists of two steps: constructing a fitness function with ELM and seeking the optimal solution of the fitness function with FODPSO. ELM is a simple yet effective single hidden layer neural network that suits feature selection thanks to its gratifying computational efficiency. FODPSO is an intelligent optimization algorithm with good global search ability.

The proposed method is evaluated on seven regression datasets, and it achieves better performance than the other comparative methods on six of them. In the future, we plan to explore ELM_FODPSO in various regression and classification applications.



Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (no. 2016YFC1306600).

References

[1] T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[2] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.

[3] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

[4] A. Jović, K. Brkić, and N. Bogunović, "A review of feature selection methods with applications," in Proceedings of the 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2015), pp. 1200–1205, Croatia, May 2015.

[5] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimization for feature selection in classification: a multi-objective approach," IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656–1671, 2013.

[6] I. A. Gheyas and L. S. Smith, "Feature subset selection in large dimensionality domains," Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010.

[7] X.-W. Chen, X. Zeng, and D. van Alphen, "Multi-class feature selection for texture classification," Pattern Recognition Letters, vol. 27, no. 14, pp. 1685–1691, 2006.

[8] S.-W. Lin, K.-C. Ying, S.-C. Chen, and Z.-J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," Expert Systems with Applications, vol. 35, no. 4, pp. 1817–1824, 2008.

[9] P. Ghamisi and J. A. Benediktsson, "Feature selection based on hybridization of genetic algorithm and particle swarm optimization," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 2, pp. 309–313, 2015.

[10] P. Ghamisi, M. S. Couceiro, and J. A. Benediktsson, "A novel feature selection approach based on FODPSO and SVM," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2935–2947, 2015.

[11] Q. Li, H. Chen, H. Huang et al., "An enhanced grey wolf optimization based feature selection wrapped kernel extreme learning machine for medical diagnosis," Computational and Mathematical Methods in Medicine, vol. 2017, Article ID 9512741, 15 pages, 2017.

[12] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks, vol. 9, no. 1, pp. 224–229, 1998.

[13] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[14] G.-B. Huang, "What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle," Cognitive Computation, vol. 7, no. 3, pp. 263–278, 2015.

[15] S. Saraswathi, S. Sundaram, N. Sundararajan, M. Zimmermann, and M. Nilsen-Hamilton, "ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 452–463, 2011.

[16] D. Chyzhyk, A. Savio, and M. Graña, "Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI," Neurocomputing, vol. 128, pp. 73–80, 2014.

[17] R. Ahila, V. Sadasivam, and K. Manimala, "An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances," Applied Soft Computing, vol. 32, pp. 23–37, 2015.

[18] Y. H. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC '98), pp. 69–73, Anchorage, Alaska, USA, May 1998.

[19] S. Kiranyaz, T. Ince, and M. Gabbouj, "Multi-dimensional particle swarm optimization," in Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition, vol. 15 of Adaptation, Learning, and Optimization, pp. 83–99, Springer, Berlin, Heidelberg, 2014.

[20] L. Shang, Z. Zhou, and X. Liu, "Particle swarm optimization-based feature selection in sentiment classification," Soft Computing, vol. 20, no. 10, pp. 3821–3834, 2016.

[21] H. B. Nguyen, B. Xue, I. Liu, P. Andreae, and M. Zhang, "New mechanism for archive maintenance in PSO-based multi-objective feature selection," Soft Computing, vol. 20, no. 10, pp. 3927–3946, 2016.

[22] C. A. Coello Coello and M. S. Lechuga, "MOPSO: a proposal for multiple objective particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation (CEC '02), pp. 1051–1056, May 2002.

[23] J. J. Durillo, J. García-Nieto, A. J. Nebro, C. A. Coello, F. Luna, and E. Alba, "Multi-objective particle swarm optimizers: an experimental comparison," in Evolutionary Multi-Criterion Optimization, vol. 5467 of Lecture Notes in Computer Science, pp. 495–509, Springer, Berlin, Germany, 2009.

[24] K. Mistry, L. Zhang, S. C. Neoh, C. P. Lim, and B. Fielding, "A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition," IEEE Transactions on Cybernetics, vol. 47, no. 6, pp. 1496–1509, 2017.

[25] J. Tillett, R. Rao, and F. Sahin, "Cluster-head identification in ad hoc sensor networks using particle swarm optimization," in Proceedings of the IEEE International Conference on Personal Wireless Communications (ICPWC 2002), pp. 201–205, New Delhi, India, 2002.

[26] E. J. S. Pires, J. A. T. Machado, P. B. de Moura Oliveira, J. B. Cunha, and L. Mendes, "Particle swarm optimization with fractional-order velocity," Nonlinear Dynamics, vol. 61, no. 1-2, pp. 295–301, 2010.

[27] M. S. Couceiro, R. P. Rocha, N. M. F. Ferreira, and J. A. T. Machado, "Introducing the fractional-order Darwinian PSO," Signal, Image and Video Processing, vol. 6, no. 3, pp. 343–350, 2012.

[28] M. S. Couceiro, F. M. L. Martins, R. P. Rocha, and N. M. F. Ferreira, "Mechanism and convergence analysis of a multi-robot swarm approach based on natural selection," Journal of Intelligent & Robotic Systems, vol. 76, no. 2, pp. 353–381, 2014.

[29] F. Benoît, M. van Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse, "Feature selection for nonlinear models with extreme learning machines," Neurocomputing, vol. 102, pp. 111–124, 2013.

[30] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of ReliefF and RReliefF," Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.

[31] L. Bravi, V. Piccialli, and M. Sciandrone, "An optimization-based method for feature ranking in nonlinear regression problems," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 1005–1010, 2016.


Page 2: A Novel Feature Selection Method Based on Extreme Learning ...downloads.hindawi.com/journals/cin/2018/5078268.pdf · A Novel Feature Selection Method Based on Extreme Learning Machine

2 Computational Intelligence and Neuroscience

Xinputlayer

Hhiddenlayer

Youtputlayer

input weightb threshold

G activation function

outputweight

Figure 1 Schematic of extreme learning machine

selection method in terms of training speed of learningalgorithm and compared with traditional PSO-based featureselectionmethod in terms of searching ability of optimizationalgorithm And also the proposed method is compared witha few well-known feature selection methods All the compar-isons are conducted on seven public regression datasets

The remainder of the paper is organized as followsSection 2 presents technical details about the proposedmethod Section 3 conducts the comparative experiments onseven datasets Section 4 makes conclusions of our work

2 Proposed Method

21 Learning Algorithm Extreme Learning Machine (ELM)The schematic of ELM structure is depicted as Figure 1 where120596 denotes the weight connecting the input layer and hiddenlayer and 120573 denotes the weight connecting the hidden layerand output layer 119887 is the threshold of the hidden layer and119866 is the nonlinear piecewise continuous activation functionwhich could be sigmoid RBF Fourier and so forth 119867represents the hidden layer outputmatrix119883 is the input layerand 119884 is the expected output Let 119884 be the real output ELMnetwork is used to choose appropriate parameters to make 119884and 119884 as close to each other as possible

min 10038171003817100381710038171003817119884 minus 11988410038171003817100381710038171003817 = min 1003817100381710038171003817119884 minus 1198671205731003817100381710038171003817 (1)

119867 is called the hidden layer output matrix computed by120596 and 119887 as (2) inwhich denotes the number of hidden layernodes and 119873 denotes the dimension of input 119883

119867 = 119866 (120596119883 + 119887)

=[[[[[

119892 (1205961 sdot 1199091 + 1198871) sdot sdot sdot 119892 (120596 sdot 1199091 + 119887) d

119892 (1205961 sdot 119909119873 + 1198871) sdot sdot sdot 119892 (120596 sdot 119909 + 119887)

]]]]]119873times

(2)

As rigorously proven in [13] for any randomly chosen120596 and 119887 119867 can always be full-rank if activation function 119866

is infinitely differentiable in any intervals As a general ruleone needs to find the appropriate solutions of 120596 119887 120573 to traina regular network However given infinitely differentiableactivation function the continuous output can be approxi-mately obtained through any randomly hidden layer neuronif certain tuning hidden layer neuron could successfullyestimate the output as proven by universal approximationtheory [24 25] Thus in ELM the only parameter that needsto be settled is 120573 120596 119887 can be generated randomly

By minimizing the absolute numerical value in (1) ELMcalculated the analytical solution as follows

120573 = 119867G119884 (3)

where 119867G is the Moore-Penrose pseudoinverse of matrix 119867ELM network tends to reach not only the smallest trainingerror but also the smallest norm of weights which indicatesthat ELM possesses good generalization ability

22 Optimization Algorithm Fractional-Order DarwinianParticle Swarm Optimization (FODPSO) Kiranyaz et al [19]developed a population-inspired metaheuristic algorithmnamed particle swarm optimization (PSO) PSO is an effec-tive evolutionary algorithm which searches for the optimumusing a population of individuals where the population iscalled ldquoswarmrdquo and individuals are called ldquoparticlesrdquo Duringthe evolutionary process each particle updates its movingdirection according to the best position of itself (pbest) andthe best position of the whole population (gbest) formulatedas follows

119881119894 (119905 + 1) = 120596119881119894 (119905) + 11988811199031 (119875119894 minus 119883119894 (119905))+ 11988821199032 (119875119892 minus 119883119894 (119905)) (4)

119883119894 (119905 + 1) = 119883119894 (119905) + 119881119894 (119905 + 1) (5)

where 119883119894 = (1198831119894 1198832119894 119883119863119894 ) is the particle position atgeneration 119894 in the 119863-dimension searching space 119881119894 is themoving velocity 119875119894 denotes the cognition part called pbestand 119875119892 represents the social part called gbest [18] 120596 119888 119903

Computational Intelligence and Neuroscience 3

Initialize parameters for FODPSO

Select features where corresponding gt 0

Calculate fitness value for each particle by ELM

Record pbest and gbest

Update velocity and position for each particle as equation (8) and equation (5)

Decide whether to kill or spawn swarms in DPSO

Select new feature subsetsRepeat FODPSO until reaching the maximum generation

Test the selected features on testing set

Figure 2 Procedure of the proposed methodology

denote the inertia weight learning factors and randomnum-bers respectivelyThe searching process terminates when thenumber of generation reaches the predefined value

Darwinian particle swarm optimization (DPSO) simu-lates natural selection in a collection of many swarms [25]Each swarm individually performs like an ordinary PSOAll the swarms run simultaneously in case of one trap in alocal optimum DPSO algorithm spawns particle or extendsswarm life when the swarm gets better optimum otherwise itdeletes particle or reduces swarm life DPSO has been provento be superior to original PSO in preventing prematureconvergence to local optimum [25]

Fractional-order particle swarm optimization (FOPSO)introduces fractional calculus to model particlesrsquo trajectorywhich demonstrates a potential for controlling the conver-gence of algorithm [26] Velocity function in (4) is rearrangedwith 120596 = 1 namely

119881119894 (119905 + 1) minus 119881119894 (119905) = 11988811199031 (119875119894 minus 119883119894 (119905))+ 11988821199032 (119875119892 minus 119883119894 (119905)) (6)

The left side of (6) can be seen as the discrete version of thederivative of velocity 119863120572[V119905+1] with order 120572 = 1 The discretetime implementation of the GrunwaldndashLetnikov derivative isintroduced and expressed as

119863120572 [V119905] = 1119879120572119903

sum119896=0

(minus1)119896 Γ (120572 + 1) V (119905 minus 119896119879)Γ (119896 + 1) Γ (120572 minus 119896 + 1) (7)

where119879 is the sample period and 119903 is the truncate order Bring(7) into (6) with 119903 = 4 yielding the following

119881119894 (119905 + 1) = 120572119881119894 (119905) + 1205722 119881119894 (119905 minus 1) + 120572 (1 minus 120572)

6 119881119894 (119905 minus 2)

+ 120572 (1 minus 120572) (2 minus 120572)24 119881119894 (119905 minus 3)

+ 11988811199031 (119875119894 minus 119883119894 (119905)) + 11988821199032 (119875119892 minus 119883119894 (119905))

(8)

Employ (8) to update each particlersquos velocity in DPSOgenerating a new algorithm named fractional-order Dar-winian particle swarm optimization (FODPSO) [27 28]Different values of 120572 control the convergence speed of opti-mization processThe literature [27] illustrates that FODPSOoutperforms FOPSO and DPSO in searching global opti-mum

23 Procedure of ELM FODPSO Each feature is assignedwith a parameter 120579within the interval [minus1 1]The 119894th feature isselectedwhen its corresponding 120579119894 is greater than 0 otherwisethe feature is abandoned Assuming the features are in 119873-dimensional space 119873 variables are involved in the FODPSOoptimization process The procedure of ELM FODPSO isdepicted in Figure 2

3 Results and Discussions

31 Comparative Methods Four methods ELM PSO [15]ELM FS [29] SVM FODPSO [10] and RReliefF [30] areused for comparison All of the codes used in this studyare implemented inMATLAB 810 (TheMathWorks Natick

4 Computational Intelligence and Neuroscience

Table 1 Information about datasets and comparative methods A1 A2 A3 A4 and A5 represent ELM PSO ELM FS SVM FODPSORReliefF and ELM FODPSO respectively

Label Dataset Number of instances Number of features Comparative methodsD1 Poland 1370 30 A1 A2 A3 A4 A5D2 Diabetes 442 10 A1 A2 A3 A4 A5D3 Santa Fe Laser 10081 12 A1 A2 A3 A4 A5D4 Anthrokids 1019 53 A1 A2 A3 A4 A5D5 Housing 4177 8 A1 A3 A4 A5D6 Abalone 506 13 A1 A3 A4 A5D7 Cpusmall 8192 12 A1 A3 A4 A5

0 50 100 150 200

04

05

06

07

08

09

1

11

12

13

D1D2D3

D4D5D6

D7

Figure 3 Convergence analysis of seven datasets

MA USA) on a desktop computer with a Pentium eight-coreCPU (4GHz) and 32GB memory

32 Datasets and Parameter Settings Seven public datasetsfor regression problems are adopted including four men-tioned in [29] and additional three in [31] where ELM FSis used as a comparative method Information about sevendatasets and themethods involved in comparisons are shownin Table 1 Only the datasets adopted in [29] can be tested bytheir feature selection paths thus D5 D6 and D7 in Table 1are tested by four methods except ELM FS

Each dataset is split into training set and testing set70 of the total instances are used as training sets if notparticularly specified and the rest are testing sets Duringthe training process each particle has a series of featurecoefficients 120579 isin [minus1 1] Hidden layer neurons number is setas 150 and kernel type as sigmoid 10-fold cross-validation isperformed to gain relatively stable MSE

For FODPSO searching process parameters are setas follows 120572 is formulated by (9) where 119872 denotes the

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

5 10 15 20 25 300number of features

Figure 4 The evaluation results of Dataset 1

maximal iterations and 119872 equals 200 Larger 120572 increases theconvergence speed in the early stage of iterations Numbersof swarms and populations are set to 5 and 10 respectively1198881 1198882 in (8) are both initialized by 2 We run FODPSOfor 30 independent times to gain relatively stable resultsParameters for ELM PSO ELM FS SVM FODPSO andRReliefF are set based on former literatures

120572 = 08 minus 04 times 119905119872 119905 = 0 1 119872 (9)

Convergence rate is analyzed to ensure the algorithmcon-vergence within 200 generations The median of the fitnessevolution of the best global particle is taken for convergenceanalysis depicted in Figure 3 To observe convergence ofseven datasets in one figure more clearly the normalizedfitness value is adopted in Figure 3 calculated as follows

119891Normolized = MSEselected feature119904MSEall features

(10)

33 Comparative Experiments In the testing set MSEacquired by ELM is utilized to evaluate performances of

Computational Intelligence and Neuroscience 5

Table 2 Running time of SVM and ELM on seven datasets

Running time (s) D1 D2 D3 D4 D5 D6 D7SVM 0021 0002 0612 0016 0093 0045 0245ELM 0018 0009 0056 0013 0027 0010 0051

Table 3 MinimumMSE values and the corresponding number of selected features

Dataset MethodELM PSO ELM FS SVM FODPSO RReliefF ELM FODPSO all features

MSE N featureD1 00983|8 00806|27 00804|14 00804|26 00791|11 00820|30D2 02844|9 02003|1 02919|9 02003|1 01982|1 03172|10D3 00099|5 00160|11 00106|7 00108|6 00098|5 00171|12D4 00157|8 00157|9 00253|20 00238|18 00156|7 00437|53D5 00838|8 mdash 00853|7 00838|8 00841|6 00838|8D6 00827|10 mdash 00981|7 01292|1 00819|9 01502|13D7 00339|9 mdash 00343|6 00355|12 00336|8 00355|12

0

1

2

3

4

5

6

7

mea

n sq

uare

erro

r

2 3 4 5 6 7 8 9 101number of features

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

Figure 5 The evaluation results of Dataset 2

four methods For all the methods the minimal MSE isrecorded if more than one feature subset exists in the samefeature scale MSEs of D1ndashD7 are depicted in Figures 4ndash10respectively The 119909-axis represents increasing number ofselected features while the 119910-axis represents the minimumMSE value calculated with features selected by differentmethods at each scale Feature selection aims at selectingsmaller feature subsets to obtain similar or lower MSEThusin Figures 4ndash10 the closer one curve gets to the left corner ofcoordinate the better one method performs

ELM FODPSO and SVM FODPSO adopt the same opti-mization algorithm yet employ ELM and SVM as learningalgorithm respectively For each dataset training time ofELM and SVM is obtained by randomly running them 30times in two methods the averaged training time of ELM

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

002

004

006

008

01

012

014

016

018m

ean

squa

re er

ror

Figure 6 The evaluation results of Dataset 3

and SVM in seven datasets is recorded in Table 2 It isobserved that ELM acquires faster training speed in six ofseven datasets Compared with SVM single hidden layer andanalytical approach make ELM more efficient Faster speedof ELM highlights its use in feature selection due to manyiterative actions involved in FODPSO

ELM FODPSO ELM PSO and ELM FS adopt the samelearning algorithm yet employ FODPSO PSO and GradientDescent Search as optimization algorithms respectively ForD1 D2 and D3 ELM FODPSO and ELM PSO performbetter than ELM FS the former two acquire lower MSE thanELM FS under similar feature scales For D4 three methodsget comparable performance

Table 3 shows the minimum MSE values acquired byfive methods and the corresponding numbers of selected

6 Computational Intelligence and Neuroscience

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

10 20 30 40 50 600number of features

Figure 7 The evaluation results of Dataset 4

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

mea

n sq

uare

erro

r

2 3 4 5 6 7 81number of features

Figure 8 The evaluation results of Dataset 5

features separated by a vertical bar The last column repre-sents the MSE values calculated by all features and the totalnumber of features The lowest MSE values on each datasetare labeled as bold Among all datasets ELM FODPSOobtains six lowest MSE values ELM PSO obtains twoand RReliefF obtains one For D3 ELM FODPSO andELM PSO get comparable MSE values by the same fea-ture subset therefore 00099 and 00098 are both labeledas lowest MSE values For D5 ELM PSO and RReliefFget the lowest MSE 00838 using all the 8 features andELM FODPSO gets a similar MSE 00841 with only 6 fea-tures

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

45

mea

n sq

uare

erro

r

2 4 6 8 10 12 140number of features

Figure 9 The evaluation results of Dataset 6

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

05

1

15

2

25

3

35

mea

n sq

uare

erro

r

Figure 10 The evaluation results of Dataset 7

4 Conclusions

Feature selection techniques have been widely studied andcommonly used in machine learning The proposed methodcontains two steps constructing fitness functions by ELMand seeking the optimal solutions of fitness functions byFODPSO ELM is a simple yet effective single hidden layerneural network which is suitable for feature selection dueto its gratifying computational efficiency FODPSO is anintelligent optimization algorithm which owns good globalsearch ability

The proposed method is evaluated on seven regressiondatasets and it achieves better performance than othercomparativemethods on six datasetsWemay concentrate on

Computational Intelligence and Neuroscience 7

exploring ELM FODPSO in various situations of regressionand classification applications in the future

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by National Key Research andDevelopment Program of China (no 2016YFC1306600)

References

[1] T Lindeberg ldquoFeature detectionwith automatic scale selectionrdquoInternational Journal of Computer Vision vol 30 no 2 pp 79ndash116 1998

[2] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 1ndash4 pp 131ndash156 1997

[3] I Iguyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[4] A Jovic K Brkic and N Bogunovic ldquoA review of featureselection methods with applicationsrdquo in Proceedings of the 38thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics MIPRO 2015 pp1200ndash1205 Croatia May 2015

[5] B Xue M Zhang and W N Browne ldquoParticle swarm opti-mization for feature selection in classification a multi-objectiveapproachrdquo IEEE Transactions on Cybernetics vol 43 no 6 pp1656ndash1671 2013

[6] I A Gheyas and L S Smith ldquoFeature subset selection in largedimensionality domainsrdquo Pattern Recognition vol 43 no 1 pp5ndash13 2010

[7] X-W Chen X Zeng and D van Alphen ldquoMulti-class featureselection for texture classificationrdquo Pattern Recognition Lettersvol 27 no 14 pp 1685ndash1691 2006

[8] S-W Lin K-C Ying S-C Chen and Z-J Lee ldquoParticle swarmoptimization for parameter determination and feature selectionof support vector machinesrdquo Expert Systems with Applicationsvol 35 no 4 pp 1817ndash1824 2008

[9] P Ghamisi and J A Benediktsson ldquoFeature selection basedon hybridization of genetic algorithm and particle swarmoptimizationrdquo IEEE Geoscience and Remote Sensing Letters vol12 no 2 pp 309ndash313 2015

[10] P Ghamisi M S Couceiro and J A Benediktsson ldquoA novelfeature selection approach based on FODPSO and SVMrdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 5pp 2935ndash2947 2015

[11] Q Li H Chen H Huang et al ldquoAn enhanced grey wolfoptimization based feature selection wrapped kernel extremelearning machine for medical diagnosisrdquo Computational andMathematicalMethods inMedicine vol 2017 Article ID 951274115 pages 2017

[12] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[13] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[14] G-B Huang ldquoWhat are extreme learning machines Fillingthe gap between Frank Rosenblattrsquos dream and John VonNeumannrsquos puzzlerdquo Cognitive Computation vol 7 no 3 pp263ndash278 2015

[15] S Saraswathi S Sundaram N Sundararajan M Zimmer-mann and M Nilsen-Hamilton ldquoICGA-PSO-ELM approachfor accuratemulticlass cancer classification resulting in reducedgene sets in which genes encoding secreted proteins are highlyrepresentedrdquo IEEE Transactions on Computational Biology andBioinformatics vol 8 no 2 pp 452ndash463 2011

[16] D Chyzhyk A Savio and M Grana ldquoEvolutionary ELMwrapper feature selection for Alzheimerrsquos disease CAD onanatomical brain MRIrdquo Neurocomputing vol 128 pp 73ndash802014

[17] R Ahila V Sadasivam and K Manimala ldquoAn integrated PSOfor parameter determination and feature selection of ELM andits application in classification of power system disturbancesrdquoApplied Soft Computing vol 32 pp 23ndash37 2015

[18] Y H Shi and R C Eberhart ldquoA modified particle swarmoptimizerrdquo in Proceedings of the IEEE International Conferenceon EvolutionaryComputation (ICEC rsquo98) pp 69ndash73 AnchorageAlaska USA May 1998

[19] S Kiranyaz T Ince and M Gabbouj ldquoMulti-dimensional Par-ticle SwarmOptimizationrdquo inMultidimensional Particle SwarmOptimization for Machine Learning and Pattern Recognitionvol 15 of Adaptation Learning and Optimization pp 83ndash99Springer Berlin Heidelberg Berlin Heidelberg 2014

[20] L Shang Z Zhou and X Liu ldquoParticle swarm optimization-based feature selection in sentiment classificationrdquo Soft Com-puting vol 20 no 10 pp 3821ndash3834 2016

[21] H B Nguyen B Xue I Liu P Andreae and M ZhangldquoNewmechanism for archivemaintenance in PSO-basedmulti-objective feature selectionrdquo Soft Computing vol 20 no 10 pp3927ndash3946 2016

[22] C A Coello Coello andM S Lechuga ldquoMOPSO a proposal formultiple objective particle swarm optimizationrdquo in Proceedingsof the Congress on Evolutionary Computation (CEC rsquo02) pp1051ndash1056 May 2002

[23] J J Durillo J Garcıa-Nieto A J Nebro C A Coello F Lunaand E Alba ldquoMulti-objective particle swarm optimizers anexperimental comparisonrdquo in Evolutionary Multi-CriterionOptimization vol 5467 of Lecture Notes in Computer Sciencepp 495ndash509 Springer Berlin Germany 2009

[24] K Mistry L Zhang S C Neoh C P Lim and B FieldingldquoA Micro-GA Embedded PSO Feature Selection Approach toIntelligent Facial Emotion Recognitionrdquo IEEE Transactions onCybernetics vol 47 no 6 pp 1496ndash1509 2017

[25] J Tillett R Rao and F Sahin ldquoCluster-head identification inad hoc sensor networks using particle swarm optimizationrdquo inProceedings of the ICPWC 2002 - IEEE International Conferenceon Personal Wireless Communications pp 201ndash205 New DelhiIndia

[26] E J S Pires J A T MacHado P B de Moura Oliveira JB Cunha and L Mendes ldquoParticle swarm optimization withfractional-order velocityrdquo Nonlinear Dynamics vol 61 no 1-2pp 295ndash301 2010

[27] M S Couceiro R P Rocha N M F Ferreira and J A TMachado ldquoIntroducing the fractional-order Darwinian PSOrdquo

8 Computational Intelligence and Neuroscience

Signal Image and Video Processing vol 6 no 3 pp 343ndash3502012

[28] M S Couceiro F M L Martins R P Rocha and N M FFerreira ldquoMechanism and Convergence Analysis of a Multi-Robot Swarm Approach Based on Natural Selectionrdquo Journal ofIntelligent amp Robotic Systems vol 76 no 2 pp 353ndash381 2014

[29] F Benoıt M van Heeswijk Y Miche M Verleysen and ALendasse ldquoFeature selection for nonlinear models with extremelearning machinesrdquo Neurocomputing vol 102 pp 111ndash124 2013

[30] M Robnik-Sikonja and I Kononenko ldquoTheoretical and empir-ical analysis of ReliefF and RReliefFrdquoMachine Learning vol 53no 1-2 pp 23ndash69 2003

[31] L Bravi V Piccialli and M Sciandrone ldquoAn optimization-based method for feature ranking in nonlinear regressionproblemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 28 no 4 pp 1005ndash1010 2016

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 3: A Novel Feature Selection Method Based on Extreme Learning ...downloads.hindawi.com/journals/cin/2018/5078268.pdf · A Novel Feature Selection Method Based on Extreme Learning Machine

Computational Intelligence and Neuroscience 3

Figure 2: Procedure of the proposed methodology. The flowchart comprises the following steps: (1) initialize the parameters for FODPSO; (2) select the features whose corresponding $\theta$ is greater than 0; (3) calculate the fitness value of each particle by ELM; (4) record pbest and gbest; (5) update the velocity and position of each particle as in equations (8) and (5); (6) decide whether to kill or spawn swarms in DPSO; (7) select new feature subsets and repeat FODPSO until reaching the maximum generation; (8) test the selected features on the testing set.

denote the inertia weight, the learning factors, and random numbers, respectively. The searching process terminates when the number of generations reaches the predefined value.

Darwinian particle swarm optimization (DPSO) simulates natural selection in a collection of many swarms [25]. Each swarm individually performs like an ordinary PSO, and all the swarms run simultaneously in case one gets trapped in a local optimum. The DPSO algorithm spawns a particle or extends a swarm's life when the swarm reaches a better optimum; otherwise it deletes a particle or reduces the swarm's life. DPSO has been proven to be superior to the original PSO in preventing premature convergence to local optima [25].
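
To make this selection mechanism concrete, a toy MATLAB sketch of the bookkeeping for one swarm is given below; the thresholds, population bounds, and field names are illustrative assumptions, not values from [25].

% Toy sketch of DPSO's natural-selection rules for a single swarm
% (minimization). An improvement spawns a particle; prolonged
% stagnation sheds one. All constants here are illustrative only.
swarm = struct('best', Inf, 'stagnation', 0, 'size', 10);
newBest = 0.5;                            % best fitness found this generation
if newBest < swarm.best                   % the swarm improved
    swarm.best = newBest;
    swarm.stagnation = 0;
    swarm.size = min(swarm.size + 1, 20); % spawn a particle / extend life
else
    swarm.stagnation = swarm.stagnation + 1;
    if swarm.stagnation >= 3              % repeated failures to improve
        swarm.size = max(swarm.size - 1, 3); % delete a particle / reduce life
    end
end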

Fractional-order particle swarm optimization (FOPSO) introduces fractional calculus to model the particles' trajectories, which demonstrates a potential for controlling the convergence of the algorithm [26]. The velocity function in (4) is rearranged with $\omega = 1$, namely

$$V_i(t+1) - V_i(t) = c_1 r_1 \left(P_i - X_i(t)\right) + c_2 r_2 \left(P_g - X_i(t)\right) \tag{6}$$

The left side of (6) can be seen as the discrete version of the derivative of velocity, $D^{\alpha}[v_{t+1}]$, with order $\alpha = 1$. The discrete-time implementation of the Grünwald–Letnikov derivative is introduced and expressed as

$$D^{\alpha}[v_t] = \frac{1}{T^{\alpha}} \sum_{k=0}^{r} \frac{(-1)^k \, \Gamma(\alpha+1) \, v(t-kT)}{\Gamma(k+1) \, \Gamma(\alpha-k+1)} \tag{7}$$

where $T$ is the sampling period and $r$ is the truncation order. Bringing (7) into (6) with $r = 4$ yields the following:

$$V_i(t+1) = \alpha V_i(t) + \frac{\alpha}{2} V_i(t-1) + \frac{\alpha(1-\alpha)}{6} V_i(t-2) + \frac{\alpha(1-\alpha)(2-\alpha)}{24} V_i(t-3) + c_1 r_1 \left(P_i - X_i(t)\right) + c_2 r_2 \left(P_g - X_i(t)\right) \tag{8}$$

Employing (8) to update each particle's velocity in DPSO generates a new algorithm named fractional-order Darwinian particle swarm optimization (FODPSO) [27, 28]. Different values of $\alpha$ control the convergence speed of the optimization process. The literature [27] illustrates that FODPSO outperforms FOPSO and DPSO in searching for the global optimum.
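
The four-term memory in (8) translates directly into code. The following minimal MATLAB sketch performs one velocity update; the function and variable names (fo_velocity, Vhist, and so on) are illustrative, not taken from the authors' implementation.

function Vnew = fo_velocity(Vhist, Xi, Pi, Pg, alpha, c1, c2)
% One fractional-order velocity update following (8). Vhist stores the
% last four velocities [V(t) V(t-1) V(t-2) V(t-3)] as columns; Xi is the
% current position; Pi and Pg are the personal and global best positions.
r1 = rand(size(Xi));                      % random factors in [0, 1]
r2 = rand(size(Xi));
Vnew = alpha * Vhist(:, 1) ...
     + (alpha / 2) * Vhist(:, 2) ...
     + (alpha * (1 - alpha) / 6) * Vhist(:, 3) ...
     + (alpha * (1 - alpha) * (2 - alpha) / 24) * Vhist(:, 4) ...
     + c1 * r1 .* (Pi - Xi) + c2 * r2 .* (Pg - Xi);
end

After each generation, the velocity history would be shifted by one column and the position advanced with the new velocity, as in equation (5).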

2.3. Procedure of ELM-FODPSO. Each feature is assigned a parameter $\theta$ within the interval $[-1, 1]$. The $i$th feature is selected when its corresponding $\theta_i$ is greater than 0; otherwise the feature is abandoned. Assuming the features lie in an $N$-dimensional space, $N$ variables are involved in the FODPSO optimization process. The procedure of ELM-FODPSO is depicted in Figure 2.
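
A minimal MATLAB sketch of this coefficient-to-subset mapping is shown below; the data matrix X is a toy stand-in for one of the datasets.

X = randn(100, 20);                 % toy data: 100 instances, 20 features
N = size(X, 2);                     % dimension of the FODPSO search space
theta = 2 * rand(1, N) - 1;         % one coefficient per feature in [-1, 1]
selected = find(theta > 0);         % the i-th feature is kept iff theta_i > 0
Xsub = X(:, selected);              % candidate feature subset passed to ELM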

3. Results and Discussions

3.1. Comparative Methods. Four methods, ELM-PSO [15], ELM-FS [29], SVM-FODPSO [10], and RReliefF [30], are used for comparison. All of the code used in this study is implemented in MATLAB 8.1.0 (The MathWorks, Natick, MA, USA) on a desktop computer with a Pentium eight-core CPU (4 GHz) and 32 GB of memory.


Table 1: Information about datasets and comparative methods. A1, A2, A3, A4, and A5 represent ELM-PSO, ELM-FS, SVM-FODPSO, RReliefF, and ELM-FODPSO, respectively.

Label   Dataset          Number of instances   Number of features   Comparative methods
D1      Poland           1370                  30                   A1, A2, A3, A4, A5
D2      Diabetes         442                   10                   A1, A2, A3, A4, A5
D3      Santa Fe Laser   10081                 12                   A1, A2, A3, A4, A5
D4      Anthrokids       1019                  53                   A1, A2, A3, A4, A5
D5      Housing          4177                  8                    A1, A3, A4, A5
D6      Abalone          506                   13                   A1, A3, A4, A5
D7      Cpusmall         8192                  12                   A1, A3, A4, A5

Figure 3: Convergence analysis of seven datasets (normalized fitness value of the best global particle versus generation, 0–200; one curve per dataset D1–D7).


3.2. Datasets and Parameter Settings. Seven public datasets for regression problems are adopted, including four mentioned in [29] and an additional three in [31], where ELM-FS is used as a comparative method. Information about the seven datasets and the methods involved in the comparisons is shown in Table 1. Only the datasets adopted in [29] can be tested by their feature selection paths; thus D5, D6, and D7 in Table 1 are tested by the four methods excluding ELM-FS.

Each dataset is split into a training set and a testing set. Unless otherwise specified, 70% of the instances are used for training and the rest for testing. During the training process, each particle carries a series of feature coefficients $\theta \in [-1, 1]$. The number of hidden layer neurons is set to 150 and the kernel type to sigmoid, and 10-fold cross-validation is performed to gain a relatively stable MSE.
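
As a rough illustration of how a particle's fitness could be computed (a simplified stand-in, not the authors' code), a basic sigmoid ELM with 150 hidden neurons and its MSE can be written in MATLAB as follows.

function mse = elm_mse(Xtr, ytr)
% Minimal sigmoid ELM with 150 hidden neurons returning the training MSE.
% Xtr is instances-by-features; ytr is a column vector of targets.
nHidden = 150;
d = size(Xtr, 2);
W = 2 * rand(d, nHidden) - 1;                    % random input weights
b = 2 * rand(1, nHidden) - 1;                    % random hidden biases
H = 1 ./ (1 + exp(-bsxfun(@plus, Xtr * W, b)));  % sigmoid hidden outputs
beta = pinv(H) * ytr;                            % analytic output weights
mse = mean((H * beta - ytr) .^ 2);
end

In the actual procedure, this evaluation would be wrapped in the 10-fold cross-validation described above and averaged to give the particle's fitness.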

For the FODPSO searching process, the parameters are set as follows: $\alpha$ is formulated by (9), where $M$ denotes the maximal number of iterations and equals 200. A larger $\alpha$ increases the convergence speed in the early stage of the iterations. The numbers of swarms and populations are set to 5 and 10, respectively, and $c_1$ and $c_2$ in (8) are both initialized to 2. We run FODPSO for 30 independent runs to gain relatively stable results. The parameters for ELM-PSO, ELM-FS, SVM-FODPSO, and RReliefF are set based on the former literature.

Figure 4: The evaluation results of Dataset 1 (mean square error versus number of features, 0–30; curves for ELM-PSO, ELM-FS, SVM-FODPSO, RReliefF, and ELM-FODPSO).


$$\alpha = 0.8 - 0.4 \times \frac{t}{M}, \qquad t = 0, 1, \ldots, M \tag{9}$$
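
In code, the schedule in (9) is a one-line anonymous function (a sketch with M = 200, as stated above):

M = 200;                            % maximal number of generations
alphaOf = @(t) 0.8 - 0.4 * t / M;   % fractional order decays from 0.8 to 0.4
% e.g., alphaOf(0) returns 0.8 and alphaOf(M) returns 0.4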

The convergence rate is analyzed to ensure that the algorithm converges within 200 generations. The median of the fitness evolution of the best global particle is taken for the convergence analysis depicted in Figure 3. To observe the convergence of the seven datasets in one figure more clearly, the normalized fitness value is adopted in Figure 3, calculated as follows:

$$f_{\mathrm{Normalized}} = \frac{\mathrm{MSE}_{\mathrm{selected\ features}}}{\mathrm{MSE}_{\mathrm{all\ features}}} \tag{10}$$

3.3. Comparative Experiments. On the testing set, the MSE acquired by ELM is utilized to evaluate the performances of the four methods. For all the methods, the minimal MSE is recorded if more than one feature subset exists at the same feature scale. The MSEs of D1–D7 are depicted in Figures 4–10, respectively. The x-axis represents the increasing number of selected features, while the y-axis represents the minimum MSE value calculated with the features selected by the different methods at each scale. Feature selection aims at selecting smaller feature subsets that obtain similar or lower MSE; thus, in Figures 4–10, the closer a curve gets to the lower left corner of the coordinate system, the better the method performs.
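
The curves in Figures 4–10 can be derived from a search log by taking, at each subset size, the minimum MSE observed. A hedged MATLAB sketch with a toy log follows; the struct fields nFeatures and mse are illustrative assumptions, not the authors' data structures.

% Toy search log: each entry records a subset size and the MSE it achieved.
runs = struct('nFeatures', {3, 3, 5, 5, 8}, 'mse', {0.12, 0.10, 0.09, 0.08, 0.07});
sizes = unique([runs.nFeatures]);
curve = arrayfun(@(k) min([runs([runs.nFeatures] == k).mse]), sizes);
plot(sizes, curve, '-o');
xlabel('number of features'); ylabel('mean square error');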


Table 2: Running time of SVM and ELM on the seven datasets.

Running time (s)   D1      D2      D3      D4      D5      D6      D7
SVM                0.021   0.002   0.612   0.016   0.093   0.045   0.245
ELM                0.018   0.009   0.056   0.013   0.027   0.010   0.051

Table 3: Minimum MSE values and the corresponding number of selected features (MSE | number of features). The last column gives the MSE obtained with all features.

Dataset   ELM-PSO     ELM-FS      SVM-FODPSO   RReliefF    ELM-FODPSO   All features
D1        0.0983|8    0.0806|27   0.0804|14    0.0804|26   0.0791|11    0.0820|30
D2        0.2844|9    0.2003|1    0.2919|9     0.2003|1    0.1982|1     0.3172|10
D3        0.0099|5    0.0160|11   0.0106|7     0.0108|6    0.0098|5     0.0171|12
D4        0.0157|8    0.0157|9    0.0253|20    0.0238|18   0.0156|7     0.0437|53
D5        0.0838|8    —           0.0853|7     0.0838|8    0.0841|6     0.0838|8
D6        0.0827|10   —           0.0981|7     0.1292|1    0.0819|9     0.1502|13
D7        0.0339|9    —           0.0343|6     0.0355|12   0.0336|8     0.0355|12

Figure 5: The evaluation results of Dataset 2 (mean square error versus number of features, 1–10; curves for ELM-PSO, ELM-FS, SVM-FODPSO, RReliefF, and ELM-FODPSO).


ELM-FODPSO and SVM-FODPSO adopt the same optimization algorithm yet employ ELM and SVM as the learning algorithm, respectively. For each dataset, the training time of ELM and SVM is obtained by randomly running each of them 30 times within the two methods; the averaged training times of ELM and SVM on the seven datasets are recorded in Table 2. It is observed that ELM acquires a faster training speed on six of the seven datasets. Compared with SVM, the single hidden layer and the analytical solution make ELM more efficient. The faster speed of ELM highlights its use in feature selection, given the many iterative evaluations involved in FODPSO.
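
Such timings could be gathered with a simple tic/toc harness; the sketch below reuses the illustrative elm_mse from Section 3.2 together with toy data.

X = randn(500, 10); y = randn(500, 1);  % toy stand-in for one dataset
t = zeros(30, 1);
for k = 1:30
    tic;
    elm_mse(X, y);                      % one ELM training run
    t(k) = toc;
end
avgTime = mean(t);                      % averaged training time, as in Table 2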

Figure 6: The evaluation results of Dataset 3 (mean square error versus number of features, 0–12; curves for ELM-PSO, ELM-FS, SVM-FODPSO, RReliefF, and ELM-FODPSO).


ELM-FODPSO, ELM-PSO, and ELM-FS adopt the same learning algorithm yet employ FODPSO, PSO, and gradient descent search as the optimization algorithm, respectively. For D1, D2, and D3, ELM-FODPSO and ELM-PSO perform better than ELM-FS; the former two acquire lower MSE than ELM-FS at similar feature scales. For D4, the three methods achieve comparable performance.

Table 3 shows the minimum MSE values acquired by the five methods and the corresponding numbers of selected features, separated by a vertical bar. The last column presents the MSE calculated with all features together with the total number of features. The lowest MSE values on each dataset are labeled in bold in the original table. Among all datasets, ELM-FODPSO obtains six of the lowest MSE values, ELM-PSO obtains two, and RReliefF obtains one. For D3, ELM-FODPSO and ELM-PSO get comparable MSE values with the same feature subset; therefore, 0.0099 and 0.0098 are both labeled as lowest MSE values. For D5, ELM-PSO and RReliefF get the lowest MSE, 0.0838, using all 8 features, while ELM-FODPSO gets a similar MSE, 0.0841, with only 6 features.


Figure 7: The evaluation results of Dataset 4 (mean square error versus number of features, 0–60; curves for ELM-PSO, ELM-FS, SVM-FODPSO, RReliefF, and ELM-FODPSO).

Figure 8: The evaluation results of Dataset 5 (mean square error versus number of features, 1–8; curves for ELM-PSO, SVM-FODPSO, RReliefF, and ELM-FODPSO).


Figure 9: The evaluation results of Dataset 6 (mean square error versus number of features, 0–14; curves for ELM-PSO, SVM-FODPSO, RReliefF, and ELM-FODPSO).

Figure 10: The evaluation results of Dataset 7 (mean square error versus number of features, 0–12; curves for ELM-PSO, SVM-FODPSO, RReliefF, and ELM-FODPSO).

4. Conclusions

Feature selection techniques have been widely studied and are commonly used in machine learning. The proposed method contains two steps: constructing a fitness function by ELM and seeking the optimal solutions of the fitness function by FODPSO. ELM is a simple yet effective single hidden layer neural network that is suitable for feature selection due to its gratifying computational efficiency. FODPSO is an intelligent optimization algorithm with good global search ability.

The proposed method is evaluated on seven regression datasets, and it achieves better performance than the other comparative methods on six of them. We may concentrate on exploring ELM-FODPSO in various regression and classification applications in the future.



Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (no. 2016YFC1306600).

References

[1] T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[2] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.

[3] I. Iguyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

[4] A. Jovic, K. Brkic, and N. Bogunovic, "A review of feature selection methods with applications," in Proceedings of the 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2015), pp. 1200–1205, Croatia, May 2015.

[5] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimization for feature selection in classification: a multi-objective approach," IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656–1671, 2013.

[6] I. A. Gheyas and L. S. Smith, "Feature subset selection in large dimensionality domains," Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010.

[7] X.-W. Chen, X. Zeng, and D. van Alphen, "Multi-class feature selection for texture classification," Pattern Recognition Letters, vol. 27, no. 14, pp. 1685–1691, 2006.

[8] S.-W. Lin, K.-C. Ying, S.-C. Chen, and Z.-J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," Expert Systems with Applications, vol. 35, no. 4, pp. 1817–1824, 2008.

[9] P. Ghamisi and J. A. Benediktsson, "Feature selection based on hybridization of genetic algorithm and particle swarm optimization," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 2, pp. 309–313, 2015.

[10] P. Ghamisi, M. S. Couceiro, and J. A. Benediktsson, "A novel feature selection approach based on FODPSO and SVM," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2935–2947, 2015.

[11] Q. Li, H. Chen, H. Huang et al., "An enhanced grey wolf optimization based feature selection wrapped kernel extreme learning machine for medical diagnosis," Computational and Mathematical Methods in Medicine, vol. 2017, Article ID 9512741, 15 pages, 2017.

[12] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks and Learning Systems, vol. 9, no. 1, pp. 224–229, 1998.

[13] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[14] G.-B. Huang, "What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle," Cognitive Computation, vol. 7, no. 3, pp. 263–278, 2015.

[15] S. Saraswathi, S. Sundaram, N. Sundararajan, M. Zimmermann, and M. Nilsen-Hamilton, "ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented," IEEE Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 452–463, 2011.

[16] D. Chyzhyk, A. Savio, and M. Graña, "Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI," Neurocomputing, vol. 128, pp. 73–80, 2014.

[17] R. Ahila, V. Sadasivam, and K. Manimala, "An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances," Applied Soft Computing, vol. 32, pp. 23–37, 2015.

[18] Y. H. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC '98), pp. 69–73, Anchorage, Alaska, USA, May 1998.

[19] S. Kiranyaz, T. Ince, and M. Gabbouj, "Multi-dimensional particle swarm optimization," in Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition, vol. 15 of Adaptation, Learning, and Optimization, pp. 83–99, Springer, Berlin, Heidelberg, 2014.

[20] L. Shang, Z. Zhou, and X. Liu, "Particle swarm optimization-based feature selection in sentiment classification," Soft Computing, vol. 20, no. 10, pp. 3821–3834, 2016.

[21] H. B. Nguyen, B. Xue, I. Liu, P. Andreae, and M. Zhang, "New mechanism for archive maintenance in PSO-based multi-objective feature selection," Soft Computing, vol. 20, no. 10, pp. 3927–3946, 2016.

[22] C. A. Coello Coello and M. S. Lechuga, "MOPSO: a proposal for multiple objective particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation (CEC '02), pp. 1051–1056, May 2002.

[23] J. J. Durillo, J. García-Nieto, A. J. Nebro, C. A. Coello, F. Luna, and E. Alba, "Multi-objective particle swarm optimizers: an experimental comparison," in Evolutionary Multi-Criterion Optimization, vol. 5467 of Lecture Notes in Computer Science, pp. 495–509, Springer, Berlin, Germany, 2009.

[24] K. Mistry, L. Zhang, S. C. Neoh, C. P. Lim, and B. Fielding, "A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition," IEEE Transactions on Cybernetics, vol. 47, no. 6, pp. 1496–1509, 2017.

[25] J. Tillett, R. Rao, and F. Sahin, "Cluster-head identification in ad hoc sensor networks using particle swarm optimization," in Proceedings of the IEEE International Conference on Personal Wireless Communications (ICPWC 2002), pp. 201–205, New Delhi, India.

[26] E. J. S. Pires, J. A. T. Machado, P. B. de Moura Oliveira, J. B. Cunha, and L. Mendes, "Particle swarm optimization with fractional-order velocity," Nonlinear Dynamics, vol. 61, no. 1-2, pp. 295–301, 2010.

[27] M. S. Couceiro, R. P. Rocha, N. M. F. Ferreira, and J. A. T. Machado, "Introducing the fractional-order Darwinian PSO," Signal, Image and Video Processing, vol. 6, no. 3, pp. 343–350, 2012.

[28] M. S. Couceiro, F. M. L. Martins, R. P. Rocha, and N. M. F. Ferreira, "Mechanism and convergence analysis of a multi-robot swarm approach based on natural selection," Journal of Intelligent & Robotic Systems, vol. 76, no. 2, pp. 353–381, 2014.

[29] F. Benoît, M. van Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse, "Feature selection for nonlinear models with extreme learning machines," Neurocomputing, vol. 102, pp. 111–124, 2013.

[30] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of ReliefF and RReliefF," Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.

[31] L. Bravi, V. Piccialli, and M. Sciandrone, "An optimization-based method for feature ranking in nonlinear regression problems," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 1005–1010, 2016.

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 4: A Novel Feature Selection Method Based on Extreme Learning ...downloads.hindawi.com/journals/cin/2018/5078268.pdf · A Novel Feature Selection Method Based on Extreme Learning Machine

4 Computational Intelligence and Neuroscience

Table 1 Information about datasets and comparative methods A1 A2 A3 A4 and A5 represent ELM PSO ELM FS SVM FODPSORReliefF and ELM FODPSO respectively

Label Dataset Number of instances Number of features Comparative methodsD1 Poland 1370 30 A1 A2 A3 A4 A5D2 Diabetes 442 10 A1 A2 A3 A4 A5D3 Santa Fe Laser 10081 12 A1 A2 A3 A4 A5D4 Anthrokids 1019 53 A1 A2 A3 A4 A5D5 Housing 4177 8 A1 A3 A4 A5D6 Abalone 506 13 A1 A3 A4 A5D7 Cpusmall 8192 12 A1 A3 A4 A5

0 50 100 150 200

04

05

06

07

08

09

1

11

12

13

D1D2D3

D4D5D6

D7

Figure 3 Convergence analysis of seven datasets

MA USA) on a desktop computer with a Pentium eight-coreCPU (4GHz) and 32GB memory

32 Datasets and Parameter Settings Seven public datasetsfor regression problems are adopted including four men-tioned in [29] and additional three in [31] where ELM FSis used as a comparative method Information about sevendatasets and themethods involved in comparisons are shownin Table 1 Only the datasets adopted in [29] can be tested bytheir feature selection paths thus D5 D6 and D7 in Table 1are tested by four methods except ELM FS

Each dataset is split into training set and testing set70 of the total instances are used as training sets if notparticularly specified and the rest are testing sets Duringthe training process each particle has a series of featurecoefficients 120579 isin [minus1 1] Hidden layer neurons number is setas 150 and kernel type as sigmoid 10-fold cross-validation isperformed to gain relatively stable MSE

For FODPSO searching process parameters are setas follows 120572 is formulated by (9) where 119872 denotes the

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

5 10 15 20 25 300number of features

Figure 4 The evaluation results of Dataset 1

maximal iterations and 119872 equals 200 Larger 120572 increases theconvergence speed in the early stage of iterations Numbersof swarms and populations are set to 5 and 10 respectively1198881 1198882 in (8) are both initialized by 2 We run FODPSOfor 30 independent times to gain relatively stable resultsParameters for ELM PSO ELM FS SVM FODPSO andRReliefF are set based on former literatures

120572 = 08 minus 04 times 119905119872 119905 = 0 1 119872 (9)

Convergence rate is analyzed to ensure the algorithmcon-vergence within 200 generations The median of the fitnessevolution of the best global particle is taken for convergenceanalysis depicted in Figure 3 To observe convergence ofseven datasets in one figure more clearly the normalizedfitness value is adopted in Figure 3 calculated as follows

119891Normolized = MSEselected feature119904MSEall features

(10)

33 Comparative Experiments In the testing set MSEacquired by ELM is utilized to evaluate performances of

Computational Intelligence and Neuroscience 5

Table 2 Running time of SVM and ELM on seven datasets

Running time (s) D1 D2 D3 D4 D5 D6 D7SVM 0021 0002 0612 0016 0093 0045 0245ELM 0018 0009 0056 0013 0027 0010 0051

Table 3 MinimumMSE values and the corresponding number of selected features

Dataset MethodELM PSO ELM FS SVM FODPSO RReliefF ELM FODPSO all features

MSE N featureD1 00983|8 00806|27 00804|14 00804|26 00791|11 00820|30D2 02844|9 02003|1 02919|9 02003|1 01982|1 03172|10D3 00099|5 00160|11 00106|7 00108|6 00098|5 00171|12D4 00157|8 00157|9 00253|20 00238|18 00156|7 00437|53D5 00838|8 mdash 00853|7 00838|8 00841|6 00838|8D6 00827|10 mdash 00981|7 01292|1 00819|9 01502|13D7 00339|9 mdash 00343|6 00355|12 00336|8 00355|12

0

1

2

3

4

5

6

7

mea

n sq

uare

erro

r

2 3 4 5 6 7 8 9 101number of features

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

Figure 5 The evaluation results of Dataset 2

four methods For all the methods the minimal MSE isrecorded if more than one feature subset exists in the samefeature scale MSEs of D1ndashD7 are depicted in Figures 4ndash10respectively The 119909-axis represents increasing number ofselected features while the 119910-axis represents the minimumMSE value calculated with features selected by differentmethods at each scale Feature selection aims at selectingsmaller feature subsets to obtain similar or lower MSEThusin Figures 4ndash10 the closer one curve gets to the left corner ofcoordinate the better one method performs

ELM FODPSO and SVM FODPSO adopt the same opti-mization algorithm yet employ ELM and SVM as learningalgorithm respectively For each dataset training time ofELM and SVM is obtained by randomly running them 30times in two methods the averaged training time of ELM

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

002

004

006

008

01

012

014

016

018m

ean

squa

re er

ror

Figure 6 The evaluation results of Dataset 3

and SVM in seven datasets is recorded in Table 2 It isobserved that ELM acquires faster training speed in six ofseven datasets Compared with SVM single hidden layer andanalytical approach make ELM more efficient Faster speedof ELM highlights its use in feature selection due to manyiterative actions involved in FODPSO

ELM FODPSO ELM PSO and ELM FS adopt the samelearning algorithm yet employ FODPSO PSO and GradientDescent Search as optimization algorithms respectively ForD1 D2 and D3 ELM FODPSO and ELM PSO performbetter than ELM FS the former two acquire lower MSE thanELM FS under similar feature scales For D4 three methodsget comparable performance

Table 3 shows the minimum MSE values acquired byfive methods and the corresponding numbers of selected

6 Computational Intelligence and Neuroscience

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

10 20 30 40 50 600number of features

Figure 7 The evaluation results of Dataset 4

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

mea

n sq

uare

erro

r

2 3 4 5 6 7 81number of features

Figure 8 The evaluation results of Dataset 5

features separated by a vertical bar The last column repre-sents the MSE values calculated by all features and the totalnumber of features The lowest MSE values on each datasetare labeled as bold Among all datasets ELM FODPSOobtains six lowest MSE values ELM PSO obtains twoand RReliefF obtains one For D3 ELM FODPSO andELM PSO get comparable MSE values by the same fea-ture subset therefore 00099 and 00098 are both labeledas lowest MSE values For D5 ELM PSO and RReliefFget the lowest MSE 00838 using all the 8 features andELM FODPSO gets a similar MSE 00841 with only 6 fea-tures

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

45

mea

n sq

uare

erro

r

2 4 6 8 10 12 140number of features

Figure 9 The evaluation results of Dataset 6

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

05

1

15

2

25

3

35

mea

n sq

uare

erro

r

Figure 10 The evaluation results of Dataset 7

4 Conclusions

Feature selection techniques have been widely studied andcommonly used in machine learning The proposed methodcontains two steps constructing fitness functions by ELMand seeking the optimal solutions of fitness functions byFODPSO ELM is a simple yet effective single hidden layerneural network which is suitable for feature selection dueto its gratifying computational efficiency FODPSO is anintelligent optimization algorithm which owns good globalsearch ability

The proposed method is evaluated on seven regressiondatasets and it achieves better performance than othercomparativemethods on six datasetsWemay concentrate on

Computational Intelligence and Neuroscience 7

exploring ELM FODPSO in various situations of regressionand classification applications in the future

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by National Key Research andDevelopment Program of China (no 2016YFC1306600)

References

[1] T Lindeberg ldquoFeature detectionwith automatic scale selectionrdquoInternational Journal of Computer Vision vol 30 no 2 pp 79ndash116 1998

[2] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 1ndash4 pp 131ndash156 1997

[3] I Iguyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[4] A Jovic K Brkic and N Bogunovic ldquoA review of featureselection methods with applicationsrdquo in Proceedings of the 38thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics MIPRO 2015 pp1200ndash1205 Croatia May 2015

[5] B Xue M Zhang and W N Browne ldquoParticle swarm opti-mization for feature selection in classification a multi-objectiveapproachrdquo IEEE Transactions on Cybernetics vol 43 no 6 pp1656ndash1671 2013

[6] I A Gheyas and L S Smith ldquoFeature subset selection in largedimensionality domainsrdquo Pattern Recognition vol 43 no 1 pp5ndash13 2010

[7] X-W Chen X Zeng and D van Alphen ldquoMulti-class featureselection for texture classificationrdquo Pattern Recognition Lettersvol 27 no 14 pp 1685ndash1691 2006

[8] S-W Lin K-C Ying S-C Chen and Z-J Lee ldquoParticle swarmoptimization for parameter determination and feature selectionof support vector machinesrdquo Expert Systems with Applicationsvol 35 no 4 pp 1817ndash1824 2008

[9] P Ghamisi and J A Benediktsson ldquoFeature selection basedon hybridization of genetic algorithm and particle swarmoptimizationrdquo IEEE Geoscience and Remote Sensing Letters vol12 no 2 pp 309ndash313 2015

[10] P Ghamisi M S Couceiro and J A Benediktsson ldquoA novelfeature selection approach based on FODPSO and SVMrdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 5pp 2935ndash2947 2015

[11] Q Li H Chen H Huang et al ldquoAn enhanced grey wolfoptimization based feature selection wrapped kernel extremelearning machine for medical diagnosisrdquo Computational andMathematicalMethods inMedicine vol 2017 Article ID 951274115 pages 2017

[12] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[13] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[14] G-B Huang ldquoWhat are extreme learning machines Fillingthe gap between Frank Rosenblattrsquos dream and John VonNeumannrsquos puzzlerdquo Cognitive Computation vol 7 no 3 pp263ndash278 2015

[15] S Saraswathi S Sundaram N Sundararajan M Zimmer-mann and M Nilsen-Hamilton ldquoICGA-PSO-ELM approachfor accuratemulticlass cancer classification resulting in reducedgene sets in which genes encoding secreted proteins are highlyrepresentedrdquo IEEE Transactions on Computational Biology andBioinformatics vol 8 no 2 pp 452ndash463 2011

[16] D Chyzhyk A Savio and M Grana ldquoEvolutionary ELMwrapper feature selection for Alzheimerrsquos disease CAD onanatomical brain MRIrdquo Neurocomputing vol 128 pp 73ndash802014

[17] R Ahila V Sadasivam and K Manimala ldquoAn integrated PSOfor parameter determination and feature selection of ELM andits application in classification of power system disturbancesrdquoApplied Soft Computing vol 32 pp 23ndash37 2015

[18] Y H Shi and R C Eberhart ldquoA modified particle swarmoptimizerrdquo in Proceedings of the IEEE International Conferenceon EvolutionaryComputation (ICEC rsquo98) pp 69ndash73 AnchorageAlaska USA May 1998

[19] S Kiranyaz T Ince and M Gabbouj ldquoMulti-dimensional Par-ticle SwarmOptimizationrdquo inMultidimensional Particle SwarmOptimization for Machine Learning and Pattern Recognitionvol 15 of Adaptation Learning and Optimization pp 83ndash99Springer Berlin Heidelberg Berlin Heidelberg 2014

[20] L Shang Z Zhou and X Liu ldquoParticle swarm optimization-based feature selection in sentiment classificationrdquo Soft Com-puting vol 20 no 10 pp 3821ndash3834 2016

[21] H B Nguyen B Xue I Liu P Andreae and M ZhangldquoNewmechanism for archivemaintenance in PSO-basedmulti-objective feature selectionrdquo Soft Computing vol 20 no 10 pp3927ndash3946 2016

[22] C A Coello Coello andM S Lechuga ldquoMOPSO a proposal formultiple objective particle swarm optimizationrdquo in Proceedingsof the Congress on Evolutionary Computation (CEC rsquo02) pp1051ndash1056 May 2002

[23] J J Durillo J Garcıa-Nieto A J Nebro C A Coello F Lunaand E Alba ldquoMulti-objective particle swarm optimizers anexperimental comparisonrdquo in Evolutionary Multi-CriterionOptimization vol 5467 of Lecture Notes in Computer Sciencepp 495ndash509 Springer Berlin Germany 2009

[24] K Mistry L Zhang S C Neoh C P Lim and B FieldingldquoA Micro-GA Embedded PSO Feature Selection Approach toIntelligent Facial Emotion Recognitionrdquo IEEE Transactions onCybernetics vol 47 no 6 pp 1496ndash1509 2017

[25] J Tillett R Rao and F Sahin ldquoCluster-head identification inad hoc sensor networks using particle swarm optimizationrdquo inProceedings of the ICPWC 2002 - IEEE International Conferenceon Personal Wireless Communications pp 201ndash205 New DelhiIndia

[26] E J S Pires J A T MacHado P B de Moura Oliveira JB Cunha and L Mendes ldquoParticle swarm optimization withfractional-order velocityrdquo Nonlinear Dynamics vol 61 no 1-2pp 295ndash301 2010

[27] M S Couceiro R P Rocha N M F Ferreira and J A TMachado ldquoIntroducing the fractional-order Darwinian PSOrdquo

8 Computational Intelligence and Neuroscience

Signal Image and Video Processing vol 6 no 3 pp 343ndash3502012

[28] M S Couceiro F M L Martins R P Rocha and N M FFerreira ldquoMechanism and Convergence Analysis of a Multi-Robot Swarm Approach Based on Natural Selectionrdquo Journal ofIntelligent amp Robotic Systems vol 76 no 2 pp 353ndash381 2014

[29] F Benoıt M van Heeswijk Y Miche M Verleysen and ALendasse ldquoFeature selection for nonlinear models with extremelearning machinesrdquo Neurocomputing vol 102 pp 111ndash124 2013

[30] M Robnik-Sikonja and I Kononenko ldquoTheoretical and empir-ical analysis of ReliefF and RReliefFrdquoMachine Learning vol 53no 1-2 pp 23ndash69 2003

[31] L Bravi V Piccialli and M Sciandrone ldquoAn optimization-based method for feature ranking in nonlinear regressionproblemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 28 no 4 pp 1005ndash1010 2016

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 5: A Novel Feature Selection Method Based on Extreme Learning ...downloads.hindawi.com/journals/cin/2018/5078268.pdf · A Novel Feature Selection Method Based on Extreme Learning Machine

Computational Intelligence and Neuroscience 5

Table 2 Running time of SVM and ELM on seven datasets

Running time (s) D1 D2 D3 D4 D5 D6 D7SVM 0021 0002 0612 0016 0093 0045 0245ELM 0018 0009 0056 0013 0027 0010 0051

Table 3 MinimumMSE values and the corresponding number of selected features

Dataset MethodELM PSO ELM FS SVM FODPSO RReliefF ELM FODPSO all features

MSE N featureD1 00983|8 00806|27 00804|14 00804|26 00791|11 00820|30D2 02844|9 02003|1 02919|9 02003|1 01982|1 03172|10D3 00099|5 00160|11 00106|7 00108|6 00098|5 00171|12D4 00157|8 00157|9 00253|20 00238|18 00156|7 00437|53D5 00838|8 mdash 00853|7 00838|8 00841|6 00838|8D6 00827|10 mdash 00981|7 01292|1 00819|9 01502|13D7 00339|9 mdash 00343|6 00355|12 00336|8 00355|12

0

1

2

3

4

5

6

7

mea

n sq

uare

erro

r

2 3 4 5 6 7 8 9 101number of features

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

Figure 5 The evaluation results of Dataset 2

four methods For all the methods the minimal MSE isrecorded if more than one feature subset exists in the samefeature scale MSEs of D1ndashD7 are depicted in Figures 4ndash10respectively The 119909-axis represents increasing number ofselected features while the 119910-axis represents the minimumMSE value calculated with features selected by differentmethods at each scale Feature selection aims at selectingsmaller feature subsets to obtain similar or lower MSEThusin Figures 4ndash10 the closer one curve gets to the left corner ofcoordinate the better one method performs

ELM FODPSO and SVM FODPSO adopt the same opti-mization algorithm yet employ ELM and SVM as learningalgorithm respectively For each dataset training time ofELM and SVM is obtained by randomly running them 30times in two methods the averaged training time of ELM

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

002

004

006

008

01

012

014

016

018m

ean

squa

re er

ror

Figure 6 The evaluation results of Dataset 3

and SVM in seven datasets is recorded in Table 2 It isobserved that ELM acquires faster training speed in six ofseven datasets Compared with SVM single hidden layer andanalytical approach make ELM more efficient Faster speedof ELM highlights its use in feature selection due to manyiterative actions involved in FODPSO

ELM FODPSO ELM PSO and ELM FS adopt the samelearning algorithm yet employ FODPSO PSO and GradientDescent Search as optimization algorithms respectively ForD1 D2 and D3 ELM FODPSO and ELM PSO performbetter than ELM FS the former two acquire lower MSE thanELM FS under similar feature scales For D4 three methodsget comparable performance

Table 3 shows the minimum MSE values acquired byfive methods and the corresponding numbers of selected

6 Computational Intelligence and Neuroscience

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

10 20 30 40 50 600number of features

Figure 7 The evaluation results of Dataset 4

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

mea

n sq

uare

erro

r

2 3 4 5 6 7 81number of features

Figure 8 The evaluation results of Dataset 5

features separated by a vertical bar The last column repre-sents the MSE values calculated by all features and the totalnumber of features The lowest MSE values on each datasetare labeled as bold Among all datasets ELM FODPSOobtains six lowest MSE values ELM PSO obtains twoand RReliefF obtains one For D3 ELM FODPSO andELM PSO get comparable MSE values by the same fea-ture subset therefore 00099 and 00098 are both labeledas lowest MSE values For D5 ELM PSO and RReliefFget the lowest MSE 00838 using all the 8 features andELM FODPSO gets a similar MSE 00841 with only 6 fea-tures

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

45

mea

n sq

uare

erro

r

2 4 6 8 10 12 140number of features

Figure 9 The evaluation results of Dataset 6

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

05

1

15

2

25

3

35

mea

n sq

uare

erro

r

Figure 10 The evaluation results of Dataset 7

4 Conclusions

Feature selection techniques have been widely studied andcommonly used in machine learning The proposed methodcontains two steps constructing fitness functions by ELMand seeking the optimal solutions of fitness functions byFODPSO ELM is a simple yet effective single hidden layerneural network which is suitable for feature selection dueto its gratifying computational efficiency FODPSO is anintelligent optimization algorithm which owns good globalsearch ability

The proposed method is evaluated on seven regressiondatasets and it achieves better performance than othercomparativemethods on six datasetsWemay concentrate on

Computational Intelligence and Neuroscience 7

exploring ELM FODPSO in various situations of regressionand classification applications in the future

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by National Key Research andDevelopment Program of China (no 2016YFC1306600)

References

[1] T Lindeberg ldquoFeature detectionwith automatic scale selectionrdquoInternational Journal of Computer Vision vol 30 no 2 pp 79ndash116 1998

[2] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 1ndash4 pp 131ndash156 1997

[3] I Iguyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[4] A Jovic K Brkic and N Bogunovic ldquoA review of featureselection methods with applicationsrdquo in Proceedings of the 38thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics MIPRO 2015 pp1200ndash1205 Croatia May 2015

[5] B Xue M Zhang and W N Browne ldquoParticle swarm opti-mization for feature selection in classification a multi-objectiveapproachrdquo IEEE Transactions on Cybernetics vol 43 no 6 pp1656ndash1671 2013

[6] I A Gheyas and L S Smith ldquoFeature subset selection in largedimensionality domainsrdquo Pattern Recognition vol 43 no 1 pp5ndash13 2010

[7] X-W Chen X Zeng and D van Alphen ldquoMulti-class featureselection for texture classificationrdquo Pattern Recognition Lettersvol 27 no 14 pp 1685ndash1691 2006

[8] S-W Lin K-C Ying S-C Chen and Z-J Lee ldquoParticle swarmoptimization for parameter determination and feature selectionof support vector machinesrdquo Expert Systems with Applicationsvol 35 no 4 pp 1817ndash1824 2008

[9] P Ghamisi and J A Benediktsson ldquoFeature selection basedon hybridization of genetic algorithm and particle swarmoptimizationrdquo IEEE Geoscience and Remote Sensing Letters vol12 no 2 pp 309ndash313 2015

[10] P Ghamisi M S Couceiro and J A Benediktsson ldquoA novelfeature selection approach based on FODPSO and SVMrdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 5pp 2935ndash2947 2015

[11] Q Li H Chen H Huang et al ldquoAn enhanced grey wolfoptimization based feature selection wrapped kernel extremelearning machine for medical diagnosisrdquo Computational andMathematicalMethods inMedicine vol 2017 Article ID 951274115 pages 2017

[12] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998

[13] G B Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquoNeurocomputing vol 70 no1ndash3 pp 489ndash501 2006

[14] G-B Huang ldquoWhat are extreme learning machines Fillingthe gap between Frank Rosenblattrsquos dream and John VonNeumannrsquos puzzlerdquo Cognitive Computation vol 7 no 3 pp263ndash278 2015

[15] S Saraswathi S Sundaram N Sundararajan M Zimmer-mann and M Nilsen-Hamilton ldquoICGA-PSO-ELM approachfor accuratemulticlass cancer classification resulting in reducedgene sets in which genes encoding secreted proteins are highlyrepresentedrdquo IEEE Transactions on Computational Biology andBioinformatics vol 8 no 2 pp 452ndash463 2011

[16] D Chyzhyk A Savio and M Grana ldquoEvolutionary ELMwrapper feature selection for Alzheimerrsquos disease CAD onanatomical brain MRIrdquo Neurocomputing vol 128 pp 73ndash802014

[17] R Ahila V Sadasivam and K Manimala ldquoAn integrated PSOfor parameter determination and feature selection of ELM andits application in classification of power system disturbancesrdquoApplied Soft Computing vol 32 pp 23ndash37 2015

[18] Y H Shi and R C Eberhart ldquoA modified particle swarmoptimizerrdquo in Proceedings of the IEEE International Conferenceon EvolutionaryComputation (ICEC rsquo98) pp 69ndash73 AnchorageAlaska USA May 1998

[19] S Kiranyaz T Ince and M Gabbouj ldquoMulti-dimensional Par-ticle SwarmOptimizationrdquo inMultidimensional Particle SwarmOptimization for Machine Learning and Pattern Recognitionvol 15 of Adaptation Learning and Optimization pp 83ndash99Springer Berlin Heidelberg Berlin Heidelberg 2014

[20] L Shang Z Zhou and X Liu ldquoParticle swarm optimization-based feature selection in sentiment classificationrdquo Soft Com-puting vol 20 no 10 pp 3821ndash3834 2016

[21] H B Nguyen B Xue I Liu P Andreae and M ZhangldquoNewmechanism for archivemaintenance in PSO-basedmulti-objective feature selectionrdquo Soft Computing vol 20 no 10 pp3927ndash3946 2016

[22] C A Coello Coello andM S Lechuga ldquoMOPSO a proposal formultiple objective particle swarm optimizationrdquo in Proceedingsof the Congress on Evolutionary Computation (CEC rsquo02) pp1051ndash1056 May 2002

[23] J J Durillo J Garcıa-Nieto A J Nebro C A Coello F Lunaand E Alba ldquoMulti-objective particle swarm optimizers anexperimental comparisonrdquo in Evolutionary Multi-CriterionOptimization vol 5467 of Lecture Notes in Computer Sciencepp 495ndash509 Springer Berlin Germany 2009

[24] K Mistry L Zhang S C Neoh C P Lim and B FieldingldquoA Micro-GA Embedded PSO Feature Selection Approach toIntelligent Facial Emotion Recognitionrdquo IEEE Transactions onCybernetics vol 47 no 6 pp 1496ndash1509 2017

[25] J Tillett R Rao and F Sahin ldquoCluster-head identification inad hoc sensor networks using particle swarm optimizationrdquo inProceedings of the ICPWC 2002 - IEEE International Conferenceon Personal Wireless Communications pp 201ndash205 New DelhiIndia

[26] E J S Pires J A T MacHado P B de Moura Oliveira JB Cunha and L Mendes ldquoParticle swarm optimization withfractional-order velocityrdquo Nonlinear Dynamics vol 61 no 1-2pp 295ndash301 2010

[27] M S Couceiro R P Rocha N M F Ferreira and J A TMachado ldquoIntroducing the fractional-order Darwinian PSOrdquo

8 Computational Intelligence and Neuroscience

Signal Image and Video Processing vol 6 no 3 pp 343ndash3502012

[28] M S Couceiro F M L Martins R P Rocha and N M FFerreira ldquoMechanism and Convergence Analysis of a Multi-Robot Swarm Approach Based on Natural Selectionrdquo Journal ofIntelligent amp Robotic Systems vol 76 no 2 pp 353ndash381 2014

[29] F Benoıt M van Heeswijk Y Miche M Verleysen and ALendasse ldquoFeature selection for nonlinear models with extremelearning machinesrdquo Neurocomputing vol 102 pp 111ndash124 2013

[30] M Robnik-Sikonja and I Kononenko ldquoTheoretical and empir-ical analysis of ReliefF and RReliefFrdquoMachine Learning vol 53no 1-2 pp 23ndash69 2003

[31] L Bravi V Piccialli and M Sciandrone ldquoAn optimization-based method for feature ranking in nonlinear regressionproblemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 28 no 4 pp 1005ndash1010 2016

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 6: A Novel Feature Selection Method Based on Extreme Learning ...downloads.hindawi.com/journals/cin/2018/5078268.pdf · A Novel Feature Selection Method Based on Extreme Learning Machine

6 Computational Intelligence and Neuroscience

ELM-PSOELM-FSSVM-FODPSO

rReliefFELM-FODPSO

0

02

04

06

08

1

12

14

mea

n sq

uare

erro

r

10 20 30 40 50 600number of features

Figure 7 The evaluation results of Dataset 4

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

mea

n sq

uare

erro

r

2 3 4 5 6 7 81number of features

Figure 8 The evaluation results of Dataset 5

features separated by a vertical bar The last column repre-sents the MSE values calculated by all features and the totalnumber of features The lowest MSE values on each datasetare labeled as bold Among all datasets ELM FODPSOobtains six lowest MSE values ELM PSO obtains twoand RReliefF obtains one For D3 ELM FODPSO andELM PSO get comparable MSE values by the same fea-ture subset therefore 00099 and 00098 are both labeledas lowest MSE values For D5 ELM PSO and RReliefFget the lowest MSE 00838 using all the 8 features andELM FODPSO gets a similar MSE 00841 with only 6 fea-tures

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

0

05

1

15

2

25

3

35

4

45

mea

n sq

uare

erro

r

2 4 6 8 10 12 140number of features

Figure 9 The evaluation results of Dataset 6

ELM-PSOSVM-FODPSO

rReliefFELM-FODPSO

2 4 6 8 10 120number of features

0

05

1

15

2

25

3

35

mea

n sq

uare

erro

r

Figure 10 The evaluation results of Dataset 7

4 Conclusions

Feature selection techniques have been widely studied andcommonly used in machine learning The proposed methodcontains two steps constructing fitness functions by ELMand seeking the optimal solutions of fitness functions byFODPSO ELM is a simple yet effective single hidden layerneural network which is suitable for feature selection dueto its gratifying computational efficiency FODPSO is anintelligent optimization algorithm which owns good globalsearch ability

The proposed method is evaluated on seven regressiondatasets and it achieves better performance than othercomparativemethods on six datasetsWemay concentrate on

Computational Intelligence and Neuroscience 7

exploring ELM FODPSO in various situations of regressionand classification applications in the future

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work is supported by National Key Research andDevelopment Program of China (no 2016YFC1306600)

References

[1] T Lindeberg ldquoFeature detectionwith automatic scale selectionrdquoInternational Journal of Computer Vision vol 30 no 2 pp 79ndash116 1998

[2] M Dash and H Liu ldquoFeature selection for classificationrdquoIntelligent Data Analysis vol 1 no 1ndash4 pp 131ndash156 1997

[3] I Iguyon and A Elisseeff ldquoAn introduction to variable andfeature selectionrdquo Journal of Machine Learning Research vol 3pp 1157ndash1182 2003

[4] A Jovic K Brkic and N Bogunovic ldquoA review of featureselection methods with applicationsrdquo in Proceedings of the 38thInternational Convention on Information and CommunicationTechnology Electronics and Microelectronics MIPRO 2015 pp1200ndash1205 Croatia May 2015

[5] B Xue M Zhang and W N Browne ldquoParticle swarm opti-mization for feature selection in classification a multi-objectiveapproachrdquo IEEE Transactions on Cybernetics vol 43 no 6 pp1656ndash1671 2013

[6] I A Gheyas and L S Smith ldquoFeature subset selection in largedimensionality domainsrdquo Pattern Recognition vol 43 no 1 pp5ndash13 2010

[7] X-W Chen X Zeng and D van Alphen ldquoMulti-class featureselection for texture classificationrdquo Pattern Recognition Lettersvol 27 no 14 pp 1685ndash1691 2006

[8] S-W Lin K-C Ying S-C Chen and Z-J Lee ldquoParticle swarmoptimization for parameter determination and feature selectionof support vector machinesrdquo Expert Systems with Applicationsvol 35 no 4 pp 1817ndash1824 2008

[9] P Ghamisi and J A Benediktsson ldquoFeature selection basedon hybridization of genetic algorithm and particle swarmoptimizationrdquo IEEE Geoscience and Remote Sensing Letters vol12 no 2 pp 309ndash313 2015

[10] P Ghamisi M S Couceiro and J A Benediktsson ldquoA novelfeature selection approach based on FODPSO and SVMrdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 5pp 2935ndash2947 2015

[11] Q Li H Chen H Huang et al ldquoAn enhanced grey wolfoptimization based feature selection wrapped kernel extremelearning machine for medical diagnosisrdquo Computational andMathematicalMethods inMedicine vol 2017 Article ID 951274115 pages 2017

[12] G-B Huang and H A Babri ldquoUpper bounds on the numberof hidden neurons in feedforward networks with arbitrarybounded nonlinear activation functionsrdquo IEEE Transactions onNeural Networks and Learning Systems vol 9 no 1 pp 224ndash2291998


exploring ELM FODPSO in various regression and classification applications in the future.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (No. 2016YFC1306600).

References

[1] T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[2] M. Dash and H. Liu, "Feature selection for classification," Intelligent Data Analysis, vol. 1, no. 1–4, pp. 131–156, 1997.

[3] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, pp. 1157–1182, 2003.

[4] A. Jović, K. Brkić, and N. Bogunović, "A review of feature selection methods with applications," in Proceedings of the 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO 2015), pp. 1200–1205, Croatia, May 2015.

[5] B. Xue, M. Zhang, and W. N. Browne, "Particle swarm optimization for feature selection in classification: a multi-objective approach," IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 1656–1671, 2013.

[6] I. A. Gheyas and L. S. Smith, "Feature subset selection in large dimensionality domains," Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010.

[7] X.-W. Chen, X. Zeng, and D. van Alphen, "Multi-class feature selection for texture classification," Pattern Recognition Letters, vol. 27, no. 14, pp. 1685–1691, 2006.

[8] S.-W. Lin, K.-C. Ying, S.-C. Chen, and Z.-J. Lee, "Particle swarm optimization for parameter determination and feature selection of support vector machines," Expert Systems with Applications, vol. 35, no. 4, pp. 1817–1824, 2008.

[9] P. Ghamisi and J. A. Benediktsson, "Feature selection based on hybridization of genetic algorithm and particle swarm optimization," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 2, pp. 309–313, 2015.

[10] P. Ghamisi, M. S. Couceiro, and J. A. Benediktsson, "A novel feature selection approach based on FODPSO and SVM," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, pp. 2935–2947, 2015.

[11] Q. Li, H. Chen, H. Huang et al., "An enhanced grey wolf optimization based feature selection wrapped kernel extreme learning machine for medical diagnosis," Computational and Mathematical Methods in Medicine, vol. 2017, Article ID 9512741, 15 pages, 2017.

[12] G.-B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions," IEEE Transactions on Neural Networks, vol. 9, no. 1, pp. 224–229, 1998.

[13] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[14] G.-B. Huang, "What are extreme learning machines? Filling the gap between Frank Rosenblatt's dream and John von Neumann's puzzle," Cognitive Computation, vol. 7, no. 3, pp. 263–278, 2015.

[15] S. Saraswathi, S. Sundaram, N. Sundararajan, M. Zimmermann, and M. Nilsen-Hamilton, "ICGA-PSO-ELM approach for accurate multiclass cancer classification resulting in reduced gene sets in which genes encoding secreted proteins are highly represented," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 2, pp. 452–463, 2011.

[16] D. Chyzhyk, A. Savio, and M. Graña, "Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI," Neurocomputing, vol. 128, pp. 73–80, 2014.

[17] R. Ahila, V. Sadasivam, and K. Manimala, "An integrated PSO for parameter determination and feature selection of ELM and its application in classification of power system disturbances," Applied Soft Computing, vol. 32, pp. 23–37, 2015.

[18] Y. H. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proceedings of the IEEE International Conference on Evolutionary Computation (ICEC '98), pp. 69–73, Anchorage, Alaska, USA, May 1998.

[19] S. Kiranyaz, T. Ince, and M. Gabbouj, "Multi-dimensional particle swarm optimization," in Multidimensional Particle Swarm Optimization for Machine Learning and Pattern Recognition, vol. 15 of Adaptation, Learning, and Optimization, pp. 83–99, Springer, Berlin, Heidelberg, 2014.

[20] L. Shang, Z. Zhou, and X. Liu, "Particle swarm optimization-based feature selection in sentiment classification," Soft Computing, vol. 20, no. 10, pp. 3821–3834, 2016.

[21] H. B. Nguyen, B. Xue, I. Liu, P. Andreae, and M. Zhang, "New mechanism for archive maintenance in PSO-based multi-objective feature selection," Soft Computing, vol. 20, no. 10, pp. 3927–3946, 2016.

[22] C. A. Coello Coello and M. S. Lechuga, "MOPSO: a proposal for multiple objective particle swarm optimization," in Proceedings of the Congress on Evolutionary Computation (CEC '02), pp. 1051–1056, May 2002.

[23] J. J. Durillo, J. García-Nieto, A. J. Nebro, C. A. Coello Coello, F. Luna, and E. Alba, "Multi-objective particle swarm optimizers: an experimental comparison," in Evolutionary Multi-Criterion Optimization, vol. 5467 of Lecture Notes in Computer Science, pp. 495–509, Springer, Berlin, Germany, 2009.

[24] K. Mistry, L. Zhang, S. C. Neoh, C. P. Lim, and B. Fielding, "A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition," IEEE Transactions on Cybernetics, vol. 47, no. 6, pp. 1496–1509, 2017.

[25] J. Tillett, R. Rao, and F. Sahin, "Cluster-head identification in ad hoc sensor networks using particle swarm optimization," in Proceedings of the IEEE International Conference on Personal Wireless Communications (ICPWC 2002), pp. 201–205, New Delhi, India, 2002.

[26] E. J. S. Pires, J. A. T. Machado, P. B. de Moura Oliveira, J. B. Cunha, and L. Mendes, "Particle swarm optimization with fractional-order velocity," Nonlinear Dynamics, vol. 61, no. 1-2, pp. 295–301, 2010.

[27] M. S. Couceiro, R. P. Rocha, N. M. F. Ferreira, and J. A. T. Machado, "Introducing the fractional-order Darwinian PSO," Signal, Image and Video Processing, vol. 6, no. 3, pp. 343–350, 2012.

[28] M. S. Couceiro, F. M. L. Martins, R. P. Rocha, and N. M. F. Ferreira, "Mechanism and convergence analysis of a multi-robot swarm approach based on natural selection," Journal of Intelligent & Robotic Systems, vol. 76, no. 2, pp. 353–381, 2014.

[29] F. Benoît, M. van Heeswijk, Y. Miche, M. Verleysen, and A. Lendasse, "Feature selection for nonlinear models with extreme learning machines," Neurocomputing, vol. 102, pp. 111–124, 2013.

[30] M. Robnik-Šikonja and I. Kononenko, "Theoretical and empirical analysis of ReliefF and RReliefF," Machine Learning, vol. 53, no. 1-2, pp. 23–69, 2003.

[31] L. Bravi, V. Piccialli, and M. Sciandrone, "An optimization-based method for feature ranking in nonlinear regression problems," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 4, pp. 1005–1010, 2016.
