
Algorithms for Intelligent Systems
Series Editors: Jagdish Chand Bansal · Kusum Deep · Atulya K. Nagar

Mohammad Shorif Uddin · Jagdish Chand Bansal, Editors

Proceedings of International Joint Conference on Computational Intelligence
IJCCI 2019


Algorithms for Intelligent Systems

Series Editors

Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India

Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India

Atulya K. Nagar, Department of Mathematics and Computer Science, Liverpool Hope University, Liverpool, UK


This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems.

The book series includes recent advancements, modifications and applications of artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy systems, autonomous and multi-agent systems, machine learning and other related areas of intelligent systems. The material will be beneficial for graduate students, postgraduate students as well as researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to researchers from other fields who have no knowledge of the power of intelligent systems, e.g., researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners.

The series publishes monographs, edited volumes, advanced textbooks and selected proceedings.

More information about this series at http://www.springer.com/series/16171


Mohammad Shorif Uddin • Jagdish Chand Bansal
Editors

Proceedings of International Joint Conference on Computational Intelligence
IJCCI 2019


Editors

Mohammad Shorif Uddin
Department of Computer Science and Engineering
Jahangirnagar University
Dhaka, Bangladesh

Jagdish Chand Bansal
Department of Mathematics
South Asian University
New Delhi, India

ISSN 2524-7565   ISSN 2524-7573 (electronic)
Algorithms for Intelligent Systems
ISBN 978-981-15-3606-9   ISBN 978-981-15-3607-6 (eBook)
https://doi.org/10.1007/978-981-15-3607-6

© Springer Nature Singapore Pte Ltd. 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore


Preface

This book contains the high-quality research papers presented at the International Joint Conference on Computational Intelligence (IJCCI 2019). IJCCI 2019 was jointly organized by the University of Liberal Arts Bangladesh (ULAB), Bangladesh, Jahangirnagar University (JU), Bangladesh, and South Asian University (SAU), India, and was held on October 25–26, 2019, at ULAB, Dhaka, Bangladesh. The conference was conceived as a platform for disseminating and exchanging ideas, concepts and results of researchers from academia and industry, in order to develop a comprehensive understanding of the challenges of advancing intelligence from computational viewpoints. This book will help in strengthening congenial networking between academia and industry. The conference focused on collective intelligence, soft computing, optimization, cloud computing, machine learning, intelligent software, robotics, data science, data security, big data analytics, and signal and natural language processing. IJCCI 2019 follows two earlier events: (1) the International Workshop on Computational Intelligence (IWCI 2016), held on December 12–13, 2016, at JU, Dhaka, Bangladesh, in collaboration with SAU, India, under the technical co-sponsorship of the IEEE Bangladesh Section; and (2) the International Joint Conference on Computational Intelligence (IJCCI 2018), held on December 14–15, 2018, at Daffodil International University (DIU) in collaboration with JU, Bangladesh, and SAU, India. All accepted and presented papers of IWCI 2016 are in the IEEE Xplore Digital Library, and those of IJCCI 2018 are in the Springer Nature book series Algorithms for Intelligent Systems (AIS). We have tried our best to enrich the quality of IJCCI 2019 through a stringent and careful peer-review process. IJCCI 2019 received 143 papers from 18 different countries, and 54 papers were accepted for presentation. Three papers were subsequently excluded after a careful editorial review, and the final proceedings contain 51 papers.


This book presents novel contributions in the areas of computational intelligence, and it serves as reference material for advanced research.

Dhaka, Bangladesh          Mohammad Shorif Uddin
New Delhi, India           Jagdish Chand Bansal

Acknowledgements  Our sincere appreciation goes to everyone involved for their tireless efforts in making IJCCI 2019 successful. The honorable Vice-Chancellor of ULAB, Bangladesh, the Vice-Chancellor of JU, Bangladesh, and the President of SAU, India, deserve our special gratitude for their support in organizing this conference. We are grateful to the organizing committee, the technical program committee, authors, reviewers, volunteers and participants for their utmost dedication in making this conference fruitful. We are thankful to all sponsors, including the ICT Division and the University Grants Commission, Bangladesh, for their generous support.


Contents

1 CerebLearn: Biologically Motivated Learning Rule for Artificial Feedforward Neural Networks . . . . 1
Sumaiya Tabassum Nimi, Md. Adnan Arefeen, and Muhammad Abdullah Adnan

2 Conceptual Content in Deep Convolutional Neural Networks: An Analysis into Multi-faceted Properties of Neurons . . . . 19
Zahra Sadeghi

3 Toward Lexicon-Free Bangla Automatic Speech Recognition System . . . . 33
Md. Mehadi Hasan, Md. Ariful Islam, Shafkat Kibria, and Muhammad Shahidur Rahman

4 Automated Tax Return Verification with Blockchain Technology . . . . 45
Safayet Hossain, Showrav Saha, Jannatul Ferdous Akhi, and Tanjina Helaly

5 Voice-Enabled Intelligent IDE in Cloud . . . . 57
Sabila Nawshin, Sarah Ahsin, Mohmmad Ali, Salekul Islam, and Swakkhar Shatabda

6 Recognition and Classification of Fruit Diseases Based on the Decomposition of Color Wavelet and Higher-Order Statistical Texture Features . . . . 71
A. S. M. Shafi and Mohammad Motiur Rahman

7 Diabetes Mellitus Risk Prediction Using Artificial Neural Network . . . . 85
M. Raihan, Nasif Alvi, Md. Tanvir Islam, Fahmida Farzana, and Md. Mahadi Hassan


8 IoT-Based Smart Agriculture Monitoring System with Double-Tier Data Storage Facility . . . . 99
MD Safayet Ahmad and Akhlak Uz Zaman

9 Bangla Phoneme Recognition: Probabilistic Approach . . . . 111
Md. Shafiul Alam Chowdhury and Md. Farukuzzaman Khan

10 EEG Motor Signal Analysis-Based Enhanced Motor Activity Recognition Using Optimal De-noising Algorithm . . . . 125
Nuray Jannat, Sabbir Ahmed Sibli, Md. Anisur Rahaman Shuhag, and Md. Rashedul Islam

11 Automatic Missing-Child Recovery System Using Eigenfaces . . . . 137
Nuruzzaman Faruqui and Mohammad Abu Yousuf

12 Prediction of Financial Distress in Bangladesh's Banking Sector Using Data Mining and Machine-Learning Technique . . . . 151
Mohammed Mahmudur Rahman, Zinnia Sultana, Musrat Jahan, and Ramis Fariha

13 Pedestrian Age and Gender Identification from Far View Images Using Convolutional Neural Network . . . . 163
Fatema Yeasmin Chowdhury, Md. Khaliluzzaman, Khondoker Md Arif Raihan, and M. Moazzam Hossen

14 Handwritten Numeral Superposition to Printed Form Using Convolutional Auto-Encoder and Recognition Using Convolutional Neural Network . . . . 179
M. I. R. Shuvo, M. A. H. Akhand, and N. Siddique

15 Chemical Reaction Optimization for Mobile Robot Path Planning . . . . 191
Pranta Protik, Sudipto Das, and Md. Rafiqul Islam

16 An Automated Wireless Irrigation System by Using Moisture Sensor and DTMF Technology . . . . 205
Pallab Kumar Nandi, Romana Rahman Ema, Tajul Islam, and Shadman Jahan

17 Human Age Prediction from Facial Image Using Transfer Learning in Deep Convolutional Neural Networks . . . . 217
M. A. H. Akhand, Md. Ijaj Sayim, Shuvendu Roy, and N. Siddique

18 Semantic Segmentation of Retinal Blood Vessel via Multi-scale Convolutional Neural Network . . . . 231
SM Mazharul Islam


19 Drug–Protein Interaction Network Detection and Analysis of Cardiovascular Disease-Related Genes: A Bioinformatics Approach . . . . 243
Md. Rakibul Islam, Bikash Kumar Paul, Kawsar Ahmed, Ashraful Alam, and Moniruzzaman Rony

20 An Approach for Detecting Heart Rate Analyzing QRS Complex in Noise and Saturation Filtered ECG Signal . . . . 253
Sanjana Khan Shammi, Faysal Bin Hasan, and Jia Uddin

21 Identification of Genetic Links of Thyroid Cancer to the Neurodegenerative and Chronic Diseases Progression: Insights from Systems Biology Approach . . . . 263
Md. Ali Hossain, Sheikh Muhammad Saiful Islam, Tania Akter Asa, Muhammad Sajjad Hussain, Md. Rezanur Rahman, Ahmed Moustafa, and Mohammad Ali Moni

22 DNA Motif Discovery Using a Hybrid Algorithm . . . . 275
Sumit Kumar Saha and Md. Rafiqul Islam

23 Implementation of GSM Cellular Network Using USRP B200 SDR and OpenBTS . . . . 287
Saulin Tuhin, Fahim, Syed Fahim Rahman, Syed Foysal Rahman, Abu Al Mursalin, and A. K. M. Muzahidul Islam

24 Sources and Impact of Uncertainty on Rule-Based Decision-Making Approaches . . . . 299
Md. Zahid Hasan, Shakhawat Hossain, Mohammad Shorif Uddin, and Mohammad Shahidul Islam

25 A Faster Decoding Technique for Huffman Codes Using Adjacent Distance Array . . . . 309
Mir Lutfur Rahman, Pranta Sarker, and Ahsan Habib

26 A Closer Look into Paintings' Style Using Convolutional Neural Network with Transfer Learning . . . . 317
Protik Nag, Sristy Sangskriti, and Marium-E Jannat

27 How Can a Robot Calculate the Level of Visual Focus of Human's Attention . . . . 329
Partha Chakraborty, Mohammad Abu Yousuf, Md. Zahidur Rahman, and Nuruzzaman Faruqui

28 A Computer Vision Approach for Jackfruit Disease Recognition . . . . 343
Md. Tarek Habib, Md. Robel Mia, Md. Jueal Mia, Mohammad Shorif Uddin, and Farruk Ahmed


29 A Novel Hybrid Swarm Intelligence Algorithm Combining Modified Artificial Bee Colony and Firefly Algorithms . . . . 355
Sadman Sakib, Mahzabeen Emu, Syed Mustafizur Rahman Chowdhury, and Mohammad Shafiul Alam

30 Prediction of DNA-Binding Protein from Profile-Based Hidden Markov Model Feature . . . . 371
Rianon Zaman, Khan Raqib Mahmud, Abul Kalam Al Azad, and Md. Asifuzzaman Jishan

31 A Subword Level Language Model for Bangla Language . . . . 385
Aisha Khatun, Anisur Rahman, Hemayet Ahmed Chowdhury, Md. Saiful Islam, and Ayesha Tasnim

32 Sentiment Analysis Based on Users' Emotional Reactions About Ride-Sharing Services on Facebook and Twitter . . . . 397
Md. Sabab Zulfiker, Nasrin Kabir, Hafsa Moontari Ali, Mohammad Reduanul Haque, Morium Akter, and Mohammad Shorif Uddin

33 An SDN-Enabled IoT Architecture with Fog Computing and Edge Encryption Support . . . . 409
Md. Rashid Al Asif, Nagib Mahfuz, and Md. Abdul Momin

34 Upgrading YouTube Video Search by Generating Tags Through Semantic Analysis of Contextual Data . . . . 425
Jinat Ara and Hanif Bhuiyan

35 Internet of Things-Based Smart Security Provisioning Using Voice-Controlled Door Locking System . . . . 439
Sumaiya Amin Anamika, Md. Ar Rafi Khan, Md. Mayen Hasan Sefat, and Muhammad Golam Kibria

36 A Novel Hybrid Machine Learning Model to Predict Diabetes Mellitus . . . . 453
Md. Shahriare Satu, Syeda Tanjila Atik, and Mohammad Ali Moni

37 An Optimized Pruning Technique for Handling Uncertainty in Decision-Making Process . . . . 467
Md. Zahid Hasan, Shakhawat Hossain, Mohammad Shorif Uddin, and Mohammad Shahidul Islam

38 Design Exploration of LH-CAM with Updating Mechanism . . . . 477
Omer Mujahid, Zahid Ullah, Abdul Hafeez, and Tama Fouzder

39 Computational Techniques for Structure Preserving Model Reduction of Constrain Dynamical Models . . . . 487
M. Monir Uddin


40 A Digital Platform Design for Supply Chain of Existing Fish Market in Bangladesh . . . . 497
Ratul Hasan Shaon, Md. Ariful Islam, M. Saddam Hossain Khan, and Amit Kumar Das

41 End-to-End Optical Character Recognition Using Synthetic Dataset Generator for Noisy Conditions . . . . 515
Md. Shopon, Nazmul Alam Diptu, and Nabeel Mohammed

42 Unsupervised Pretraining and Transfer Learning-Based Bangla Sign Language Recognition . . . . 529
Zinnia Khan Nishat and Md. Shopon

43 Facilitating Hard-to-Defeat Car AI Using Flood-Fill Algorithm . . . . 541
Md. Sabir Hossain, Mohammad Robaitul Islam Bhuiyan, Md. Shariful Islam, Md. Faridul Islam, and Md. Al-Hasan

44 A Novel Method for Ghost Removal in High-Dynamic Range Images . . . . 555
Anil Mahmud, Rakib Ul Haque, Md. Akhtaruzzaman Adnan, and H. M. Sultan Al Mahmud

45 Internet of Things-Based Household Water Quality Monitoring System Using Wireless Sensor . . . . 567
S. M. Rahadul Islam, Md. Mayen Hasan Sefat, Md. Ahsan Habib, and Muhammad Golam Kibria

46 Developing a Fuzzy Feature-Based Online Bengali Handwritten Word Recognition System . . . . 577
Nur-A-Alam Shiddiki and Mohammed Moshiul Hoque

47 Automatic Summarization of Scientific Articles from Biomedical Domain . . . . 591
Md. Kaykobad Reza, Rifat Rubayatul Islam, Sadik Siddique, Md. Mostofa Akbar, and M. Sohel Rahman

48 Bringing a Change in Digital Mobile Banking Through Distributed Technology . . . . 603
Md. Shahin Uddin Talukder, Nabil Islam, Shaila Sharmin, Abdullah Al Omar, and Sakib Hasan

49 Internal Abnormalities' Detection of Human Body Analyzing Skin Images Using Convolutional Neural Network . . . . 615
Parisa Mehera, M. F. Mridha, Nasima Begum, and Md. Mohaiminul Islam


50 Orchestration-Based Task Offloading for Mobile Edge Computing in Small-Cell Networks . . . . 629
Md. Delowar Hossain, Md. Abu Layek, Tangina Sultana, Md. Alamgir Hossain, Md. Imtiaz Hossain, Waqas ur Rahman, Yun Hwan Kim, and Eui-Nam Huh

51 DEB: A Delay and Energy-Based Routing Protocol for Cognitive Radio Ad Hoc Networks . . . . 643
Sumaiya Tasmim, Abrar Hasin Kamal, Mohammad Obaidullah Tusher, and Nafees Mansoor


About the Editors

Dr. Mohammad Shorif Uddin is a Professor at Jahangirnagar University, Bangladesh. He completed his Doctor of Engineering in Information Science at Kyoto Institute of Technology, Japan, in 2002; his Master of Technology Education at Shiga University, Japan, in 1999; and his MBA at Jahangirnagar University in 2013. He is the Editor-in-Chief of the ULAB Journal of Science and Engineering, an Associate Editor of IEEE Access, and has served as General Chair of various conferences, including ICAEM 2019, IJCCI 2018, and IWCI 2016. He holds two patents for his scientific inventions, is a senior member of several academic associations, and has published extensively in international journals and conference proceedings.

Dr. Jagdish Chand Bansal is an Assistant Professor (Senior Grade) at South Asian University, New Delhi, and a Visiting Research Fellow of Mathematics and Computer Science, Liverpool Hope University, UK. Dr. Bansal received his Ph.D. in Mathematics from IIT Roorkee. Before joining SAU New Delhi, he worked as an Assistant Professor at the ABV-Indian Institute of Information Technology and Management Gwalior and at BITS Pilani. He is the Series Editor of the book series "Algorithms for Intelligent Systems (AIS)," published by Springer; Editor-in-Chief of the International Journal of Swarm Intelligence (IJSI), published by Inderscience; and an Associate Editor of IEEE Access, published by the IEEE. He is the General Secretary of the Soft Computing Research Society (SCRS). His chief areas of interest are swarm intelligence and nature-inspired optimization techniques. Recently, he proposed a fission-fusion social structure-based optimization algorithm, Spider Monkey Optimization (SMO), which is now being applied to various problems from the engineering domain. He has published more than 60 research papers in international journals and conference proceedings.


Chapter 1
CerebLearn: Biologically Motivated Learning Rule for Artificial Feedforward Neural Networks

Sumaiya Tabassum Nimi, Md. Adnan Arefeen, and Muhammad Abdullah Adnan

1 Introduction

Artificial neural network (ANN) is a statistical data modeling tool inspired by biological neural networks in the brain. It models complex relations between the inputs given to the network and the corresponding outputs by changing network parameters through the flow of information through the network. Throughout this process, it learns useful patterns so that it can approximate complicated functions mapping inputs to the desired outputs. Although ANN had been developed being inspired by the brain, machine learning researchers later started to focus on more computationally efficient models inspired by applied mathematics and statistics rather than making the models behave like the brain. Whereas computational neuroscientists continued studying neural network modeling of the biological brain by analyzing real data obtained from experiments on the brain and providing mathematical reasoning behind them, bridging the gap between these two research fields has gained interest recently. Glorot et al. [15] stated that a rectifying nonlinearity as the activation function is more biologically plausible than the hyperbolic tangent activation function. Recent work by Bengio et al. [4] proposed a learning rule based on STDP, a well-known form of synaptic plasticity responsible for the development of neuronal circuits in the brain [7, 31]. We observe that none of the research works conducted so far on biologically inspired neural networks paid attention to the learned weights.

S. T. Nimi · Md. Adnan Arefeen (B) · M. A. Adnan
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
e-mail: [email protected]

S. T. Nimi
e-mail: [email protected]

M. A. Adnan
e-mail: [email protected]

S. T. Nimi · Md. Adnan Arefeen
Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh

© Springer Nature Singapore Pte Ltd. 2020
M. S. Uddin and J. C. Bansal (eds.), Proceedings of International Joint Conference on Computational Intelligence, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-15-3607-6_1


It was not observed whether the weights learned by the network conform to the weights in a biological neural network, whereas it is believed that the distribution of weights underlies learning and memory in the brain. The distribution of learned weights gives us insight into the underlying learning algorithm. So, if we can show that the weights learned by a particular network are similar to the weights in a biological neuronal network of the same type, we can say that our network actually learns like a biological neural network, which will be an important step toward bridging the gap between the two research fields that study neural networks: computational neuroscience and machine learning. It will give an important insight into understanding how the brain learns and will also benefit the artificial intelligence community with a novel learning approach. In this paper, we adopt this approach. For a particular type of network in a particular region of the brain, we investigate the distribution of weights. Then, for that type of artificial neural network, we find a learning rule that generates a similar distribution of weights.

Neuroscience research found that the cerebellum region of the brain learns in a feedforward fashion [18]. Synaptic weights in the cerebellum have a specific form of Gaussian distribution. The distribution is marked by three characteristics: the majority of weights are zero or near zero, a few weights are large, and the weight values are strictly monotonically decaying [2, 9, 10, 23, 34]. Yet another important theory guiding the weight values is Dale's law [12], which says that biological neurons are of two particular types, inhibitory and excitatory. All synaptic weights coming into and out of inhibitory neurons are negative, and those coming into and out of excitatory neurons are positive. We note that none of the research works conducted so far on artificial neural networks paid attention to these biologically imposed constraints while updating the weight values. In artificial neural networks, weights are updated in the direction of minimizing or maximizing the chosen objective function. The sign of a weight is not constrained to be positive or negative as per the nature of the neuron (excitatory or inhibitory). Also, values of the weights typically never become zero while training, which is not consistent with the biological observations. The distribution of weights from the input layer to the hidden layer of a typical ANN at the end of training is shown in Fig. 1a.

Fig. 1 Weight distribution in neural networks. a Distribution of weights obtained after training a typical feedforward ANN. b Distribution of weights obtained when trained by CerebLearn. c Weight distribution of granule cell–Purkinje cell synapses [9]


In this paper, we present a learning rule called CerebLearn for artificial neural networks that trains the network with biologically imposed constraints. We propose an initialization scheme for weights that satisfies biological constraints, based on the normalized initialization proposed by Glorot et al. [16]. All weights in our network are constrained to stay positive for conformity to Dale's law. We show that commonly used penalties such as L1 and L2 do not satisfy the biological constraints. We introduce the use of a softsign [6]-based penalty on weight values. This penalty pushes small weights toward zero and keeps some large weight values, resulting in a distribution of weights similar to that found in biological neural networks, as shown in Fig. 1c. Penalizing weight values using this penalty function improves the performance of an ANN with a nonnegative constraint on weight values. We test our rule by training a two-layer feedforward network on the digit recognition dataset MNIST. We find that our network achieves accuracy that is almost the state of the art on MNIST for a two-layer feedforward neural network without any distortion of the training data [30] and without any specific technique applied to improve convergence, such as momentum or batch normalization [22]. The distribution of weights we obtain at the end of training by our proposed learning rule CerebLearn (shown in Fig. 1b) is similar to the distribution of weights in the cerebellum (shown in Fig. 1c).

We organize our paper as follows. In Sect. 2, we discuss in detail the neuroscience observations that motivated our work and shed light on our assumed constraints from a neuroscience perspective. In Sect. 3, we present our proposed learning rule for an ANN. In Sect. 4, we discuss our experimental findings. Finally, Sect. 5 presents the conclusion and future work.

2 Neuroscience Background

We first discuss the nature of the distribution of synaptic weights in the brain, particularly in the cerebellum region, and note its salient properties. We then discuss the feedforward nature of the neuronal network in the cerebellum, and introduce another important constraint on the values of synaptic weights in the brain, called Dale's law [12]. Finally, we summarize some crucial biological constraints on the basis of these neuroscience observations. Our proposed learning rule CerebLearn learns weights during training under these constraints.

Computational neuroscientists strive to find modes of operation in different regions of the brain through rule-based analytical and logical reasoning about observed data. One such useful datum is the distribution of synaptic efficacy. Neurons in the brain are connected by synapses, and changes in synaptic efficacy occur due to learning or memory storage in the brain [19]. Hence, from experimental data about synaptic weight distributions, neuroscientists attempt to find the modes of learning that generated such distributions. Synaptic weight distributions have been experimentally obtained in different regions of the brain, such as cortical layer 2/3 pyramidal-pyramidal synapses [13, 20, 26], cortical layer 5 pyramidal-pyramidal synapses [14, 32, 33], hippocampal CA3-CA1 synapses [28] and cerebellar granule cell–Purkinje cell synapses [9, 23].


All these distributions suggest that a large number of silent or nearly silent synapses exist in all regions of the brain. The synaptic weight distribution of unitary granule cell and Purkinje cell synapses was observed in adult rat cerebellar slices [23]. The distribution that was found had a large number of weak synapses, and the rest of the synapses had a distribution similar to a partial Gaussian with a long tail. The authors' view was that this distribution leads to discriminative selection of the features provided by parallel fibers. It has been found that such a distribution arises due to optimal learning and to make storage reliable [9]. They claimed this by showing that a perceptron with continuous nonnegative synaptic weights, learning random binary input and output associations, has a distribution of synaptic weights consisting of a majority of silent synapses and a strictly monotonically decaying Gaussian tail when learning with maximal storage capacity. They also explained that the fraction of silent synapses increases with increasing reliability of storage. The authors' insight was that active synapses need to be depressed to avoid erroneous action potentials. This process of depression causes many synapses to hit the barrier at zero weight (as synapses are constrained to be nonnegative). However, a majority of exactly silent synapses is not the general picture. Incomplete training or non-maximal capacity causes synapses to achieve small but not exactly zero weights. As the perceptron learns more, the distribution becomes more constrained, driving many synapses closer to zero and broadening the Gaussian tail of the distribution [9].

Varshney et al. [34] investigated the reasoning behind the existing distribution of synaptic strengths from an information-theoretic point of view. Their result was that, under a volume constraint, synaptic connectivity needs to be sparse to optimize information storage capacity per unit volume. But in spite of the presence of many synapses of weight zero and a large number with weight near zero, there will also be some synapses with large weight, leading to a distribution of synaptic weights resembling a stretched exponential [34]. This is, in fact, the case with the distribution of synaptic weights in different regions of the brain.

As we see, the brain is characterized by sparse connectivity. Among the different parts of the brain, we focus on the cerebellum in this paper because the neuronal network of the cerebellum behaves like an artificial feedforward network. The human brain learns to produce motor behavior under varying circumstances by a feedforward control system in the cerebellum, and this learning has appropriate generalization to adapt to new situations [18]. Prior to this work, Marr et al. [25] and Albus et al. [1] showed that the Purkinje cell in the cerebellum has behavior similar to a perceptron and that the climbing fiber (CF) provides the error signal to this cell. The Purkinje cell receives both the input pattern and the associated output. The error signal propagated by CFs adjusts the plasticity of granule cell–Purkinje cell synapses [1]. In this way, the cerebellar cortex learns to produce contextual motor output in response to sensory input. The weight distribution of synapses in the cerebellum has a partial Gaussian tail with a sharp spike at zero [2, 9, 23]. Thus, many synapses have near-zero weights. Also, for maximal information storage capacity per unit volume, there will be some large synapses in spite of the many silent and small synapses [34]. The long tail in the experimentally observed Gaussian distribution of synaptic weights supports this fact too [34].


In this paper, we attempt to find an efficient feedforward learning rule that generates a weight distribution like that found in the cerebellum. Our network satisfies the constraint imposed by Dale's law, which says that all the synapses coming out of excitatory neurons are excitatory and those coming out of inhibitory neurons are all inhibitory [12]. The network we propose is all-excitatory; i.e., we initialize the weights with positive values and constrain them to stay positive while training. This implies that all neurons in our model are excitatory and all synaptic weights have values conforming to Dale's law for an all-excitatory network of neurons. Training this network with our proposed learning rule produces a distribution of weights similar to the distribution of synaptic weights in granule cell–Purkinje cell synapses [23]; i.e., it has the property that many weights have near-zero values, few weights have large values, and the weights follow a strictly decaying partial Gaussian distribution. To the best of our knowledge, this is the first attempt to train an artificial neural network with these biological constraints on weight values.

3 CerebLearn

In this section, we discuss our proposed learning rule CerebLearn, which trains an ANN under the biologically imposed constraints discussed in Sect. 2. CerebLearn generates a distribution of weights similar to that in the cerebellum. The constraints are that the weight values should conform to Dale's law and that they should follow a monotonically decaying Gaussian distribution with some zero or near-zero weights and few large weights. We first discuss how CerebLearn ensures conformity to Dale's law. Then, we discuss how CerebLearn trains an ANN with the following constraints on weight values: some weights should be zero or near zero, few weights should be large, and the weights should have a monotonically decaying distribution. CerebLearn achieves this by enforcing a suitable additional penalty on the weight values. We show how commonly used penalties on weights, such as L1 and L2, do not satisfy these constraints. We introduce a softsign-based penalty function and show how it penalizes weight values while satisfying all these constraints. Finally, we describe the architecture of a simple two-layer feedforward network which we train with our proposed learning rule CerebLearn.

Dale's law states that biological neurons are of two particular types: inhibitory and excitatory. All synaptic weights coming into and out of inhibitory neurons are negative, and those coming into and out of excitatory neurons are positive. A plain ANN does not obey Dale's law, as we noted in Sect. 1. To conform to Dale's law, we choose an all-excitatory network of neurons; that is, all synaptic weights coming into and going out of all neurons are always positive. We achieve this by first initializing the weights of all layers with positive values and then constraining them to stay positive while training. We propose an initialization scheme for positive initialization of weights based on normalized initialization [16]. As discussed by Glorot et al. [16], weights should be initialized with such values that the activations during the forward propagation phase and the gradients during the backward propagation phase neither vanish nor explode during deep feedforward training. For that, the values of the weights at each layer must satisfy the following condition:

$$\mathrm{Var}[\omega_i] = \frac{2}{n_i + n_{i+1}} \qquad (1)$$

where $\mathrm{Var}[\omega_i]$ is the variance of the weights at layer $i$, $n_i$ is the size of layer $i$ and $n_{i+1}$ is the size of layer $i+1$. The suggested [16] uniform distribution is

$$\omega \sim U\!\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\; \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right] \qquad (2)$$

To maintain the same variance as in Eq. (1) and to accomplish positive initialization of weights, we shift this distribution to

$$\omega \sim U\!\left[0,\; \frac{2\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right] \qquad (3)$$

This initialization serves our purpose of positive initialization of weights, and it also satisfies the condition on the variance of the weights presented in Eq. (1). After initializing the weights in this way, to ensure that the weight values always stay positive while training, we put a constraint on the weight values during the weight update: if any weight goes negative, it is projected to the value zero.
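As a concrete illustration, here is a minimal NumPy sketch of this scheme (not the authors' Theano implementation; the function names are ours): the shifted uniform initialization of Eq. (3) followed by a gradient step whose result is projected back onto nonnegative values.

```python
import numpy as np

def init_positive(n_in, n_out, rng=np.random.default_rng(0)):
    # Shifted uniform initialization of Eq. (3): U[0, 2*sqrt(6)/sqrt(n_in + n_out)].
    # The range keeps the variance 2/(n_in + n_out) required by Eq. (1)
    # while making every initial weight positive.
    high = 2.0 * np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return rng.uniform(0.0, high, size=(n_in, n_out))

def update_nonnegative(w, grad, lr=0.01):
    # One gradient-descent step (cf. Eq. (5)) followed by projection:
    # any weight that would become negative is set to zero,
    # keeping the network all-excitatory in accordance with Dale's law.
    return np.maximum(w - lr * grad, 0.0)
```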

Now, we discuss the distribution of weights. We want a monotonically decaying Gaussian distribution of weights with many zero or near-zero and few large weights. To accomplish this, we need to penalize the weight values in such a way that these constraints are satisfied. In an ANN, parameter values are updated by performing gradient descent on an objective function that we want to minimize by learning. We use the cross-entropy between the training data and the model predictions in the objective or cost function, as we want to maximize likelihood. Hence, if we want to penalize or regularize any parameter values, we include a suitable penalty or regularization term in the cost function; we therefore also need to include a penalty term that serves our purpose in the cost function.

If we denote the cross-entropy by $E(\omega; X, y)$, the cost by $\tilde{E}(\omega; X, y)$ and the penalty term by $\Omega(\omega)$, we can write the cost function as:

$$\tilde{E}(\omega; X, y) = E(\omega; X, y) + \Omega(\omega) \qquad (4)$$

When this cost function is used, the weights are updated as per the following equation:

$$\omega = \omega - \eta\, \nabla_\omega \tilde{E}(\omega; X, y) \qquad (5)$$

where $\nabla_\omega \tilde{E}(\omega; X, y)$ is the parameter gradient and $\eta$ is the learning rate.


Now, we discuss which penalty term, i.e., which $\Omega(\omega)$, we should use for our purpose. L1 and L2 regularizers are frequently used in the ANN regime; we discuss whether these penalty terms serve our purpose. The L1 penalty on weights is the sum of the absolute values of the weights. Hence, we can define the L1 penalty as:

$$\Omega(\omega) = \|\omega\|_1 \qquad (6)$$

When this penalty is used, we can write the parameter gradient as:

$$\nabla_\omega \tilde{E}(\omega; X, y) = \lambda\, \mathrm{sign}(\omega) + \nabla_\omega E(\omega; X, y) \qquad (7)$$

where λ is a hyperparameter that weighs the relative importance of the L1 penalty term against the error in the cost function, and sign(ω) is simply the sign of ω, applied element-wise.

Hence, from Eqs. (5) and (7), we can say that the L1 penalty penalizes weights by an amount proportional to their sign. As such, an L1 penalty on weights results in a distribution of weights with a large number of zero weights and some very large weights. This distribution is not the monotonically decaying Gaussian that we want. Also, the L1 penalty results in poor accuracy. Now, we discuss the effect of the L2 penalty. The L2 penalty is defined as:

$$\Omega(\omega) = \delta\, \omega^{T}\omega \qquad (8)$$

where δ is a hyperparameter that weighs the relative importance of the L2 penalty term against the error in the cost function. When this penalty is used, we can write the parameter gradient as:

$$\nabla_\omega \tilde{E}(\omega; X, y) = 2\delta\omega + \nabla_\omega E(\omega; X, y) \qquad (9)$$

Hence, from Eqs. (5) and (9), we can see that the L2 penalty shrinks weights proportionally to their values. This penalty pushes all weights closer to zero, and an L2 penalty on weights leads to the monotonically decaying partial Gaussian distribution that we want. But the problem is that for small values of δ it does not make the weight matrix sparse: small weights get a small penalty, which is not sufficient to make them zero. For large values of δ, we get some zero weights, but performance becomes poor. The reason is that when some weights in the network become zero or very small, we need some large weight values for optimal learning. We note that Varshney et al. [34] also proved that a few large weights are needed for optimal information storage capacity in the brain when there is a large number of small weights. When L2 penalizes small weights to zero with a large δ, it penalizes large weights even more. Hence, training becomes harder with large values of δ, which leads to irregular performance. Hence, we can say that for both small and large values of δ, L2 is not suitable for our purpose.

Another penalty term used in the literature [27, 29] for pushing small values toward zero while keeping large values intact is the hyperbolic tangent-based penalty. This penalty can be defined as:

$$\Omega(\omega) = \tanh(\sigma\, \omega^{T}\omega) \qquad (10)$$


When this penalty is used, we can write the parameter gradient as:

$$\nabla_\omega \tilde{E}(\omega; X, y) = 2\sigma\omega\, \mathrm{sech}^2(\sigma\, \omega^{T}\omega) + \nabla_\omega E(\omega; X, y) \qquad (11)$$

Here, σ is a hyperparameter that scales the weight matrix; we choose σ as per our requirement. For the same value of σ, this penalty penalizes small weights proportionally to their values and large weights inversely proportionally to their values. This means that small weights are made smaller and driven toward zero, while large weights remain intact. Use of this penalty term gives a monotonically decaying Gaussian distribution of weights with many zero or near-zero and few large weights. Hence, this penalty function serves our purpose. It also gives smooth convergence and good accuracy. But close inspection reveals some problems with this penalty term. Firstly, its gradient converges exponentially toward zero, so some large weights get a very small, i.e., nearly zero, penalty, which tends to cause overfitting. Secondly, the hyperbolic tangent function is computationally expensive, as it involves calculating exponential terms.

We replace the hyperbolic tangent-based penalty by a softsign-based penalty term, which can be defined as:

$$\Omega(\omega) = \frac{\sigma\, \omega^{T}\omega}{1 + |\sigma\, \omega^{T}\omega|} = \mathrm{softsign}(\sigma\, \omega^{T}\omega) \qquad (12)$$

When this penalty is used, we can write the parameter gradient as:

$$\nabla_\omega \tilde{E}(\omega; X, y) = \frac{2\sigma\omega}{(1 + \sigma\, \omega^{T}\omega)^2} + \nabla_\omega E(\omega; X, y) \qquad (13)$$

This penalty behaves similarly to the hyperbolic tangent-based penalty in all respects, except that the chance of overfitting decreases with its use because it converges polynomially toward zero. Also, training becomes faster when this penalty is used because it is computationally less expensive, involving the calculation of only a few products.

It is to be noted that although our proposed softsign-based penalty reduces the chance of overfitting, slight overfitting tends to occur as training proceeds. To avoid overfitting, we use L2 regularization with a very small hyperparameter δ. Since δ is very small, the added L2 penalty term has a negligible effect on smaller weights; it has a noticeable effect only on much larger weights. It penalizes larger weights by a small amount, which is essential to avoid overfitting.
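For illustration, the following NumPy sketch (our own naming, applying the penalty element-wise, as in Eqs. (15) and (16) below) computes the softsign-based penalty and its gradient; note how the gradient decays only polynomially for large weights, unlike the tanh-based penalty.

```python
import numpy as np

def softsign_penalty(w, sigma=1.0):
    # Element-wise softsign penalty (cf. Eqs. (12), (15), (16)):
    # sum over all weights of sigma*w^2 / (1 + |sigma*w^2|).
    s = sigma * w ** 2
    return np.sum(s / (1.0 + np.abs(s)))

def softsign_penalty_grad(w, sigma=1.0):
    # Gradient of the penalty w.r.t. w (cf. Eq. (13)):
    # 2*sigma*w / (1 + sigma*w^2)^2 -- small weights are pushed toward zero,
    # while large weights receive a polynomially vanishing push.
    return 2.0 * sigma * w / (1.0 + sigma * w ** 2) ** 2
```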

In this paper, we consider a simple feedforward network consisting of a single hidden layer only. It is known that a feedforward network with a single hidden layer can approximate any function [21]. We denote by ω_hidden the matrix of weights from the input layer to the hidden layer and by ω_output the matrix of weights from the hidden layer to the output layer. We impose constraints on the weights of both the hidden layer and the output layer. Our overall cost function becomes:


$$\text{cost} = \alpha E + \beta S_h + \gamma S_o + \delta L \qquad (14)$$

where
E = cross-entropy,
S_h = softsign-based penalty on the hidden layer weights,
S_o = softsign-based penalty on the output layer weights,
L = L2 penalty on the weights.

As stated above, we train our network using maximum likelihood. So, we add to the cost function the negative log-likelihood, which is also known as the cross-entropy. From Eq. (14), we can write E as:

$$E = -\frac{1}{n} \sum_{i=1}^{n} \log P\!\left(y \mid X^{i}\right)$$

According to Eq. (12), we can write the softsign-based penalties S_h and S_o as:

$$S_h = \sum_{i,j} \frac{\sigma\, \omega_{hidden,i,j}^{2}}{1 + \bigl|\sigma\, \omega_{hidden,i,j}^{2}\bigr|} \qquad (15)$$

$$S_o = \sum_{j,k} \frac{\sigma\, \omega_{output,j,k}^{2}}{1 + \bigl|\sigma\, \omega_{output,j,k}^{2}\bigr|} \qquad (16)$$

Similarly, we can write the L2 penalty on both weight matrices as:

$$L = \sum_{i,j} \omega_{hidden,i,j}^{2} + \sum_{j,k} \omega_{output,j,k}^{2}$$

Now, for this network, we can write the total cost function as:

$$\text{cost} = -\frac{\alpha}{n} \sum_{i=1}^{n} \log P\!\left(y \mid X^{i}\right)
+ \beta \sum_{i,j} \frac{\sigma\, \omega_{hidden,i,j}^{2}}{1 + \bigl|\sigma\, \omega_{hidden,i,j}^{2}\bigr|}
+ \gamma \sum_{j,k} \frac{\sigma\, \omega_{output,j,k}^{2}}{1 + \bigl|\sigma\, \omega_{output,j,k}^{2}\bigr|}
+ \delta\!\left(\sum_{i,j} \omega_{hidden,i,j}^{2} + \sum_{j,k} \omega_{output,j,k}^{2}\right) \qquad (17)$$
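A minimal NumPy sketch of this total cost (our own naming; it assumes softmax output probabilities and integer class labels, and the role of the hyperparameters α, β, γ, δ is discussed next) might look as follows.

```python
import numpy as np

def total_cost(probs, y, w_hidden, w_output,
               alpha=1.0, beta=0.2, gamma=0.1, delta=5e-4, sigma=1.0):
    # Eq. (17): weighted sum of cross-entropy, softsign penalties on both
    # weight matrices, and an L2 term.
    # probs: (n, classes) softmax outputs; y: (n,) integer class labels.
    n = y.shape[0]
    cross_entropy = -np.mean(np.log(probs[np.arange(n), y] + 1e-12))

    def softsign_pen(w):                 # element-wise form of Eqs. (15)-(16)
        s = sigma * w ** 2
        return np.sum(s / (1.0 + np.abs(s)))

    l2 = np.sum(w_hidden ** 2) + np.sum(w_output ** 2)
    return (alpha * cross_entropy
            + beta * softsign_pen(w_hidden)
            + gamma * softsign_pen(w_output)
            + delta * l2)
```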

Here, α, β, γ and δ are hyperparameters that denote the relative importance of the terms included in the cost function. Now, we discuss how we resolve this issue of relative importance. We always set α to 1 and set β, γ and δ to values less than 1 to ensure that the cross-entropy always remains the most important term in the cost function: the weights are updated more in the direction of reduced cross-entropy than toward the other penalty or regularization terms. As for the β, γ and δ values, we note that using constant values leads to poor performance. Constant β and γ mean that during each epoch of training the same number of weights is made zero or pushed toward zero, which ultimately makes too many weights zero to be useful, and the overall performance is hampered. A constant value of δ also creates problems. As training proceeds, the tendency toward large weight values increases, whereas it is lower at the beginning; small values of δ will not punish large weights, whereas large values of δ will hamper training at the beginning. Hence, we need a dynamic β, a dynamic γ and a dynamic δ.

Inspired by the simulated annealing algorithm, we keep high values of β and γ at the beginning and then decrease them as training proceeds. High values of β and γ at the beginning make more weights zero early on; the effect then decreases as training proceeds. For implementation, we initialize β with a constant value m and γ with a constant value n. To decrease β and γ as training proceeds, at each training epoch we set β and γ to m/epoch and n/epoch, respectively, where epoch is the number of the ongoing training epoch. As for δ, we ensure that its value increases as training proceeds. To accomplish this, we initialize δ to a constant value d and, at each training epoch, set δ to d × epoch.
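A small sketch of this schedule is given below (the function name is ours; the initial constants mirror the experiments reported later, with β₀ = 0.2, γ₀ = 0.1 and d = 5 × 10⁻⁴).

```python
def penalty_schedule(epoch, m=0.2, n=0.1, d=5e-4):
    # Annealing-style schedule described in the text (epoch is 1-based):
    # the softsign-penalty weights beta and gamma decay over training,
    # while the L2 weight delta grows.
    beta = m / epoch      # strong push toward zero early, weaker later
    gamma = n / epoch
    delta = d * epoch     # punish large weights more as they emerge
    return beta, gamma, delta

# Example usage: beta, gamma, delta = penalty_schedule(epoch=10)
```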

We use the softsign function [6] as the activation from the hidden layer to the output layer. This function has smoother asymptotes than the hyperbolic tangent or sigmoid; hence, its chance of saturation is lower when this activation function is used. We can write the hidden layer activation z_h as

$$z_h = \mathrm{softsign}\!\left(\omega_{hidden}^{T} \cdot X + b_{hidden}\right)$$

where

$$\mathrm{softsign}(x) = \frac{x}{1 + |x|}$$

Here, b_hidden is the bias of the hidden layer. We use the softmax function as the activation at the output layer. The softmax function first exponentiates and then normalizes the outputs so that the sum of all network outputs is 1. Hence, we can write the output layer activation z_o as

$$z_o = \mathrm{softmax}\!\left(\omega_{output}^{T} \cdot z_h + b_{output}\right)$$

where

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

Here, b_output is the bias of the output layer. It is interesting to note that the softmax function has a neuroscientific interpretation. The outputs from all softmax units sum to one, which means that when the output from one unit increases, the output from another unit decreases. This creates competition between the output units, which is a neuroscientific concept [17].
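Putting the two activations together, a minimal NumPy sketch of the forward pass (using row-major batches rather than the column-vector notation above; the names are ours) is:

```python
import numpy as np

def softsign(x):
    return x / (1.0 + np.abs(x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))   # subtract row max for stability
    return e / e.sum(axis=1, keepdims=True)

def forward(X, w_hidden, b_hidden, w_output, b_output):
    # Two-layer forward pass: softsign hidden units, softmax output units.
    # X: (batch, n_in); w_hidden: (n_in, n_hidden); w_output: (n_hidden, n_out).
    z_h = softsign(X @ w_hidden + b_hidden)
    z_o = softmax(z_h @ w_output + b_output)
    return z_h, z_o
```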

4 Experimentation

We test our proposed rule on the image recognition dataset MNIST [24]. MNIST consists of 60,000 training examples and 10,000 test examples of 28 × 28 grayscale images of the digits 0 to 9. We use 50,000 examples from the training set to train the network and keep the rest for validation. We used a 2.3 GHz Intel(R) Core(TM) i3-6100U CPU to run the experiments; we did not use any GPUs. We performed the experiments using the Theano framework [3]. We do not use any transformation of the input data. We use a two-layer neural network with softsign as the hidden unit activation and softmax as the output unit activation. We do not use any additional optimization technique from the machine learning regime for faster convergence and better performance, e.g., batch normalization [22] or an adaptive learning rate. We run the experiment for different numbers of hidden neurons. The number of output units is 10, one unit for each digit to be classified. Experimental results under different settings of hyperparameters are presented in Table 1. We run all these experiments with our proposed positive initialization of weights and enforce the nonnegative constraint on the weights during updates. We initialize bias values to zero and constrain them to stay positive during training. We set hyperparameter values of α = 1, γ = β/2, δ = 5 × 10⁻⁴ × epoch, and learning rate = 0.01.

From Table 1, we see that the softsign-based penalty improves the performance of the two-layer neural network under positive constraints and with weights initialized to positive values. For comparison with the state-of-the-art result [30], we ran the experiment with 800 hidden units and find that, in spite of the added constraints, our results are comparable with the state of the art.

In Fig. 2, we present the learning curve for several initial values of β and γ (β = 2γ) with 500 hidden neurons, our proposed positive initialization of weights, the nonnegative constraint on weights, and hyperparameter values of α = 1, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01, up to 50 epochs. We observe that we obtain the best performance for β = 0.2/epoch and the worst performance for β = 0, i.e., when we do not use the softsign-based penalty. When the value of β is increased, convergence becomes slightly slower.

In Fig. 3, we present the learning curve for different constraints and different activation functions. For all cases, we set α = 1, δ = 5 × 10⁻⁴ × epoch, β = γ = 1. We run the experiments with 500 hidden neurons and learning rate = 0.01, up to 40 epochs.

In Figs. 4 and 5, we present the distribution of weights obtained in the hidden layer and the output layer, respectively, with the softsign-based constraint. We see that the distributions are similar to the distribution of weights obtained in the cerebellum (Fig. 1c), since they have many zero or near-zero and few large weights and a monotonically decaying Gaussian distribution.


Table 1 Performance of different learning rules on a two-layer neural network with the MNIST dataset

Learning criteria                                                  | β         | Hidden units | Validation error | Test error
Positive initialization + Positive constraints                     | 0         | 500          | 2.15             | 2.25
Positive initialization + Positive constraints + Softsign penalty  | 0.2/epoch | 500          | 1.90             | 1.89
Positive initialization + Positive constraints + Softsign penalty  | 0.2/epoch | 800          | 1.70             | 1.92
State-of-the-art [30]                                              | 0         | 800          | N/A              | 1.6

Fig. 2 Learning curve for different initial values of the softsign penalty hyperparameter. Number of hidden units = 500, α = 1, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01, up to 50 epochs, with several initial values of β and γ where β = 2γ. Best performance for β = 0.2/epoch and worst convergence for β = 0

Also, in Figs. 6 and 7, we present the distribution of weights obtained in the hidden layer and the output layer, respectively, with the L1 constraint. We see that these distributions are not similar to the distribution of weights obtained in the cerebellum (Fig. 1c).

In Fig. 8, we present the curve that shows the impact of the β and γ (β = 2γ) values on the sparsity of the matrix of hidden layer weights.

We run all these experiments with our proposed positive initialization of weights, the nonnegative constraint on weights, 500 hidden neurons and hyperparameter values of α = 1, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01. The percentage of exactly zero weights varies slightly from epoch to epoch during training.


Fig. 3 Learning curve for different constraints and activation functions. Number of hidden units = 500, α = 1, δ = 1 × 10⁻⁴ × epoch, β = γ = 1

Fig. 4 Distribution of hidden layer weights. Number of hidden units = 500, α = 1, β = 0.2/epoch, γ = 0.1/epoch, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01

We plot the percentage of exactly zero weights obtained on average during training for varying initial values of β. We see that when the initial β = 0, i.e., without adding the softsign-based penalty, we obtain less than 3% zero weights. The softsign-based penalty makes the matrix of weights sparser: as the initial value of β is increased, the percentage of zero weights increases.
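For reference, the sparsity statistic plotted in Fig. 8 can be computed with a small helper of the following form (a hypothetical function of ours, not part of the original code).

```python
import numpy as np

def percent_zero(w, tol=0.0):
    # Percentage of weights that are exactly zero (tol > 0 would count
    # near-zero weights as well).
    return 100.0 * np.mean(np.abs(w) <= tol)
```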


Fig. 5 Distribution of output layer weights. Number of hidden units = 500, α = 1, β = 0.2/epoch, γ = 0.1/epoch, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01

Fig. 6 Distribution of hidden layer weights with L1 constraint. Number of hidden units = 500, α = 1, β = 0.2/epoch, γ = 0.1/epoch, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01

Fig. 7 Distribution of output layer weights with L1 constraint. Number of hidden units = 500, α = 1, β = 0.2/epoch, γ = 0.1/epoch, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01


Fig. 8 Plot of the average percentage of zero weights in the hidden layer during training versus the initial value of the softsign penalty hyperparameter (β). Number of hidden units = 500, hyperparameter values α = 1, γ = β/2, δ = 5 × 10⁻⁴ × epoch, learning rate = 0.01

5 Conclusion

In this paper, we have developed a learning rule for ANNs which learns weights similar to those in the cerebellum region of the brain. We have done this by finding the properties of the distribution of weights in the cerebellum and then finding the constraints under which an ANN will learn similar weight values. We have also studied a crucial biological constraint on weight changes depending on the type of neuron, called Dale's law, which was never considered before in research on biologically inspired learning. We have ensured conformity to Dale's law in our work: we have kept all weights in our network positive, which implies that our network is all-excitatory in biological terminology. For ensuring all-positive weight values, we have adopted normalized initialization for positive weight values and then applied a nonnegative constraint during weight updates. We have tested our rule on the MNIST dataset and obtained 98.11% accuracy, which is comparable to the state-of-the-art accuracy (98.4%) for a two-layer feedforward neural network without any transformation of the input data or any additional optimization technique. The distribution of weights we have obtained using CerebLearn has been found to be similar to the distribution of weights in the cerebellum. It is to be noted that the quality of the results might have been affected by hardware constraints.

6 Future Work

In the future, we wish to apply our rule to train deeper network architectures. Also, we wish to include negative weights in our network without violating Dale's law. Finally, we will work out exact mathematical relations among the values of the hyperparameters in our network, the corresponding prediction error, and the percentage of weights that become zero while learning. We will run the experiments using GPUs to improve the results.


Acknowledgements  This research is partially supported by the ICT Division Innovation Fund, Grant No. 56.00.0000.028.33.080.17-214, Government of the People's Republic of Bangladesh.

References

1. Albus JS (1971) A theory of cerebellar function. Math Biosci 10(1):25–61
2. Barbour B, Brunel N, Hakim V, Nadal JP (2007) What can we learn from synaptic weight distributions? Trends Neurosci 30(12):622–629
3. Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I, Bergeron A, Bouchard N, Warde-Farley D, Bengio Y (2012) Theano: new features and speed improvements. arXiv:1211.5590
4. Bengio Y, Lee DH, Bornschein J, Lin Z (2015) Towards biologically plausible deep learning. arXiv:1502.04156
5. Bengio Y, Mesnard T, Fischer A, Zhang S, Wu Y (2015) An objective function for STDP. arXiv:1509.05936
6. Bergstra J, Desjardins G, Lamblin P, Bengio Y (2009) Quadratic polynomials learn better image features. Technical Report 1337, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal
7. Bi GQ, Poo MM (2001) Synaptic modification by correlated activity: Hebb's postulate revisited. Annu Rev Neurosci 24(1):139–166
8. Brunel N (2016) Is cortical connectivity optimized for storing information? Nat Neurosci
9. Brunel N, Hakim V, Isope P, Nadal JP, Barbour B (2004) Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43(5):745–757
10. Chapeton J, Fares T, LaSota D, Stepanyants A (2012) Efficient associative memory storage in cortical circuits of inhibitory and excitatory neurons. Proc Natl Acad Sci 109(51):E3614–E3622
11. Csáji BC (2001) Approximation with artificial neural networks. Fac Sci Eötvös Loránd Univ Hungary 24:48
12. Eccles J (1976) From electrical to chemical transmission in the central nervous system. Notes Rec R Soc Lond 30(2):219–230
13. Feldmeyer D, Lübke J, Sakmann B (2006) Efficacy and connectivity of intracolumnar pairs of layer 2/3 pyramidal cells in the barrel cortex of juvenile rats. J Physiol 575(2):583–602
14. Frick A, Feldmeyer D, Helmstaedter M, Sakmann B (2008) Monosynaptic connections between pairs of L5A pyramidal neurons in columns of juvenile rat somatosensory cortex. Cereb Cortex 18(2):397–406
15. Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: AISTATS, vol 15, no 106, p 275
16. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: AISTATS, vol 9, pp 249–256
17. Goodfellow IJ, Bengio Y, Courville A (2015) Deep learning. An MIT Press book in preparation. Draft chapters available at http://www.deeplearningbook.org
18. Haruno M, Wolpert DM, Kawato M (1999) Multiple paired forward-inverse models for human motor learning and control. Advances in neural information processing systems, pp 31–37
19. Hebb DO (1949) The organization of behavior
20. Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551(1):139–153
21. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
22. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167
23. Isope P, Barbour B (2002) Properties of unitary granule cell → Purkinje cell synapses in adult rat cerebellar slices. J Neurosci 22(22):9668–9678