
Subtractive Perceptrons for Learning Images: A Preliminary Report

H.R. Tizhoosh 1,2, Shivam Kalra 1, Shalev Lifshitz 1, Morteza Babaie 1

1 Kimia Lab, University of Waterloo, Ontario, Canada; 2 Vector Institute, Toronto, Canada

To appear in the 9th Intern. Conf. on Image Processing Theory, Tools and Applications (IPTA 2019), Istanbul, Turkey

Abstract—In recent years, artificial neural networks have achieved tremendous success for many vision-based tasks. However, this success remains within the paradigm of weak AI, where networks, among other limitations, are specialized for just one given task. The path toward strong AI, or Artificial General Intelligence, remains rather obscure. One factor, however, is clear, namely that the feed-forward structure of current networks is not a realistic abstraction of the human brain. In this preliminary work, some ideas are proposed to define a subtractive Perceptron (s-Perceptron), a graph-based neural network that delivers a more compact topology to learn one specific task. We test the s-Perceptron with the MNIST dataset, a commonly used image archive for digit recognition. The proposed network achieves excellent results compared to benchmark networks that rely on more complex topologies.

I. MOTIVATION

The quest for artificial intelligence has focused on the creation of an ultimate machine that can perform cognitive tasks at the level of human capabilities. In the past seven decades, tremendous effort has been devoted to developing learning automatons that can learn from past experience in order to generalize to unseen observations. After decades of slow progress and stagnation, we are finally witnessing the success of artificial neural networks (ANNs), mainly achieved through the concept of deep learning. A large number of cognitive tasks have been performed with deep architectures to accurately recognize faces, objects and scenes.

The recent success has also made it clear that the intrinsic nature of current ANNs is based on uni-task orientation [41]. Moreover, multiple ANNs cannot be easily integrated into a larger network in order to perform multiple tasks, as is probably the case in human brains (see Fig. 1) [32], [7]. It is a well-known fact that the feed-forward and layered topology of ANNs is not a realistic model of how the human brain is structured. This has been the case since the early days of artificial intelligence, when Perceptrons were first introduced. Hence, revisiting the Perceptron as a building block of ANNs appears to be the right focus of any investigation that aims to move toward more capable ANNs. In this paper, we introduce the notion of a subtractive Perceptron (s-Perceptron), a graph-based neural network that uses shared edges between neurons to process input differences in addition to the ordinary inputs. The differences should not only provide more information on the input but also provide the values for additional weights, namely for the edges between the neurons. We test the proposed architecture with the MNIST dataset for digit recognition and critically summarize the findings.

Fig. 1. Motivation: If we define the Perceptron as a graph (blue nodes; to perform one specific task), it may be more easily integrated within a larger graph (gray nodes; responsible for many other tasks).

II. BACKGROUND

ANNs have been under investigation for more than half a century. A fundamental limitation of ANNs is that they are a collection of feed-forward layers of neurons, which is different from the rather graph-like and massively connected topology of the human brain [32]. This layered structure is implemented to simplify learning, allowing gradient descent and its variations to solve the credit assignment problem. However, ANNs can hardly claim to be a realistic imitation of the human brain. The latter is a massively connected network of neurons, approximately 80 × 10^9 neurons with more or less 10^14 connections, with an average of 10,000 connections per neuron. We are all aware of this "model flaw", but there is not much we can do about it, and considering the recent success stories of deep ANNs we may not even have to. In addition, there have been other compelling reasons to stick with feed-forward structures and not investigate graphs as a potential model extension. The most apparent reason is perhaps that training graphs whose neurons have mutual dependencies may be very difficult.


Even breakthroughs such as layer-wise pre-training are based on "restricted" Boltzmann machines, i.e., architectures that eliminate intra-layer connections to convert a general Boltzmann machine into a feed-forward network.

The history of ANNs starts with the introduction of Perceptrons in the 1950s [30], [26], which can be defined as linear classifiers. The class of feed-forward multi-layer networks gained traction after the backpropagation algorithm was conceived and further developed [9], [31]. Up to the late 1990s, multi-layer Perceptrons (MLPs) were intensively investigated, with many applications in classification, clustering and regression [44]. However, it became increasingly clear that MLPs provided low recognition rates for challenging problems when configured with a small number of layers, while increasing the number of layers resulted in intractable training sessions that were unable to deliver results [15]. Alternative architectures such as the Neocognitron [14] were proposed to find a solution, but no capable learning algorithm was available to exploit the new topology.

On the other hand, self-organizing maps were among the first generation of practical ANNs, able to cluster data in an unsupervised manner [20]. The introduction of convolutional neural networks (CNNs) was a major milestone toward a breakthrough [24]. CNNs used multi-resolution representations and, through weight sharing, incorporated the learning of a multitude of filters to extract unique features of input images [4]. This ultimately solved the issue of external and non-discriminating features, yet the training of CNNs remained a challenge. In subsequent years, several ideas changed this: the introduction of restricted Boltzmann machines and layer-wise greedy training enabled researchers to train "deep networks", i.e., CNNs with many hidden layers [16], [5]. The first impressive results of such networks for image recognition started the chain of success stories leading to what we call deep learning [21]. Of course, there may still be cases where a capable non-neural classifier combined with cleverly designed features is the solution [11], but deep networks have drastically restricted the applications of such techniques.

There currently exists a rather large body of literature on "complex brain networks" [36], [37], [6], [10]. The general idea of using "graphs" in connection with neural networks is not new. Kitano used graph notation to generate feed-forward networks and learn via evolutionary algorithms [19]. However, this had no effect on the topology of the network. Generalized random connectivity has been explored to shed light on more complex networks [40]. Modelling the human brain as a collection of interacting networks has recently drawn attention in research as well [29], [38]. This may clarify anatomical and functional connectivity in the brain, and subsequently impact the way we understand and design ANNs. In such networks, which can be implemented as graphs, edges may be either weighted or unweighted [29].

The theories of "small-world and scale-free networks" can also contribute to the development of neuroscience, pathology and the field of AI [38]. Small-world networks lie between regular and random graphs and are structurally close to social networks in that they have higher clustering and almost the same average path length as random networks with the same number of nodes and edges [34]. In contrast, scale-free networks are heterogeneous with respect to degree distribution. They are called scale-free since, when zooming in on any part, one can observe a small but significant number of nodes with many connections, and there is a trailing tail of nodes with very few connections at each level of magnification [22], [43]. So far, few papers have suggested putting this type of brain modelling into any implementation. The main challenge was, and still is, to propose a new "learning" procedure for such complex networks.

In 2009, Scarselli et al. suggested Graph Neural Networks (GNNs) [33]. The authors had applications such as computer vision and molecular biology in mind, but GNNs do in fact use a specific training algorithm which, like traditional architectures, is based on backpropagation. This is because their proposed architecture is still of a "feed-forward" nature; the "graph" aspect of their model was the capability of extending to inputs available in graph form, not the internal topology of the network itself. Similarly, Duvenaud et al. apply deep networks on graphs to learn molecular fingerprints [13]. As well, Niepert et al. use neighbourhood graph construction and subsequent graph normalization to feed data into a convolutional network [27]. Multi-layer Graph Convolutional Networks (GCNs) also use graphs but keep the general feed-forward architecture [18]. In contrast, Bruna et al. recognized that incorporating a "manifold-like graph" structure can create more compact topologies (fewer parameters to adjust) [8]. They propose exploiting the graph's global structure with the spectrum of its graph Laplacian to generalize the convolution operator when convolutional networks are the basis of the design. Hence, their graph understanding is rather implicit and restricted to grids in order to sustain the useful, but limiting, convolutional structure. Qi et al. demonstrated some usability of these networks for image segmentation [28], while Li et al. proposed a gated recurrent version of graph neural networks to process non-sequential data [25].

In summary, although most scientists suggest that the brain is composed of densely connected (graph-like) structures [7], [17], research has primarily focused on ANNs with a series of stacked layers such that the main ANN learning algorithm, gradient descent, can be applied through error backpropagation. Any innovation in the ANN domain thus requires us not only to fully embrace "connected graphs" as the building block for non-layered networks, but also to be prepared to abandon gradient descent and explore unknown territories in search of new learning algorithms that can handle the neuron dependencies that result from intra-layer connections.

A network similar to the brain's topology has to be a graph with a large number of connections. In the next section, the concept of a subtractive Perceptron is introduced. Inputs are projected into a graph and the outputs of neurons are projected into output nodes. We discuss the design challenges and attempt to provide solutions. Additionally, we run preliminary experiments to demonstrate the feasibility of our ideas.

III. SUBTRACTIVE PERCEPTRON

Consider an undirected complete graph G = (V, E) with a set of nodes V and their edges E. This means that for every randomly selected vi and vj (i ≠ j), the edges E(vi, vj) and E(vj, vi) exist and share the same weight (Fig. 2). Such a network can be considered a small sub-graph within a much larger graph (i.e., similar to the brain, see Fig. 1), in which each sub-graph learns a specific task.

Fig. 2. A complete undirected graph G = (V, E) with n = |V| = 20 vertices (neurons/nodes) and |E| = n(n−1)/2 = 190 edges. All edges are assigned a weight that is shared between the corresponding nodes. If the outputs (axons) of neurons are the inputs (dendrites) to other neurons, learning algorithms such as gradient descent may not be able to adjust the weights due to mutual dependencies.

The transition from a traditional Perceptron to its graph version creates several design and algorithmic problems that need to be addressed in order to create a new type of neural network. In the following sub-sections, we explore these challenges and attempt to propose preliminary solutions.

A. Problem 1: Inputs and Outputs

Working on a complete graph G with n neurons, the first question is how to organize the inputs and outputs. As there are no layers, there are no obvious access points for feeding in data and reading out results, something that is obvious and simple in feed-forward layered networks.

Assuming that for n neurons we have n_input input neurons and n_output output neurons, the most straightforward solution would be n = n_input = n_output. If we project all inputs into all neurons and take an output from each neuron, this question can be easily answered. Taking as many neurons as we have inputs and outputs may be convenient, but it becomes computationally challenging for large n. However, it may also contribute to the storage and recognition capacity of the network. In some cases, we may choose to use averaging or a majority vote to decrease the number of outputs, a quite common approach (see Section IV).

The main change compared to a conventional Perceptron is that we allow intra-layer connections. If the outputs of neurons are used as inputs to other neurons, we cannot calculate local gradients, as the mutual dependencies do not allow for gradient-based local adjustments.

B. Problem 2: Neurons

Another problem, perhaps even more severe than the previous one, in defining a complete graph as a new neural network is the characterization of a neuron. Artificial neurons have weighted inputs from the input layer or from other neurons in a previous layer. However, they do not have incoming weighted connections from other neurons of the same layer. As a matter of fact, some of the breakthroughs of recent years were achieved because we excluded exactly this possibility: in Boltzmann machines, by excluding connections between neurons of the same layer, restricted Boltzmann machines were born and enabled us to train deep networks [2]. In a complete graph, however, there is no notion of a "layer". So what does a neuron look like in such a network?

For the sake of distinction, let us call a neuron in a complete graph a paraneuron. A paraneuron has two distinct sets of inputs: 1) directed inputs x_i weighted with corresponding weights w_ij for the i-th input going to the j-th neuron, and 2) bidirectional graph edges with shared weights v_i (which make the paraneurons suitable for building a connected graph), applied to the input differences x_i − x_{i+1} (Fig. 3). Using input differences emerged from practical design constraints: we had to assign weights to the graph edges, but to give them a role beyond a collection of "biases", they needed to be attached to some values other than the raw inputs. Input differences can serve this purpose, since they provide additional information about the behaviour of the inputs; their selection, however, is empirical and based on the general usefulness of "differences" (e.g., using differences to extend genetic algorithms to differential evolution [39]). Given three logistic functions f(·), f_w(·), and f_v(·), the output y_j of the j-th paraneuron is given as

y_j = f\left( f_v\Big( \sum_{i=1}^{n-1} (x_i - x_{i+1})\, v_i \Big) + f_w\Big( \sum_{i=1}^{n} x_i w_{ij} \Big) + w_{j0} \right),   (1)

with the bias w_{j0}. As can be seen in a simple example with only three paraneurons (Fig. 4), all inputs are fed into all paraneurons, and the n edges between the n paraneurons are weighted and shared between connected nodes to process the input differences.
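
To make Equation (1) concrete, the following is a minimal NumPy sketch of a single paraneuron's output. The function itself and the default choice of ReLU for f_v and f_w with an identity outer f (following the choices reported later in Section IV-B) are our own illustrative assumptions, not code from the authors.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def paraneuron_output(x, w_j, v, b_j, f=lambda z: z, f_w=relu, f_v=relu):
    """Illustrative sketch of Eq. (1) for the j-th paraneuron.

    x   : input vector of length n
    w_j : directed weights w_ij for this paraneuron (length n)
    v   : shared edge weights v_i applied to input differences (length n-1)
    b_j : bias w_j0
    """
    diff_term = f_v(np.dot(x[:-1] - x[1:], v))   # sum_{i=1}^{n-1} (x_i - x_{i+1}) v_i
    direct_term = f_w(np.dot(x, w_j))            # sum_{i=1}^{n} x_i w_ij
    return f(diff_term + direct_term + b_j)
```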

C. Problem 3: Learning

The idea of graphs and their connection to neural networks has been around for some time. The major reason for not pursuing complete graphs as the main concept of a neural network implementation is perhaps the immense difficulty of training such networks. The sheer number of weights, and the existence of mutual dependencies, make the training of a complete graph appear impossible, and not just for large n.


Fig. 3. Paraneuron – all inputs are inserted into a processing unit as usual, whereas the bidirectional connections from other neurons are also used to weight the input differences (see Equation 1). The weighting of the differences happens inside the unit.

Fig. 4. An s-Perceptron with 3 inputs and 3 paraneurons.

To train the proposed s-Perceptron with n paraneurons, we propose three key ideas:

1) Adjust the weights for large sub-graphs G′ ⊂ G (with |V′| ≈ |V|) at the beginning of learning and continue by progressively making the sub-graphs smaller until |V′| ≪ |V|;

2) The sub-graphs G′ are selected randomly; and

3) The adjustments are graph-wise, meaning that we choose between the sub-graph G′ and a modified version of it, updating the graph either with the weights of G′ (hence, no change) or with the weights of the modified version (hence, adjustment), depending on the total error of the graph G.

The network error is, as always, the driving force behind the learning. The main design challenge, however, is to define the modification that generates the candidate version of G′; a rough sketch of this scheme is given below.
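
As an illustration of the three ideas above, the following Python sketch shows one possible realization. The callables `total_error` and `propose_modification`, as well as the schedule parameters, are hypothetical placeholders; the paper deliberately leaves the modification operator open.

```python
import random

def train_subgraph_wise(graph, data, total_error, propose_modification,
                        steps=1000, start_frac=0.9, shrink=0.95, min_frac=0.05):
    """Hypothetical sketch of the sub-graph-wise adjustment scheme (Section III-C).

    graph                : object holding all node/edge weights, with a `num_nodes` attribute
    total_error          : callable(graph, data) -> scalar network error
    propose_modification : callable(graph, nodes) -> new graph with the weights of the
                           selected sub-graph modified (left open by the paper)
    """
    frac = start_frac
    for _ in range(steps):
        # 1) start with large sub-graphs G' and shrink them progressively
        k = max(1, int(frac * graph.num_nodes))
        # 2) select the sub-graph G' at random
        nodes = random.sample(range(graph.num_nodes), k)
        candidate = propose_modification(graph, nodes)
        # 3) graph-wise acceptance: keep whichever version yields the lower total error
        if total_error(candidate, data) < total_error(graph, data):
            graph = candidate
        frac = max(min_frac, frac * shrink)
    return graph
```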

IV. EXPERIMENTS

To conduct preliminary tests on the proposed s-Perceptron, we selected digit recognition. Handwritten digit recognition is a well-established application for neural networks. The configuration for digit recognition with the s-Perceptron is illustrated in Fig. 5. The main change for using the s-Perceptron for digit recognition is that we average ⌊n/10⌋ outputs to generate the probability for each digit. Fully connected layers have a large number of parameters and are prone to overfitting. In the s-Perceptron architecture, we avoid using a fully connected layer at the end of the network and instead use global average pooling, which groups adjacent neurons into multiple groups and calculates their average values. The average values from all groups are passed through a softmax function to produce the output class distribution. The averaging operation itself has no parameters and thus tends not to overfit as much as fully connected layers. In fact, many modern CNN architectures, such as DenseNet, SqueezeNet, and the Inception network, use global average pooling instead of fully connected layers.
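
As a small illustration of this output stage, the sketch below groups the n paraneuron outputs into ten groups of ⌊n/10⌋ adjacent outputs, averages each group, and applies a softmax. How the few leftover outputs (n mod 10) are handled is not stated in the paper, so this sketch simply drops them.

```python
import numpy as np

def digit_probabilities(y, num_classes=10):
    """Global-average-pooling head: average floor(n/10) adjacent outputs per digit,
    then apply a softmax to obtain the class distribution."""
    n = y.shape[0]
    group = n // num_classes                       # floor(n/10) outputs per digit
    pooled = y[:group * num_classes].reshape(num_classes, group).mean(axis=1)
    exp = np.exp(pooled - pooled.max())            # numerically stable softmax
    return exp / exp.sum()
```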

A. MNIST Dataset

The MNIST dataset is one of the most popular image processing datasets and contains several thousand handwritten digits [23] (Fig. 6). In particular, there are a total of 70,000 images of size 28 × 28 depicting the digits 0 to 9. The dataset has a preset configuration of 60,000 images for training and 10,000 images for testing. The s-Perceptron, hence, has n = 28 × 28 = 784 paraneurons.
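
For reference, a few lines suffice to obtain this configuration with the TensorFlow library the authors mention; scaling the pixel values to [0, 1] is our own assumption and is not specified in the paper.

```python
import tensorflow as tf

# 60,000 training and 10,000 test images of size 28 x 28 (Section IV-A)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Vectorize each image so that the s-Perceptron has n = 28 * 28 = 784 paraneurons;
# normalizing to [0, 1] is an assumption on our part
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
```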

B. Training via Gradient Descent

The proposed architecture is a graph. However, since no paraneuron's output is used as input to another paraneuron, the gradient descent algorithm is still applicable. The proposed s-Perceptron was first implemented using the TensorFlow [1] library, which supports automatic differentiation of computational graphs. Paraneurons were implemented according to Fig. 3 with a slight modification: an extra term, x_n − x_1, is added to the input differences to simplify the code. This extra term makes the vector of input differences the same length as the input x. With this modification, Equation (1) can be rewritten as a matrix multiplication, y = f(f_v(x_d v) + f_w(x w)), where x_d is the vector of input differences, v and w are weight matrices of size |x| × |x|, and v is symmetric. The symmetry constraint on v is enforced by substituting v with (v + vᵀ)/2. The Rectified Linear Unit (ReLU) is chosen as the activation functions f_v and f_w; for positive inputs this function returns the input itself, otherwise it returns zero. Since the output of a ReLU is non-negative, the outer activation function f can be ignored.
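
A minimal NumPy sketch of this matrix form is given below (a TensorFlow version would look analogous); the variable names are ours, and the bias terms of Equation (1) are omitted for brevity.

```python
import numpy as np

def s_perceptron_forward(x, w, v):
    """Matrix form of the s-Perceptron forward pass (Section IV-B).

    x : input vector of length n (n = 784 for MNIST)
    w : n x n matrix of directed weights
    v : n x n matrix of shared edge weights, symmetrized as (v + v^T)/2
    """
    relu = lambda z: np.maximum(z, 0.0)
    xd = x - np.roll(x, -1)      # differences x_i - x_{i+1}, plus the wrap-around term x_n - x_1
    v_sym = 0.5 * (v + v.T)      # enforce the shared (symmetric) edge weights
    # f_v and f_w are ReLU; the outer f is dropped since the outputs are already non-negative
    return relu(xd @ v_sym) + relu(x @ w)
```

For MNIST, x has length 784 and both w and v are 784 × 784 matrices.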

Fig. 7 shows the accuracy of the model at each iteration during the training phase. The maximum training accuracy achieved by the model is 98.63%, whereas the accuracy reported in the literature for a two-layer MLP on the MNIST dataset is 98.4% [35].


Fig. 5. Using the s-Perceptron for digit recognition: the setup for using an s-Perceptron to recognize digits from the MNIST dataset.

Fig. 6. Examples from the MNIST dataset.

Fig. 7. The training accuracy of the s-Perceptron trained with the gradient descent algorithm.

The higher accuracy of the s-Perceptron demonstrates its learning capacity. The number of trainable parameters in the model is 784 × 784 + (784 × 784)/2 = 921,984. There is a huge literature on the MNIST dataset, with a leader-board reporting all major results (http://yann.lecun.com/exdb/mnist/). Ciresan et al. reported 99.77% accuracy using an ensemble of multi-column deep convolutional neural networks followed by averaging [12]. Wan et al. used DropConnect on a network with 700 × 200 × 10 = 140,000,000 weights and reported a highest accuracy of 99.79% [42]. However, when simpler structures are used, accuracy does not exceed 98.88% (Table 2 in [42]). Such numbers were achieved with two-layer networks with 800 × 800 × 10 = 6,400,000 weights. This is a network 14 times larger than the s-Perceptron.

Critical Summary – Extensions of the historic Perceptron may have the potential to become the building block of a new generation of artificial neural networks. The s-Perceptron, however, suffers from multiple shortcomings that should be addressed in future work:

• The s-Perceptron becomes a realistic graph implementation only when the axons of paraneurons become the dendrites of other paraneurons. In its current form, the s-Perceptron merely has a graph-like topology.

• Vectorizing the image to be used as the input for the s-Perceptron limits the applications to cases where small images are used or where downsampling does not harm accuracy. Employing features, or some type of embedding, instead of vectorized raw pixels could be more effective. This may be regarded as an imitation of the optic nerve transferring the encoded retinal information to the visual cortex.

• Using input differences as inputs for paraneurons seems to be a simple (and arbitrary) remedy to extend Perceptrons to graphs. However, investigations need to assess whether processing the differences is necessary or not.

• The output of the s-Perceptron needs to be customized for each application. This is due to the assumption that all paraneurons do contribute to the output.

• Although we did not test the case where neurons/paraneurons exhibit dependencies, it seems gradient descent would no longer work. Hence, investigating non-gradient-based learning is necessary.

• Graphs do inherently provide more access points for being integrated within a larger network. However, this does not mean that embedding sub-graphs (for small tasks) inside larger graphs (for a multitude of tasks) is straightforward. The research in this regard is still in its infancy.

Beyond gradient descent, we also experimented with new learning algorithms based on opposition [3], where the weights were adjusted by switching between their values and their opposite values while simultaneously shrinking the search interval. However, we could not achieve accuracies higher than 50%.
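
For illustration only, the following is one plausible reading of such an opposition-based update, assuming each weight lives in an interval [lo, hi] whose opposite point is lo + hi − w; the exact procedure used in the experiments is not detailed in the paper.

```python
import numpy as np

def opposition_step(weights, lo, hi, error_fn, shrink=0.9):
    """One hypothetical opposition-based update: try the opposite weights,
    keep whichever set gives the lower network error, then shrink the
    search interval for the next step."""
    opposite = lo + hi - weights                 # opposite values within [lo, hi]
    if error_fn(opposite) < error_fn(weights):
        weights = opposite
    center = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo) * shrink              # shrink the search interval
    return weights, center - half, center + half
```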

V. CONCLUSIONS

In this paper, we introduced the notion of a subtractive Perceptron (s-Perceptron for short), a graph implementation of a Perceptron which, in addition to the existing inputs, also processes the pairwise input differences, weighted by shared edge weights between its neurons, called paraneurons. The s-Perceptron is an undirected graph in which no paraneuron's input comes from another paraneuron, allowing us to investigate a simple graph topology. The motivation for working on graph-based networks is to create a more realistic imitation of the human brain's architecture; a capable s-Perceptron could be the building block for a much larger network (i.e., a brain) that learns one specific task within that larger network. The proposed s-Perceptron delivers good results when trained with the gradient descent algorithm. However, if we extend it to a directed graph, we may have to abandon gradient descent and find other learning algorithms to solve the credit assignment problem.

REFERENCES

[1] M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[2] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9(1):147–169, 1985.

[3] F. S. Al-Qunaieer, H. R. Tizhoosh, and S. Rahnamayan. Opposition based computing – a survey. In The 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–7. IEEE, 2010.

[4] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153–160, 2007.

[6] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4-5):175–308, 2006.

[7] U. Braun, A. Schafer, H. Walter, S. Erk, N. Romanczuk-Seiferth, L. Haddad, J. I. Schweiger, O. Grimm, A. Heinz, H. Tost, et al. Dynamic reconfiguration of frontal brain networks during executive cognition in humans. Proceedings of the National Academy of Sciences, 112(37):11678–11683, 2015.

[8] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203, 2013.

[9] A. Bryson and Y.-C. Ho. Applied Optimal Control. Blaisdell, Waltham, Mass., 1969.

[10] R. L. Buckner. The cerebellum and cognitive function: 25 years of insight from anatomy and neuroimaging. Neuron, 80(3):807–815, 2013.

[11] Z. Camlica, H. R. Tizhoosh, and F. Khalvati. Medical image classification via SVM using LBP features from saliency-based folded data. In 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), pages 128–132. IEEE, 2015.

[12] D. Ciresan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745, 2012.

[13] D. K. Duvenaud, D. Maclaurin, J. Iparraguirre, R. Bombarell, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems, pages 2224–2232, 2015.

[14] K. Fukushima. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1(2):119–130, 1988.

[15] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.

[16] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[17] S. H. Hosseini, F. Hoeft, and S. R. Kesler. GAT: a graph-theoretical analysis toolbox for analyzing between-group differences in large-scale structural and functional brain networks. PLoS ONE, 7(7):e40709, 2012.

[18] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

[19] H. Kitano. Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4(4):461–476, 1990.

[20] T. Kohonen. Self-Organizing Maps. Springer Series in Information Sciences, vol. 30. Springer, 1995.

[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

[22] N. Labs. Types of Networks: Random, Small-World, Scale-Free, 2018.

[23] Y. LeCun. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.

[24] Y. LeCun, Y. Bengio, et al. Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, 3361(10):1995, 1995.

[25] Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493, 2015.

[26] M. Minsky and S. A. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 2017.

[27] M. Niepert, M. Ahmed, and K. Kutzkov. Learning convolutional neural networks for graphs. In International Conference on Machine Learning, pages 2014–2023, 2016.

[28] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun. 3D graph neural networks for RGBD semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5199–5208, 2017.

[29] J. C. Reijneveld, S. C. Ponten, H. W. Berendse, and C. J. Stam. The application of graph theoretical analysis to complex networks in the brain. Clinical Neurophysiology, 118(11):2317–2331, 2007.

[30] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.

[31] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. Technical report, California Univ. San Diego, La Jolla, Inst. for Cognitive Science, 1985.

[32] T. L. Saaty. The Brain: Unraveling the Mystery of How It Works: The Neural Network Process. RWS Publications, 2019.

[33] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61–80, 2009.

[34] D. Simard, L. Nadeau, and H. Kroger. Fastest learning in small-world neural networks. Physics Letters A, 336(1):8–15, 2005.

[35] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In International Conference on Document Analysis and Recognition (ICDAR), page 958. IEEE, 2003.

[36] O. Sporns, D. R. Chialvo, M. Kaiser, and C. C. Hilgetag. Organization, development and function of complex brain networks. Trends in Cognitive Sciences, 8(9):418–425, 2004.

[37] O. Sporns and R. Kotter. Motifs in brain networks. PLoS Biology, 2(11):e369, 2004.

[38] C. J. Stam and J. C. Reijneveld. Graph theoretical analysis of complex networks in the brain. Nonlinear Biomedical Physics, 1(1), Jul 2007.

[39] R. Storn and K. Price. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, 1997.

[40] S. H. Strogatz. Exploring complex networks. Nature, 410:268–276, Mar. 2001.

[41] H. R. Tizhoosh and L. Pantanowitz. Artificial intelligence and digital pathology: Challenges and opportunities. Journal of Pathology Informatics, 9, 2018.

[42] L. Wan, M. Zeiler, S. Zhang, Y. Le Cun, and R. Fergus. Regularization of neural networks using DropConnect. In International Conference on Machine Learning, pages 1058–1066, 2013.

[43] X. F. Wang and G. Chen. Complex networks: small-world, scale-free and beyond. IEEE Circuits and Systems Magazine, 3(1):6–20, 2003.

[44] G. P. Zhang. Neural networks for classification: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 30(4):451–462, Nov 2000.
