
Reservoir Computing Methods: Basics and Recent Advances

Claudio Gallicchio

1

Contact Information
Claudio Gallicchio, Ph.D. - Assistant Professor
Computational Intelligence and Machine Learning Group
http://www.di.unipi.it/groups/ciml/

Dipartimento di Informatica - Universita' di Pisa
Largo Bruno Pontecorvo 3, 56127 Pisa, Italy - Polo Fibonacci

Research on Reservoir Computing
Chair of the IEEE Task Force on RC
https://sites.google.com/view/reservoir-computing-tf/home

web: www.di.unipi.it/~gallicch
email: gallicch@di.unipi.it
tel.: +39 050 2213145

2

Dynamical Recurrent Models
- Neural network architectures with feedback connections are able to deal with temporal data in a natural fashion
- Computation is based on dynamical systems

3

[Figure: dynamical component + feed-forward component]

Recurrent Neural Networks (RNNs)
- Feedbacks allow the representation of the temporal context in the state (neural memory)
- Discrete-time non-autonomous dynamical system
- Potentially, the input history can be maintained for arbitrary periods of time
- Theoretically very powerful: universal approximation through learning

4

Learning with RNNs (recap)
- Universal approximation of RNNs (e.g. SRN, NARX) through learning
- Training algorithms involve some downsides that you already know:
  - Relatively high computational training costs and potentially slow convergence
  - Local minima of the error function (which is generally non-convex)
  - Vanishing gradients and the problem of learning long-term dependencies
    - Alleviated by gated recurrent architectures (although training is made quite complex in this case)

5

Dynamical Recurrent Networks trained easily

Question:
- Is it possible to train RNN architectures more efficiently?
- We can shift the focus from training algorithms to the study of initialization conditions and stability of the input-driven system
- To ensure stability of the dynamical part, we must impose a contractive property on the system dynamics

6

Liquid State Machines
- W. Maass, T. Natschlaeger, H. Markram (2002)
- Originated from the study of biologically inspired spiking neurons
- The liquid should satisfy a pointwise separation property
- Dynamics provided by a pool of spiking neurons with bio-inspired architecture

7

W. Maass, T. Natschlaeger, and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11), 2531–2560 (2002)

Neuron models (from the slide figure): integrate-and-fire, Izhikevich

Fractal Prediction Machines
- P. Tino, G. Dorffner (2001)
- Contractive Iterated Function Systems
- Fractal Analysis

8

Tino, P., Dorffner, G.: Predicting the future of discrete sequences from fractal representations of the past. Machine Learning 45 (2001) 187-218

Echo State Networks
- H. Jaeger (2001)
- Control the spectral properties of the recurrence matrix
- Echo State Property

9

Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001)

Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (2004) 78-80

[Figure: ESN scheme with target signal d(t), output y(t), and error d(t) − y(t)]

Reservoir Computing

10

- Reservoir: untrained non-linear recurrent hidden layer
- Readout: (linear) output layer

- Initialize W_in and Ŵ randomly
- Scale Ŵ to meet the contractive/stability property
- Drive the network with the input signal
- Discard an initial transient
- Train the readout

x(t) = tanh(W_in u(t) + Ŵ x(t−1))
y(t) = W_out x(t)

Verstraeten, David, et al. "An experimental unification of reservoir computing methods." Neural Networks 20.3 (2007): 391-403.
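As a minimal one-step sketch of these two equations in MATLAB/Octave (all names and sizes below are illustrative, not from the slides):

    Nu = 3; Nr = 50; Ny = 1;                   % input, reservoir and output dimensions (assumed)
    Win  = 0.1 * (2*rand(Nr,Nu) - 1);          % input weight matrix W_in
    What = 2*rand(Nr,Nr) - 1;
    What = 0.9 * What / max(abs(eig(What)));   % recurrent matrix Ŵ scaled to spectral radius 0.9
    Wout = zeros(Ny,Nr);                       % readout W_out (to be trained)
    x = zeros(Nr,1);  u_t = rand(Nu,1);        % previous state and current input
    x = tanh(Win*u_t + What*x);                % x(t) = tanh(W_in u(t) + Ŵ x(t-1))
    y = Wout*x;                                % y(t) = W_out x(t)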

11

Echo State Networks

Echo State Network: Architecture

12

[Figure: input space → reservoir state space → output space]

- Reservoir: untrained, large, sparsely connected, non-linear layer
- Readout: trained, linear layer

Echo State Network: Architecture

13

[Figure: input space → reservoir state space → output space]

Reservoir
- Non-linearly embed the input into a higher-dimensional feature space where the original problem is more likely to be solved linearly (Cover's Th.)
- Randomized basis expansion computed by a pool of randomized filters
- Provides a "rich" set of input-driven dynamics
- Contextualizes each new input given the previous state: memory

Efficient: the recurrent part is left completely untrained

Echo State Network: Architecture

14

[Figure: input space → reservoir state space → output space]

Readout
- Computes the features in the reservoir state space for the output computation
- Typically implemented by using linear models

Reservoir: State Computation

15

- The reservoir layer implements the state transition function of the dynamical system
- It is also useful to consider the iterated version of the state transition function: the reservoir state after the presentation of an entire input sequence

Echo State Property (ESP)
A valid ESN should satisfy the "Echo State Property" (ESP)
- Def. An ESN satisfies the ESP whenever:
  - The state of the network asymptotically depends only on the driving input signal
  - Dependencies on the initial conditions are progressively lost
- Equivalent definitions: state contractivity, state forgetting and input forgetting

16

Conditions for the ESP
The ESP can be inspected by controlling the algebraic properties of the recurrent weight matrix Ŵ
- Theorem. If the maximum singular value of Ŵ is less than 1, then the ESN satisfies the ESP.
  - Sufficient condition for the ESP (contractive dynamics for every input)
- Theorem. If the spectral radius of Ŵ is greater than 1, then (under mild assumptions) the ESN does not satisfy the ESP.
  - Necessary condition for the ESP (stable dynamics)
- Recall: the spectral radius is the maximum among the absolute values of the eigenvalues
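A quick numerical check of the two conditions (a MATLAB/Octave sketch; the matrix and its size are illustrative):

    Nr = 100;
    What = 2*rand(Nr,Nr) - 1;          % a candidate recurrent weight matrix Ŵ
    sigma_max = max(svd(What));        % largest singular value of Ŵ
    rho       = max(abs(eig(What)));   % spectral radius of Ŵ
    % sufficient condition (contractive dynamics): sigma_max < 1
    % necessary condition (stable dynamics):       rho < 1   (note: rho <= sigma_max always)
    fprintf('sigma_max = %.3f, rho = %.3f\n', sigma_max, rho);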

17

ESN Initialization: How to set up the Reservoir

18

- Elements in W_in are selected randomly in [−scale_in, scale_in]
- Ŵ initialization procedure:
  - Start with a randomly generated matrix Ŵ_rand
  - Scale Ŵ_rand to meet the condition for the ESP (usually: the necessary one)

ESN Training

19

- Run the network on the whole input sequence and collect the reservoir states
- Discard an initial transient (washout)
- Solve the least squares problem defined by the collected states X and the target outputs Y_target, i.e. find W_out minimizing ‖W_out X − Y_target‖²

Training the Readout

20

- On-line training is not the standard choice for ESNs
  - Least Mean Squares is typically not suitable: high eigenvalue spread (i.e. large condition number) of X
  - Recursive Least Squares is more suitable
- Off-line training is standard in most applications
  - Closed-form solution of the least squares problem by direct methods
    - Moore-Penrose pseudo-inversion
    - Possible regularization using random noise in the states
    - Ridge regression: W_out = Y_target X^T (X X^T + λ_r I)^(−1)
      - λ_r is a regularization coefficient (the higher, the more the readout is regularized)
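A sketch of the two off-line solutions in MATLAB/Octave (the state matrix X, the targets Ytarget, and their sizes are illustrative placeholders):

    Nr = 50; Ny = 1; T = 200;
    X = rand(Nr,T);  Ytarget = rand(Ny,T);      % placeholder reservoir states and targets
    Wout_pinv  = Ytarget * pinv(X);             % Moore-Penrose pseudo-inversion
    lambda_r   = 1e-6;                          % ridge regularization coefficient
    Wout_ridge = Ytarget * X' / (X*X' + lambda_r*eye(Nr));   % ridge regression solution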

Training the Readout /2

21

- Multiple readouts for the same reservoir: solving more than 1 task with the same reservoir dynamics
- Other choices for the readout:
  - Multi-layer Perceptron
  - Support Vector Machine
  - K-Nearest Neighbor
  - ...

ESN - Algorithmic Description: Training
- Initialization
    Win = 2*rand(Nr,Nu) - 1;  Win = scale_in * Win;               % input weights in [-scale_in, scale_in]
    Wh = 2*rand(Nr,Nr) - 1;   Wh = rho * Wh / max(abs(eig(Wh)));  % rescale to spectral radius rho
    state = zeros(Nr,1);  X = [];                                 % X collects the reservoir states column-wise
- Run the reservoir on the input stream
    for t = 1:trainingSteps
        state = tanh(Win * u(:,t) + Wh * state);                  % u is the Nu x trainingSteps input matrix
        X(:,end+1) = state;
    end
- Discard the washout
    X = X(:,Nwashout+1:end);
- Train the readout (ridge regression)
    Wout = Ytarget(:,Nwashout+1:end) * X' * inv(X*X' + lambda_r*eye(Nr));
- The ESN is now ready for operation (estimations/predictions)

22

ESN - Algorithmic Description: Operation Phase
- Run the reservoir on the input stream (test part)
    output = [];
    for t = 1:testSteps
        state = tanh(Win * u(:,t) + Wh * state);
        output(:,end+1) = Wout * state;
    end
- Note: when working on a single time-series you do not need to
  - re-initialize the state
  - discard the initial transient

23

ESN Hyper-parameterization & Model Selection
Implement ESNs following a good practice for model selection (like for any other ML/NN model)
- Careful selection of the network's hyper-parameters:
  - reservoir dimension
  - spectral radius
  - input scaling
  - readout regularization
  - ...

24

ESN Major Architectural Variants

25

- Direct connections from the input to the readout
- Feedback connections from the output to the reservoir
  - might affect the stability of the network's dynamics
  - small values are typically used

ESN for sequence-to-element tasks
- The learning problem requires one single output for each input sequence
- Granularity of the task is on entire sequences (not on time-steps)
  - example: sequence classification

26

[Figure: an input sequence x(1), x(2), x(3), x(4), x(5) unfolding over time is mapped to a single state x(s) and output element y(s)]

For an input sequence s = [u(1), ..., u(N_s)], typical state mapping functions are:
- Last state: x(s) = x(N_s)
- Mean state: x(s) = (1/N_s) Σ_{t=1..N_s} x(t)
- Sum state: x(s) = Σ_{t=1..N_s} x(t)
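A sketch of the three state mapping functions in MATLAB/Octave (X is assumed to collect, column-wise, the reservoir states computed along one input sequence; names and sizes are illustrative):

    Nr = 50; Ns = 30;
    X = rand(Nr,Ns);                 % states x(1) ... x(Ns) for one sequence
    x_last = X(:,end);               % last state mapping:  x(s) = x(Ns)
    x_mean = mean(X,2);              % mean state mapping:  x(s) = (1/Ns) * sum_t x(t)
    x_sum  = sum(X,2);               % sum state mapping:   x(s) = sum_t x(t)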

Leaky Integrator ESN (LI-ESN)

27

- Use leaky integrator reservoir units
- Apply an exponential moving average to reservoir states
  - low-pass filter to better handle input signals that change slowly with respect to the sampling frequency
- The leaking rate parameter a ∈ (0, 1]
  - controls the speed of reservoir dynamics in reaction to the input
  - smaller values imply reservoirs that react more slowly to input changes
  - if a = 1, standard ESN dynamics are obtained
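Written as an update rule (a standard LI-ESN formulation consistent with the bullets above; the exact placement of the leaking rate varies slightly across papers, so treat this as a sketch):

x(t) = (1 − a) x(t−1) + a tanh(W_in u(t) + Ŵ x(t−1))

With a = 1 the moving average disappears and the standard ESN update is recovered; smaller values of a slow down the reservoir dynamics.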

28

Examples of Applications

Applications of ESNs: Examples/1
- ESNs for modeling chaotic time series
- Mackey-Glass time series

29

- for α > 16.8 the system has a chaotic attractor
- most used values are 17 and 30

[Figure: ESN performance on the MG17 task, for varying contraction coefficient and reservoir dimension]
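For reference, the commonly used Mackey-Glass formulation (the specific constants below are the standard benchmark parameterization and are an assumption, not taken from the slide):

dx/dt = 0.2 x(t − α) / (1 + x(t − α)^10) − 0.1 x(t)

where α is the delay: the attractor becomes chaotic for α > 16.8, and the MG17 task uses α = 17.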

Applications of ESNs: Examples/2
- Forecasting of indoor user movements

30

Generalization of predictive performance to unseen environments

Dataset is available online on the UCI repository:
https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data

- Deployed WSN: 4 fixed sensors (anchors) & 1 sensor worn by the user (mobile)
- Predict if the user will change room when she is in position M
- Input: received signal strength (RSS) data from the 4 anchors (10-dimensional vector for each time step, noisy data)
- Target: binary classification (change environmental context or not)

http://fp7rubicon.eu/

Applications of ESNs: Examples/2
- Forecasting of indoor user movements - Input data

31

[Figure: example of the RSS traces gathered from all the 4 anchors in the WSN, for different possible movement paths]

Applications of ESNs: Examples/3
- Human Activity Recognition (HAR) and Localization

32

- Input from heterogeneous sensor sources (data fusion)
- Predicting event occurrence and confidence
- High accuracy of event recognition / indoor localization: >90% on test data
- Effectiveness in learning a variety of HAR tasks
- Effectiveness in training on new events

Applications of ESNs: Examples/4
- Robotics

33

- Indoor localization estimation in a critical environment (Stella Maris Hospital)
- Precise robot localization estimation using noisy RSSI data (35 cm)
- Recalibration in case of environmental alterations or sensor malfunctions
- Input: temporal sequences of RSSI values (10-dimensional vector for each time step, noisy data)
- Target: temporal sequences of laser-based localization (x,y)

Applications of ESNs: Examples/7
- Human Activity Recognition

34

- Classification of human daily activities from RSS data generated by sensors worn by the user
- Input: temporal sequences of RSS values (6-dimensional vector for each time step, noisy data)
- Target: classification of human activity (bending, cycling, lying, sitting, standing, walking)
- Extremely good accuracy (≈ 0.99) and F1 score (≈ 0.96)
- 2nd Prize at the 2013 EvAAL International Competition

Dataset is available online on the UCI repository:
http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29

Applications of ESNs: Examples/8

35

- Autonomous Balance Assessment
  - An unobtrusive automatic system for balance assessment in the elderly
  - Berg Balance Scale (BBS) test: 14 exercises/items (~30 min.)
- Input: stream of pressure data gathered from the 4 corners of the Nintendo Wii Balance Board during the execution of just 1 (over the 14) BBS exercises
- Target: global BBS score of the user (0-56)
- The use of RNNs allows to automatically exploit the richness of the signal dynamics

Applications of ESNs: Examples/8

36

- Autonomous Balance Assessment
- Excellent prediction performance using LI-ESNs
- Practical example of how performance can be improved in a real-world case
  - By an appropriate design of the task, e.g. inclusion of clinical parameters in input
  - By appropriate choices for the network design, e.g. by using a weight-sharing approach on the input-to-reservoir connections

LI-ESN model           | Test MAE (BBS points) | Test R
standard               | 4.80 ± 0.40           | 0.68
+ weight               | 4.62 ± 0.30           | 0.69
LR weight sharing (ws) | 4.03 ± 0.13           | 0.71
ws + weight            | 3.80 ± 0.17           | 0.76

- Very good comparison
  - with related models (MLPs, TDNN, RNNs, NARX, ...)
  - with literature approaches

Applications of ESNs: Examples/9

37

- Phone recognition with reservoir networks
- 2-layered ad-hoc reservoir architecture
  - layers focus on different ranges of frequencies (using appropriate leaky parameters) and on different sub-problems

Triefenbach, Fabian, et al. "Phoneme recognition with large hierarchical reservoirs." Advances in Neural Information Processing Systems. 2010.

Triefenbach, Fabian, et al. "Acoustic modeling with hierarchical reservoirs." IEEE Transactions on Audio, Speech, and Language Processing 21.11 (2013): 2439-2450.

38

Extensions to Structured Data

Learning in Structured Domains

39

[Figure: examples of structured data - sequences (MG chaotic time series prediction), trees (QSPR analysis of alkanes, predicting the boiling point), graphs (Predictive Toxicology Challenge, targets in {−1, +1})]

- In many real-world application domains the information of interest can be naturally represented by means of structured data representations.
- The problems of interest can be modeled as regression or classification tasks on structured domains.

Learning in Structural Domains
- Recursive Neural Networks extend the applicability of RNN methodologies to learning in domains of trees and graphs
- Randomized approaches enable efficient training and state-of-the-art performance
- Echo State Networks extended to discrete structures: Tree and Graph Echo State Networks
- Basic Idea: the reservoir is applied to each node/vertex of the input structure

[C. Gallicchio, A. Micheli, Proceedings of IJCNN 2010] [C. Gallicchio, A. Micheli, Neurocomputing, 2013]

Learning in Structured Domains
- The reservoir operation is generalized from temporal sequences to discrete structures
- State transition systems on discrete tree/graph structures

41

[Figure: Standard ESN vs. Tree ESN vs. Graph ESN]

TreeESN: Reservoir
- Large, sparsely connected, untrained layer of non-linear recursive units
- Input-driven dynamical system on discrete tree structures

42

TreeESN: State Mapping and Readout
- State mapping function for tree-to-element tasks
  - one single output vector (unstructured) is required for each input tree
  - example: document classification
- Readout computation and training is as in the case of standard Reservoir Computing approaches
  - tree-to-tree (isomorphic transductions)
  - tree-to-element

43

[Figure: the state x(t) for a tree is obtained via a Root State Mapping or a Mean State Mapping]

TreeESN: Echo State Property
- The recursive reservoir dynamics can be left untrained provided that a stability property is satisfied
- Tree ESP: asymptotic stability conditions on tree structures
  - for any two initial states, the state computed for the root of the input tree should converge for increasing height of the tree
  - the influence of a perturbation in the label of a node will progressively fade away
- Sufficient condition for the ESP for trees: being contractive
- Necessary condition: being stable

44

[C. Gallicchio, A. Micheli, Information Sciences, 2019]

TreeESN: Efficiency

45

Computational Complexity
Extremely efficient RC approach: only the linear readout parameters are trained

Encoding Process
- For each tree t, the cost scales linearly with the number of nodes and the reservoir dimension (the relevant quantities are the number of nodes, the maximum degree, the degree of connectivity, and the number of reservoir units)
- The same cost for training and test
- Compares well with state-of-the-art methods for trees:
  - RecNNs: extra cost (time + memory) for gradient computations
  - Kernel methods: higher cost of encoding (e.g. quadratic in PT kernels)

Output Computation
- Depends on the method used (e.g. direct using SVD, or iterative)
- The cost of training the linear TreeESN readout is generally inferior to the cost of training MLPs or SVMs (used in RecNNs and Kernels)

Reservoir in Graph Echo State Networks
- Input-driven dynamical system on discrete graphs
- The same reservoir architecture is applied to all the vertices in the structure
- Stability of the state update ensures a solution even in case of cyclic dependencies among state variables

46

[Figure: Standard ESN on a sequence vs. GraphESN, where the reservoir is applied to each vertex v of the input graph]

GraphESN: Echo State Property
- A stability constraint is required to achieve usable dynamics
- Foundational idea: resort to contractive dynamics
  - Contractive dynamics ensure the convergence of the encoding process (Banach Th.)
  - Enables an iterative computation of the encoding process
- Sufficient condition
- Necessary condition

47

Research on Reservoir Computing (in our group)
- Applications to complex real-world tasks: NLP, earthquake time-series, human monitoring, ...
- Gated Reservoir Computing Models
- RC-based analysis of fully trained RNNs
- Unsupervised adaptation of reservoirs
  - edge of stability/chaos
  - Bayesian optimization
- Deep Echo State Networks
  - Advanced mathematical analysis
  - Architectural construction of hierarchical RC
- Deep RC for Structures

48

49

Echo State Network Dynamics

Echo State Property
- Assumption: input and state spaces are compact sets
- A reservoir network whose state update is ruled by x(t) = tanh(W_in u(t) + Ŵ x(t−1)) satisfies the ESP if initial conditions are asymptotically forgotten
- The state dynamics provide a pool of "echoes" of the driving input
- Essentially, this is a stability condition (global asymptotic stability in the sense of Lyapunov)

50

Echo State Property: Stability
- Why is a stable regime so important?
- An unstable network exhibits sensitivity to input perturbations
  - Two slightly different (long) input sequences drive the network into (asymptotically very) different states
- Good for training: the state vectors tend to be more and more linearly separable (for any given task)
- Bad for generalization: overfitting!
  - No generalization ability if a temporal sequence similar to one in the training set drives the network into completely different states

51

ESP: Sufficient Condition
- The sufficient condition for the ESP analyzes the case of contractive dynamics of the state transition function
- Whatever the driving input signal is: if the system is contractive then it will exhibit stability
- In what follows, we assume state transition functions of the form:
  x(t) = F(u(t), x(t−1)) = tanh(W_in u(t) + Ŵ x(t−1))
  where u(t) is the input, x(t−1) the previous state, and x(t) the new state; W_in is the input weight matrix and Ŵ the recurrent weight matrix

52

Contractivity

53

The reservoir state transition function rules the evolution of the corresponding dynamical system
- Def. The reservoir has contractive dynamics whenever its state transition function F is Lipschitz continuous with constant C < 1
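Written out (a standard formalization of the definition above, using a norm ‖·‖ on the state space): F has contractive dynamics with parameter C < 1 if, for every input u and every pair of states x, x',

‖F(u, x) − F(u, x')‖ ≤ C ‖x − x'‖.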

Contractivity and the ESP
- Theorem. If an ESN has a contractive state transition function F (and bounded state space), then it satisfies the Echo State Property
- Assumption: F is contractive with parameter C < 1 (Contractivity)
- Given this condition, we want to show that the ESP holds true, i.e., that the effect of the initial conditions vanishes asymptotically (ESP)

54

Contractivity and the ESP
- Theorem. If an ESN has a contractive state transition function F, then it satisfies the Echo State Property
- Assumption: F is contractive with parameter C < 1
- Proof sketch: iterating F for n steps on the same input sequence from two different initial states x and x', the distance between the resulting states is bounded by C^n ‖x − x'‖, which goes to 0 as n goes to infinity → the ESP holds

55

Contractivity and Reservoir Initialization
- If the reservoir is initialized to implement a contractive mapping, then the ESP is guaranteed (in any norm, for any input)
- Formulation of a sufficient condition for the ESP
- Assumptions:
  - Euclidean distance as metric in the state space (use L2-norm)
  - Reservoir units with tanh activation function (note: squashing nonlinearities bound the state space)
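Under these two assumptions the sufficient condition follows in one line (a sketch, using the update F(u, x) = tanh(W_in u + Ŵ x) and the fact that tanh is 1-Lipschitz):

‖F(u, x) − F(u, x')‖₂ = ‖tanh(W_in u + Ŵ x) − tanh(W_in u + Ŵ x')‖₂ ≤ ‖Ŵ (x − x')‖₂ ≤ σ_max(Ŵ) ‖x − x'‖₂

so requiring the largest singular value σ_max(Ŵ) to be below 1 makes the dynamics contractive, which is exactly the sufficient condition of slide 17.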

56

Markovian Nature of state space organizations
- Contractive dynamical systems are related to suffix-based state space organizations
- States assumed in correspondence of different input sequences sharing a common suffix are close to each other, proportionally to the length of the common suffix
  - similar sequences are mapped to close states
  - different sequences are mapped to different states
  - similarities and dissimilarities are intended in a suffix-based fashion
- RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines

57

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929

Markovian Nature of state space organizations
- Markovian architectural bias of RNNs
  - recurrent weights are typically initialized with small values
  - this leads to a typically contractive initialization of recurrent dynamics
- Iterated Function Systems, fractal theory, architectural bias of RNNs
- RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines
- This characterization is a bias for fully trained RNNs: it holds in the early stages of learning

58

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929

Markovianity and ESNs
- Using dynamical systems with contractive state transition functions (in any norm) implies the Echo State Property (for any input)
- ESNs are featured by fixed contractive dynamics
- Relations with the universality of RC for bounded memory computation (LSM theory)
- ESNs with untrained contractive reservoirs are already able to distinguish input sequences in a suffix-based fashion
- In the RC framework this is no longer a bias, it is a fixed characterization of the RNN model

59

Maass, W., Natschlager, T., Markram, H.: Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14 (2002) 2531-2560

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

Why do Echo State Networks work?
- Because they exploit the Markovian state space organization
- The reservoir constructs a high-dimensional Markovian state space representation of the input history
- Input sequences sharing a common suffix drive the system into close states
  - The states are close to each other proportionally to the length of the common suffix
- A simple output (readout) tool can then be sufficient to separate the different cases

60

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

When do Echo State Networks work?
- When the target matches the Markovian assumption behind the reservoir state space organization
- Markovianity can be used to characterize easy/hard tasks for ESNs

Example: easy task

61

[Figure: example sequences with target values +1, +1, −1, −1]

When do Echo State Networks work?
- When the target matches the Markovian assumption behind the reservoir state space organization
- Markovianity can be used to characterize easy/hard tasks for ESNs

Example: hard task

62

[Figure: example sequences with target values +1, −1, +1, −1, ?]

ESP: Necessary Condition
- Investigating the stability of reservoir dynamics from a dynamical-systems perspective
- Theorem. If an ESN has unstable dynamics around the zero state and the zero sequence is an admissible input, then the ESP is not satisfied.
- Approach this study by linearizing the state transition function (i.e., through its Jacobian matrix)

63


ESP: Necessary Condition
- Linearization around the zero state and for null input
- Remember: x(t) = tanh(W_in u(t) + Ŵ x(t−1))
- The Jacobian with tanh neurons is given below

65

Null input assumption: u(t) = 0 for every t
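The formula itself did not survive the extraction; a standard reconstruction consistent with the tanh update above (an assumption, not copied from the slide) is

J(x, u) = ∂F/∂x = diag(1 − tanh^2(W_in u + Ŵ x)) Ŵ

which, evaluated around the zero state and under the null input assumption, reduces to J(0, 0) = Ŵ.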

ESP: Necessary Condition
- The linearized system now reads: x(t) = Ŵ x(t−1)
- 0 is a fixed point. Is it stable?
- Linear dynamical systems theory tells us that if ρ(Ŵ) < 1 then the fixed point is stable
- Otherwise, 0 is not stable: if we start from a state near 0 and we drive the network with an (infinite-length) null sequence, we do not end up in 0
- The null sequence is then a counter-example: the ESP does not hold!
  - There are at least two different orbits resulting from the same input sequence

66

ESP: Necessary Condition
- A sufficient condition (under our assumptions) for the absence of the ESP is that ρ(Ŵ) ≥ 1
- Hence, a necessary condition for the ESP is that ρ(Ŵ) < 1
- In general: ρ(Ŵ) ≤ ‖Ŵ‖₂
- Typically, in applications:
  - The sufficient condition is often too restrictive
  - The necessary condition for the ESP is often used to scale the recurrent weight matrix (e.g. to a spectral radius of 0.9)
- However, note that scaling the spectral radius below 1 in the presence of driving input is neither sufficient nor necessary for ensuring echo states

67

ESP Index: a concrete perspective
- Driving input is not properly taken into account in conventional reservoir initialization strategies
- Idea: calculate average deviations of reservoir state orbits from different initial conditions and under the influence of the same driving external input

68

ESP Index: a concrete perspective
- The set of configurations that satisfy the ESP in real cases is well beyond those commonly adopted in ESN practice
- A large portion of "good" reservoirs is usually neglected in common practice

69

C. Gallicchio, "Chasing the Echo State Property", ESANN 2019.

70

Advances on Echo State Networks

Research on Echo State Networks
- The research on ESNs follows two major complementary objectives:
  - Study the intrinsic properties of RNNs, setting aside the aspects related to training of the recurrent connections
  - Develop efficiently trained RNNs
- Advances...
  - Theoretical analysis
  - Quality of reservoir dynamics
  - Architectural studies
  - Deep Echo State Networks
  - Reservoir Computing for Structures

71

Echo State Property: Advances
- Much of the theoretical advances in the study of ESNs aims to establish simple conditions for reservoir initialization
- The focus of the theoretical investigations is shifted to the study of the stability constraint
  - Analysis of the system stability given the input
  - Non-autonomous dynamical systems
  - Mean Field Theory
  - Local Lyapunov exponents

72

G. Manjunath and H. Jaeger. Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Computation, 25(3):671–696, 2013.

M. Massar and S. Massar. Mean-field theory of echo state networks. Physical Review E, 87(4):042809, 2013.

G. Wainrib and M.N. Galtier. A local echo state property through the largest Lyapunov exponent. Neural Networks, 76:39–45, 2016.

I.B. Yildiz, H. Jaeger, and S.J. Kiebel. Re-visiting the echo state property. Neural Networks, 35:1–9, 2012.

Applications to Real-world Problems
- Successful applications in several real-world problems:
  - Chaotic time-series modeling
  - Non-linear system identification
  - Speech recognition
  - Financial forecasting
  - Bio-medical applications
  - Robot localization & control
  - ...
- High-dimensional reservoirs are often needed to achieve excellent performance in complex real-world tasks
- Question: can we combine training efficiency with compact (i.e. small size) ESNs?

Quality of Reservoir Dynamics
- How to establish the quality of a reservoir?
  - If I can find a suitable way to characterize how good a reservoir is, I can try to optimize it
- Entropy of recurrent units' activations
  - Unsupervised adaptation of reservoirs using Intrinsic Plasticity
- Study the short-term memory ability of the system
  - Memory Capacity and relations to linearity
- Edge of stability/chaos: reservoir dynamics at the border of stability
  - Recurrent systems close to instability show optimal performances whenever the task at hand requires long short-term memory

74

Short-term Memory Capacity
- An aspect of great importance in the study of dynamical systems is the analysis of their memory abilities
- Jaeger introduced a learning task, called Memory Capacity (MC), to quantify it
- Train individual output units to recall increasingly delayed versions of a univariate i.i.d. input signal
- MC is the sum of the squared correlation coefficients between the delayed signals and the corresponding outputs

75

Jaeger, Herbert. Short term memory in echo state networks. Vol. 5. GMD-Forschungszentrum Informationstechnik, 2001.
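A rough MATLAB/Octave sketch of how MC can be estimated (one ridge readout per delay k, trained and evaluated on the same driven run for simplicity; all names and constants are illustrative):

    Nr = 50; T = 4000; washout = 200; maxDelay = 2*Nr;
    Win  = 0.1*(2*rand(Nr,1) - 1);
    What = 2*rand(Nr,Nr) - 1;  What = 0.9*What/max(abs(eig(What)));
    u = 2*rand(1,T) - 1;                        % univariate i.i.d. input signal
    x = zeros(Nr,1); X = zeros(Nr,T);
    for t = 1:T
        x = tanh(Win*u(t) + What*x);  X(:,t) = x;
    end
    MC = 0;
    for k = 1:maxDelay
        idx  = washout+1:T;
        Xk   = X(:,idx);  dk = u(idx - k);      % target: input delayed by k steps
        wout = dk * Xk' / (Xk*Xk' + 1e-8*eye(Nr));   % ridge readout for delay k
        yk   = wout * Xk;
        c    = corrcoef(yk, dk);  MC = MC + c(1,2)^2;
    end
    fprintf('Estimated MC = %.2f (upper bound N = %d)\n', MC, Nr);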

Short-term Memory Capacity
- Forgetting curves to study the memory structure
- Plot the squared correlation / determination coefficient (Y-axis) with respect to the individual delays (X-axis)

76

[Figure: forgetting curves for linear reservoir units vs. tanh reservoir units]

Short-term Memory Capacity
Some fundamental theoretical results:
- The MC of a network with N recurrent units is upper-bounded by N: MC ≤ N
- It is impossible to train an ESN on tasks which require unbounded-time memory
- Linear reservoirs can achieve the maximum bound
  - e.g. sufficient condition: the matrix [Ŵ w_in, Ŵ^2 w_in, ..., Ŵ^N w_in] has full rank
  - example: unitary recurrent matrices (i.e. orthogonal matrices in the real case)
- Memory versus Non-linearity dilemma
  - Linear reservoirs are featured by longer short-term memories, but non-linear reservoirs are required to solve complex real-world problems...
- Memory Capacity vs Predictive Capacity

77

Architectural Setup
- How to construct "better" reservoirs than just random reservoirs?
- Critical Echo State Networks: relevance of orthogonal recurrence weight matrices (e.g. permutation matrices) [M.A. Hajnal, A. Lorincz, 2006]
- Cyclic Reservoirs, Delay Line Reservoirs [A. Rodan, P. Tino, 2011] [T. Strauss et al., 2012] [J. Boedecker et al., 2009] [M. Cernansky, P. Tino 2008] [A. Rodan, P. Tino 2012]

78

Edge of Stability
- Reservoir dynamical regime close to the transition between stable and unstable dynamics
  - E.g., study of Lyapunov exponents
  - λ = 0 identifies the transition between (locally) stable (λ < 0) and unstable (λ > 0) dynamics

79

[J. Boedecker, O. Obst, J.T. Lizier, N.M. Mayer, and M. Asada. Information processing in echo state networks at the edge of chaos. Theory in Biosciences, 131(3):205–213, 2012.]

Deep Neural Networks

80

Deep Learning is an attractive research area
- Hierarchy of many non-linear models
- Ability to learn data representations at different (higher) levels of abstraction

Deep Neural Networks (Deep NNs)
- Feed-forward hierarchy of multiple hidden layers of non-linear units
- Impressive performance in real-world problems (especially in the cognitive area)
- Remember: deep learning has a strong biological plausibility

Deep Recurrent Neural Networks
Extension of the deep learning methodology to temporal processing. Aim at naturally capturing temporal feature representations at different time-scales
- Text, speech, language processing
- Multi-layered processing of temporal information with feedbacks has a strong biological plausibility

The analysis of deep RNNs is still young
- Deep Reservoir Computing: investigate the actual role of layering in deep recurrent architectures
- Stability: characterize the dynamics of hierarchically organized recurrent models

81

Deep Echo State Network
- What is the intrinsic role of layering in recurrent architectures?
- Develop novel efficient approaches to exploit:
  - Multiple time-scale representations
  - Extreme efficiency of training

82

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

[Figure: Deep ESN architecture, a stack of reservoir layers; the recurrent part is untrained]

Deep RNN Architecture: The Role of Layering

83

Constraints to the architecture of a fully connected RNN, obtained by:
- Removing connections from the input to the higher layers
- Removing connections from higher layers to lower ones
- Removing connections to layers at levels higher than +1 (i.e., no connections that skip more than one layer upward)

Fewer weights to store than in a fully connected RNN

Deep ESN: Output Computation

84

- The readout can modulate the (qualitatively different) temporal features developed at the different layers

Deep ESN: Architecture and Dynamics

85

[Figure: state update equations for the first layer and for the l-th layer (l > 1)]

Each layer has its own:
- leaky integration constant
- input scaling
- spectral radius
- inter-layer scaling


The recurrent part of the system is hierarchically structured. Interestingly, this naturally entails a structure in the developed system dynamics.

* seminar topic
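In formulas, a common DeepESN formulation consistent with the description above (with leaky-integrator units; the per-layer symbols are illustrative and this reconstruction is an assumption, not copied from the slide): for the first layer

x^(1)(t) = (1 − a_1) x^(1)(t−1) + a_1 tanh(W_in^(1) u(t) + Ŵ^(1) x^(1)(t−1))

and for a layer l > 1

x^(l)(t) = (1 − a_l) x^(l)(t−1) + a_l tanh(W_in^(l) x^(l−1)(t) + Ŵ^(l) x^(l)(t−1))

where a_l is the layer's leaky integration constant, W_in^(l) carries the input (or inter-layer) scaling, and Ŵ^(l) is scaled to the layer's spectral radius.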

Intrinsically Richer Dynamics
Layering in RNNs: a convenient way of architectural setup
- Multiple time-scale representations
- Richer dynamics, closer to the edge of stability
- Longer short-term memory

Deep ESN: Hierarchical Temporal Features

88

Structured representation of temporal data through the deep architecture

Empirical Investigations
- Effects of input perturbations last longer in higher layers
- Multiple time-scales representation
- Ordered along the network's hierarchy

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

Deep ESN: Hierarchical Temporal Features

89

Structured representation of temporal data through the deep architecture

Frequency Analysis
- Diversified magnitudes of FFT components
- Multiple frequency representation
- Ordered along the network's hierarchy
- Higher layers tend to focus on lower frequencies
[C. Gallicchio, A. Micheli, L. Pedrelli, WIRN 2017]

[Figure: FFT magnitudes of reservoir states at layer 1, layer 4, layer 7, layer 10]

Deep ESN: Mathematical Background

90

Structured representation of temporal data through the deep architecture

Theoretical Analysis
- Higher layers intrinsically implement less contractive dynamics
- Echo State Property for Deep ESNs
- Deeper networks naturally develop richer dynamics, closer to the edge of stability

[C. Gallicchio, A. Micheli. Cognitive Computation (2017).]
[C. Gallicchio, A. Micheli, L. Silvestri. Neurocomputing 2018.]

Deep ESN: Performance in Applications

91

Formidable trade-off between performance and computational time

C. Gallicchio, A. Micheli, L. Pedrelli, "Comparison between DeepESNs and gated RNNs on multivariate time-series prediction", ESANN 2019.

Deep Tree Echo State Networks

92

- Deep Tree Echo State Networks: untrained multi-layered recursive neural network

Gallicchio, Micheli, IJCNN, 2018. Gallicchio, Micheli, Information Sciences, 2019.

Deep Tree Echo State Networks: Unfolding

93

Deep Tree Echo State Networks: Advantages

94

- Effective in applications
- Extremely efficient
- Layered recursive architecture (compared with the shallow case)

Conclusions
- Reservoir Computing: a paradigm for efficient modeling of RNNs
  - Reservoir: non-linear dynamic component, untrained after contractive initialization
  - Readout: linear feed-forward component, trained
- Easy to implement, fast to train
- Markovian flavour of reservoir state dynamics
- Successful applications
- Recent extensions toward:
  - Deep Learning architectures
  - Structured Domains

95

References
- Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep., GMD - German National Research Institute for Computer Science, 2001.
- Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80.
- Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.
- Lukoševičius, Mantas, and Herbert Jaeger. "Reservoir computing approaches to recurrent neural network training." Computer Science Review 3.3 (2009): 127-149.
- Lukoševičius, Mantas. "A practical guide to applying echo state networks." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 659-686.

96

References
- Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
- Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Design of deep echo state networks." Neural Networks 108 (2018): 33-47.
- Gallicchio, Claudio, and Alessio Micheli. "Deep echo state network (DeepESN): A brief survey." arXiv preprint arXiv:1712.04323 (2017).
- Gallicchio, Claudio, and Alessio Micheli. "Deep Reservoir Neural Networks for Trees." Information Sciences 480 (2019): 174-193.
- Gallicchio, Claudio, and Alessio Micheli. "Graph echo state networks." The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010.

97