
Reservoir Computing Methods: Basics and Recent Advances

Claudio Gallicchio

1

Contact Information
Claudio Gallicchio, Ph.D. - Assistant Professor
Computational Intelligence and Machine Learning Group
http://www.di.unipi.it/groups/ciml/

Dipartimento di Informatica - Universita' di Pisa
Largo Bruno Pontecorvo 3, 56127 Pisa, Italy - Polo Fibonacci

Research on Reservoir Computing
Chair of the IEEE Task Force on RC
https://sites.google.com/view/reservoir-computing-tf/home

web: www.di.unipi.it/~gallicch
email: gallicch@di.unipi.it
tel.: +39 050 2213145

2

Dynamical Recurrent Models
- Neural network architectures with feedback connections are able to deal with temporal data in a natural fashion
- Computation is based on dynamical systems

3

[Figure: dynamical component + feed-forward component]

Recurrent Neural Networks (RNNs)
- Feedbacks allow the representation of the temporal context in the state (neural memory)
- Discrete-time non-autonomous dynamical system
- Potentially, the input history can be maintained for arbitrary periods of time
- Theoretically very powerful: universal approximation through learning

4

Learning with RNNs (recap)
- Universal approximation of RNNs (e.g. SRN, NARX) through learning
- Training algorithms involve some downsides that you already know:
  - Relatively high computational training costs and potentially slow convergence
  - Local minima of the error function (which is generally non-convex)
  - Vanishing gradients and the problem of learning long-term dependencies
    - Alleviated by gated recurrent architectures (although training is made quite complex in this case)

5

Dynamical Recurrent Networks trained easily

Question:
- Is it possible to train RNN architectures more efficiently?
- We can shift the focus from training algorithms to the study of initialization conditions and stability of the input-driven system
- To ensure stability of the dynamical part, we must impose a contractive property on the system dynamics

6

Liquid State Machines
- W. Maass, T. Natschlaeger, H. Markram (2002)
- Originated from the study of biologically inspired spiking neurons
- The liquid should satisfy a pointwise separation property
- Dynamics provided by a pool of spiking neurons with bio-inspired architecture

7

W. Maass, T. Natschlaeger, and H. Markram, Real-time computing without stable states: A new framework for neural computation based on perturbations, Neural Computation 14(11), 2531–2560 (2002)

Neuron models (from the slide figure): integrate-and-fire, Izhikevich

Fractal Prediction Machines
- P. Tino, G. Dorffner (2001)
- Contractive Iterated Function Systems
- Fractal Analysis

8

Tino, P., Dorffner, G.: Predicting the future of discrete sequences from fractal representations of the past. Machine Learning 45 (2001) 187-218

Echo State Networks
- H. Jaeger (2001)
- Control the spectral properties of the recurrence matrix
- Echo State Property

9

Jaeger, H.: The "echo state" approach to analysing and training recurrent neural networks. Technical Report GMD Report 148, German National Research Center for Information Technology (2001)

Jaeger, H., Haas, H.: Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304 (2004) 78-80

[Figure: ESN scheme with target signal d(t), output y(t), and error d(t) − y(t)]

Reservoir Computing

10

- Reservoir: untrained non-linear recurrent hidden layer
- Readout: (linear) output layer

- Initialize W_in and Ŵ randomly
- Scale Ŵ to meet the contractive/stability property
- Drive the network with the input signal
- Discard an initial transient
- Train the readout

x(t) = tanh(W_in u(t) + Ŵ x(t−1))
y(t) = W_out x(t)

Verstraeten, David, et al. "An experimental unification of reservoir computing methods." Neural Networks 20.3 (2007): 391-403.
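As a minimal one-step sketch of these two equations in MATLAB/Octave (all names and sizes below are illustrative, not from the slides):

    Nu = 3; Nr = 50; Ny = 1;                   % input, reservoir and output dimensions (assumed)
    Win  = 0.1 * (2*rand(Nr,Nu) - 1);          % input weight matrix W_in
    What = 2*rand(Nr,Nr) - 1;
    What = 0.9 * What / max(abs(eig(What)));   % recurrent matrix Ŵ scaled to spectral radius 0.9
    Wout = zeros(Ny,Nr);                       % readout W_out (to be trained)
    x = zeros(Nr,1);  u_t = rand(Nu,1);        % previous state and current input
    x = tanh(Win*u_t + What*x);                % x(t) = tanh(W_in u(t) + Ŵ x(t-1))
    y = Wout*x;                                % y(t) = W_out x(t)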

11

Echo State Networks

Echo State Network: Architecture

12

[Figure: input space → reservoir state space → output space]

- Reservoir: untrained, large, sparsely connected, non-linear layer
- Readout: trained, linear layer

Echo State Network: Architecture

13

[Figure: input space → reservoir state space → output space]

Reservoir
- Non-linearly embed the input into a higher-dimensional feature space where the original problem is more likely to be solved linearly (Cover's Th.)
- Randomized basis expansion computed by a pool of randomized filters
- Provides a "rich" set of input-driven dynamics
- Contextualizes each new input given the previous state: memory

Efficient: the recurrent part is left completely untrained

Echo State Network: Architecture

14

[Figure: input space → reservoir state space → output space]

Readout
- Computes the features in the reservoir state space for the output computation
- Typically implemented by using linear models

Reservoir: State Computation

15

- The reservoir layer implements the state transition function of the dynamical system
- It is also useful to consider the iterated version of the state transition function: the reservoir state after the presentation of an entire input sequence

Echo State Property (ESP)
A valid ESN should satisfy the "Echo State Property" (ESP)
- Def. An ESN satisfies the ESP whenever:
  - The state of the network asymptotically depends only on the driving input signal
  - Dependencies on the initial conditions are progressively lost
- Equivalent definitions: state contractivity, state forgetting and input forgetting

16

Conditions for the ESP
The ESP can be inspected by controlling the algebraic properties of the recurrent weight matrix Ŵ
- Theorem. If the maximum singular value of Ŵ is less than 1, then the ESN satisfies the ESP.
  - Sufficient condition for the ESP (contractive dynamics for every input)
- Theorem. If the spectral radius of Ŵ is greater than 1, then (under mild assumptions) the ESN does not satisfy the ESP.
  - Necessary condition for the ESP (stable dynamics)
- Recall: the spectral radius is the maximum among the absolute values of the eigenvalues
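A quick numerical check of the two conditions (a MATLAB/Octave sketch; the matrix and its size are illustrative):

    Nr = 100;
    What = 2*rand(Nr,Nr) - 1;          % a candidate recurrent weight matrix Ŵ
    sigma_max = max(svd(What));        % largest singular value of Ŵ
    rho       = max(abs(eig(What)));   % spectral radius of Ŵ
    % sufficient condition (contractive dynamics): sigma_max < 1
    % necessary condition (stable dynamics):       rho < 1   (note: rho <= sigma_max always)
    fprintf('sigma_max = %.3f, rho = %.3f\n', sigma_max, rho);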

17

ESN Initialization: How to set up the Reservoir

18

- Elements in W_in are selected randomly in [−scale_in, scale_in]
- Ŵ initialization procedure:
  - Start with a randomly generated matrix Ŵ_rand
  - Scale Ŵ_rand to meet the condition for the ESP (usually: the necessary one)

ESN Training

19

- Run the network on the whole input sequence and collect the reservoir states
- Discard an initial transient (washout)
- Solve the least squares problem defined by the collected states X and the target outputs Y_target, i.e. find W_out minimizing ‖W_out X − Y_target‖²

Training the Readout

20

- On-line training is not the standard choice for ESNs
  - Least Mean Squares is typically not suitable: high eigenvalue spread (i.e. large condition number) of X
  - Recursive Least Squares is more suitable
- Off-line training is standard in most applications
  - Closed-form solution of the least squares problem by direct methods
    - Moore-Penrose pseudo-inversion
    - Possible regularization using random noise in the states
    - Ridge regression: W_out = Y_target X^T (X X^T + λ_r I)^(−1)
      - λ_r is a regularization coefficient (the higher, the more the readout is regularized)
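A sketch of the two off-line solutions in MATLAB/Octave (the state matrix X, the targets Ytarget, and their sizes are illustrative placeholders):

    Nr = 50; Ny = 1; T = 200;
    X = rand(Nr,T);  Ytarget = rand(Ny,T);      % placeholder reservoir states and targets
    Wout_pinv  = Ytarget * pinv(X);             % Moore-Penrose pseudo-inversion
    lambda_r   = 1e-6;                          % ridge regularization coefficient
    Wout_ridge = Ytarget * X' / (X*X' + lambda_r*eye(Nr));   % ridge regression solution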

Training the Readout /2

21

- Multiple readouts for the same reservoir: solving more than 1 task with the same reservoir dynamics
- Other choices for the readout:
  - Multi-layer Perceptron
  - Support Vector Machine
  - K-Nearest Neighbor
  - ...

ESN - Algorithmic Description: Training
- Initialization
    Win = 2*rand(Nr,Nu) - 1;  Win = scale_in * Win;               % input weights in [-scale_in, scale_in]
    Wh = 2*rand(Nr,Nr) - 1;   Wh = rho * Wh / max(abs(eig(Wh)));  % rescale to spectral radius rho
    state = zeros(Nr,1);  X = [];                                 % X collects the reservoir states column-wise
- Run the reservoir on the input stream
    for t = 1:trainingSteps
        state = tanh(Win * u(:,t) + Wh * state);                  % u is the Nu x trainingSteps input matrix
        X(:,end+1) = state;
    end
- Discard the washout
    X = X(:,Nwashout+1:end);
- Train the readout (ridge regression)
    Wout = Ytarget(:,Nwashout+1:end) * X' * inv(X*X' + lambda_r*eye(Nr));
- The ESN is now ready for operation (estimations/predictions)

22

ESN - Algorithmic Description: Operation Phase
- Run the reservoir on the input stream (test part)
    output = [];
    for t = 1:testSteps
        state = tanh(Win * u(:,t) + Wh * state);
        output(:,end+1) = Wout * state;
    end
- Note: when working on a single time-series you do not need to
  - re-initialize the state
  - discard the initial transient

23

ESN Hyper-parameterization & Model Selection
Implement ESNs following a good practice for model selection (like for any other ML/NN model)
- Careful selection of the network's hyper-parameters:
  - reservoir dimension
  - spectral radius
  - input scaling
  - readout regularization
  - ...

24

ESN Major Architectural Variants

25

- Direct connections from the input to the readout
- Feedback connections from the output to the reservoir
  - might affect the stability of the network's dynamics
  - small values are typically used

ESN for sequence-to-element tasks
- The learning problem requires one single output for each input sequence
- Granularity of the task is on entire sequences (not on time-steps)
  - example: sequence classification

26

[Figure: an input sequence x(1), x(2), x(3), x(4), x(5) unfolding over time is mapped to a single state x(s) and output element y(s)]

For an input sequence s = [u(1), ..., u(N_s)], typical state mapping functions are:
- Last state: x(s) = x(N_s)
- Mean state: x(s) = (1/N_s) Σ_{t=1..N_s} x(t)
- Sum state: x(s) = Σ_{t=1..N_s} x(t)
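A sketch of the three state mapping functions in MATLAB/Octave (X is assumed to collect, column-wise, the reservoir states computed along one input sequence; names and sizes are illustrative):

    Nr = 50; Ns = 30;
    X = rand(Nr,Ns);                 % states x(1) ... x(Ns) for one sequence
    x_last = X(:,end);               % last state mapping:  x(s) = x(Ns)
    x_mean = mean(X,2);              % mean state mapping:  x(s) = (1/Ns) * sum_t x(t)
    x_sum  = sum(X,2);               % sum state mapping:   x(s) = sum_t x(t)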

Leaky Integrator ESN (LI-ESN)

27

- Use leaky integrator reservoir units
- Apply an exponential moving average to reservoir states
  - low-pass filter to better handle input signals that change slowly with respect to the sampling frequency
- The leaking rate parameter a ∈ (0, 1]
  - controls the speed of reservoir dynamics in reaction to the input
  - smaller values imply reservoirs that react more slowly to input changes
  - if a = 1, standard ESN dynamics are obtained
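Written as an update rule (a standard LI-ESN formulation consistent with the bullets above; the exact placement of the leaking rate varies slightly across papers, so treat this as a sketch):

x(t) = (1 − a) x(t−1) + a tanh(W_in u(t) + Ŵ x(t−1))

With a = 1 the moving average disappears and the standard ESN update is recovered; smaller values of a slow down the reservoir dynamics.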

28

Examples of Applications

Applications of ESNs: Examples/1
- ESNs for modeling chaotic time series
- Mackey-Glass time series

29

- for α > 16.8 the system has a chaotic attractor
- most used values are 17 and 30

[Figure: ESN performance on the MG17 task, for varying contraction coefficient and reservoir dimension]
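For reference, the commonly used Mackey-Glass formulation (the specific constants below are the standard benchmark parameterization and are an assumption, not taken from the slide):

dx/dt = 0.2 x(t − α) / (1 + x(t − α)^10) − 0.1 x(t)

where α is the delay: the attractor becomes chaotic for α > 16.8, and the MG17 task uses α = 17.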

Applications of ESNs: Examples/2
- Forecasting of indoor user movements

30

Generalization of predictive performance to unseen environments

Dataset is available online on the UCI repository:
https://archive.ics.uci.edu/ml/datasets/Indoor+User+Movement+Prediction+from+RSS+data

- Deployed WSN: 4 fixed sensors (anchors) & 1 sensor worn by the user (mobile)
- Predict if the user will change room when she is in position M
- Input: received signal strength (RSS) data from the 4 anchors (10-dimensional vector for each time step, noisy data)
- Target: binary classification (change environmental context or not)

http://fp7rubicon.eu/

Applications of ESNs: Examples/2
- Forecasting of indoor user movements - Input data

31

[Figure: example of the RSS traces gathered from all the 4 anchors in the WSN, for different possible movement paths]

Applications of ESNs: Examples/3
- Human Activity Recognition (HAR) and Localization

32

- Input from heterogeneous sensor sources (data fusion)
- Predicting event occurrence and confidence
- High accuracy of event recognition / indoor localization: >90% on test data
- Effectiveness in learning a variety of HAR tasks
- Effectiveness in training on new events

Applications of ESNs: Examples/4
- Robotics

33

- Indoor localization estimation in a critical environment (Stella Maris Hospital)
- Precise robot localization estimation using noisy RSSI data (35 cm)
- Recalibration in case of environmental alterations or sensor malfunctions
- Input: temporal sequences of RSSI values (10-dimensional vector for each time step, noisy data)
- Target: temporal sequences of laser-based localization (x,y)

Applications of ESNs: Examples/7
- Human Activity Recognition

34

- Classification of human daily activities from RSS data generated by sensors worn by the user
- Input: temporal sequences of RSS values (6-dimensional vector for each time step, noisy data)
- Target: classification of human activity (bending, cycling, lying, sitting, standing, walking)
- Extremely good accuracy (≈ 0.99) and F1 score (≈ 0.96)
- 2nd Prize at the 2013 EvAAL International Competition

Dataset is available online on the UCI repository:
http://archive.ics.uci.edu/ml/datasets/Activity+Recognition+system+based+on+Multisensor+data+fusion+%28AReM%29

Applications of ESNs: Examples/8

35

- Autonomous Balance Assessment
  - An unobtrusive automatic system for balance assessment in the elderly
  - Berg Balance Scale (BBS) test: 14 exercises/items (~30 min.)
- Input: stream of pressure data gathered from the 4 corners of the Nintendo Wii Balance Board during the execution of just 1 (over the 14) BBS exercises
- Target: global BBS score of the user (0-56)
- The use of RNNs allows to automatically exploit the richness of the signal dynamics

Applications of ESNs: Examples/8

36

- Autonomous Balance Assessment
- Excellent prediction performance using LI-ESNs
- Practical example of how performance can be improved in a real-world case
  - By an appropriate design of the task, e.g. inclusion of clinical parameters in input
  - By appropriate choices for the network design, e.g. by using a weight-sharing approach on the input-to-reservoir connections

LI-ESN model           | Test MAE (BBS points) | Test R
standard               | 4.80 ± 0.40           | 0.68
+ weight               | 4.62 ± 0.30           | 0.69
LR weight sharing (ws) | 4.03 ± 0.13           | 0.71
ws + weight            | 3.80 ± 0.17           | 0.76

- Very good comparison
  - with related models (MLPs, TDNN, RNNs, NARX, ...)
  - with literature approaches

Applications of ESNs: Examples/9

37

- Phone recognition with reservoir networks
- 2-layered ad-hoc reservoir architecture
  - layers focus on different ranges of frequencies (using appropriate leaky parameters) and on different sub-problems

Triefenbach, Fabian, et al. "Phoneme recognition with large hierarchical reservoirs." Advances in Neural Information Processing Systems. 2010.

Triefenbach, Fabian, et al. "Acoustic modeling with hierarchical reservoirs." IEEE Transactions on Audio, Speech, and Language Processing 21.11 (2013): 2439-2450.

38

Extensions to Structured Data

Learning in Structured Domains

39

[Figure: examples of structured data - sequences (MG chaotic time series prediction), trees (QSPR analysis of alkanes, predicting the boiling point), graphs (Predictive Toxicology Challenge, targets in {−1, +1})]

- In many real-world application domains the information of interest can be naturally represented by means of structured data representations.
- The problems of interest can be modeled as regression or classification tasks on structured domains.

Learning in Structural Domains
- Recursive Neural Networks extend the applicability of RNN methodologies to learning in domains of trees and graphs
- Randomized approaches enable efficient training and state-of-the-art performance
- Echo State Networks extended to discrete structures: Tree and Graph Echo State Networks
- Basic Idea: the reservoir is applied to each node/vertex of the input structure

[C. Gallicchio, A. Micheli, Proceedings of IJCNN 2010] [C. Gallicchio, A. Micheli, Neurocomputing, 2013]

Learning in Structured Domains
- The reservoir operation is generalized from temporal sequences to discrete structures
- State transition systems on discrete tree/graph structures

41

[Figure: Standard ESN vs. Tree ESN vs. Graph ESN]

TreeESN: Reservoir
- Large, sparsely connected, untrained layer of non-linear recursive units
- Input-driven dynamical system on discrete tree structures

42

TreeESN: State Mapping and Readout
- State mapping function for tree-to-element tasks
  - one single output vector (unstructured) is required for each input tree
  - example: document classification
- Readout computation and training is as in the case of standard Reservoir Computing approaches
  - tree-to-tree (isomorphic transductions)
  - tree-to-element

43

[Figure: the state x(t) for a tree is obtained via a Root State Mapping or a Mean State Mapping]

TreeESN: Echo State Property
- The recursive reservoir dynamics can be left untrained provided that a stability property is satisfied
- Tree ESP: asymptotic stability conditions on tree structures
  - for any two initial states, the state computed for the root of the input tree should converge for increasing height of the tree
  - the influence of a perturbation in the label of a node will progressively fade away
- Sufficient condition for the ESP for trees: being contractive
- Necessary condition: being stable

44

[C. Gallicchio, A. Micheli, Information Sciences, 2019]

TreeESN: Efficiency

45

Computational Complexity
Extremely efficient RC approach: only the linear readout parameters are trained

Encoding Process
- For each tree t, the cost scales linearly with the number of nodes and the reservoir dimension (the relevant quantities are the number of nodes, the maximum degree, the degree of connectivity, and the number of reservoir units)
- The same cost for training and test
- Compares well with state-of-the-art methods for trees:
  - RecNNs: extra cost (time + memory) for gradient computations
  - Kernel methods: higher cost of encoding (e.g. quadratic in PT kernels)

Output Computation
- Depends on the method used (e.g. direct using SVD, or iterative)
- The cost of training the linear TreeESN readout is generally inferior to the cost of training MLPs or SVMs (used in RecNNs and Kernels)

Reservoir in Graph Echo State Networks
- Input-driven dynamical system on discrete graphs
- The same reservoir architecture is applied to all the vertices in the structure
- Stability of the state update ensures a solution even in case of cyclic dependencies among state variables

46

[Figure: Standard ESN on a sequence vs. GraphESN, where the reservoir is applied to each vertex v of the input graph]

GraphESN: Echo State Property
- A stability constraint is required to achieve usable dynamics
- Foundational idea: resort to contractive dynamics
  - Contractive dynamics ensure the convergence of the encoding process (Banach Th.)
  - Enables an iterative computation of the encoding process
- Sufficient condition
- Necessary condition

47

Research on Reservoir Computing (in our group)
- Applications to complex real-world tasks: NLP, earthquake time-series, human monitoring, ...
- Gated Reservoir Computing Models
- RC-based analysis of fully trained RNNs
- Unsupervised adaptation of reservoirs
  - edge of stability/chaos
  - Bayesian optimization
- Deep Echo State Networks
  - Advanced mathematical analysis
  - Architectural construction of hierarchical RC
- Deep RC for Structures

48

49

Echo State Network Dynamics

Echo State Property
- Assumption: input and state spaces are compact sets
- A reservoir network whose state update is ruled by x(t) = tanh(W_in u(t) + Ŵ x(t−1)) satisfies the ESP if initial conditions are asymptotically forgotten
- The state dynamics provide a pool of "echoes" of the driving input
- Essentially, this is a stability condition (global asymptotic stability in the sense of Lyapunov)

50

Echo State Property: Stability
- Why is a stable regime so important?
- An unstable network exhibits sensitivity to input perturbations
  - Two slightly different (long) input sequences drive the network into (asymptotically very) different states
- Good for training: the state vectors tend to be more and more linearly separable (for any given task)
- Bad for generalization: overfitting!
  - No generalization ability if a temporal sequence similar to one in the training set drives the network into completely different states

51

ESP: Sufficient Condition
- The sufficient condition for the ESP analyzes the case of contractive dynamics of the state transition function
- Whatever the driving input signal is: if the system is contractive then it will exhibit stability
- In what follows, we assume state transition functions of the form:
  x(t) = F(u(t), x(t−1)) = tanh(W_in u(t) + Ŵ x(t−1))
  where u(t) is the input, x(t−1) the previous state, and x(t) the new state; W_in is the input weight matrix and Ŵ the recurrent weight matrix

52

Contractivity

53

The reservoir state transition function rules the evolution of the corresponding dynamical system
- Def. The reservoir has contractive dynamics whenever its state transition function F is Lipschitz continuous with constant C < 1
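Written out (a standard formalization of the definition above, using a norm ‖·‖ on the state space): F has contractive dynamics with parameter C < 1 if, for every input u and every pair of states x, x',

‖F(u, x) − F(u, x')‖ ≤ C ‖x − x'‖.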

Contractivity and the ESP
- Theorem. If an ESN has a contractive state transition function F (and bounded state space), then it satisfies the Echo State Property
- Assumption: F is contractive with parameter C < 1 (Contractivity)
- Given this condition, we want to show that the ESP holds true, i.e., that the effect of the initial conditions vanishes asymptotically (ESP)

54

Contractivity and the ESP
- Theorem. If an ESN has a contractive state transition function F, then it satisfies the Echo State Property
- Assumption: F is contractive with parameter C < 1
- Proof sketch: iterating F for n steps on the same input sequence from two different initial states x and x', the distance between the resulting states is bounded by C^n ‖x − x'‖, which goes to 0 as n goes to infinity → the ESP holds

55

Contractivity and Reservoir Initialization
- If the reservoir is initialized to implement a contractive mapping, then the ESP is guaranteed (in any norm, for any input)
- Formulation of a sufficient condition for the ESP
- Assumptions:
  - Euclidean distance as metric in the state space (use L2-norm)
  - Reservoir units with tanh activation function (note: squashing nonlinearities bound the state space)
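Under these two assumptions the sufficient condition follows in one line (a sketch, using the update F(u, x) = tanh(W_in u + Ŵ x) and the fact that tanh is 1-Lipschitz):

‖F(u, x) − F(u, x')‖₂ = ‖tanh(W_in u + Ŵ x) − tanh(W_in u + Ŵ x')‖₂ ≤ ‖Ŵ (x − x')‖₂ ≤ σ_max(Ŵ) ‖x − x'‖₂

so requiring the largest singular value σ_max(Ŵ) to be below 1 makes the dynamics contractive, which is exactly the sufficient condition of slide 17.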

56

Markovian Nature of state space organizations
- Contractive dynamical systems are related to suffix-based state space organizations
- States assumed in correspondence of different input sequences sharing a common suffix are close to each other, proportionally to the length of the common suffix
  - similar sequences are mapped to close states
  - different sequences are mapped to different states
  - similarities and dissimilarities are intended in a suffix-based fashion
- RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines

57

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929

Markovian Nature of state space organizations
- Markovian architectural bias of RNNs
  - recurrent weights are typically initialized with small values
  - this leads to a typically contractive initialization of recurrent dynamics
- Iterated Function Systems, fractal theory, architectural bias of RNNs
- RNNs initialized with small weights (with contractive state transition function) and bounded state space implement (approximate arbitrarily well) definite memory machines
- This characterization is a bias for fully trained RNNs: it holds in the early stages of learning

58

Hammer, B., Tino, P.: Recurrent neural networks with small weights implement definite memory machines. Neural Computation 15 (2003) 1897-1929

Markovianity and ESNs
- Using dynamical systems with contractive state transition functions (in any norm) implies the Echo State Property (for any input)
- ESNs are featured by fixed contractive dynamics
- Relations with the universality of RC for bounded memory computation (LSM theory)
- ESNs with untrained contractive reservoirs are already able to distinguish input sequences in a suffix-based fashion
- In the RC framework this is no longer a bias, it is a fixed characterization of the RNN model

59

Maass, W., Natschlager, T., Markram, H.: Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation 14 (2002) 2531-2560

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

Why do Echo State Networks work?
- Because they exploit the Markovian state space organization
- The reservoir constructs a high-dimensional Markovian state space representation of the input history
- Input sequences sharing a common suffix drive the system into close states
  - The states are close to each other proportionally to the length of the common suffix
- A simple output (readout) tool can then be sufficient to separate the different cases

60

Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.

When do Echo State Networks work?
- When the target matches the Markovian assumption behind the reservoir state space organization
- Markovianity can be used to characterize easy/hard tasks for ESNs

Example: easy task

61

[Figure: example sequences with target values +1, +1, −1, −1]

When do Echo State Networks work?
- When the target matches the Markovian assumption behind the reservoir state space organization
- Markovianity can be used to characterize easy/hard tasks for ESNs

Example: hard task

62

[Figure: example sequences with target values +1, −1, +1, −1, ?]

ESP: Necessary Condition
- Investigating the stability of reservoir dynamics from a dynamical-systems perspective
- Theorem. If an ESN has unstable dynamics around the zero state and the zero sequence is an admissible input, then the ESP is not satisfied.
- Approach this study by linearizing the state transition function (i.e., through its Jacobian matrix)

63


ESP: Necessary Condition
- Linearization around the zero state and for null input
- Remember: x(t) = tanh(W_in u(t) + Ŵ x(t−1))
- The Jacobian with tanh neurons is given below

65

Null input assumption: u(t) = 0 for every t
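The formula itself did not survive the extraction; a standard reconstruction consistent with the tanh update above (an assumption, not copied from the slide) is

J(x, u) = ∂F/∂x = diag(1 − tanh^2(W_in u + Ŵ x)) Ŵ

which, evaluated around the zero state and under the null input assumption, reduces to J(0, 0) = Ŵ.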

ESP: Necessary Condition
- The linearized system now reads: x(t) = Ŵ x(t−1)
- 0 is a fixed point. Is it stable?
- Linear dynamical systems theory tells us that if ρ(Ŵ) < 1 then the fixed point is stable
- Otherwise, 0 is not stable: if we start from a state near 0 and we drive the network with an (infinite-length) null sequence, we do not end up in 0
- The null sequence is then a counter-example: the ESP does not hold!
  - There are at least two different orbits resulting from the same input sequence

66

ESP: Necessary Condition
- A sufficient condition (under our assumptions) for the absence of the ESP is that ρ(Ŵ) ≥ 1
- Hence, a necessary condition for the ESP is that ρ(Ŵ) < 1
- In general: ρ(Ŵ) ≤ ‖Ŵ‖₂
- Typically, in applications:
  - The sufficient condition is often too restrictive
  - The necessary condition for the ESP is often used to scale the recurrent weight matrix (e.g. to a spectral radius of 0.9)
- However, note that scaling the spectral radius below 1 in the presence of driving input is neither sufficient nor necessary for ensuring echo states

67

ESP Index: a concrete perspective
- Driving input is not properly taken into account in conventional reservoir initialization strategies
- Idea: calculate average deviations of reservoir state orbits from different initial conditions and under the influence of the same driving external input

68

ESP Index: a concrete perspective
- The set of configurations that satisfy the ESP in real cases is well beyond those commonly adopted in ESN practice
- A large portion of "good" reservoirs is usually neglected in common practice

69

C. Gallicchio, "Chasing the Echo State Property", ESANN 2019.

70

Advances on Echo State Networks

Research on Echo State Networks
- The research on ESNs follows two major complementary objectives:
  - Study the intrinsic properties of RNNs, setting aside the aspects related to training of the recurrent connections
  - Develop efficiently trained RNNs
- Advances...
  - Theoretical analysis
  - Quality of reservoir dynamics
  - Architectural studies
  - Deep Echo State Networks
  - Reservoir Computing for Structures

71

Echo State Property: Advances
- Much of the theoretical advances in the study of ESNs aims to establish simple conditions for reservoir initialization
- The focus of the theoretical investigations is shifted to the study of the stability constraint
  - Analysis of the system stability given the input
  - Non-autonomous dynamical systems
  - Mean Field Theory
  - Local Lyapunov exponents

72

G. Manjunath and H. Jaeger. Echo state property linked to an input: Exploring a fundamental characteristic of recurrent neural networks. Neural Computation, 25(3):671–696, 2013.

M. Massar and S. Massar. Mean-field theory of echo state networks. Physical Review E, 87(4):042809, 2013.

G. Wainrib and M.N. Galtier. A local echo state property through the largest Lyapunov exponent. Neural Networks, 76:39–45, 2016.

I.B. Yildiz, H. Jaeger, and S.J. Kiebel. Re-visiting the echo state property. Neural Networks, 35:1–9, 2012.

Applications to Real-world Problems
- Successful applications in several real-world problems:
  - Chaotic time-series modeling
  - Non-linear system identification
  - Speech recognition
  - Financial forecasting
  - Bio-medical applications
  - Robot localization & control
  - ...
- High-dimensional reservoirs are often needed to achieve excellent performance in complex real-world tasks
- Question: can we combine training efficiency with compact (i.e. small size) ESNs?

Quality of Reservoir Dynamics
- How to establish the quality of a reservoir?
  - If I can find a suitable way to characterize how good a reservoir is, I can try to optimize it
- Entropy of recurrent units' activations
  - Unsupervised adaptation of reservoirs using Intrinsic Plasticity
- Study the short-term memory ability of the system
  - Memory Capacity and relations to linearity
- Edge of stability/chaos: reservoir dynamics at the border of stability
  - Recurrent systems close to instability show optimal performances whenever the task at hand requires long short-term memory

74

Short-term Memory Capacity
- An aspect of great importance in the study of dynamical systems is the analysis of their memory abilities
- Jaeger introduced a learning task, called Memory Capacity (MC), to quantify it
- Train individual output units to recall increasingly delayed versions of a univariate i.i.d. input signal
- MC is the sum of the squared correlation coefficients between the delayed signals and the corresponding outputs

75

Jaeger, Herbert. Short term memory in echo state networks. Vol. 5. GMD-Forschungszentrum Informationstechnik, 2001.
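A rough MATLAB/Octave sketch of how MC can be estimated (one ridge readout per delay k, trained and evaluated on the same driven run for simplicity; all names and constants are illustrative):

    Nr = 50; T = 4000; washout = 200; maxDelay = 2*Nr;
    Win  = 0.1*(2*rand(Nr,1) - 1);
    What = 2*rand(Nr,Nr) - 1;  What = 0.9*What/max(abs(eig(What)));
    u = 2*rand(1,T) - 1;                        % univariate i.i.d. input signal
    x = zeros(Nr,1); X = zeros(Nr,T);
    for t = 1:T
        x = tanh(Win*u(t) + What*x);  X(:,t) = x;
    end
    MC = 0;
    for k = 1:maxDelay
        idx  = washout+1:T;
        Xk   = X(:,idx);  dk = u(idx - k);      % target: input delayed by k steps
        wout = dk * Xk' / (Xk*Xk' + 1e-8*eye(Nr));   % ridge readout for delay k
        yk   = wout * Xk;
        c    = corrcoef(yk, dk);  MC = MC + c(1,2)^2;
    end
    fprintf('Estimated MC = %.2f (upper bound N = %d)\n', MC, Nr);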

Short-term Memory Capacity
- Forgetting curves to study the memory structure
- Plot the squared correlation / determination coefficient (Y-axis) with respect to the individual delays (X-axis)

76

[Figure: forgetting curves for linear reservoir units vs. tanh reservoir units]

Short-term Memory Capacity
Some fundamental theoretical results:
- The MC of a network with N recurrent units is upper-bounded by N: MC ≤ N
- It is impossible to train an ESN on tasks which require unbounded-time memory
- Linear reservoirs can achieve the maximum bound
  - e.g. sufficient condition: the matrix [Ŵ w_in, Ŵ^2 w_in, ..., Ŵ^N w_in] has full rank
  - example: unitary recurrent matrices (i.e. orthogonal matrices in the real case)
- Memory versus Non-linearity dilemma
  - Linear reservoirs are featured by longer short-term memories, but non-linear reservoirs are required to solve complex real-world problems...
- Memory Capacity vs Predictive Capacity

77

Architectural Setup
- How to construct "better" reservoirs than just random reservoirs?
- Critical Echo State Networks: relevance of orthogonal recurrence weight matrices (e.g. permutation matrices) [M.A. Hajnal, A. Lorincz, 2006]
- Cyclic Reservoirs, Delay Line Reservoirs [A. Rodan, P. Tino, 2011] [T. Strauss et al., 2012] [J. Boedecker et al., 2009] [M. Cernansky, P. Tino 2008] [A. Rodan, P. Tino 2012]

78

Edge of Stability
- Reservoir dynamical regime close to the transition between stable and unstable dynamics
  - E.g., study of Lyapunov exponents
  - λ = 0 identifies the transition between (locally) stable (λ < 0) and unstable (λ > 0) dynamics

79

[J. Boedecker, O. Obst, J.T. Lizier, N.M. Mayer, and M. Asada. Information processing in echo state networks at the edge of chaos. Theory in Biosciences, 131(3):205–213, 2012.]

Deep Neural Networks

80

Deep Learning is an attractive research area
- Hierarchy of many non-linear models
- Ability to learn data representations at different (higher) levels of abstraction

Deep Neural Networks (Deep NNs)
- Feed-forward hierarchy of multiple hidden layers of non-linear units
- Impressive performance in real-world problems (especially in the cognitive area)
- Remember: deep learning has a strong biological plausibility

Deep Recurrent Neural Networks
Extension of the deep learning methodology to temporal processing. Aim at naturally capturing temporal feature representations at different time-scales
- Text, speech, language processing
- Multi-layered processing of temporal information with feedbacks has a strong biological plausibility

The analysis of deep RNNs is still young
- Deep Reservoir Computing: investigate the actual role of layering in deep recurrent architectures
- Stability: characterize the dynamics of hierarchically organized recurrent models

81

Deep Echo State Network
- What is the intrinsic role of layering in recurrent architectures?
- Develop novel efficient approaches to exploit:
  - Multiple time-scale representations
  - Extreme efficiency of training

82

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

[Figure: Deep ESN architecture, a stack of reservoir layers; the recurrent part is untrained]

Deep RNN Architecture: The Role of Layering

83

Constraints to the architecture of a fully connected RNN, obtained by:
- Removing connections from the input to the higher layers
- Removing connections from higher layers to lower ones
- Removing connections to layers at levels higher than +1 (i.e., no connections that skip more than one layer upward)

Fewer weights to store than in a fully connected RNN

Deep ESN: Output Computation

84

- The readout can modulate the (qualitatively different) temporal features developed at the different layers

Deep ESN: Architecture and Dynamics

85

[Figure: state update equations for the first layer and for the l-th layer (l > 1)]

Each layer has its own:
- leaky integration constant
- input scaling
- spectral radius
- inter-layer scaling


The recurrent part of the system is hierarchically structured. Interestingly, this naturally entails a structure in the developed system dynamics.

* seminar topic
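In formulas, a common DeepESN formulation consistent with the description above (with leaky-integrator units; the per-layer symbols are illustrative and this reconstruction is an assumption, not copied from the slide): for the first layer

x^(1)(t) = (1 − a_1) x^(1)(t−1) + a_1 tanh(W_in^(1) u(t) + Ŵ^(1) x^(1)(t−1))

and for a layer l > 1

x^(l)(t) = (1 − a_l) x^(l)(t−1) + a_l tanh(W_in^(l) x^(l−1)(t) + Ŵ^(l) x^(l)(t−1))

where a_l is the layer's leaky integration constant, W_in^(l) carries the input (or inter-layer) scaling, and Ŵ^(l) is scaled to the layer's spectral radius.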

Intrinsically Richer Dynamics
Layering in RNNs: a convenient way of architectural setup
- Multiple time-scale representations
- Richer dynamics, closer to the edge of stability
- Longer short-term memory

Deep ESN: Hierarchical Temporal Features

88

Structured representation of temporal data through the deep architecture

Empirical Investigations
- Effects of input perturbations last longer in higher layers
- Multiple time-scales representation
- Ordered along the network's hierarchy

C. Gallicchio, A. Micheli, L. Pedrelli, "Deep Reservoir Computing: A Critical Experimental Analysis", Neurocomputing, 2017

Deep ESN: Hierarchical Temporal Features

89

Structured representation of temporal data through the deep architecture

Frequency Analysis
- Diversified magnitudes of FFT components
- Multiple frequency representation
- Ordered along the network's hierarchy
- Higher layers tend to focus on lower frequencies
[C. Gallicchio, A. Micheli, L. Pedrelli, WIRN 2017]

[Figure: FFT magnitudes of reservoir states at layer 1, layer 4, layer 7, layer 10]

Deep ESN: Mathematical Background

90

Structured representation of temporal data through the deep architecture

Theoretical Analysis
- Higher layers intrinsically implement less contractive dynamics
- Echo State Property for Deep ESNs
- Deeper networks naturally develop richer dynamics, closer to the edge of stability

[C. Gallicchio, A. Micheli. Cognitive Computation (2017).]
[C. Gallicchio, A. Micheli, L. Silvestri. Neurocomputing 2018.]

Deep ESN: Performance in Applications

91

Formidable trade-off between performance and computational time

C. Gallicchio, A. Micheli, L. Pedrelli, "Comparison between DeepESNs and gated RNNs on multivariate time-series prediction", ESANN 2019.

Deep Tree Echo State Networks

92

- Deep Tree Echo State Networks: untrained multi-layered recursive neural network

Gallicchio, Micheli, IJCNN, 2018. Gallicchio, Micheli, Information Sciences, 2019.

Deep Tree Echo State Networks: Unfolding

93

Deep Tree Echo State Networks: Advantages

94

- Effective in applications
- Extremely efficient
- Layered recursive architecture (compared with the shallow case)

Conclusions
- Reservoir Computing: a paradigm for efficient modeling of RNNs
  - Reservoir: non-linear dynamic component, untrained after contractive initialization
  - Readout: linear feed-forward component, trained
- Easy to implement, fast to train
- Markovian flavour of reservoir state dynamics
- Successful applications
- Recent extensions toward:
  - Deep Learning architectures
  - Structured Domains

95

References
- Jaeger H. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Tech. rep., GMD - German National Research Institute for Computer Science, 2001.
- Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304(5667):78–80.
- Gallicchio C, Micheli A. Architectural and Markovian Factors of Echo State Networks. Neural Networks 2011;24(5):440–456.
- Lukoševičius, Mantas, and Herbert Jaeger. "Reservoir computing approaches to recurrent neural network training." Computer Science Review 3.3 (2009): 127-149.
- Lukoševičius, Mantas. "A practical guide to applying echo state networks." Neural Networks: Tricks of the Trade. Springer Berlin Heidelberg, 2012. 659-686.

96

References
- Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Deep reservoir computing: A critical experimental analysis." Neurocomputing 268 (2017): 87-99.
- Gallicchio, Claudio, Alessio Micheli, and Luca Pedrelli. "Design of deep echo state networks." Neural Networks 108 (2018): 33-47.
- Gallicchio, Claudio, and Alessio Micheli. "Deep echo state network (DeepESN): A brief survey." arXiv preprint arXiv:1712.04323 (2017).
- Gallicchio, Claudio, and Alessio Micheli. "Deep Reservoir Neural Networks for Trees." Information Sciences 480 (2019): 174-193.
- Gallicchio, Claudio, and Alessio Micheli. "Graph echo state networks." The 2010 International Joint Conference on Neural Networks (IJCNN). IEEE, 2010.

97