
    Nonparametric System Identification

Presenting a thorough overview of the theoretical foundations of nonparametric systems identification for nonlinear block-oriented systems, Włodzimierz Greblicki and Mirosław Pawlak show that nonparametric regression can be successfully applied to system identification, and they highlight what you can achieve in doing so.

Starting with the basic ideas behind nonparametric methods, various algorithms for nonlinear block-oriented systems of cascade and parallel forms are discussed in detail. Emphasis is placed on the most popular systems, Hammerstein and Wiener, which have applications in engineering, biology, and financial modeling.

Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated, and the kernel algorithm, its semirecursive versions, and fully recursive modifications are covered. The theories of modern nonparametric regression, approximation, and orthogonal expansions are also provided, as are new approaches to system identification. The authors show how to identify nonlinear subsystems so that their characteristics can be obtained even when little information exists, which is of particular significance for practical application. Detailed information about all the tools used is provided in the appendices.

This book is aimed at researchers and practitioners in systems theory, signal processing, and communications. It will also appeal to researchers in fields such as mechanics, economics, and biology, where experimental data are used to obtain models of systems.

Włodzimierz Greblicki is a professor at the Institute of Computer Engineering, Control, and Robotics at the Wrocław University of Technology, Poland.

Mirosław Pawlak is a professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada. He was awarded his Ph.D. from the Wrocław University of Technology, Poland.

Both authors have published extensively over the years in the area of nonparametric theory and applications.


Nonparametric System Identification

    WŁODZIMIERZ GREBLICKI

    Wrocław University of Technology

    MIROSŁAW PAWLAK 

    University of Manitoba, Canada


CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521868044

© Cambridge University Press 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-40982-0  eBook (NetLibrary)
ISBN-13 978-0-521-86804-4  hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


Contents

Preface

1 Introduction

2 Discrete-time Hammerstein systems
   2.1 The system
   2.2 Nonlinear subsystem
   2.3 Dynamic subsystem identification
   2.4 Bibliographic notes

3 Kernel algorithms
   3.1 Motivation
   3.2 Consistency
   3.3 Applicable kernels
   3.4 Convergence rate
   3.5 The mean-squared error
   3.6 Simulation example
   3.7 Lemmas and proofs
   3.8 Bibliographic notes

4 Semirecursive kernel algorithms
   4.1 Introduction
   4.2 Consistency and convergence rate
   4.3 Simulation example
   4.4 Proofs and lemmas
   4.5 Bibliographic notes

5 Recursive kernel algorithms
   5.1 Introduction
   5.2 Relation to stochastic approximation
   5.3 Consistency and convergence rate
   5.4 Simulation example
   5.5 Auxiliary results, lemmas, and proofs
   5.6 Bibliographic notes

6 Orthogonal series algorithms
   6.1 Introduction
   6.2 Fourier series estimate
   6.3 Legendre series estimate
   6.4 Laguerre series estimate
   6.5 Hermite series estimate
   6.6 Wavelet estimate
   6.7 Local and global errors
   6.8 Simulation example
   6.9 Lemmas and proofs
   6.10 Bibliographic notes

7 Algorithms with ordered observations
   7.1 Introduction
   7.2 Kernel estimates
   7.3 Orthogonal series estimates
   7.4 Lemmas and proofs
   7.5 Bibliographic notes

8 Continuous-time Hammerstein systems
   8.1 Identification problem
   8.2 Kernel algorithm
   8.3 Orthogonal series algorithms
   8.4 Lemmas and proofs
   8.5 Bibliographic notes

9 Discrete-time Wiener systems
   9.1 The system
   9.2 Nonlinear subsystem
   9.3 Dynamic subsystem identification
   9.4 Lemmas
   9.5 Bibliographic notes

10 Kernel and orthogonal series algorithms
   10.1 Kernel algorithms
   10.2 Orthogonal series algorithms
   10.3 Simulation example
   10.4 Lemmas and proofs
   10.5 Bibliographic notes

11 Continuous-time Wiener system
   11.1 Identification problem
   11.2 Nonlinear subsystem
   11.3 Dynamic subsystem
   11.4 Lemmas
   11.5 Bibliographic notes

12 Other block-oriented nonlinear systems
   12.1 Series-parallel, block-oriented systems
   12.2 Block-oriented systems with nonlinear dynamics
   12.3 Concluding remarks
   12.4 Bibliographical notes

13 Multivariate nonlinear block-oriented systems
   13.1 Multivariate nonparametric regression
   13.2 Additive modeling and regression analysis
   13.3 Multivariate systems
   13.4 Concluding remarks
   13.5 Bibliographic notes

14 Semiparametric identification
   14.1 Introduction
   14.2 Semiparametric models
   14.3 Statistical inference for semiparametric models
   14.4 Statistical inference for semiparametric Wiener models
   14.5 Statistical inference for semiparametric Hammerstein models
   14.6 Statistical inference for semiparametric parallel models
   14.7 Direct estimators for semiparametric systems
   14.8 Concluding remarks
   14.9 Auxiliary results, lemmas, and proofs
   14.10 Bibliographical notes

A Convolution and kernel functions
   A.1 Introduction
   A.2 Convergence
   A.3 Applications to probability
   A.4 Lemmas

B Orthogonal functions
   B.1 Introduction
   B.2 Fourier series
   B.3 Legendre series
   B.4 Laguerre series
   B.5 Hermite series
   B.6 Wavelets

C Probability and statistics
   C.1 White noise
   C.2 Convergence of random variables
   C.3 Stochastic approximation
   C.4 Order statistics

References

Index


    To my wife, Helena, and my children, Jerzy, Maria, and Magdalena – WG

    To my parents and family and those whom I love – MP


    Preface

The aim of this book is to show that nonparametric regression can be applied successfully to nonlinear system identification. It gathers what has been done in the area so far and presents the main ideas, results, and some recent developments.

The study of nonparametric regression estimation began with works published by Cencov, Watson, and Nadaraya in the 1960s. The history of nonparametric regression in system identification began about ten years later. Such methods have been applied to the identification of composite systems consisting of nonlinear memoryless systems and linear dynamic ones. Therefore, the approach is strictly connected with the so-called block-oriented methods developed since Narendra and Gallman's work published in 1966. Hammerstein and Wiener structures are the most popular and have received the greatest attention in numerous applications. Fundamental for nonparametric methods is the observation that the unknown characteristic of the nonlinear subsystem, or its inverse, can be represented as a regression function.

In terms of the a priori information, standard identification methods and algorithms work when it is parametric, that is, when our knowledge about the system is rather extensive; for example, when we know that the nonlinear subsystem has a polynomial characteristic. In this book, the information is much smaller, namely nonparametric. The mentioned characteristic can be, for example, any integrable or bounded or, even, any Borel function.

It can thus be said that this book associates block-oriented system identification with nonparametric regression estimation and shows how to identify nonlinear subsystems, that is, to recover their characteristics when the a priori information is small. Because of this, the approach should be of interest not only to researchers but also to people interested in applications.

Chapters 2–7 are devoted to discrete-time Hammerstein systems. Chapter 2 presents a basic discussion of the Hammerstein system and its relationship with the concept of nonparametric regression. The nonparametric kernel algorithm is presented in Chapter 3, its semirecursive versions are examined in Chapter 4, and Chapter 5 deals with fully recursive modifications derived from the idea of stochastic approximation. Next, Chapter 6 is concerned with the nonparametric orthogonal series method. Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated. Some space is devoted to estimation methods based on wavelets. Nonparametric algorithms based on ordered observations are presented and examined in Chapter 7. Chapter 8 discusses the nonparametric algorithms when applied to continuous-time Hammerstein systems.


The Wiener system is identified in Chapters 9–11. Chapter 9 presents the motivation for nonparametric algorithms that are studied in the next two chapters, devoted to the discrete- and continuous-time Wiener systems, respectively. Chapter 12 is concerned with the generalization of our theory to other block-oriented nonlinear systems. This includes, among others, parallel models, cascade-parallel models, sandwich models, and generalized Hammerstein systems possessing local memory. In Chapter 13, the multivariate versions of block-oriented systems are examined. The common problem of multivariate systems, that is, the curse of dimensionality, is cured by using low-dimensional approximations. With respect to this issue, models of the additive form are introduced and examined. In Chapter 14, we develop identification algorithms for a semiparametric class of block-oriented systems. Such systems are characterized by a mixture of finite-dimensional parameters and nonparametric functions, typically a set of univariate functions.

The reader is encouraged to look into the appendices, in which fundamental information about the tools used in the book is presented in detail. Appendix A is strictly related to kernel algorithms, and Appendix B is tied to the orthogonal series nonparametric curve estimates. Appendix C recalls some facts from probability theory and presents results from the theory of order statistics used extensively in Chapter 7.

Over the years, our work has benefited greatly from the advice and support of a number of friends and colleagues with interest in ideas of nonparametric estimation, pattern recognition, and nonlinear system modeling. There are too many names to list here, but special mention is due to Adam Krzyżak, as well as Danuta Rutkowska, Leszek Rutkowski, Alexander Georgiev, Simon Liao, Pradeepa Yahampath, and Yongqing Xin – our past Ph.D. students, now professors at universities in Canada, the United States, and Poland. Cooperation with them has been a great pleasure and given us a lot of satisfaction. We are deeply indebted to Zygmunt Hasiewicz, Ewaryst Rafajłowicz, Uli Stadtmüller, Ewa Rafajłowicz, Hajo Holzmann, and Andrzej Kozek, who have contributed greatly to our research in the area of nonlinear system identification, pattern recognition, and nonparametric inference.

Last, but by no means least, we would like to thank Mount-first Ng for helping us with a number of typesetting problems. Ed Shwedyk and January Gnitecki have provided support for correcting English grammar.

We also thank Anna Littlewood, from Cambridge University Press, for being a very supportive and patient editor. Research presented in this monograph was partially supported by research grants from Wrocław University of Technology, Wrocław, Poland, and NSERC of Canada.

Wrocław, Winnipeg   Włodzimierz Greblicki, Mirosław Pawlak
February 2008


    1   Introduction

System identification, as a particular process of statistical inference, exploits two types of information. The first is experiment; the other, called a priori, is known before making any measurements. In a wide sense, the a priori information concerns the system itself and the signals entering the system. Elements of the information are, for example:

• the nature of the signals, which may be random or nonrandom, white or correlated, stationary or not; their distributions can be known in full or partially (up to some parameters) or completely unknown,
• general information about the system, which can be, for example, continuous or discrete in the time domain, stationary or not,
• the structure of the system, which can be of the Hammerstein or Wiener type, or other,
• the knowledge about subsystems, that is, about nonlinear characteristics and linear dynamics.

In other words, the a priori information is related to the theory of the phenomena taking place in the system (a real physical process), or can be interpreted as a hypothesis (if so, results of the identification should necessarily be validated), or can be abstract in nature.

This book deals with systems consisting of nonlinear memoryless and linear dynamic subsystems, for example, Hammerstein and Wiener systems and other related structures. With respect to them, the a priori information is understood in a narrow sense, because it relates to the subsystems only and concerns the a priori knowledge about their descriptions. We refer to such systems as block-oriented.

The characteristic of the nonlinear subsystem is recovered with the help of nonparametric regression estimates. The kernel and orthogonal series methods are used. Ordered statistics are also applied. Both offline and online algorithms are investigated. We examine only these estimation methods and nonlinear models for which we are able to deliver fundamental results in terms of consistency and convergence rates. There are other techniques, for example, neural networks, which may exhibit a promising performance, but their statistical accuracy is mostly unknown.

For the theory of nonparametric regression, see Efromovich [78], Györfi, Kohler, Krzyżak, and Walk [140], Härdle [150], Prakasa Rao [241], Simonoff [278], or Wand and Jones [310]. Nonparametric wavelet estimates are discussed in Antoniadis and Oppenheim [6]; Härdle, Kerkyacharian, Picard, and Tsybakov [151]; Ogden [223]; and Walter and Shen [308].


Parametric methods are beyond the scope of this book; nevertheless, we mention Brockwell and Davies [33], Ljung [198], Norton [221], Zhu [332], and Söderström and Stoica [280]. Nonlinear system identification within the parametric framework is studied by Nells [218], Westwick and Kearney [316], Marmarelis and Marmarelis [207], Bendat [16], and Mathews and Sicuranza [208]. These books present identification algorithms based mostly on the theory of Wiener and Volterra expansions of nonlinear systems. A comprehensive list of references concerning nonlinear system identification and applications has been given by Giannakis and Serpendin [102]; see also the 2005 special issue on system identification of the IEEE Trans. on Automatic Control [199]. Nonparametric statistical inference for time series is presented in Bosq [26], Fan and Yao [89], and Györfi, Härdle, Sarda, and Vieu [139].

It should be stressed that nonparametric and parametric methods are supposed to be applied in different situations. The first are used when the a priori information is nonparametric, that is, when we wish to recover an infinite-dimensional object with underlying assumptions as weak as possible. Clearly, in such a case, parametric methods can only approximate, but not estimate, the unknown characteristics. When the information is parametric, parametric methods are the natural choice. If, however, the unknown characteristic is a complicated function of parameters, convergence analysis becomes difficult. Moreover, serious computational problems can occur. In such circumstances, one can resort to nonparametric algorithms because, from the computational viewpoint, they are not discouraging. On the contrary, they are simple but consume computer memory, because, for example, kernel estimates require all data to be stored. Nevertheless, it can be said that the two approaches do not compete with each other, since they are designed to be applied in quite different situations. The situations differ from each other by the amount of the a priori information about the identified system. However, a compromise between these two separate worlds can be made by restricting a class of nonparametric models to those that consist of a finite-dimensional parameter and nonlinear characteristics, which run through a nonparametric class of univariate functions. Such semiparametric models can be efficiently identified, and the theory of semiparametric identification is examined in this book. The methodology of semiparametric statistical inference is examined in Härdle, Müller, Sperlich, and Werwatz [152], Ruppert, Wand, and Carroll [259], and Yatchev [329].

For two number sequences $a_n$ and $b_n$, $a_n = O(b_n)$ means that $a_n/b_n$ is bounded in absolute value as $n \to \infty$. In particular, $a_n = O(1)$ denotes that $a_n$ is bounded, that is, that $\sup_n |a_n| < \infty$. Writing $a_n \sim b_n$, we mean that $a_n/b_n$ has a nonzero limit as $n \to \infty$.

Throughout the book, "almost everywhere" means "almost everywhere with respect to the Lebesgue measure," whereas "almost everywhere ($\mu$)" means "almost everywhere with respect to the measure $\mu$."


    2   Discrete-time Hammerstein systems

In this chapter, we discuss some preliminary aspects of the discrete-time Hammerstein system. In Section 2.1 we form the input–output equations of the system. A fundamental relationship between the system nonlinearity and the nonparametric regression is established in Section 2.2. The use of the correlation theory for recovering the linear subsystem is discussed in Section 2.3.

    2.1 The system

A Hammerstein system, shown in Figure 2.1, consists of a nonlinear memoryless subsystem with a characteristic $m(\cdot)$ followed by a linear dynamic one with an impulse response $\{\lambda_n\}$. The output signal $W_n$ of the linear part is disturbed by $Z_n$, and $Y_n = W_n + Z_n$ is the output of the whole system. Neither $V_n$ nor $W_n$ is available to measurement. Our goal is to identify the system, that is, to recover both $m(\cdot)$ and $\{\lambda_n\}$, from observations

$$(U_1, Y_1), (U_2, Y_2), \ldots, (U_n, Y_n), \ldots \quad (2.1)$$

taken at the input and output of the whole system.

Signals coming to the system, that is, the input $\{\ldots, U_{-1}, U_0, U_1, \ldots\}$ and the disturbance $\{\ldots, Z_{-1}, Z_0, Z_1, \ldots\}$, are mutually independent stationary white random signals. The disturbance has zero mean and finite variance, that is, $EZ_n = 0$ and $\mathrm{var}[Z_n] = \sigma_Z^2 < \infty$.

Regarding the nonlinear subsystem, we assume that $m(\cdot)$ is a Borel measurable function. Therefore, $V_n$ is a random variable. The dynamic subsystem is described by the state equation

$$X_{n+1} = AX_n + bV_n, \qquad W_n = c^T X_n, \quad (2.2)$$

where $X_n$ is a state vector at time n, A is a matrix, and b and c are vectors. Thus,

$$\lambda_n = \begin{cases} 0, & \text{for } n = 0, -1, -2, \ldots, \\ c^T A^{n-1} b, & \text{for } n = 1, 2, 3, \ldots, \end{cases}$$

and

$$W_n = \sum_{i=-\infty}^{n} \lambda_{n-i}\, m(U_i). \quad (2.3)$$


Figure 2.1  The discrete-time Hammerstein system: a memoryless nonlinearity $m(\cdot)$ followed by linear dynamics $\{\lambda_n\}$, with output disturbance $Z_n$.

Neither b nor c is known. The matrix A and its dimension are also unknown. Nevertheless, the matrix A is stable: all its eigenvalues lie in the unit circle. Therefore, assuming that

$$Em^2(U) < \infty \quad (2.4)$$

(the time index at U is dropped), we conclude that both $X_n$ as well as $W_n$ are random variables. Clearly, the random processes $\{\ldots, X_{-1}, X_0, X_1, \ldots\}$ and $\{\ldots, W_{-1}, W_0, W_1, \ldots\}$ are stationary. Consequently, the output process $\{\ldots, Y_{-1}, Y_0, Y_1, \ldots\}$ is also a stationary stochastic process. Therefore, the problem is well posed in the sense that all signals are random variables. In the light of this, we estimate both $m(\cdot)$ and $\{\lambda_n\}$ from random observations (2.1).

The restrictions imposed on the signals entering the system and on both subsystems apply whenever the Hammerstein system is concerned. They will not be repeated in further considerations, in neither lemmas nor theorems.

Input random variables $U_n$ may have a probability density, denoted by $f(\cdot)$, or may be distributed quite arbitrarily. Nevertheless, (2.4) holds. It should be emphasized that, apart from a few cases, (2.4) is the only restriction in which the nonlinearity is involved. Assumption (2.4) is irrelevant to identification algorithms and has been imposed for only one reason: to guarantee that both $W_n$ and $Y_n$ are random variables. Nevertheless, it certainly has an influence on the restrictions imposed on both $m(\cdot)$ and the distribution of U to meet (2.4). If, for example, U is bounded, (2.4) is satisfied for any $m(\cdot)$. The restriction also holds if $EU^2 < \infty$ and $|m(u)| \le \alpha + \beta|u|$ with any $\alpha, \beta$. In yet another example, $EU^4 < \infty$ and $|m(u)| \le \alpha + \beta u^2$. For Gaussian U and $|m(u)| \le W(u)$, where W is an arbitrary polynomial, (2.4) is also met. Anyway, the a priori information about the characteristic is nonparametric, because $m(\cdot)$ cannot be represented in a parametric form. This is because the class of all possible characteristics is very wide.

The family of all stable dynamic subsystems also cannot be parameterized, because its order is unknown. Therefore, the a priori information about the impulse response is nonparametric, too. In conclusion, we infer about both subsystems under nonparametric a priori information.

In the following chapters, for simplicity, U, W, Y, and Z stand for $U_n$, $W_n$, $Y_n$, and $Z_n$, respectively.
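To make the setting concrete, here is a minimal simulation sketch of such a system. The cubic characteristic m(u) = u³, the first-order dynamics with pole a, and the noise level are illustrative assumptions, not choices made in the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hammerstein(n, a=0.5, sigma_z=0.1):
    """Simulate n input-output pairs (U_k, Y_k) of a Hammerstein system.

    Illustrative assumptions (not from the book): m(u) = u**3 as the
    memoryless characteristic and a scalar linear block
    W_{k+1} = a*W_k + m(U_k), i.e. lambda_i = a**(i-1) for i >= 1.
    """
    U = rng.uniform(-3.0, 3.0, n)            # i.i.d. (white) input signal
    Z = sigma_z * rng.standard_normal(n)     # white, zero-mean disturbance
    V = U ** 3                               # hidden nonlinear-block output
    W = np.zeros(n)
    for k in range(n - 1):
        W[k + 1] = a * W[k] + V[k]           # state equation (2.2), scalar case
    return U, W + Z                          # Y = W + Z is observed

U, Y = simulate_hammerstein(1000)
```

Only the pairs (U_k, Y_k) are returned; V and W stay hidden, exactly as in the identification problem above.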

    2.2 Nonlinear subsystem

    2.2.1 The problem and the motivation for algorithms

Fix $p \ge 1$ and observe that, since $Y_p = Z_p + \sum_{i=-\infty}^{p} \lambda_{p-i}\, m(U_i)$ and $\{U_n\}$ is a white process,

$$E\{Y_p \mid U_0 = u\} = \mu(u),$$


Figure 2.2  The equivalent Hammerstein system: nonlinearity $\rho(\cdot)$, linear dynamics $\{\lambda_n\}$, internal signal $S_n$, and additional disturbance $\beta$.

where

$$\mu(u) = \lambda_p m(u) + \alpha_p, \qquad \alpha_p = Em(U) \sum_{i=1,\, i\ne p}^{\infty} \lambda_i.$$

Estimating the regression $E\{Y_p \mid U_0 = u\}$, we thus recover $m(\cdot)$ up to some unknown constants $\lambda_p$ and $\alpha_p$. If $Em(U) = 0$, which is the case, for example, when the distribution of U is symmetric with respect to zero and $m(\cdot)$ is an odd function, then $\alpha_p = 0$ and we estimate $m(\cdot)$ only up to the multiplicative constant $\lambda_p$.

Since $Y_{p+n} = \mu(U_n) + \xi_{p+n} + Z_{p+n}$ with $\xi_{p+n} = \sum_{i=-\infty,\, i\ne n}^{p+n} \lambda_{p+n-i}\, m(U_i)$, it can be said that we estimate $\mu(u)$ from the pairs

$$(U_0, Y_p), (U_1, Y_{p+1}), \ldots, (U_n, Y_{p+n}), \ldots,$$

and that the regression $\mu(u)$ is corrupted by the noise $Z_{p+n} + \xi_{p+n}$. The first component of the noise is white with zero mean. Because of the dynamics, the other noise component is correlated. Its mean $E\xi_n = \alpha_p$ is usually nonzero and its variance equals $\mathrm{var}[m(U)] \sum_{i=1,\, i\ne p}^{\infty} \lambda_i^2$. Thus, the main difficulties in the analysis of any estimate of $\mu(\cdot)$ are caused by the correlation of $\{\xi_n\}$, that is, by the system itself, not by the white disturbance $Z_n$ coming from outside.

Every algorithm estimating the nonlinearity in Hammerstein systems studied in this book (the estimate is denoted here as $\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n})$) is linear with respect to the output observations, which means that

$$\hat\mu(U_0, \ldots, U_n; \theta_p + \eta_p, \ldots, \theta_{p+n} + \eta_{p+n}) = \hat\mu(U_0, \ldots, U_n; \theta_p, \ldots, \theta_{p+n}) + \hat\mu(U_0, \ldots, U_n; \eta_p, \ldots, \eta_{p+n}), \quad (2.5)$$

and has the natural property that, for any number $\theta$,

$$\hat\mu(U_0, \ldots, U_n; \theta, \ldots, \theta) \to \theta \ \text{ as } n \to \infty \quad (2.6)$$

in an appropriate stochastic sense. This property, or rather its consequence, is exploited when proving consistency. To explain this, observe that, with respect to $U_n$ and $Y_n$, the identified system shown in Figure 2.1 is equivalent to that in Figure 2.2 with nonlinearity $\rho(u) = m(u) - Em(U)$ and an additional disturbance $\beta = Em(U) \sum_{i=1}^{\infty} \lambda_i$. In the equivalent system, $E\rho(U) = 0$ and $E\{Y_p \mid U_0 = u\} = \mu(u)$. From (2.5) and (2.6), it follows that

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) = \hat\mu(U_0, \ldots, U_n; S_p + \beta, \ldots, S_{p+n} + \beta) = \hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) + \hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta)$$


with $\hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta) \to \beta$ as $n \to \infty$. Hence, if

$$\hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) \to E\{S_p \mid U_0 = u\} \ \text{ as } n \to \infty,$$

we have

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) \to E\{Y_p \mid U_0 = u\} \ \text{ as } n \to \infty,$$

where convergence is understood in the same sense as that in (2.6).

Thus, if the estimate recovers the regression $E\{S_p \mid U_0 = u\}$ from observations

$$(U_0, S_p), (U_1, S_{1+p}), (U_2, S_{2+p}), \ldots,$$

it also recovers $E\{Y_p \mid U_0 = u\}$ from

$$(U_0, Y_p), (U_1, Y_{1+p}), (U_2, Y_{2+p}), \ldots.$$

We can say that if the estimate works properly when applied to the system with input $U_n$ and output $S_n$ (in which $E\rho(U) = 0$), it behaves properly also when applied to the system with input $U_n$ and output $Y_n$ (in which $Em(U)$ may be nonzero).

The result of the reasoning is given in the following remark:

REMARK 2.1  Let an estimate have properties (2.5) and (2.6). If the estimate is consistent for $Em(U) = 0$, then it is consistent for $Em(U) \ne 0$, too.

Owing to the remark, with no loss of generality, in all proofs of consistency of algorithms recovering the nonlinearity, we assume that $Em(U) = 0$.

In parametric problems, the nonlinearity is usually a polynomial $m(u) = \alpha_0 + \alpha_1 u + \cdots + \alpha_q u^q$ of a fixed degree with unknown true values of the parameters $\alpha_0, \ldots, \alpha_q$. Therefore, to apply parametric methods, we must have a great deal more a priori information about the subsystem. It seems that in many applications, it is impossible to represent $m(\cdot)$ in a parametric form.

Since a system with the following ARMA-type difference equation:

$$w_n + a_{k-1} w_{n-1} + \cdots + a_0 w_{n-k} = b_{k-1}\, m(u_{n-1}) + \cdots + b_0\, m(u_{n-k})$$

can be described by (2.2), all the presented methods can be used to recover the nonlinearity $m(\cdot)$ in the above ARMA system.
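For concreteness, a standard companion-form (controller canonical) realization that casts this ARMA-type equation into the state-space form (2.2) is sketched below; this particular coefficient arrangement is one common convention, not a construction spelled out in the book:

$$A = \begin{pmatrix} -a_{k-1} & -a_{k-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad c = \begin{pmatrix} b_{k-1} \\ b_{k-2} \\ \vdots \\ b_0 \end{pmatrix}.$$

With $V_n = m(u_n)$, the first state component $s_n$ then obeys $s_n + a_{k-1} s_{n-1} + \cdots + a_0 s_{n-k} = m(u_{n-1})$, and $w_n = c^T X_n = b_{k-1} s_n + \cdots + b_0 s_{n-k+1}$ reproduces the difference equation above.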

It will be convenient to denote

$$\phi(u) = E\left\{W_p^2 \mid U_0 = u\right\}. \quad (2.7)$$

Since $W_p = \sum_{i=-\infty}^{p-1} \lambda_{p-i}\, m(U_i)$, denoting

$$c_0 = Em^2(U) \sum_{i=1,\, i\ne p}^{\infty} \lambda_i^2 + E^2 m(U) \Big(\sum_{i=1,\, i\ne p}^{\infty} \lambda_i\Big)^2, \qquad c_1 = 2\lambda_p\, Em(U) \sum_{i=1,\, i\ne p}^{\infty} \lambda_i, \qquad c_2 = \lambda_p^2,$$

we find

$$\phi(u) = c_0 + c_1 m(u) + c_2 m^2(u).$$

To avoid complicated notation, we do not denote explicitly the dependence of the estimated regression and other functions on p, and simply write $\mu(\cdot)$ and $\phi(\cdot)$.

Results presented in further chapters can be easily generalized to the system shown in Figure 2.3, where $\{\ldots, \xi_0, \xi_1, \xi_2, \ldots\}$ is another zero-mean noise.


Figure 2.3  Possible generalization of the system shown in Figure 2.1, with an additional noise $\xi_n$.

Moreover, $\{Z_n\}$ can be correlated; that is, it can be the output of a stable linear dynamic system stimulated by white random noise. So can $\{\xi_n\}$.

It is worth noting that the class of stochastic processes generated by the output process $\{Y_n\}$ of the Hammerstein system is different from the class of strong mixing processes considered extensively in the statistical literature concerning nonparametric inference from dependent data; see, for example, [26] and [89]. Indeed, the ARMA process $\{X_n\}$ in which $X_{n+1} = aX_n + V_n$, where $0$ …


… cloud of 200 input–output observations we infer from is presented in Figure 2.4. The quality of each estimate, denoted here by $\hat m(u)$, is measured with

$$\mathrm{MISE} = \int_{-3}^{3} \left(\hat m(u) - m(u)\right)^2 du.$$

    2.3 Dynamic subsystem identification

Passing to the dynamic subsystem, we use (2.3) and recall $EZ_n = 0$ to notice that

$$E\{Y_i U_0\} = \sum_{j=-\infty}^{i} \lambda_{i-j}\, E\{m(U_j) U_0\} = \lambda_i\, E\{m(U) U\}.$$

Denoting $\kappa_i = \lambda_i\, E\{U m(U)\}$, we obtain

$$\kappa_i = E\{Y_i U_0\},$$

which can be estimated in the following way:

$$\hat\kappa_i = \frac{1}{n} \sum_{j=1}^{n} Y_{i+j} U_j.$$
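As an illustration, $\hat\kappa_i$ is a few lines of code; the sketch below assumes arrays U, Y such as those produced by the hypothetical simulation in Section 2.1 and truncates the sum to the lagged pairs available in a finite record.

```python
import numpy as np

def kappa_hat(U, Y, i):
    """Cross-correlation estimate kappa_hat_i = (1/n) sum_j Y_{i+j} U_j
    of kappa_i = lambda_i * E{U m(U)}, truncated to the available pairs."""
    n = len(U)
    return float(np.dot(Y[i:], U[:n - i])) / n

# kappa_hat_1, kappa_hat_2, ... estimate the impulse response {lambda_i}
# up to the unknown constant tau = E{U m(U)}.
impulse = [kappa_hat(U, Y, i) for i in range(1, 21)]
```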

THEOREM 2.1  For any i,

$$\lim_{n\to\infty} E(\hat\kappa_i - \kappa_i)^2 = 0.$$

Proof.  The estimate is unbiased, that is, $E\hat\kappa_i = E\{Y_i U_0\} = \kappa_i$. Moreover, $\mathrm{var}[\hat\kappa_i] = P_n + Q_n + R_n$ with

$$P_n = \frac{1}{n^2}\, \mathrm{var}\Big[\sum_{j=1}^{n} Z_{i+j} U_j\Big] = \frac{1}{n^2} \sum_{j=1}^{n} \mathrm{var}[Z_{i+j} U_j] = \frac{1}{n}\, \sigma_Z^2\, EU^2,$$

$$Q_n = \frac{1}{n}\, \mathrm{var}[W_i U_0],$$

and

$$R_n = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{m=1,\, m\ne j}^{n} \mathrm{cov}\big[W_{i+j} U_j,\, W_{i+m} U_m\big] = \frac{1}{n^2} \sum_{j=1}^{n} (n-j)\, \mathrm{cov}\big[W_{i+j} U_j,\, W_i U_0\big].$$

Since $W_i = \sum_{j=-\infty}^{i} \lambda_{i-j}\, m(U_j)$, we get $Q_n = n^{-1} \lambda_i^2\, \mathrm{var}[m(U) U]$. For the same reason, for $j > 0$,

$$\mathrm{cov}\big[W_{i+j} U_j,\, W_i U_0\big] = \sum_{p=-\infty}^{i+j} \sum_{q=-\infty}^{i} \lambda_{i+j-p} \lambda_{i-q}\, \mathrm{cov}\big[m(U_p) U_j,\, m(U_q) U_0\big] = E^2\{U m(U)\}\, \lambda_{i+j} \lambda_{i-j};$$


see Lemma C.3 in Appendix C. This leads to

$$|R_n| \le \frac{1}{n^2}\, E^2\{U m(U)\} \sum_{j=1}^{n} (n-j)\, |\lambda_{i+j} \lambda_{i-j}| \le \frac{1}{n}\, E^2\{U m(U)\}\, \max_s |\lambda_s| \sum_{j=1}^{\infty} |\lambda_j|.$$

Thus,

$$E(\hat\kappa_i - \kappa_i)^2 = \mathrm{var}[\hat\kappa_i] = O\!\left(\frac{1}{n}\right), \quad (2.8)$$

which completes the proof.

The theorem establishes convergence of the local error $E(\hat\kappa_i - \kappa_i)^2$ to zero as $n \to \infty$. As an estimate of the whole impulse response $\{\kappa_1, \kappa_2, \kappa_3, \ldots\}$, we take the sequence $\{\hat\kappa_1, \hat\kappa_2, \ldots, \hat\kappa_{N(n)}, 0, 0, \ldots\}$ and find that the mean summed square error (MSSE) equals

$$\mathrm{MSSE}(\hat\kappa) = \sum_{i=1}^{N(n)} E(\hat\kappa_i - \kappa_i)^2 + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

From (2.8), it follows that the error is not greater than a constant multiple of

$$\frac{N(n)}{n} + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

Therefore, if $N(n) \to \infty$ and $N(n)/n \to 0$ as $n \to \infty$,

$$\lim_{n\to\infty} \mathrm{MSSE}(\hat\kappa) = 0.$$

The identity $\lambda_s \tau = E\{Y_s U_0\}$, where $\tau = E\{U m(U)\}$, allows us to form a nonparametric estimate of the linear subsystem in the frequency domain. Indeed, taking the Fourier transform of the identity yields

$$\Lambda(\omega)\,\tau = S_{YU}(\omega), \quad |\omega| \le \pi, \quad (2.9)$$

where $S_{YU}(\omega) = \sum_{s=-\infty}^{\infty} \kappa_s e^{-is\omega}$ is the cross-spectral density function of the processes $\{Y_n\}$ and $\{U_n\}$. Moreover,

$$\Lambda(\omega) = \sum_{s=0}^{\infty} \lambda_s e^{-is\omega}$$

is the transfer function of the linear subsystem. Note also that if $\lambda_0 = 1$, then $\tau = \kappa_0$. See Chapter 12 for further discussion on the frequency-domain identification of linear systems.
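To illustrate (2.9) numerically, one can plug the empirical $\hat\kappa_s$ into a truncated version of the series defining $S_{YU}(\omega)$; the truncation lag N and the reuse of kappa_hat from the previous sketch are assumptions of convenience, not the book's prescription.

```python
import numpy as np

def cross_spectrum(U, Y, omegas, N=20):
    """Estimate S_YU(omega) = sum_s kappa_s e^{-is omega} by truncating
    the series at lag N; lags s <= 0 are dropped since lambda_s = 0 there."""
    k = [kappa_hat(U, Y, s) for s in range(1, N + 1)]
    return np.array([sum(ks * np.exp(-1j * s * w)
                         for s, ks in enumerate(k, start=1))
                     for w in omegas])

omegas = np.linspace(-np.pi, np.pi, 256)
S_YU = cross_spectrum(U, Y, omegas)   # equals Lambda(omega)*tau up to noise
```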

    2.4 Bibliographic notes

Various aspects of parametric identification algorithms of discrete-time Hammerstein systems have been studied by Narendra and Gallman [216]; Haist, Chang, and Luus [142]; Thatchachar and Ramaswamy [289]; Kaminskas [175]; Gallman [92]; Billings [19]; Billings and Fakhouri [20,24]; Shih and Kung [276]; Kung and Shih [190]; Liao and Sethares [195]; Verhaegen and Westwick [301]; Giri, Chaoui, and Rochidi [103]; Ninness and Gibson [220]; Bai [11,12]; and Vörös [305]. The analysis of block-oriented systems and, in particular, Hammerstein ones, useful for various aspects of identification and its applications, can be found in Bendat [16], Chen [45], Marmarelis and Marmarelis [207], Mathews and Sicuranza [208], Nells [218], and Westwick and Kearney [316].

Sometimes results concerning Hammerstein systems are given, however not explicitly, in works devoted to more complicated Hammerstein–Wiener or Wiener–Hammerstein structures; see, for example, Gardiner [94]; Billings and Fakhouri [22,23]; Fakhouri, Billlings, and Wormald [86]; Hunter and Korenberg [168]; Korenberg and Hunter [177]; Emara-ShaBaik, Moustafa, and Talaq [79]; Boutayeb and Darouach [27]; Vandersteen, Rolain, and Schoukens [296]; Bai [10]; Bershad, Celka, and McLaughlin [18]; and Zhu [333].

The nonparametric approach offers a number of algorithms to recover the characteristics of the nonlinear subsystem. The most popular kernel estimate can be used in the offline version, see Chapter 3. For semirecursive and fully recursive forms, see Chapter 4 and Chapter 5, respectively. Nonparametric orthogonal series identification algorithms, see Chapter 6, utilize trigonometric, Legendre, Laguerre, and Hermite functions, or wavelets. Both classes of estimates can be modified to use ordered input observations (see Chapter 7), which makes them insensitive to the roughness of the input density.

The Hammerstein model has been used in various and diverse areas. Eskinat, Johnson, and Luyben [82] applied it to describe processes in distillation columns and heat exchangers. The hysteresis phenomenon in ferrites was analyzed by Hsu and Ngo [166], pH processes were analyzed by Patwardhan, Lakshminarayanan, and Shah [227], biological systems were studied by Hunter and Korenberg [168], and Emerson, Korenberg, and Citron [80] described some neuronal processes. The use of the Hammerstein model for modeling aspects of financial volatility processes is presented in Capobianco [38]. In Giannakis and Serpendin [102], a comprehensive bibliography on nonlinear system identification is given; see also the 2005 special issue on system identification of the IEEE Trans. on Automatic Control [199].

It is also worth noting that the concept of the Hammerstein model originates from the theory of nonlinear integral equations developed by Hammerstein in 1930 [148], see also Tricomi [292].


    3   Kernel algorithms

The kernel algorithm is just the kernel estimate of a regression function. This is the most popular nonparametric estimation method and is very convenient from the computational viewpoint. In Section 3.1, an intuitive motivation for the algorithm is presented, and in Section 3.2, its pointwise consistency is shown. Some results hold for any input signal density, that is, are density-free; some are even distribution-free, that is, they hold for any distribution of the input signal. In Section 3.3, the attention is focused on a class of applicable kernel functions. The convergence rate is studied in Section 3.4.

    3.1 Motivation

It is obvious that

$$\lim_{h\to 0} \frac{1}{2h} \int_{u-h}^{u+h} \mu(v) f(v)\, dv = \mu(u) f(u)$$

at every continuity point $u \in R$ of both $m(\cdot)$ and $f(\cdot)$, since $\mu(u) = \lambda_p m(u) + \alpha_p$. The formula can be rewritten in the following form:

$$\lim_{h\to 0} \int \mu(v) f(v)\, \frac{1}{h} K\!\left(\frac{u-v}{h}\right) dv = \mu(u) f(u), \quad (3.1)$$

where

$$K(u) = \begin{cases} \frac{1}{2}, & \text{for } |u| \le 1, \\ 0, & \text{otherwise,} \end{cases} \quad (3.2)$$

is the rectangular kernel.


Figure 3.1  Rectangular kernel (3.2).

As $h \to 0$, the function $\frac{1}{h} K\!\left(\frac{u-v}{h}\right)$ behaves like the Dirac delta $\delta(u-v)$, so the integral converges to $\int \mu(v) f(v)\, \delta(u-v)\, dv = \mu(u) f(u)$. Because $\mu(u) = E\{Y_p \mid U_0 = u\}$, we get

$$\int \mu(v) f(v)\, \frac{1}{h} K\!\left(\frac{u-v}{h}\right) dv = \int E\{Y_p \mid U_0 = v\}\, \frac{1}{h} K\!\left(\frac{u-v}{h}\right) f(v)\, dv = \frac{1}{h} E\left\{Y_p K\!\left(\frac{u-U_0}{h}\right)\right\},$$

which suggests the following estimate of $\mu(u) f(u)$:

$$\frac{1}{nh} \sum_{i=1}^{n} Y_{p+i} K\!\left(\frac{u-U_i}{h}\right).$$

For similar reasons,

$$\frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{u-U_i}{h}\right)$$

is a good candidate for an estimate of

$$\int f(v)\, \frac{1}{h} K\!\left(\frac{u-v}{h}\right) dv,$$

which converges to $f(u)$ as $h \to 0$. Thus,

$$\hat\mu(u) = \frac{\sum_{i=1}^{n} Y_{p+i} K\!\left(\frac{u-U_i}{h_n}\right)}{\sum_{i=1}^{n} K\!\left(\frac{u-U_i}{h_n}\right)}, \quad (3.3)$$

with $h_n$ tending to zero, is a kernel estimate of $\mu(u)$. The parameter $h_n$ is called a bandwidth. Note that the above formula is of the ratio form, and we always treat the case $0/0$ as $0$.

In light of this, the crucial problems are the choice of the kernel $K(\cdot)$ and the number sequence $\{h_n\}$. From now on, we denote $g(u) = \mu(u) f(u)$.

It is worth mentioning that there is a wide range of kernel estimates [88,140,172] available for finding a curve in data. The most prominent are: the classical Nadaraya–Watson estimator, defined in (3.3), local linear and polynomial kernel estimates, convolution-type kernel estimates, and various recursive kernel methods. Some of these techniques are thoroughly examined in this book.
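A minimal implementation of the ratio estimate (3.3) is sketched below; the Gaussian kernel, the bandwidth $h_n = n^{-2/5}$ (the value used in the simulation figures of Section 3.6), and the reuse of the simulated data (U, Y) from the earlier Chapter 2 sketch are illustrative assumptions.

```python
import numpy as np

def mu_hat(u, U, Y, h, p=1):
    """Nadaraya-Watson kernel estimate (3.3) of mu(u) = E{Y_p | U_0 = u},
    with the book's convention that 0/0 is treated as 0."""
    K = np.exp(-0.5 * ((u - U[:-p]) / h) ** 2)   # Gaussian kernel values
    denom = K.sum()
    return float(K @ Y[p:]) / denom if denom > 0.0 else 0.0

n = len(U)
grid = np.linspace(-3, 3, 61)
est = [mu_hat(u, U, Y, h=n ** (-2 / 5)) for u in grid]
```

The estimate recovers $\mu(u) = \lambda_p m(u) + \alpha_p$, that is, the nonlinearity up to the constants discussed in Section 2.2.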


    3.2 Consistency

On the kernel function, the following restrictions are imposed:

$$\sup_{-\infty < u < \infty} |K(u)| < \infty, \quad (3.4)$$

$$\int K(u)\, du \ \text{ exists and is finite}, \quad (3.5)$$

$$|u|^{1+\varepsilon}\, |K(u)| \to 0 \ \text{ as } |u| \to \infty. \quad (3.6)$$

THEOREM 3.1  Let U have a probability density $f(\cdot)$ and let $Em^2(U) < \infty$. Let the Borel measurable kernel satisfy (3.4), (3.5), and (3.6) with $\varepsilon = 0$. Let the sequence $\{h_n\}$ of positive numbers satisfy

$$h_n \to 0 \ \text{ as } n \to \infty, \quad (3.7)$$

$$nh_n \to \infty \ \text{ as } n \to \infty. \quad (3.8)$$

Then

$$\hat\mu(u) \to \mu(u) \ \text{ as } n \to \infty \ \text{ in probability} \quad (3.9)$$

at every point $u \in R$ where both $m(\cdot)$ and $f(\cdot)$ are continuous and $f(u) > 0$.

The next theorem is the "almost everywhere" version of Theorem 3.1. The restrictions imposed on the kernel and number sequence are the same as in Theorem 3.1, with the only exception that (3.6) holds with some $\varepsilon > 0$, not with $\varepsilon = 0$.

THEOREM 3.2  Let U have a probability density $f(\cdot)$ and let $Em^2(U) < \infty$. Let the Borel measurable kernel satisfy (3.4), (3.5), and (3.6) with some $\varepsilon > 0$. Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then, convergence (3.9) takes place at every Lebesgue point $u \in R$ of both $m(\cdot)$ and $f(\cdot)$ where $f(u) > 0$ and, a fortiori, at almost every u where $f(u) > 0$, that is, at almost every u belonging to the support of $f(\cdot)$.

Proof.  The proof is very much like that of Theorem 3.1. The difference is that we apply Lemma A.9 rather than Lemma A.8.

The algorithm converges also when the input signal does not have a density, that is, when the distribution of U is of any shape. The proof of the theorem is in Section 3.7.1.


THEOREM 3.3  Let $Em^2(U) < \infty$. Let $H(\cdot)$ be a nonnegative nonincreasing Borel function defined on $[0, \infty)$, continuous and positive at $t = 0$ and such that

$$tH(t) \to 0 \ \text{ as } t \to \infty.$$

Let, for some $c_1$ and $c_2$,

$$c_1 H(|u|) \le K(u) \le c_2 H(|u|).$$

Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then convergence (3.9) takes place at almost every ($\zeta$) $u \in R$, where $\zeta$ is the probability measure of U.

Restrictions (3.7) and (3.8) are satisfied by a wide class of number sequences. If $h_n = cn^{-\delta}$ with $c > 0$, they are satisfied for $0 < \delta < 1$. Theorem 3.1 asserts convergence at every point where both $m(\cdot)$ and $f(\cdot)$ are continuous and $f(u) > 0$. The other does it for every Lebesgue point of both $m(\cdot)$ and $f(\cdot)$, that is, for almost every (with respect to the Lebesgue measure) u where $f(u) > 0$, that is, at almost every ($\zeta$) point. In Theorem 3.3, the kernel satisfies restrictions (3.4), (3.5), and (3.6) with $\varepsilon = 0$. In Theorems 3.1 and 3.2, (3.6) holds with $\varepsilon > 0$.

If both $m(\cdot)$ and $f(\cdot)$ are bounded and continuous, we can apply kernels satisfying only (3.4) and (3.5), see Remark 3.1. In Theorem 3.3, U has an arbitrary distribution, which means that it may not have a density.

In the light of this, to achieve convergence at Lebesgue points and, a fortiori, continuity points, we can apply the following kernel functions:

• the rectangular kernel (3.2),
• the triangle kernel

$$K(u) = \begin{cases} 1 - |u|, & \text{for } |u| \le 1, \\ 0, & \text{otherwise,} \end{cases}$$

• the Gauss–Weierstrass kernel (see Figure 3.2)

$$K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}, \quad (3.10)$$


Figure 3.2  Gauss–Weierstrass kernel (3.10).

• the Poisson kernel

$$K(u) = \frac{1}{\pi}\, \frac{1}{1+u^2},$$

• the Fejér kernel (see Figure 3.3)

$$K(u) = \frac{1}{\pi}\, \frac{\sin^2 u}{u^2}, \quad (3.11)$$

• the Lebesgue kernel

$$K(u) = \frac{1}{2}\, e^{-|u|}.$$

All these kernels satisfy (3.4), (3.5), and (3.6) for some $\varepsilon > 0$. The kernel

$$K(u) = \begin{cases} \dfrac{1}{4e}, & \text{for } |u| \le e, \\ \dfrac{1}{4|u| \ln^2 |u|}, & \text{otherwise,} \end{cases} \quad (3.12)$$

satisfies (3.4), (3.5), and (3.6) with $\varepsilon = 0$ only. In turn, the kernels

$$K(u) = \frac{1}{\pi}\, \frac{\sin u}{u}, \quad (3.13)$$

(see Figure 3.4) and

$$K(u) = \sqrt{\frac{2}{\pi}}\, \cos u^2, \quad (3.14)$$

Figure 3.3  Fejér kernel (3.11).


Figure 3.4  Kernel (3.13).

(see Figure 3.5), satisfy (3.4) and (3.5), but not (3.6), even with $\varepsilon = 0$. For all the presented kernels, $\int K(u)\, du = 1$. Observe that they can be continuous or not, and can have compact or unbounded support.

Notice that Theorem 3.3 admits the following one:

$$K(u) = \begin{cases} \dfrac{1}{e}, & \text{for } |u| \le e, \\ \dfrac{1}{|u| \ln |u|}, & \text{otherwise,} \end{cases}$$

for which $\int K(u)\, du = \infty$. Restrictions imposed by the theorem are illustrated in Figure 3.6.

    3.4 Convergence rate

In this section, both the characteristic $m(\cdot)$ and the input density $f(\cdot)$ are smooth functions having q derivatives. Proper selection of the kernel and number sequence increases the speed at which the estimate converges. We now find the convergence rate.

In our analysis, the kernel satisfies the following additional restrictions:

$$\int v^i K(v)\, dv = 0, \quad \text{for } i = 1, 2, \ldots, q-1, \quad (3.15)$$

and

$$\int |v^{q-1/2} K(v)|\, dv < \infty \quad (3.16)$$

Figure 3.5  Kernel (3.14).


Figure 3.6  A kernel satisfying the restrictions of Theorem 3.3: $K(u)$ lies between $c_1 H(|u|)$ and $c_2 H(|u|)$.

(see the analysis in Section A.2.2). For simplicity of notation, $\int K(v)\, dv = 1$. For a fixed u, we get

$$E\hat f(u) = \frac{1}{h_n} \int f(v) K\!\left(\frac{u-v}{h_n}\right) dv = \int f(u + vh_n) K(-v)\, dv,$$

which yields

$$\mathrm{bias}[\hat f(u)] = E\hat f(u) - f(u) = \int \big(f(u+vh_n) - f(u)\big) K(-v)\, dv.$$

Assuming that $f^{(q)}(\cdot)$ is square integrable and applying (A.17), we find $\mathrm{bias}[\hat f(u)] = O(h_n^{q-1/2})$. We next recall (3.27) and write $\mathrm{var}[\hat f(u)] = O(1/nh_n)$, which leads to

$$E(\hat f(u) - f(u))^2 = O(h_n^{2q-1}) + O\!\left(\frac{1}{nh_n}\right).$$

Thus, selecting

$$h_n \sim n^{-1/2q}, \quad (3.17)$$

we finally obtain

$$E(\hat f(u) - f(u))^2 = O(n^{-1+1/2q}).$$

Needless to say, if the qth derivative of $g(u)$ is square integrable, for the same reasons, $E(\hat g(u) - g(u))^2$ is of the same order. Hence, applying Lemma C.9, we finally obtain the following convergence rate:

$$P\{|\hat\mu(u) - \mu(u)| > \varepsilon |\mu(u)|\} = O(n^{-1+1/2q})$$

for any $\varepsilon > 0$, and

$$|\hat\mu(u) - \mu(u)| = O(n^{-1/2+1/4q}) \ \text{ as } n \to \infty \ \text{ in probability}.$$

If $f^{(q)}(u)$ is bounded, $\mathrm{bias}[\hat f(u)] = O(h_n^q)$, see (A.18); and, for

$$h_n \sim n^{-1/(2q+1)},$$

$$E(\hat f(u) - f(u))^2 = O(n^{-1+1/(2q+1)}).$$


Figure 3.7  Kernel $G_4$.

If, in addition, the qth derivative of $g(u)$ is bounded, $E(\hat g(u) - g(u))^2$ is of the same order and, as a consequence,

$$P\{|\hat\mu(u) - \mu(u)| > \varepsilon |\mu(u)|\} = O(n^{-1+1/(2q+1)})$$

for any $\varepsilon > 0$, and

$$|\hat\mu(u) - \mu(u)| = O(n^{-1/2+1/(4q+2)}) \ \text{ as } n \to \infty \ \text{ in probability},$$

which means that the rate is slightly better.

The rate $O(n^{-q/(2q+1)})$ in probability obtained above is known to be optimal within the class of q-times differentiable input densities and nonlinear characteristics, see [285].
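As a worked instance of these rates: for twice-differentiable $f(\cdot)$ and $g(\cdot)$ ($q = 2$), the prescriptions above give

$$h_n \sim n^{-1/5}, \qquad |\hat\mu(u) - \mu(u)| = O\!\left(n^{-1/2 + 1/10}\right) = O\!\left(n^{-2/5}\right) \ \text{ in probability},$$

so doubling the sample size shrinks the error bound by a factor of about $2^{-2/5} \approx 0.76$.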

It is not difficult to construct kernels satisfying (3.15) such that $\int K(v)\, dv = 1$. For example, starting from the Gauss–Weierstrass kernel (3.10), denoted now as $G(\cdot)$, we observe that $\int u^i G(u)\, du = 0$ for odd i, and $\int u^i G(u)\, du = 1 \times 3 \times \cdots \times (i-1)$ for even i. Thus, for

$$G_2(u) = G(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2},$$

(3.15) is satisfied for $q = 2$. For the same reasons, for

$$G_4(u) = \frac{1}{2}(3 - u^2)\, G(u) = \frac{1}{2\sqrt{2\pi}} (3 - u^2)\, e^{-u^2/2} \quad (3.18)$$

(see Figure 3.7), and

$$G_6(u) = \frac{1}{8}(15 - 10u^2 + u^4)\, G(u) = \frac{1}{8\sqrt{2\pi}} (15 - 10u^2 + u^4)\, e^{-u^2/2},$$

(3.15) holds for $q = 4$ and $q = 6$, respectively.

    (3.15) hold for q  = 4 and q  = 6, respectively.In turn, for rectanglekernel (3.2) denoted now as W (

    •),   u i W (u )du   equals zero forodd i   and 1/(i  + 1) for even i . Thus for W 2(u ) = W (u ), (3.15) holdswith q  = 2, while

    for

    W 4(u ) =1

    4(9− 15u 2)W (u ) =

    18

    (9− 15u 2),   for |u | ≤ 10,   otherwise,

    (3.19)

  • 8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification

    31/401

    3.4 Convergence rate   19

    -1 1

    -1

    1

    Figure 3.8   Kernel W 4.

    with q  = 4. For q  = 6, wefind

    W 6(u ) = 564

    (45− 210u 2 + 189u 4)W (u ) (3.20)

    =

    5

    128

    45− 210u 2 + 189u 4 ,   for |u | ≤ 1

    0,   otherwise.

    Kernels W 4(u ) and W 6(u ) areshownin Figures 3.8 and 3.9, respectively.

There is a formal way of generating kernel functions satisfying conditions (3.15) and (3.16) for an arbitrary value of q. This technique relies on the theory of orthogonal polynomials, which is examined in Chapter 6. In particular, if one wishes to obtain kernels defined on a compact interval, then we can use the class of Legendre orthogonal polynomials; see Section 6.3 for various properties of this class. Hence, let $\{p_\ell(u);\, 0 \le \ell < \infty\}$ be the set of orthonormal Legendre polynomials defined on $[-1, 1]$, that is, $\int_{-1}^{1} p_\ell(u) p_j(u)\, du = \delta_{\ell j}$, $\delta_{\ell j}$ being the Kronecker delta, and $p_\ell(u) = \sqrt{\frac{2\ell+1}{2}}\, P_\ell(u)$, where $P_\ell(u)$ is the $\ell$th-order Legendre polynomial.

The following lemma describes the procedure for generating a kernel function of order q with support on $[-1, 1]$.

LEMMA 3.1  The kernel function

$$K(u) = \sum_{j=0}^{q-1} p_j(0)\, p_j(u), \quad |u| \le 1, \quad (3.21)$$

satisfies condition (3.15).

Figure 3.9  Kernel $W_6$.


Proof.  For $i \le q-1$, consider $\int_{-1}^{1} u^i K(u)\, du$. Since $u^i$ can be expanded into the Legendre series, that is, $u^i = \sum_{\ell=0}^{i} a_\ell\, p_\ell(u)$, where $a_\ell = \int_{-1}^{1} u^i p_\ell(u)\, du$, then for $K(u)$ defined in (3.21), we have

$$\int_{-1}^{1} u^i K(u)\, du = \sum_{\ell=0}^{i} \sum_{j=0}^{q-1} a_\ell\, p_j(0) \int_{-1}^{1} p_\ell(u) p_j(u)\, du = \sum_{\ell=0}^{i} a_\ell\, p_\ell(0) = 0^i = \begin{cases} 1, & \text{if } i = 0, \\ 0, & \text{if } i = 1, 2, \ldots, q-1. \end{cases}$$

The proof of Lemma 3.1 has been completed.

It is worth noting that $P_\ell(0) = 0$ for $\ell = 1, 3, 5, \ldots$ and $P_\ell(-u) = P_\ell(u)$ for $\ell = 0, 2, 4, \ldots$. Consequently, the kernel in (3.21) is symmetric, and all terms in (3.21) with odd values of j are equal to zero.

Since $p_0(u) = \sqrt{\frac{1}{2}}$ and $p_2(u) = \sqrt{\frac{5}{2}} \left(\frac{3}{2} u^2 - \frac{1}{2}\right)$, it is easy to verify that the kernel in (3.21) with $q = 4$ is given by

$$K(u) = \frac{9}{8} - \frac{15}{8}\, u^2, \quad |u| \le 1.$$

This confirms the form of the kernel $W_4(v)$ given in (3.19).
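The construction of Lemma 3.1 is easy to mechanize; the following sketch (using numpy, an assumption of convenience) builds the order-q kernel of (3.21) from orthonormal Legendre polynomials and numerically checks the moment conditions (3.15).

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_kernel(q):
    """Order-q kernel (3.21): K(u) = sum_{j<q} p_j(0) p_j(u) on [-1, 1],
    where p_j = sqrt((2j+1)/2) * P_j are orthonormal Legendre polynomials."""
    def K(u):
        u = np.asarray(u, dtype=float)
        total = np.zeros_like(u)
        for j in range(q):
            c = np.zeros(j + 1)
            c[j] = 1.0                      # coefficient vector selecting P_j
            norm2 = (2 * j + 1) / 2.0       # p_j(0) p_j(u) = norm2 * P_j(0) P_j(u)
            total += norm2 * legendre.legval(0.0, c) * legendre.legval(u, c)
        return total
    return K

# Numerical check of (3.15) for q = 4:
K4 = legendre_kernel(4)
u = np.linspace(-1.0, 1.0, 4001)
for i in range(4):
    print(i, np.trapz(u ** i * K4(u), u))   # ~1 for i = 0, ~0 for i = 1, 2, 3
```

For $q = 4$ this evaluates to $K(u) = 9/8 - (15/8)u^2$, matching $W_4$ above.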

The result of Lemma 3.1 can be extended to a larger class of orthogonal polynomials defined on a set S, that is, when we have a system of functions $\{p_\ell(u);\, 0 \le \ell < \infty\}$ defined on S which satisfies

$$\int_S p_\ell(u) p_j(u)\, w(u)\, du = \delta_{\ell j},$$

where $w(u)$ is a weight function, positive on S and such that $w(0) = 1$. Then formula (3.21) takes the following modified form:

$$K(u) = \sum_{j=0}^{q-1} p_j(0)\, p_j(u)\, w(u). \quad (3.22)$$

In particular, if $w(u) = e^{-u^2}$, $-\infty < u < \infty$, …

3.6 Simulation example

Figure 3.10  Realizations of the estimate of $m(u)$ for n = 40, 80, 320, 1280; a = 0.5, $h_n = n^{-2/5}$ (example in Section 3.6).

Figure 3.11  MISE versus n, for a = 0, 0.25, 0.5, 0.75; $h_n = n^{-2/5}$ (example in Section 3.6).

Figure 3.12  MISE versus n, for var(Z) = 0, 0.25, 0.5, 1; $h_n = n^{-2/5}$ (example in Section 3.6).


Figure 3.13  MISE versus $h_n$, for n = 10, 20, 40, 80, 160, 320, 640, 1280; a = 0.0 (example in Section 3.6).

Figure 3.14  MISE versus $h_n$, various n; a = 0.25 (example in Section 3.6).

Figure 3.15  MISE versus $h_n$, various n; a = 0.5 (example in Section 3.6).


Figure 3.16  MISE versus $h_n$, various n; a = 0.75 (example in Section 3.6).

    3.7 Lemmas and proofs

    3.7.1 Lemmas

In Lemma 3.2, U has a density; in Lemma 3.3, the distribution of U is arbitrary.

LEMMA 3.2  Let U have a probability density. Let $Em(U) = 0$ and $\mathrm{var}[m(U)] < \infty$. Let the kernel $K(\cdot)$ satisfy (3.4) and (3.5). If (3.6) holds with $\varepsilon = 0$, then, for $i \ne 0$,

$$\sup_{h>0} \left| \mathrm{cov}\!\left[ W_{p+i}\, \frac{1}{h} K\!\left(\frac{u-U_i}{h}\right),\; W_p\, \frac{1}{h} K\!\left(\frac{u-U_0}{h}\right) \right] \right| \le \big(|\lambda_p \lambda_{p+i}| + |\lambda_p \lambda_{p-i}| + |\lambda_{p+i} \lambda_{p-i}|\big)\, \omega(u),$$

where $\omega(u)$ is finite at every continuity point u of both $m(\cdot)$ and $f(\cdot)$. If $\varepsilon > 0$, the property holds at almost every $u \in R$.

Proof.  We prove the continuous version of the lemma. The "almost everywhere" version can be verified in a similar way.

Figure 3.17  MISE versus $\delta$ for $h_n = n^{-\delta}$, various n; a = 0.5 (example in Section 3.6).

Since $W_{p+i} = \sum_{q=-\infty}^{p+i} \lambda_{p+i-q}\, m(U_q)$ and $W_p = \sum_{r=-\infty}^{p} \lambda_{p-r}\, m(U_r)$, the covariance in the assertion equals

$$\sum_{q=-\infty}^{p+i} \sum_{r=-\infty}^{p} \lambda_{p+i-q} \lambda_{p-r}\, \mathrm{cov}\!\left[ m(U_q)\, \frac{1}{h} K\!\left(\frac{u-U_i}{h}\right),\; m(U_r)\, \frac{1}{h} K\!\left(\frac{u-U_0}{h}\right) \right].$$

Applying Lemma C.2, we find that the above formula is equal to

$$(\lambda_p \lambda_{p+i} + \lambda_p \lambda_{p-i})\, \frac{1}{h} E\left\{K\!\left(\frac{u-U}{h}\right)\right\} \frac{1}{h} E\left\{m^2(U)\, K\!\left(\frac{u-U}{h}\right)\right\} + \lambda_{p+i} \lambda_{p-i}\, \frac{1}{h^2} E^2\left\{m(U)\, K\!\left(\frac{u-U}{h}\right)\right\}.$$

Let u be a point where both $m(\cdot)$ and $f(\cdot)$ are continuous. It suffices to apply Lemmas A.8 and A.9 to find that the quantities

$$\sup_{h>0} \frac{1}{h} E\left\{K\!\left(\frac{u-U}{h}\right)\right\}, \qquad \sup_{h>0} \frac{1}{h} E\left\{m(U)\, K\!\left(\frac{u-U}{h}\right)\right\}, \qquad \sup_{h>0} \frac{1}{h} E\left\{m^2(U)\, K\!\left(\frac{u-U}{h}\right)\right\}$$

are finite.

In the next lemma, U has an arbitrary distribution.

LEMMA 3.3  Let $Em(U) = 0$ and $\mathrm{var}[m(U)] < \infty$. If the kernel satisfies the restrictions of Theorem 3.3, then

$$\limsup_{h\to 0} \frac{\left| \mathrm{cov}\!\left[ W_{p+i}\, K\!\left(\frac{u-U_i}{h}\right),\; W_p\, K\!\left(\frac{u-U_0}{h}\right) \right] \right|}{E^2 K\!\left(\frac{u-U}{h}\right)} \le \big(|\lambda_p \lambda_{p+i}| + |\lambda_p \lambda_{p-i}| + |\lambda_{p+i} \lambda_{p-i}|\big)\, \theta(u),$$

where $\theta(u)$ is finite at almost every ($\zeta$) $u \in R$, $\zeta$ being the distribution of U.

Proof.  The proof is similar to that of Lemma 3.2. Lemma A.10, rather than Lemmas A.8 and A.9, should be employed.

    3.7.2 Proofs

Proof of Theorem 3.1

For the sake of the proof, $Em(U) = 0$; see Remark 2.1. Observe that $\hat\mu(u) = \hat g(u)/\hat f(u)$ with

$$\hat g(u) = \frac{1}{nh_n} \sum_{i=1}^{n} Y_{p+i}\, K\!\left(\frac{u-U_i}{h_n}\right) \quad (3.23)$$

and

$$\hat f(u) = \frac{1}{nh_n} \sum_{i=1}^{n} K\!\left(\frac{u-U_i}{h_n}\right). \quad (3.24)$$

Fix $u \in R$ and suppose that both $m(\cdot)$ and $f(\cdot)$ are continuous at the point.

We will now show that

$$\hat g(u) \to g(u) \int K(v)\, dv \ \text{ as } n \to \infty \ \text{ in probability}, \quad (3.25)$$

where, we recall, $g(u) = \mu(u) f(u)$. Since

$$E\hat g(u) = \frac{1}{h_n} E\left\{E\{Y_p \mid U_0\}\, K\!\left(\frac{u-U_0}{h_n}\right)\right\} = \frac{1}{h_n} E\left\{\mu(U)\, K\!\left(\frac{u-U}{h_n}\right)\right\},$$

applying Lemma A.8, we conclude that

$$E\hat g(u) \to g(u) \int K(v)\, dv \ \text{ as } n \to \infty.$$

In turn, since Y_n = W_n + Z_n, var[ĝ(u)] = P_n(u) + Q_n(u) + R_n(u), where

$$P_n(u) = \frac{1}{nh_n}\,\sigma_Z^2\,\frac{1}{h_n}E\,K^2\!\left(\frac{u-U}{h_n}\right),$$

$$Q_n(u) = \frac{1}{nh_n}\,\frac{1}{h_n}\operatorname{var}\left[W_pK\!\left(\frac{u-U_0}{h_n}\right)\right],$$

and

$$R_n(u) = \frac{1}{n^2h_n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\operatorname{cov}\left(W_{p+i}K\!\left(\frac{u-U_i}{h_n}\right),\; W_{p+j}K\!\left(\frac{u-U_j}{h_n}\right)\right) = \frac{2}{n^2h_n^2}\sum_{i=1}^{n}(n-i)\operatorname{cov}\left(W_{p+i}K\!\left(\frac{u-U_i}{h_n}\right),\; W_pK\!\left(\frac{u-U_0}{h_n}\right)\right).$$

In view of Lemma A.8,

$$nh_nP_n(u) \to \sigma_Z^2 f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

Since

$$\operatorname{var}\left[W_pK\!\left(\frac{u-U_0}{h_n}\right)\right] = E\left[W_p^2K^2\!\left(\frac{u-U_0}{h_n}\right)\right] - E^2\left[W_pK\!\left(\frac{u-U_0}{h_n}\right)\right] = E\left[\phi(U)K^2\!\left(\frac{u-U}{h_n}\right)\right] - E^2\left[\mu(U)K\!\left(\frac{u-U}{h_n}\right)\right], \qquad (3.26)$$

where φ(•) is as in (2.7), by Lemma A.8,

$$nh_nQ_n(u) \to \phi(u)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

Passing to R_n(u), we apply Lemma 3.2 to obtain

$$|R_n(u)| \le 2\omega(u)\frac{1}{n^2}\sum_{i=1}^{n}(n-i)\left(|\lambda_p\lambda_{p+i}| + |\lambda_p\lambda_{p-i}| + |\lambda_{p+i}\lambda_{p-i}|\right) \le 6\omega(u)\left(\max_n|\lambda_n|\right)\frac{1}{n}\sum_{i=1}^{\infty}|\lambda_i| = O\!\left(\frac{1}{n}\right).$$

Finally,

$$nh_n\operatorname{var}[\hat g(u)] \to \left(\sigma_Z^2 + \phi(u)\right)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

In this way, we have verified (3.25).

Using similar arguments, we show that $E\hat f(u) \to f(u)\int K(v)\,dv$ as n → ∞ and

$$nh_n\operatorname{var}[\hat f(u)] \to f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty, \qquad (3.27)$$

and then we conclude that $\hat f(u) \to f(u)\int K(v)\,dv$ as n → ∞ in probability. The proof has been completed.

Proof of Theorem 3.3
In general, the idea of the proof is similar to that of Theorem 3.1. Some modifications, however, are necessary.

Recalling Remark 2.1, with no loss of generality, we assume that Em(U) = 0 and begin with the observation that µ̂(u) = ξ̂(u)/η̂(u), where

$$\hat\xi(u) = \frac{1}{nE\,K\!\left(\frac{u-U}{h_n}\right)}\sum_{i=1}^{n}Y_{p+i}K\!\left(\frac{u-U_i}{h_n}\right)$$

and

$$\hat\eta(u) = \frac{1}{nE\,K\!\left(\frac{u-U}{h_n}\right)}\sum_{i=1}^{n}K\!\left(\frac{u-U_i}{h_n}\right).$$

Obviously,

$$E\hat\xi(u) = \frac{E\left[Y_pK\!\left(\frac{u-U_0}{h_n}\right)\right]}{E\,K\!\left(\frac{u-U}{h_n}\right)} = \frac{E\left[\mu(U)K\!\left(\frac{u-U}{h_n}\right)\right]}{E\,K\!\left(\frac{u-U}{h_n}\right)},$$

which, by Lemma A.10, converges to µ(u) as n → ∞ for almost every (ζ) u ∈ R.


    3.8 Bibliographic notes

The kernel regression estimate has been proposed independently by Nadaraya [215] and Watson [312] and was the subject of studies performed by Rosenblatt [257], Collomb [55], Greblicki [105], Greblicki and Krzyżak [121], Chu and Marron [51], Fan [87], Müller and Song [212], Jones, Davies, and Park [172], and many others. A comprehensive overview of various kernel methods is presented in Wand and Jones [310]. At first, the density of U was assumed to exist. Since Stone [284], consistency for any distribution has been examined. Later, distribution-free properties were studied by Spiegelman and Sacks [282], Devroye and Wagner [73, 74], Devroye [71], Krzyżak and Pawlak [187, 188], Greblicki, Krzyżak, and Pawlak [122], and Kozek and Pawlak [179], among others. In particular, the monograph by Györfi, Kohler, Krzyżak, and Walk [140] examines the problem of a distribution-free theory of nonparametric regression.

The kernel regression estimate has been derived in a natural way from the kernel estimate (3.24) of a probability density function introduced by Parzen [226], generalized to multivariate cases by Cacoullos [37], and examined by a number of authors; see, for example, Rosenblatt [256], Van Ryzin [297, 298], Deheuvels [65], Wahba [306], Devroye and Wagner [72], Devroye and Györfi [68], and Csörgo and Mielniczuk [58]. See also Härdle [150], Prakasa Rao [241], or Silverman [277] and the papers cited therein. In all the works mentioned, however, the kernel estimate is of form (3.3) with p = 0, while independent observations (U_i, Y_i) come from a model Y_n = m(U_n) + Z_n. In the context of the Hammerstein system, this means that the dynamics is simply missing, because the linear subsystem is reduced to a simple delay.

The nonparametric kernel regression estimate has been applied to recover the nonlinear characteristic in a Hammerstein system by Greblicki and Pawlak [126]. In Greblicki and Pawlak [129], the input signal has an arbitrary distribution. In Greblicki and Pawlak [127], the dynamic subsystem is described by a convolution rather than a state equation. The kernel estimate has also been discussed in Krzyżak [182, 183], as well as in Krzyżak and Partyka [185]. For very specific distributions of the input signal, the nonparametric kernel regression estimate has been studied by Lang [193].


    4   Semirecursive kernel algorithms

This chapter is devoted to semirecursive kernel algorithms, modifications of those examined in Chapter 3. Their numerators and denominators can be calculated online. We show consistency and examine the convergence rate. Results are established for all input densities and all input distributions.

    4.1 Introduction

We examine the following semirecursive kernel estimates:

$$\tilde\mu_n(u) = \frac{\sum_{i=1}^{n}\frac{1}{h_i}Y_{p+i}K\!\left(\frac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\frac{1}{h_i}K\!\left(\frac{u-U_i}{h_i}\right)} \qquad (4.1)$$

and

$$\bar\mu_n(u) = \frac{\sum_{i=1}^{n}Y_{p+i}K\!\left(\frac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\!\left(\frac{u-U_i}{h_i}\right)}, \qquad (4.2)$$

modifications of (3.3). To demonstrate recursiveness, we observe that µ̃_n(u) = g̃_n(u)/f̃_n(u), where

$$\tilde g_n(u) = \frac{1}{n}\sum_{i=1}^{n}Y_{p+i}\frac{1}{h_i}K\!\left(\frac{u-U_i}{h_i}\right)$$

and

$$\tilde f_n(u) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}K\!\left(\frac{u-U_i}{h_i}\right).$$

Therefore,

$$\tilde g_n(u) = \tilde g_{n-1}(u) - \frac{1}{n}\left[\tilde g_{n-1}(u) - Y_{p+n}\frac{1}{h_n}K\!\left(\frac{u-U_n}{h_n}\right)\right]$$


and

$$\tilde f_n(u) = \tilde f_{n-1}(u) - \frac{1}{n}\left[\tilde f_{n-1}(u) - \frac{1}{h_n}K\!\left(\frac{u-U_n}{h_n}\right)\right].$$

For the other estimate, µ̄_n(u) = ḡ_n(u)/f̄_n(u) with

$$\bar g_n(u) = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}Y_{p+i}K\!\left(\frac{u-U_i}{h_i}\right)$$

and

$$\bar f_n(u) = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}K\!\left(\frac{u-U_i}{h_i}\right).$$

Both ḡ_n(u) and f̄_n(u) can be calculated with the following recurrence formulas:

$$\bar g_n(u) = \bar g_{n-1}(u) - \frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar g_{n-1}(u) - \frac{1}{h_n}Y_{p+n}K\!\left(\frac{u-U_n}{h_n}\right)\right]$$

and

$$\bar f_n(u) = \bar f_{n-1}(u) - \frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar f_{n-1}(u) - \frac{1}{h_n}K\!\left(\frac{u-U_n}{h_n}\right)\right].$$

In both estimates, the starting points

$$\tilde g_1(u) = \bar g_1(u) = \frac{1}{h_1}Y_{p+1}K\!\left(\frac{u-U_1}{h_1}\right)$$

and

$$\tilde f_1(u) = \bar f_1(u) = \frac{1}{h_1}K\!\left(\frac{u-U_1}{h_1}\right)$$

are the same.

Thus, both estimates are semirecursive, because their numerators and denominators can be calculated recursively, although the estimates themselves cannot.
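As an illustration only (ours, not the book's), the following minimal Python sketch carries out the online updates of the numerator and denominator of (4.1); the Gaussian kernel and the array layout are placeholder choices, any kernel satisfying (3.4)-(3.5) would do.

```python
import numpy as np

def kernel(v):
    # placeholder kernel; the Gaussian density satisfies (3.4)-(3.5)
    return np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)

def semirecursive_mu(u, U, Y, h, p=1):
    """Evaluate estimate (4.1) at a point u via the online recursions.

    U[1..n], h[1..n], and Y[p+1..p+n] hold the observations and bandwidths;
    index 0 is unused so that the code mirrors the 1-based indexing of the text.
    """
    g = 0.0  # running numerator  g~_n(u)
    f = 0.0  # running denominator f~_n(u)
    for n in range(1, len(U)):
        k = kernel((u - U[n]) / h[n]) / h[n]   # (1/h_n) K((u - U_n)/h_n)
        g -= (g - Y[p + n] * k) / n            # g~_n = g~_{n-1} - (1/n)(g~_{n-1} - Y_{p+n} k)
        f -= (f - k) / n                       # f~_n = f~_{n-1} - (1/n)(f~_{n-1} - k)
    return g / f if f != 0.0 else 0.0
```

Only two scalars per evaluation point are carried between samples; the ratio g̃_n(u)/f̃_n(u) is recomputed at each step, which is exactly the sense in which the estimate is semirecursive.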

    4.2 Consistency and convergence rate

In Theorems 4.1 and 4.2, the input signal has a density; in Theorem 4.3, its distribution is arbitrary.

THEOREM 4.1 Let U have a density f(•) and let Em²(U) < ∞. Let the Borel measurable kernel K(•) satisfy (3.4), (3.5), and (3.6) with ε = 0. Let the sequence {h_n} satisfy the following restrictions:

$$h_n \to 0 \text{ as } n \to \infty, \qquad (4.3)$$

$$\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i} \to 0 \text{ as } n \to \infty. \qquad (4.4)$$


Then,

$$\tilde\mu_n(u) \to \mu(u) \text{ as } n \to \infty \text{ in probability} \qquad (4.5)$$

at every u ∈ R where both m(•) and f(•) are continuous and f(u) > 0. If (3.6) holds for some ε > 0, then the convergence takes place at every Lebesgue point u ∈ R of both m(•) and f(•) such that f(u) > 0; a fortiori, at almost every u belonging to the support of f(•).

THEOREM 4.2 Let U have a density f(•) and let Em²(U) < ∞. Let the Borel measurable kernel K(•) satisfy (3.4), (3.5), and (3.6) with ε = 0. Let the sequence {h_n} satisfy (4.3) and

$$\sum_{n=1}^{\infty}h_n = \infty. \qquad (4.6)$$

Then,

$$\bar\mu_n(u) \to \mu(u) \text{ as } n \to \infty \text{ in probability} \qquad (4.7)$$

at every u ∈ R where both m(•) and f(•) are continuous and f(u) > 0. If (3.6) holds for some ε > 0, then the convergence takes place at every Lebesgue point u ∈ R of both m(•) and f(•) such that f(u) > 0; a fortiori, at almost every u belonging to the support of f(•).

Estimate (4.2) is consistent not only for U having a density but also for any distribution. In the next theorem, the kernel is the same as in Theorem 3.3.

THEOREM 4.3 Let Em²(U) < ∞. Let the kernel K(•) satisfy the restrictions of Theorem 3.3. Let the sequence {h_n} of positive numbers satisfy (4.3) and (4.6). Then, convergence (4.7) takes place at almost every (ζ) point u ∈ R, where ζ is the probability measure of U.

Estimate (4.1) converges if the number sequence satisfies (4.3) and (4.4), while (4.2) converges if (4.3) and (4.6) hold. Thus, for h_n = cn^{−δ} with c > 0, both converge if 0 < δ < 1.


and find

$$\operatorname{bias}[\tilde f_n(u)] = E\tilde f_n(u) - f(u)\int K(v)\,dv = \frac{1}{n}\sum_{i=1}^{n}\int\left(f(u+vh_i) - f(u)\right)K(-v)\,dv.$$

Applying (A.17), we obtain

$$\operatorname{bias}[\tilde f_n(u)] = \frac{1}{n}\sum_{i=1}^{n}O\!\left(h_i^{q-1/2}\right) = O\!\left(\frac{1}{n}\sum_{i=1}^{n}h_i^{q-1/2}\right).$$

Recalling (4.11), we find

$$E\left(\tilde f_n(u) - f(u)\right)^2 = O\!\left(\left(\frac{1}{n}\sum_{i=1}^{n}h_i^{q-1/2}\right)^2\right) + O\!\left(\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\right),$$

with the first term incurred by the squared bias and the other by the variance. Hence, for

$$h_n \sim n^{-1/2q}, \qquad (4.8)$$

that is, the same as in (3.17) applied in the offline estimate,

$$E\left(\tilde f_n(u) - f(u)\right)^2 = O\!\left(n^{-1+1/2q}\right).$$

Since the same rate holds for g̃_n(u), that is, $E(\tilde g_n(u) - g(u))^2 = O(n^{-1+1/2q})$, we finally obtain

$$P\{|\tilde\mu_n(u) - \mu(u)| > \varepsilon|\mu(u)|\} = O\!\left(n^{-1+1/2q}\right)$$

for any ε > 0, and

$$|\tilde\mu_n(u) - \mu(u)| = O\!\left(n^{-1/2+1/4q}\right) \text{ as } n \to \infty \text{ in probability}.$$

Considering estimate (4.2) next, for obvious reasons, we write

$$\operatorname{bias}[\bar f_n(u)] = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}O\!\left(h_i^{q+1/2}\right) = O\!\left(\frac{\sum_{i=1}^{n}h_i^{q+1/2}}{\sum_{i=1}^{n}h_i}\right)$$

and, due to (4.12),

$$E\left(\bar f_n(u) - f(u)\right)^2 = O\!\left(\frac{\left(\sum_{i=1}^{n}h_i^{q+1/2}\right)^2}{\left(\sum_{i=1}^{n}h_i\right)^2}\right) + O\!\left(\frac{1}{\sum_{i=1}^{n}h_i}\right),$$

which, for h_n selected as in (4.8), becomes

$$E\left(\bar f_n(u) - f(u)\right)^2 = O\!\left(n^{-1+1/2q}\right).$$

Since $E(\bar g_n(u) - g(u))^2 = O(n^{-1+1/2q})$, we come to the conclusion that

$$P\{|\bar\mu_n(u) - \mu(u)| > \varepsilon|\mu(u)|\} = O\!\left(n^{-1+1/2q}\right)$$

for any ε > 0, and

$$|\bar\mu_n(u) - \mu(u)| = O\!\left(n^{-1/2+1/4q}\right) \text{ as } n \to \infty \text{ in probability}.$$
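To see that (4.8) indeed balances the two terms, one can substitute $h_i = i^{-1/2q}$ directly:

$$\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i} = \frac{1}{n^2}\sum_{i=1}^{n}i^{1/2q} = O\!\left(n^{-1+1/2q}\right) \quad\text{and}\quad \left(\frac{1}{n}\sum_{i=1}^{n}h_i^{q-1/2}\right)^2 = \left(\frac{1}{n}\sum_{i=1}^{n}i^{-1/2+1/4q}\right)^2 = O\!\left(n^{-1+1/2q}\right),$$

so the variance and the squared bias decrease at the same rate; an analogous computation with the weights h_i gives the same balance for estimate (4.2).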


If the qth derivatives of both f(u) and g(u) are bounded, using (A.18), we obtain

$$P\{|\tilde\mu_n(u) - \mu(u)| > \varepsilon|\mu(u)|\} = O\!\left(n^{-1+1/(2q+1)}\right)$$

for any ε > 0, and

$$|\tilde\mu_n(u) - \mu(u)| = O\!\left(n^{-1/2+1/(4q+2)}\right) \text{ as } n \to \infty \text{ in probability},$$

that is, somewhat faster convergence. The same rate also holds for µ̄_n(u).
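For concreteness, instantiating these rates at q = 2 gives

$$|\tilde\mu_n(u) - \mu(u)| = O\!\left(n^{-3/8}\right) \quad\text{for } h_n \sim n^{-1/4} \text{ by (4.8)}, \qquad |\tilde\mu_n(u) - \mu(u)| = O\!\left(n^{-2/5}\right) \text{ under the bounded-derivative condition},$$

and, as q grows, both exponents approach the rate $O(n^{-1/2})$.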

    4.3 Simulation example

In the system as in Section 2.2.2, a = 0.5 and Z_n = 0. Since µ(u) = m(u), we just estimate m(u) and rewrite the estimates in the following forms:

$$\tilde m_n(u) = \frac{\sum_{i=1}^{n}\frac{1}{h_i}Y_{1+i}K\!\left(\frac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\frac{1}{h_i}K\!\left(\frac{u-U_i}{h_i}\right)} \qquad (4.9)$$

and

$$\bar m_n(u) = \frac{\sum_{i=1}^{n}Y_{1+i}K\!\left(\frac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\!\left(\frac{u-U_i}{h_i}\right)}. \qquad (4.10)$$

For the rectangular kernel and h_n = n^{−1/5}, the MISE for both estimates is shown in Figure 5.5 in Section 5.4. For h_n = n^{−δ} with δ varying in the interval [−0.25, 1.5], the error is shown in Figures 4.1 and 4.2.
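An experiment of this kind can be reproduced along the following lines. The Python sketch below is ours: it assumes the first-order system Y_{n+1} = aY_n + m(U_n) with a = 0.5, a placeholder nonlinearity m, and a standard normal input (the exact system of Section 2.2.2 is not restated here), and it approximates the MISE by Monte Carlo averaging of the squared error over a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.5

def m(u):
    return np.tanh(2.0 * u)            # placeholder nonlinearity with E m(U) = 0

def rect_kernel(v):
    return 0.5 * (np.abs(v) <= 1.0)    # rectangular kernel on [-1, 1]

def simulate(n):
    # Y_{t+1} = a Y_t + m(U_t), noiseless (Z_n = 0)
    U = rng.standard_normal(n + 1)
    Y = np.zeros(n + 2)
    for t in range(n + 1):
        Y[t + 1] = a * Y[t] + m(U[t])
    return U, Y

def estimate_4_10(u_grid, U, Y, delta):
    # estimate (4.10) with h_i = i^(-delta); pairs (U_i, Y_{1+i}), i = 1..n
    Ui, Yp = U[1:], Y[2:]
    h = np.arange(1, len(Ui) + 1, dtype=float) ** (-delta)
    K = rect_kernel((u_grid[:, None] - Ui[None, :]) / h[None, :])
    num, den = (K * Yp[None, :]).sum(axis=1), K.sum(axis=1)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

u_grid = np.linspace(-2.0, 2.0, 81)
for delta in (0.1, 0.25, 0.5, 0.75, 1.0):
    errs = [np.mean((estimate_4_10(u_grid, *simulate(320), delta) - m(u_grid)) ** 2)
            for _ in range(50)]
    print(f"delta = {delta:4.2f}   MISE ~ {np.mean(errs):.4f}")
```

Estimate (4.9) is obtained analogously by weighting each kernel term with 1/h_i.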

Figure 4.1   Estimate (4.9); MISE versus δ, various n (n = 10, 20, ..., 1280); h_n = n^{−δ} (Section 4.3).


Figure 4.2   Estimate (4.10); MISE versus δ, various n (n = 10, 20, ..., 1280); h_n = n^{−δ} (Section 4.3).

    4.4 Proofs and lemmas

    4.4.1 Lemmas

The system

LEMMA 4.1 Let U have a probability density f(•). Let Em(U) = 0 and var[m(U)] < ∞. Let i ≠ j. Let the kernel satisfy (3.4), (3.5). If (3.6) holds with ε = 0, then

$$\sup_{h>0,\,H>0}\left|\operatorname{cov}\left(W_{p+i-j}\,\frac{1}{h}K\!\left(\frac{u-U_{i-j}}{h}\right),\; W_p\,\frac{1}{H}K\!\left(\frac{u-U_0}{H}\right)\right)\right| \le \left(|\lambda_p\lambda_{p+i-j}| + |\lambda_p\lambda_{p-i+j}| + |\lambda_{p+i-j}\lambda_{p-i+j}|\right)\rho(u),$$

where ρ(u) is finite at every continuity point u of both m(•) and f(•). If ε > 0, the property holds at almost every u ∈ R.

Proof. As $W_{p+i-j} = \sum_{q=-\infty}^{p+i-j}\lambda_{p+i-j-q}m(U_q)$ and $W_p = \sum_{r=-\infty}^{p}\lambda_{p-r}m(U_r)$, the covariance in the assertion equals (see Lemma C.2)

$$\sum_{q=-\infty}^{p+i-j}\sum_{r=-\infty}^{p}\lambda_{p+i-j-q}\lambda_{p-r}\operatorname{cov}\left(m(U_q)\frac{1}{h}K\!\left(\frac{u-U_{i-j}}{h}\right),\; m(U_r)\frac{1}{H}K\!\left(\frac{u-U_0}{H}\right)\right),$$

which is equal to

$$\lambda_p\lambda_{p+i-j}\,\frac{1}{h}E\,K\!\left(\frac{u-U}{h}\right)\frac{1}{H}E\left[m^2(U)K\!\left(\frac{u-U}{H}\right)\right] + \lambda_p\lambda_{p-i+j}\,\frac{1}{h}E\left[m^2(U)K\!\left(\frac{u-U}{h}\right)\right]\frac{1}{H}E\,K\!\left(\frac{u-U}{H}\right)$$
$$\qquad + \lambda_{p+i-j}\lambda_{p-i+j}\,\frac{1}{h}E\left[m(U)K\!\left(\frac{u-U}{h}\right)\right]\frac{1}{H}E\left[m(U)K\!\left(\frac{u-U}{H}\right)\right].$$

Let u be a point where both m(•) and f(•) are continuous. It suffices to apply Lemma A.8 to find that the quantities

$$\sup_{h>0}\frac{1}{h}E\,K\!\left(\frac{u-U}{h}\right), \qquad \sup_{h>0}\frac{1}{h}E\left[m(U)K\!\left(\frac{u-U}{h}\right)\right], \qquad \sup_{h>0}\frac{1}{h}E\left[m^2(U)K\!\left(\frac{u-U}{h}\right)\right]$$

are finite at every continuity point of both m(•) and f(•). The "almost everywhere" version of the lemma can be verified in a similar way.

In the next lemma, U has an arbitrary distribution.

LEMMA 4.2 Let Em(U) = 0 and var[m(U)] < ∞. Let i ≠ j. If the kernel satisfies the restrictions of Theorem 3.3, then

$$\sup_{h>0,\,H>0}\frac{\left|\operatorname{cov}\left(W_{p+i-j}K\!\left(\frac{u-U_{i-j}}{h}\right),\; W_pK\!\left(\frac{u-U_0}{H}\right)\right)\right|}{E\,K\!\left(\frac{u-U}{h}\right)E\,K\!\left(\frac{u-U}{H}\right)} \le \left(|\lambda_p\lambda_{p+i-j}| + |\lambda_p\lambda_{p-i+j}| + |\lambda_{p+i-j}\lambda_{p-i+j}|\right)\eta(u),$$

where η(u) is finite at almost every (ζ) u ∈ R, where ζ is the distribution of U.

Proof. It suffices to apply the arguments used in the proof of Lemma 3.3.

Number sequences

LEMMA 4.3 If (4.3) and (4.4) hold, then

$$\lim_{n\to\infty}\frac{1/n}{\dfrac{1}{n^2}\sum_{i=1}^{n}\dfrac{1}{h_i}} = 0.$$

Proof. From the Cauchy-Schwarz inequality,

$$n^2 = \left(\sum_{i=1}^{n}h_i^{1/2}\,\frac{1}{h_i^{1/2}}\right)^2 \le \left(\sum_{i=1}^{n}h_i\right)\left(\sum_{i=1}^{n}\frac{1}{h_i}\right),$$

it follows that

$$\frac{1/n}{\dfrac{1}{n^2}\sum_{i=1}^{n}\dfrac{1}{h_i}} \le \frac{1}{n}\sum_{i=1}^{n}h_i,$$

which, by (4.3), converges to zero as n → ∞.
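For instance, for $h_i = i^{-\delta}$ with 0 < δ < 1,

$$\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i} = \frac{1}{n^2}\sum_{i=1}^{n}i^{\delta} \sim \frac{n^{\delta-1}}{1+\delta} \quad\text{and}\quad \frac{1/n}{\frac{1}{n^2}\sum_{i=1}^{n}h_i^{-1}} \sim (1+\delta)\,n^{-\delta} \to 0 \text{ as } n \to \infty,$$

in agreement with the lemma.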


LEMMA 4.4 (TOEPLITZ) If $a_n \ge 0$, $\sum_{i=1}^{n}a_i \to \infty$, and $x_n \to x$ as n → ∞, then

$$\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i} \to x \text{ as } n \to \infty.$$

Proof. The proof is immediate. For any ε > 0, there exists N such that $|x_n - x| < \varepsilon$ for n > N. Hence,

$$\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i} - x = \frac{\sum_{i=1}^{N}a_i(x_i - x)}{\sum_{i=1}^{n}a_i} + \frac{\sum_{i=N+1}^{n}a_i(x_i - x)}{\sum_{i=1}^{n}a_i},$$

where the first term is bounded in absolute value by $c/\sum_{i=1}^{n}a_i$ for some c, and the other by ε.

    4.4.2 Proofs

Proof of Theorem 4.1
We give the continuous version of the proof. To verify the "almost everywhere" version, it suffices to apply Lemma A.9 rather than Lemma A.8. Suppose that both m(•) and f(•) are continuous at u ∈ R. We start from the observation that

$$E\tilde g_n(u) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left[E\{Y_p\mid U_0\}K\!\left(\frac{u-U_0}{h_i}\right)\right] = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right].$$

Since

$$\frac{1}{h_i}E\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right] \to g(u)\int K(v)\,dv \quad \text{as } i \to \infty$$

(see Lemma A.8), we conclude that $E\tilde g_n(u) \to g(u)\int K(v)\,dv$ as n → ∞, where, according to our notation, g(u) = µ(u)f(u).

To examine the variance, we write var[g̃_n(u)] = P_n(u) + Q_n(u) + R_n(u) with

$$P_n(u) = \sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i^2}\operatorname{var}\left[K\!\left(\frac{u-U}{h_i}\right)\right],$$

$$Q_n(u) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}\left[W_p\frac{1}{h_i}K\!\left(\frac{u-U_0}{h_i}\right)\right],$$

and

$$R_n(u) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\operatorname{cov}\left(W_{p+i}\frac{1}{h_i}K\!\left(\frac{u-U_i}{h_i}\right),\; W_{p+j}\frac{1}{h_j}K\!\left(\frac{u-U_j}{h_j}\right)\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\operatorname{cov}\left(W_{p+i-j}\frac{1}{h_i}K\!\left(\frac{u-U_{i-j}}{h_i}\right),\; W_p\frac{1}{h_j}K\!\left(\frac{u-U_0}{h_j}\right)\right).$$

Since

$$P_n(u) = \sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}E\,K^2\!\left(\frac{u-U}{h_i}\right) - h_i\frac{1}{h_i^2}E^2K\!\left(\frac{u-U}{h_i}\right)\right],$$

using Lemma A.8, we find that the quantity in square brackets converges to $f(u)\int K^2(v)\,dv$ as i → ∞. Noticing that $\sum_{n=1}^{\infty}h_n^{-1} = \infty$ and applying the Toeplitz Lemma 4.4, we conclude that

$$\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,P_n(u) \to \sigma_Z^2f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

For the same reasons, observing

$$Q_n(u) = \frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}E\left[\phi(U)K^2\!\left(\frac{u-U}{h_i}\right)\right] - h_i\frac{1}{h_i^2}E^2\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right]\right],$$

where φ(•) is as in (2.7), we obtain

$$\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,Q_n(u) \to \phi(u)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

Moreover, using Lemma 4.1,

$$|R_n(u)| \le \frac{1}{n^2}\rho(u)\sum_{i=1}^{n}\sum_{j=1}^{n}\left(|\lambda_p\lambda_{p+i-j}| + |\lambda_p\lambda_{p-i+j}| + |\lambda_{p+i-j}\lambda_{p-i+j}|\right) \le \frac{3}{n}\rho(u)\left(\max_n|\lambda_n|\right)\sum_{n=1}^{\infty}|\lambda_n| = O\!\left(\frac{1}{n}\right).$$

Using Lemma 4.3, we conclude that R_n(u) vanishes faster than both P_n(u) and Q_n(u), and then we obtain

$$\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde g_n(u)] \to \left(\sigma_Z^2 + \phi(u)\right)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty. \qquad (4.11)$$

For similar reasons, $E\tilde f_n(u) \to f(u)\int K(v)\,dv$ as n → ∞, and

$$\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde f_n(u)] \to f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty,$$

which completes the proof.


Proof of Theorem 4.2
Suppose that both m(•) and f(•) are continuous at a point u ∈ R. Evidently,

$$E\bar g_n(u) = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[E\{Y_p\mid U_0\}K\!\left(\frac{u-U_0}{h_i}\right)\right] = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right].$$

Since (4.6) holds and

$$\frac{1}{h_i}E\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right] \to g(u)\int K(v)\,dv \quad \text{as } i \to \infty$$

(see Lemma A.8), an application of the Toeplitz Lemma 4.4 gives

$$E\bar g_n(u) \to g(u)\int K(v)\,dv \quad \text{as } n \to \infty.$$

To examine the variance, we write var[ḡ_n(u)] = P_n(u) + Q_n(u) + R_n(u), where

$$P_n(u) = \sigma_Z^2\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[K\!\left(\frac{u-U}{h_i}\right)\right],$$

$$Q_n(u) = \frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[W_pK\!\left(\frac{u-U_0}{h_i}\right)\right],$$

and

$$R_n(u) = \frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\operatorname{cov}\left(W_{p+i}K\!\left(\frac{u-U_i}{h_i}\right),\; W_{p+j}K\!\left(\frac{u-U_j}{h_j}\right)\right) = \frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\operatorname{cov}\left(W_{p+i-j}K\!\left(\frac{u-U_{i-j}}{h_i}\right),\; W_pK\!\left(\frac{u-U_0}{h_j}\right)\right).$$

Since

$$P_n(u) = \sigma_Z^2\frac{1}{\sum_{i=1}^{n}h_i}P_{1n}(u)$$

with

$$P_{1n}(u) = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}E\,K^2\!\left(\frac{u-U}{h_i}\right) - h_i\frac{1}{h_i^2}E^2K\!\left(\frac{u-U}{h_i}\right)\right]$$

converging, due to (4.6) and the Toeplitz Lemma 4.4, to the same limit as

$$\frac{1}{h_n}E\,K^2\!\left(\frac{u-U}{h_n}\right) - h_n\frac{1}{h_n^2}E^2K\!\left(\frac{u-U}{h_n}\right),$$

we get

$$P_n(u)\sum_{i=1}^{n}h_i \to \sigma_Z^2f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$


For the same reasons, observing

$$Q_n(u) = \frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}E\left[\phi(U)K^2\!\left(\frac{u-U}{h_i}\right)\right] - h_i\frac{1}{h_i^2}E^2\left[\mu(U)K\!\left(\frac{u-U}{h_i}\right)\right]\right],$$

where φ(•) is as in (2.7), we obtain

$$Q_n(u)\sum_{i=1}^{n}h_i \to \phi(u)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty.$$

Applying Lemma 4.1, we get

$$R_n(u)\sum_{i=1}^{n}h_i \le \rho(u)\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\sum_{j=1}^{n}h_j\left(|\lambda_p\lambda_{p+i-j}| + |\lambda_p\lambda_{p-i+j}| + |\lambda_{p+i-j}\lambda_{p-i+j}|\right) \le 3\rho(u)\left(\max_n h_n\right)\left(\max_n|\lambda_n|\right)\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\alpha_i,$$

where $\alpha_i = \sum_{j=i-p}^{\infty}|\lambda_j|$. Since $\lim_{i\to\infty}\alpha_i = 0$, applying the Toeplitz Lemma 4.4, we get $\lim_{n\to\infty}R_n(u)\sum_{i=1}^{n}h_i = 0$, which means that R_n(u) vanishes faster than both P_n(u) and Q_n(u). Finally,

$$\operatorname{var}[\bar g_n(u)]\sum_{i=1}^{n}h_i \to \left(\sigma_Z^2 + \phi(u)\right)f(u)\int K^2(v)\,dv \quad \text{as } n \to \infty. \qquad (4.12)$$

Since, for the same reasons, $E\bar f_n(u) \to f(u)\int K(v)\,dv$ as n → ∞ and $\operatorname{var}[\bar f_n(u)]\sum_{i=1}^{n}h_i \to f(u)\int K^2(v)\,dv$ as n → ∞, the proof has been completed.

Proof of Theorem 4.3
Each convergence in the proof holds for almost every (ζ) u ∈ R. In a preparatory step, we show that

    ∞n =