
Michael Ulbrich: Nonsmooth Newton-like Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces. Technische Universität München, Fakultät für Mathematik. June 2001, revised February 2002.


Table of Contents

1. Introduction .... 1
   1.1 Examples of Applications .... 4
      1.1.1 Optimal Control Problems .... 4
      1.1.2 Variational Inequalities .... 7
   1.2 Motivation of the Method .... 8
      1.2.1 Finite-Dimensional Variational Inequalities .... 9
      1.2.2 Infinite-Dimensional Variational Inequalities .... 13
   1.3 Organization .... 14

2. Elements of Finite-Dimensional Nonsmooth Analysis .... 17
   2.1 Generalized Differentials .... 17
   2.2 Semismoothness .... 18
   2.3 Semismooth Newton's Method .... 20
   2.4 Higher Order Semismoothness .... 22
   2.5 Examples of Semismooth Functions .... 23
      2.5.1 The Euclidean Norm .... 23
      2.5.2 The Fischer–Burmeister Function .... 24
      2.5.3 Piecewise Differentiable Functions .... 24
   2.6 Extensions .... 27

3. Newton Methods for Semismooth Operator Equations .... 29
   3.1 Introduction .... 29
   3.2 Newton Methods for Abstract Semismooth Operators .... 34
      3.2.1 Semismooth Operators in Banach Spaces .... 34
      3.2.2 Basic Properties .... 34
      3.2.3 Semismooth Newton's Method .... 37
      3.2.4 Inexact Newton's Method .... 39
      3.2.5 Projected Inexact Newton's Method .... 41
      3.2.6 Alternative Regularity Conditions .... 42
   3.3 Semismooth Newton Methods for Superposition Operators .... 44
      3.3.1 Assumptions .... 44
      3.3.2 A Generalized Differential .... 47
      3.3.3 Semismoothness of Superposition Operators .... 49
      3.3.4 Illustrations .... 52
      3.3.5 Proof of the Main Theorems .... 55
      3.3.6 Semismooth Newton Methods .... 60
      3.3.7 Semismooth Composite Operators and Chain Rules .... 64
      3.3.8 Further Properties of the Generalized Differential .... 66

4. Smoothing Steps and Regularity Conditions .... 69
   4.1 Smoothing Steps .... 69
   4.2 A Newton Method without Smoothing Steps .... 70
   4.3 Sufficient Conditions for Regularity .... 72

5. Variational Inequalities and Mixed Problems .... 79
   5.1 Application to Variational Inequalities .... 79
      5.1.1 Problems with Bound-Constraints .... 79
      5.1.2 Pointwise Convex Constraints .... 83
   5.2 Mixed Problems .... 88
      5.2.1 Karush–Kuhn–Tucker Systems .... 89
      5.2.2 Connections to the Reduced Problem .... 93
      5.2.3 Relations between Full and Reduced Newton System .... 95
      5.2.4 Smoothing Steps .... 98
      5.2.5 Regularity Conditions .... 99

6. Trust-Region Globalization .... 101
   6.1 The Trust-Region Algorithm .... 105
   6.2 Global Convergence .... 108
   6.3 Implementable Decrease Conditions .... 114
   6.4 Transition to Fast Local Convergence .... 116

7. Applications .... 121
   7.1 Distributed Control of a Nonlinear Elliptic Equation .... 121
      7.1.1 Black-Box Approach .... 124
      7.1.2 All-at-Once Approach .... 128
      7.1.3 Finite Element Discretization .... 129
      7.1.4 Discrete Black-Box Approach .... 132
      7.1.5 Efficient Solution of the Newton System .... 138
      7.1.6 Discrete All-at-Once Approach .... 142
   7.2 Numerical Results .... 142
      7.2.1 Using Multigrid Techniques .... 143
      7.2.2 Black-Box Approach .... 145
      7.2.3 All-at-Once Approach .... 148
      7.2.4 Nested Iteration .... 150
      7.2.5 Discussion of the Results .... 151
   7.3 Obstacle Problems .... 152
      7.3.1 Dual Problem .... 153
      7.3.2 Regularized Dual Problem .... 155
      7.3.3 Discretization .... 158
      7.3.4 Numerical Results .... 160

8. Optimal Control of the Incompressible Navier–Stokes Equations .... 167
   8.1 Introduction .... 167
   8.2 Functional Analytic Setting of the Control Problem .... 168
      8.2.1 Function Spaces .... 168
      8.2.2 The Control Problem .... 169
   8.3 Analysis of the Control Problem .... 171
      8.3.1 State Equation .... 171
      8.3.2 Control-to-State Mapping .... 175
      8.3.3 Adjoint Equation .... 176
      8.3.4 Properties of the Reduced Objective Function .... 179
   8.4 Application of Semismooth Newton Methods .... 181

9. Optimal Control of the Compressible Navier–Stokes Equations .... 183
   9.1 Introduction .... 183
   9.2 The Flow Control Problem .... 183
   9.3 Adjoint-Based Gradient Computation .... 185
   9.4 Semismooth BFGS-Newton Method .... 186
      9.4.1 Quasi-Newton BFGS-Approximations .... 186
      9.4.2 The Algorithm .... 187
   9.5 Numerical Results .... 188

A. Appendix .... 191
   A.1 Adjoint Approach for Optimal Control Problems .... 191
      A.1.1 Adjoint Representation of the Reduced Gradient .... 191
      A.1.2 Adjoint Representation of the Reduced Hessian .... 192
   A.2 Several Inequalities .... 194
   A.3 Elementary Properties of Multifunctions .... 194
   A.4 Nemytskij Operators .... 195

Notations .... 199

References .... 201

Acknowledgments

It is my great pleasure to thank Prof. Dr. Klaus Ritter for his constant support and encouragement over the past ten years. Furthermore, I would like to thank Prof. Dr. Johann Edenhofer who stimulated my interest in optimal control of PDEs.

My scientific work benefited significantly from two very enjoyable and fruitful research stays at the Department of Computational and Applied Mathematics (CAAM) and the Center for Research on Parallel Computation (CRPC), Rice University, Houston, Texas. These visits were made possible by Prof. John Dennis and Prof. Matthias Heinkenschloss. I am very thankful to both of them for their hospitality and support. During my second stay at Rice University, I laid the foundation of a large part of this work. The visits were funded by the Forschungsstipendium Ul157/1-1 and the Habilitandenstipendium Ul157/3-1 of the Deutsche Forschungsgemeinschaft, and by CRPC grant CCR-9120008. This support is gratefully acknowledged.

The computational results in chapter 9 for the boundary control of the compressible Navier–Stokes equations build on joint work with Prof. Scott Collis, Prof. Matthias Heinkenschloss, Dr. Kaveh Ghayour, and Dr. Stefan Ulbrich as part of the Rice AeroAcoustic Control (RAAC) project, which is directed by Scott Collis and Matthias Heinkenschloss. I thank all RAAC group members for allowing me to use their contributions to the project for my computations. In particular, Scott Collis' Navier–Stokes solver was very helpful. The computations for chapter 9 were performed on an SGI Origin 2000 at Rice University which was purchased with the aid of NSF SCREMS grant 98–72009. I am very thankful to Matthias Heinkenschloss for giving me access to this machine. Furthermore, I would like to thank Prof. Dr. Folkmar Bornemann for the opportunity to use his SGI Origin 200 for computations.

I also would like to acknowledge the Zentrum Mathematik, Technische Universität München, for providing a very pleasant and professional working environment. In particular, I am thankful to the members of our Rechnerbetriebsgruppe, Dr. Michael Nast, Dr. Andreas Johann, and Rolf Schöne, for their good system administration and their helpfulness.

In making the ideas for this work concrete, I profited from an inspiring conversation with Prof. Liqun Qi, Prof. Danny Ralph, and PD Dr. Christian Kanzow during the ICCP99 meeting in Madison, Wisconsin, which I would like to acknowledge.

Finally, I wish to thank my parents, Margot and Peter, and my brother Stefan for always being there for me.

1. Introduction

A central theme of applied mathematics is the design of accurate mathematical models for a variety of technical, financial, medical, and many other applications, and the development of efficient numerical algorithms for their solution. Often, these models contain parameters that should be adjusted in an optimal way, either to maximize the accuracy of the model (parameter identification), or to control the simulated system in a desired way (optimal control). Since optimization with simulation constraints is more challenging than simulation alone (which already can be very involved on its own), the development and analysis of efficient optimization methods is crucial for the viability of this approach. Besides the optimization of systems, minimization problems and variational inequalities often arise already in the process of building mathematical models; this, e.g., applies to contact problems, free boundary problems, and elastoplastic problems [47, 62, 63, 97, 98, 117].

Most of the variational problems mentioned so far share the property that they are continuous in time and/or space, so that infinite-dimensional function spaces provide the appropriate setting for their analysis. Since essential information on the problem to solve is carried by the properties of the underlying infinite-dimensional spaces, the successful design of robust and mesh-independent optimization methods requires a thorough convergence analysis in this infinite-dimensional function space setting. The purpose of this work is to develop and analyze a class of Newton-type methods for the solution of optimization problems and variational inequalities that are posed in function spaces and contain pointwise inequality constraints. A representative prototype of the problems we consider here is the following:

Bound-Constrained Variational Inequality Problem (VIP):

Find u ∈ L^p(Ω) such that

u ∈ B := {v ∈ L^p(Ω) : a ≤ v ≤ b on Ω},   ⟨F(u), v − u⟩ ≥ 0 for all v ∈ B.   (1.1)

Hereby, ⟨u, v⟩ = ∫_Ω u(ω)v(ω) dω, and F : L^p(Ω) → L^{p'}(Ω) with p, p' ∈ (1, ∞], 1/p + 1/p' ≤ 1, is an (in general nonlinear) operator, where L^p(Ω) is the usual Lebesgue space on the bounded Lebesgue measurable set Ω ⊂ R^n. We assume that Ω has positive Lebesgue measure, so that 0 < µ(Ω) < ∞. These requirements on Ω are assumed throughout this work. In case this is needed (e.g., for embeddings), but not explicitly stated, we assume that Ω is nonempty, open, and bounded with sufficiently smooth boundary ∂Ω. The lower and upper bound functions a and b may be present only on measurable parts Ω_a and Ω_b of Ω, which is achieved by setting a|_{Ω\Ω_a} = −∞ and b|_{Ω\Ω_b} = +∞, respectively. We assume that the natural extensions by zero of a|_{Ω_a} and b|_{Ω_b} to Ω are elements of L^p(Ω). We also require a minimum distance ν > 0 of the bounds from each other, i.e., b − a ≥ ν on Ω. In the definition of B, and throughout this work, relations between measurable functions are meant to hold pointwise almost everywhere on Ω in the Lebesgue sense. Various extensions of problem (1.1) will also be considered and are discussed below.

In many situations, the VIP (1.1) describes the first-order necessary optimality conditions of the bound-constrained minimization problem

minimize j(u) subject to u ∈ B.   (1.2)

In this case, F is the Fréchet derivative j' : L^p(Ω) → L^p(Ω)* of the objective functional j : L^p(Ω) → R.

The methods we are going to investigate are best explained by considering the unilateral case with lower bounds a ≡ 0. The resulting problem is called the nonlinear complementarity problem (NCP):

u ∈ L^p(Ω), u ≥ 0,   ⟨F(u), v − u⟩ ≥ 0 for all v ∈ L^p(Ω), v ≥ 0.   (1.3)

As we will see, and as might be obvious to the reader, (1.3) is equivalent to the pointwise complementarity system

u ≥ 0,   F(u) ≥ 0,   uF(u) = 0 on Ω.   (1.4)

The basic idea, which was developed in the nineties for the numerical solution of finite-dimensional NCPs, consists in the observation that (1.3) is equivalent to the operator equation

Φ(u) = 0,   where Φ(u)(ω) = φ(u(ω), F(u)(ω)), ω ∈ Ω.   (1.5)

Hereby, φ : R^2 → R is an NCP-function, i.e.,

φ(x) = 0 ⟺ x_1, x_2 ≥ 0, x_1 x_2 = 0.
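A concrete NCP-function used later in the text (section 1.2.1) is φ(x) = min{x_1, x_2}. The following minimal sketch checks the defining property numerically on a few sample points; the sample points themselves are illustrative choices.

```python
import numpy as np

def phi_min(x1, x2):
    """The min NCP-function: phi(x) = min(x1, x2).

    Defining property: phi(x) = 0  <=>  x1 >= 0, x2 >= 0, x1 * x2 = 0.
    """
    return np.minimum(x1, x2)

# Check the characterization on complementary and non-complementary pairs.
samples = [(0.0, 3.0), (2.0, 0.0), (0.0, 0.0),    # complementary pairs
           (1.0, 1.0), (-1.0, 2.0), (0.0, -4.0)]  # non-complementary
for x1, x2 in samples:
    lhs = (phi_min(x1, x2) == 0.0)
    rhs = (x1 >= 0 and x2 >= 0 and x1 * x2 == 0)
    assert lhs == rhs
```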

We will develop a semismoothness concept that is applicable to the operators arising in (1.5) and that allows us to develop a class of Newton-type methods for the solution of (1.5). The resulting algorithms have, as their finite-dimensional counterparts – the semismooth Newton methods – several remarkable properties:

(a) The methods are locally superlinearly convergent, and they converge with q-rate > 1 under slightly stronger assumptions.

(b) Although an inequality-constrained problem is solved, only one linear operator equation has to be solved per iteration. Thus, the cost per iteration is comparable to that of Newton's method for smooth operator equations. We remark that sequential quadratic programming (SQP) algorithms, which are very efficient in practice, require the solution of an inequality-constrained quadratic program per iteration, which can be significantly more expensive. Thus, it is also attractive to combine SQP methods with the class of Newton methods we describe here, either by using the Newton method for solving subproblems, or by rewriting the complementarity conditions in the Kuhn–Tucker system as an operator equation.

(c) The convergence analysis does not require a strict complementarity condition to hold. Therefore, we can prove fast convergence also for the case where the set {ω : u(ω) = 0, F(u)(ω) = 0} has positive measure at the solution u.

(d) The systems that have to be solved in each iteration are of the form

[d_1 · I + d_2 · F'(u)] s = −Φ(u),   (1.6)

where I : u ↦ u is the identity and F' denotes the Fréchet derivative of F. Further, d_1, d_2 are nonnegative L^∞-functions that are chosen depending on u and satisfy 0 < γ_1 < d_1 + d_2 < γ_2 on Ω uniformly in u. More precisely: (d_1, d_2) is a measurable selection of the measurable multifunction

ω ∈ Ω ↦ ∂φ(u(ω), F(u)(ω)),

where ∂φ is Clarke's generalized gradient of φ. As we will see, in typical applications the system (1.6) can be symmetrized and is not much harder to solve than a system involving only the operator F'(u), which would arise for the unconstrained problem F(u) = 0. In particular, fast solvers like multigrid methods, preconditioned iterative solvers, etc., can be applied to solve (1.6).
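In finite dimensions, the structure of (1.6) can be sketched with the min NCP-function, for which a Clarke-gradient selection is (d_1, d_2) = (1, 0) where u ≤ F(u) and (0, 1) otherwise. The test map F below, the starting point, and the fixed iteration count are illustrative assumptions, not from the text.

```python
import numpy as np

# Semismooth Newton sketch for a finite-dimensional NCP
#   u >= 0, F(u) >= 0, u * F(u) = 0   (componentwise),
# using phi(x1, x2) = min(x1, x2). Each step solves the system
#   [D1 + D2 F'(u)] s = -Phi(u),   cf. (1.6).

def F(u):
    return u + np.arctan(u) - 1.0           # monotone test map (illustrative)

def Fprime(u):
    return np.diag(1.0 + 1.0 / (1.0 + u**2))

def semismooth_newton(u, iters=20):
    for _ in range(iters):                  # fixed count instead of a stopping test
        Fu = F(u)
        Phi = np.minimum(u, Fu)
        d1 = (u <= Fu).astype(float)        # Clarke-gradient selection for min
        d2 = 1.0 - d1
        J = np.diag(d1) + np.diag(d2) @ Fprime(u)
        u = u + np.linalg.solve(J, -Phi)
    return u

u = semismooth_newton(np.zeros(5))
assert np.all(u >= -1e-12) and np.all(F(u) >= -1e-12)
assert np.max(np.abs(np.minimum(u, F(u)))) < 1e-10    # complementarity residual
```

Note how only one linear solve per iteration is needed, as stated in (b), even though an inequality-constrained problem is being solved.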

(e) The method is not restricted to the problem class (1.1). Among the possible extensions we also investigate variational inequality problems of the form (1.1), but with the feasible set B replaced by

C = {u ∈ L^p(Ω)^m : u(ω) ∈ C on Ω},   C ⊂ R^m closed and convex.

Furthermore, we will consider mixed problems, where F(u) is replaced by F(y, u) and where we have the additional operator equation E(y, u) = 0. In particular, such problems arise as the first-order necessary optimality conditions (Karush–Kuhn–Tucker or KKT conditions) of optimization problems with optimal control structure

minimize J(y, u) subject to E(y, u) = 0, u ∈ C.

(f) Other extensions are possible that we do not cover in this work. For instance, certain quasivariational inequalities [12, 13], i.e., variational inequalities for which the feasible set depends on u (e.g., a = A(u), b = B(u)), can be solved by our class of semismooth Newton methods.

For illustration, we begin with examples of two problem classes that fit in the above framework.


1.1 Examples of Applications

1.1.1 Optimal Control Problems

Let the state space Y (a Banach space), the control space U = L^p(Ω), and the set B ⊂ U of admissible or feasible controls, as defined in (1.1), be given. The state y ∈ Y of the system under consideration is governed by the state equation

E(y, u) = 0,   (1.7)

where E : Y × U → W* and W* denotes the dual of a reflexive Banach space W. In our context, the state equation usually is given by the weak formulation of a partial differential equation (PDE), including all boundary conditions that are not already contained in the definition of Y. Suppose that, for every control u ∈ U, the state equation (1.7) possesses a unique solution y = y(u) ∈ Y. The control problem consists in finding a control u such that the pair (y(u), u) minimizes a given objective function J : Y × U → R among all feasible controls u ∈ B. Thus, the control problem is

minimize_{y ∈ Y, u ∈ U} J(y, u) subject to (1.7) and u ∈ B.   (1.8)

Alternatively, we can use the state equation to express the state in terms of the control, y = y(u), and to write the control problem in the equivalent reduced form

minimize j(u) subject to u ∈ B,   (1.9)

with the reduced objective function j(u) := J(y(u), u). By the implicit function theorem, the continuous differentiability of y(u) in a neighborhood of u follows if E is continuously differentiable and E_y(y(u), u) is continuously invertible. Further, if in addition J is continuously differentiable in a neighborhood of (y(u), u) then j is continuously differentiable in a neighborhood of u. In the same way, differentiability of higher order can be ensured. For problem (1.9), the gradient j'(u) ∈ U* is given by

j'(u) = J_u(y, u) + y_u(u)* J_y(y, u),

with y = y(u). Alternatively, j' can be represented via the adjoint state w = w(u) ∈ W, which is the solution of the adjoint equation

E_y(y, u)* w = −J_y(y, u),

where y = y(u). As discussed in more detail in appendix A.1, the gradient of j can be written in the form

j'(u) = J_u(y, u) + E_u(y, u)* w.

Adjoint-based expressions for the second derivative j'' are also available, see appendix A.1.


We now make the example more concrete and consider as state equation the Poisson problem with distributed control on the right hand side,

−Δy = u on Ω,   y = 0 on ∂Ω,   (1.10)

and an objective function of tracking type

J(y, u) = (1/2) ∫_Ω (y − y_d)^2 dx + (λ/2) ∫_Ω u^2 dx.

Hereby, Ω ⊂ R^n is a nonempty and bounded open set, y_d ∈ L^2(Ω) is a target state that we would like to achieve as well as possible by controlling u, and the second term is for the purpose of regularization (the parameter λ > 0 is typically very small, e.g., λ = 10^{-3}). We incorporate the boundary conditions into the state space by choosing Y = H^1_0(Ω), the Sobolev space of functions vanishing on ∂Ω. For the control space we choose U = L^2(Ω). The control problem thus is

minimize_{y ∈ H^1_0(Ω), u ∈ L^2(Ω)} (1/2) ∫_Ω (y − y_d)^2 dx + (λ/2) ∫_Ω u^2 dx
subject to −Δy = u, u ∈ B.   (1.11)

Defining the operator E : Y × U → W* := Y*, E(y, u) = −Δy − u, we can write the state equation in the form (1.7). We identify L^2(Ω) with its dual and introduce the Gelfand triple

H^1_0(Ω) = Y ↪ U = L^2(Ω) ↪ Y* = H^{-1}(Ω).

Then

J_y(y, u) = y − y_d,   J_u(y, u) = λu,
E_u(y, u)v = −v ∀ v ∈ U,   E_y(y, u)z = −Δz ∀ z ∈ Y.

Therefore, the adjoint state w ∈ W** = W = H^1_0(Ω) is given by

−Δw = y_d − y on Ω,   w = 0 on ∂Ω,   (1.12)

where y solves (1.10). Note that in (1.12) the boundary conditions could also be omitted because they are already enforced by w ∈ H^1_0(Ω). The gradient of the reduced objective function j thus is

j'(u) = J_u(y, u) + E_u(y, u)* w = λu − w

with y = y(u) and w = w(u) solutions of (1.10) and (1.12), respectively. This problem has the following properties that are common to many control problems and will be of use later on:

6 1. Introduction

• The mapping u ↦ w(u) possesses a smoothing property. In fact, w is a smooth (in this simple example even affine linear and bounded) mapping from U = L^2(Ω) to W = H^1_0(Ω), which is continuously embedded in L^{p'}(Ω) for appropriate p' > 2. If the boundary of Ω is sufficiently smooth, elliptic regularity results even imply that the mapping u ↦ w(u) maps smoothly into H^1_0(Ω) ∩ H^2(Ω).

• The solution u is contained in L^p(Ω) ↪ U (note that Ω is bounded) for appropriate p ∈ (2, ∞] if the bounds satisfy a|_{Ω_a} ∈ L^p(Ω_a), b|_{Ω_b} ∈ L^p(Ω_b). In fact, let p ∈ (2, ∞] be such that H^1_0(Ω) ↪ L^p(Ω). As we will see shortly, j'(u) = λu − w vanishes on Ω_0 = {ω : a(ω) < u(ω) < b(ω)}. Thus, using w ∈ H^1_0(Ω) ↪ L^p(Ω), we conclude u|_{Ω_0} = λ^{-1} w|_{Ω_0} ∈ L^p(Ω_0). On Ω_a \ Ω_0 we have u = a, and on Ω_b \ Ω_0 holds u = b. Hence, u ∈ L^p(Ω).

Therefore, the reduced problem (1.9) is of the form (1.2). Due to strict convexity of j, it can be written in the form (1.1) with F = j', and it enjoys the following properties:

There exist p, p' ∈ (2, ∞] such that

• F : L^2(Ω) → L^2(Ω) is continuously differentiable (here even continuous affine linear).

• F has the form F(u) = λu + G(u), where G : L^2(Ω) → L^{p'}(Ω) is locally Lipschitz continuous (here even continuous affine linear).

• The solution is contained in L^p(Ω).

This problem arises as a special case in the class of nonlinear elliptic control problems that we discuss in detail in section 7.1.
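The adjoint-based gradient computation for this example can be mirrored discretely. Below is a minimal 1D finite-difference sketch of an analogue of (1.10)–(1.12) on Ω = (0, 1); the grid size, λ, and target y_d are illustrative choices, and the adjoint gradient j'(u) = λu − w is verified against a finite-difference quotient.

```python
import numpy as np

# 1D finite-difference sketch of the adjoint gradient j'(u) = lam*u - w
# for  -y'' = u on (0,1), y(0) = y(1) = 0  (a 1D analogue of (1.10)).
n, lam = 100, 1e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2        # discrete -d^2/dx^2
yd = np.sin(np.pi * x)                             # illustrative target state

def j(u):
    y = np.linalg.solve(A, u)                      # state equation, cf. (1.10)
    return 0.5 * h * np.sum((y - yd)**2) + 0.5 * lam * h * np.sum(u**2)

def grad_j(u):
    y = np.linalg.solve(A, u)
    w = np.linalg.solve(A, yd - y)                 # adjoint equation, cf. (1.12)
    return lam * u - w                             # j'(u) = lam*u - w

# Finite-difference check of the adjoint gradient in a random direction s.
rng = np.random.default_rng(0)
u, s, t = rng.standard_normal(n), rng.standard_normal(n), 1e-6
fd = (j(u + t * s) - j(u)) / t
assert abs(fd - h * np.sum(grad_j(u) * s)) < 1e-5
```

The check exploits the symmetry of the discrete Laplacian, which is the discrete counterpart of the self-adjointness of −Δ used in deriving (1.12).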

The distributed control of the right hand side can be replaced by a variety of other control mechanisms. One alternative is Neumann boundary control. To describe this briefly, let us assume that the boundary ∂Ω is sufficiently smooth with positive and finite Hausdorff measure. We consider the problem

minimize_{y ∈ H^1(Ω), u ∈ L^2(∂Ω)} (1/2) ∫_Ω (y − y_d)^2 dx + (λ/2) ∫_{∂Ω} u^2 dS
subject to −Δy + y = f on Ω, ∂y/∂n = u on ∂Ω, u ∈ B,   (1.13)

where B ⊂ U = L^2(∂Ω), f ∈ W* = H^1(Ω)*, and ∂/∂n denotes the outward normal derivative. The state equation in weak form reads

∀ v ∈ Y :   (∇y, ∇v)_{L^2(Ω)^n} + (y, v)_{L^2(Ω)} = ⟨f, v⟩_{H^1(Ω)*, H^1(Ω)} + (u, v|_{∂Ω})_{L^2(∂Ω)},

where Y = H^1(Ω). This can be written in the form E(y, u) = 0 with E : H^1(Ω) × L^2(∂Ω) → H^1(Ω)*. A calculation similar to the one above yields for the reduced objective function

j'(u) = λu − w|_{∂Ω},

where the adjoint state w = w(u) ∈ W = H^1(Ω) is given by

−Δw + w = y_d − y on Ω,   ∂w/∂n = 0 on ∂Ω.

Using standard results on Neumann problems, we see that the mappings

u ∈ L^2(∂Ω) ↦ y(u) ∈ H^1(Ω) ↦ w(u) ∈ H^1(Ω)

are continuous affine linear, and thus so is

u ∈ L^2(∂Ω) ↦ w(u)|_{∂Ω} ∈ H^{1/2}(∂Ω) ↪ L^{p'}(∂Ω)

for appropriate p' > 2. Therefore, we have a scenario comparable to the distributed control problem, but now posed on the boundary of Ω.

1.1.2 Variational Inequalities

As a further application, we discuss a variational inequality arising from obstacle problems. For q ∈ [2, ∞), let g ∈ H^{2,q}(Ω) represent a (lower) obstacle located over the nonempty bounded open set Ω ⊂ R^2 with sufficiently smooth boundary, denote by y ∈ H^1_0(Ω) the position of a membrane, and by f ∈ L^q(Ω) external forces. For compatibility we assume g ≤ 0 on ∂Ω. Then y solves the problem

minimize_{y ∈ H^1_0(Ω)} (1/2) a(y, y) − (f, y)_{L^2} subject to y ≥ g,   (1.14)

where

a(y, z) = ∫_Ω Σ_{i,j} a_{ij} (∂y/∂x_i)(∂z/∂x_j) dx,

a_{ij} = a_{ji} ∈ C^1(Ω), and a is H^1_0-elliptic. Let A ∈ L(H^1_0, H^{-1}) be the operator induced by a, i.e., a(y, z) = ⟨y, Az⟩_{H^1_0, H^{-1}}.

It can be shown, see section 7.3 and [22], that (1.14) possesses a unique solution y ∈ H^1_0(Ω) and that, in addition, y ∈ H^{2,q}(Ω). Using Fenchel–Rockafellar duality [49], an equivalent dual problem can be derived, which (written as a minimization problem) assumes the form

minimize_{u ∈ L^2(Ω)} (1/2)(f + u, A^{-1}(f + u))_{L^2} − (g, u)_{L^2} subject to u ≥ 0.   (1.15)

The dual problem admits a unique solution u ∈ L^2(Ω), which in addition satisfies u ∈ L^q(Ω). From the dual solution u we can recover the primal solution y via y = A^{-1}(f + u).

Obviously, the objective function in (1.15) is not L^2-coercive, which we compensate by adding a regularization. This yields the objective function

j_λ(u) = (1/2)(f + u, A^{-1}(f + u))_{L^2} − (g, u)_{L^2} + (λ/2) ‖u − u_d‖^2_{L^2},

where λ > 0 is a (small) parameter and u_d ∈ L^{p'}(Ω), p' ∈ [2, ∞), is chosen appropriately. We will show in section 7.3 that the solution u_λ of the regularized problem

minimize_{u ∈ L^2(Ω)} j_λ(u) subject to u ≥ 0   (1.16)

lies in L^{p'}(Ω) and satisfies ‖u_λ − u‖_{H^{-1}} = o(λ^{1/2}), which implies ‖y_λ − y‖_{H^1_0} = o(λ^{1/2}), where y_λ = A^{-1}(f + u_λ). Since j_λ is strictly convex, problem (1.16) can be written in the form (1.1) with F = j'_λ. We have

F(u) = λu + A^{-1}(f + u) − g − λu_d =: λu + G(u).

Using that A ∈ L(H^1_0, H^{-1}) is a homeomorphism, and that H^1_0(Ω) ↪ L^p(Ω) for all p ∈ [1, ∞), we conclude that the operator G maps L^2(Ω) continuously affine linearly into L^{p'}(Ω). Therefore, we see:

• F : L^2(Ω) → L^2(Ω) is continuously differentiable (here even continuous affine linear).

• F has the form F(u) = λu + G(u), where G : L^2(Ω) → L^{p'}(Ω) is locally Lipschitz continuous (here even continuous affine linear).

• The solution is contained in L^{p'}(Ω).
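To make the structure of F = j'_λ concrete, the following sketch assembles a 1D analogue (A = −d^2/dx^2 on (0, 1) with homogeneous Dirichlet conditions, finite differences) and evaluates F(u) = λu + G(u). The obstacle g, force f, reference u_d, and parameters are illustrative choices, not from the text.

```python
import numpy as np

# 1D finite-difference sketch of F = j_lambda' for the regularized dual
# obstacle problem (1.16): F(u) = lam*u + A^{-1}(f + u) - g - lam*ud.
n, lam = 200, 1e-3
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2        # discrete Dirichlet Laplacian
f = np.full(n, -10.0)                              # downward force (illustrative)
g = 0.25 - (x - 0.5)**2                            # parabolic lower obstacle
ud = np.zeros(n)

def F(u):
    """F(u) = lam*u + G(u) with G(u) = A^{-1}(f + u) - g - lam*ud."""
    return lam * u + np.linalg.solve(A, f + u) - g - lam * ud

# G is affine linear, so u -> F(u) - F(0) must be linear in u:
u1, u2 = np.ones(n), x.copy()
lin = lambda u: F(u) - F(np.zeros(n))
assert np.allclose(lin(u1 + u2), lin(u1) + lin(u2))
```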

A detailed discussion of this problem including numerical results is given in section 7.3. In a similar way, obstacle problems on the boundary can be treated. Furthermore, time-dependent parabolic variational inequality problems can be reduced, by semi-discretization in time, to a sequence of elliptic variational inequality problems.

1.2 Motivation of the Method

The class of methods for solving (1.1) that we consider here is based on the following equivalent formulation of (1.1) as a system of pointwise inequalities:

(i) a ≤ u ≤ b,   (ii) (u − a)F(u) ≤ 0,   (iii) (u − b)F(u) ≤ 0 on Ω.   (1.17)

On Ω \ Ω_a, condition (ii) has to be interpreted as F(u) ≤ 0, and on Ω \ Ω_b condition (iii) means F(u) ≥ 0. The equivalence of (1.1) and (1.17) is easily verified. In fact, if u is a solution of (1.1) then (i) holds. Further, if (ii) is violated on a set Ω' of positive measure, we define v ∈ B by v = a on Ω', and v = u on Ω \ Ω', and obtain the contradiction ⟨F(u), v − u⟩ = ∫_{Ω'} F(u)(a − u) dω < 0. In the same way, (iii) can be shown to hold. Conversely, if u solves (1.17) then (i)–(iii) imply that Ω is the union of the disjoint sets {a < u < b, F(u) = 0}, Ω_≥ = {u = a, F(u) ≥ 0}, and Ω_≤ = {u = b, F(u) ≤ 0}. Now, for arbitrary v ∈ B, we have

⟨F(u), v − u⟩ = ∫_{Ω_≥} F(u)(v − a) dω + ∫_{Ω_≤} F(u)(v − b) dω ≥ 0,

so that u solves (1.1).

As already mentioned, an important special case, which will provide our main example throughout, is the nonlinear complementarity problem (NCP), which corresponds to a ≡ 0 and b ≡ +∞. Obviously, unilateral problems can be converted to an NCP via the transformation ū = u − a, F̄(ū) = F(ū + a) in the case of lower bounds, and ū = b − u, F̄(ū) = −F(b − ū) in the case of upper bounds. For NCPs, (1.17) reduces to (1.4). In finite dimensions, the NCP and, more generally, the box-constrained variational inequality problem (which is also called mixed complementarity problem, MCP) have been extensively investigated and there exists a significant, rapidly growing body of literature on numerical algorithms for their solution, see section 1.2.1. Hereby, a major role is played by devices that allow one to reformulate the problem equivalently in the form of a system of (nonsmooth) equations. We begin with a description of these concepts in the framework of finite-dimensional MCPs and NCPs.

1.2.1 Finite-Dimensional Variational Inequalities

Although we consider finite-dimensional problems throughout this section 1.2.1, we will work with the same notations as in the function space setting (a, b, u, F, etc.), since there is no danger of ambiguity. In analogy to (1.4), the finite-dimensional mixed complementarity problem consists in finding u ∈ R^m such that

a_i ≤ u_i ≤ b_i,   (u_i − a_i)F_i(u) ≤ 0,   (u_i − b_i)F_i(u) ≤ 0,   i = 1, …, m,   (1.18)

where a, b ∈ R^m and F : R^m → R^m are given.

We begin with an early approach by Eaves [48], who observed (in the more general framework of VIPs on closed convex sets) that (1.18) can be equivalently written in the form

u − P_{[a,b]}(u − F(u)) = 0,   (1.19)

where P_{[a,b]}(u) = max{a, min{u, b}} (componentwise) is the Euclidean projection onto [a, b] = ∏_{i=1}^m [a_i, b_i]. Note that if the function F is C^k then the left hand side of (1.19) is piecewise C^k and thus, as we will see, semismooth. The reformulation (1.19) can be embedded in a more general framework. To this end, we interpret (1.18) as a system of m conditions of the form

α ≤ x_1 ≤ β,   (x_1 − α)x_2 ≤ 0,   (x_1 − β)x_2 ≤ 0,   (1.20)

which have to be fulfilled by x = (u_i, F_i(u)) for [α, β] = [a_i, b_i], i = 1, …, m. Given any function φ_{[α,β]} : R^2 → R with the property

φ_{[α,β]}(x) = 0 ⟺ (1.20) holds,   (1.21)

we can write (1.18) equivalently as

φ_{[a_i,b_i]}(u_i, F_i(u)) = 0,   i = 1, …, m.   (1.22)


A function with the property (1.21) is called an MCP-function for the interval [α, β] (also the name BVIP-function is used, where "BVIP" stands for box-constrained variational inequality problem). The link between (1.19) and (1.22) consists in the fact that the function φ^E_{[α,β]} : R^2 → R,

φ^E_{[α,β]}(x) = x_1 − P_{[α,β]}(x_1 − x_2) with P_{[α,β]}(t) = max{α, min{t, β}}   (1.23)

defines an MCP-function for the interval [α, β].

The reformulation of NCPs requires only an MCP-function for the interval [0, ∞). As already said, such functions are called NCP-functions. According to (1.21), φ : R^2 → R is an NCP-function if and only if

φ(x) = 0 ⟺ x_1, x_2 ≥ 0, x_1 x_2 = 0.   (1.24)
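That φ^E_{[α,β]} from (1.23) is indeed an MCP-function, i.e., vanishes exactly where (1.20) holds, can be checked numerically on a grid. The interval and the sample grid below are illustrative choices.

```python
import numpy as np

# The MCP-function (1.23): phi_E(x) = x1 - P_[alpha,beta](x1 - x2),
# checked against the componentwise conditions (1.20) on a sample grid.
alpha, beta = -1.0, 2.0               # illustrative interval

def proj(t):
    return max(alpha, min(t, beta))   # P_[alpha,beta](t)

def phi_E(x1, x2):
    return x1 - proj(x1 - x2)

def cond_120(x1, x2, tol=1e-12):
    """(1.20): alpha <= x1 <= beta, (x1-alpha)*x2 <= 0, (x1-beta)*x2 <= 0."""
    return (alpha - tol <= x1 <= beta + tol
            and (x1 - alpha) * x2 <= tol and (x1 - beta) * x2 <= tol)

grid = np.linspace(-3.0, 3.0, 25)     # step 0.25, hits alpha and beta exactly
for x1 in grid:
    for x2 in grid:
        assert (abs(phi_E(x1, x2)) < 1e-12) == cond_120(x1, x2)
```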

The corresponding reformulation of the NCP then is

Φ(u) := ( φ(u_1, F_1(u)), …, φ(u_m, F_m(u)) )^T = 0,   (1.25)

and the NCP-function φ^E_{[0,∞)} can be written in the form

φ^E(x) = φ^E_{[0,∞)}(x) = min{x_1, x_2}.

A further important reformulation, which is due to Robinson [127], uses the normal map

F_{[a,b]}(z) = F(P_{[a,b]}(z)) + z − P_{[a,b]}(z).

It is not difficult to see that any solution z of the normal map equation

F[a,b](z) = 0 (1.26)

gives rise to a solution u = P[a,b](z) of (1.18), and, conversely, that, for any solution u of (1.18), the vector z = u − F(u) solves (1.26). Therefore, the MCP (1.18) and the normal equation (1.26) are equivalent. Again, the normal map is piecewise Ck if F is Ck. In contrast to the reformulation based on NCP- and MCP-functions, the normal map approach evaluates F only at feasible points, which can be advantageous in certain situations.

Many modern algorithms for finite-dimensional NCPs and MCPs are based on reformulations by means of the Fischer–Burmeister NCP-function

φFB(x) = x1 + x2 − √(x1² + x2²), (1.27)

which was introduced by Fischer [55]. This function is Lipschitz continuous and 1-order semismooth on R2 (the definition of semismoothness is given below, and, in more detail, in chapter 2). Further, φFB is C∞ on R2 \ {0}, and (φFB)² is continuously differentiable on R2. The latter property implies that, if F is continuously


differentiable, the function (1/2)ΦFB(u)^T ΦFB(u) can serve as a continuously differentiable merit function for (1.25). It is also possible to obtain 1-order semismooth MCP-functions from the Fischer–Burmeister function, see [18, 54] and section 5.1.1.

The described reformulations were successfully used as basis for the development of locally superlinearly convergent Newton-type methods for the solution of (mixed) nonlinear complementarity problems [18, 38, 39, 45, 50, 52, 53, 54, 88, 89, 93, 116, 124, 140]. This is remarkable, since all these reformulations are nonsmooth systems of equations. However, the underlying functions are semismooth, a concept introduced by Mifflin [113] for real-valued functions on Rn, and extended to mappings between finite-dimensional spaces by Qi [120] and Qi and Sun [122]. Hereby – details are given in chapter 2 – a function f : Rl → Rm is called semismooth at x ∈ Rl if it is Lipschitz continuous near x, directionally differentiable at x, and if

sup_{M ∈ ∂f(x+h)} ‖f(x + h) − f(x) − Mh‖ = o(‖h‖) as h → 0,

where the set-valued function ∂f : Rl ⇒ Rm×l,

∂f(x) = co{M ∈ Rm×l : xk → x, f is differentiable at xk, and f′(xk) → M},

denotes Clarke's generalized Jacobian ("co" is the convex hull). It can be shown that piecewise C1 functions are semismooth, see section 2.5.3. Further, it is easy to prove that Newton's method (where in Newton's equation the Jacobian is replaced by an arbitrary element of ∂f) converges superlinearly in a neighborhood of a CD-regular ("CD" for Clarke-differential) solution x∗, i.e., a solution where all elements of ∂f(x∗) are invertible. More details on semismoothness in finite dimensions can be found in chapter 2.
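To make this concrete, here is a minimal semismooth Newton iteration for a small affine NCP via the Fischer–Burmeister reformulation (1.25). The data A, q and the particular generalized-Jacobian element chosen at the kink are illustrative assumptions, not taken from the text:

```python
import math

def phi_fb(a, b):
    return a + b - math.hypot(a, b)

def dphi_fb(a, b):
    # one element of Clarke's generalized gradient of phi_FB;
    # at the kink (0,0) we pick the element corresponding to y = (1,1)/sqrt(2)
    n = math.hypot(a, b)
    if n == 0.0:
        r = 1.0 / math.sqrt(2.0)
        return 1.0 - r, 1.0 - r
    return 1.0 - a / n, 1.0 - b / n

# NCP with affine F(u) = A u + q; its solution u* = (1, 0) is degenerate
# (u2 = 0 = F2(u*)), i.e., strict complementarity is violated
A = [[3.0, 1.0], [1.0, 2.0]]
q = [-3.0, -1.0]

def F(u):
    return [A[0][0]*u[0] + A[0][1]*u[1] + q[0],
            A[1][0]*u[0] + A[1][1]*u[1] + q[1]]

u = [2.0, 2.0]
for _ in range(25):
    Fu = F(u)
    Phi = [phi_fb(u[i], Fu[i]) for i in range(2)]
    if max(abs(p) for p in Phi) < 1e-12:
        break
    # M = diag(a) + diag(b) A is an element of the generalized Jacobian of Phi
    rows = []
    for i in range(2):
        ai, bi = dphi_fb(u[i], Fu[i])
        rows.append([ai * (1.0 if j == i else 0.0) + bi * A[i][j] for j in range(2)])
    det = rows[0][0]*rows[1][1] - rows[0][1]*rows[1][0]
    # solve the 2x2 Newton system M s = -Phi by Cramer's rule
    s = [(-Phi[0]*rows[1][1] + Phi[1]*rows[0][1]) / det,
         (-Phi[1]*rows[0][0] + Phi[0]*rows[1][0]) / det]
    u = [u[0] + s[0], u[1] + s[1]]

print(u)  # approximately the solution (1, 0)
```

Despite the violated strict complementarity at the solution, the iteration converges rapidly, which is precisely the advantage of the semismooth approach discussed here.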

It should be mentioned that also continuously differentiable NCP-functions can be constructed. In fact, already in the seventies, Mangasarian [110] proved the equivalence of the NCP to a system of equations, which, in our terminology, he obtained by choosing the NCP-function

φM (x) = θ(|x2 − x1|)− θ(x2)− θ(x1),

where θ : R → R is any strictly increasing function with θ(0) = 0. Maybe the most straightforward choice is θ(t) = t, which gives φM = −2φE. If, in addition, θ is C1 with θ′(0) = 0, then φM is C1. This is, e.g., satisfied by θ(t) = t|t|. Nevertheless, most modern approaches prefer nondifferentiable, semismooth reformulations. This has a good reason. In fact, consider (1.25) with a differentiable NCP-function. Then the Jacobian of Φ is given by

Φ′(u) = diag(φx1(ui, Fi(u))) + diag(φx2(ui, Fi(u))) F′(u).

Now, since φ(t, 0) = 0 = φ(0, t) for all t ≥ 0, we see that φ′(0, 0) = 0. Thus, if strict complementarity is violated for the ith component, i.e., if ui = 0 = Fi(u), then the ith row of Φ′(u) is zero, and thus Newton's method is not applicable if strict complementarity is violated at the solution. This can be avoided by using nonsmooth


NCP-functions, because they can be constructed in such a way that any element of the generalized gradient ∂φ(x) is bounded away from zero at any point x ∈ R2. For the Fischer–Burmeister function, e.g., there holds (φFB)′(x) = (1, 1) − x^T/‖x‖2 for all x ≠ 0, and thus ‖g‖2 ≥ √2 − 1 for all g ∈ ∂φFB(x) and all x ∈ R2.
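The degeneracy of smooth NCP-functions can be observed numerically. A small sketch for Mangasarian's φM with θ(t) = t|t| (so φM is C1), using central finite differences with an arbitrary step h (helper names are ours):

```python
def theta(t):
    return t * abs(t)

def phi_M(x1, x2):
    # Mangasarian's NCP-function: theta(|x2 - x1|) - theta(x2) - theta(x1)
    return theta(abs(x2 - x1)) - theta(x2) - theta(x1)

def grad(phi, x1, x2, h=1e-6):
    # central finite-difference approximation of the gradient
    return ((phi(x1 + h, x2) - phi(x1 - h, x2)) / (2 * h),
            (phi(x1, x2 + h) - phi(x1, x2 - h)) / (2 * h))

g = grad(phi_M, 0.0, 0.0)
print(g)  # both components are ~0 (up to the finite-difference step)
```

The vanishing gradient at the degenerate point (0, 0) is exactly the mechanism by which the ith row of Φ′(u) degenerates when strict complementarity fails, whereas the Fischer–Burmeister gradients quoted above stay bounded away from zero.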

The development of nonsmooth Newton methods [102, 103, 120, 122, 118], especially the unifying notion of semismoothness [120, 122], has led to considerable research on numerical methods for the solution of finite-dimensional VIPs that are based on semismooth reformulations [18, 38, 39, 50, 52, 53, 54, 88, 89, 93, 116, 140]. These investigations confirm that this approach admits an elegant and general theory (in particular, no strict complementarity assumption is required) and leads to very efficient numerical algorithms [54, 115, 116].

Related approaches

The research on semismoothness-based methods is still in progress. Promising new directions of research are provided by Jacobian smoothing methods and continuation methods [31, 29, 92]. Hereby, a family of functions (φµ)µ≥0 is introduced such that φ0 is a semismooth NCP- or MCP-function, φµ, µ > 0, is smooth, and φµ → φ0 in a suitable sense as µ → 0. These functions are used to derive a family of equations Φµ(u) = 0 in analogy to (1.25). In the continuation approach [29], a sequence (uk) of approximate solutions corresponding to parameter values µ = µk with µk → 0 is generated such that uk converges to a solution of the equation Φ0(u) = 0. Steps are usually obtained by solving the smoothed Newton equation Φ′µk(uk) sck = −Φµk(uk), yielding "centering" steps towards the "central" path {x : Φµ(x) = 0 for some µ > 0}, or by solving the Jacobian smoothing Newton equation Φ′µk(uk) sk = −Φ0(uk), yielding "fast" steps towards the solution set of Φ0(u) = 0. The latter steps are also used as trial steps in the recently developed Jacobian smoothing methods [31, 92]. Since the limit operator Φ0 is semismooth, the analysis of these methods heavily relies on the properties of ∂Φ0 and the semismoothness of Φ0.
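A scalar sketch of the continuation idea. We smooth the Fischer–Burmeister function as φµ(x) = x1 + x2 − √(x1² + x2² + 2µ²), which is one common choice (an assumption here, not necessarily the smoothing used in [29, 31, 92]), and take one smooth Newton "centering" step per smoothing level µk:

```python
import math

def phi_mu(a, b, mu):
    # smoothed Fischer-Burmeister function; phi_0 is the FB function itself
    return a + b - math.sqrt(a * a + b * b + 2.0 * mu * mu)

def dphi_mu(a, b, mu):
    n = math.sqrt(a * a + b * b + 2.0 * mu * mu)   # > 0 whenever mu > 0
    return 1.0 - a / n, 1.0 - b / n

# scalar NCP: u >= 0, F(u) >= 0, u F(u) = 0 with F(u) = u - 1 (solution u = 1);
# one smooth Newton step per level mu_k = 10^{-k}
u, mu = 5.0, 1.0
while mu > 1e-10:
    da, db = dphi_mu(u, u - 1.0, mu)
    u -= phi_mu(u, u - 1.0, mu) / (da + db)   # F'(u) = 1 here
    mu *= 0.1
print(u)  # approaches the solution u = 1 as mu -> 0
```

For µ > 0 the kink of φ0 at the origin disappears, so each subproblem is solved by ordinary smooth Newton steps; driving µk → 0 recovers a solution of Φ0(u) = 0.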

The smoothing approach is also used in the development of algorithms for mathematical programs with equilibrium constraints (MPECs) [51, 57, 90, 109]. In this difficult class of problems, an objective function f(u, v) has to be minimized under the constraint u ∈ S(v), where S(v) is the solution set of a VIP that is parameterized by v. Under suitable conditions on this inner problem, S(v) can be characterized equivalently by its KKT conditions. These, however, when taken as constraints for the outer problem, violate any standard constraint qualification. Alternatively, the KKT conditions can be rewritten as a system of semismooth equations by means of an NCP-function. This, however, introduces the (mainly numerical) difficulty of nonsmooth constraints, which can be circumvented by replacing the NCP-function with a smoothing NCP-function and considering a sequence of solutions of the smoothed MPEC corresponding to µ = µk, µk → 0.

In conclusion, semismooth Newton methods are at the heart of many modern algorithms in finite-dimensional optimization, and hence should also be investigated


in the framework of optimal control and infinite-dimensional VIPs. This is the goal of the present manuscript.

1.2.2 Infinite-Dimensional Variational Inequalities

A main concern of this work is to extend the concept of semismooth Newton methods to a class of nonsmooth operator equations sufficiently rich to cover appropriate reformulations of the infinite-dimensional VIP (1.1). In a first step we derive analogues of the reformulations in section 1.2.1, but now in the function space setting. We begin with the NCP (1.4). Replacing componentwise operations by pointwise (a.e.) operations, we can apply an NCP-function φ pointwise to the pair of functions (u, F(u)) to define the superposition operator

Φ(u)(ω) = φ(u(ω), F(u)(ω)), (1.28)

which, under appropriate assumptions, defines a mapping Φ : Lp(Ω) → Lr(Ω), r ≥ 1, see section 3.3.1. Obviously, (1.4) is equivalent to the nonsmooth operator equation

Φ(u) = 0. (1.29)

In the same way, the more general problem (1.1) can be converted into an equivalent nonsmooth equation. To this end, we use a semismooth NCP-function φ and a semismooth MCP-function φ[α,β], −∞ < α < β < +∞. Now, we define the operator Φ : Lp(Ω) → Lr(Ω),

Φ(u)(ω) =

  F(u)(ω)                              for ω ∈ Ω \ (Ωa ∪ Ωb),
  φ(u(ω) − a(ω), F(u)(ω))              for ω ∈ Ωa \ Ωb,
  −φ(b(ω) − u(ω), −F(u)(ω))            for ω ∈ Ωb \ Ωa,
  φ[a(ω),b(ω)](u(ω), F(u)(ω))          for ω ∈ Ωa ∩ Ωb.

(1.30)

Again, Φ is a superposition operator on the four different subsets of Ω distinguished in (1.30). Along the same line, the normal map approach can be generalized to the function space setting. We will concentrate on NCP-function based reformulations and their generalizations.
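A finite-dimensional sketch of how (1.30) acts pointwise. The data below (the bounds a, b, the model operator F, and the grid sampling of Ω = (0, 1)) are illustrative assumptions only:

```python
import math

def phi(x1, x2):                      # Fischer-Burmeister NCP-function
    return x1 + x2 - math.hypot(x1, x2)

def phi_box(x1, x2, lo, hi):          # projection-based MCP-function from (1.23)
    return x1 - max(lo, min(x1 - x2, hi))

def Phi_pointwise(u_om, F_om, a_om, b_om):
    # the four cases of (1.30), decided pointwise from the bounds at omega
    lower, upper = a_om > -math.inf, b_om < math.inf
    if not lower and not upper:
        return F_om
    if lower and not upper:
        return phi(u_om - a_om, F_om)
    if upper and not lower:
        return -phi(b_om - u_om, -F_om)
    return phi_box(u_om, F_om, a_om, b_om)

a = lambda om: 0.0                            # lower bound on all of Omega
b = lambda om: 1.0 if om < 0.5 else math.inf  # upper bound only on (0, 1/2)
g = lambda om: 2.0 * om - 0.5
# toy operator F(u)(omega) = u(omega) - g(omega); the corresponding MCP is
# solved by the pointwise projection u(omega) = max(g(omega), 0)

omegas = [(i + 0.5) / 100 for i in range(100)]
u = [max(g(om), 0.0) for om in omegas]
res = [abs(Phi_pointwise(ui, ui - g(om), a(om), b(om))) for ui, om in zip(u, omegas)]
print(max(res))  # the reformulated equation Phi(u) = 0 holds pointwise
```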

Our approach is applicable whenever it is possible to write the problem under consideration as an operator equation in which the underlying operator is obtained by superposition Ψ = ψ ∘ G of a Lipschitz continuous and semismooth function ψ and a continuously Fréchet differentiable operator G with reasonable properties, which maps into a direct product of Lebesgue spaces. We will show that the results for finite-dimensional semismooth equations can be extended to superposition operators in function spaces. To this end, we first develop a general semismoothness concept for operators in Banach spaces and then use these results to analyze superlinearly convergent Newton methods for semismooth operator equations. Then we apply this theory to superposition operators in function spaces of the form Ψ = ψ ∘ G. We work with a set-valued generalized differential ∂Ψ that is motivated by Qi's


finite-dimensional C-subdifferential. The semismoothness result we establish is an estimate of the form

sup_{M ∈ ∂Ψ(y+s)} ‖Ψ(y + s) − Ψ(y) − Ms‖_{Lr} = o(‖s‖_Y) as ‖s‖_Y → 0.

We also prove semismoothness of order α > 0, which means that the above estimate holds with "o(‖s‖_Y)" replaced by "O(‖s‖_Y^{1+α})". This semismoothness result enables us to apply the class of semismooth Newton methods that we analyzed in the abstract setting. If applied to nonsmooth reformulations of variational inequality problems, these methods can be regarded as infinite-dimensional analogues of finite-dimensional semismooth Newton methods for this class of problems. As a consequence, we can adjust to the function space setting many of the ideas that were developed for finite-dimensional VIPs in recent years.

1.3 Organization

We now give an overview of the organization of this work.

In chapter 2 we recall important results of finite-dimensional nonsmooth analysis. Several generalized differentials known from the literature (Clarke's generalized Jacobian, B-differential, and Qi's C-subdifferential) and their properties are considered. Furthermore, finite-dimensional semismoothness is discussed and semismooth Newton methods are introduced. Finally, we give important examples of semismooth functions, e.g., piecewise smooth functions, and discuss finite-dimensional generalizations of the semismoothness concept.

In the first part of chapter 3 we establish semismoothness results for operator equations in Banach spaces. The definition is based on a set-valued generalized differential and requires an approximation condition to hold. Furthermore, semismoothness of higher order is introduced. It is shown that continuously differentiable operators are semismooth with respect to their Fréchet derivative, and that the sum, composition, and direct product of semismooth operators is again semismooth. The semismoothness concept is used to develop a Newton method for semismooth operator equations that is superlinearly convergent (with q-order 1 + α in the case of α-order semismoothness). Several variants of this method are considered, including an inexact version that allows to work with approximate generalized differentials in the Newton system, and a version that includes a projection in order to stay feasible with respect to a given closed convex set containing the solution.

In the second part of chapter 3 this abstract semismoothness concept is applied to the concrete situation of operators obtained by superposition of a Lipschitz continuous semismooth function and a smooth operator mapping into a product of Lebesgue spaces. This class of operators is of significant practical importance as it contains reformulations of variational inequalities by means of semismooth NCP-, MCP-, and related functions. We first develop a suitable generalized differential that has simple structure and is closely related to the finite-dimensional C-subdifferential. Then


we show that the considered superposition operators are semismooth with respect to this differential. We also develop results to establish semismoothness of higher order. The theory is illustrated by applications to the NCP. The established semismoothness of superposition operators enables us, via nonsmooth reformulations, to develop superlinearly convergent Newton methods for the solution of the NCP (1.4), and, as we show in chapter 5, for the solution of the VIP (1.1) and even more general problems. Finally, further properties of the generalized differential are considered.

In chapter 4 we investigate two ingredients that are needed in the analysis of chapter 3. In chapter 3 it becomes apparent that in general a smoothing step is required to close a gap between two different Lp-norms. This necessity was already observed in similar contexts [95, 143]. In section 4.1 we describe a way how smoothing steps can be constructed, which is based on an idea by Kelley and Sachs [95]. Furthermore, in section 4.2 we investigate a particular choice of the MCP-function that leads to reformulations for which no smoothing step is required. The analysis of semismooth Newton methods in chapter 3 relies on a regularity condition that ensures the uniform invertibility (between appropriate spaces) of the generalized differentials in a neighborhood of the solution. In section 4.3 we develop sufficient conditions for this regularity assumption.

In chapter 5 we show how the developed concepts can be applied to solve more general problems than NCPs. In particular, we propose semismooth reformulations for bound-constrained VIPs and, more generally, for VIPs with pointwise convex constraints. These reformulations allow us to apply semismooth Newton methods for their solution. Furthermore, we discuss how semismooth Newton methods can be applied to solve mixed problems, i.e., systems of VIPs and smooth operator equations. Hereby, we concentrate on mixed problems arising as the Karush–Kuhn–Tucker (KKT) conditions of constrained optimization problems with optimal control structure. A close relationship between reformulations based on the black-box approach, in which the reduced problem is considered, and reformulations based on the all-at-once approach, where the full KKT-system is considered, is established. We observe that the generalized differentials of the black-box reformulation appear as Schur complements in the generalized differentials of the all-at-once reformulation. This can be used to relate regularity conditions of both approaches. We also describe how smoothing steps can be computed.

In chapter 6 we describe a way to make the developed class of semismooth Newton methods globally convergent by embedding them in a trust-region method. To this end, we propose three variants of minimization problems such that solutions of the semismooth operator equation are critical points of the minimization problem. Then we develop and analyze a class of nonmonotone trust-region methods for the resulting optimization problems in a general Hilbert space setting. The trial steps have to fulfill a model decrease condition, which, as we show, can be implemented by means of a generalized fraction of Cauchy decrease condition. For this algorithm global convergence results are established. Further, it is shown how semismooth Newton steps can be used to compute trial steps and it is proved that, under


appropriate conditions, eventually always Newton steps are taken. Therefore, the rate of local convergence to regular solutions is at least q-superlinear.

In chapter 7 the developed algorithms are applied to concrete problems. Section 7.1 discusses in detail the applicability of semismooth Newton methods to a nonlinear elliptic control problem with bounds on the control. Furthermore, a finite element discretization is discussed and it is shown that the application of finite-dimensional semismooth Newton methods to the discretized problem can be viewed as a discretization of the infinite-dimensional semismooth Newton method. Furthermore, it is discussed how multigrid methods can be used to solve the semismooth Newton system efficiently. The efficiency of the method is documented by various numerical tests. Hereby, both the black-box and the all-at-once approach are tested. Furthermore, a nested iteration is proposed that first solves the problem approximately on a coarse grid to obtain a good initial point on the next finer grid and proceeds in this way until the finest grid is reached. As a second application we investigate the obstacle problem of section 1.1.2 in detail. An equivalent dual problem is derived, which is augmented by a regularization term to make it coercive. An error estimate for the regularized solution is established in terms of the regularization parameter. We then show that our class of semismooth Newton methods is applicable to the regularized dual problem. Numerical results for a finite element discretization are presented. In the implementation we again use multigrid methods to solve the semismooth Newton system.

In chapter 8 we show that our class of semismooth Newton methods can be applied to solve control-constrained distributed optimal control problems governed by the incompressible Navier–Stokes equations. To this end, differentiability and local Lipschitz continuity properties of the control-to-state mapping are investigated. Furthermore, results for the adjoint equation are established that allow us to prove a smoothing property of the reduced gradient mapping. These results show that semismooth Newton methods can be applied to the flow control problem and that these methods converge superlinearly in a neighborhood of regular critical points.

In chapter 9 we present applications of our method to the boundary control of the time-dependent compressible Navier–Stokes equations. Hereby, we control the normal velocity of the fluid on part of the boundary (suction and blowing), subject to pointwise lower and upper bounds. As control objective, the terminal kinetic energy is minimized. In the algorithm, the Hessian is approximated by BFGS matrices. This problem is very large scale, with over 75,000 unknown controls and over 29,000,000 state variables. The numerical results show that our approach is viable and efficient also for very large scale, state of the art control problems.

The appendix contains some useful supplementary material. In appendix A.1 we describe the adjoint-based gradient and Hessian representation for the reduced objective function of optimal control problems. Appendix A.2 collects several frequently used inequalities. In appendix A.3 we state elementary properties of multifunctions. Finally, in appendix A.4, the differentiability properties of Nemytskij operators are considered.

2. Elements of Finite-Dimensional Nonsmooth Analysis

In this chapter we collect several results of finite-dimensional nonsmooth analysis that are required for our investigations. In particular, finite-dimensional semismoothness and semismooth Newton methods are considered. The concepts introduced in this section will serve as a motivation and guideline for the developments in subsequent sections.

All generalized differentials considered here are set-valued functions (or multifunctions). Basic properties of multifunctions, like upper semicontinuity, can be found in appendix A.3.

Throughout, we denote by ‖ · ‖ arbitrary, but fixed norms on the respective Rn-spaces as well as the induced matrix norms. The open unit ball {x ∈ Rn : ‖x‖ < 1} is denoted by Bn.

2.1 Generalized Differentials

On the nonempty open set V ⊂ Rn, we consider the function

f : V → Rm

and denote by Df ⊂ V the set of all x ∈ V at which f admits a (Fréchet) derivative f′(x) ∈ Rm×n. Now suppose that f is Lipschitz continuous near x ∈ V, i.e., that there exists an open neighborhood V(x) ⊂ V of x on which f is Lipschitz continuous. Then, according to Rademacher's theorem [149], V(x) \ Df has Lebesgue measure zero. Hence, the following constructions make sense.

Definition 2.1. [32, 118, 122] Let V ⊂ Rn be open and f : V → Rm be Lipschitz continuous near x ∈ V. The set

∂Bf(x) := {M ∈ Rm×n : ∃ (xk) ⊂ Df : xk → x, f′(xk) → M}

is called the B-subdifferential ("B" for Bouligand) of f at x. Moreover, Clarke's generalized Jacobian of f at x is the convex hull ∂f(x) := co(∂Bf(x)), and

∂Cf(x) := ∂f1(x) × · · · × ∂fm(x)

denotes Qi's C-subdifferential.
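The construction of ∂Bf can be mimicked numerically: sample differentiable points xk near x and collect the limiting derivatives. For f(t) = |t| at t = 0 (our choice of example), the derivatives at nearby points accumulate at −1 and +1, so ∂Bf(0) = {−1, +1} and Clarke's ∂f(0) = [−1, 1]:

```python
import random

# derivative of |t| at differentiable points t != 0
f_prime = lambda t: 1.0 if t > 0 else -1.0

random.seed(0)
limits = set()
for k in range(1000):
    xk = random.uniform(-1.0, 1.0) / (k + 1)   # sample points x_k -> 0
    if xk != 0.0:
        limits.add(f_prime(xk))
print(sorted(limits))  # [-1.0, 1.0]
```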


The differentials ∂Bf, ∂f, and ∂Cf have the following properties.

Proposition 2.2. Let V ⊂ Rn be open and f : V → Rm be locally Lipschitz continuous. Then for x ∈ V the following holds:

(a) ∂Bf(x) is nonempty and compact.

(b) ∂f(x) and ∂Cf(x) are nonempty, compact, and convex.

(c) The set-valued mappings ∂Bf, ∂f, and ∂Cf, respectively, are locally bounded and upper semicontinuous.

(d) ∂Bf(x) ⊂ ∂f(x) ⊂ ∂Cf(x).

(e) If f is continuously differentiable in a neighborhood of x then

∂Cf(x) = ∂f(x) = ∂Bf(x) = {f′(x)}.

Proof. The results for ∂Bf(x) and ∂f(x) as well as (d) are established in [32, Prop. 2.6.2]. Part (e) immediately follows from the definition of the respective differentials. The remaining assertions on ∂Cf are immediate consequences of the properties of ∂fi(x). □

The following chain rule holds:

Proposition 2.3. [32, Cor. 2.6.6] Let V ⊂ Rn and W ⊂ Rl be nonempty open sets, let g : V → W be Lipschitz continuous near x ∈ V, and h : W → Rm be Lipschitz continuous near g(x). Then f = h ∘ g is Lipschitz continuous near x and, for all v ∈ Rn, it holds that

∂f(x)v ⊂ co(∂h(g(x)) ∂g(x) v) = co{Mh Mg v : Mh ∈ ∂h(g(x)), Mg ∈ ∂g(x)}.

If, in addition, h is continuously differentiable near g(x), then, for all v ∈ Rn,

∂f(x)v = h′(g(x)) ∂g(x) v.

If f is real-valued (i.e., if m = 1), then in both chain rules the vector v can be omitted.

In particular, choosing h(y) = e_i^T y = yi and g = f, where e_i is the ith unit vector, we see that:

Corollary 2.4. Let V ⊂ Rn be open and f : V → Rm be Lipschitz continuous near x ∈ V. Then

∂fi(x) = e_i^T ∂f(x) = {M_i : M_i is the ith row of some M ∈ ∂f(x)}.

2.2 Semismoothness

The notion of semismoothness was introduced by Mifflin [113] for real-valued functions defined on finite-dimensional spaces, and extended to mappings between finite-dimensional spaces by Qi [120] and Qi and Sun [122]. The importance of semismooth equations results from the fact that, although the underlying mapping is in general nonsmooth, Newton's method is still applicable and converges locally with q-superlinear rate to a regular solution.


Definition 2.5. [113, 118, 122] Let V ⊂ Rn be nonempty and open. The function f : V → Rm is semismooth at x ∈ V if it is Lipschitz continuous near x and if the following limit exists for all s ∈ Rn:

lim_{M ∈ ∂f(x+τd), d → s, τ → 0+} Md.

If f is semismooth at all x ∈ V, we call f semismooth (on V).

Note that we include the local Lipschitz condition in the definition of semismoothness. Hence, if f is semismooth at x, it is also Lipschitz continuous near x. Semismoothness admits different, yet equivalent, characterizations. To formulate them, we first recall directional and Bouligand- (or B-) differentiability.

Definition 2.6. Let the function f : V → Rm be defined on the open set V.

(a) f is directionally differentiable at x ∈ V if the directional derivative

f′(x, s) := lim_{τ→0+} (f(x + τs) − f(x))/τ

exists for all s ∈ Rn.

(b) f is B-differentiable at x ∈ V if f is directionally differentiable at x and

‖f(x + s) − f(x) − f′(x, s)‖ = o(‖s‖) as s → 0.

(c) f is α-order B-differentiable at x ∈ V, 0 < α ≤ 1, if f is directionally differentiable at x and

‖f(x + s) − f(x) − f′(x, s)‖ = O(‖s‖^{1+α}) as s → 0.

Note that f′(x, ·) is positively homogeneous. Furthermore, it is known that directional differentiability and B-differentiability are equivalent for locally Lipschitz continuous mappings between finite-dimensional spaces [133]. The following proposition gives alternative definitions of semismoothness.

Proposition 2.7. Let f : V → Rm be defined on the open set V ⊂ Rn. Then for x ∈ V the following statements are equivalent:

(a) f is semismooth at x.

(b) f is Lipschitz continuous near x, f′(x, ·) exists, and

sup_{M ∈ ∂f(x+s)} ‖Ms − f′(x, s)‖ = o(‖s‖) as s → 0.

(c) f is Lipschitz continuous near x, f′(x, ·) exists, and

sup_{M ∈ ∂f(x+s)} ‖f(x + s) − f(x) − Ms‖ = o(‖s‖) as s → 0. (2.1)


Proof. Concerning the equivalence of (a) and (b), see [122, Thm. 2.3]. If f is Lipschitz continuous near x and directionally differentiable at x, then, as noted above, f is also B-differentiable at x. Hence, it is now easily seen that (b) and (c) are equivalent, since for all M ∈ ∂f(x + s)

| ‖f(x + s) − f(x) − Ms‖ − ‖Ms − f′(x, s)‖ | ≤ ‖f(x + s) − f(x) − f′(x, s)‖ = o(‖s‖) as s → 0. □

The version (c) is especially well suited for the analysis of Newton-type methods. To give a first example of semismooth functions, we note the following immediate consequence of Proposition 2.7:

Proposition 2.8. Let V ⊂ Rn be open. If f : V → Rn is continuously differentiable in a neighborhood of x ∈ V then f is semismooth at x and ∂f(x) = ∂Bf(x) = {f′(x)}.

Further, the class of semismooth functions is closed under composition:

Proposition 2.9. [56, Lem. 18] Let V ⊂ Rn and W ⊂ Rl be open sets. Let g : V → W be semismooth at x ∈ V and h : W → Rm be semismooth at g(x), with g(V) ⊂ W. Then the composite map f := h ∘ g : V → Rm is semismooth at x. Moreover,

f′(x, ·) = h′(g(x), g′(x, ·)).

It is natural to ask if f is semismooth if its component functions are semismooth and vice versa. This is in fact true:

Proposition 2.10. The function f : V → Rm, V ⊂ Rn open, is semismooth at x ∈ V if and only if its component functions are semismooth at x.

Proof. We use the characterization of semismoothness given in Proposition 2.7. If f is semismooth at x then the functions fi are Lipschitz continuous near x and directionally differentiable at x. Furthermore, by Corollary 2.4,

sup_{v ∈ ∂fi(x+s)} |fi(x + s) − fi(x) − vs| = sup_{M ∈ ∂f(x+s)} |e_i^T (f(x + s) − f(x) − Ms)| = o(‖s‖) as s → 0,

which proves the semismoothness of fi at x. The reverse direction is an immediate consequence of the inclusion ∂f(x) ⊂ ∂Cf(x). □

2.3 Semismooth Newton's Method

We now analyze the following Newton-like method for the solution of the equation

f(x) = 0, (2.2)

where f : V → Rn, V ⊂ Rn open, is semismooth at the solution x̄ ∈ V:


Algorithm 2.11 (Semismooth Newton's Method).

0. Choose an initial point x0 and set k = 0.

1. If f(xk) = 0, then STOP.

2. Choose Mk ∈ ∂f(xk) and compute sk from

Mk sk = −f(xk).

3. Set xk+1 = xk + sk, increment k by one, and go to step 1.

Under a regularity assumption on the matrices Mk, this iteration converges locally q-superlinearly:

Proposition 2.12. Let f : V → Rn be defined on the open set V ⊂ Rn and denote by x̄ ∈ Rn a solution of (2.2). Assume that:

(a) Estimate (2.1) holds at x = x̄ (which, in particular, is satisfied if f is semismooth at x̄).

(b) One of the following conditions holds:

(i) There exists a constant C > 0 such that, for all k, the matrices Mk are nonsingular with ‖Mk⁻¹‖ ≤ C.

(ii) There exist constants η > 0 and C > 0 such that, for all x ∈ x̄ + ηBn, every M ∈ ∂f(x) is nonsingular with ‖M⁻¹‖ ≤ C.

(iii) The solution x̄ is CD-regular ("CD" for Clarke-differential), i.e., every M ∈ ∂f(x̄) is nonsingular.

Then there exists δ > 0 such that, for all x0 ∈ x̄ + δBn, (i) holds and Algorithm 2.11 either terminates with xk = x̄ or generates a sequence (xk) that converges q-superlinearly to x̄.

Various results of this type can be found in the literature [102, 103, 118, 120, 122]. In particular, Kummer [103] develops a general abstract framework of essentially two requirements (CA) and (CI), under which Newton's method is well-defined and converges superlinearly. The condition (2.1) is a special case of the approximation condition (CA), whereas (CI) is a uniform injectivity condition, which, in our context, corresponds to assumption (b) (ii).

Since the proof of Proposition 2.12 is not difficult and quite helpful in getting familiar with the notion of semismoothness, we include it here.

Proof. First, we prove (iii) =⇒ (ii). Assume that (ii) does not hold. Then there exist sequences x^i → x̄ and M^i ∈ ∂f(x^i) such that, for any i, either M^i is singular or ‖(M^i)⁻¹‖ ≥ i. Since ∂f is upper semicontinuous and compact-valued, we can select a subsequence such that M^i → M ∈ ∂f(x̄). Due to the properties of the matrices M^i, M cannot be invertible, and thus (iii) does not hold.

Further, observe that (ii) implies (i) whenever xk ∈ x̄ + ηBn for all k. Therefore, if one of the conditions in (b) holds, we have (i) at hand as long as xk ∈ x̄ + δBn and δ > 0 is sufficiently small. Denoting the error by vk = xk − x̄ and using Mk sk = −f(xk), f(x̄) = 0, we obtain for such xk


Mk vk+1 = Mk(sk + vk) = −f(xk) + Mk vk = −[f(x̄ + vk) − f(x̄) − Mk vk]. (2.3)

Invoking (2.1) yields

‖Mk vk+1‖ = o(‖vk‖) as ‖vk‖ → 0. (2.4)

Hence, for sufficiently small δ > 0, we have

‖Mk vk+1‖ ≤ (1/(2C)) ‖vk‖,

and thus by (i)

‖vk+1‖ ≤ ‖Mk⁻¹‖ ‖Mk vk+1‖ ≤ (1/2) ‖vk‖.

This shows xk+1 ∈ x̄ + (δ/2)Bn and inductively xk → x̄ (in the nontrivial case xk ≠ x̄ for all k). Now we conclude from (2.4) that the rate of convergence is q-superlinear. □

2.4 Higher Order Semismoothness

The rate of convergence of the semismooth Newton method can be improved if instead of (2.1) an estimate of higher order is available. This leads to the following definition of higher order semismoothness, which can be interpreted as a semismooth relaxation of Hölder-continuous differentiability.

Definition 2.13. [122] Let the function f : V → Rm be defined on the open set V ⊂ Rn. Then, for 0 < α ≤ 1, f is called α-order semismooth at x ∈ V if f is Lipschitz continuous near x, f′(x, ·) exists, and

sup_{M ∈ ∂f(x+s)} ‖Ms − f′(x, s)‖ = O(‖s‖^{1+α}) as s → 0.

If f is α-order semismooth at all x ∈ V, we call f α-order semismooth (on V).

For α-order semismooth functions, a counterpart of Proposition 2.7 can be established.

Proposition 2.14. Let f : V → Rm be defined on the open set V ⊂ Rn. Then for x ∈ V and 0 < α ≤ 1 the following statements are equivalent:

(a) f is α-order semismooth at x.

(b) f is Lipschitz continuous near x, α-order B-differentiable at x, and

sup_{M ∈ ∂f(x+s)} ‖f(x + s) − f(x) − Ms‖ = O(‖s‖^{1+α}) as s → 0. (2.5)

Proof. According to results in [122], α-order semismoothness at x implies α-order B-differentiability at x. Now we can proceed as in the proof of Proposition 2.7. □


Of course, α-Hölder continuously differentiable functions are α-order semismooth. More precisely, we have:

Proposition 2.15. Let V ⊂ Rn be open. If f : V → Rm is differentiable in a neighborhood of x ∈ V with α-Hölder continuous derivative, 0 < α ≤ 1, then f is α-order semismooth at x and ∂f(x) = ∂Bf(x) = {f′(x)}.

The class of α-order semismooth functions is closed under composition:

Proposition 2.16. [56, Thm. 21] Let V ⊂ Rn and W ⊂ Rl be open sets and 0 < α ≤ 1. Let g : V → W be α-order semismooth at x ∈ V and h : W → Rm be α-order semismooth at g(x), with g(V) ⊂ W. Then the composite map f := h ∘ g : V → Rm is α-order semismooth at x. Moreover,

f′(x, ·) = h′(g(x), g′(x, ·)).

Further, we obtain by a straightforward modification of the proof of Proposition 2.10:

Proposition 2.17. Let V ⊂ Rn be open. The function f : V → Rm is α-order semismooth at x ∈ V, 0 < α ≤ 1, if and only if its component functions are α-order semismooth at x.

Concerning the rate of convergence of Algorithm 2.11, the following holds:

Proposition 2.18. Let the assumptions in Proposition 2.12 hold, but assume that instead of (2.1) the stronger condition (2.5), with 0 < α ≤ 1, holds at the solution x̄. Then there exists δ > 0 such that, for all x0 ∈ x̄ + δBn, Algorithm 2.11 either terminates with xk = x̄ or generates a sequence (xk) that converges to x̄ with rate 1 + α.

Proof. In light of Proposition 2.12, we only have to establish the improved rate of convergence. But from vk → 0, (2.3), and (2.5) it follows immediately that

‖vk+1‖ = O(‖vk‖^{1+α}). □

2.5 Examples of Semismooth Functions

2.5.1 The Euclidean Norm

The Euclidean norm e : x ∈ Rn ↦ ‖x‖2 = (x^T x)^{1/2} is an important example of a 1-order semismooth function that arises, e.g., as the nonsmooth part of the Fischer–Burmeister function. Obviously, e is Lipschitz continuous on Rn, and C∞ on Rn \ {0} with

e′(x) = x^T/‖x‖2.


Therefore,

∂e(x) = ∂Be(x) = {x^T/‖x‖2} for x ≠ 0,

∂Be(0) = {v^T : v ∈ Rn, ‖v‖2 = 1}, and ∂e(0) = {v^T : v ∈ Rn, ‖v‖2 ≤ 1}.

By Proposition 2.15, e is 1-order semismooth on Rn \ {0}, since it is smooth there. On the other hand, for all s ∈ Rn \ {0} and v ∈ ∂e(s) there holds v = s^T/‖s‖2 and

e(s) − e(0) − vs = ‖s‖2 − ‖s‖2 = 0.

Hence, e is also 1-order semismooth at 0.

2.5.2 The Fischer–Burmeister Function

The Fischer–Burmeister function was already defined in (1.27):

φFB : R2 → R, φFB(x) = x1 + x2 − √(x1² + x2²).

φ = φFB is the difference of the linear function f(x) = x1 + x2 and the 1-order semismooth and Lipschitz continuous function ‖x‖2, see section 2.5.1. Therefore, φ is Lipschitz continuous and 1-order semismooth by Proposition 2.15 and Proposition 2.16. Further, from the definition of ∂Bφ and ∂φ, it is immediately clear that

∂Bφ(x) = f ′(x)− ∂B‖x‖2, ∂φ(x) = f ′(x)− ∂‖x‖2.

Hence, for x ≠ 0,

∂φ(x) = ∂Bφ(x) = {(1, 1) − x^T/‖x‖2},

and

∂Bφ(0) = {(1, 1) − y^T : ‖y‖2 = 1}, ∂φ(0) = {(1, 1) − y^T : ‖y‖2 ≤ 1}.

From this one can see that for all x ∈ R2 and all v ∈ ∂φFB(x) there holds v1, v2 ≥ 0 and 2 − √2 ≤ v1 + v2 ≤ 2 + √2, showing that all generalized gradients are bounded above (a consequence of the global Lipschitz continuity) and are bounded away from zero.
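A sampling check of these bounds (the sample points are arbitrary choices of ours):

```python
import math
import random

def fb_grad(x1, x2):
    # = (1, 1) - x^T/||x||_2, the gradient of phi_FB at x != 0
    n = math.hypot(x1, x2)
    return 1.0 - x1 / n, 1.0 - x2 / n

random.seed(1)
lo, hi = 2.0 - math.sqrt(2.0), 2.0 + math.sqrt(2.0)
for _ in range(10000):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    v1, v2 = fb_grad(x1, x2)
    assert v1 >= 0.0 and v2 >= 0.0
    assert lo - 1e-12 <= v1 + v2 <= hi + 1e-12

# at x = 0 the elements are (1,1) - y^T with ||y||_2 <= 1; sampling the
# boundary ||y||_2 = 1 (i.e., the B-subdifferential) gives the same bounds
for t in [i * 2 * math.pi / 100 for i in range(100)]:
    v1, v2 = 1.0 - math.cos(t), 1.0 - math.sin(t)
    assert lo - 1e-12 <= v1 + v2 <= hi + 1e-12
```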

2.5.3 Piecewise Differentiable Functions

Piecewise continuously differentiable functions are an important subclass of semismooth functions. We refer to Scholtes [132] for a thorough treatment of the topic, where the results of this section can be found. For the reader's convenience, we include selected proofs.


Definition 2.19. [132] A function f : V → Rm defined on the open set V ⊂ Rn is called a PCk-function ("P" for piecewise), 1 ≤ k ≤ ∞, if f is continuous and if at every point x0 ∈ V there exist a neighborhood W ⊂ V of x0 and a finite collection of Ck-functions f^i : W → Rm, i = 1, . . . , N, such that

f(x) ∈ {f^1(x), . . . , f^N(x)} for all x ∈ W.

We say that f is a continuous selection of f^1, . . . , f^N on W. The set

I(x) = {i : f(x) = f^i(x)}

is the active index set at x ∈ W, and

Ie(x) = {i ∈ I(x) : x ∈ cl(int{y ∈ W : f(y) = f^i(y)})}

is the essentially active index set at x.

The following is obvious.

Proposition 2.20. The class of PCk-functions is closed under composition, finite summation, and multiplication (in case the respective operations make sense).

Example 2.21. The functions t ∈ R 7→ |t|, x ∈ R2 7→ max{x1, x2}, and x ∈ R2 7→ min{x1, x2} are PC∞-functions. As a consequence, the projection onto the interval [α, β], P[α,β](t) = max{α, min{t, β}}, is PC∞, and thus so is the MCP-function φ^E_{[α,β]} defined in (1.23).

Proposition 2.22. Let the PCk-function f : V → Rm be a continuous selection of the Ck-functions f^1, . . . , f^N on the open set V ⊂ Rn. Then, for x ∈ V, there exists a neighborhood W of x on which f is also a continuous selection of {f^i : i ∈ Ie(x)}.

Proof. Assume the contrary. Then the open sets

V_r = {y ∈ V : ‖y − x‖ < 1/r, f(y) ≠ f^i(y) for all i ∈ Ie(x)}

are nonempty for all r ∈ N. Let i_1, . . . , i_q enumerate the set {1, . . . , N} \ Ie(x). Set V_r^0 = V_r and, for l = 1, . . . , q, generate the open sets

V_r^l = V_r^{l−1} ∩ {y ∈ V : f(y) ≠ f^{i_l}(y)}.

Since for all y ∈ V there exists i ∈ Ie(x) ∪ {i_1, . . . , i_q} with f(y) = f^i(y), we see that V_r^q = ∅. Hence, there exists a maximal l_r with V_r^{l_r} ≠ ∅. With j_r = i_{l_r+1} we have

∅ ≠ V_r^{l_r} ⊂ {y ∈ V : f(y) = f^{j_r}(y)}.

We can select a constant subsequence (j_r)_{r∈K}, i.e., j_r = j ∉ Ie(x) for all r ∈ K. Now

⋃_{r∈K} V_r^{l_r} ⊂ {y ∈ V : f(y) = f^j(y)},

the set on the left being open and having x as an accumulation point. Therefore, j ∈ Ie(x), which is a contradiction. ⊓⊔


Proposition 2.23. [132, Cor. 4.1.1] Every PC1-function f : V → Rm, V ⊂ Rn open, is locally Lipschitz continuous.

Proposition 2.24. Let the PC1-function f : V → Rm, V ⊂ Rn open, be a continuous selection of the C1-functions f^1, . . . , f^N in a neighborhood W of x ∈ V. Then f is B-differentiable at x and, for all y ∈ Rn,

f′(x, y) ∈ {(f^i)′(x)y : i ∈ Ie(x)}.

Further, if f is differentiable at x, then

f′(x) ∈ {(f^i)′(x) : i ∈ Ie(x)}.

Proof. The first part restates [132, Prop. 4.1.3.1]. Now assume that f is differentiable at x. Then, for all y ∈ Rn, f′(x)y ∈ {(f^i)′(x)y : i ∈ Ie(x)}. Denote by q ≥ 1 the cardinality of Ie(x). Now choose l = q(n − 1) + 1 vectors y_r ∈ Rn, r = 1, . . . , l, such that every selection of n of these vectors is linearly independent (the vectors y_r can be obtained, e.g., by choosing l pairwise different numbers t_r ∈ R and setting y_r = (1, t_r, t_r^2, . . . , t_r^{n−1})T). For every r, choose i_r ∈ Ie(x) such that f′(x)y_r = (f^{i_r})′(x)y_r. Since r ranges from 1 to q(n − 1) + 1 and i_r can assume only q different values, we can find n pairwise different indices r_1, . . . , r_n such that i_{r_1} = · · · = i_{r_n} = j. Since the columns of Y = (y_{r_1}, . . . , y_{r_n}) are linearly independent and f′(x)Y = (f^j)′(x)Y, we conclude that f′(x) = (f^j)′(x). ⊓⊔

Proposition 2.25. Let the PC1-function f : V → Rm, V ⊂ Rn open, be a continuous selection of the C1-functions f^1, . . . , f^N in a neighborhood of x ∈ V. Then

∂Bf(x) = {(f^i)′(x) : i ∈ Ie(x)}, (2.6)

∂f(x) = co{(f^i)′(x) : i ∈ Ie(x)}. (2.7)

Proof. We know from Proposition 2.23 that f is locally Lipschitz continuous, so that the subdifferentials are well defined. By Proposition 2.22, f is a continuous selection of {f^i : i ∈ Ie(x)} in a neighborhood W of x. Further, for M ∈ ∂Bf(x), there exist xk → x in W such that f′(xk) → M. Among the functions f^i, i ∈ Ie(x), exactly those with indices i ∈ Ie(x) ∩ Ie(xk) are essentially active at xk. Hence, by Proposition 2.22, f is a continuous selection of {f^i : i ∈ Ie(x) ∩ Ie(xk)} in a neighborhood of xk. Proposition 2.24 now yields that f′(xk) = (f^{ik})′(xk) for some ik ∈ Ie(x) ∩ Ie(xk). Now we select a subsequence k ∈ K on which ik is constant with value i ∈ Ie(x). Since (f^i)′ is continuous, this proves M = (f^i)′(x), and thus "⊂" in (2.6). For every i ∈ Ie(x) there exists, by definition, a sequence xk → x such that f ≡ f^i in an open neighborhood of every xk. In particular, f is differentiable at xk (since f^i is C1), and f′(xk) = (f^i)′(xk) → (f^i)′(x). This completes the proof of (2.6). Assertion (2.7) is an immediate consequence of (2.6). ⊓⊔

We now establish the semismoothness of PC1-functions.


Proposition 2.26. Let f : V → Rm be a PC1-function on the open set V ⊂ Rn. Then f is semismooth. If f is a PC2-function, then f is 1-order semismooth.

Proof. The local Lipschitz continuity and B-differentiability of f are guaranteed by Propositions 2.23 and 2.24. Now consider x ∈ V. In a neighborhood W of x, f is a continuous selection of C1-functions f^1, . . . , f^N and, without restriction, we may assume that all f^i are active at x. For all x + s ∈ W and all M ∈ ∂f(x + s) we have, by Proposition 2.25,

M = ∑_{i∈Ie(x+s)} λi (f^i)′(x + s), λi ≥ 0, ∑_i λi = 1.

Hence, by Taylor's theorem, using f^i(x + s) = f(x + s) for all i ∈ Ie(x + s),

‖f(x + s) − f(x) − Ms‖ ≤ ∑_{i∈Ie(x+s)} λi ‖f^i(x + s) − f^i(x) − (f^i)′(x + s)s‖

≤ max_{i∈Ie(x+s)} ∫_0^1 ‖(f^i)′(x + τs)s − (f^i)′(x + s)s‖ dτ = o(‖s‖),

which establishes the semismoothness of f. If the f^i are C2, we obtain

‖f(x + s) − f(x) − Ms‖ ≤ max_{i∈Ie(x+s)} ∫_0^1 τ‖sT(f^i)′′(x + τs)s‖ dτ = O(‖s‖^2),

showing that f is 1-order semismooth in this case. ⊓⊔
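Formula (2.6) can be illustrated on the simplest PC∞-function f(t) = max{t, 0}, a continuous selection of f^1(t) = 0 and f^2(t) = t; both pieces are essentially active at t = 0, so ∂Bf(0) = {0, 1}. A minimal sketch (plain Python, not part of the text):

```python
def f(t):
    # continuous selection of f1(t) = 0 and f2(t) = t
    return max(t, 0.0)

def dB_f(t):
    # ∂_B f(t) via (2.6): derivatives of the essentially active selections
    if t < 0.0:
        return {0.0}       # only f1 essentially active
    if t > 0.0:
        return {1.0}       # only f2 essentially active
    return {0.0, 1.0}      # both pieces essentially active at the kink

# limits of f'(t_k) along differentiability points t_k -> 0 recover ∂_B f(0)
h = 1e-12
left  = (f(-1e-8 + h) - f(-1e-8)) / h   # derivative just left of 0
right = (f( 1e-8 + h) - f( 1e-8)) / h   # derivative just right of 0
assert abs(left) < 1e-6 and abs(right - 1.0) < 1e-6
assert dB_f(0.0) == {0.0, 1.0}
```

This matches the characterization of ∂Bf(x) as the set of limits of Jacobians at nearby differentiability points.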

2.6 Extensions

It is obvious that useful semismoothness concepts can also be obtained for other suitable generalized derivatives. This was investigated in a general, finite-dimensional framework by Jeyakumar [85, 86]. He introduced the concept of ∂∗f-semismoothness, where ∂∗f is an approximate Jacobian [87]. For the definition of approximate Jacobians we refer to [87]; in the sequel, it is sufficient to know that an approximate Jacobian of f : Rn → Rm is a closed-valued multifunction ∂∗f : Rn ⇒ Rm×n and that ∂Bf, ∂f, and ∂Cf are approximate Jacobians. To avoid confusion with the infinite-dimensional semismoothness concept introduced later (which essentially corresponds to weak J-semismoothness), we denote Jeyakumar's semismoothness concept by J-semismoothness ("J" for Jeyakumar).

Definition 2.27. Let f : Rn → Rm be a function with approximate Jacobian ∂∗f.

(a) The function f is called weakly ∂∗f-J-semismooth at x if it is continuous near x and

sup_{M∈co ∂∗f(x+s)} ‖f(x + s) − f(x) − Ms‖ = o(‖s‖) as s → 0. (2.8)

(b) The function f is ∂∗f-J-semismooth at x if

(i) f is B-differentiable at x (e.g., locally Lipschitz continuous near x and directionally differentiable at x, see [133]), and


(ii) f is weakly ∂∗f-J-semismooth at x.

Obviously, we can define weak ∂∗f-J-semismoothness of order α by requiring the order O(‖s‖^{1+α}) in (2.8), and ∂∗f-J-semismoothness of order α by the additional requirement that f be α-order B-differentiable at x.

Note that for locally Lipschitz continuous functions ∂Bf-, ∂f-, and ∂Cf-J-semismoothness all coincide with the usual semismoothness, cf. Proposition 2.10 in the case of ∂Cf-J-semismoothness. The same holds true for α-order semismoothness.

Algorithm 2.11 can be extended to weakly ∂∗f-J-semismooth equations by choosing Mk ∈ ∂∗f(xk) in step 2. The proof of Proposition 2.12 can be left unchanged, with the only difference that in assumption (b) (iii) we have to require that ∂∗f is compact-valued and upper semicontinuous at x. If f is weakly ∂∗f-J-semismooth of order α at x, then an analogue of Proposition 2.18 holds.

3. Newton Methods for Semismooth Operator Equations

3.1 Introduction

It was shown in chapter 1 that semismooth NCP- and MCP-functions can be used to reformulate the VIP (1.1) as (one or more) nonsmooth operator equation(s) of the form

Φ(u) = 0, where Φ(u)(ω) = φ(G(u)(ω)) on Ω, (3.1)

with G mapping u ∈ Lp(Ω) to a vector of Lebesgue functions. In particular, for NCPs we have G(u) = (u, F(u)) with F : Lp(Ω) → Lp′(Ω), p, p′ ∈ (1, ∞]. In finite dimensions this reformulation technique is well investigated and yields a semismooth system of equations, which can be solved by semismooth Newton methods. Naturally, the question arises if it is possible to develop a similar semismoothness theory for operators of the form (3.1). This question is of significant practical importance since the performance of numerical methods for infinite-dimensional problems is intimately related to the infinite-dimensional problem structure. In particular, it is desirable that the numerical method can be viewed as a discrete version of a well-behaved abstract algorithm for the infinite-dimensional problem. Then, for increasing accuracy of discretization, the convergence properties of the numerical algorithm can be expected to be (and usually are) predicted very well by the infinite-dimensional convergence analysis. Therefore, the investigation of algorithms in the original infinite-dimensional problem setting is very helpful for the development of robust, efficient, and mesh-independent numerical algorithms.

In the following, we carry out such an analysis for semismooth Newton methods that are applicable to operator equations of the form (3.1). We split our investigations in two parts. First, we develop:

• A general semismoothness concept for operators f : Y ⊃ V → Z in Banach spaces, which is based on a set-valued generalized differential ∂∗f.

• A locally q-superlinearly convergent Newton-like method for the solution of ∂∗f-semismooth operator equations.

• Extensions of these methods that (a) allow inexact computations and (b) incorporate a projection to stay feasible with respect to a closed convex set containing the solution.

• α-order ∂∗f-semismoothness and, based on this, convergence rate 1 + α for the developed Newton methods.


• Results on the (α-order) semismoothness of the sum, composition, and direct product of semismooth operators with respect to suitable generalized differentials.

In the second part, which follows [139] and constitutes the major part of this chapter, we fill these abstract concepts with life by considering the concrete case of superposition operators in function spaces. Hereby, we investigate operators of the form Ψ(y)(ω) = ψ(G(y)(ω)), a class that includes the operators arising in reformulations (3.1) of VIPs. In particular:

• We introduce a suitable generalized differential ∂◦Ψ that is easy to compute and has a natural finite-dimensional counterpart.

• We prove that, under suitable assumptions, the operators Ψ are ∂◦Ψ-semismooth; under additional assumptions, we establish α-order semismoothness.

• We apply the general semismoothness theory to develop locally fast convergent Newton-type methods for the operator equation Ψ(y) = 0.

In carrying out this program, we want to achieve a reasonable compromise between generality and applicability of the developed concepts.

Concerning generality, it is possible to pose abstract conditions on an operator and its generalized differential such that superlinearly convergent Newton-type methods can be developed. We refer to Kummer [103], where a nice such framework is developed. Similarly, on the abstract level, we work with the following general concept: Given an operator f : Y ⊃ V → Z (V open) between Banach spaces and a set-valued mapping ∂∗f : V ⇒ L(Y, Z), we say that f is ∂∗f-semismooth at y ∈ V if f is continuous near y and

sup_{M∈∂∗f(y+s)} ‖f(y + s) − f(y) − Ms‖Z = o(‖s‖Y) as ‖s‖Y → 0.

If the remainder term is of the order O(‖s‖Y^{1+α}), 0 < α ≤ 1, we call f α-order ∂∗f-semismooth at y. The class of ∂∗f-semismooth operators allows a relatively straightforward development and analysis of Newton-type methods. The reader should be aware that in view of section 2.6 it would be more precise to use the term "weakly ∂∗f-semismooth" instead of "semismooth", since we do not require the B-differentiability of f at y. Nevertheless, we prefer the term "semismooth" for brevity. Therefore, our definition of semismoothness is slightly weaker than finite-dimensional semismoothness, but, as already said, still powerful enough to admit the design of superlinearly convergent Newton-type methods, which is our main objective. It is also weaker than the abstract semismoothness concept that, independently of the present work, was recently proposed by Chen, Nashed and Qi [30]; to avoid ambiguity, we call this concept CNQ-semismoothness ("CNQ" for Chen, Nashed and Qi). Hereby [30], the notions of a slanting function f◦ and of slant differentiability of f are introduced, and a generalized derivative ∂Sf(y), the slant derivative, is obtained as the collection of all possible limits lim_{yk→y} f◦(yk). CNQ-semismoothness is then defined by imposing appropriate conditions on the approximation properties of the slanting function and the slant derivative. These conditions


are equivalent [30, Thm. 3.3] to the requirements that (i) f is slantly differentiable in a neighborhood of y, (ii) f is ∂Sf-semismooth at y, and (iii) f is B-differentiable at y, i.e., the directional derivative f′(y, s) = lim_{t→0+}(f(y + ts) − f(y))/t exists and satisfies ‖f(y + s) − f(y) − f′(y, s)‖Z = o(‖s‖Y) as ‖s‖Y → 0.

For ∂∗f-semismooth equations we develop Newton-like methods and prove q-superlinear convergence. Hereby, we impose regularity assumptions that are similar to their finite-dimensional counterparts (e.g., those in Proposition 2.12). For α-order ∂∗f-semismooth equations, convergence of order ≥ 1 + α is established. In view of our applications to reformulations of the VIP, and, more generally, semismooth superposition operators, it is advantageous to formulate and analyze the Newton method in a two-norm framework, which requires augmenting the Newton iteration by a smoothing step. Further, we allow for inexactness in the computations and also analyze a projected version of the algorithm which generates iterates that stay within a prescribed closed convex set.

Unfortunately, from the viewpoint of applications, the abstract framework of ∂∗f-semismoothness (as well as other general approaches) leaves two important questions unanswered:

(a) Given a particular operator f, how should ∂∗f be chosen?

(b) Is there an easy way to verify that f is ∂∗f-semismooth?

The same questions arise in the case of CNQ-semismoothness. Then (a) consists in finding an appropriate slanting function, and part (b) becomes even more involved since CNQ-semismoothness is stronger than ∂Sf-semismoothness.

The major, second part of this chapter is intended to develop satisfactory answers to these two questions for a class of nonsmooth operators which includes the mappings Φ arising from reformulations of NCPs and MCPs, see (3.1). More precisely, we consider superposition operators of the form

Ψ : Y → Lr(Ω), Ψ(y)(ω) = ψ(G(y)(ω)), (3.2)

with mappings ψ : Rm → R and G : Y → ∏_{i=1}^m L^{r_i}(Ω), where 1 ≤ r ≤ r_i < ∞, Y is a real Banach space, and Ω ⊂ Rn is a bounded measurable set with positive Lebesgue measure. Essentially, our working assumptions are that ψ is Lipschitz continuous and semismooth, and that G is continuously Fréchet-differentiable. The detailed assumptions are given below. As generalized differential for Ψ we introduce an appropriate multifunction

∂◦Ψ : Y ⇒ L(Y, Lr)

(the superscript "◦" is used to indicate that ∂◦ is designed especially for superposition operators), which is easy to compute and is motivated by Qi's finite-dimensional C-subdifferential [121]; this addresses question (a) raised above. In our main result we establish the ∂◦Ψ-semismoothness of Ψ:

sup_{M∈∂◦Ψ(y+s)} ‖Ψ(y + s) − Ψ(y) − Ms‖Lr = o(‖s‖Y) as ‖s‖Y → 0. (3.3)


This answers question (b) for superposition operators of the form (3.2). We also give conditions under which Ψ is α-order ∂◦Ψ-semismooth, 0 < α ≤ 1.

Based on (3.3), we use the abstract results of the first part to develop a locally q-superlinearly convergent Newton method for the nonsmooth operator equation

Ψ(y) = 0. (3.4)

Moreover, in the case where Ψ is α-order semismooth we prove convergence with q-rate 1 + α. As was already observed earlier in the context of related local convergence analyses in function space [95, 143], we have to incorporate a smoothing step to overcome the non-equivalence of norms. We also give an example showing that this smoothing step can be indispensable.

Although the differentiability properties of superposition operators with smooth ψ are well investigated, see, e.g., the expositions [9] and [10], this is not the case for nonsmooth functions ψ. Further, even if ψ is smooth, for operator equations of the form (3.4) the availability of local convergence results for Newton-like methods appears to be very limited.

As already said, an important application of our results, which motivates our investigations, are reformulations of VIPs (1.1) posed in function spaces. Throughout this chapter, our investigations of the operator Ψ will be accompanied by illustrations at the example of NCP-function based reformulations of nonlinear complementarity problems (NCPs), which, briefly recalled, consist in finding u ∈ Lp(Ω) such that almost everywhere on Ω holds

u ≥ 0, F(u) ≥ 0, uF(u) = 0, (3.5)

where the operator F : Lp(Ω) → Lp′(Ω), 1 < p′, p ≤ ∞, is given. As always, Ω ⊂ Rn is assumed to be bounded and measurable with positive Lebesgue measure. Using a Lipschitz continuous, semismooth NCP-function φ : R2 → R, (3.5) is equivalent to the operator equation (3.1). Obviously, choosing Y = Lp(Ω), r2 = r ∈ [1, p′) ∩ [1, p), r1 ∈ [r, p), ψ ≡ φ, and G : u ∈ Lp(Ω) 7→ (u, F(u)), we have Ψ ≡ Φ with Ψ as in (3.2). Our focus on the NCP as the main example rather than reformulations of the more general VIP is just for notational convenience. In fact, as can be seen from (1.30), the general VIP requires using different reformulations on different parts of Ω, depending on the kind of bounds (none, only lower, only upper, lower and upper bounds), a burden we want to avoid in this chapter.

To establish the semismoothness of Ψ we have to choose an appropriate vector-valued generalized differential. Although the available literature on generalized differentials and subdifferentials is mainly focused on real-valued functions, see, e.g., [20, 32, 33, 130] and the references therein, several authors have proposed and analyzed generalized differentials for nonlinear operators between infinite-dimensional spaces [37, 61, 84, 123, 135]. In our approach, we work with a generalized differential that exploits the structure of Ψ. Roughly speaking, our general guidance hereby is to transcribe, at least formally, componentwise operations in Rk to pointwise operations in function spaces. To sketch the idea, note that the finite-dimensional analogue of the operator Ψ is the mapping

Ψ^f : Rk → Rl, Ψ^f_j(x) = ψ(G^j(x)), j = 1, . . . , l,

with ψ as above and C1-mappings G^j : Rk → Rm. We have the correspondences ω ∈ Ω ↔ j ∈ {1, . . . , l}, y ∈ Y ↔ x ∈ Rk, and G(y)(ω) ↔ G^j(x). Componentwise application of the chain rule for Clarke's generalized gradient [32] shows that the C-subdifferential of Ψ^f consists of matrices M ∈ Rl×k having rows of the form

M_j = ∑_{i=1}^m d_i^j (G_i^j)′(x), with d^j ∈ ∂ψ(G^j(x)).

For completeness, let us note that, conversely, every such matrix is an element of ∂CΨ^f if, e.g., ψ is regular. Carrying out the same construction for Ψ in a purely formal manner suggests to choose a generalized differential for Ψ consisting of operators of the form

v ∈ Y 7→ ∑_{i=1}^m d_i · (G′_i(y)v) with (d_1, . . . , d_m)(ω) ∈ ∂ψ(G(y)(ω)) a.e. on Ω,

where the inclusion on the right is meant in the sense of measurable selections. One advantage of this approach, which motivates our choice of the generalized differential ∂◦Ψ, is that it consists of relatively "concrete" objects as compared to those investigated in, e.g., [37, 61, 84, 123, 135], which necessarily are more abstract since they are not restricted to a particular structure of the underlying operator. It is not the objective of this chapter to investigate the connections between the generalized differential ∂◦Ψ and other generalized differentials. There are close relationships, but we leave it as a topic for future research. Here, we concentrate on the development of a semismoothness concept based on ∂◦Ψ, a related nonsmooth Newton method, and the relations to the respective finite-dimensional analogues.
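In a discretized setting this construction becomes completely concrete. The following sketch (NumPy assumed; the grid size k, the affine model F(u) = Au + b, and the particular selection d are illustrative choices, not part of the text) assembles one element of the generalized differential for the Fischer–Burmeister superposition operator Φ(u)_j = φ(u_j, F(u)_j):

```python
import numpy as np

# hypothetical discretization: k grid points, affine model F(u) = A u + b
k = 4
A = np.eye(k) + 0.1 * np.ones((k, k))
b = np.array([-1.0, 0.5, -0.2, 1.0])

def F(u):
    return A @ u + b

def d_fb(a, c):
    # one measurable selection d(ω) ∈ ∂φ_FB(a, c); at the kink (0, 0) we
    # pick d = (1, 1), corresponding to y = 0 in ∂φ_FB(0)
    n = np.hypot(a, c)
    if n == 0.0:
        return 1.0, 1.0
    return 1.0 - a / n, 1.0 - c / n

def M_element(u):
    # M = diag(d1)·I + diag(d2)·F'(u): the pointwise analogue of the rows
    # M_j = sum_i d_i^j (G_i^j)'(x) of the C-subdifferential
    d1, d2 = np.vectorize(d_fb)(u, F(u))
    return np.diag(d1) + d2[:, None] * A

M = M_element(np.array([1.0, 0.0, 2.0, -1.0]))
assert M.shape == (k, k)
```

Each row of M combines the pointwise generalized gradient d(ω) with the rows of the (here dense) Jacobian of G, exactly mirroring the formula displayed above.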

As already mentioned, the literature on Newton-like methods for the solution of nonlinear complementarity problems or, closely related, bound-constrained optimization problems posed in function spaces is very limited. Hereby, we call an iteration Newton-like if each iteration essentially requires the solution of a linear operator equation. We point out that in this sense sequential quadratic programming (SQP) methods for problems involving inequality constraints [2, 3, 4, 5, 6, 76, 138] are not Newton-like, since each iteration requires the solution of a quadratic programming problem (or, put differently, a linearized generalized equation), which is in general significantly more expensive than solving a linear operator equation. Therefore, instead of applying the methods considered in this chapter directly to the nonlinear problem, they also could be of interest as subproblem solvers for SQP methods.

Probably the investigations closest related to ours are the analysis of Bertsekas' projected Newton method by Kelley and Sachs [95], and the investigation of affine-scaling interior-point Newton methods by Ulbrich and Ulbrich [143]. Both papers deal with bound-constrained minimization problems in function spaces and establish the local q-superlinear convergence of their respective Newton-like methods. In both approaches the convergence results are obtained by estimating directly the remainder terms appearing in the analysis of the Newton iteration. Hereby, specific properties of the solution are exploited, and a strict complementarity condition is assumed in both papers. We develop our results for the general problem class (3.4) and derive the applicability to nonlinear complementarity problems as a simple, but important special case. In the context of NCPs and optimization, we do not have to assume any strict complementarity condition.

Notation

In this chapter we equip product spaces ∏_i Y_i with the norm ‖y‖_{∏i Yi} = ∑_i ‖y_i‖_{Y_i}; further, for convenience, we write ∑_i and ∏_i instead of ∑_{i=1}^m and ∏_{i=1}^m.

3.2 Newton Methods for Abstract Semismooth Operators

3.2.1 Semismooth Operators in Banach Spaces

In the previous section we have already outlined the following abstract semismoothness concept for general operators between Banach spaces:

Definition 3.1. Let f : Y ⊃ V → Z be defined on an open subset V of the Banach space Y with images in the Banach space Z. Further, let a set-valued mapping ∂∗f : V ⇒ L(Y, Z) be given, and let y ∈ V.

(i) We say that f is ∂∗f-semismooth at y if f is continuous near y and

sup_{M∈∂∗f(y+s)} ‖f(y + s) − f(y) − Ms‖Z = o(‖s‖Y) as ‖s‖Y → 0.

(ii) We say that f is α-order ∂∗f-semismooth at y, 0 < α ≤ 1, if f is continuous near y and

sup_{M∈∂∗f(y+s)} ‖f(y + s) − f(y) − Ms‖Z = O(‖s‖Y^{1+α}) as ‖s‖Y → 0.

(iii) The multifunction ∂∗f is called a generalized differential of f.

Remark 3.2. The mapping y ∈ Y 7→ ∂∗f(y) ⊂ L(Y, Z) can be interpreted as a set-valued point-based approximation, see Robinson [128], Kummer [103], and Xu [146].

3.2.2 Basic Properties

We begin by establishing several fundamental properties of semismooth operators. First, it is important to know that continuously differentiable operators f are f′-semismooth. More precisely:


Proposition 3.3. Let f : Y ⊃ V → Z be differentiable on the neighborhood V of y with its derivative f′ being continuous near y. Then f is f′-semismooth at y. If f′ is α-Hölder continuous near y, 0 < α ≤ 1, then f is α-order f′-semismooth at y.

Proof. We have by the fundamental theorem of calculus

‖f(y + s) − f(y) − f′(y + s)s‖Z ≤ ∫_0^1 ‖(f′(y + ts) − f′(y + s))s‖Z dt

≤ sup_{0≤t≤1} ‖f′(y + ts) − f′(y + s)‖Y,Z ‖s‖Y = o(‖s‖Y) as ‖s‖Y → 0.

Thus f is f′-semismooth at y. If f′ is α-Hölder continuous near y, we obtain

sup_{0≤t≤1} ‖f′(y + ts) − f′(y + s)‖Y,Z ≤ sup_{0≤t≤1} O(‖(t − 1)s‖Y^α) = O(‖s‖Y^α) as ‖s‖Y → 0,

which establishes the α-order f′-semismoothness of f at y. ⊓⊔

We proceed by establishing the semismoothness of the sum of semismooth operators.

Proposition 3.4. Let V ⊂ Y be open and let fi : V → Z be (α-order) ∂∗fi-semismooth at y ∈ V, i = 1, . . . , m. Consider the operator

f : Y ⊃ V → Z, f(y) = f1(y) + · · · + fm(y).

Further, define the generalized differential ∂∗f := ∂∗f1 + · · · + ∂∗fm : V ⇒ L(Y, Z) as follows:

∂∗f(y) = {M1 + · · · + Mm : Mi ∈ ∂∗fi(y), i = 1, . . . , m}.

Then f is (α-order) ∂∗f-semismooth at y.

Proof. By the ∂∗fi-semismoothness of the fi,

sup_M ‖f(y + s) − f(y) − Ms‖Z ≤ ∑_i sup_{Mi} ‖fi(y + s) − fi(y) − Mi s‖Z = o(‖s‖Y) as ‖s‖Y → 0,

where the suprema are taken over M ∈ ∂∗f(y + s) and Mi ∈ ∂∗fi(y + s), respectively. In the case of α-order semismoothness, we can replace o(‖s‖Y) by O(‖s‖Y^{1+α}). ⊓⊔

The next result shows that the direct product of semismooth operators is itself semismooth with respect to the direct product of the generalized differentials of the components.


Proposition 3.5. Let V ⊂ Y be open and assume that the operators fi : V → Zi, i = 1, . . . , m, are (α-order) ∂∗fi-semismooth at y ∈ V with generalized differentials ∂∗fi : V ⇒ L(Y, Zi). Then the operator

f = (f1, . . . , fm) : y ∈ V 7→ (f1(y), . . . , fm(y)) ∈ Z := Z1 × · · · × Zm

is (α-order) (∂∗f1 × · · · × ∂∗fm)-semismooth at y, where (∂∗f1 × · · · × ∂∗fm)(y) is the set of all operators M ∈ L(Y, Z) of the form

M : v 7→ (M1 v, . . . , Mm v) with Mi ∈ ∂∗fi(y), i = 1, . . . , m.

Proof. Let ∂∗f = ∂∗f1 × · · · × ∂∗fm. Then for all M ∈ ∂∗f(y + s) there exist Mi ∈ ∂∗fi(y + s) with Mv = (M1 v, . . . , Mm v). Hence, using the norm ‖z‖Z = ‖z1‖Z1 + · · · + ‖zm‖Zm, and writing sup_M and sup_{Mi} for suprema taken over M ∈ ∂∗f(y + s) and Mi ∈ ∂∗fi(y + s), respectively, we obtain

sup_M ‖f(y + s) − f(y) − Ms‖Z = ∑_{i=1}^m sup_{Mi} ‖fi(y + s) − fi(y) − Mi s‖Zi = o(‖s‖Y) as ‖s‖Y → 0.

In the case of α-order semismoothness, the above holds with o(‖ · ‖) replaced by O(‖ · ‖^{1+α}). ⊓⊔

Remark 3.6. We stress that the construction of ∂∗f1 × · · · × ∂∗fm from the ∂∗fi is analogous to that of the C-subdifferential ∂Cf from the ∂fi.

Next, we give conditions under which the composition of two semismooth operators is semismooth.

Proposition 3.7. Let U ⊂ X and V ⊂ Y be open. Further, let f1 : U → Y be Lipschitz continuous near x ∈ U and (α-order) ∂∗f1-semismooth at x. Further, let f2 : V → Z be (α-order) ∂∗f2-semismooth at y = f1(x) with ∂∗f2 being bounded near y. Let f1(U) ⊂ V and consider the operator f := f2 ∘ f1 : X ⊃ U → Z, f(x) = f2(f1(x)). Further, define the generalized differential ∂∗f := ∂∗f2 ∘ ∂∗f1 : U ⇒ L(X, Z) as follows:

∂∗f(x) = (∂∗f2 ∘ ∂∗f1)(x) = {M2 M1 : M1 ∈ ∂∗f1(x), M2 ∈ ∂∗f2(f1(x))}.

Then f is (α-order) ∂∗f-semismooth at x.

Proof. We set h = f1(x + s) − f1(x) for x + s ∈ U. For all x + s ∈ U and all M ∈ ∂∗f(x + s) there exist M1 ∈ ∂∗f1(x + s) and M2 ∈ ∂∗f2(f1(x + s)) = ∂∗f2(y + h) with M = M2 M1. Due to the Lipschitz continuity of f1 near x, we have

‖h‖Y = ‖f1(x + s) − f1(x)‖Y = O(‖s‖X) as ‖s‖X → 0. (3.6)

Further, since ∂∗f2 is bounded near y, we can use the semismoothness of f1, f2 and (3.6) to see that for all sufficiently small s ∈ X holds


sup_M ‖f(x + s) − f(x) − Ms‖Z = sup_{M1,M2} ‖f2(y + h) − f2(y) − M2 M1 s‖Z

≤ sup_{M1,M2} (‖f2(y + h) − f2(y) − M2 h‖Z + ‖M2(h − M1 s)‖Z)

≤ o(‖h‖Y) + sup_{M2} ‖M2‖Y,Z sup_{M1} ‖f1(x + s) − f1(x) − M1 s‖Y

= o(‖h‖Y) + o(‖s‖X) = o(‖s‖X) as ‖s‖X → 0,

where the suprema are taken over M ∈ ∂∗f(x + s), M1 ∈ ∂∗f1(x + s), and M2 ∈ ∂∗f2(y + h), respectively. Therefore, f is ∂∗f-semismooth at x. In the case of α-order semismoothness, we can replace "o(‖ · ‖)" with "O(‖ · ‖^{1+α})" in the above calculations, which yields the α-order ∂∗f-semismoothness of f at x. ⊓⊔

Remark 3.8. The established results provide a variety of ways to combine semismooth operators to construct new semismooth operators.

3.2.3 Semismooth Newton's Method

In analogy to Algorithm 2.11, we now consider a Newton-like method for the solution of the operator equation

f(y) = 0, (3.7)

which uses the generalized differential ∂∗f. Hereby, we will assume that f : V → Z, V ⊂ Y open, is ∂∗f-semismooth at the solution y ∈ V of (3.7). As we will see, it is important for applications to incorporate an additional device, the "smoothing step", in the algorithm, which enables us to work with two-norm techniques. To this end, we introduce a further Banach space Y0, in which Y is continuously and densely embedded, and augment the iteration by a smoothing step:

Algorithm 3.9 (Semismooth Newton's Method).

0. Choose an initial point y0 ∈ V and set k = 0.

1. Choose Mk ∈ ∂∗f(yk), compute sk ∈ Y0 from

Mk sk = −f(yk),

and set y^0_{k+1} = yk + sk.

2. Perform a smoothing step: y^0_{k+1} ∈ Y0 7→ y_{k+1} = Sk(y^0_{k+1}) ∈ Y.

3. If y_{k+1} = yk, then STOP with result y∗ = y_{k+1}.

4. Increment k by one and go to step 1.

Remark 3.10. The stopping test in step 3 is certainly not standard. In fact, we could remove step 3 and perform the following simpler test at the beginning of step 1: "If f(yk) = 0, then STOP with result y∗ = yk". But then we could only prove that y∗ is a solution of (3.7); we would not know if y∗ = y or not. For Algorithm 3.9, however, we are able to prove that y∗ = y holds in the case of finite termination.


Before we establish fast local convergence of this algorithm, a comment on the smoothing step is in order. First, it is clear that the smoothing step can be eliminated from the algorithm by choosing Y0 = Y and Sk(y^0_{k+1}) = y^0_{k+1}. However, as we will see later, in many important situations the operators Mk are not continuously invertible in L(Y, Z). Fortunately, the following framework, which turns out to be widely applicable, provides an escape from this difficulty:

Assumption 3.11. The space Y is continuously and densely embedded in a Banach space Y0 such that:

(i) (Regularity condition) The operators Mk map Y0 continuously into Z with bounded inverses, and there exists a constant C_{M^{−1}} > 0 such that

‖Mk^{−1}‖Z,Y0 ≤ C_{M^{−1}}.

(ii) (Smoothing condition) The smoothing steps in step 2 satisfy

‖Sk(y^0_{k+1}) − y‖Y ≤ CS ‖y^0_{k+1} − y‖Y0

for all k, where y ∈ Y solves (3.7).

Theorem 3.12. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂∗f : V ⇒ L(Y, Z). Denote by y ∈ V a solution of (3.7) and let Assumption 3.11 hold. Then the following holds:

(i) If f is ∂∗f-semismooth at y, then there exists δ > 0 such that, for all y0 ∈ y + δBY, Algorithm 3.9 either terminates with y∗ = y or generates a sequence (yk) ⊂ V that converges q-superlinearly to y in Y.

(ii) If in (i) the mapping f is α-order ∂∗f-semismooth at y, 0 < α ≤ 1, then the rate of convergence is at least 1 + α.

The proof is similar to that of Proposition 2.12.

Proof. (i): Denote the errors before/after smoothing by v^0_{k+1} = y^0_{k+1} − y and v_{k+1} = y_{k+1} − y, respectively. Now let δ > 0 be so small that y + δBY ⊂ V, and consider yk ∈ y + δBY. Using Mk sk = −f(yk) and f(y) = 0, we obtain

Mk v^0_{k+1} = Mk(sk + vk) = −f(yk) + Mk vk = −[f(y + vk) − f(y) − Mk vk]. (3.8)

This and the ∂∗f-semismoothness of f at y yield

‖Mk v^0_{k+1}‖Z = o(‖vk‖Y) as ‖vk‖Y → 0. (3.9)

Hence, for sufficiently small δ > 0, we have

‖Mk v^0_{k+1}‖Z ≤ (1/(2 C_{M^{−1}} CS)) ‖vk‖Y, (3.10)

and thus by Assumption 3.11 (i)


‖v^0_{k+1}‖Y0 ≤ ‖Mk^{−1}‖Z,Y0 ‖Mk v^0_{k+1}‖Z ≤ (1/(2 CS)) ‖vk‖Y.

Therefore, using Assumption 3.11 (ii),

‖v_{k+1}‖Y ≤ CS ‖v^0_{k+1}‖Y0 ≤ (1/2) ‖vk‖Y. (3.11)

This shows

y_{k+1} ∈ y + (‖vk‖Y/2)BY ⊂ y + (δ/2)BY ⊂ V. (3.12)

If the algorithm terminates in step 3, then

‖vk‖Y = ‖v_{k+1}‖Y ≤ (1/2) ‖vk‖Y,

hence vk = 0, and thus y∗ = yk = y.

On the other hand, if the algorithm runs infinitely, then (3.12) inductively yields V ∋ yk → y in Y. Now we conclude from the derived estimates and (3.9)

‖v_{k+1}‖Y ≤ CS ‖v^0_{k+1}‖Y0 ≤ CS ‖Mk^{−1}‖Z,Y0 ‖Mk v^0_{k+1}‖Z ≤ CS C_{M^{−1}} ‖Mk v^0_{k+1}‖Z = o(‖vk‖Y), (3.13)

which completes the proof of (i).

(ii): If, in addition, f is α-order ∂∗f-semismooth at y, then we can write O(‖vk‖Y^{1+α}) on the right hand side of (3.9) and obtain as in (3.13)

‖v_{k+1}‖Y = O(‖vk‖Y^{1+α}). ⊓⊔

3.2.4 Inexact Newton's Method

From a computational point of view, due to discretization and finite precision arithmetic, we can in general only compute approximate elements of ∂∗f. We address this issue by allowing a certain amount of inexactness in the operators Mk.¹ We incorporate the possibility of inexact computations in our algorithm by modifying step 1 of Algorithm 3.9 as follows:

Algorithm 3.13 (Inexact Semismooth Newton's Method).
As Algorithm 3.9, but with step 1 replaced by

1. Choose a boundedly invertible operator Bk ∈ L(Y0, Z), compute sk ∈ Y0 from

Bk sk = −f(yk),

and set y^0_{k+1} = yk + sk.

¹ We stress that inexact solutions of a linear operator equation Ms = b, M ∈ L(Y, Z), can always be interpreted as exact solutions of a system with inexact operator: If Md = b + e, then holds (M + δM)d = b with, e.g., δMv = 〈w, v〉Y∗,Y e for all v ∈ Y, where w ∈ Y∗ is chosen such that 〈w, d〉Y∗,Y = −1.


On the operators Bk we pose a Dennis–Moré-type condition [40, 42, 112, 125], which we formulate in two versions: a weaker one required for superlinear convergence and a stronger variant to prove convergence with rate 1 + α.

Assumption 3.14.

(i) There exist operators M_k ∈ ∂∗f(y_k + s_k) such that

‖(B_k − M_k) s_k‖_Z = o(‖s_k‖_{Y_0}) as ‖s_k‖_Y → 0,   (3.14)

where s_k ∈ Y_0 is the step computed in step 1.

(ii) Condition (i) holds with (3.14) replaced by

‖(B_k − M_k) s_k‖_Z = O(‖s_k‖_{Y_0}^{1+α}) as ‖s_k‖_Y → 0.

Theorem 3.15. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂∗f : V ⇒ L(Y, Z). Let ȳ ∈ V be a solution of (3.7) and let f be Lipschitz continuous near ȳ. Further, let Assumptions 3.11 and 3.14 (i) hold. Then:

(i) If f is ∂∗f-semismooth at ȳ, then there exists δ > 0 such that, for all y_0 ∈ ȳ + δB_Y, Algorithm 3.13 either terminates with y_k = ȳ or generates a sequence (y_k) ⊂ V that converges q-superlinearly to ȳ in Y.

(ii) If in (i) the mapping f is α-order ∂∗f-semismooth at ȳ, 0 < α ≤ 1, and if Assumption 3.14 (ii) is satisfied, then the q-order of convergence is at least 1 + α.

Proof. We use the same notations as in the proof of Theorem 3.12 and set μ_k = ‖(B_k − M_k) s_k‖_Z. Throughout, consider y_k ∈ ȳ + δB_Y and let δ > 0 be so small that f is Lipschitz continuous on ȳ + δB_Y ⊂ V with modulus L > 0. Then ‖f(y_k)‖_Z ≤ L‖v_k‖_Y holds. We estimate the Y_0-norm of s_k:

‖s_k‖_{Y_0} ≤ ‖M_k^{-1}‖_{Z,Y_0} (‖B_k s_k‖_Z + ‖(M_k − B_k) s_k‖_Z) ≤ C_{M^{-1}} (‖f(y_k)‖_Z + μ_k) ≤ C_{M^{-1}} (L‖v_k‖_Y + μ_k).   (3.15)

By reducing δ, we achieve that C_{M^{-1}} μ_k ≤ ‖s_k‖_{Y_0}/2. Hence,

‖s_k‖_{Y_0} ≤ 2 C_{M^{-1}} L ‖v_k‖_Y.   (3.16)

Next, using f(ȳ) = 0 and B_k s_k = −f(y_k) = −f(ȳ + v_k), we derive

M_k v^0_{k+1} = M_k(s_k + v_k) = (M_k − B_k) s_k + B_k s_k + M_k v_k = (M_k − B_k) s_k − [f(ȳ + v_k) − f(ȳ) − M_k v_k].   (3.17)

This, Assumption 3.14 (i), the ∂∗f-semismoothness of f at ȳ, and (3.16) yield

‖M_k v^0_{k+1}‖_Z = o(‖s_k‖_{Y_0}) + o(‖v_k‖_Y) = o(‖v_k‖_Y) as ‖v_k‖_Y → 0.   (3.18)


Now we can proceed as in the proof of Theorem 3.12 (i) to establish assertion (i).

(ii): If, in addition, f is α-order ∂∗f-semismooth at ȳ and Assumption 3.14 (ii) holds, then we can improve (3.18) to

‖M_k v^0_{k+1}‖_Z = O(‖s_k‖_{Y_0}^{1+α}) + O(‖v_k‖_Y^{1+α}) = O(‖v_k‖_Y^{1+α}) as ‖v_k‖_Y → 0.

Now we can proceed as in the proof of Theorem 3.12 (ii).   □

3.2.5 Projected Inexact Newton's Method

As a last variant of semismooth Newton methods, we develop a projected version of Algorithm 3.13 that is applicable to the constrained semismooth operator equation

f(y) = 0 subject to y ∈ K,   (3.19)

where K ⊂ Y is a closed convex set. Here, let f : Y ⊃ V → Z be defined on the open set V and assume that (3.19) possesses a solution ȳ ∈ V ∩ K. Sometimes it is desirable to have an algorithm for (3.19) that stays feasible with respect to K. To achieve this, we augment Algorithm 3.13 by a projection onto K. We assume that an operator P_K : Y → K ⊂ Y is available with the following properties:

Assumption 3.16.

(i) P_K is a projection onto K, i.e., for all y ∈ Y,

‖P_K(y) − y‖_Y = min_{v∈K} ‖v − y‖_Y.

(ii) For all y in a Y-neighborhood of ȳ,

‖P_K(y) − ȳ‖_Y ≤ L_P ‖y − ȳ‖_Y

with a constant L_P > 0.

These two requirements are easily seen to be satisfied in all situations we encounter in this work. In particular, it holds with L_P = 1 if Y is a Hilbert space or if K = B and Y = L^p(Ω), p ∈ [1, ∞]. In the latter case, we use

P_B(u)(ω) = P_{[a(ω),b(ω)]}(u(ω)) = max{a(ω), min{u(ω), b(ω)}} on Ω,

which satisfies the assumptions (for p ∈ [1, ∞), P_B is the unique metric projection onto B). We are now in a position to formulate the algorithm:

Algorithm 3.17 (Projected Inexact Semismooth Newton's Method).

0. Choose an initial point y_0 ∈ V ∩ K and set k = 0.

1. Choose an invertible operator B_k ∈ L(Y_0, Z), compute s_k ∈ Y_0 from

   B_k s_k = −f(y_k),

   and set y^0_{k+1} = y_k + s_k.

2. Perform a smoothing step: y^0_{k+1} ∈ Y_0 ↦ y^1_{k+1} = S_k(y^0_{k+1}) ∈ Y.

3. Project onto K: y_{k+1} = P_K(y^1_{k+1}).

4. If y_{k+1} = y_k, then STOP with result y^* = y_{k+1}.

5. Increment k by one and go to step 1.
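In finite dimensions, where no smoothing step is needed (S_k = identity), the projected iteration can be sketched as follows; the piecewise-linear equation and the box K = [0, 2] are hypothetical data chosen only for illustration:

```python
import numpy as np

def proj_box(y, a, b):
    # pointwise projection onto [a, b]: P_B(u) = max{a, min{u, b}}
    return np.maximum(a, np.minimum(y, b))

def projected_newton(f, M, y, a, b, tol=1e-12, maxit=50):
    # Newton step followed by projection onto K = [a, b]; all iterates stay feasible
    for _ in range(maxit):
        fy = f(y)
        if abs(fy) <= tol:
            break
        y = proj_box(y - fy / M(y), a, b)
    return y

# hypothetical piecewise-linear semismooth equation with solution y = 1 inside K = [0, 2]
f = lambda y: max(y - 1.0, 2.0 * (y - 1.0))
M = lambda y: 2.0 if y >= 1.0 else 1.0     # slope of the active piece
y_star = projected_newton(f, M, 0.0, 0.0, 2.0)
```

The projection keeps every iterate in K, exactly as Remark 3.18 (i) observes for the function-space algorithm.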


Remark 3.18.

(i) Since y_0 ∈ K and all iterates y_k, k ≥ 1, are obtained by projection onto K, we have y_k ∈ K for all k.

(ii) It is interesting to observe that by composing the smoothing step and the projection step, we obtain a step

S^P_k(y^0_{k+1}) = P_K(S_k(y^0_{k+1}))

that has the smoothing property in a Y_0-neighborhood of ȳ. In fact, for y^0_{k+1} near ȳ (in Y_0), Assumptions 3.11 and 3.16 yield

‖S^P_k(y^0_{k+1}) − ȳ‖_Y ≤ L_P ‖S_k(y^0_{k+1}) − ȳ‖_Y ≤ C_S L_P ‖y^0_{k+1} − ȳ‖_{Y_0}.

Theorem 3.19. Let f : Y ⊃ V → Z be an operator between Banach spaces, defined on the open set V, with generalized differential ∂∗f : V ⇒ L(Y, Z). Let K ⊂ Y be closed and convex with corresponding projection operator P_K and let ȳ ∈ V ∩ K be a solution of (3.19). Further, assume that f is Lipschitz continuous on K near ȳ and let Assumptions 3.11, 3.14 (i), and 3.16 hold. Then:

(i) If f is ∂∗f-semismooth at ȳ, then there exists δ > 0 such that, for all y_0 ∈ (ȳ + δB_Y) ∩ K, Algorithm 3.17 either terminates with y_k = ȳ or generates a sequence (y_k) ⊂ V ∩ K that converges q-superlinearly to ȳ in Y.

(ii) If in (i) the mapping f is α-order ∂∗f-semismooth at ȳ, 0 < α ≤ 1, and if Assumption 3.14 (ii) is satisfied, then the q-order of convergence is at least 1 + α.

Proof. We only sketch the modifications required to adjust the proof of Theorem 3.15 to the present situation. We choose δ > 0 sufficiently small to ensure that f is Lipschitz on K_δ = (ȳ + δB_Y) ∩ K. Then, for all y_k ∈ K_δ we can establish (3.15), (3.16), and, by reducing δ, (3.17) and (3.18). A further reduction of δ yields, instead of (3.10),

‖M_k v^0_{k+1}‖_Z ≤ (2 C_{M^{-1}} C_S L_P)^{-1} ‖v_k‖_Y

and thus, analogous to (3.11),

‖v^1_{k+1}‖_Y ≤ C_S ‖v^0_{k+1}‖_{Y_0} ≤ C_{M^{-1}} C_S ‖M_k v^0_{k+1}‖_Z ≤ (2 L_P)^{-1} ‖v_k‖_Y,

where v^1_{k+1} = y^1_{k+1} − ȳ. Hence, for δ small enough, Assumption 3.16 (ii) can be used to derive

‖v_{k+1}‖_Y ≤ L_P ‖v^1_{k+1}‖_Y ≤ ‖v_k‖_Y /2.

The rest of the proof, including the one for part (ii), can be transcribed directly from Theorem 3.15.   □

3.2.6 Alternative Regularity Conditions

In the convergence theorems we used the regularity condition of Assumption 3.11 (i), which requires uniform invertibility in L(Y_0, Z) of all operators M_k. Since M_k ∈ ∂∗f(y_k), we could also require the uniform invertibility of all M ∈ ∂∗f(y) on a neighborhood of ȳ; more precisely:


Assumption 3.20. There exist η > 0 and C_{M^{-1}} > 0 such that, for all y ∈ ȳ + ηB_Y, every M ∈ ∂∗f(y) is an invertible element of L(Y_0, Z) with ‖M^{-1}‖_{Z,Y_0} ≤ C_{M^{-1}}.

Then obviously:

Theorem 3.21. Let the operator f : Y → Z and a corresponding generalized differential ∂∗f : Y ⇒ L(Y, Z) be given. Denote by ȳ ∈ Y a solution of (3.7) and let Assumption 3.20 hold. Further assume that y_k ∈ ȳ + ηB_Y for all k. Then Assumption 3.11 (i) holds. In particular, Theorems 3.12, 3.15, and 3.19 remain true if Assumption 3.11 (i) is replaced by Assumption 3.20.

Proof. The first part follows directly from the fact that M_k ∈ ∂∗f(y_k). The proofs of Theorems 3.12, 3.15, and 3.19 can be applied without change as long as y_k ∈ ȳ + ηB_Y. In particular, it follows for y_k ∈ ȳ + δB_Y and δ ∈ (0, η] small enough that y_{k+1} ∈ ȳ + (δ/2)B_Y ⊂ ȳ + ηB_Y, see, e.g., (3.12). Therefore, all iterates remain in ȳ + ηB_Y, and the proofs are applicable without change.   □

Remark 3.22. For the projected Newton method, the requirement of Assumption 3.20 can be restricted to all y ∈ (ȳ + ηB_Y) ∩ K.

A further variant, which corresponds to the finite-dimensional CD-regularity, is obtained by restricting the bounded invertibility to all M ∈ ∂∗f(ȳ).

Assumption 3.23. The multifunction y ∈ Y ↦ ∂∗f(y) ⊂ L(Y_0, Z) is upper semicontinuous at ȳ, and there exists C_{M^{-1}} > 0 such that every M ∈ ∂∗f(ȳ) is invertible in L(Y_0, Z) with ‖M^{-1}‖_{Z,Y_0} ≤ C_{M^{-1}}.

Theorem 3.24. Assumption 3.23 implies Assumption 3.20. In particular, Theorems 3.12, 3.15, and 3.19 remain true if Assumption 3.11 (i) is replaced by Assumption 3.23.

Proof. Let Assumption 3.23 hold and choose ε = 1/(2 C_{M^{-1}}). By upper semicontinuity there exists η > 0 such that ∂∗f(y) ⊂ ∂∗f(ȳ) + εB_{L(Y_0,Z)} for all y ∈ ȳ + ηB_Y. Now consider any y ∈ ȳ + ηB_Y and any M ∈ ∂∗f(y). Then there exists M̄ ∈ ∂∗f(ȳ) with

‖M − M̄‖_{Y_0,Z} < ε = 1/(2 C_{M^{-1}}) ≤ 1/(2 ‖M̄^{-1}‖_{Z,Y_0}).

Therefore, by Banach's theorem [91, p. 155], M is invertible in L(Y_0, Z) with

‖M^{-1}‖_{Z,Y_0} ≤ ‖M̄^{-1}‖_{Z,Y_0} / (1 − ‖M̄^{-1}‖_{Z,Y_0} ‖M − M̄‖_{Y_0,Z}) ≤ C_{M^{-1}} / (1 − C_{M^{-1}}/(2 C_{M^{-1}})) = 2 C_{M^{-1}}.

Thus, Assumption 3.20 holds with C_{M^{-1}} replaced by 2 C_{M^{-1}}.   □


Remark 3.25. Theorem 3.24 is conveniently applicable in finite dimensions. In the general Banach space setting, however, upper semicontinuity of ∂∗f with respect to the operator norm topology is a quite strong requirement. More realistic is usually upper semicontinuity with respect to the weak operator topology on the image space, which is generated by the seminorms M ↦ |〈w, My〉_{Z∗,Z}|, w ∈ Z∗, y ∈ Y_0. However, this weak form of upper semicontinuity is (except for the finite-dimensional case) not strong enough to establish results like those in Theorem 3.24. In conclusion, we observe that in the infinite-dimensional setting the regularity conditions stated in Assumption 3.11 (i) and in Assumption 3.20 are much more widely applicable than Assumption 3.23.

3.3 Semismooth Newton Methods for Superposition Operators

We now concentrate on nonsmooth superposition operators of the form

Ψ : Y → L^r(Ω), Ψ(y)(ω) = ψ(G(y)(ω)),   (3.20)

with mappings ψ : R^m → R and G : Y → ∏_{i=1}^m L^{r_i}(Ω). Throughout we assume that 1 ≤ r ≤ r_i < ∞, Y is a real Banach space, and Ω ⊂ R^n is a bounded measurable set with positive Lebesgue measure.

Remark 3.26. Since all our investigations are of a local nature, it would be sufficient if G were only defined on a nonempty open subset of Y. Having this in mind, we prefer to work on Y to avoid notational inconveniences.

Throughout, our investigations are illustrated by applications to the reformulated NCP

Φ(u) = 0, where Φ(u)(ω) = φ(u(ω), F(u)(ω)) on Ω   (3.21)

with F : L^p(Ω) → L^{p′}(Ω), p, p′ ∈ (1, ∞]. As already observed, Φ can be cast in the form Ψ.

3.3.1 Assumptions

In the rest of the chapter, we will impose the following assumptions on G and ψ:

Assumption 3.27. There are 1 ≤ r ≤ r_i < q_i ≤ ∞, 1 ≤ i ≤ m, such that

(a) The operator G : Y → ∏_i L^{r_i}(Ω) is continuously Fréchet differentiable.

(b) The mapping y ∈ Y ↦ G(y) ∈ ∏_i L^{q_i}(Ω) is locally Lipschitz continuous, i.e., for all y ∈ Y there exist an open neighborhood U = U(y) and a constant L_G = L_G(U) such that

∑_i ‖G_i(y_1) − G_i(y_2)‖_{L^{q_i}} ≤ L_G ‖y_1 − y_2‖_Y for all y_1, y_2 ∈ U.


(c) The function ψ : R^m → R is Lipschitz continuous of rank L_ψ > 0, i.e.,

|ψ(x_1) − ψ(x_2)| ≤ L_ψ ‖x_1 − x_2‖_1 for all x_1, x_2 ∈ R^m.

(d) ψ is semismooth.

Remark 3.28. Since by assumption the set Ω is bounded, we have the continuous embedding L^q(Ω) ↪ L^p(Ω) whenever 1 ≤ p ≤ q ≤ ∞.

Remark 3.29. It is important to note that the norm of the image space in (b) is stronger than in (a).

For semismoothness of order > 0 we will strengthen Assumption 3.27 as follows:

Assumption 3.30. As Assumption 3.27, but with (a) and (d) replaced by: There exists α ∈ (0, 1] such that

(a) The operator G : Y → ∏_i L^{r_i}(Ω) is Fréchet differentiable with locally α-Hölder continuous derivative.

(d) ψ is α-order semismooth.

Note that for the special case Y = ∏_i L^{q_i}(Ω) and G = I we have

Ψ : y ∈ Y ↦ ψ(y),

and it is easily seen that Assumptions 3.27 and 3.30, respectively, reduce to parts (c) and (d).

Under Assumption 3.27, the operator Ψ defined in (3.20) is well defined and locally Lipschitz continuous.

Proposition 3.31. Let Assumption 3.27 hold. Then for all 1 ≤ q ≤ q_i, 1 ≤ i ≤ m, and thus in particular for q = r, the operator Ψ defined in (3.20) maps Y locally Lipschitz continuously into L^q(Ω).

Proof. Using Lemma A.4, we first prove Ψ(Y) ⊂ L^q(Ω), which follows from

‖Ψ(y)‖_{L^q} = ‖ψ(G(y))‖_{L^q} ≤ ‖ψ(0)‖_{L^q} + ‖ψ(G(y)) − ψ(0)‖_{L^q} ≤ c_{q,∞}(Ω) |ψ(0)| + L_ψ ∑_i ‖G_i(y)‖_{L^q} ≤ c_{q,∞}(Ω) |ψ(0)| + L_ψ ∑_i c_{q,q_i}(Ω) ‖G_i(y)‖_{L^{q_i}}.

To establish the local Lipschitz continuity, denote by L_G the local Lipschitz constant in Assumption 3.27 (b) on the set U and let y_1, y_2 ∈ U be arbitrary. Then, again by Lemma A.4,

‖Ψ(y_1) − Ψ(y_2)‖_{L^q} ≤ L_ψ ∑_i ‖G_i(y_1) − G_i(y_2)‖_{L^q} ≤ L_ψ ∑_i c_{q,q_i}(Ω) ‖G_i(y_1) − G_i(y_2)‖_{L^{q_i}} ≤ L_ψ L_G (max_{1≤i≤m} c_{q,q_i}(Ω)) ‖y_1 − y_2‖_Y.   □


For the special case Φ in (3.21), the nonsmooth NCP reformulation, and the choices

Y = L^p(Ω), q_1 = p, q_2 = p′, r_2 = r ∈ [1, p′) ∩ [1, p), r_1 ∈ [r, p), ψ ≡ φ, G(u) = (u, F(u)),   (3.22)

we have Ψ ≡ Φ, and Assumption 3.27 can be expressed in the following simpler form:

Assumption 3.32. There exists r ∈ [1, p) ∩ [1, p′) such that

(a) The mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^r(Ω) is continuously Fréchet differentiable.

(b) The operator F : L^p(Ω) → L^{p′}(Ω) is locally Lipschitz continuous.

(c) The function φ : R² → R is Lipschitz continuous.

(d) φ is semismooth.

In fact, (a) and the continuous embedding L^p(Ω) ↪ L^{r_1}(Ω) imply 3.27 (a). Further, (b) and the Lipschitz continuity of the identity u ∈ L^p(Ω) ↦ u ∈ L^p(Ω) yield 3.27 (b). Finally, (c), (d) imply 3.27 (c), (d).

In the same way, Assumption 3.30 for Φ becomes

Assumption 3.33. As Assumption 3.32, but with (a) and (d) replaced by: There exist r ∈ [1, p) ∩ [1, p′) and α ∈ (0, 1] such that

(a) The operator F : L^p(Ω) → L^r(Ω) is Fréchet differentiable with locally α-Hölder continuous derivative.

(d) φ is α-order semismooth.

Remark 3.34. The three different L^p-spaces deserve an explanation. Usually, we have the following scenario: F : L²(Ω) → L²(Ω) is (often even twice) continuously differentiable and has the property that there exist p, p′ > 2 such that the mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^{p′}(Ω) is locally Lipschitz continuous. A typical example arises from optimal control problems, such as the problem (1.11) that we discussed in section 1.1.1. In this problem, which in view of many applications can be considered typical, F = j′ is the reduced gradient of the control problem, which, in adjoint representation, is given by

F(u) = λu − w(u),

where w(u) is the adjoint state. The mapping u ↦ w(u) is locally Lipschitz continuous (for the problem under consideration even affine linear) from L²(Ω) to H^1_0(Ω) and thus, via continuous embedding, also to L^{p′}(Ω) for suitable p′ > 2. Hence, for any p ≥ p′, F maps L^p(Ω) locally Lipschitz continuously to L^{p′}(Ω). Often, we can invoke regularity results for the adjoint equation to prove the local Lipschitz continuity of the mapping u ∈ L²(Ω) ↦ w(u) ∈ H^1_0(Ω) ∩ H²(Ω), which allows us to choose p′ even larger, if desired.

Therefore, as a rule of thumb, usually we are dealing with the case where F is smooth as a mapping L²(Ω) → L²(Ω) and locally Lipschitz continuous as a mapping L^p(Ω) → L^{p′}(Ω), p, p′ > 2. Obviously, these conditions imply the weaker Assumption 3.32 for 1 ≤ r ≤ 2 and p, p′ > 2 as specified.


3.3.2 A Generalized Differential

For the development of a semismoothness concept for the operator Ψ defined in (3.20) we have to choose an appropriate generalized differential. As we already mentioned in the introduction, our aim is to work with a differential that is as closely connected to finite-dimensional generalized Jacobians as possible. Hence, we will propose a generalized differential ∂°Ψ in such a way that its natural finite-dimensional discretization contains Qi's C-subdifferential.

Our construction is motivated by a formal pointwise application of the chain rule. In fact, suppose for the moment that the operator y ∈ Y ↦ G(y) ∈ C(Ω)^m is continuously differentiable, where C(Ω) denotes the space of continuous functions equipped with the max-norm. Then for fixed ω ∈ Ω the function f : y ↦ G(y)(ω) is continuously differentiable with derivative f′(y) ∈ L(Y, R^m),

f′(y) : v ↦ (G′(y)v)(ω).

The chain rule for generalized gradients [32, Thm. 2.3.10] applied to the real-valued mapping y ↦ Ψ(y)(ω) = ψ(f(y)) yields

∂(Ψ(y)(ω)) ⊂ ∂ψ(f(y)) ∘ f′(y) = {g ∈ Y∗ : 〈g, v〉_{Y∗,Y} = ∑_i d_i(ω)(G′_i(y)v)(ω), d(ω) ∈ ∂ψ(G(y)(ω))}.   (3.23)

Furthermore, we can replace "⊂" by "=" if ψ is regular (e.g., convex or concave) or if the linear operator f′(y) is onto, see [32, Thm. 2.3.10]. Inspired by the idea of the finite-dimensional C-subdifferential, and following the above motivation, we return to the general setting of Assumption 3.27 and define the generalized differential ∂°Ψ(y) in such a way that for all M ∈ ∂°Ψ(y), the linear form v ↦ (Mv)(ω) is an element of the right-hand side in (3.23):

Definition 3.35. Let Assumption 3.27 hold. For Ψ as defined in (3.20) we define the generalized differential ∂°Ψ : Y ⇒ L(Y, L^r),

∂°Ψ(y) def= {M ∈ L(Y, L^r) : M : v ↦ ∑_i d_i · (G′_i(y)v), d measurable selection of ∂ψ(G(y))}.   (3.24)

Remark 3.36. The superscript "°" is chosen to indicate that this generalized differential is designed for superposition operators.

The generalized differential ∂°Ψ(y) is nonempty. To show this, we first prove:

Lemma 3.37. Let Assumption 3.27 (a) hold and let d ∈ L^∞(Ω)^m be arbitrary. Then the operator

M : v ∈ Y ↦ ∑_i d_i · (G′_i(y)v)

is an element of L(Y, L^r) and

‖M‖_{Y,L^r} ≤ ∑_i c_{r,r_i}(Ω) ‖d_i‖_{L^∞} ‖G′_i(y)‖_{Y,L^{r_i}}.   (3.25)


Proof. By Assumption 3.27 (a) and Lemma A.4,

‖Mv‖_{L^r} = ‖∑_i d_i · (G′_i(y)v)‖_{L^r} ≤ ∑_i ‖d_i‖_{L^∞} ‖G′_i(y)v‖_{L^r} ≤ (∑_i c_{r,r_i}(Ω) ‖d_i‖_{L^∞} ‖G′_i(y)‖_{Y,L^{r_i}}) ‖v‖_Y for all v ∈ Y,

which shows that (3.25) holds and M ∈ L(Y, L^r).   □

In a next step, we show that the multifunction

∂ψ(G(y)) : ω ∈ Ω ↦ ∂ψ(G(y)(ω)) ⊂ R^m

is measurable (see Definition A.7 or [129, p. 160]).

Lemma 3.38. Any closed-valued, upper semicontinuous multifunction Γ : R^k ⇒ R^l is Borel measurable.

Proof. Let C ⊂ R^l be compact. We show that Γ^{-1}(C) is closed. To this end, let x_k ∈ Γ^{-1}(C) be arbitrary with x_k → x∗. Then there exist z_k ∈ Γ(x_k) ∩ C, and, due to the compactness of C, we achieve by transition to a subsequence that z_k → z∗ ∈ C. Since x_k → x∗, upper semicontinuity yields that there exist z̄_k ∈ Γ(x∗) with (z̄_k − z_k) → 0 and thus z̄_k → z∗. Therefore, since Γ(x∗) is closed, we obtain z∗ ∈ Γ(x∗) ∩ C. Hence, x∗ ∈ Γ^{-1}(C), which proves that Γ^{-1}(C) is closed and therefore a Borel set.   □

Corollary 3.39. The multifunction ∂ψ(G(y)) : Ω ⇒ R^m is measurable.

Proof. By Lemma 3.38, the compact-valued and upper semicontinuous multifunction ∂ψ is Borel measurable. Now, for all closed sets C ⊂ R^m, we have, setting w = G(y) ∈ ∏_i L^{r_i}(Ω),

∂ψ(G(y))^{-1}(C) = {ω ∈ Ω : w(ω) ∈ ∂ψ^{-1}(C)}.

This set is measurable, since ∂ψ^{-1}(C) is a Borel set and w is a (class of equivalent) measurable function(s).   □

The next result is a direct consequence of Lipschitz continuity, see [32, 2.1.2].

Lemma 3.40. Under Assumption 3.27 (c), ∂ψ(x) ⊂ [−L_ψ, L_ψ]^m holds for all x ∈ R^m.

Combining this with Corollary 3.39 yields:

Lemma 3.41. Let Assumption 3.27 hold. Then for all y ∈ Y, the set

K(y) = {d : Ω → R^m : d measurable selection of ∂ψ(G(y))}   (3.26)

is a nonempty subset of L_ψ B^m_{L^∞} ⊂ L^∞(Ω)^m.


Proof. By the theorem on measurable selections [129, Cor. 1C] and Corollary 3.39, ∂ψ(G(y)) admits at least one measurable selection d : Ω → R^m, i.e.,

d(ω) ∈ ∂ψ(G(y)(ω)) a.e. on Ω.

From Lemma 3.40 follows d ∈ L_ψ B^m_{L^∞}.   □

We now can prove:

Proposition 3.42. Under Assumption 3.27, for all y ∈ Y the generalized differential ∂°Ψ(y) is nonempty and bounded in L(Y, L^r).

Proof. Lemma 3.41 ensures that there exist measurable selections d of ∂ψ(G(y)) and that all these d are contained in L_ψ B^m_{L^∞}. Hence, Lemma 3.37 shows that

M : v ↦ ∑_i d_i · (G′_i(y)v)

is in L(Y, L^r). The boundedness of ∂°Ψ(y) follows from (3.25).   □

We now have everything at hand to introduce a semismoothness concept that is based on the generalized differential ∂°Ψ. We postpone the investigation of further properties of ∂°Ψ to sections 3.3.7 and 3.3.8. There, we will establish chain rules, the convex-valuedness, weak compact-valuedness, and the weak graph closedness of ∂°Ψ.

3.3.3 Semismoothness of Superposition Operators

In this section, we prove the main result of this chapter, which asserts that under Assumption 3.27 the operator Ψ is ∂°Ψ-semismooth. Under Assumption 3.30 and a further condition we establish ∂°Ψ-semismoothness of order > 0. For convenience, we will use the term semismoothness instead of ∂°Ψ-semismoothness in the sequel. Therefore, applying the general Definition 3.1 to the current situation, we have:

Definition 3.43. The operator Ψ is called (∂°Ψ-)semismooth at y ∈ Y if it is continuous near y and

sup_{M∈∂°Ψ(y+s)} ‖Ψ(y+s) − Ψ(y) − Ms‖_{L^r} = o(‖s‖_Y) as s → 0 in Y.   (3.27)

Ψ is α-order (∂°Ψ-)semismooth at y ∈ Y, 0 < α ≤ 1, if it is continuous near y and

sup_{M∈∂°Ψ(y+s)} ‖Ψ(y+s) − Ψ(y) − Ms‖_{L^r} = O(‖s‖_Y^{1+α}) as s → 0 in Y.   (3.28)

In the following main theorems we establish the semismoothness and the β-order semismoothness, respectively, of the operator Ψ.


Theorem 3.44. Under Assumption 3.27, the operator Ψ is semismooth on Y.

Under slightly stronger assumptions, we can also establish β-order semismoothness of Ψ:

Theorem 3.45. Let Assumption 3.30 hold and let y ∈ Y. Assume that there exists γ > 0 such that the set

Ω_ε = {ω : max_{‖h‖_1≤ε} (ρ(G(y)(ω), h) − ε^{-α} ‖h‖_1^{1+α}) > 0}, ε > 0,

with the residual function ρ : R^m × R^m → R given by

ρ(x, h) = max_{z^T∈∂ψ(x+h)} |ψ(x+h) − ψ(x) − z^T h|,

has the following decrease property:

μ(Ω_ε) = O(ε^γ) as ε → 0^+.   (3.29)

Then the operator Ψ is β-order semismooth at y with

β = min{γν/(1 + γ/q_0), αγν/(α + γν)}, where q_0 = min_{1≤i≤m} q_i, ν = (q_0 − r)/(q_0 r) if q_0 < ∞, ν = 1/r if q_0 = ∞.   (3.30)

The proofs of both theorems will be presented in section 3.3.5.

Remark 3.46. Condition (3.29) requires the measurability of the set Ω_ε, which will be verified in the proof.

Remark 3.47. As we will see in Lemma 3.54, it would be sufficient to require only the local β-order Hölder continuity of G′ in Assumption 3.30 (a), with β ≤ α as defined in (3.30).

It might be helpful to give an explanation of the abstract condition (3.29). For convenient notation, let x = G(y)(ω). Due to the α-order semismoothness of ψ provided by Assumption 3.30, we have ρ(x, h) = O(‖h‖_1^{1+α}) as h → 0. In essence, Ω_ε is the set of all ω ∈ Ω where there exists h ∈ εB^m_1 for which this asymptotic behavior is not yet observed, because the remainder term ρ(x, h) exceeds ‖h‖_1^{1+α} by a factor of at least ε^{-α}, which grows infinitely as ε → 0. From the continuity of the Lebesgue measure it is clear that μ(Ω_ε) → 0 as ε → 0. The decrease condition (3.29) essentially states that the measure of the set Ω_ε where G(y) takes "bad values", i.e., values at which the radius of small residual is very small, decreases with the rate ε^γ.

The following subsection applies Theorems 3.44 and 3.45 to reformulated nonlinear complementarity problems. Furthermore, it provides a very concrete interpretation of condition (3.29).


Application to NCPs

We apply the semismoothness result to the operator Φ that arises in the reformulation (3.21) of nonlinear complementarity problems (3.5). In this situation, Assumption 3.27 can be expressed in the form of Assumption 3.32. Hence, Theorem 3.44 becomes

Theorem 3.48. Under Assumption 3.32, the operator Φ : L^p(Ω) → L^r(Ω) defined in (3.21) is semismooth on L^p(Ω).

Remark 3.49. Due to the structure of Φ, we have for all M ∈ ∂°Φ(u) and v ∈ L^p(Ω)

Mv = d_1 · v + d_2 · (F′(u)v),   (3.31)

where d ∈ L^∞(Ω)² is a measurable selection of ∂φ(u, F(u)).

Theorem 3.45 is applicable as well. Once we have chosen a particular NCP-function, condition (3.29) can be made very concrete, so that we can write Theorem 3.45 in a more elegant form. We discuss this for the Fischer–Burmeister function φ = φ_FB, which is Lipschitz continuous and 1-order semismooth, and thus satisfies Assumptions 3.30 (c) and (d) with α = 1. Then:

Theorem 3.50. Let Assumptions 3.33 (a), (b) hold and consider the operator Φ with φ = φ_FB. Assume that for u ∈ L^p(Ω) there exists γ > 0 such that

μ({0 < |u| + |F(u)| < ε}) = O(ε^γ) as ε → 0.   (3.32)

Then Φ is β-order semismooth at u with

β = min{γν/(1 + γ/q), αγν/(α + γν)}, where q = min{p, p′}, ν = (q − r)/(qr) if q < ∞, ν = 1/r if q = ∞.   (3.33)

Proof. We only have to establish the equivalence of (3.29) and (3.32). Obviously, this follows easily once we have established the following relation:

{0 < ‖G(u)‖_1 < ε} ⊂ Ω_ε ⊂ {0 < ‖G(u)‖_1 < (1 + 2^{-1/2})ε}   (3.34)

with G(u) = (u, F(u)). The function φ = φ_FB is C^∞ on R² \ {0}, see section 2.5.2, with derivative φ′(x) = (1, 1) − x^T/‖x‖_2.

To show the first inclusion in (3.34), let ω be such that x = G(u)(ω) satisfies 0 < ‖x‖_1 < ε. We observe that, for all λ ∈ R,

φ(λx) = λ(x_1 + x_2) − |λ| ‖x‖_2,

and thus, for all σ > 0,


ρ(x, −(1+σ)x) = −σ‖x‖_2 + ‖x‖_2 + (1+σ)(x^T x)/‖x‖_2 = 2‖x‖_2.

Hence, for the choice h = −tx with t ∈ (1, √2) such that ‖h‖_1 ≤ ε, we obtain

ρ(x, h) = 2‖x‖_2 ≥ √2 ‖x‖_1 = (√2/t) ‖h‖_1 > ‖h‖_1 ≥ ε^{-α} ‖h‖_1^{1+α}.

This implies ω ∈ Ω_ε and thus proves the first inclusion. Next, we prove the second inclusion in (3.34). On R² \ {0},

φ′′(x) = (1/‖x‖_2³) [−x_2², x_1x_2; x_1x_2, −x_1²].

The eigenvalues of φ′′(x) are 0 and −‖x‖_2^{-1}. In particular, we see that ‖φ′′(x)‖_2 = ‖x‖_2^{-1} explodes as x → 0. If 0 ∉ [x, x+h], then Taylor expansion of φ(x) about x+h yields, with appropriate τ ∈ [0, 1],

ρ(x, h) = |φ(x+h) − φ(x) − φ′(x+h)h| = (1/2) |h^T φ′′(x+τh) h| ≤ ‖h‖_2² / (2‖x+τh‖_2).

Further, ρ(0, h) = 0 and ρ(x, 0) = 0. Now consider any ω ∈ Ω that is not contained in the right-hand side of (3.34) and set x = G(u)(ω). If x = 0 then certainly ω ∉ Ω_ε, since then ρ(x, ·) ≡ 0. If on the other hand ‖x‖_1 ≥ (1 + 2^{-1/2})ε, then we have for all h ∈ εB²_1

ρ(x, h) ≤ ‖h‖_2² / (2‖x+τh‖_2) ≤ ‖h‖_1² / (√2 ‖x+τh‖_1) ≤ ε^{-1} ‖h‖_1² ≤ ε^{-α} ‖h‖_1^{1+α},

and thus ω ∉ Ω_ε.   □

Remark 3.51. The meaning of (3.29), which was shown to be equivalent to (3.32), can be interpreted in the following way: The set {0 < |u| + |F(u)| < ε}, on which the decrease rate in measure is assumed, is the set of all ω where strict complementarity holds, but by a margin of less than ε. In a neighborhood of these points the curvature of φ is very large, since ‖φ′′(G(u)(ω))‖_2 = ‖G(u)(ω)‖_2^{-1} is big. This requires that |G(u+s)(ω) − G(u)(ω)| must be very small in order to have a sufficiently small residual ρ(G(u)(ω), G(u+s)(ω) − G(u)(ω)).

We stress that a violation of strict complementarity, i.e., u(ω) = F(u)(ω) = 0, does not cause any problems, since then ρ(G(u)(ω), ·) ≡ ρ(0, ·) ≡ 0.
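For concrete data, condition (3.32) is easy to check. With the hypothetical choices u ≡ 0 and F(u)(ω) = ω on Ω = (0, 1) (the data of Example 3.52 below), the margin |u| + |F(u)| equals ω, so the set {0 < |u| + |F(u)| < ε} is (0, ε), its measure is ε, and γ = 1. A grid-based numerical sketch:

```python
import numpy as np

def strict_compl_measure(eps, n=10**6):
    # approximate mu({0 < |u| + |F(u)| < eps}) for u = 0, F(u)(w) = w on (0, 1)
    w = (np.arange(n) + 0.5) / n       # grid midpoints in (0, 1)
    g = np.abs(0.0 * w) + np.abs(w)    # |u| + |F(u)| with u = 0
    return np.count_nonzero((g > 0.0) & (g < eps)) / n

m = strict_compl_measure(0.25)         # measure of the interval (0, 0.25)
```

The computed measure scales linearly in ε, confirming the decrease rate ε^γ with γ = 1 for this data.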

3.3.4 Illustrations

In this section we give two examples to illustrate the above analysis by pointing out the necessity of the main assumptions and by showing that the derived results cannot be improved in several respects:


• Example 3.52 shows the necessity of the norm gap between the L^{q_i}- and L^r-norms.

• Example 3.53 discusses the sharpness of our order of semismoothness β in Theorem 3.45 for varying values of γ.

In order to prevent our examples from being too academic, we will not work with the simplest choices possible. Rather, we will throughout use reformulations of NCPs based on the Fischer–Burmeister function.

In the proofs of Theorems 3.44 and 3.45, more precisely in the derivation of (3.41) and (3.42), we need the gap between the L^{q_i}- and L^r-norms in order to apply Hölder's inequality. The following example illustrates that both theorems do in general not hold if we drop the condition r_i < q_i in Assumptions 3.27 and 3.30.

Example 3.52 (Necessity of the L^{q_i}-L^r norm gap). We consider the operator Φ arising in semismooth reformulations of the NCP by means of the Fischer–Burmeister function. Theorem 3.48 ensures that, under Assumption 3.32, Φ is semismooth. Our aim here is to show that the requirement r < q = min{p, p′} is indispensable in the sense that in general (3.27) (with Ψ ≡ Φ) is violated for r ≥ q.

In section 3.2 we developed and analyzed semismooth Newton methods. A central requirement for superlinear convergence is the semismoothness of the underlying operator at the solution. Hence, we will construct a simple NCP with a unique solution for which (3.27) fails to hold whenever r ≥ q.

Let 1 < p ≤ ∞ be arbitrary, choose Ω = (0, 1), and set

F(u)(ω) = u(ω) + ω.

Obviously, u ≡ 0 is the unique solution of the NCP. Choosing p′ = p, φ = φ_FB, and α = 1, Assumptions 3.27 and 3.30 are satisfied for all r ∈ [1, p). To show that the requirement r < p is really necessary to obtain the semismoothness of Φ, we will investigate the residual

R(s) def= Φ(u+s) − Φ(u) − Ms, M ∈ ∂°Φ(u+s),   (3.35)

at u ≡ 0 with s ∈ L^∞(Ω), s ≥ 0, s ≠ 0. Our aim is to show that for all r ∈ [1, ∞],

‖R(s)‖_{L^r} = o(‖s‖_{L^p}) as s → 0 in L^∞ ⟹ r < p.   (3.36)

Setting σ = s(ω), we have for all ω ∈ (0, 1)

(Ms)(ω) = d_1(ω)s(ω) + d_2(ω)(F′(0)s)(ω) = d_1(ω)σ + d_2(ω)σ, with

d(ω) ∈ ∂φ(s(ω), F(s)(ω)) = ∂φ(σ, σ+ω) = {φ′(σ, σ+ω)},

where we have used σ + ω > 0 and that φ is smooth at x ≠ 0. Hence, with e = (1, 1)^T, noting that the linear part of φ cancels in R(s)(ω), we derive


R(s)(ω) = φ(σ, σ+ω) − φ(0, ω) − φ′(σ, σ+ω)σe = −‖(σ, σ+ω)‖_2 + ‖(0, ω)‖_2 + σ(σ, σ+ω)e/‖(σ, σ+ω)‖_2 = ω − (σ² + (σ+ω)² − σ(2σ+ω))/‖(σ, σ+ω)‖_2 = ω − ω(σ+ω)/(2σ² + 2σω + ω²)^{1/2}.

Now let 0 < ε < 1. For the special choice s_ε def= ε 1_{(0,ε)}, i.e., s_ε(ω) = ε for ω ∈ (0, ε) and s_ε(ω) = 0 otherwise, we obtain

‖s_ε‖_{L^p} = ε^{(p+1)/p} (1 < p < ∞), ‖s_ε‖_{L^∞} = ε.

In particular, s_ε → 0 in L^∞ as ε → 0. For 0 < ω < ε,

|R(s_ε)(ω)| ≥ ω (1 − sup_{0<t<1} (1+t)/√(2+2t+t²)) = ((5 − 2√5)/5) ω ≥ ω/10.

Hence, ‖R(s_ε)‖_{L^∞} ≥ ε/10 ≥ ‖s_ε‖_{L^p}/10, and for all r ∈ [p, ∞)

‖R(s_ε)‖_{L^r} ≥ (1/10)(∫_0^ε ω^r dω)^{1/r} = ε^{(r+1)/r}/(10(r+1)^{1/r}) ≥ ‖s_ε‖_{L^p}/(10(r+1)^{1/r}).

Therefore, (3.36) is proven. This shows that in (3.27) the norm on the left must be stronger than on the right.
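The failure of (3.36) for r = p can also be observed numerically. The following sketch (r = p = 2; midpoint quadrature with hypothetical grid size) evaluates the ratio ‖R(s_ε)‖_{L^r}/‖s_ε‖_{L^p} for the data of this example and shows that it stays bounded away from zero as ε → 0:

```python
import numpy as np

def residual_ratio(eps, p=2.0, r=2.0, n=4000):
    # R(s_eps)(w) = w - w*(sig + w)/sqrt(2 sig^2 + 2 sig w + w^2) on (0, eps), 0 elsewhere
    w = (np.arange(n) + 0.5) * (eps / n)        # quadrature midpoints in (0, eps)
    sig = eps                                   # s_eps = eps on (0, eps)
    R = w - w * (sig + w) / np.sqrt(2 * sig**2 + 2 * sig * w + w**2)
    norm_R = (np.sum(np.abs(R) ** r) * (eps / n)) ** (1.0 / r)   # midpoint rule
    norm_s = eps ** ((p + 1.0) / p)             # exact value of ||s_eps||_{L^p}
    return norm_R / norm_s

ratios = [residual_ratio(e) for e in (1e-1, 1e-2, 1e-3)]
```

The three ratios coincide up to quadrature error, so ‖R(s_ε)‖_{L²} is not o(‖s_ε‖_{L²}), in accordance with (3.36).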

Next, we show that, at least in the case q_0 ≤ (1+α)r, the order of our semismoothness result is sharp. By showing this for varying values of γ, we also observe that decreasing values of γ reduce the maximum order of semismoothness exactly as stated in Theorem 3.45. Hence, our result does not overestimate the role of γ.

Example 3.53 (Order of semismoothness and its dependence on γ). We consider the following NCP, which generalizes the one in Example 3.52: Let 1 < p ≤ ∞ be arbitrary, set Ω = (0, 1), and choose

F(u)(ω) = u(ω) + ω^θ, θ > 0.

Obviously, u ≡ 0 is the unique solution of the NCP. Choosing p′ = p, φ = φ_FB, and α = 1, Assumption 3.30 is satisfied for all r ∈ [1, p).

From G(u)(ω) = (0, ω^θ) it follows that γ = 1/θ is the maximum value for which condition (3.32), and thus the equivalent condition (3.29), is satisfied.

With the residual R(s) as defined in (3.35) we obtain

|R(s)(ω)| = ω^θ − ω^θ (s(ω) + ω^θ)/√(2 s(ω)² + 2 s(ω) ω^θ + ω^{2θ}).

For ε ∈ (0, 1) and s_ε def= ε^θ 1_{(0,ε)} we have


‖s_ε‖_{L^p} = ε^{(pθ+1)/p} (1 < p < ∞), ‖s_ε‖_{L^∞} = ε^θ.

Further, for 0 < ω < ε,

|R(s_ε)(ω)| ≥ ω^θ (1 − sup_{0<t<1} (1+t)/√(2+2t+t²)) = ((5 − 2√5)/5) ω^θ ≥ ω^θ/10.

Hence, for all r ∈ [1, p),

‖R(s_ε)‖_{L^r} ≥ (1/10)(∫_0^ε ω^{rθ} dω)^{1/r} = ε^{(rθ+1)/r}/(10(rθ+1)^{1/r}) ≥ ‖s_ε‖_{L^p}^{(prθ+p)/(prθ+r)}/(10(rθ+1)^{1/r}) = ‖s_ε‖_{L^p}^{1 + γν/(1+γ/q_0)}/(10(rθ+1)^{1/r})

with q_0 = p′ = p, γ = 1/θ, and ν as in (3.30). This shows that the value of β given in Theorem 3.45 is sharp for all values of θ (and thus γ), at least as long as q_0 ≤ (1+α)r, which in the current setting can be written as p ≤ (1+α)r.

We think that in the case q_0 > (1+α)r our value of β could still be slightly improved by splitting Ω into more than the two parts Ω_{ε_β} and Ω^c_{ε_β}, by choosing different values ε_k for ε that correspond to different powers of ‖v‖_{Π_i L^{q_i}}. In order to keep the analysis as clear as possible, we do not pursue this idea any further in the current work.

3.3.5 Proof of the Main Theorems

We can simplify the analysis by exploiting the following fact.

Lemma 3.54. Let Assumption 3.27 hold and suppose that the operator

Λ : u ∈ ∏_i L^{q_i}(Ω) ↦ ψ(u) ∈ L^r(Ω)

is semismooth. Then the operator Ψ : Y → L^r(Ω) defined in (3.20) is also semismooth. Further, if Assumption 3.30 holds and Λ is α-order semismooth, then Ψ is α-order semismooth.

Proof. We first observe that, given any M ∈ ∂°Ψ(y+s), there is M_Λ ∈ ∂°Λ(G(y+s)) such that M = M_Λ G′(y+s). In fact, there exists a measurable selection d ∈ L^∞(Ω)^m of ∂ψ(G(y+s)) such that M = ∑_i d_i · G′_i(y+s), and obviously M_Λ : v ↦ ∑_i d_i v_i yields an element of ∂°Λ(G(y+s)) with the desired property. A more general chain rule will be established in Theorem 3.64.

Setting g = G(y), v = G(y+s) − G(y), and w = G(y+s), we have

sup_{M∈∂°Ψ(y+s)} ‖Ψ(y+s) − Ψ(y) − Ms‖_{L^r} ≤ sup_{M_Λ∈∂°Λ(w)} ‖Λ(w) − Λ(g) − M_Λ G′(y+s)s‖_{L^r}


≤ sup_{M_Λ∈∂°Λ(w)} ‖Λ(w) − Λ(g) − M_Λ v‖_{L^r} + sup_{M_Λ∈∂°Λ(w)} ‖M_Λ (G(y+s) − G(y) − G′(y+s)s)‖_{L^r} def= ρ_Λ + ρ_MG.

By the local Lipschitz continuity of G and the semismoothness of Λ, we obtain

ρ_Λ = o(‖v‖_{Π_i L^{q_i}}) = o(‖s‖_Y) as s → 0 in Y.

Further, since d ∈ L_ψ B^m_{L^∞} by Lemma 3.41, we have by Assumption 3.27 (a)

ρ_MG ≤ L_ψ ∑_i ‖G_i(y+s) − G_i(y) − G′_i(y+s)s‖_{L^r} ≤ L_ψ ∑_i c_{r,r_i}(Ω) ‖G_i(y+s) − G_i(y) − G′_i(y+s)s‖_{L^{r_i}} = o(‖s‖_Y) as s → 0 in Y.

This proves the first result. Now let Assumption 3.30 hold and Λ be α-order semismooth. Then ρ_Λ and ρ_MG are both of the order O(‖s‖_Y^{1+α}), which implies the second assertion.   □

For the proof of Theorems 3.44 and 3.45 we need, as a technical intermediate result, the Borel measurability of the function

ρ : R^m × R^m → R, ρ(x, h) = max_{z^T∈∂ψ(x+h)} |ψ(x+h) − ψ(x) − z^T h|.   (3.37)

We prove this by showing that ρ is upper semicontinuous. Readers familiar with this type of results might want to skip the proof of Lemma 3.55.

Recall that a function f : R^l → R is upper semicontinuous at x if

lim sup_{x′→x} f(x′) ≤ f(x).

Equivalently, f is upper semicontinuous if and only if {x : f(x) ≥ a} is closed for all a ∈ R.

Lemma 3.55. Let f : (x, z) ∈ Rl × Rm 7→ R beuppersemicontinuous.Moreover,let themultifunctionΓ : Rl ⇒ Rm beuppersemicontinuousandcompact-valued.Thenthefunction

g : Rl → R, g(x) = maxz∈Γ (x)

f(x, z),

is well-definedanduppersemicontinuous.

Proof. For x ∈ R^l, let (z_k) ⊂ Γ(x) be such that

  lim_{k→∞} f(x, z_k) = sup_{z ∈ Γ(x)} f(x, z).

Since Γ(x) is compact, we may assume that z_k → z*(x) ∈ Γ(x). Now, by upper semicontinuity of f,

  f(x, z*(x)) ≥ lim sup_{k→∞} f(x, z_k) = sup_{z ∈ Γ(x)} f(x, z) ≥ f(x, z*(x)).

Thus, g is well-defined and there exists z* : R^l → R^m with g(x) = f(x, z*(x)).

We now prove the upper semicontinuity of g at x. Let (x_k) ⊂ R^l tend to x in such a way that

  lim_{k→∞} g(x_k) = lim sup_{x′ → x} g(x′),

and set z_k = z*(x_k) ∈ Γ(x_k). By the upper semicontinuity of Γ there exists (z̄_k) ⊂ Γ(x) with (z̄_k − z_k) → 0 as k → ∞.

Since Γ(x) is compact, a subsequence can be selected such that the sequence (z̄_k), and thus (z_k), converges to some z̄ ∈ Γ(x). Now, using that f is upper semicontinuous and z̄ ∈ Γ(x),

  lim sup_{x′ → x} g(x′) = lim_{k→∞} g(x_k) = lim_{k→∞} f(x_k, z_k) = lim sup_{k→∞} f(x_k, z_k) ≤ f(x, z̄) ≤ g(x).

Therefore, g is upper semicontinuous at x. □

Lemma 3.56. Let ψ : R^m → R be locally Lipschitz continuous. Then the function ρ defined in (3.37) is well-defined and upper semicontinuous.

Proof. Since ∂ψ is upper semicontinuous and compact-valued, the multifunction

  (x, h) ∈ R^m × R^m ↦ ∂ψ(x + h)

is upper semicontinuous and compact-valued as well. Further, the mapping

  (x, h, z) ↦ |ψ(x + h) − ψ(x) − z^T h|

is continuous, and we may apply Lemma 3.55, which yields the assertion. □

Proof of Theorem 3.44

By Lemma 3.54, it suffices to prove the semismoothness of the operator

  Λ : u ∈ Π_i L^{q_i}(Ω) ↦ ψ(u) ∈ L^r(Ω).  (3.38)

In Lemma 3.56 we showed that the function

  ρ : R^m × R^m → R,  ρ(x, h) = max_{z^T ∈ ∂ψ(x+h)} |ψ(x + h) − ψ(x) − z^T h|,

is upper semicontinuous and thus Borel measurable. Hence, for u, v ∈ Π_i L^{r_i}(Ω), the function ρ(u, v) is measurable. We define the measurable function

  a = ρ(u, v) / (‖v‖_1 + 1_{{v=0}}).

Since ρ(u(ω), v(ω)) = 0 whenever v(ω) = 0, we obtain ρ(u, v) = a ‖v‖_1. Furthermore,

  a(ω) = ρ(u(ω), v(ω)) / (‖v(ω)‖_1 + 1_{{v=0}}(ω)) = o(‖v(ω)‖_1) / (‖v(ω)‖_1 + 1_{{v=0}}(ω)) → 0  as v(ω) → 0.  (3.39)

Due to the Lipschitz continuity of ψ, we have

  ρ(x, h) ≤ 2 L_ψ ‖h‖_1,  (3.40)

which implies a ∈ 2L_ψ B_{L^∞}.

Now let (v_k) tend to zero in the space Π_i L^{q_i}(Ω) and set a_k = a|_{v=v_k}. Then every subsequence of (v_k) contains itself a subsequence (v_{k′}) such that v_{k′} → 0 a.e. on Ω. By (3.39), this implies a_{k′} → 0 a.e. on Ω. Since (a_{k′}) is bounded in L^∞(Ω), we conclude

  lim_{k′→∞} ‖a_{k′}‖_{L^t} = 0  for all t ∈ [1, ∞).

Hence, in L^t(Ω), 1 ≤ t < ∞, zero is an accumulation point of every subsequence of (a_k). This proves a_k → 0 in all spaces L^t(Ω), 1 ≤ t < ∞.

Since the sequence (v_k), v_k → 0, was arbitrary, we have thus proven that for all 1 ≤ t < ∞ there holds

  ‖a‖_{L^t} → 0  as ‖v‖_{Π_i L^{q_i}} → 0.

Now we can use Hölder's inequality to obtain

  ‖ρ(u, v)‖_{L^r(Ω)} ≤ Σ_i ‖a v_i‖_{L^r} ≤ Σ_i ‖a‖_{L^{p_i}} ‖v_i‖_{L^{q_i}}
    ≤ (max_{1≤i≤m} ‖a‖_{L^{p_i}}) ‖v‖_{Π_i L^{q_i}} = o(‖v‖_{Π_i L^{q_i}})  as ‖v‖_{Π_i L^{q_i}} → 0,  (3.41)

where p_i = q_i r/(q_i − r) if q_i < ∞ and p_i = r if q_i = ∞. Note that here we exploited the fact that r < q_i. This proves the semismoothness of Λ. □
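The pointwise residual ρ from (3.37) is easy to probe numerically in finite dimensions. The following sketch is purely illustrative: it uses the Fischer–Burmeister NCP-function φ(a, b) = a + b − √(a² + b²), a standard semismooth (indeed 1-order semismooth) function that is an assumption of this example rather than taken from the text, together with an arbitrarily chosen base point and direction.

```python
import numpy as np

def phi_fb(x):
    # Fischer-Burmeister NCP-function phi(a, b) = a + b - sqrt(a^2 + b^2)
    return x[0] + x[1] - np.hypot(x[0], x[1])

def grad_fb(x):
    # At x != 0, phi is differentiable; this gradient is the unique element of the
    # Clarke differential there.
    n = np.hypot(x[0], x[1])
    return np.array([1.0 - x[0] / n, 1.0 - x[1] / n])

x = np.array([1.0, 2.0])    # a point where phi is smooth
h0 = np.array([0.7, -0.4])  # a fixed direction
for k in range(1, 7):
    h = 10.0 ** (-k) * h0
    # semismoothness residual |phi(x+h) - phi(x) - grad phi(x+h)^T h|, cf. (3.37)
    res = abs(phi_fb(x + h) - phi_fb(x) - grad_fb(x + h) @ h)
    print(k, res / np.linalg.norm(h, 1) ** 2)
```

The printed ratios res/‖h‖₁² stay bounded, consistent with 1-order semismoothness; at a kink such as x = 0 the gradient would have to be replaced by an arbitrary element of ∂φ(x + h).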

Proof of Theorem 3.45

Also here, by Lemma 3.54, it suffices to prove the β-order semismoothness of the operator Λ defined in (3.38).

We now suppose that the Assumptions 3.30 and, in addition, (3.29) hold. First, note that for fixed ε > 0 the function

  (x, h) ∈ R^m × R^m ↦ ρ(x, h) − ε^{−α} ‖h‖_1^{1+α}

is upper semicontinuous and that the multifunction

  x ∈ R^m ↦ ε B^m_1

is compact-valued and upper semicontinuous. Hence, by Lemma 3.55, the function

  x ∈ R^m ↦ max_{‖h‖_1 ≤ ε} (ρ(x, h) − ε^{−α} ‖h‖_1^{1+α})

is upper semicontinuous and therefore Borel measurable. This proves the measurability of the set Ω_ε appearing in (3.29). For ε > 0 and 0 < β ≤ α we define the set

  Ω_{βε} = {ω : ρ(u(ω), v(ω)) > ε^{−β} ‖v(ω)‖_1^{1+β}},

and observe that Ω_{βε} ⊂ Ω_ε ∪ {‖v‖_1 > ε} =: Ω_ε ∪ Ω′_ε.

In fact, let ω ∈ Ω_{βε} be arbitrary. The nontrivial case is ‖v(ω)‖_1 ≤ ε. We then obtain for h = v(ω)

  ρ(u(ω), h) > ε^{−β} ‖h‖_1^{1+β} = ε^{−α} ε^{α−β} ‖h‖_1^{1+β} ≥ ε^{−α} ‖h‖_1^{α−β} ‖h‖_1^{1+β} = ε^{−α} ‖h‖_1^{1+α},

and thus, since ‖h‖_1 ≤ ε,

  max_{‖h‖_1 ≤ ε} (ρ(u(ω), h) − ε^{−α} ‖h‖_1^{1+α}) > 0,

showing that ω ∈ Ω_ε.

In the case q_0 = min_{1≤i≤m} q_i < ∞ we derive the estimate

  μ(Ω′_ε) = μ({‖v‖_1 > ε}) ≤ ‖ε^{−1}‖v‖_1‖^{q_0}_{L^{q_0}(Ω′_ε)}
    ≤ ε^{−q_0} (max_i c_{q_0,q_i}(Ω′_ε))^{q_0} ‖v‖^{q_0}_{Π_i L^{q_i}} = ε^{−q_0} O(‖v‖^{q_0}_{Π_i L^{q_i}}).

If we choose ε = ‖v‖^λ_{Π_i L^{q_i}}, 0 < λ < 1, then

  μ(Ω_{βε}) ≤ μ(Ω_ε) + μ(Ω′_ε) = O(‖v‖^{γλ}_{Π_i L^{q_i}}) + O(‖v‖^{(1−λ)q_0}_{Π_i L^{q_i}}).

This estimate is also true in the case q_0 = ∞ since then μ(Ω′_ε) = 0 as soon as ‖v‖_{Π_i L^{q_i}} < 1. This can be seen by noting that then for a.a. ω ∈ Ω there holds

  ‖v(ω)‖_1 ≤ ‖‖v‖_1‖_{L^∞} ≤ ‖v‖_{Π_i L^{q_i}} ≤ ‖v‖^λ_{Π_i L^{q_i}} = ε.

Introducing ν = (q_0 − r)/(q_0 r) if q_0 < ∞ and ν = 1/r otherwise, for all 0 < β ≤ α we obtain, using (3.40) and Lemma A.4,

  ‖ρ(u, v)‖_{L^r(Ω_{βε})} ≤ ‖2L_ψ‖v‖_1‖_{L^r(Ω_{βε})} ≤ 2L_ψ c_{r,q_0}(Ω_{βε}) ‖v‖_{L^{q_0}(Ω_{βε})^m}
    ≤ 2L_ψ μ(Ω_{βε})^ν ‖v‖_{L^{q_0}(Ω_{βε})^m}
    = O(‖v‖^{1+γλν}_{Π_i L^{q_i}}) + O(‖v‖^{1+(1−λ)νq_0}_{Π_i L^{q_i}}).  (3.42)

Again, we have used here the fact that r < q_0 ≤ q_i, which allowed us to take advantage of the smallness of the set Ω_{βε}.

Finally, on Ω^c_{βε}, with (1 + β)r ≤ q_0 and 0 < β ≤ α, our choice ε = ‖v‖^λ_{Π_i L^{q_i}} yields

  ‖ρ(u, v)‖_{L^r(Ω^c_{βε})} ≤ ‖ε^{−β}‖v‖_1^{1+β}‖_{L^r(Ω^c_{βε})}
    ≤ c_{r, q_0/(1+β)}(Ω^c_{βε}) ‖v‖^{−βλ}_{Π_i L^{q_i}} ‖v‖^{1+β}_{L^{q_0}(Ω^c_{βε})^m}
    = O(‖v‖^{1+β(1−λ)}_{Π_i L^{q_i}}).

Therefore,

  ‖ρ(u, v)‖_{L^r} = O(‖v‖^{1+γλν}_{Π_i L^{q_i}}) + O(‖v‖^{1+(1−λ)νq_0}_{Π_i L^{q_i}}) + O(‖v‖^{1+β(1−λ)}_{Π_i L^{q_i}}).

We now choose 0 < λ < 1 and β > 0 with β ≤ α, (1 + β)r ≤ q_0 in such a way that the order of the right hand side is maximized. In the case (1 + α)r ≥ q_0 the minimum of all three exponents is maximized for the choice β = (q_0 − r)/r = νq_0 and λ = q_0/(γ + q_0). Then all three exponents are equal to 1 + γνq_0/(γ + q_0) and thus

  ‖ρ(u, v)‖_{L^r} = O(‖v‖^{1 + γνq_0/(γ+q_0)}_{Π_i L^{q_i}}).  (3.43)

If, on the other hand, (1 + α)r < q_0, then the third exponent is smaller than the second one for all 0 < λ < 1 and 0 < β ≤ α. Further, it is not difficult to see that under these constraints the first and third exponents become maximal for β = α and λ = α/(α + γν), where they attain the value 1 + αγν/(α + γν). Hence,

  ‖ρ(u, v)‖_{L^r} = O(‖v‖^{1 + αγν/(α+γν)}_{Π_i L^{q_i}}).  (3.44)

Combining (3.43) and (3.44) proves the β-order semismoothness of Λ with β as in (3.30). □

3.3.6 Semismooth Newton Methods

The developed semismoothness results can be used to derive superlinearly convergent Newton-type methods for the solution of the nonsmooth operator equation

  Ψ(y) = 0  (3.45)

with Ψ as defined in (3.20). In fact, any of the three variants of Newton methods that we developed and analyzed in section 3.2.3 can be applied. We just have to choose Z = L^r(Ω), f ≡ Ψ, and ∂*f ≡ ∂Ψ. With these settings, Algorithms 3.9, 3.13, and 3.17 are applicable to (3.45), and their convergence properties are stated in Theorems 3.12, 3.15, and 3.19, respectively. The semismoothness requirements on Ψ are ensured by Theorems 3.44 and 3.45 under Assumptions 3.27 and 3.30, respectively.

For illustration, we restate the most general of these methods, Algorithm 3.17, when applied to reformulations (3.21) of the NCP (3.5). We also recall the local convergence properties of the resulting method. The results hold equally well for bilaterally constrained problems; the only difference is that the reformulation then requires an MCP-function instead of an NCP-function.

For the reformulation of the NCP we work with an NCP-function φ which, together with the operator F, satisfies Assumption 3.32. Further, we assume that we are given an admissible set

  K = {u ∈ L^p(Ω) : a^K ≤ u ≤ b^K on Ω},

which contains the solution ū ∈ L^p(Ω), and in which all iterates generated by the algorithm should stay. The requirements on the bounds a^K and b^K are: There exist measurable sets Ω^K_a, Ω^K_b ⊂ Ω such that

  a^K = −∞ on Ω \ Ω^K_a,  b^K = +∞ on Ω \ Ω^K_b,  a^K|_{Ω^K_a} ∈ L^p(Ω^K_a),  b^K|_{Ω^K_b} ∈ L^p(Ω^K_b).  (3.46)

Natural choices for K are K = L^p(Ω) or K = B = {u ∈ L^p(Ω) : u ≥ 0}. We define the projection P_K : L^p(Ω) → K,

  P_K(u)(ω) = P_{[a^K(ω), b^K(ω)]}(u(ω)) = max{a^K(ω), min{u(ω), b^K(ω)}},

which is easily seen to assign to each u ∈ L^p(Ω) a function P_K(u) ∈ K that is nearest to u in L^p (for p < ∞, P_K(u) is the unique metric projection). Since |P_K(u) − P_K(v)| ≤ |u − v| pointwise on Ω, we see that

  ‖P_K(u) − P_K(v)‖_{L^p} ≤ ‖u − v‖_{L^p}  for all u, v ∈ L^p(Ω).

In particular, since ū ∈ K, we see that

  ‖P_K(u) − ū‖_{L^p} ≤ ‖u − ū‖_{L^p}  for all u ∈ L^p(Ω).

Therefore, K and P_K satisfy the Assumptions 3.16.

In section 3.2.3 we developed Newton-like methods that are formulated in a two-norm framework by incorporating an additional space Y_0 with Y ↪ Y_0. However, so far a rigorous justification for the necessity of two-norm techniques is still missing. We are now in a position to give this justification.

In the current setting, we have Y = L^p(Ω), and, as we will see, it is appropriate to choose Y_0 = L^r(Ω). Algorithm 3.17 then becomes:

Algorithm 3.57 (Projected Inexact Newton's Method for NCP).

0. Choose an initial point u_0 ∈ K and set k = 0.

1. Choose an invertible operator B_k ∈ L(L^r(Ω), L^r(Ω)), compute s_k ∈ L^r(Ω) from

     B_k s_k = −Φ(u_k),

   and set u⁰_{k+1} = u_k + s_k.

2. Perform a smoothing step: u⁰_{k+1} ∈ L^r(Ω) ↦ u¹_{k+1} = S_k(u⁰_{k+1}) ∈ L^p(Ω).

3. Project onto K: u_{k+1} = P_K(u¹_{k+1}).

4. If u_{k+1} = u_k, then STOP with result u* = u_{k+1}.

5. Increment k by one and go to step 1.
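A finite-dimensional toy version of this algorithm is easy to sketch. In the following illustration, all problem data are assumptions of the sketch: a discretized NCP u ≥ 0, F(u) ≥ 0, uᵀF(u) = 0 with F(u) = λu + Cu + q and a dense "smoothing" matrix C; the NCP-function is φ(a, b) = min{a, b}; and the smoothing step S_k is taken as the identity, which is admissible only because all norms are equivalent in finite dimensions.

```python
import numpy as np

n = 50
h = 1.0 / (n + 1)
lam = 1.0
# C: discrete inverse Laplacian -- a compact smoothing model operator (assumption)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2
C = np.linalg.inv(A)
q = np.linspace(-1.0, 1.0, n)   # makes the bound u >= 0 active on part of the domain

def F(u):
    return lam * u + C @ u + q

u = np.ones(n)                  # step 0: initial point in K = {u >= 0}
for k in range(50):
    Fu = F(u)
    Phi = np.minimum(u, Fu)     # reformulation with phi(a, b) = min{a, b}
    if np.linalg.norm(Phi, np.inf) < 1e-12:
        break                   # step 4: solution found
    act = u <= Fu               # components where the min picks u
    # step 1: B_k is an element of the generalized Jacobian of Phi
    B = np.where(act[:, None], np.eye(n), lam * np.eye(n) + C)
    s = np.linalg.solve(B, -Phi)
    # steps 2+3: identity smoothing step, then projection onto K
    u = np.maximum(u + s, 0.0)
print(k, np.linalg.norm(np.minimum(u, F(u)), np.inf))
```

For this affine F the method reduces to a primal-dual active set iteration and typically terminates after a handful of steps.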

To discuss the role of the two-norm technique and the smoothing step, it is convenient to consider the special case of the semismooth Newton method with smoothing step as described in Algorithm 3.9, which is obtained by choosing K = L^p(Ω) and B_k = M_k ∈ ∂Φ(u_k).

For well-definedness of the method, it is reasonable to require that the Newton equation M_k s_k = −Φ(u_k) in step 1 always possesses a unique solution. Further, in the convergence analysis an estimate is needed that bounds the norm of s_k in terms of ‖Φ(u_k)‖_{L^r}. It turns out that the L^p-norm is too strong for this purpose. In fact, recall that every operator M ∈ ∂Φ(u) assumes the form

  M = d_1 · I + d_2 · F′(u),

with d ∈ L^∞(Ω)², d(ω) ∈ ∂φ(u(ω), F(u)(ω)). Now define

  Ω_1 = {ω ∈ Ω : d_2(ω) = 0}.

Then for all ω ∈ Ω_1 there holds

  (Mv)(ω) = d_1(ω) v(ω).

This shows that Mv is in general not more regular (in the L^q-sense) than v, and vice versa. Therefore, it is not appropriate to assume that M ∈ ∂Φ(u) is continuously invertible in L(L^p, L^r), as the norm on L^p is stronger than that on L^r. However, it is reasonable to assume that M is an L^r-homeomorphism. This leads to regularity conditions of the form stated in Assumption 3.11 (i) or in Assumption 3.20 with Y_0 = L^r(Ω).

As a consequence, in the convergence analysis we only have available the uniform boundedness of ‖M_k^{−1}‖_{Z,Y_0}, and this makes a smoothing step necessary, as can be seen from the following chain of implications that we used in the proof of Theorem 3.12 (and its generalizations):

  M_k s_k = −Φ(u_k),  Φ(ū) = 0,  v_k = u_k − ū,  v⁰_k = u⁰_k − ū
  ⟹ M_k v⁰_{k+1} = −(Φ(ū + v_k) − Φ(ū) − M_k v_k)
  ⟹ ‖M_k v⁰_{k+1}‖_{L^r} = o(‖v_k‖_{L^p})  (semismoothness)
  ⟹ ‖v⁰_{k+1}‖_{L^r} ≤ ‖M_k^{−1}‖_{L^r,L^r} ‖M_k v⁰_{k+1}‖_{L^r} = o(‖v_k‖_{L^p})  (regularity)
  ⟹ ‖v_{k+1}‖_{L^p} = ‖S_k(u⁰_{k+1}) − ū‖_{L^p} = O(‖v⁰_{k+1}‖_{L^r}) = o(‖v_k‖_{L^p})  (smoothing step).

Therefore, we see that the two-norm framework of our abstract analysis in section 3.2.3 is fully justified.

Adapted to the current setting, the Assumptions 3.14 and 3.11 required to apply Theorem 3.19 now read as follows:

Assumption 3.58 (Dennis–Moré condition for B_k).

(i) There exist operators M_k ∈ ∂Φ(u_k + s_k) such that

  ‖(B_k − M_k)s_k‖_{L^r} = o(‖s_k‖_{L^r})  as ‖s_k‖_{L^p} → 0,  (3.47)

where s_k ∈ L^r(Ω) is the step computed in step 1.

(ii) Condition (i) holds with (3.47) replaced by

  ‖(B_k − M_k)s_k‖_{L^r} = O(‖s_k‖^{1+α}_{L^r})  as ‖s_k‖_{L^p} → 0.

Assumption 3.59.

(i) (Regularity condition) One of the following conditions holds:

(a) The operators M_k map L^r(Ω) continuously into itself with bounded inverses, and there exists a constant C_{M^{−1}} > 0 such that

  ‖M_k^{−1}‖_{L^r,L^r} ≤ C_{M^{−1}}.

(b) There exist constants η > 0 and C_{M^{−1}} > 0 such that, for all u ∈ (ū + ηB_{L^p}) ∩ K, every M ∈ ∂Φ(u) is an invertible element of L(L^r, L^r) with ‖M^{−1}‖_{L^r,L^r} ≤ C_{M^{−1}}.

(ii) (Smoothing condition) The smoothing steps in step 2 satisfy

  ‖S_k(u⁰_{k+1}) − ū‖_{L^p} ≤ C_S ‖u⁰_{k+1} − ū‖_{L^r}

for all k, where ū ∈ K solves (3.1).

Remark 3.60. In section 4.3 we establish sufficient conditions for regularity that are widely applicable and easy to apply.

Remark 3.61. In section 4.1 we discuss how smoothing steps can be computed. Further, in section 4.2 we propose a choice for φ which allows us to get rid of the smoothing step.

Since Φ is semismooth by Theorem 3.44 and locally Lipschitz continuous by Proposition 3.31, we can apply Theorem 3.19 to the current situation and obtain the following local convergence result:

Theorem 3.62. Denote by ū ∈ K a solution of (3.1). Further, let the Assumptions 3.32, 3.58 (i), and 3.59 hold. Then:

(i) There exists δ > 0 such that, for all u_0 ∈ (ū + δB_{L^p}) ∩ K, Algorithm 3.57 either terminates with u_k = ū or generates a sequence (u_k) ⊂ K that converges q-superlinearly to ū in L^p(Ω).

(ii) If in (i) the mapping Φ is α-order semismooth at ū, 0 < α ≤ 1, and if Assumption 3.58 (ii) is satisfied, then the q-order of convergence is at least 1 + α.

3.3.7 Semismooth Composite Operators and Chain Rules

This section considers the semismoothness of composite operators. There is a certain overlap with the result of the abstract Proposition 3.7, but we think it is helpful to study the properties of the generalized differential ∂Ψ in some more detail.

We consider the scenario where G = H_1 ∘ H_2 is a composition of the operators

  H_1 : X → Π_i L^{r_i}(Ω),  H_2 : Y → X,

with X a Banach space, and where ψ = ψ_1 ∘ ψ_2 is a composition of the functions

  ψ_1 : R^l → R,  ψ_2 : R^m → R^l.

We impose assumptions on ψ_1, ψ_2, H_1, and H_2 that ensure that G and ψ satisfy Assumption 3.27. Here is one way to do this:

Assumption 3.63. There are 1 ≤ r ≤ r_i < q_i ≤ ∞, 1 ≤ i ≤ m, such that:

(a) The operators H_1 : X → Π_i L^{r_i}(Ω) and H_2 : Y → X are continuously Fréchet differentiable.

(b) The operator H_1 maps X locally Lipschitz continuously into Π_i L^{q_i}(Ω).

(c) The functions ψ_1 and ψ_2 are Lipschitz continuous.

(d) ψ_1 and ψ_2 are semismooth.

It is straightforward to strengthen these assumptions such that they imply the Assumptions 3.30. For brevity, we will not discuss the extension of the next theorem to semismoothness of order β, which is easily established by slight modifications of the assumptions and the proofs.

Theorem 3.64. Let the Assumptions 3.63 hold and let G = H_1 ∘ H_2 and ψ = ψ_1 ∘ ψ_2. Then:

(i) G and ψ satisfy the Assumptions 3.27.

(ii) Ψ as defined in (3.20) is semismooth.

(iii) The operator Ψ_1 : z ∈ X ↦ ψ(H_1(z)) ∈ L^r(Ω) is semismooth and the following chain rule holds:

  ∂Ψ(y) = ∂Ψ_1(H_2(y)) H′_2(y) = {M_1 H′_2(y) : M_1 ∈ ∂Ψ_1(H_2(y))}.

(iv) If l = 1 and ψ_1 is strictly differentiable [32, p. 30], then the operator Ψ_2 : y ∈ Y ↦ ψ_2(G(y)) ∈ L^r(Ω) is semismooth and the following chain rule holds:

  ∂Ψ(y) = ψ′_1(Ψ_2(y)) ∂Ψ_2(y) = {ψ′_1(Ψ_2(y)) · M_2 : M_2 ∈ ∂Ψ_2(y)}.

Proof. (i): 3.63 (a) implies 3.27 (a); 3.27 (b) follows from 3.63 (a), (b); 3.63 (c) implies 3.27 (c); and 3.27 (d) holds by 3.63 (d), since the composition of semismooth functions is semismooth.

(ii): By (i), we can apply Theorem 3.44.

(iii): The Assumptions 3.63 imply the Assumptions 3.27 with H_1 and X instead of G and Y. Hence, Ψ_1 is semismooth by Theorem 3.44.

For the proof of the "⊂" part of the chain rule, let M ∈ ∂Ψ(y) be arbitrary. By definition, there exists a measurable selection d of ∂ψ(G(y)) such that

  M = Σ_i d_i · G′_i(y).

Now, since G′_i(y) = H′_{1i}(H_2(y)) H′_2(y),

  M = Σ_i d_i · H′_{1i}(H_2(y)) H′_2(y) = M_1 H′_2(y),  where  M_1 = Σ_i d_i · H′_{1i}(H_2(y)).  (3.48)

Obviously, we have M_1 ∈ ∂Ψ_1(H_2(y)).

To prove the reverse inclusion, note that any M_1 ∈ ∂Ψ_1(H_2(y)) assumes the form (3.48) with an appropriate measurable selection d of ∂ψ(G(y)). Then

  M_1 H′_2(y) = Σ_i d_i · (H′_{1i}(H_2(y)) H′_2(y)) = Σ_i d_i · G′_i(y),

which shows M_1 H′_2(y) ∈ ∂Ψ(y).

(iv): Certainly, G and ψ_2 satisfy the Assumptions 3.27 (with ψ replaced by ψ_2). Hence, Theorem 3.44 yields the semismoothness of Ψ_2. We proceed by noting that a.e. on Ω there holds

  ψ′_1(Ψ_2(y)(ω)) ∂ψ_2(G(y)(ω)) = ∂ψ(G(y)(ω)),  (3.49)

where we have applied the chain rule for generalized gradients [32, Thm. 2.3.9] and the identity ∂ψ_1 = {ψ′_1}, see [32, Prop. 2.2.4].

We first prove the "⊃" direction of the chain rule. Let M_2 ∈ ∂Ψ_2(y) be arbitrary. It assumes the form

  M_2 = Σ_i d_i · G′_i(y),

where d ∈ L^∞(Ω)^m is a measurable selection of ∂ψ_2(G(y)). Now for any operator M contained in the right hand side of the assertion we have, with d̄ := ψ′_1(Ψ_2(y)) d,

  M = ψ′_1(Ψ_2(y)) · M_2 = Σ_i d̄_i · G′_i(y).

Obviously, d̄ ∈ L^∞(Ω)^m and, by (3.49), d̄ is a measurable selection of ∂ψ(G(y)).

Hence, M ∈ ∂Ψ(y).

Conversely, to prove "⊂", let M ∈ ∂Ψ(y) be arbitrary and denote by d̄ ∈ L^∞(Ω)^m the corresponding measurable selection of ∂ψ(G(y)). Now let d̂ ∈ L^∞(Ω)^m be a measurable selection of ∂ψ_2(G(y)) and define d ∈ L^∞(Ω)^m by

  d(ω) = d̂(ω)  on Ω_0 = {ω : ψ′_1(Ψ_2(y)(ω)) = 0},
  d(ω) = d̄(ω) / ψ′_1(Ψ_2(y)(ω))  on Ω \ Ω_0.

Then d is measurable and d̄ = ψ′_1(Ψ_2(y)) d. Further, d(ω) = d̂(ω) ∈ ∂ψ_2(G(y)(ω)) on Ω_0 and, using (3.49),

  d(ω) = d̄(ω) / ψ′_1(Ψ_2(y)(ω)) ∈ ψ′_1(Ψ_2(y)(ω)) ∂ψ_2(G(y)(ω)) / ψ′_1(Ψ_2(y)(ω)) = ∂ψ_2(G(y)(ω))  on Ω \ Ω_0.

Thus, d is a measurable selection of ∂ψ_2(G(y)) and, due to the Lipschitz continuity of ψ_2, also d ∈ L^∞(Ω)^m. Therefore,

  M_2 := Σ_i d_i · G′_i(y) ∈ ∂Ψ_2(y)

and thus M = ψ′_1(Ψ_2(y)) · M_2 ∈ ψ′_1(Ψ_2(y)) · ∂Ψ_2(y) as asserted. □

3.3.8 Further Properties of the Generalized Differential

We now establish that our generalized differential is convex-valued, weak-compact-valued, and weakly graph closed. These properties can provide a basis for future research on the connections between ∂Ψ and other generalized differentials, in particular the Thibault generalized differential [135] and the Ioffe–Ralph generalized differential [84, 123]. As weak topology on L(Y, L^r) we use the weak operator topology, which is defined by the seminorms M ↦ |⟨w, Mv⟩_Ω|, v ∈ Y, w ∈ L^{r′}(Ω), the dual space of L^r(Ω).

The following result will be of importance.

Lemma 3.65. Under Assumption 3.27, the set K(y) defined in (3.26) is convex and weak* sequentially compact in L^∞(Ω)^m for all y ∈ Y.

Proof. From Lemma 3.41 we know that K(y) ⊂ L_ψ B^m_{L^∞} is nonempty and bounded. Further, the convexity of ∂ψ(x) implies the convexity of K(y). Now let s_k ∈ K(y) tend to s in L²(Ω)^m. Then for a subsequence there holds s_{k′}(ω) → s(ω) for a.a. ω ∈ Ω. Since ∂ψ(u(ω)) is compact, this implies that for a.a. ω ∈ Ω there holds s(ω) ∈ ∂ψ(u(ω)), and thus s ∈ K(y). Hence, K(y) is a bounded, closed, and convex subset of L²(Ω)^m and therefore weakly sequentially compact in L²(Ω)^m. Therefore, K(y) is also weak* sequentially closed in L^∞(Ω)^m, for, if (s_k) ⊂ K(y) converges weakly* to s in L^∞(Ω)^m, then ⟨w, s_k − s⟩_Ω → 0 for all w ∈ L¹(Ω)^m ⊃ L²(Ω)^m, showing that s_k → s weakly in L²(Ω)^m. Thus, K(y) is weak* sequentially closed and bounded in L^∞(Ω)^m. Since L¹(Ω)^m is separable, this yields that K(y) is weak* sequentially compact. □


Convexity and Weak Compactness

As further useful properties of ∂Ψ we establish the convexity and weak compactness of its images:

Theorem 3.66. Under the Assumptions 3.27, the generalized differential ∂Ψ(y) is nonempty, convex, and weakly sequentially compact for all y ∈ Y. If Y is separable, then ∂Ψ(y) is also weakly compact for all y ∈ Y.

Proof. The nonemptiness was already stated in Theorem 3.42. The convexity follows immediately from the convexity of the set K(y) derived in Lemma 3.41. We now prove weak sequential compactness. Let (M_k) ⊂ ∂Ψ(y) be any sequence. Then

  M_k = Σ_i d_{ki} · G′_i(y)

with d_k ∈ K(y), see (3.26). Lemma 3.65 yields that K(y) is weak* sequentially compact in L^∞(Ω)^m. Hence, we can select a subsequence such that (d_k) converges weak* to d* ∈ K(y) in L^∞(Ω)^m. Define M* = Σ_i d*_i · G′_i(y) and observe that M* ∈ ∂Ψ(y), since d* ∈ K(y). It remains to prove that M_k → M* weakly. Let w ∈ L^{r′}(Ω) = L^r(Ω)′ and v ∈ Y be arbitrary. We set z_i = w · G′_i(y)v and note that z_i ∈ L¹(Ω). Hence,

  |⟨w, (M_k − M*)v⟩_Ω| ≤ Σ_i |⟨w, (d_k − d*)_i · G′_i(y)v⟩_Ω| = Σ_i |⟨z_i, (d_k − d*)_i⟩_Ω| → 0  as k → ∞.  (3.50)

Therefore, the weak sequential compactness is shown.

By Lemma 3.37, ∂Ψ(y) is contained in a closed ball in L(Y, L^r), on which the weak topology is metrizable if Y is separable (note that 1 ≤ r < ∞ implies that L^r(Ω) is separable). Hence, in this case the weak compactness follows from the weak sequential compactness. □

Weak Graph Closedness of the Generalized Differential

Finally, we prove that the multifunction ∂Ψ is weakly graph closed:

Theorem 3.67. Let the Assumptions 3.27 be satisfied and let (y_k) ⊂ Y and (M_k) ⊂ L(Y, L^r(Ω)) be sequences such that M_k ∈ ∂Ψ(y_k) for all k, y_k → y* in Y, and M_k → M* weakly in L(Y, L^r(Ω)). Then there holds M* ∈ ∂Ψ(y*). If, in addition, Y is separable, then the above assertion also holds if we replace the sequences (y_k) and (M_k) by nets.

Proof. Let y_k → y* in Y and ∂Ψ(y_k) ∋ M_k → M* weakly. We have the representations M_k = Σ_i d_{ki} · G′_i(y_k) with measurable selections d_k of ∂ψ(u_k), where u_k = G(y_k). We also introduce u* = G(y*). The multifunction ω ∈ Ω ↦ ∂ψ(u*(ω)) is closed-valued (even compact-valued) and measurable. Furthermore, the function (ω, h) ↦ ‖d_k(ω) − h‖_2 is a normal integrand on Ω × R^m [129, Cor. 2P]. Hence, by [129, Thm. 2K], the multifunctions S_k : Ω ⇒ R^m,

  S_k(ω) = arg min_{h ∈ ∂ψ(u*(ω))} ‖d_k(ω) − h‖_2,

are closed-valued (even compact-valued) and measurable. We choose measurable selections s_k of S_k. The sequence (s_k) is contained in the set K(y*) ⊂ L^∞(Ω)^m, which is sequentially weak* compact by Lemma 3.65. Further, by Lemma 3.41, we have d_k ∈ L_ψ B^m_{L^∞}.

Hence, by transition to subsequences we achieve s_k → s ∈ K(y*) weak* in L^∞(Ω)^m and d_k → d ∈ L_ψ B^m_{L^∞} weak* in L^∞(Ω)^m. Therefore, (d_k − s_k) → (d − s) weak* in L^∞(Ω)^m and thus also weakly in L²(Ω)^m. Since u_k → u* in Π_i L^{q_i}(Ω), we achieve by transition to a further subsequence that u_k → u* a.e. on Ω. Hence, since d_k(ω) ∈ ∂ψ(u_k(ω)) for a.a. ω ∈ Ω and ∂ψ is upper semicontinuous, we obtain from the construction of s_k that (d_k − s_k) → 0 a.e. on Ω. The sequence (d_k − s_k) is bounded in L^∞(Ω)^m, and thus the Lebesgue convergence theorem yields (d_k − s_k) → 0 in L²(Ω)^m. From (d_k − s_k) → 0 and (d_k − s_k) → (d − s) weakly in L²(Ω)^m we see d = s. We thus have

  d_k → d = s ∈ K(y*)  weak* in L^∞(Ω)^m.

This shows that M := Σ_i d_i · G′_i(y*) ∈ ∂Ψ(y*). It remains to prove that M_k → M weakly. To show this, let w ∈ L^{r′}(Ω) = L^r(Ω)′ and v ∈ Y be arbitrary. Then with z_{ki} = w · G′_i(y_k)v and z_i = w · G′_i(y*)v there holds z_{ki}, z_i ∈ L¹(Ω) and

  ‖z_{ki} − z_i‖_{L¹} ≤ ‖w‖_{L^{r′}} ‖G′_i(y_k)v − G′_i(y*)v‖_{L^r} → 0  as k → ∞.

Hence, we obtain similarly as in (3.50)

  |⟨w, (M_k − M)v⟩_Ω| ≤ Σ_i |⟨w, d_{ki} · G′_i(y_k)v − d_i · G′_i(y*)v⟩_Ω|
    = Σ_i |⟨d_{ki}, z_{ki}⟩_Ω − ⟨d_i, z_i⟩_Ω|
    ≤ Σ_i (|⟨d_i − d_{ki}, z_i⟩_Ω| + ‖d_{ki}‖_{L^∞} ‖z_i − z_{ki}‖_{L¹}) → 0  as k → ∞.

This implies M* = M ∈ ∂Ψ(y*) and completes the proof of the first assertion.

Now let (y_κ) ⊂ Y and (M_κ) ⊂ L(Y, L^r(Ω)) be nets such that M_κ ∈ ∂Ψ(y_κ) for all κ, y_κ → y* in Y, and M_κ → M* weakly in L(Y, L^r(Ω)). Since (y_κ) eventually stays in any neighborhood of y* and since G′ is continuous, we see from (3.25) that w.l.o.g. we may assume that (M_κ) is contained in a bounded ball B ⊂ L(Y, L^r). Since, due to the assumed separability of Y, B is metrizable with respect to the weak topology, we see that we can work with sequences instead of nets. □

4. Smoothing Steps and Regularity Conditions

The analysis of semismooth Newton methods used three ingredients: semismoothness, a smoothing step, and a regularity condition. In this chapter we show how smoothing steps can be obtained in practice and also describe a particular method that does not require a smoothing step at all. Furthermore, we establish sufficient conditions that imply the regularity condition stated in Assumption 4.5.

4.1 Smoothing Steps

We consider the VIP (1.1) under the assumptions stated there. It was already observed in earlier work [95, 143], and it can be verified by considering the applications encountered so far, that many problems of practical interest can be stated as a VIP (1.1) with the operator F meeting the following requirement:

Assumption 4.1. The operator F has the form

  F(u) = λu + G(u),

where λ is positive and G : L^r(Ω) → L^p(Ω), p > r, is locally Lipschitz continuous.

Note that G(u) lives in a smoother space than its preimage u, since L^p(Ω) ↪ L^r(Ω) (using that Ω is bounded) with nonequivalent norms. This form of F arises, e.g., in the first-order necessary optimality conditions of a large class of optimal control problems with bounds on the control and L²-regularization [95, 141, 143]. For obtaining smoothing steps, we borrow an idea from Kelley and Sachs [95].

Since φ^E_{[α,β]}(x) = x_1 − P_{[α,β]}(x_1 − x_2) is an MCP-function, we know that ū ∈ L^p(Ω) solves the VIP (1.1) if and only if S(ū) = ū, where

  S(u) := P_B(u − λ^{−1}F(u)),  P_B(u) = max{a, min{u, b}}.  (4.1)

Further, for all u ∈ L^r(Ω) we have

  u − λ^{−1}F(u) = −λ^{−1}G(u) ∈ L^p(Ω),

and therefore S(u) = P_B(−λ^{−1}G(u)). We now use that for all v, w ∈ L^p(Ω) there holds pointwise |P_B(v) − P_B(w)| ≤ |v − w|, and thus ‖P_B(v) − P_B(w)‖_{L^p} ≤ ‖v − w‖_{L^p}. Further, G is Lipschitz continuous (with rank L_G) on an L^r-neighborhood of ū. Hence, for all u ∈ L^r(Ω) in this neighborhood, we obtain

  ‖S(u) − ū‖_{L^p} = ‖S(u) − S(ū)‖_{L^p} = ‖P_B(−λ^{−1}G(u)) − P_B(−λ^{−1}G(ū))‖_{L^p}
    ≤ λ^{−1}‖G(u) − G(ū)‖_{L^p} ≤ L_G λ^{−1}‖u − ū‖_{L^r}.

This shows:

Theorem 4.2. Let Assumption 4.1 hold and define S by (4.1). Then on any L^r-neighborhood of ū on which G is Lipschitz continuous (with rank L_G), the mapping

  u⁰_k ∈ L^r(Ω) ↦ u_k := S(u⁰_k) ∈ L^p(Ω)

is a smoothing step in the sense of Assumption 3.11 (ii) with constant C_S = L_G/λ.

The applicability of this approach to concrete problems is discussed in the application chapters 7 and 8. Here we only consider the introductory example control problem (1.11) of section 1.1.1. There, see Remark 3.34, we have F(u) = λu − w(u), where w(u) ∈ H¹₀(Ω) is the adjoint state, which depends continuously and affine linearly on u ∈ L²(Ω). Since H¹₀(Ω) ↪ L^p(Ω) for appropriate p > 2, the described scenario is given with G(u) = −w(u) and r = 2.
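The smoothing effect of S can be seen in a simple discretization. In the sketch below everything is an assumption of the illustration: Ω = (0, 1), λ = 1, B = {0 ≤ u ≤ 1}, and G(u) = −A⁻¹(u + 1), where A is the three-point finite-difference Laplacian, so that G maps into a "smoother" space as in Assumption 4.1.

```python
import numpy as np

n = 200
h = 1.0 / (n + 1)
lam = 1.0
x = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2
Ainv = np.linalg.inv(A)

def G(u):
    # assumed affine model operator with smoothing property
    return -Ainv @ (u + np.ones(n))

def S(u):
    # smoothing map S(u) = P_B(u - F(u)/lam) = P_B(-G(u)/lam), B = {0 <= u <= 1}
    return np.clip(-G(u) / lam, 0.0, 1.0)

ubar = S(np.zeros(n))              # a reference point (illustration only)
noise = np.sin(60.0 * np.pi * x)   # oscillatory perturbation, O(1) in the sup-norm
u = ubar + noise
err_in = np.max(np.abs(u - ubar))          # roughly 1
err_out = np.max(np.abs(S(u) - S(ubar)))   # tiny: S damps the oscillation
print(err_in, err_out)
```

The perturbation is O(1) pointwise but small in a weak/averaged sense, and S maps it to a perturbation that is uniformly small, mirroring the L^r-to-L^p estimate of Theorem 4.2.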

4.2 A Newton Method without Smoothing Steps

We now describe how a variant of the MCP-function φ^E can be used to derive a semismooth reformulation of the VIP to which a Newton method without smoothing step can be applied. In fact, the very same idea used in the construction of smoothing steps can be adopted. Hereby, we assume that F has the same structure as in the previous section 4.1. The simple idea is to reformulate (1.1) equivalently as

  u − S(u) = 0,  (4.2)

and to establish the semismoothness of the operator u ∈ L^r(Ω) ↦ u − S(u) ∈ L^r(Ω).

Remark 4.3. In the recent report [78], Hintermüller, Ito, and Kunisch observe in the context of bound-constrained linear-quadratic control problems that semismooth Newton methods applied to (4.2) are identical to the class of primal-dual methods developed in [14, 15]. Numerical tests in these papers have proven the excellent efficiency of this class of methods, and thus underline the potential and importance of semismooth Newton methods. These positive results are confirmed by all our numerical tests, see chapter 7.

Theorem 4.4. Let F : L^r(Ω) → L^r(Ω) be continuously differentiable and let Assumption 4.1 hold. Define the operator

  Φ : u ∈ L^r(Ω) ↦ u − S(u) ∈ L^r(Ω),

with S as defined in (4.1). Then Φ is locally Lipschitz continuous and ∂Φ-semismooth, with ∂Φ(u) consisting of all M ∈ L(L^r, L^r) of the form

  M = I + λ^{−1} d G′(u),

with d ∈ L^∞(Ω),

  d(ω) ∈ ∂P_{[a(ω),b(ω)]}(−λ^{−1}G(u)(ω)),  ω ∈ Ω.  (4.3)

If F′ is α-order Hölder continuous, α ∈ (0, 1], then Φ is β-order semismooth with β as given in Theorem 3.45.

Proof. We introduce the disjoint measurable partitioning Ω = Ω_f ∪ Ω_l ∪ Ω_u ∪ Ω_{lu},

  Ω_f = Ω \ (Ω_a ∪ Ω_b),  Ω_l = Ω_a \ Ω_b,  Ω_u = Ω_b \ Ω_a,  Ω_{lu} = Ω_a ∩ Ω_b.

Now set ā = a on Ω_a and ā = 0 otherwise, and b̄ = b on Ω_b and b̄ = 1 otherwise. Since Ψ^f(u) = −λ^{−1}G(u) maps L^r(Ω) continuously differentiably to L^r(Ω), Ψ^f is locally Lipschitz continuous and {−λ^{−1}G′}-semismooth. Further, we have S(u) = Ψ^f(u) on Ω_f. Hence, by Proposition 3.7, 1_{Ω_f} · S is locally Lipschitz continuous and [1_{Ω_f} · (−λ^{−1}G′)]-semismooth. Obviously, this generalized differential consists of all operators of the form

  1_{Ω_f} · [−λ^{−1} d G′(u)]

with d as in (4.3).

Next, we set ψ^l(t) = max{0, t} and define Ψ^l : L^r(Ω) → L^r(Ω),

  Ψ^l(u) = ψ^l(−λ^{−1}G(u) − ā).

By Proposition 3.31 and Theorem 3.44, this operator is locally Lipschitz continuous and ∂Ψ^l-semismooth. Furthermore, there holds S(u) = ā + Ψ^l(u) on Ω_l, and thus 1_{Ω_l} · S is locally Lipschitz continuous and (1_{Ω_l} · ∂Ψ^l)-semismooth by Propositions 3.4 and 3.7. Looking at the structure of ∂Ψ^l, we see that (1_{Ω_l} · ∂Ψ^l) is the set of all operators

  1_{Ω_l} · [−λ^{−1} d G′(u)],

where d ∈ L^∞(Ω) satisfies (4.3). In fact, for ω ∈ Ω_l there holds with α = ā(ω) = a(ω)

  P_{[a(ω),b(ω)]}(t) = max{α, t} = α + max{0, t − α} = α + ψ^l(t − α),

and thus ∂P_{[α,∞)}(t) = ∂ψ^l(t − α).

In a completely analogous way, we see that 1_{Ω_u} · S is locally Lipschitz continuous and (1_{Ω_u} · ∂Ψ^u)-semismooth, where the latter differential is the set of all operators

  1_{Ω_u} · [−λ^{−1} d G′(u)]

with d ∈ L^∞(Ω) as in (4.3).


Finally, we consider ω ∈ Ω_{lu}. For α = ā(ω) = a(ω), β = b̄(ω) = b(ω) we have

  P_{[a(ω),b(ω)]}(t) = max{α, min{t, β}} = α + max{0, min{t − α, β − α}} = α + (β − α) ψ^{lu}((t − α)/(β − α))

with ψ^{lu}(t) = max{0, min{t, 1}} = P_{[0,1]}(t). We conclude for ω ∈ Ω_{lu}

  ∂P_{[a(ω),b(ω)]}(t) = (β − α) ∂_t [ψ^{lu}((t − α)/(β − α))] = ∂ψ^{lu}((t − α)/(β − α)).  (4.4)

Now define

  Ψ^{lu}(u) = ψ^{lu}((−λ^{−1}G(u) − ā)/(b̄ − ā)).

By Proposition 3.31 and Theorem 3.44, this operator is locally Lipschitz continuous and ∂Ψ^{lu}-semismooth. Furthermore, there holds

  1_{Ω_{lu}} · S = 1_{Ω_{lu}} · [ā + (b̄ − ā) · Ψ^{lu}].

We use once again Propositions 3.4 and 3.7 to conclude that 1_{Ω_{lu}} · S is locally Lipschitz continuous and (1_{Ω_{lu}} · (b̄ − ā) · ∂Ψ^{lu})-semismooth. From (4.4) we see that this differential is the set of all operators

  1_{Ω_{lu}} · [−λ^{−1} d G′(u)],

where d ∈ L^∞(Ω) satisfies (4.3). Now, since

  u − S(u) = u − 1_{Ω_f} · S(u) − 1_{Ω_l} · S(u) − 1_{Ω_u} · S(u) − 1_{Ω_{lu}} · S(u),

we can apply Proposition 3.4 to complete the proof of the first assertion.

If F′ is α-order Hölder continuous, then it is straightforward to modify the proof to establish semismoothness of order β > 0. □

Therefore, we can apply the Newton methods of section 3.2.3 to solve the reformulation (4.2) of the VIP. A smoothing step is not required, since Φ is semismooth as a mapping from L^r = L² into itself, and, as we will demonstrate for NCPs in section 4.3, it is appropriate to use Assumption 3.59 (i), i.e., the uniformly bounded invertibility of the generalized differentials in L(L², L²), as regularity condition.

4.3 Sufficient Conditions for Regularity

In this section we establish a sufficient condition for solutions of the NCP (1.4), posed in the usual setting of (1.1), that implies the following regularity condition:

Assumption 4.5. There exist constants η > 0 and C_{M^{−1}} > 0 such that, for all u ∈ ū + ηB_{L^p}, every M ∈ ∂Φ(u) is an invertible element of L(L², L²) with ‖M^{−1}‖_{L²,L²} ≤ C_{M^{−1}}.

Hereby, Φ = φ(u, F(u)) is the superposition operator arising in the semismooth reformulation via the NCP-function φ. We consider problems where F has the form F(u) = λu + G(u), and G has a smoothing property. In this setting, we show that, in broad terms, regularity is implied by L²-coercivity of F′(ū) on the tangent space of the strongly active constraints.

An alternative sufficient condition for regularity, which does not require special structure of F, but assumes that F′(ū) is L²-coercive on the whole space, can be found in the author's paper [141].

We work under the following assumptions:

Assumption 4.6. There exist p ∈ [2, ∞] and p′ ∈ (2, ∞] such that:

(a) F(u) = λu + G(u), λ ∈ L^∞(Ω), λ ≥ λ_0 > 0.

(b) G : L²(Ω) → L²(Ω) is Fréchet differentiable with derivative G′(u).

(c) u ∈ L^p(Ω) ↦ G′(u) ∈ L(L²(Ω), L²(Ω)) is continuous near ū.

(d) For u near ū in L^p(Ω), the L²-endomorphisms G′(u) and G′(u)* are contained in L(L²(Ω), L^{p′}(Ω)) with their norms uniformly bounded by a constant C_{G′}.

(e) There exists a constant ν > 0 such that for F′(ū) = λI + G′(ū) there holds

  (v, F′(ū)v)_{L²(Ω)} ≥ ν ‖v‖²_{L²(Ω)}

for all v ∈ L²(Ω) with v = 0 on {ω ∈ Ω : F(ū)(ω) ≠ 0}.

(f) φ is Lipschitz continuous and semismooth.

(g) There exists a constant θ > 0 such that for all x ∈ R² and all g ∈ ∂φ(x) there holds g_1 g_2 ≥ 0 and |g_1 + g_2| ≥ θ.

(h) For x ∈ (0, ∞) × {0} there holds ∂φ(x) ⊂ {0} × R, and for x ∈ {0} × (0, ∞) there holds ∂φ(x) ⊂ R × {0}.

Remark 4.7. In the case of a minimization problem, i.e., F = j′, condition (e) can be interpreted as a strong second order sufficient condition: the Hessian operator j″(ū) has to be coercive on the tangent space of the strongly active constraints. Similar conditions can be found in, e.g., Dunn and Tian [46] and Ulbrich and Ulbrich [143]. Strong second order sufficient conditions are also essential for proving fast convergence of finite-dimensional algorithms, see, e.g., [19, 77, 105].

Observe that Assumption 4.6 with p > 2 implies Assumption 3.32 with r = 2 and p′ = min{p, p′} on an L^p-neighborhood of ū. Hence, Φ : L^p(Ω) → L²(Ω) is semismooth at ū by Theorem 3.48. In fact, (a)–(c) imply Assumption 3.32 (a). Further, for u, u + v ∈ L^p(Ω) near ū there holds with s = min{p, p′}, using (d),

  ‖F(u + v) − F(u)‖_{L^s} ≤ ∫₀¹ ‖F′(u + tv)v‖_{L^s} dt
    ≤ c‖λ‖_{L^∞}‖v‖_{L^p} + c sup_{t∈[0,1]} ‖G′(u + tv)‖_{L²,L^{p′}} ‖v‖_{L^p}
    ≤ c(‖λ‖_{L^∞} + C_{G′})‖v‖_{L^p},

which implies Assumption 3.32 (b) for p′ = s. Finally, (f) ensures Assumption 3.32 (c), (d).

Next, we illustrate the Assumptions 4.6 by verifying them for the control problem (1.11). There, F(u) = j′(u) = λu − w(u), where

w(u) = ∆−1(y − yd) = −∆−1(∆−1u+ yd) ∈ H10 (Ω) (4.5)

is theadjointstate.Themappingu ∈ L2(Ω) 7→ w(u) ∈ H10 (Ω) is continuousand

affine linear. Thus,choosingp′ > 2 suchthatH10 (Ω) → Lp

′(Ω), F hastheform as

in assumption(a)withG : u ∈ L2(Ω) 7→ −w(u) ∈ Lp′(Ω) beingcontinuousaffinelinear. Therefore,G is smoothandG′ ∈ L(L2, Lp

′) is constant.From(4.5)we see

thatG′(u) = ∆−1∆−1, henceG′(u)∗ = G′(u) and,with z ∈ H10 (Ω) solutionof

−∆z = v, wehave

(F ′(u)v, v)L2 = (G′(u)v, v)L2 + (λv, v)L2 ≥ ‖z‖2L2 + λ‖v‖2L2 ≥ λ‖v‖2L2 .

Takingall together, weseethat(a)–(e)aresatisfiedfor any p ∈ [2,∞].Wenow establishoursufficient conditionfor regularity:

Theorem 4.8. If Assumption 4.6 holds at a solution ū ∈ L^p(Ω) of the NCP (1.4), then there exists ρ > 0 such that Assumption 4.5 is satisfied.

Proof. For convenience, we set (·,·) = (·,·)_{L^2(Ω)} and ‖·‖ = ‖·‖_{L^2(Ω)}.

Any element M ∈ ∂Φ(u) can be written in the form

  M = d_1 · I + d_2 · F′(u),  d_1, d_2 ∈ L^∞(Ω),  (d_1, d_2)(ω) ∈ ∂φ(u(ω), F(u)(ω)) on Ω.    (4.6)

Due to the Lipschitz continuity of φ, the functions d_1, d_2 are bounded in L^∞(Ω) uniformly in u. We define

  c = d_2 / (d_1 + λ d_2),    (4.7)

which, since by assumption d_1 d_2 ≥ 0, θ ≤ d_1 + d_2, and λ ≥ λ_0 > 0, is well-defined and uniformly bounded in L^∞(Ω) for all u ∈ L^p(Ω). Using F′(u) = λI + G′(u), we see that

  M = (d_1 + λ d_2) · (I + c · G′(u)).

Since (d_1 + λ d_2) and (d_1 + λ d_2)^{-1} are uniformly bounded in L^∞(Ω) for all u ∈ L^p(Ω), the operators M ∈ ∂Φ(u) are continuously invertible in L(L^2(Ω), L^2(Ω)) on an L^p-neighborhood of ū with uniformly bounded inverses if and only if the same holds true for the operators T = I + c · G′(u).

Next, consider any M ∈ ∂Φ(u) with corresponding functions d_1, d_2, c ∈ L^∞(Ω) according to (4.6) and (4.7). Define the sets

  Ω_1 = {(ū, F(ū)) ≠ 0},  Ω_2 = {ū = 0, F(ū) = 0},

and consider the function e ∈ L^∞(Ω),

  e = c̄ on Ω_1,  e = c on Ω_2,    (4.8)

where c̄ denotes the function (4.7) associated with the solution ū.


We first prove that, for arbitrary t ∈ [1,∞),

  ‖c − e‖_{L^t} → 0 as u → ū in L^p(Ω).    (4.9)

Assume that this is not true. Then there exist t ≥ 1, ε > 0, and a sequence (u_k) ⊂ L^p(Ω) with u_k → ū in L^p(Ω) and corresponding differentials M_k ∈ ∂Φ(u_k) such that

  ‖c_k − e_k‖_{L^t} ≥ ε  ∀ k.    (4.10)

Hereby, we denote by d_{1k}, d_{2k}, c_k, and e_k the associated functions defined in (4.6), (4.7), and (4.8). From u_k → ū follows F(u_k) → F(ū) in L^{min{p,p′}}(Ω). Hence, there exists a subsequence such that (u_{k′}, F(u_{k′})) → (ū, F(ū)) a.e. on Ω.

Since ū F(ū) = 0, we have the disjoint partitioning Ω_1 = Ω_11 ∪ Ω_12 with

  Ω_11 = {F(ū) ≠ 0} = {ū = 0, F(ū) ≠ 0},
  Ω_12 = {ū ≠ 0} = {ū ≠ 0, F(ū) = 0}.

On the set Ω_11 we have (a.e.) u_{k′} → 0, F(u_{k′}) → F(ū) ≠ 0 and thus, by the upper semicontinuity of ∂φ and the assumptions on φ, d_{1k′} → d̄_1 ≠ 0, d_{2k′} → 0, which implies c_{k′} → 0 = c̄ on Ω_11. Since Ω has finite measure and the sequence (c_{k′}) is bounded in L^∞(Ω), the Lebesgue convergence theorem implies

  ‖c_{k′} − c̄‖_{L^t(Ω_11)} → 0.    (4.11)

On the set Ω_12 holds u_{k′} → ū ≠ 0, F(u_{k′}) → F(ū) = 0 and thus, again using the properties of ∂φ, d_{1k′} → 0 = d̄_1, d_{2k′} → d̄_2 ≠ 0, which implies c_{k′} → 1/λ = c̄. Invoking Lebesgue's convergence theorem once again, we see that

  ‖c_{k′} − c̄‖_{L^t(Ω_12)} → 0.    (4.12)

Then it is an immediate consequence of (4.11) and (4.12) that

  ‖c_{k′} − e_{k′}‖_{L^t(Ω)} = ‖c_{k′} − c̄‖_{L^t(Ω_1)} ≤ ‖c_{k′} − c̄‖_{L^t(Ω_11)} + ‖c_{k′} − c̄‖_{L^t(Ω_12)} → 0,

which contradicts (4.10). Thus, (4.9) is proved.

We now consider the operators

  T = I + c · G′(u)  and  S = I + e · G′(ū).

For all v ∈ L^2(Ω) holds (with 2p′/(p′ − 2) to be interpreted as 2 if p′ = ∞)

  ‖Tv − Sv‖ ≤ ‖(c − e) · G′(ū)v‖ + ‖c · (G′(u)v − G′(ū)v)‖
    ≤ ‖c − e‖_{L^{2p′/(p′−2)}} ‖G′(ū)v‖_{L^{p′}} + ‖c‖_{L^∞} ‖G′(u)v − G′(ū)v‖
    ≤ ‖c − e‖_{L^{2p′/(p′−2)}} ‖G′(ū)‖_{L^2,L^{p′}} ‖v‖ + ‖c‖_{L^∞} ‖G′(u) − G′(ū)‖_{L^2,L^2} ‖v‖.

This proves

  ‖T − S‖_{L^2,L^2} → 0 as u → ū in L^p(Ω).    (4.13)


Next, we prove

  ‖S*v‖ ≥ γ ‖v‖  ∀ v ∈ L^2(Ω),    (4.14)

where γ = 1 if G′(ū) = 0 and

  γ = min{νκ, 1/2},  κ = 1 / (2 ‖G′(ū)*‖_{L^2,L^2})  if G′(ū) ≠ 0.

The assertion is trivial if G′(ū) = 0. To prove the assertion for G′(ū) ≠ 0, we set w = ev and distinguish two cases.

Case 1: ‖w‖ ≤ κ‖v‖. Then

  ‖S*v‖ = ‖v + G′(ū)*(ev)‖ ≥ ‖v‖ − ‖G′(ū)*w‖ ≥ (1 − κ ‖G′(ū)*‖_{L^2,L^2}) ‖v‖ ≥ (1/2) ‖v‖ ≥ γ ‖v‖.

Case 2: ‖w‖ > κ‖v‖. Since w = ev and e = c̄ = 0 on Ω_11, we have w = 0 on Ω_11 and thus, by (e),

  (w, (λI + G′(ū)*)w) ≥ ν ‖w‖².

In the calculations to follow we will use that

  1 − λe = 1 on Ω_11,  1 − λe = 1 − λc̄ = 0 on Ω_12,
  1 − λe = 1 − λc = (d_1 + λd_2 − λd_2)/(d_1 + λd_2) = d_1/(d_1 + λd_2) ≥ 0 on Ω_2.

In particular, 1 − λe ≥ 0 on Ω, and thus

  ‖w‖ ‖S*v‖ ≥ (w, S*v) = (w, v) + (w, G′(ū)*w)
    ≥ (w, v) + ν‖w‖² − (w, λw) = (w, (1 − λe)v) + ν‖w‖²
    = (v, e(1 − λe)v) + ν‖w‖² ≥ ν‖w‖² ≥ νκ ‖w‖ ‖v‖ ≥ γ ‖w‖ ‖v‖.

Hence, (4.14) is proved. In particular, S* is injective. Moreover, S* has closed range. In fact, let S*v_k → z. Then

  ‖v_k − v_l‖ ≤ γ^{-1} ‖S*v_k − S*v_l‖ → 0 as k, l → ∞.

Therefore, v_k → v and S*v_k → S*v, hence z = S*v. By the closed range theorem [91, Ch. XII], the injectivity of S* now implies the surjectivity of S.

We proceed by showing the injectivity of S. Consider any v ∈ L^2(Ω) with Sv = 0. Let us introduce the function z ∈ L^{p′}(Ω),

  z = 0 on Ω_11,  z = −G′(ū)v on Ω_12 ∪ Ω_2.    (4.15)

Observing that

  v = Sv − e · G′(ū)v = −e · G′(ū)v on Ω,

and e = 0 on Ω_11, we see that

  v = ez on Ω,

and that v vanishes on Ω_11. Therefore, using (e),

  0 = (z, Sv) = (z, v) + (z, e · G′(ū)v) = (z, v) + (ez, G′(ū)v)
    = (z, v) + (v, G′(ū)v) ≥ (z, v) + ν‖v‖² − (v, λv) = ν‖v‖² + (z − λez, ez)
    = ν‖v‖² + (z, (1 − λe)ez) ≥ ν‖v‖²,

since (1 − λe)e ≥ 0. This implies v = 0, which proves the injectivity of S.

We thus have shown that S ∈ L(L^2(Ω), L^2(Ω)) is bijective and hence, by the open mapping theorem, continuously invertible. Furthermore, for all v ∈ L^2(Ω) we have

  ‖v‖ = ‖S*(S*)^{-1}v‖ ≥ γ ‖(S*)^{-1}v‖,

and thus

  ‖S^{-1}‖_{L^2,L^2} = ‖(S*)^{-1}‖_{L^2,L^2} ≤ 1/γ.

By (4.13), there exists ρ > 0 such that for all u ∈ L^p(Ω) with ‖u − ū‖_{L^p} ≤ ρ holds ‖T − S‖_{L^2,L^2} ≤ γ/2. Therefore, by Banach's theorem [91, Ch. V.4.6], T ∈ L(L^2(Ω), L^2(Ω)) is invertible with

  ‖T^{-1}‖_{L^2,L^2} ≤ ‖S^{-1}‖_{L^2,L^2} / (1 − ‖S^{-1}‖_{L^2,L^2} ‖T − S‖_{L^2,L^2}) ≤ 2/γ.  □

The sufficient condition of Theorem 4.8 and the sufficient condition for regularity established in [141] are very helpful in establishing regularity for concrete applications.
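As a quick numerical spot-check of conditions of the type appearing in Assumption 4.6 (f)–(h), the following sketch samples the gradient of the Fischer–Burmeister function φ_FB(a, b) = a + b − √(a² + b²) (smooth for (a, b) ≠ 0) and verifies g_1 g_2 ≥ 0 and |g_1 + g_2| ≥ θ with θ = 2 − √2. This is only an illustrative check, not part of the text's argument.

```python
import numpy as np

def grad_phi_fb(a, b):
    # gradient of phi_FB(a, b) = a + b - sqrt(a^2 + b^2) for (a, b) != 0
    n = np.hypot(a, b)
    return 1.0 - a / n, 1.0 - b / n

# check g1*g2 >= 0 and |g1 + g2| >= theta = 2 - sqrt(2) on random samples
theta = 2.0 - np.sqrt(2.0)
rng = np.random.default_rng(0)
for a, b in rng.normal(size=(1000, 2)):
    g1, g2 = grad_phi_fb(a, b)
    assert g1 * g2 >= 0.0
    assert abs(g1 + g2) >= theta - 1e-12
```

Both bounds follow from |a| ≤ √(a² + b²) and (a + b)/√(a² + b²) ≤ √2, so the sampled check cannot fail for (a, b) ≠ 0.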

5. Variational Inequalities and Mixed Problems

So far, we have demonstrated the applicability of semismooth Newton methods mainly for the NCP (1.4). We now discuss several applications to more general classes of problems. First, we show how the semismooth reformulation approach that we investigated in detail for the NCP can be extended to the larger problem class of bound-constrained VIPs (1.1). In addition, we describe how semismooth reformulations can be obtained for even more general problems than the bound-constrained VIP. The second extension considers mixed problems consisting of VIPs and additional operator equations. In particular, the first order necessary (Karush–Kuhn–Tucker, KKT) conditions of very general optimization problems can be written in this form.

5.1 Application to Variational Inequalities

5.1.1 Problems with Bound Constraints

We now describe how our treatment of the NCP can be carried over to the bound-constrained VIP (1.1). One possibility was already described in section 4.2, where we presented a semismooth reformulation that does not require a smoothing step. Here, we describe a similar approach for which general NCP- and MCP-functions can be used.

For the derivation of a semismooth reformulation, let there be given an NCP-function φ and MCP-functions φ_{[α,β]} for all compact intervals. We now define the operator

  Φ(u)(ω) =
    { F(u)(ω)                            on Ω_f = Ω \ (Ω_a ∪ Ω_b),
    { φ(u(ω) − a(ω), F(u)(ω))            on Ω_l = Ω_a \ Ω_b,
    { −φ(b(ω) − u(ω), −F(u)(ω))          on Ω_u = Ω_b \ Ω_a,
    { φ_{[a(ω),b(ω)]}(u(ω), F(u)(ω))     on Ω_lu = Ω_a ∩ Ω_b.    (5.1)

It was shown in section 1.2 that u ∈ L^p(Ω) solves (1.1) if and only if

  Φ(u) = 0.    (5.2)

Our aim is to establish the semismoothness of Φ and to characterize its generalized differential. Hereby, we require:


Assumption 5.1. There exists r ∈ [1, p) ∩ [1, p′) such that:

(a) The mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^r(Ω) is continuously differentiable.

(b) The operator F : L^p(Ω) → L^{p′}(Ω) is locally Lipschitz continuous.

(c) The function φ : R² → R is Lipschitz continuous and semismooth.

(d) The function x ↦ φ_{[x_1,x_2]}(x_3, x_4) is Lipschitz continuous and semismooth.

For semismoothness of higher order we need slightly stronger requirements.

Assumption 5.2. There exist r ∈ [1, p) ∩ [1, p′) and α ∈ (0, 1] such that:

(a) The mapping u ∈ L^p(Ω) ↦ F(u) ∈ L^r(Ω) is differentiable with locally α-Hölder continuous derivative.

(b) The operator F : L^p(Ω) → L^{p′}(Ω) is locally Lipschitz continuous.

(c) The function φ : R² → R is Lipschitz continuous and α-order semismooth.

(d) The function x ↦ φ_{[x_1,x_2]}(x_3, x_4) is Lipschitz continuous and α-order semismooth.

Remark 5.3. At this point it would be more convenient if we had established semismoothness results for superposition operators of the form ψ(ω, G(u)(ω)). This is certainly possible, but not really needed in this work. Instead, the trick we will use here is to build superposition operators with the inner operator given by u ↦ (ā, b̄, u, F(u)), where ā, b̄ are cutoff versions of a and b that make them finite.

A different approach would be to transform the problem such that [a, b] → [0, 1] on Ω_a ∩ Ω_b and [a, b] → [0,∞) on (Ω_a ∪ Ω_b) \ (Ω_a ∩ Ω_b). There is, however, a certain danger that this transformation affects the scaling of the problem in a negative way. The latter approach was implicitly used in the proof of Theorem 4.4.

Theorem 5.4. Under Assumption 5.1 the operator Φ : L^p(Ω) → L^r(Ω) is locally Lipschitz continuous and ∂Φ-semismooth, where ∂Φ(u) consists of all operators M ∈ L(L^p, L^r) of the form

  M = d_1 I + d_2 · F′(u),

with d_1, d_2 ∈ L^∞(Ω),

  (d_1, d_2)(ω) ∈
    { {(0, 1)}                             on Ω_f,
    { ∂φ(u(ω) − a(ω), F(u)(ω))             on Ω_l,
    { ∂φ(b(ω) − u(ω), −F(u)(ω))            on Ω_u,
    { ∂φ_{[a(ω),b(ω)]}(u(ω), F(u)(ω))      on Ω_lu.    (5.3)

Under Assumption 5.2 the operator Φ is even β-order semismooth, where β > 0 is as in Theorem 3.45.

Proof. Let us define ā, b̄ ∈ L^p(Ω) by ā = a on Ω_a, ā = 0 otherwise, and b̄ = b on Ω_b, b̄ = 0 otherwise. Further, let


  ψ^f(x) = x_4,  ψ^l(x) = φ(x_3 − x_1, x_4),
  ψ^u(x) = −φ(x_2 − x_3, −x_4),  ψ^{lu}(x) = φ_{[x_1,x_2]}(x_3, x_4),

which are Lipschitz continuous and semismooth. Define

  T : u ∈ L^p(Ω) ↦ (ā, b̄, u, F(u)) ∈ L^r(Ω)⁴,

which is continuously differentiable with derivative T′(u) = (0|0|I|F′(u)), and locally Lipschitz continuous as a mapping L^p(Ω) → L^p(Ω)³ × L^{p′}(Ω).

Next, for γ ∈ {f, l, u, lu}, we introduce the superposition operators

  Ψ^γ : L^p(Ω) → L^r(Ω),  Ψ^γ(u)(ω) = ψ^γ(T(u)(ω)).

By Proposition 3.31 and Theorem 3.44, these operators are ∂Ψ^γ-semismooth; hereby, the operator M^γ ∈ L(L^p, L^r) is an element of ∂Ψ^γ(u) if and only if

  M^γ = (d^γ_a, d^γ_b, d^γ_1, d^γ_2) · T′(u) = d^γ_1 I + d^γ_2 · F′(u),

where d^γ_a, d^γ_b, d^γ_1, d^γ_2 ∈ L^∞(Ω) satisfy (d^γ_a, d^γ_b, d^γ_1, d^γ_2) ∈ ∂ψ^γ(T(u)) on Ω. We now use [32, Prop. 2.3.16], a direct consequence of Proposition 2.3, to conclude

  ∂_{(x_3,x_4)}ψ^γ(x) ⊂ {g ∈ R² : ∃ h ∈ R² : (h, g) ∈ ∂ψ^γ(x)}.

Now let d_1, d_2 ∈ L^∞(Ω) be arbitrary such that (5.3) holds. Then holds (d_1, d_2) ∈ ∂_{(x_3,x_4)}ψ^γ(T(u)) on Ω_γ. Therefore, using Filippov's theorem [11, Thm. 8.2.10], we conclude that there exist d^γ_a, d^γ_b ∈ L^∞(Ω) with

  (d^γ_a, d^γ_b, d_1, d_2) ∈ ∂ψ^γ(T(u)) on Ω_γ,  γ ∈ {f, l, u, lu}.

This shows

  1_{Ω_γ} · [d_1 I + d_2 · F′(u)] ∈ 1_{Ω_γ} · ∂Ψ^γ(u).    (5.4)

Finally, we define H ∈ L([L^r]⁴, L^r),

  Hv = 1_{Ω_f} v_1 + 1_{Ω_l} v_2 + 1_{Ω_u} v_3 + 1_{Ω_lu} v_4,

and observe that

  Φ(u) = H(Ψ^f(u), Ψ^l(u), Ψ^u(u), Ψ^{lu}(u)).

Thus, Φ is locally Lipschitz continuous. Applying the direct product rule and the chain rule, Propositions 3.5 and 3.7 (note that H′ ≡ H is bounded), we conclude that Φ is H′ ◦ (∂Ψ^f × ∂Ψ^l × ∂Ψ^u × ∂Ψ^{lu})-semismooth and that, by (5.4), this generalized differential contains all M ∈ L(L^p, L^r) of the form M = d_1 I + d_2 · F′(u), where d_1, d_2 ∈ L^∞(Ω) satisfy (5.3).

If Assumption 5.2 holds, then it is straightforward to modify the proof to establish semismoothness of order β > 0.  □


It should be immediately clear from our detailed discussion of NCPs in previous sections how the semismooth reformulation (5.2) can be used to apply our class of semismooth Newton methods. The resulting algorithm looks exactly like Algorithm 3.57, with the only difference that Φ is defined by (5.1). Also the regularity condition of Assumption 3.59 is appropriate, and the assertions of Theorem 3.62 can be established as well.
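To illustrate how such a semismooth Newton iteration proceeds, here is a small finite-dimensional sketch (not the function-space Algorithm 3.57 itself) for the reformulation Φ(u) = min{u, F(u)} = 0 with an affine F; the matrix A and vector b are made-up illustrative data, and the Newton matrix is an element of the B-subdifferential of Φ.

```python
import numpy as np

def semismooth_newton_ncp(F, dF, u0, tol=1e-10, maxit=50):
    """Semismooth Newton sketch for Phi(u) = min(u, F(u)) = 0 (componentwise).

    Each step uses M = d1*I + d2*F'(u) with (d1, d2) = (1, 0) where
    u < F(u) and (0, 1) where u >= F(u), i.e., an element of the
    B-subdifferential of Phi.
    """
    u = u0.astype(float)
    for _ in range(maxit):
        Phi = np.minimum(u, F(u))
        if np.linalg.norm(Phi) <= tol:
            break
        d1 = (u < F(u)).astype(float)
        M = np.diag(d1) + (1.0 - d1)[:, None] * dF(u)  # row-wise selection
        u = u - np.linalg.solve(M, Phi)
    return u

# Hypothetical affine data (illustrative only): F(u) = A u + b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-1.0, 1.0])
u_star = semismooth_newton_ncp(lambda u: A @ u + b, lambda u: A, np.zeros(2))
```

For this data the iteration identifies the active set and terminates at u_star = (1/3, 0), which satisfies u ≥ 0, F(u) ≥ 0, and u·F(u) = 0 componentwise.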

We now discuss ways of choosing φ and φ_{[α,β]}. Consider any NCP-function φ that is positive on (0,∞)² and negative on R² \ [0,∞)². Then the following construction, which was proposed by Billups [18] for φ = φ_FB, can be used to obtain an MCP-function φ_{[α,β]}, −∞ < α < β < +∞:

  φ_{[α,β]}(x) = φ(x_1 − α, −φ(β − x_1, −x_2)).    (5.5)

Proposition 5.5. Let φ be an NCP-function that is positive on (0,∞)² and negative on R² \ [0,∞)². Then, for any interval [α, β], −∞ < α < β < ∞, the function φ_{[α,β]} defined in (5.5) is an MCP-function.

Proof. We have to show that φ_{[α,β]}(x) = 0 holds if and only if

  α ≤ x_1 ≤ β,  (x_1 − α) x_2 ≤ 0,  (x_1 − β) x_2 ≤ 0.    (5.6)

To this end, observe that φ_{[α,β]}(x) = 0 is equivalent to

  x_1 − α ≥ 0,  φ(β − x_1, −x_2) ≤ 0,  (x_1 − α) φ(β − x_1, −x_2) = 0,    (5.7)

where we have used the fact that φ is an NCP-function.

For x_1 < α, (5.6) and (5.7) are both violated. For x_1 = α, we use the assumptions on φ to obtain

  (5.6) ⟺ x_2 ≥ 0 ⟺ φ(β − α, −x_2) ≤ 0 ⟺ (5.7).

Finally, for x_1 > α,

  (5.6) ⟺ x_1 ≤ β, x_2 ≤ 0, (x_1 − β) x_2 ≤ 0 ⟺ φ(β − x_1, −x_2) = 0 ⟺ (5.7).  □

We demonstrate this construction for

  φ(x) = φ^E(x) = x_1 − P_{[0,∞)}(x_1 − x_2) = min{x_1, x_2}.

Then

  φ_{[α,β]}(x) = min{x_1 − α, −min{β − x_1, −x_2}}
    = min{x_1 − α, max{x_1 − β, x_2}} = x_1 − P_{[α,β]}(x_1 − x_2) = φ^E_{[α,β]}(x).

Therefore, starting with the projection-based NCP-function φ^E, we obtain the projection-based MCP-function φ^E_{[α,β]}. Concerning the concrete calculation of ∂φ^E and ∂φ^E_{[α,β]}, we have:


Proposition 5.6. The function φ^E is piecewise affine linear on R² and affine linear on the sets {x : x_1 < x_2}, {x : x_1 > x_2}. There holds:

  ∂φ^E(x) = ∂_B φ^E(x) = {φ^E′(x)} = {(1, 0)} for x_1 < x_2,
  ∂φ^E(x) = ∂_B φ^E(x) = {φ^E′(x)} = {(0, 1)} for x_1 > x_2,
  ∂_B φ^E(x) = {(1, 0), (0, 1)},  ∂φ^E(x) = {(t, 1 − t) : 0 ≤ t ≤ 1} for x_1 = x_2.

The function φ^E_{[α,β]} is piecewise affine linear on R² and affine linear on the connected components of {x : x_1 − x_2 ≠ α, x_1 − x_2 ≠ β}. There holds:

  ∂φ^E_{[α,β]}(x) = ∂_B φ^E_{[α,β]}(x) = {φ^E_{[α,β]}′(x)} = {(1, 0)} for x_1 − x_2 ∉ [α, β],
  ∂φ^E_{[α,β]}(x) = ∂_B φ^E_{[α,β]}(x) = {φ^E_{[α,β]}′(x)} = {(0, 1)} for x_1 − x_2 ∈ (α, β),
  ∂_B φ^E_{[α,β]}(x) = {(1, 0), (0, 1)},  ∂φ^E_{[α,β]}(x) = {(t, 1 − t) : 0 ≤ t ≤ 1} for x_1 − x_2 ∈ {α, β}.

Proof. This is an immediate consequence of Proposition 2.25.  □

The generalized differential of φ_FB was already derived in section 2.5.2. In a similar way, it is possible to obtain formulas for the generalized differential of φ^{FB}_{[α,β]}, see [54].
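The identity derived above can also be checked numerically. The following sketch compares Billups' construction (5.5) applied to φ^E with the projection formula x_1 − P_{[α,β]}(x_1 − x_2) on random data; the grid and the interval are arbitrary illustrative choices.

```python
import numpy as np

def phi_E(x1, x2):
    # projection-based NCP-function: phi^E(x) = x1 - P_[0,inf)(x1 - x2) = min{x1, x2}
    return np.minimum(x1, x2)

def phi_E_box(x1, x2, alpha, beta):
    # projection-based MCP-function: phi^E_[alpha,beta](x) = x1 - P_[alpha,beta](x1 - x2)
    return x1 - np.clip(x1 - x2, alpha, beta)

def phi_box_billups(x1, x2, alpha, beta):
    # Billups' construction (5.5): phi(x1 - alpha, -phi(beta - x1, -x2))
    return phi_E(x1 - alpha, -phi_E(beta - x1, -x2))

# the two formulas agree, as shown in the text
alpha, beta = -1.0, 2.0
rng = np.random.default_rng(1)
x1, x2 = rng.normal(scale=3.0, size=(2, 1000))
assert np.allclose(phi_E_box(x1, x2, alpha, beta),
                   phi_box_billups(x1, x2, alpha, beta))
```

The agreement holds up to floating-point rounding, since both expressions are compositions of the same min/max operations in different order.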

5.1.2 Pointwise Convex Constraints

More general than bound constraints, we can consider pointwise convex constraints, i.e., the feasible set 𝒞 is given by

  𝒞 = {u ∈ L^p(Ω)^m : u(ω) ∈ C on Ω},    (5.8)

with p > 1, where C ⊂ R^m is a nonempty closed convex set and, as throughout this work, Ω is bounded and measurable with µ(Ω) > 0. Equally well, we could consider sets 𝒞 consisting of all u ∈ L^p(Ω)^m with u(ω) ∈ C(ω) on Ω, with the multifunction C having suitable properties. For convenience, however, we restrict our discussion to the case (5.8).

We wish to solve the following problem:

Variational Inequality with Pointwise Convex Constraints:

  u ∈ 𝒞,  ⟨F(u), v − u⟩ ≥ 0  ∀ v ∈ 𝒞,    (5.9)

with the same assumptions as in (1.1), but F being an operator between m-dimensional spaces, i.e., F : L^p(Ω)^m → L^{p′}(Ω)^m, 1/p + 1/p′ ≤ 1, and ⟨u, v⟩ = ∫_Ω u(ω)^T v(ω) dω. The set 𝒞 is defined in (5.8). Suppose that a continuous function π : R^m × R^m → R^m is available with the property

  π(x_1, x_2) = 0 ⟺ x_1 = P_C(x_1 − x_2),    (5.10)

where P_C is the Euclidean projection onto C. We will prove that (5.9) is equivalent to the operator equation

  Π(u) = 0, where Π(u)(ω) = π(u(ω), F(u)(ω)).    (5.11)

Remark 5.7. The function

  π^E(x_1, x_2) = x_1 − P_C(x_1 − x_2)    (5.12)

satisfies (5.10). It generalizes the projection-based NCP-function φ^E.

Proposition 5.8. Let the function π : R^m × R^m → R^m satisfy (5.10) and define Π by (5.11). Then u solves (5.9) if and only if (5.11) is satisfied.

Proof. The projection x_P = P_C(x) is characterized by

  x_P ∈ C,  (x_P − x)^T (z − x_P) ≥ 0  ∀ z ∈ C.    (5.13)

Now, if Π(u) = 0, then u(ω) = P_C(u(ω) − F(u)(ω)) a.e. on Ω. In particular, u(ω) ∈ C and, by (5.13), for all v ∈ 𝒞,

  (u(ω) − [u(ω) − F(u)(ω)])^T (v(ω) − u(ω)) ≥ 0,

where we have used v(ω) ∈ C. Integrating this over Ω shows that u solves (5.9).

Conversely, assume that Π(u) ≠ 0. If u ∉ 𝒞, then u does not solve (5.9). Otherwise, u ∈ 𝒞 and the set

  Ω′ = {ω : u(ω) ≠ P_C(u(ω) − F(u)(ω))}

has positive measure. Set z = u − F(u) and v = u + σw, where, for ω ∈ Ω,

  w(ω) = P_C(z(ω)) − u(ω),  σ(ω) = 1 / max{1, ‖w(ω)‖_2}.

Then holds v ∈ 𝒞, w ≠ 0 on Ω′, and

  F(u)(ω)^T (v(ω) − u(ω)) = σ(ω) F(u)(ω)^T w(ω)
    = σ(ω) (w(ω) + F(u)(ω))^T w(ω) − σ(ω) ‖w(ω)‖_2²
    = σ(ω) (P_C(z(ω)) − z(ω))^T (P_C(z(ω)) − u(ω)) − σ(ω) ‖w(ω)‖_2²
    ≤ −σ(ω) ‖w(ω)‖_2² ≤ −min{‖w(ω)‖_2, ‖w(ω)‖_2²}.

Integration over Ω yields

  ⟨F(u), v − u⟩ < 0.

Therefore, since v ∈ 𝒞, u is not a solution of (5.9).  □


The reformulation (5.11) is an operator equation involving the superposition operator Π. The application of semismooth Newton methods is attractive if a function π can be found that is (a) Lipschitz continuous and (b) semismooth, and for which (c) π and ∂_C π can be computed efficiently. Requirement (a) holds, e.g., for π = π^E, since the Euclidean projection is nonexpansive. (b) depends on the set C; if, e.g., C is a polyhedron, then P_C is piecewise affine linear, see [132, Prop. 2.4.4], and thus 1-order semismooth. Also (c) depends on the set C. We will give an example below. Requirements (a) and (b) are essential for proving the semismoothness of Π.

As a preparation for the treatment of mixed problems, we will prove the semismoothness of a slightly more general class of operators than those defined in (5.11). Hereby, we consider operators Π(z, u) that arise from the reformulation of problems (5.9) where F depends on an additional parameter z ∈ Z, where Z is a Banach space:

  F : Z × L^p(Ω)^m → L^{p′}(Ω)^m.

For z ∈ Z we then consider the problem

  u ∈ 𝒞,  ⟨F(z, u), v − u⟩ ≥ 0  ∀ v ∈ 𝒞,    (5.14)

which can be interpreted as a class of problems (5.9) that is parameterized by z. Hereby, 𝒞 is defined by (5.8).

Remark 5.9. The problem (5.9) is contained in the class (5.14) by choosing Z = {0} and F(0, u) = F(u).

By Proposition 5.8 we can use a function π satisfying (5.10) to reformulate (5.14) equivalently as

  Π(z, u) = 0, where Π(z, u)(ω) = π(u(ω), F(z, u)(ω)),  ω ∈ Ω.    (5.15)

Now suppose that the following holds:

Assumption 5.10. There is r with 1 ≤ r < min{p, p′} such that:

(a) F : Z × L^p(Ω)^m → L^r(Ω)^m is continuously Fréchet differentiable.

(b) (z, u) ∈ Z × L^p(Ω)^m ↦ F(z, u) ∈ L^{p′}(Ω)^m is locally Lipschitz continuous.

(c) The function π is Lipschitz continuous.

(d) π is semismooth.

Then we obtain:

Theorem 5.11. Under Assumption 5.10 the operator

  Π : Z × L^p(Ω)^m → L^r(Ω)^m

defined in (5.15) is locally Lipschitz continuous and ∂_C Π-semismooth, where the generalized differential ∂_C Π(z, u) consists of all operators M ∈ L(Z × [L^p]^m, [L^r]^m) of the form

  M(v, w) = D_1 w + D_2 (F′(z, u)(v, w))  ∀ (v, w) ∈ Z × L^p(Ω)^m,    (5.16)

where D_i ∈ L^∞(Ω)^{m×m} and D = (D_1|D_2) satisfies

  D(ω) ∈ ∂_C π(u(ω), F(z, u)(ω)),  ω ∈ Ω.    (5.17)

Proof. Consider the ith component Π_i(z, u) = π_i(u, F(z, u)) of Π. Obviously, Assumption 5.10 implies Assumption 3.27 with Y = Z × L^p(Ω)^m, G(z, u) = (u, F(z, u)), r_i = r, i = 1, …, 2m, q_i = p, i = 1, …, m, q_i = p′, i = m + 1, …, 2m, and ψ = π_i. Therefore, by Proposition 3.31 and Theorem 3.44, the operator Π_i : Z × L^p(Ω)^m → L^r(Ω) is locally Lipschitz continuous and ∂Π_i-semismooth. Hence, we can apply Proposition 3.5 to conclude that

  Π : Z × L^p(Ω)^m → L^r(Ω)^m

is ∂_C Π-semismooth, where ∂_C Π = ∂Π_1 × ⋯ × ∂Π_m. From the definition of the C-subdifferential it is clear that ∂_C Π(z, u) can be characterized by (5.16) and (5.17).  □

We also can prove semismoothness of higher order:

Assumption 5.12. As Assumption 5.10, but with (a), (d) replaced by: There exists α ∈ (0, 1] such that

(a) F : Z × L^p(Ω)^m → L^r(Ω)^m is continuously Fréchet differentiable with locally α-Hölder continuous derivative.

(d) π is α-order semismooth.

Under these strengthened assumptions we can use Theorem 3.45 to prove:

Theorem 5.13. Under Assumption 5.12 the assertions of Theorem 5.11 hold true and, in addition, the operator Π is β-order ∂_C Π-semismooth, where β can be determined as in Theorem 3.45.

The established semismoothness results allow us to solve problem (5.9) by applying the semismooth Newton methods of section 3.2.3 to the reformulation (5.11). The resulting methods are of the same form as Algorithm 3.57 for NCPs; only Φ has to be replaced by Π and all L^p-spaces are now m-dimensional. Smoothing steps can be obtained as described in section 4.1. An appropriate regularity condition is obtained by requiring that all M_k are elements of L([L^r]^m, [L^r]^m) with uniformly bounded inverses.

In section 4.2 we described a situation where, through an appropriate choice of the MCP-function, the smoothing step can be avoided. This approach can be generalized to the current situation:

Assumption 5.14. The operator F has the form F(z, u) = λu + G(z, u) with λ > 0, and there exist r, p′ with 1 ≤ r < p′ ≤ ∞ such that:

(a) G : Z × L^r(Ω)^m → L^r(Ω)^m is continuously Fréchet differentiable.

(b) (z, u) ∈ Z × L^r(Ω)^m ↦ G(z, u) ∈ L^{p′}(Ω)^m is locally Lipschitz continuous.

(c) The function π is defined by π(x_1, x_2) = x_1 − P_C(x_1 − λ^{-1} x_2), where P_C is the projection onto C.

(d) The projection P_C is semismooth.

Under these assumptions we can prove:

Theorem 5.15. Let Assumption 5.14 hold. Then we have

  Π(z, u)(ω) = u(ω) − P_C(−λ^{-1} G(z, u)(ω)),

and Π : Z × L^r(Ω)^m → L^r(Ω)^m is ∂_C Π-semismooth. Hereby, ∂_C Π(z, u) is the set of all M ∈ L(Z × L^r(Ω)^m, L^r(Ω)^m) of the form

  M = (λ^{-1} D G_z(z, u) | I + λ^{-1} D G_u(z, u)),    (5.18)

with D ∈ L^∞(Ω)^{m×m}, D(ω) ∈ ∂_C P_C(−λ^{-1} G(z, u)(ω)) on Ω.

Proof. We set T(z, u) = −λ^{-1} G(z, u), ψ(x) = P_C(x). Then

  T : Z × L^r(Ω)^m → L^r(Ω)^m

is continuously differentiable and maps locally Lipschitz continuously into L^{p′}(Ω)^m. Further, ψ is Lipschitz continuous and semismooth. Therefore, we can apply Theorem 3.44 componentwise (with Y = Z × L^r(Ω)^m, r_i = r, q_i = p′) and obtain that Ψ_i : (z, u) ∈ Z × L^r(Ω)^m ↦ ψ_i(T(z, u)) ∈ L^r(Ω) is ∂Ψ_i-semismooth. Therefore, by Proposition 3.5, we see that

  Ψ : Z × L^r(Ω)^m → L^r(Ω)^m

is ∂_C Ψ-semismooth. Now, using the (0|I)-semismoothness of (z, u) ↦ u and the sum rule for semismooth operators, Proposition 3.4, we see that

  Π : Z × L^r(Ω)^m → L^r(Ω)^m

is ∂_C Π-semismooth with ∂_C Π = (0|I) − ∂_C Ψ. It is straightforward to see that the elements of ∂_C Π are characterized by (5.18).  □

The situation typically arising in practice is r = 2. Under the (reasonable) regularity requirement M_k ∈ L([L^r]^m, [L^r]^m) with uniformly bounded inverses, superlinear convergence of the semismooth Newton method can be established as for the case of bound constraints, see section 4.2.

Finally, we give an example of how a function π and its differential can be obtained in a concrete situation.

Example 5.16. Models for the flow of Bingham fluids [62, 63] involve VIPs of the form (5.14), where

  C = {x : ‖x‖_2 ≤ 1}.


We now derive explicit formulas for π^E(x_1, x_2) = x_1 − P_C(x_1 − x_2) and its differentials ∂_B π^E, ∂π^E, and ∂_C π^E. First, observe that

  P_C(x) = x / max{1, ‖x‖_2}

is Lipschitz continuous and PC^∞ on R^m. Further, P_C is C^∞ on {x : ‖x‖_2 ≠ 1} with

  P_C′(x) = I for ‖x‖_2 < 1,  P_C′(x) = I/‖x‖_2 − x x^T/‖x‖_2³ for ‖x‖_2 > 1.

This shows that π^E is Lipschitz continuous and PC^∞ on R^m × R^m. Hence, π^E is 1-order semismooth and

  ∂_B π^E(x_1, x_2) = {(I − S | S) : S ∈ M_B},
  ∂π^E(x_1, x_2) = {(I − S | S) : S ∈ M},
  ∂_C π^E(x_1, x_2) = {(I − S | S) : S ∈ M_C},

where, with w = x_1 − x_2,

  M_B = M = M_C = {I} for ‖w‖_2 < 1,
  M_B = M = M_C = {I/‖w‖_2 − w w^T/‖w‖_2³} for ‖w‖_2 > 1,
  M_B = {I, I − w w^T},  M = {I − t w w^T : 0 ≤ t ≤ 1},
  M_C = {I − diag(t_1, …, t_m) w w^T : 0 ≤ t_1, …, t_m ≤ 1}  for ‖w‖_2 = 1.
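The formulas of Example 5.16 are directly implementable. The following sketch evaluates P_C, its Jacobian off the unit sphere, and π^E, and compares the Jacobian with central finite differences; the test points and step size are arbitrary illustrative choices.

```python
import numpy as np

def P_C(x):
    # Euclidean projection onto the unit ball C = {x : ||x||_2 <= 1}
    return x / max(1.0, np.linalg.norm(x))

def dP_C(x):
    # derivative of P_C for ||x||_2 != 1 (Example 5.16)
    n = np.linalg.norm(x)
    if n < 1.0:
        return np.eye(len(x))
    return np.eye(len(x)) / n - np.outer(x, x) / n ** 3

def pi_E(x1, x2):
    # pi^E(x1, x2) = x1 - P_C(x1 - x2), cf. (5.12)
    return x1 - P_C(x1 - x2)

# finite-difference check of dP_C at a point outside the ball
x = np.array([1.5, -0.5, 2.0])
h = 1e-6
J_fd = np.column_stack([(P_C(x + h * e) - P_C(x - h * e)) / (2 * h)
                        for e in np.eye(3)])
assert np.allclose(J_fd, dP_C(x), atol=1e-6)
```

At ‖x_1 − x_2‖_2 = 1 one would instead select an element of M_B, M, or M_C as listed above.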

5.2 Mixed Problems

So far we have considered variational inequalities in an L^p-setting. Often, the problem to solve is not given in this particular form, because the original problem formulation contains additional unknowns (e.g., the state) and additional operator equality constraints (e.g., the state equation). In the case of control problems with unique control-to-state mapping u ↦ y(u) (induced by the state equation) we demonstrated how, by using the dependence y = y(u), a reduced problem can be obtained that only depends on the control. This reduction method is called black-box approach. Having the advantage of reducing the problem dimension, the black-box approach nevertheless suffers from several disadvantages: The evaluation of the objective function requires the solution of the (possibly nonlinear) state equation. Further, the black-box approach is only viable if the state equation admits a unique solution y(u) for every control u.

Therefore, it can be advantageous to employ the all-at-once approach, i.e., to solve for u and y simultaneously. In the following we describe how the developed ideas can be extended to the all-at-once approach.


5.2.1 Karush–Kuhn–Tucker Systems

Consider the optimization problem (with control structure)

  minimize J(y, u) subject to E(y, u) = 0 and u ∈ 𝒞.    (5.19)

Hereby, let 𝒞 ⊂ U be a nonempty closed convex set and assume that the operator E : Y × U → W* and the objective function J : Y × U → R are twice continuously differentiable. Further, let the control space U and the state space Y be Banach spaces and W a reflexive Banach space with dual W*.

Now consider a local solution (ȳ, ū) ∈ Y × U of (5.19) at which Robinson's regularity condition [126] holds. More precisely, this means that

  0 ∈ int{(E′(ȳ, ū)(v, w), ū + w − u) : v ∈ Y, w ∈ U, u ∈ 𝒞},

or, which turns out to be equivalent,

  0 ∈ int{E′(ȳ, ū)(v, u − ū) : v ∈ Y, u ∈ 𝒞}.    (5.20)

In particular, (5.20) is satisfied if E_y(ȳ, ū) is onto, which holds true for many control problems.

If the regularity condition (5.20) holds at a local solution (ȳ, ū), then there exists a Lagrange multiplier w̄ ∈ W such that the triple (ȳ, ū, w̄) satisfies the KKT conditions, cf., e.g., [150]:

  ū ∈ 𝒞,  ⟨J_u(ȳ, ū) + E_u(ȳ, ū)* w̄, v − ū⟩_{U*,U} ≥ 0  ∀ v ∈ 𝒞,    (5.21)
  J_y(ȳ, ū) + E_y(ȳ, ū)* w̄ = 0,    (5.22)
  E(ȳ, ū) = 0.    (5.23)

This system consists of a variational inequality of the form (5.14), parameterized by z = (y, w), with F(y, u, w) = J_u(y, u) + E_u(y, u)* w (except that the space U and the convex set 𝒞 are not yet specified), and two operator equations. For convenient notation, we introduce the Lagrange function

  L : Y × U × W → R,  L(y, u, w) = J(y, u) + ⟨w, E(y, u)⟩_{W,W*}.

Then the operators appearing in (5.21)–(5.23) are L_u(ȳ, ū, w̄), L_y(ȳ, ū, w̄), and L_w(ȳ, ū, w̄), respectively. Therefore, we can write (5.21)–(5.23) in the form

  ū ∈ 𝒞,  ⟨L_u(ȳ, ū, w̄), v − ū⟩_{U*,U} ≥ 0  ∀ v ∈ 𝒞,    (5.24)
  L_y(ȳ, ū, w̄) = 0,    (5.25)
  E(ȳ, ū) = 0.    (5.26)

Our aim is to reformulate the variational inequality as an equivalent nonsmooth operator equation. To this end, we consider U = L^p(Ω)^m, p ∈ (1,∞], Ω bounded with µ(Ω) > 0, and assume that 𝒞 has appropriate structure. In the following we analyze the case where 𝒞 is described by pointwise convex constraints of the form (5.8) and assume that a continuous function π : R^m × R^m → R^m with the property (5.10) is available. Note that this problem class includes the NCP and the bound-constrained VIP in normal form as special cases. According to Proposition 5.8, we can reformulate (5.24) as Π(y, u, w) = 0, where

  Π(y, u, w)(ω) = π(u(ω), L_u(y, u, w)(ω)),  ω ∈ Ω,

and thus (y, u, w) is a KKT triple if and only if it solves the system

  Σ(y, u, w) := (L_y(y, u, w), Π(y, u, w), E(y, u)) = 0.    (5.27)

We continue by considering two approaches, parallel to the situations in Assumption 5.10 and Assumption 5.14, respectively.

The first approach requires the following hypotheses:

Assumption 5.17. There exist r, p′ with 1 ≤ r < min{p, p′} ≤ ∞ such that:

(a) E : Y × L^p(Ω)^m → W* and J : Y × L^p(Ω)^m → R are twice continuously differentiable.

(b) The operator (y, u, w) ∈ Y × L^p(Ω)^m × W ↦ L_u(y, u, w) ∈ L^r(Ω)^m is well-defined and continuously differentiable.

(c) The operator (y, u, w) ∈ Y × L^p(Ω)^m × W ↦ L_u(y, u, w) ∈ L^{p′}(Ω)^m is well-defined and locally Lipschitz continuous.

(d) π is Lipschitz continuous and semismooth.

Remark 5.18. Variants of Assumption 5.17 are possible.

We obtain:

Theorem 5.19. Let Assumption 5.17 hold. Then the operator Σ : Y × L^p(Ω)^m × W → Y* × L^r(Ω)^m × W* defined in (5.27) is locally Lipschitz continuous and ∂_C Σ-semismooth with ∂_C Σ = L_y′ × ∂_C Π × E′. More precisely, ∂_C Σ(y, u, w) is the set of all M ∈ L(Y × [L^p]^m × W, Y* × [L^r]^m × W*) of the form

  M = [ L_yy(y, u, w)       L_yu(y, u, w)               E_y(y, u)*
        D_2 L_uy(y, u, w)   D_1 I + D_2 L_uu(y, u, w)   D_2 E_u(y, u)*
        E_y(y, u)           E_u(y, u)                   0              ],    (5.28)

where D_i ∈ L^∞(Ω)^{m×m}, (D_1|D_2)(ω) ∈ ∂_C π(u(ω), L_u(y, u, w)(ω)).

Proof. We set Z = Y × W and F(y, w, u) = L_u(y, u, w). Assumption 5.17 then implies Assumption 5.10, and thus Π is locally Lipschitz continuous and ∂_C Π-semismooth by Theorem 5.11. From the differentiability requirements in Assumption 5.17 we obtain the local Lipschitz continuity and, by Proposition 3.3, the L_y′- and E′-semismoothness of the first and third components of Σ, respectively. Proposition 3.5 now yields the local Lipschitz continuity and the ∂_C Σ-semismoothness of Σ for ∂_C Σ = L_y′ × ∂_C Π × E′. The elements of ∂_C Σ(y, u, w) are easily seen to be given by (5.28).  □


In Example 5.23, we apply Theorem 5.19 to a control problem.

A second approach for establishing the semismoothness of Π relies on the following hypotheses:

Assumption 5.20. There exist r, p′ with 1 ≤ r < p′ ≤ ∞ such that:

(i) E : Y × L^r(Ω)^m → W* and J : Y × L^r(Ω)^m → R are twice continuously differentiable.

(ii) L_u has the form L_u(y, u, w) = λu + G(y, u, w) with λ > 0 and:

(a) G : Y × L^r(Ω)^m × W → L^r(Ω)^m is continuously Fréchet differentiable.

(b) The operator (y, u, w) ∈ Y × L^r(Ω)^m × W ↦ G(y, u, w) ∈ L^{p′}(Ω)^m is locally Lipschitz continuous.

(iii) The function π is defined by π(x_1, x_2) = x_1 − P_C(x_1 − λ^{-1} x_2) and the projection P_C onto C is semismooth.

Theorem 5.21. Let Assumption 5.20 hold. Then we have

  Π(y, u, w)(ω) = u(ω) − P_C(−λ^{-1} G(y, u, w)(ω)),

and Σ : Y × L^r(Ω)^m × W → Y* × L^r(Ω)^m × W* is locally Lipschitz continuous and ∂_C Σ-semismooth. Hereby, ∂_C Σ(y, u, w) is the set of all M ∈ L(Y × L^r(Ω)^m × W, Y* × L^r(Ω)^m × W*) of the form

  M = [ L_yy(y, u, w)            L_yu(y, u, w)                E_y(y, u)*
        λ^{-1} D G_y(y, u, w)    I + λ^{-1} D G_u(y, u, w)    λ^{-1} D G_w(y, u, w)
        E_y(y, u)                E_u(y, u)                    0                     ]    (5.29)

with

  D ∈ L^∞(Ω)^{m×m},  D(ω) ∈ ∂_C P_C(−λ^{-1} G(y, u, w)(ω)) on Ω.    (5.30)

Proof. Assumption 5.20 implies Assumption 5.14 for Z = Y × W and F(y, w, u) = L_u(y, u, w). Theorem 5.15 is applicable and yields the local Lipschitz continuity and ∂_C Π-semismoothness of Π : Y × L^r(Ω)^m × W → L^r(Ω)^m, where ∂_C Π(y, u, w) is the set of all M_Π ∈ L(Y × L^r(Ω)^m × W, L^r(Ω)^m) of the form

  M_Π = (λ^{-1} D G_y(y, u, w) | I + λ^{-1} D G_u(y, u, w) | λ^{-1} D G_w(y, u, w)),

where D is as in the theorem. From Assumption 5.20 and Proposition 3.3 follow the local Lipschitz continuity as well as the L_y′- and E′-semismoothness of the first and third components of Σ, respectively. Therefore, the operator

  Σ : Y × L^r(Ω)^m × W → Y* × L^r(Ω)^m × W*

is locally Lipschitz continuous and, by Proposition 3.5, ∂_C Σ-semismooth with ∂_C Σ = L_y′ × ∂_C Π × E′. It is straightforward to verify that the elements of ∂_C Σ(y, u, w) are exactly the operators M in (5.29).  □


Remark 5.22. If P_C is α-order semismooth, it is easy to modify Assumption 5.20 and Theorem 5.21 such that higher order semismoothness of Π can be established.

The following example illustrates how Theorems 5.19 and 5.21 can be applied in practice.

Example5.23. LetΩ ⊂ Rn bea boundedLipschitzdomainandconsiderthecon-trol problem

minimizey∈H1

0 (Ω),u∈L2(Ω)

12

∫Ω

(y(x)− yd(x))2dx+λ

2

∫Ω

u(x)2dx

subjectto −∆y = f + gu onΩ β1 ≤ u ≤ β2 onΩ.(5.31)

This is a problemof theform (5.19)with U = L2(Ω), Y = H10 (Ω),W = H1

0 (Ω),W ∗ = H−1(Ω),C = [β1, β2], C definedin (5.8),and

J(y, u) =12

∫Ω

(y(x)− yd(x))2dx+λ

2

∫Ω

u(x)2dx,

E(y, u) = −∆y − f − gu.

We assume−∞ < β1 < β2 < +∞, yd ∈ L2(Ω), λ > 0, f ∈ H−1(Ω), andg ∈ L∞(Ω). Observe that

(a) J is strictly convex,

(b) (y, u) : −∆y = f + gu, u ∈ [β1, β2] ⊂ H10 (Ω)× L2(Ω) is closed,convex,

andbounded.

In (b) we have usedthat−∆ ∈ L(H10 ,H

−1) is a homeomorphism.Hence,by astandardresult[49, Prop.II.1.2], thereexistsa uniquesolution(y, u) ∈ H1

0 (Ω) ×L2(Ω) to theproblem.

SinceC ⊂ max|β1|, |β2|BL∞ , we have u ∈ Lp(Ω) for all p ∈ [1,∞]. Hence,insteadof considering(5.31)asaproblemposedinH1

0 (Ω)×L2(Ω) wecanequallywell treatit in Y × U = H1

0 (Ω)× Lp(Ω), with arbitraryp ∈ [2,∞], whichwewilldo in thefollowing.

Thecontinuousinvertibility of Ey(y, u) = −∆ ∈ L(H10 ,H

−1) guaranteesthatRobinson’sregularitycondition(5.20)is satisfied,sothatthesolution(y, u) is char-acterizedby (5.24)–(5.26),wherew ∈ W = H1

0 (Ω) is the Lagrangemultiplier.Usingintegrationby parts,wehavefor y,w ∈ H1

0 (Ω)

〈−∆y,w〉H−1,H10

=∫Ω

∇y(x) · ∇w(x)dx = 〈−∆w, y〉H−1,H10.

Hence,

L(y, u,w) = J(y, u) + 〈−∆w, y〉H−1,H10− (f + gu,w)L2 .

Therefore,

L_y(y, u, w) = y − y_d − ∆w,  L_u(y, u, w) = λu − gw,

and (5.24)–(5.26) are satisfied by the triple (y, u, w) if and only if it solves the system

u ∈ Lᵖ(Ω), u ∈ C, (λu − gw, v − u)_{L²} ≥ 0 ∀ v ∈ Lᵖ(Ω), v ∈ C,  (5.32)

y − y_d − ∆w = 0,  (5.33)

−∆y = f + gu.  (5.34)

Now, let q be arbitrary with q ∈ (2, ∞] if n = 1, q ∈ (2, ∞) if n = 2, and q ∈ (2, 2n/(n − 2)] if n ≥ 3. Then the continuous embedding H¹₀(Ω) ↪ L^q(Ω) implies that the operator

(y, u, w) ∈ Y × Lᵖ(Ω) × W ↦ L_u(y, u, w) = λu − gw ∈ L^q(Ω)

is continuous linear and thus C∞ for all p ≥ q.

It is now straightforward to see that Assumption 5.17 (a)–(c) holds for any p ∈ (2, ∞], p′ ∈ (2, min{p, q}] with q > 2 as specified, and any r ∈ [2, p′). For π we can choose any Lipschitz continuous and semismooth MCP-function for the interval [β₁, β₂] to meet Assumption 5.17 (d). This makes Theorem 5.19 applicable.

Now we turn to the situation of Assumption 5.20. Obviously, for r = 2 and p′ = q, Assumptions 5.20 (i), (ii) hold with G(y, u, w) = −gw. Further, P_C(x) = max{β₁, min{x, β₂}} is 1-order semismooth, so that Assumption 5.20 (iii) also holds. Hence, Theorem 5.21 is applicable.
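To make this concrete, the following is a minimal finite-difference sketch of Example 5.23 in one space dimension, solved by a semismooth Newton iteration on the system (5.32)–(5.34) with the projection-based reformulation of Assumption 5.20. The grid, the data f, g, y_d, and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical 1D discretization of Example 5.23: Omega = (0,1), n interior
# points, A approximates the negative Laplacian; lam, b1, b2, g, f, yd are
# sample data.
n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
lam, b1, b2 = 1e-2, -0.5, 0.5
g = np.ones(n)
f = np.zeros(n)
yd = np.sin(np.pi * x)

def Sigma(y, u, w):
    """Residual of (5.32)-(5.34) with Pi(y,u,w) = u - P_C(u - Lu/lam);
    since Lu = lam*u - g*w, the projection argument reduces to g*w/lam."""
    Pi = u - np.clip(g * w / lam, b1, b2)
    return np.concatenate([y - yd + A @ w,       # adjoint equation (5.33)
                           Pi,                   # projection form of (5.32)
                           A @ y - f - g * u])   # state equation (5.34)

I_, Z = np.eye(n), np.zeros((n, n))
y = np.zeros(n); u = np.zeros(n); w = np.zeros(n)
for it in range(50):
    r = Sigma(y, u, w)
    if np.linalg.norm(r) < 1e-9:
        break
    p = g * w / lam
    D = np.diag(((p > b1) & (p < b2)).astype(float))  # element of ∂_C P_C
    M = np.block([[I_, Z, A],
                  [Z, I_, -D @ np.diag(g) / lam],
                  [A, -np.diag(g), Z]])
    s = np.linalg.solve(M, -r)
    y += s[:n]; u += s[n:2 * n]; w += s[2 * n:]

print(it, np.linalg.norm(Sigma(y, u, w)))
```

Since the discretized Σ is piecewise linear, the iteration typically identifies the correct active set and terminates after a handful of steps.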

Having established the semismoothness of the operator Σ, we can apply the (projected) semismooth Newton method (Algorithm 3.13 or 3.17) for the solution of (5.27). For the superlinear convergence results, Theorems 3.15 and 3.19, respectively, the regularity condition of Assumption 3.14 or one of its variants, Assumption 3.20 or 3.23, respectively, has to be satisfied. Essentially, these assumptions require the bounded invertibility of some or all elements of ∂CΣ, viewed as operators between appropriate spaces, near the solution. In the next section we establish a relation between ∂CΣ and the generalized differential of the reformulated reduced problem. This relation can then be used to show that regularity conditions for the reduced problem imply regularity of the full problem (5.27). Further, we discuss how smoothing steps can be constructed for the scenario of Assumption 5.17. As we will see, in the setting of Assumption 5.20 no smoothing step is required.

5.2.2 Connections to the Reduced Problem

We consider the problem (5.19) and, in parallel, the reduced problem

minimize j(u) subject to u ∈ C,  (5.35)

where j(u) = J(y(u), u) and y(u) ∈ Y is such that

E(y(u), u) = 0.  (5.36)

We assume that y(u) exists uniquely for all u in a neighborhood V of C (this can be relaxed, see Remark 5.24) and that E_y(y(u), u) is continuously invertible. Then, by the implicit function theorem, the mapping u ∈ U ↦ y(u) ∈ Y is twice continuously differentiable.

The adjoint representation of the gradient j′(u) ∈ U* is given by

j′(u) = J_u(y(u), u) + E_u(y(u), u)* w(u),

where w = w(u) ∈ W solves the adjoint equation

E_y(y(u), u)* w = −J_y(y(u), u),  (5.37)

see Appendix A.1. In terms of the Lagrange function

L(y, u, w) = J(y, u) + ⟨w, E(y, u)⟩_{W,W*}

this can be written as

j′(u) = L_u(y(u), u, w(u)),  (5.38)

where w(u) satisfies

L_y(y(u), u, w(u)) = 0.  (5.39)

Any solution u ∈ U of (5.35) satisfies the first-order necessary optimality conditions for (5.35):

u ∈ C, ⟨j′(u), v − u⟩_{U*,U} ≥ 0 ∀ v ∈ C.  (5.40)
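The adjoint representation can be sanity-checked numerically on a linear-quadratic toy instance; the data below are illustrative stand-ins, not from the text.

```python
import numpy as np

# Toy check of (5.37)/(5.38): take E(y,u) = A y - B u (so E_y = A, E_u = -B)
# and J(y,u) = 0.5*||y - yd||^2 + 0.5*lam*||u||^2. Then the adjoint equation
# is A^T w = -(y - yd) and j'(u) = lam*u - B^T w. All data are sample values.
rng = np.random.default_rng(5)
n, lam = 6, 0.1
A = rng.standard_normal((n, n)) + 4.0 * np.eye(n)   # invertible stand-in E_y
B = rng.standard_normal((n, n))
yd = rng.standard_normal(n)

def j(u):
    y = np.linalg.solve(A, B @ u)                   # state equation E(y,u)=0
    return 0.5 * np.sum((y - yd) ** 2) + 0.5 * lam * np.sum(u ** 2)

def jprime(u):
    y = np.linalg.solve(A, B @ u)
    w = np.linalg.solve(A.T, -(y - yd))             # adjoint equation (5.37)
    return lam * u - B.T @ w                        # J_u + E_u^* w

u = rng.standard_normal(n)
fd = np.array([(j(u + 1e-6 * e) - j(u - 1e-6 * e)) / 2e-6 for e in np.eye(n)])
print(np.max(np.abs(jprime(u) - fd)))               # small
```

Because j is quadratic here, the central difference agrees with the adjoint gradient up to rounding error.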

Now, setting y = y(u) and combining (5.40) with (5.38), (5.39), and (5.36), we can write (5.40) equivalently as

u ∈ C, ⟨L_u(y, u, w), v − u⟩_{U*,U} ≥ 0 ∀ v ∈ C,

L_y(y, u, w) = 0,

E(y, u) = 0.

These are exactly the KKT-conditions (5.24)–(5.26) of problem (5.19). Therefore, if ū ∈ U is a critical point of (5.35), i.e., if ū satisfies (5.40), then (ȳ, ū, w̄) = (y(ū), ū, w(ū)) is a KKT-triple of (5.19), i.e., (ȳ, ū, w̄) satisfies (5.24)–(5.26). Conversely, if (ȳ, ū, w̄) is a KKT-triple of (5.19), then there holds ȳ = y(ū), w̄ = w(ū), and ū is a critical point of (5.35).

Remark 5.24. We have assumed that y(u) exists uniquely, with E_y(y(u), u) continuously invertible, for all u in a neighborhood of C. This requirement can be relaxed. In fact, let (ȳ, ū, w̄) be a KKT-triple of (5.19) and assume that E_y(ȳ, ū) is continuously invertible. Then, by the implicit function theorem, there exist neighborhoods V_U of ū and V_Y of ȳ and a unique mapping u ∈ V_U ↦ y(u) ∈ V_Y with y(ū) = ȳ and E(y(u), u) = 0 for all u ∈ V_U. Furthermore, y(u) is twice continuously differentiable. Introducing j(u) = J(y(u), u), u ∈ V_U, we see as above that (5.24)–(5.26) and (5.40) are equivalent.

Due to this equivalence of the optimality systems for (5.19) and (5.35), we expect to find close relations between Newton methods for the solution of (5.24)–(5.26) and those for the solution of (5.40). This is the objective of the next section.


5.2.3 Relations between Full and Reduced Newton System

We now return to problems (5.19) with U = Lᵖ(Ω)ᵐ and

C = {u ∈ Lᵖ(Ω)ᵐ : u(ω) ∈ C, ω ∈ Ω},

where C ⊂ Rᵐ is closed and convex. As in Remark 5.24, let us suppose that (ȳ, ū, w̄) is a KKT-triple with continuously invertible operator E_y(ȳ, ū) and denote by y(u) the locally unique control-to-state mapping with y(ū) = ȳ.

We consider the reformulation (5.27) of (5.24)–(5.26) under Assumption 5.17. If we work with exact elements M of the generalized differential ∂CΣ(y, u, w), the semismooth Newton method for the solution of (5.27) requires solving systems of the form Ms = −Σ(y, u, w). According to Theorem 5.19, these systems assume the form (written as an augmented matrix)

( L_yy      L_yu             E_y*     | ρ₁ )
( D₂L_uy    D₁I + D₂L_uu     D₂E_u*   | ρ₂ )    (5.41)
( E_y       E_u              0        | ρ₃ )

where we have omitted the arguments (y, u, w) and (y, u). By the Banach theorem, E_y(y, u) is continuously invertible in a neighborhood of (ȳ, ū) with uniformly bounded inverse. Using this, we can perform the following block elimination:

( L_yy      L_yu             E_y*     | ρ₁ )
( D₂L_uy    D₁I + D₂L_uu     D₂E_u*   | ρ₂ )
( E_y       E_u              0        | ρ₃ )

⇕  (Row 1 − L_yy E_y⁻¹ × Row 3)

( 0         L_yu − L_yy E_y⁻¹E_u     E_y*     | ρ₁ − L_yy E_y⁻¹ρ₃ )
( D₂L_uy    D₁I + D₂L_uu             D₂E_u*   | ρ₂ )
( E_y       E_u                      0        | ρ₃ )

⇕  (Row 2 − D₂L_uy E_y⁻¹ × Row 3)

( 0      L_yu − L_yy E_y⁻¹E_u              E_y*     | ρ₁ − L_yy E_y⁻¹ρ₃ )
( 0      D₁I + D₂(L_uu − L_uy E_y⁻¹E_u)    D₂E_u*   | ρ₂ − D₂L_uy E_y⁻¹ρ₃ )
( E_y    E_u                               0        | ρ₃ )

⇕  (Row 2 − D₂E_u*(E_y*)⁻¹ × Row 1)

( 0      L_yu − L_yy E_y⁻¹E_u    E_y*   | ρ₁ − L_yy E_y⁻¹ρ₃ )
( 0      D₁I + D₂H               0      | ρ₂′ )
( E_y    E_u                     0      | ρ₃ )

where

H(y, u, w) = L_uu − L_uy E_y⁻¹E_u − E_u*(E_y*)⁻¹L_yu + E_u*(E_y*)⁻¹L_yy E_y⁻¹E_u,  (5.42)

ρ₂′ = ρ₂ − D₂E_u*(E_y*)⁻¹ρ₁ + D₂(E_u*(E_y*)⁻¹L_yy − L_uy)E_y⁻¹ρ₃.

The operator H can be written in the form

H = T* ( L_yy  L_yu ) T,    T(y, u) = ( −E_y⁻¹E_u )
       ( L_uy  L_uu )                 (     I     ).

Therefore, the continuous invertibility of M is closely related to the continuous invertibility of the operator D₁I + D₂H.
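The block-elimination identity can be checked numerically in finite dimensions: for random stand-in matrices, the Schur complement of M with respect to the (y, w)-blocks must equal D₁ + D₂H (here with λ = 1, so D₁ = I − D and D₂ = D). A sketch:

```python
import numpy as np

# Numerical check: eliminating the y- and w-blocks of M in (5.41) reproduces
# the Schur complement D1 + D2*H with H from (5.42). Random stand-in data.
rng = np.random.default_rng(0)
n = 5
Lyy, Lyu = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Luy, Luu = Lyu.T.copy(), rng.standard_normal((n, n))
Ey = rng.standard_normal((n, n)) + 5 * np.eye(n)   # invertible "state" block
Eu = rng.standard_normal((n, n))
D = np.diag(rng.random(n))                          # element of the differential
D1, D2 = np.eye(n) - D, D                           # lambda = 1 for simplicity
I, Z = np.eye(n), np.zeros((n, n))

M = np.block([[Lyy, Lyu, Ey.T],
              [D2 @ Luy, D1 + D2 @ Luu, D2 @ Eu.T],
              [Ey, Eu, Z]])

# reduced Hessian H = T^* [[Lyy, Lyu],[Luy, Luu]] T with T = [-Ey^{-1} Eu; I]
T = np.vstack([-np.linalg.solve(Ey, Eu), I])
H = T.T @ np.block([[Lyy, Lyu], [Luy, Luu]]) @ T

# Schur complement of M onto the u-block (eliminate the y- and w-rows/columns)
idx_u = slice(n, 2 * n)
idx_yw = np.r_[0:n, 2 * n:3 * n]
S = (M[idx_u, :][:, idx_u]
     - M[idx_u, :][:, idx_yw] @ np.linalg.solve(M[np.ix_(idx_yw, idx_yw)],
                                                M[idx_yw, :][:, idx_u]))
print(np.allclose(S, D1 + D2 @ H))   # True
```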

We now consider the reduced objective function j(u) = J(y(u), u) in a neighborhood of ū. It is shown in Appendix A.1 that the Hessian j″(u) can be represented in the form

j″(u) = T(y, u)* ( L_yy(y, u, w)  L_yu(y, u, w) ) T(y, u),
                 ( L_uy(y, u, w)  L_uu(y, u, w) )

T(y, u) = ( −E_y(y, u)⁻¹E_u(y, u) )
          (            I          ),

where y = y(u) and w = w(u) is the adjoint state, given by the adjoint equation (5.37), which can also be written in the form (5.39). Therefore, we see that j″(u) = H(y(u), u, w(u)) and, hence, j″(ū) = H(ȳ, ū, w̄), since ȳ = y(ū) and w̄ = w(ū). For (y, u, w) = (y(u), u, w(u)) we have L_u(y(u), u, w(u)) = j′(u) by (5.38). Hence, with D = (D₁ | D₂),

D(ω) ∈ ∂Cπ(u(ω), L_u(y(u), u, w(u))(ω)) ⟺ D(ω) ∈ ∂Cπ(u(ω), j′(u)(ω)).

Thus, by Theorems 5.11 and 5.19, for any (y, u, w) = (y(u), u, w(u)) and all operators M of the form (5.28), the Schur complement satisfies

M_R = D₁I + D₂H(y(u), u, w(u)) ∈ ∂CΠ_R(u),

where

Π_R(u)(ω) = π(u(ω), j′(u)(ω)).

For the application of the class of (projected) semismooth Newton methods to problem (5.27) we need the invertibility of M_k ∈ ∂CΣ(y_k, u_k, w_k) as an operator between appropriate spaces. We already observed that for the reduced problem it is appropriate to require the uniformly bounded invertibility of M_k^R ∈ ∂CΠ_R(u_k) in L([Lʳ]ᵐ, [Lʳ]ᵐ). In agreement with this we now require:

Assumption 5.25. At least one of the following conditions holds:

(a) The operators M_k ∈ ∂CΣ(y_k, u_k, w_k) are continuously invertible elements of L(Y × [Lʳ]ᵐ × W, Y* × [Lʳ]ᵐ × W*) with the norms of their inverses bounded by a constant C_{M⁻¹}.

(b) There exist constants η > 0 and C_{M⁻¹} > 0 such that, for all (y, u, w) ∈ (ȳ, ū, w̄) + ηB_{Y×[Lᵖ]ᵐ×W}, every M ∈ ∂CΣ(y, u, w) is an invertible element of L(Y × [Lʳ]ᵐ × W, Y* × [Lʳ]ᵐ × W*) with the norm of its inverse bounded by C_{M⁻¹}.

This assumption corresponds to Assumption 3.11 (i) with 𝒴₀ = Y × [Lʳ]ᵐ × W.

Under Assumptions 5.17, 5.25, and 3.11 (ii) (ensuring the availability of a smoothing step), we can apply Algorithm 3.9 or its projected version, Algorithm 3.17 (with B_k = M_k and, e.g., K = C), for f = Σ, ∂*f = ∂CΣ, 𝒴 = Y × [Lᵖ]ᵐ × W, 𝒵 = Y* × [Lʳ]ᵐ × W*, and 𝒴₀ = Y × [Lʳ]ᵐ × W. Theorems 3.12 and 3.19 then guarantee superlinear convergence since, by Theorem 5.19, Σ is ∂CΣ-semismooth. In Section 5.2.4 we will propose a way of constructing smoothing steps.

In the same way, we can consider reformulations arising under Assumption 5.20. In this case we have

L_u(y, u, w) = λu + G(y, u, w),  π(x) = x₁ − P_C(x₁ − λ⁻¹x₂).

Further, for all M ∈ ∂CΣ(y, u, w), there exists D ∈ L∞(Ω)^{m×m} with D ∈ ∂C P_C(−λ⁻¹G(y, u, w)) such that

M = ( L_yy       L_yu                    E_y*      )
    ( λ⁻¹DG_y    I + λ⁻¹DG_u             λ⁻¹DG_w   )
    ( E_y        E_u                     0         )

  = ( L_yy       L_yu                    E_y*      )
    ( λ⁻¹DL_uy   I + λ⁻¹D(L_uu − λI)     λ⁻¹DE_u*  )
    ( E_y        E_u                     0         )

  = ( L_yy       L_yu                    E_y*      )
    ( D₂L_uy     D₁I + D₂L_uu            D₂E_u*    )
    ( E_y        E_u                     0         )

with D₁ = I − D and D₂ = λ⁻¹D. Note that (D₁, D₂) ∈ ∂Cπ(u, L_u(y, u, w)) and, hence, for these choices of D₁ and D₂, the operator M assumes the form (5.28). Thus, we can apply the same transformations to the Newton system as before and obtain again that, for (y, u, w) = (y(u), u, w(u)), the generalized differentials of the reduced semismooth reformulation appear as Schur complements of the full system. As regularity condition we choose:

Assumption 5.26. At least one of the following conditions holds:

(a) The operators M_k ∈ ∂CΣ(y_k, u_k, w_k) are continuously invertible elements of L(Y × [Lʳ]ᵐ × W, Y* × [Lʳ]ᵐ × W*) with the norms of their inverses uniformly bounded by a constant C_{M⁻¹}.

(b) There exist constants η > 0 and C_{M⁻¹} > 0 such that, for all (y, u, w) ∈ (ȳ, ū, w̄) + ηB_{Y×[Lʳ]ᵐ×W}, every M ∈ ∂CΣ(y, u, w) is an invertible element of L(Y × [Lʳ]ᵐ × W, Y* × [Lʳ]ᵐ × W*) with the norm of its inverse bounded by C_{M⁻¹}.

This assumption corresponds to Assumption 3.11 (i) with 𝒴₀ = 𝒴 = Y × [Lʳ]ᵐ × W. Now, under Assumptions 5.20 and 5.26, we can apply Algorithm 3.9 or its projected version, Algorithm 3.17, for f = Σ, ∂*f = ∂CΣ, 𝒴 = 𝒴₀ = Y × [Lʳ]ᵐ × W, and 𝒵 = Y* × [Lʳ]ᵐ × W*. Since 𝒴₀ = 𝒴, we do not need a smoothing step. Theorems 3.12 and 3.19 establish superlinear convergence since, by Theorem 5.21, Σ is ∂CΣ-semismooth.

5.2.4 Smoothing Steps

In addition to Assumption 5.17, we require:

Assumption 5.27. The derivative L_u has the form L_u(y, u, w) = λu + G(y, u, w), with

(y, u, w) ∈ Y × Lʳ(Ω)ᵐ × W ↦ G(y, u, w) ∈ Lᵖ(Ω)ᵐ

being locally Lipschitz continuous.

Example 5.28. We verify this assumption for the control problem of Example 5.23. There, we had Y = W = H¹₀, U = Lᵖ with p ≥ 2 arbitrary, and

L_u(y, u, w) = λu − gw = λu + G(y, u, w) with G(y, u, w) = −gw.

Since g ∈ L∞ and w ∈ H¹₀ ↪ L^q for all q ∈ [1, ∞] if n = 1, all q ∈ [1, ∞) if n = 2, and all q ∈ [1, 2n/(n − 2)] if n ≥ 3, we see that G maps Y × Lʳ × W, with r ≥ 2 arbitrary, linearly and continuously into L^q. Thus, Assumption 5.27 holds for all p ∈ (2, q].

We can show:

Theorem 5.29. Let Assumptions 5.17 and 5.27 hold. Then the operator

S : Y × Lʳ(Ω)ᵐ × W → Y × Lᵖ(Ω)ᵐ × W,

S(y, u, w) = ( y, P_C(u − λ⁻¹L_u(y, u, w)), w ),

defines a smoothing step.

Proof. We first note that

x₁ = P_C(x₁ − λ⁻¹x₂) ⟺ x₁ = P_C(x₁ − x₂) ⟺ π(x) = 0,

so that

u = P_C(u − λ⁻¹L_u(y, u, w)) ⟺ Π(y, u, w) = 0.

Hence, for any solution (ȳ, ū, w̄) of (5.27), we have

S(ȳ, ū, w̄) = (ȳ, ū, w̄).

Furthermore, as in Section 4.1, pointwise on Ω there holds

‖P_C(u − λ⁻¹L_u(y, u, w)) − ū‖₂
  = ‖P_C(u − λ⁻¹L_u(y, u, w)) − P_C(ū − λ⁻¹L_u(ȳ, ū, w̄))‖₂
  = ‖P_C(−λ⁻¹G(y, u, w)) − P_C(−λ⁻¹G(ȳ, ū, w̄))‖₂
  ≤ λ⁻¹‖G(y, u, w) − G(ȳ, ū, w̄)‖₂,

and thus, with C_G denoting the local Lipschitz constant of G near (ȳ, ū, w̄),

‖P_C(u − λ⁻¹L_u(y, u, w)) − ū‖_{[Lᵖ]ᵐ} ≤ C_G c λ⁻¹‖(y, u, w) − (ȳ, ū, w̄)‖_{Y×[Lʳ]ᵐ×W},

where c depends on m only. The proof is complete, since

‖S(y, u, w) − (ȳ, ū, w̄)‖_{Y×[Lᵖ]ᵐ×W} ≤ c(‖(y, w) − (ȳ, w̄)‖_{Y×W} + ‖P_C(u − λ⁻¹L_u(y, u, w)) − ū‖_{[Lᵖ]ᵐ}). □
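The key inequality in the proof rests only on the nonexpansiveness of the projection P_C; a quick numerical illustration with sample scalar data:

```python
import numpy as np

# Illustration of the estimate |P_C(-G1/lam) - P_C(-G2/lam)| <= |G1 - G2|/lam,
# i.e. the 1-Lipschitz continuity of the projection onto C = [b1, b2].
# All data below are random stand-ins for G(y,u,w) and G(ybar,ubar,wbar).
rng = np.random.default_rng(4)
b1, b2, lam = -1.0, 1.0, 0.5
P = lambda t: np.clip(t, b1, b2)
G1, G2 = rng.standard_normal(1000) * 5, rng.standard_normal(1000) * 5
lhs = np.abs(P(-G1 / lam) - P(-G2 / lam))
rhs = np.abs(G1 - G2) / lam
print(np.all(lhs <= rhs + 1e-12))   # True
```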

5.2.5 Regularity Conditions

We already observed that the all-at-once Newton system is closely related to the black-box Newton system. In this section we show how the regularity of the all-at-once Newton system can be reduced to regularity conditions on its Schur complement. Since, for (y, u, w) = (y(u), u, w(u)), this Schur complement coincides with the operator of the black-box Newton system, sufficient conditions for regularity can then be developed along the lines of Section 4.3. In the following, we restrict our investigations to the situation of Assumptions 5.20 and 5.26.

Our hypothesis on the Schur complement is:

Assumption 5.30. There exist constants η > 0 and C_{RM⁻¹} > 0 such that, for all (y, u, w) ∈ (ȳ, ū, w̄) + ηB_{Y×[Lʳ]ᵐ×W}, the following holds:

(i) E_y(y, u) ∈ L(Y, W*) is continuously invertible with uniformly bounded inverse.

(ii) For all D satisfying (5.30), the Schur complement

D₁I + D₂H,

with D₁ = I − D, D₂ = λ⁻¹D, and H as defined in (5.42), is an invertible element of L([Lʳ]ᵐ, [Lʳ]ᵐ) with ‖(D₁I + D₂H)⁻¹‖_{[Lʳ]ᵐ,[Lʳ]ᵐ} ≤ C_{RM⁻¹}.

Theorem 5.31. Let Assumptions 5.20 and 5.30 hold. Then the regularity condition of Assumption 5.26 (b) holds.

Proof. Let (y, u, w) ∈ (ȳ, ū, w̄) + ηB_{Y×[Lʳ]ᵐ×W} and M ∈ ∂CΣ(y, u, w) be arbitrary. Then there exists D satisfying (5.30) such that M assumes the form (5.29). Now consider any ρ = (ρ₁, ρ₂, ρ₃)ᵀ ∈ Y* × [Lʳ]ᵐ × W*. Then, according to Section 5.2.3, solving the system

M(s_y, s_u, s_w)ᵀ = ρ

is equivalent to

(D₁I + D₂H)s_u = ρ₂ − D₂E_u*(E_y*)⁻¹ρ₁ + D₂(E_u*(E_y*)⁻¹L_yy − L_uy)E_y⁻¹ρ₃,  (5.43)

E_y s_y = ρ₃ − E_u s_u,  (5.44)

E_y* s_w = ρ₁ − L_yy E_y⁻¹ρ₃ − (L_yu − L_yy E_y⁻¹E_u)s_u.  (5.45)

The assumptions ensure twice continuous differentiability of L and uniformly bounded invertibility of E_y and D₁I + D₂H. Furthermore, D and thus D₁, D₂ are uniformly bounded in [L∞]^{m×m} due to the Lipschitz continuity of P_C. This and (5.43)–(5.45) show that, possibly after shrinking η, there exists C_{M⁻¹} > 0 such that

‖s‖_{Y×[Lʳ]ᵐ×W} ≤ C_{M⁻¹}‖ρ‖_{Y*×[Lʳ]ᵐ×W*}

holds uniformly on (ȳ, ū, w̄) + ηB_{Y×[Lʳ]ᵐ×W}. □
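The back-substitution formulas (5.43)–(5.45) can likewise be verified numerically: for random finite-dimensional stand-in data, the triple (s_y, s_u, s_w) they produce must solve Ms = ρ. A sketch (again with λ = 1):

```python
import numpy as np

# Check that (5.43)-(5.45) reproduce the direct solution of M s = rho for
# random finite-dimensional stand-in matrices (lambda = 1, D1 = I-D, D2 = D).
rng = np.random.default_rng(6)
n = 4
Lyy = rng.standard_normal((n, n)); Lyu = rng.standard_normal((n, n))
Luy = Lyu.T.copy(); Luu = rng.standard_normal((n, n))
Ey = rng.standard_normal((n, n)) + 4 * np.eye(n)
Eu = rng.standard_normal((n, n))
D = np.diag(rng.random(n)); D1, D2 = np.eye(n) - D, D
I, Z = np.eye(n), np.zeros((n, n))
M = np.block([[Lyy, Lyu, Ey.T],
              [D2 @ Luy, D1 + D2 @ Luu, D2 @ Eu.T],
              [Ey, Eu, Z]])
r1, r2, r3 = (rng.standard_normal(n) for _ in range(3))

Eyi = np.linalg.inv(Ey)
EyTi = np.linalg.inv(Ey.T)
H = Luu - Luy @ Eyi @ Eu - Eu.T @ EyTi @ Lyu + Eu.T @ EyTi @ Lyy @ Eyi @ Eu  # (5.42)
su = np.linalg.solve(D1 + D2 @ H,
                     r2 - D2 @ Eu.T @ EyTi @ r1
                        + D2 @ (Eu.T @ EyTi @ Lyy - Luy) @ Eyi @ r3)         # (5.43)
sy = np.linalg.solve(Ey, r3 - Eu @ su)                                       # (5.44)
sw = np.linalg.solve(Ey.T, r1 - Lyy @ Eyi @ r3
                            - (Lyu - Lyy @ Eyi @ Eu) @ su)                   # (5.45)

print(np.allclose(M @ np.concatenate([sy, su, sw]),
                  np.concatenate([r1, r2, r3])))   # True
```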

6. Trust-Region Globalization

So far, we have concentrated on locally convergent Newton-type methods. We now propose a class of trust-region algorithms which are globally convergent and use (projected) Newton steps as candidates for trial steps. Hereby, we restrict ourselves to the case where the problem is posed in Hilbert space, which, from a practical point of view, is not very restrictive.

To motivate our approach, we consider (1.1) with U = L²(Ω) and a continuously differentiable function F : U → U. Using an MCP/NCP-function φ, we reformulate the problem in the form

Φ(u) = 0.  (6.1)

Let Assumption 5.1 hold with r = 2 and some p, p′ ∈ (2, ∞]. Then the operator Φ : Lᵖ(Ω) → L²(Ω) is semismooth by Theorem 5.4. Alternatively, if F assumes the form F(u) = λu + G(u) and G has the smoothing property of Section 4.2, and if Φ(u) = u − P_B(u − λ⁻¹G(u)) is chosen, then by Theorem 4.4, Φ : L²(Ω) → L²(Ω) is locally Lipschitz continuous and semismooth.

For globalization, we need a minimization problem whose solutions or critical points correspond to solutions of (6.1). We propose three different approaches to obtain these minimization reformulations:

Most naturally, we can choose the squared residual

h(u) = (1/2)‖Φ(u)‖²_{L²}

as objective function. In fact, any global minimizer of h is a solution of Φ(u) = 0 and vice versa. Therefore, (6.1) is equivalent to the minimization problem

minimize_{u∈L²(Ω)} h(u).  (6.2)

We will show that, for appropriate choices of φ, the function h(u) = ‖Φ(u)‖²_{L²}/2 is continuously differentiable. This makes (6.2) a C¹ problem posed in the Hilbert space L²(Ω).

As was discussed in the context of the projected semismooth Newton method (Algorithm 3.17), it is often desirable that the algorithm stays feasible with respect to a given closed convex set K ⊂ Lᵖ(Ω) which contains the solution ū ∈ Lᵖ(Ω). Usually K = B is chosen. We consider sets of the general form K = {a_K ≤ u ≤ b_K} with lower and upper bound functions satisfying the conditions (3.46). Then the constrained minimization problem

minimize_{u∈L²(Ω)} h(u) subject to u ∈ K  (6.3)

is equivalent to (6.1) in the sense that any global solution ū ∈ K of (6.3) solves (6.1) and vice versa.

Finally, we come to a third possibility of globalization, which can be used if the VIP is obtained from the first-order necessary optimality conditions of the constrained minimization problem

minimize j(u) subject to u ∈ B  (6.4)

with B = {u ∈ L²(Ω) : a ≤ u ≤ b} as in (1.1). Then we can use the problem (6.4) itself for the purpose of globalization.

In all three approaches, (6.2), (6.3), and (6.4), we obtain a minimization problem of the form

minimize_{u∈L²(Ω)} f(u) subject to u ∈ K.  (6.5)

For the development and analysis of the trust-region method, rather than working in L², we prefer to choose a general Hilbert space setting. This has the advantage of covering also the finite-dimensional case, and many other situations, e.g., the reformulation of mixed problems, see Section 5.2. Therefore, in the following we consider the problem

minimize_{u∈U} f(u) subject to u ∈ K,  (6.6)

where f : U → R is a continuously differentiable function that is defined on the Hilbert space U. The feasible set K ⊂ U is assumed to be nonempty, closed, and convex. In particular, there exists a unique metric projection

P_K : U → K,  P_K(u) = argmin_{v∈K} ‖v − u‖_U.

We identify the dual U* of U with U, i.e., we use ⟨·, ·⟩_{U*,U} = (·, ·)_U.

Our idea is to use projected semismooth Newton steps as trial step candidates for a trust-region globalization based on (6.6). In general, the presence of the smoothing step in the semismooth Newton method makes it difficult to rigorously prove transition to fast local convergence. There are ways to do this, but the approach would be highly technical, and thus we will prove transition to fast local convergence only for the case where the semismooth Newton method converges superlinearly without a smoothing step. This is justified for two reasons: First, in our numerical tests we usually observe fast convergence without incorporating a smoothing step in the algorithm; one reason for this is that a discretization would have to be very fine to resolve functions that yield an excessively big ‖·‖_{Lᵖ}/‖·‖_{L²} ratio. Second, in Section 4.2 we have developed a reformulation to which the semismooth Newton method is applicable without a smoothing step.

For unconstrained problems, global convergence usually means that the method "converges" to a critical point, i.e., a point ū ∈ U such that f′(ū) = 0, in the sense that at least lim inf_{k→∞} ‖f′(u_k)‖_U = 0. In the constrained context, we have to clarify what we mean by a critical point.


Definition 6.1. We call ū ∈ U a critical point of (6.6) if

ū ∈ K and (f′(ū), v − ū)_U ≥ 0 ∀ v ∈ K.  (6.7)

The following result is important:

Lemma 6.2.

(i) Let ū be a local solution of (6.6); more precisely, ū ∈ K and there exists δ > 0 such that f(v) ≥ f(ū) for all v ∈ (ū + δB_U) ∩ K. Then ū is a critical point of (6.6).

(ii) The following statements are equivalent:

(a) ū is a critical point of (6.6).
(b) ū − P_K(ū − f′(ū)) = 0.
(c) ū − P_K(ū − tf′(ū)) = 0 for some t > 0.
(d) ū − P_K(ū − tf′(ū)) = 0 for all t ≥ 0.

Proof (see also [66, §8]).

(i): For any v ∈ K, there holds v(t) = ū + t(v − ū) ∈ (ū + δB_U) ∩ K for sufficiently small t > 0 and thus

0 ≤ [f(v(t)) − f(ū)]/t → (f′(ū), v − ū)_U as t → 0⁺.

(ii): Let t > 0 be arbitrary. Condition (6.7) is equivalent to

ū ∈ K, (ū − (ū − tf′(ū)), v − ū)_U ≥ 0 ∀ v ∈ K,

which is the same as ū = P_K(ū − tf′(ū)). This proves the equivalence of (a)–(d). □

Next, we introduce the concept of criticality measures.

Definition 6.3. A continuous function χ : K → [0, ∞) with the property

χ(u) = 0 ⟺ u is a critical point of problem (6.6)  (6.8)

is called a criticality measure for (6.6).

Example 6.4. By Lemma 6.2, for any t > 0, the function

χ_{P,t}(u) = ‖u − P_K(u − tf′(u))‖_U

is a criticality measure for (6.6). For t = 1, the resulting criticality measure

χ_P(u) = χ_{P,1}(u) = ‖u − P_K(u − f′(u))‖_U

is the norm of the projected gradient.
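A minimal sketch of these criticality measures for a box-constrained toy problem (all data are illustrative):

```python
import numpy as np

# Criticality measures chi_{P,t} from Example 6.4 for f(u) = 0.5*||u - z||^2
# over K = [0,1]^n. By Lemma 6.2 they all vanish exactly at the minimizer.
rng = np.random.default_rng(1)
n = 4
z = rng.standard_normal(n) * 2
P_K = lambda u: np.clip(u, 0.0, 1.0)
fprime = lambda u: u - z

def chi(u, t=1.0):
    return np.linalg.norm(u - P_K(u - t * fprime(u)))

u_star = P_K(z)                          # minimizer of f over K
print(chi(u_star), chi(u_star, t=0.5))   # 0.0 0.0  (any t > 0 works)
print(chi(np.full(n, 0.5)) > 0)          # True at a noncritical point
```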


The algorithmthat we presentin this chapterusesideasdevelopedin the author’spaper[140] on trust-region methodsfor finite-dimensionalsemismoothequations.Othertrust-region approachesfor thesolutionof finite-dimensionalNCPsandVIPscanbefoundin, e.g.,[88, 93,119]. Trust-regionalgorithmsfor infinite-dimensionalconstrainedoptimization problemsare investigatedin, e.g., [96, 136, 144]. Themethodwe proposeallows for nonmonotonicityof thesequenceof generatedfunc-tion values.This hasprovenadvantageousto avoid convergenceto local, but non-globalsolutionsof theproblem[28, 65, 93, 137, 140].

Before we describe the trust-region algorithm, we show that for an appropriate choice of φ the function h(u) = ‖Φ(u)‖²_{L²}/2 is continuously differentiable. We begin with the following result:

Lemma 6.5. Let ψ : V → R be locally Lipschitz continuous on the nonempty open set V ⊂ Rᵐ. Assume that ψ is continuously differentiable on V \ ψ⁻¹(0). Then the function ψ² is continuously differentiable on V. Moreover, (ψ²)′(x) = 2ψ(x)g for all g ∈ ∂ψ(x) and all x ∈ V.

The simple proof can be found in [140].

Lemma 6.6. Let ψ : Rᵐ → R be Lipschitz continuous on Rᵐ and continuously differentiable on Rᵐ \ ψ⁻¹(0). Further, let G : U → L²(Ω)ᵐ be continuously differentiable. Then the function

h : u ∈ U ↦ (1/2)‖Ψ(u)‖²_{L²(Ω)} with Ψ(u)(ω) = ψ(G(u)(ω)), ω ∈ Ω,

is continuously differentiable with

h′(u) = M*Ψ(u) ∀ M ∈ ∂Ψ(u).

Remark 6.7. Note that ∂Ψ(u) ⊂ L(U, L²) by Lemma 3.37.

Proof. Using Lemma 6.5, η = ψ²/2 is continuously differentiable with η′(x) = ψ(x)g for all g ∈ ∂ψ(x). The Lipschitz continuity of ψ implies

‖η′(x)‖₂ = |ψ(x)|‖g‖₂ ≤ L(|ψ(0)| + |ψ(x) − ψ(0)|) ≤ L|ψ(0)| + L²‖x‖₂.

Hence, by Proposition A.10, the superposition operator

T : w ∈ L²(Ω)ᵐ ↦ η(w) ∈ L¹(Ω)

is continuously differentiable with derivative

(T′(w)v)(ω) = η′(w(ω))v(ω) = ψ(w(ω))gᵀv(ω) ∀ gᵀ ∈ ∂ψ(w(ω)).

From this and the chain rule we see that H : u ∈ U ↦ η(G(u)) ∈ L¹(Ω) is continuously differentiable with

(H′(u)v)(ω) = η′(G(u)(ω))(G′(u)v)(ω) = ψ(G(u)(ω))gᵀ(G′(u)v)(ω) ∀ gᵀ ∈ ∂ψ(G(u)(ω)).

Hence, H′(u)v = Ψ(u)·(Mv) for all M ∈ ∂Ψ(u). Thus, we see that h : u ∈ U ↦ ∫_Ω H(u)(ω) dω is continuously differentiable with

(h′(u), v)_U = ∫_Ω (H′(u)v)(ω) dω = ∫_Ω Ψ(u)(ω)(Mv)(ω) dω = (M*Ψ(u), v)_U

for all M ∈ ∂Ψ(u). □

Remark 6.8. The Fischer–Burmeister function φ_FB meets all requirements of Lemma 6.6. Hence, if F : L²(Ω) → L²(Ω) is continuously differentiable, then h(u) = ‖Φ(u)‖²_{L²}/2 with Φ(u) = φ_FB(u, F(u)) is continuously differentiable. The same holds true for the MCP-function φ_FB^{[α,β]} defined in (5.5).
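A finite-dimensional sketch of this gradient formula, using one common sign convention for the Fischer–Burmeister function (an assumption here) and a linear stand-in for F:

```python
import numpy as np

# h(u) = 0.5*||Phi(u)||^2 with phi(a,b) = sqrt(a^2 + b^2) - a - b applied
# componentwise to the NCP residual, F(u) = A u + b. Lemma 6.6 gives
# h'(u) = M^T Phi(u) for any M in the differential of Phi; we compare this
# with central finite differences. All data are random stand-ins.
rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T / n + np.eye(n)
b = rng.standard_normal(n)
F = lambda u: A @ u + b

def Phi(u):
    a, c = u, F(u)
    return np.sqrt(a**2 + c**2) - a - c

def grad_h(u):
    a, c = u, F(u)
    r = np.sqrt(a**2 + c**2)                 # nonzero a.s. for random u
    M = np.diag(a / r - 1.0) + np.diag(c / r - 1.0) @ A
    return M.T @ Phi(u)

u = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(0.5*np.sum(Phi(u + eps*e)**2) - 0.5*np.sum(Phi(u - eps*e)**2))
               / (2*eps) for e in np.eye(n)])
print(np.max(np.abs(grad_h(u) - fd)))        # small
```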

6.1 The Trust-Region Algorithm

We use the continuous differentiability of f to build an at least first-order accurate quadratic model

q_k(s) = (g_k, s)_U + (1/2)(s, B_k s)_U

of f(u_k + s) − f(u_k) at the current iterate u_k, where g_k := f′(u_k) ∈ U is the gradient of f at u_k. The self-adjoint operator B_k ∈ L(U, U) can be viewed as an approximation of the Hessian operator of f (if it exists). We stress, however, that the proposed trust-region method is globally convergent for very general choices of B_k, including B_k = 0.

In each iteration of the trust-region algorithm, a trial step s_k is computed as approximate solution of the

Trust-Region Subproblem:

minimize q_k(s) subject to u_k + s ∈ K, ‖s‖_U ≤ ∆_k.  (6.9)

We will assume that the trial steps meet the following two requirements:

Feasibility Condition:

u_k + s_k ∈ K and ‖s_k‖_U ≤ β₁∆_k,  (6.10)

Reduction Condition:

pred_k(s_k) := −q_k(s_k) ≥ β₂χ(u_k) min{∆_k, χ(u_k)}  (6.11)

with constants β₁ ≥ 1 and β₂ > 0 independent of k. Hereby, χ is a suitably chosen criticality measure, see Definition 6.3. Usually, the update of the trust-region radius ∆_k is controlled by the ratio of the actual reduction

ared_k(s) := f(u_k) − f(u_k + s)

and the predicted reduction pred_k(s) := −q_k(s).

It has been observed [28, 65, 93, 137] that the performance of nonlinear programming algorithms can be significantly improved by using nonmonotone line search or trust-region techniques. Hereby, in contrast to the traditional approach, the monotonicity f(u_{k+1}) ≤ f(u_k) of the function values is not enforced in every iteration. To achieve this, we generalize a nonmonotone trust-region technique that was recently introduced by the author [140] in the context of finite-dimensional semismooth equations. For this algorithm all global convergence results for monotone, finite-dimensional trust-region methods remain valid. However, the decrease requirement is significantly relaxed. Before we describe this approach and the corresponding reduction ratio ρ_k(s) in detail, we first state the basic trust-region algorithm.

Algorithm 6.9 (Trust-Region Algorithm).

1. Initialization: Choose η₁ ∈ (0, 1), ∆_min ≥ 0, and a criticality measure χ. Choose u₀ ∈ K and ∆₀ > 0 such that ∆₀ ≥ ∆_min, and a model Hessian B₀ ∈ L(U, U). Choose an integer m ≥ 1 and fix λ ∈ (0, 1/m] for the computation of ρ_k. Set k := 0 and i := −1.

2. Compute χ_k := χ(u_k). If χ_k = 0, then STOP.

3. Compute a trial step s_k satisfying the conditions (6.10) and (6.11).

4. Compute the reduction ratio ρ_k := ρ_k(s_k) by calling Algorithm 6.11 with m_k := min{i + 1, m}.

5. Compute the new trust-region radius ∆_{k+1} by invoking Algorithm 6.10.

6. If ρ_k ≤ η₁, then reject the step s_k, i.e., set u_{k+1} := u_k, B_{k+1} := B_k, increment k by 1, and go to Step 3.

7. Accept the step: Set u_{k+1} := u_k + s_k and choose a new model Hessian B_{k+1} ∈ L(U, U). Set j_{i+1} := k, increment k and i by 1, and go to Step 2.

The increasing sequence (j_i)_{i≥0} enumerates all indices of accepted steps. Moreover,

u_k = u_{j_i} ∀ j_{i−1} < k ≤ j_i, ∀ i ≥ 1.  (6.12)

Conversely, if k ≠ j_i for all i, then s_k was rejected. In the following we denote the set of all these "successful" indices j_i by S:

S := {j_i : i ≥ 0} = {k : trial step s_k is accepted}.

Sometimes, accepted steps will also be called successful. We will repeatedly use the fact that

{u_k : k ≥ 0} = {u_k : k ∈ S}.

The trust-region updates are implemented as usual. We deal with two different flavors of update rules simultaneously by introducing a nonnegative parameter ∆_min. We require that ∆_{k+1} ≥ ∆_min holds after successful steps. If ∆_min = 0 is chosen, this is automatically satisfied. For ∆_min > 0, however, it is an additional feature that allows for special proof techniques.


Algorithm 6.10 (Update of the Trust-Region Radius).

∆_min ≥ 0 and η₁ ∈ (0, 1) are the constants defined in Step 1 of Algorithm 6.9. Let η₁ < η₂ < 1 and 0 ≤ γ₀ < γ₁ < 1 < γ₂ be fixed.

1. If ρ_k ≤ η₁, then choose ∆_{k+1} ∈ (γ₀∆_k, γ₁∆_k].

2. If ρ_k ∈ (η₁, η₂), then choose ∆_{k+1} ∈ [γ₁∆_k, max{∆_min, ∆_k}] ∩ [∆_min, ∞).

3. If ρ_k ≥ η₂, then choose ∆_{k+1} ∈ (∆_k, max{∆_min, γ₂∆_k}] ∩ [∆_min, ∞).
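As a direct transcription, Algorithm 6.10 can be coded as follows; since each case only prescribes an interval for ∆_{k+1}, the function below picks one representative value per case (a sketch; the parameter defaults are illustrative).

```python
def update_radius(Delta, rho, Dmin=0.0, eta1=0.1, eta2=0.75, g1=0.5, g2=2.0):
    """One representative choice from each interval of Algorithm 6.10;
    parameter names mirror eta_1, eta_2, gamma_1, gamma_2, Delta_min."""
    if rho <= eta1:
        return g1 * Delta                    # case 1: shrink
    if rho < eta2:
        return max(Dmin, Delta)              # case 2: keep (respecting Dmin)
    return max(Dmin, g2 * Delta)             # case 3: enlarge

print(update_radius(1.0, 0.05),              # 0.5
      update_radius(1.0, 0.5),               # 1.0
      update_radius(1.0, 0.9))               # 2.0
```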

We still have to describe how the reduction ratios ρ_k(s) are defined. Here is a detailed description:

Algorithm 6.11 (Computation of the Relaxed Reduction Ratio).

1. Choose scalars λ_{kr} ≥ λ, r = 0, …, m_k − 1, with Σ_{r=0}^{m_k−1} λ_{kr} = 1.

2. Compute the relaxed actual reduction rared_k := rared_k(s_k), where

rared_k(s) := max{ f(u_k), Σ_{r=0}^{m_k−1} λ_{kr} f(u_{j_{i−r}}) } − f(u_k + s).  (6.13)

3. Compute the reduction ratio ρ_k := ρ_k(s_k) according to

ρ_k(s) := rared_k(s) / pred_k(s).

Remark 6.12. At the very beginning of Algorithm 6.9, Step 4 invokes Algorithm 6.11 with m_k = 0. In this case the sum in (6.13) is empty and thus

rared_k(s) = max{f(u_k), 0} − f(u_k + s) = f(u_k) − f(u_k + s) = ared_k(s).

The idea behind the above update rule is the following: Instead of requiring that f(u_k + s_k) be smaller than f(u_k), it is only required that f(u_k + s_k) is either less than f(u_k) or less than the weighted mean of the function values at the last m_k = min{i + 1, m} successful iterates. Of course, if m = 1, then rared_k(s) = ared_k(s) and the usual reduction ratio is recovered. Our approach is a slightly stronger requirement than the straightforward idea of replacing ared_k with

rared_k^∞(s) = max_{0≤r<m_k} f(u_{j_{i−r}}) − f(u_k + s).


Unfortunately, for this latter choice it does not seem possible to establish all the global convergence results that are available for the monotone case. For our approach, however, this is possible without making the theory substantially more difficult. Moreover, we can approximate rared_k^∞ arbitrarily accurately by rared_k if we choose λ sufficiently small, in each iteration select 0 ≤ r_k < m_k satisfying f(u_{j_{i−r_k}}) = max_{0≤r<m_k} f(u_{j_{i−r}}), and set

λ_{kr} = λ if r ≠ r_k,  λ_{k r_k} = 1 − (m_k − 1)λ.  (6.14)

6.2 Global Convergence

For the global convergence analysis we rely on the following

Assumption 6.13.

(i) The objective function f is continuously differentiable on an open neighborhood of the nonempty closed convex set K.

(ii) The function f is bounded below on K.

(iii) The norms of the model Hessians are uniformly bounded: ‖B_k‖_{U,U} ≤ C_B for all k.

Throughout this section, Assumption 6.13 is required to hold. We first prove an important decrease property of the function values f(u_k).

Lemma 6.14. Let u_k, s_k, ∆_k, j_i, etc., be generated by Algorithm 6.9. Then for all computed indices i ≥ 1 there holds

f(u_{j_i}) < f(u₀) − η₁λ Σ_{r=0}^{i−2} pred_{j_r}(s_{j_r}) − η₁pred_{j_{i−1}}(s_{j_{i−1}}) < f(u₀).  (6.15)

Proof. We will use the short notations ared_k = ared_k(s_k), rared_k = rared_k(s_k), and pred_k = pred_k(s_k). First, let us note that (6.11) implies pred_k > 0 whenever u_k is not critical. Therefore, the second inequality holds.

The proof of the first inequality is by induction. For i = 1 we have, by (6.12) and using ρ_{j₀}(s_{j₀}) > η₁,

f(u_{j₁}) = f(u_{j₀+1}) = f(u_{j₀}) − ared_{j₀} < f(u_{j₀}) − η₁pred_{j₀} = f(u₀) − η₁pred_{j₀}.

Now assume that (6.15) holds for 1, …, i.

If rared_{j_i} = ared_{j_i} then, using (6.15) and λ ≤ 1,

f(u_{j_{i+1}}) = f(u_{j_i+1}) = f(u_{j_i}) − ared_{j_i} = f(u_{j_i}) − rared_{j_i}
  < f(u₀) − η₁λ Σ_{r=0}^{i−2} pred_{j_r} − η₁pred_{j_{i−1}} − η₁pred_{j_i}
  ≤ f(u₀) − η₁λ Σ_{r=0}^{i−1} pred_{j_r} − η₁pred_{j_i}.

If rared_{j_i} ≠ ared_{j_i} then rared_{j_i} > ared_{j_i}, and with q = min{i, m} − 1 we obtain

f(u_{j_{i+1}}) = f(u_{j_i+1}) = Σ_{p=0}^{q} λ_{j_i p} f(u_{j_{i−p}}) − rared_{j_i}
  < Σ_{p=0}^{q} λ_{j_i p} ( f(u₀) − η₁λ Σ_{r=0}^{i−p−2} pred_{j_r} − η₁pred_{j_{i−p−1}} ) − η₁pred_{j_i}.

Using λ_{j_i 0} + ⋯ + λ_{j_i q} = 1, λ_{j_i p} ≥ λ, and

{0, …, q} × {0, …, i − q − 2} ⊂ {(p, r) : 0 ≤ p ≤ q, 0 ≤ r ≤ i − p − 2},

we can proceed:

f(u_{j_{i+1}}) < f(u₀) − η₁λ Σ_{r=0}^{i−q−2} (Σ_{p=0}^{q} λ_{j_i p}) pred_{j_r} − η₁λ Σ_{p=0}^{q} pred_{j_{i−p−1}} − η₁pred_{j_i}
  ≤ f(u₀) − η₁λ Σ_{r=0}^{i−q−2} pred_{j_r} − η₁λ Σ_{r=i−q−1}^{i−1} pred_{j_r} − η₁pred_{j_i}
  = f(u₀) − η₁λ Σ_{r=0}^{i−1} pred_{j_r} − η₁pred_{j_i}.

□

Lemma 6.15. Let u_k, s_k, ∆_k, etc., be generated by Algorithm 6.9. Then for arbitrary ū ∈ K with χ(ū) ≠ 0 and 0 < η < 1 there exist ∆̄ > 0 and δ > 0 such that

ρ_k ≥ η

holds whenever ‖u_k − ū‖_U ≤ δ and ∆_k ≤ ∆̄ are satisfied.

Proof. Since χ(ū) ≠ 0, by continuity there exist δ > 0 and ε > 0 such that χ(u_k) ≥ ε for all k with ‖u_k − ū‖_U ≤ δ. Now, for 0 < ∆̄ ≤ ε and any k with ‖u_k − ū‖_U ≤ δ and 0 < ∆_k ≤ ∆̄, we obtain from the decrease condition (6.11):

pred_k(s_k) = −q_k(s_k) ≥ β₂χ(u_k) min{∆_k, χ(u_k)} ≥ β₂ε∆_k.

In particular, by (6.10),

‖s_k‖_U ≤ β₁∆_k ≤ (β₁/(β₂ε)) pred_k(s_k).  (6.16)

Further, with appropriate y_k = u_k + τ_k s_k, τ_k ∈ [0, 1], by the intermediate value theorem

ared_k(s_k) = f(u_k) − f(u_k + s_k) = −(f′(y_k), s_k)_U
  = −q_k(s_k) + (g_k − f′(y_k), s_k)_U + (1/2)(s_k, B_k s_k)_U
  ≥ pred_k(s_k) − ( ‖g_k − f′(y_k)‖_U + (1/2)‖B_k s_k‖_U ) ‖s_k‖_U.

Since f′ is continuous, there exists δ′ > 0 such that

‖f′(u′) − f′(ū)‖_U ≤ (1 − η)β₂ε/(4β₁)

for all u′ ∈ K with ‖u′ − ū‖_U < δ′. Further, since ‖B_k‖_{U,U} ≤ C_B by Assumption 6.13 (iii), choosing ∆̄ sufficiently small yields

(1/2)‖B_k s_k‖_U ≤ (1 − η)β₂ε/(2β₁)

for all k with ∆_k ≤ ∆̄. By reducing ∆̄ and δ, if necessary, such that δ + β₁∆̄ < δ′, we achieve, using (6.10), that for all k with ‖u_k − ū‖_U ≤ δ and 0 < ∆_k ≤ ∆̄

‖y_k − ū‖_U ≤ ‖u_k − ū‖_U + τ_k‖s_k‖_U ≤ δ + β₁∆̄ < δ′,  ‖u_k − ū‖_U ≤ δ < δ′.

Hence, for all these indices k,

‖g_k − f′(y_k)‖_U ≤ ‖g_k − f′(ū)‖_U + ‖f′(ū) − f′(y_k)‖_U ≤ (1 − η)β₂ε/(2β₁),

and thus by (6.16)

( ‖g_k − f′(y_k)‖_U + (1/2)‖B_k s_k‖_U ) ‖s_k‖_U ≤ ((1 − η)β₂ε/β₁) ‖s_k‖_U ≤ (1 − η)pred_k(s_k).

This implies that for all these k there holds

rared_k(s_k) ≥ ared_k(s_k) ≥ pred_k(s_k) − ( ‖g_k − f′(y_k)‖_U + (1/2)‖B_k s_k‖_U ) ‖s_k‖_U ≥ η pred_k(s_k).

The proof is complete. □

Lemma 6.16. Algorithm 6.9 either terminates after finitely many steps with a critical point u_k of (6.6) or generates an infinite sequence (s_{j_i}) of accepted steps.


Proof. Assume that Algorithm 6.9 neither terminates nor generates an infinite sequence (s_{j_i}) of accepted steps. Then there exists a smallest index k₀ such that all steps s_k are rejected for k ≥ k₀. In particular, u_k = u_{k₀} for k ≥ k₀, and the sequence of trust-region radii ∆_k tends to zero as k → ∞, because

∆_{k₀+j} ≤ γ₁^j ∆_{k₀}.

Since the algorithm does not terminate, we know that χ(u_{k₀}) ≠ 0. But now Lemma 6.15 with ū = u_{k₀} yields that s_k is accepted as soon as ∆_k becomes sufficiently small. This contradicts our assumption. Therefore, the assertion of the lemma is true. □

Lemma 6.17. Assume that Algorithm 6.9 generates infinitely many successful steps s_{j_i} and that there exists S′ ⊂ S with

Σ_{k∈S′} ∆_k = ∞.  (6.17)

Then

lim inf_{S′∋k→∞} χ(u_k) = 0.

Proof. Let the assumptions of the lemma hold and assume that the assertion is wrong. Then there exists ε > 0 such that χ(u_k) ≥ ε for all k ∈ S′ ⊂ S. From (6.17) it follows that S′ is not finite. For all k ∈ S′ there holds by (6.11)

pred_k(s_k) ≥ β₂χ(u_k) min{∆_k, χ(u_k)} ≥ β₂ε min{∆_k, ε}.

From this estimate, the fact that f is bounded below on K (see Assumption 6.13 (ii)), and Lemma 6.14 we obtain for all j ∈ S′, using λ ≤ 1,

f(u₀) − f(u_j) > η₁λ Σ_{k∈S, k<j} pred_k(s_k) ≥ η₁λ Σ_{k∈S′, k<j} pred_k(s_k)
  ≥ η₁λβ₂ε Σ_{k∈S′, k<j} min{∆_k, ε} → ∞  (as j → ∞).

This is a contradiction. Therefore, the assumption was wrong and the lemma is proved. □

We now have everything at hand that we need to establish our first global convergence result. It is applicable in the case γ₀ > 0, ∆_min > 0 and says that accumulation points are critical points of (6.6).

Theorem6.18. Letγ0 > 0 and∆min > 0. Assumethat Algorithm6.9doesnot ter-minateafter finitely manystepswith a critical pointuk of (6.6). Thenthealgorithmgeneratesinfinitely manyacceptedsteps(sji). Moreover, everyaccumulationpointof (uk) is a critical point of (6.6).


Proof. Suppose that Algorithm 6.9 does not terminate after a finite number of steps. Then, according to Lemma 6.16, infinitely many successful steps (sji) are generated. Assume that ū is an accumulation point of (uk) that is not a critical point of (6.6). Since χ(ū) ≠ 0, invoking Lemma 6.15 with u = ū yields ∆ > 0 and δ > 0 such that k ∈ S holds for all k with ‖uk − ū‖ ≤ δ and ∆k ≤ ∆. Since ū is an accumulation point, there exists an infinite increasing sequence j′i ∈ S, i ≥ 0, of indices such that ‖uj′i − ū‖ ≤ δ and uj′i → ū.

If (j′i − 1) ∈ S, then ∆j′i ≥ ∆min. Otherwise, sj′i−1 was rejected, which, since then uj′i−1 = uj′i, is only possible if ∆j′i−1 > ∆, and therefore ∆j′i ≥ γ0∆j′i−1 > γ0∆. We conclude that for all i there holds ∆j′i ≥ min{∆min, γ0∆}. Now Lemma 6.17 is applicable with S′ = {j′i : i ≥ 0} and yields

0 ≠ χ(ū) = lim_{i→∞} χ(uj′i) = lim inf_{i→∞} χ(uj′i) = 0,

where we have used the continuity of χ. This is a contradiction. Therefore, the assumption χ(ū) ≠ 0 was wrong. □

Next, we prove a result that also holds for ∆min = 0. Moreover, the existence of accumulation points is not required.

Theorem 6.19. Let γ0 > 0 or ∆min = 0 hold. Assume that Algorithm 6.9 does not terminate after finitely many steps with a critical point uk of (6.6). Then the algorithm generates infinitely many accepted steps (sji). Moreover,

lim inf_{k→∞} χ(uk) = 0.   (6.18)

In particular, if uk converges to ū, then ū is a critical point of (6.6).

Proof. By Lemma 6.16, infinitely many successful steps (sji) are generated. Now assume that (6.18) is wrong, i.e.,

lim inf_{k→∞} χ(uk) > 0.   (6.19)

Then we obtain from Lemma 6.17 that

∑_{k∈S} ∆k < ∞.   (6.20)

In particular, (uji) is a Cauchy sequence by (6.10) and (6.12). Therefore, (uk) converges to some limit ū, at which, according to (6.19) and the continuity of χ, there holds χ(ū) ≠ 0.

Case 1: ∆min > 0. Then by assumption also γ0 > 0, and Theorem 6.18 yields χ(ū) = 0, which is a contradiction.

Case 2: ∆min = 0. Lemma 6.15 with u = ū and η = η2 yields ∆ > 0 and δ > 0 such that k ∈ S and ∆k+1 ≥ ∆k hold for all k with ‖uk − ū‖ ≤ δ and ∆k ≤ ∆. Since uk → ū,


there exists k′ ≥ 0 with ‖uk − ū‖ ≤ δ for all k ≥ k′.

Case 2.1: There exists k′′ ≥ k′ with ∆k ≤ ∆ for all k ≥ k′′. Then k ∈ S and (inductively) ∆k ≥ ∆k′′ for all k ≥ k′′. This contradicts (6.20).

Case 2.2: There holds ∆k > ∆ for infinitely many k. By (6.20) there exists k′′ ≥ k′ with ∆ji ≤ ∆ for all ji ≥ k′′. Now, for each ji ≥ k′′, there exists an index ki > ji such that ∆k ≤ ∆ for ji ≤ k < ki, and ∆ki > ∆. If ki ∈ S, set j′i = ki, thus obtaining j′i ∈ S with ∆j′i > ∆. If ki ∉ S, we have j′i := ki − 1 ≥ ji ≥ k′, and thus j′i ∈ S, since by construction ∆j′i ≤ ∆. Moreover, ∆ < ∆ki ≤ γ2∆j′i (here ∆min = 0 is used) implies that ∆j′i > ∆/γ2. By this construction, we obtain an infinite increasing sequence (j′i) ⊂ S with ∆j′i > ∆/γ2. Again, this yields a contradiction to (6.20).

Therefore, in all cases we obtain a contradiction. Thus, the assumption was wrong and the proof of (6.18) is complete.

Finally, if uk → ū, the continuity of χ and (6.18) imply χ(ū) = 0. Therefore, ū is a critical point of (6.6). □

The next result shows that under appropriate assumptions the “lim inf” in (6.18) can be replaced by “lim”.

Theorem 6.20. Let γ0 > 0 or ∆min = 0 hold. Assume that Algorithm 6.9 does not terminate after finitely many steps with a critical point uk of (6.6). Then the algorithm generates infinitely many accepted steps (sji). Moreover, if there exists a set O that contains (uk) and on which χ is uniformly continuous, then

lim_{k→∞} χ(uk) = 0.   (6.21)

Proof. In view of Theorem 6.19 we only have to prove (6.21). Thus, let us assume that (6.21) is not true. Then there exists ε > 0 such that χ(uk) ≥ 2ε for infinitely many k ∈ S. Since (6.18) holds, we thus can find increasing sequences (j′i)i≥0 and (k′i)i≥0 with j′i < k′i < j′i+1 and

χ(uj′i) ≥ 2ε,   χ(uk) > ε ∀ k ∈ S with j′i < k < k′i,   χ(uk′i) ≤ ε.

Setting S′ = ⋃_{i≥0} S′i with S′i = {k ∈ S : j′i ≤ k < k′i}, we have

lim inf_{S′∋k→∞} χ(uk) ≥ ε.

Therefore, with Lemma 6.17,

∑_{k∈S′} ∆k < ∞.

In particular, ∑_{k∈S′i} ∆k → 0 as i → ∞, and thus, using (6.10) and (6.12),

‖uk′i − uj′i‖U ≤ ∑_{k∈S′i} ‖sk‖U ≤ β1 ∑_{k∈S′i} ∆k → 0  (as i → ∞).

This is a contradiction to the uniform continuity of χ, since

lim_{i→∞} (uk′i − uj′i) = 0,  but  |χ(uk′i) − χ(uj′i)| ≥ ε ∀ i ≥ 0.

Therefore, the assumption was wrong and the assertion is proved. □


6.3 Implementable Decrease Conditions

Algorithm 6.9 requires the computation of trial steps that satisfy the conditions (6.10) and (6.11). We now describe how these conditions can be implemented by means of a generalized Cauchy point which is based on the projected gradient path. As criticality measure we can use any criticality measure χ that is majorized by the projected gradient in the following sense:

θ χ(u) ≤ χP(u) := ‖u − PK(u − f′(u))‖U   (6.22)

with a fixed parameter θ > 0. For uk ∈ K and t ≥ 0, we introduce the projected gradient path

πk(t) = PK(uk − tgk) − uk

and define the generalized Cauchy point sck as follows:

sck = πk(σk), with σk ∈ {1, 2⁻¹, 2⁻², . . .} chosen maximal such that

qk(πk(σk)) ≤ γ(gk, πk(σk))U , (6.23)

‖πk(σk)‖U ≤ ∆k, (6.24)

where γ ∈ (0, 1) is a fixed parameter.

Our aim is to show that the following condition ensures that (6.11) is satisfied with a constant β2 independent of uk.

Fraction of Cauchy Decrease Condition:

predk(sk) ≥ β3 predk(sck),   (6.25)

where β3 ∈ (0, 1] is fixed. We first establish several useful properties of the projected gradient path.

Lemma 6.21. Let uk ∈ K. Then for all t ∈ (0, 1] and all s ≥ 1 there holds

‖πk(t)‖U ≤ ‖πk(st)‖U ≤ s‖πk(t)‖U,   (6.26)

−(gk, πk(t))U ≥ (1/t)‖πk(t)‖²U ≥ χP(uk)‖πk(t)‖U ≥ t χP(uk)².   (6.27)

Proof. The first inequality in (6.26) is well known, see, e.g., [136, Lem. 2]. The second inequality is proved in [27]. For (6.27), we use that

(PK(v) − v, u − PK(v))U ≥ 0 ∀ u ∈ K, v ∈ U,   (6.28)

since w = PK(v) minimizes ‖w − v‖²U on K. We set vk(t) = uk − tgk and derive

−(tgk, πk(t))U = (πk(t) + [vk(t) − PK(vk(t))], πk(t))U = ‖πk(t)‖²U + (vk(t) − PK(vk(t)), PK(vk(t)) − uk)U ≥ ‖πk(t)‖²U,

where we have used (6.28) in the last step. From χP(uk) = ‖πk(1)‖U and (6.26) the remaining assertions follow. □
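The inequalities (6.26) and (6.27) can be checked numerically in a finite-dimensional model. The following sketch takes K to be a box in Rⁿ with the componentwise projection and randomly chosen u, g; all data are illustrative choices, not part of the theory above.

```python
import numpy as np

# Sanity check of (6.26)/(6.27): K a box in R^n, P_K the componentwise
# projection, pi(t) = P_K(u - t*g) - u the projected gradient path.
rng = np.random.default_rng(0)
n = 50
lo, hi = -1.0, 1.0
u = rng.uniform(lo, hi, n)          # feasible point u_k
g = rng.standard_normal(n)          # gradient g_k = f'(u_k)

def proj(v):
    return np.clip(v, lo, hi)

def pi(t):
    return proj(u - t * g) - u

chiP = np.linalg.norm(pi(1.0))      # chi_P(u) = ||pi(1)||

for t in [0.05, 0.2, 0.7, 1.0]:
    for s in [1.0, 1.5, 3.0]:
        # (6.26): ||pi(t)|| <= ||pi(st)|| <= s*||pi(t)||  for s >= 1
        assert np.linalg.norm(pi(t)) <= np.linalg.norm(pi(s * t)) + 1e-12
        assert np.linalg.norm(pi(s * t)) <= s * np.linalg.norm(pi(t)) + 1e-12
    # (6.27): -(g, pi(t)) >= ||pi(t)||^2/t >= chi_P(u)*||pi(t)|| >= t*chi_P(u)^2
    p = pi(t)
    assert -g @ p >= np.linalg.norm(p) ** 2 / t - 1e-12
    assert np.linalg.norm(p) ** 2 / t >= chiP * np.linalg.norm(p) - 1e-12
    assert chiP * np.linalg.norm(p) >= t * chiP ** 2 - 1e-12

print("inequalities (6.26) and (6.27) verified numerically")
```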


This allows us to prove the well-definedness of the generalized Cauchy point.

Lemma 6.22. For all uk ∈ K, the condition (6.23) is satisfied whenever

0 < σk ≤ σ0 := min{1, 2(1 − γ)/CB}.

Furthermore, the condition (6.24) holds for all σk ∈ (0, 1] with

σk‖gk‖U ≤ ∆k.

Proof. For all 0 < t ≤ σ0 there holds by Assumption 6.13 (iii) and (6.27)

qk(πk(t)) = (gk, πk(t))U + (1/2)(πk(t), Bkπk(t))U ≤ (gk, πk(t))U + (CB/2)‖πk(t)‖²U ≤ (1 − CBt/2)(gk, πk(t))U ≤ γ(gk, πk(t))U.

Furthermore, (6.24) is met by all σk ∈ (0, 1] satisfying σk‖gk‖U ≤ ∆k, since

‖πk(t)‖U ≤ t‖gk‖U

holds for all t ∈ [0, 1], see (6.27). □

Lemma 6.23. Let sk satisfy the feasibility condition (6.10) and the fraction of Cauchy decrease condition (6.25). Then sk satisfies the reduction condition (6.11) for any criticality measure χ verifying (6.22) and any

0 < β2 ≤ (1/2) β3 γ θ² min{1, 2(1 − γ)/CB}.

Proof. 1. If σk = 1, then by (6.23) and (6.27)

predk(sck) = −qk(πk(σk)) ≥ −γ(gk, πk(1))U ≥ γ χP(uk)².

2. If σk < 1, then for τk = 2σk there holds either ‖πk(τk)‖U > ∆k or

qk(πk(τk)) > γ(gk, πk(τk))U.

In the second case we must have τk > σ0 by Lemma 6.22, and thus, using (6.26),

‖πk(τk)‖U ≥ τk‖πk(1)‖U ≥ σ0 χP(uk).

Therefore, in both cases,

‖πk(σk)‖U = ‖πk(τk/2)‖U ≥ (1/2)‖πk(τk)‖U ≥ (1/2) min{σ0 χP(uk), ∆k}.

Now, we obtain from (6.23) and (6.27)

predk(sck) = −qk(πk(σk)) ≥ −γ(gk, πk(σk))U ≥ γ χP(uk)‖πk(σk)‖U ≥ (γ/2) χP(uk) min{σ0 χP(uk), ∆k}.

As shown in 1., this estimate also holds in the case σk = 1. The proof is completed by using (6.22) and (6.25). □


Remark 6.24. Obviously, the generalized Cauchy point sck satisfies (6.10) and (6.25). Since sck is computed by an Armijo-type projected line search, we thus have an easily implementable way of computing an admissible trial step by choosing sk = sck.
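A minimal finite-dimensional sketch of this Armijo-type projected line search, with K a box in Rⁿ; the problem data (bounds, g, B, ∆) and the parameter values are illustrative assumptions of the example, not prescribed by the theory.

```python
import numpy as np

# Generalized Cauchy point of section 6.3 for a box-constrained model:
# sigma_k is the largest element of {1, 1/2, 1/4, ...} satisfying the
# decrease condition (6.23) and the trust-region condition (6.24).
lo, hi = 0.0, 1.0

def proj(v):
    return np.clip(v, lo, hi)

def cauchy_point(u, g, B, Delta, gamma=0.1, max_halvings=50):
    """Return s_k^c = pi_k(sigma_k) with sigma_k maximal in {2^-j}."""
    sigma = 1.0
    for _ in range(max_halvings):
        s = proj(u - sigma * g) - u               # pi_k(sigma)
        q = g @ s + 0.5 * s @ (B @ s)             # q_k(pi_k(sigma))
        if q <= gamma * (g @ s) and np.linalg.norm(s) <= Delta:
            return s                              # (6.23) and (6.24) hold
        sigma *= 0.5                              # halve the step size
    return np.zeros_like(u)                       # u is (nearly) critical

rng = np.random.default_rng(1)
n = 20
u = rng.uniform(lo, hi, n)
g = rng.standard_normal(n)
B = 4.0 * np.eye(n)                               # model Hessian, ||B|| = C_B
s = cauchy_point(u, g, B, Delta=0.5)

# the model decrease pred_k(s^c) = -q_k(s^c) is nonnegative
assert -(g @ s + 0.5 * s @ (B @ s)) >= 0.0
print("Cauchy point found, pred =", -(g @ s + 0.5 * s @ (B @ s)))
```

By Lemma 6.22 the backtracking loop terminates as soon as σ ≤ min{1, 2(1 − γ)/C_B} and σ‖g‖ ≤ ∆, so the `max_halvings` cap is only a safeguard.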

6.4 Transition to Fast Local Convergence

We now return to the problem of solving the semismooth operator equation

Φ(u) = 0.

We assume that any u ∈ U with Φ(u) = 0 is a critical point of the minimization problem (6.6). Especially the smoothing step makes it theoretically difficult to prove that, close to a regular solution, projected semismooth Newton steps satisfy the reduction condition (6.11) (or (6.25)). In order to prevent our discussion from becoming too technical, we avoid the consideration of smoothing steps by assuming that Φ : U → U is ∂Φ-semismooth. In the framework of MCPs this is, e.g., satisfied for U = L²(Ω) and Φ(u) = u − PB(u − λ⁻¹F(u)) if F has the form F(u) = λu + G(u) and G : L²(Ω) → Lp′(Ω) is locally Lipschitz continuous, see section 4.2.

Therefore, the assumptions of this section are:

Assumption 6.25. In addition to Assumption 6.13, let the following hold:

(i) The operator Φ : U → U is continuous with generalized differential ∂Φ.

(ii) The criticality measure χ satisfies

vk ∈ K,  lim_{k→∞} ‖Φ(vk)‖U = 0  =⇒  lim_{k→∞} χ(vk) = 0.

Remark 6.26. Assumption (ii) implies that any u ∈ U with Φ(u) = 0 is a critical point of (6.6).

In order to cover the different variants (6.2)–(6.4) of minimization problems that can be used for globalization of (1.1), we propose the following hybrid method:

Algorithm 6.27 (Trust-Region Projected Newton Algorithm).

1. Initialization: Choose η1 ∈ (0, 1), ∆min ≥ 0, ν ∈ (0, 1), and a criticality measure χ. Choose u0 ∈ K, ∆0 > ∆min, and a model Hessian B0 ∈ L(U,U). Choose an integer m ≥ 1 and fix λ ∈ (0, 1/m] for the computation of ρk. Compute ζ−1 := ‖Φ(u0)‖U and set l−1 := −1, r := −1, k := 0, i := −1, and in := −1.

2. Compute χk := χ(uk). If χk = 0, then STOP.

3. Compute a model Hessian Bk ∈ L(U,U) and a differential Mk ∈ ∂Φ(uk).

4. Try to compute sn,1k ∈ U by solving

Mk sn,1k = −Φ(uk).

If this fails, then go to Step 11. Otherwise, set sn,2k := PK(uk + sn,1k) − uk.


5. Compute snk := min{1, ∆k/‖sn,2k‖U} sn,2k and ζk := ‖Φ(uk + snk)‖U.

6. If ζk ≤ νζlr, then set sk := snk. Otherwise, go to Step 10.

7. If sk fails to satisfy (6.11), then go to Step 9.

8. Call Algorithm 6.11 with mk = min{i − in, m} to compute ρk := ρk(sk). If ρk ≤ η1, then go to Step 9. Otherwise, obtain a new trust-region radius ∆k+1 by invoking Algorithm 6.10, set lr+1 := k, increment r by 1, and go to Step 15.

9. Set uk+1 := uk + sk, ∆k+1 := max{∆min, ∆k}, ji+1 := k, lr+1 := k, and in := i + 1. Increment k, r, and i by 1 and go to Step 2.

10. If snk satisfies (6.11), then set sk := snk and go to Step 12.

11. Compute a trial step sk satisfying the conditions (6.10) and (6.11).

12. Compute the reduction ratio ρk := ρk(sk) by calling Algorithm 6.11 with mk = min{i − in, m}.

13. Compute the new trust-region radius ∆k+1 by invoking Algorithm 6.10.

14. If ρk ≤ η1, then reject the step sk: Set uk+1 := uk, Bk+1 := Bk, and Mk+1 := Mk. If the computation of sn,2k was successful, then set sn,2k+1 := sn,2k, increment k by 1, and go to Step 5. Otherwise, increment k by 1 and go to Step 11.

15. Accept the step: Set uk+1 := uk + sk and ji+1 := k. Increment k and i by 1 and go to Step 2.

In each iteration, a semismooth Newton step sn,1k for the equation Φ(u) = 0 is computed. This step is projected onto K and scaled to lie in the trust region; the resulting step is snk. In Step 6 a test is performed to decide whether snk can be accepted right away or not. If the outcome is positive, the step snk is accepted in any case (either in Step 9 or, via Step 8, in Step 15, see below), the index k is stored in lr+1, and r is incremented. Therefore, the sequence l0 < l1 < · · · lists all iterations at which the test in Step 6 was successful and, thus, the semismooth Newton step was accepted. The resulting residual ζlr = ‖Φ(ulr + snlr)‖U is stored, and ζl−1 holds the initial residual ‖Φ(u0)‖U. The test in Step 6 ensures that

ζlr ≤ νζlr−1 ≤ · · · ≤ ν^{r+1}ζl−1 = ν^{r+1}‖Φ(u0)‖U.

After a positive outcome of the test in Step 6, it is first checked whether the step sk = snk also passes the “ordinary” (relaxed) reduction-ratio-based acceptance test. This is done to embed the new acceptance criterion as smoothly as possible in the trust-region framework. If sk = snk satisfies the reduction-ratio-based test, then sk is treated like any other step that is accepted by the trust-region mechanism. If it does not, the step is nevertheless accepted (in Step 9), but now in is set to i + 1, which has the consequence that in the next iteration we have mk = 0, which results in a restart of the rared-nonmonotonicity mechanism. If the test ζk ≤ νζlr in Step 6 fails, then snk is chosen as “ordinary” trial step if it satisfies the condition (6.11); note that (6.10) is satisfied automatically. Otherwise, a different trial step is computed.
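The interplay between the residual test of Step 6 and the fallback trial step can be sketched in a drastically condensed finite-dimensional model. The quadratic objective, the box K = [0,1]ⁿ, and the simple projected-gradient fallback are illustrative assumptions; the trust-region radius management and the nonmonotone reduction ratios of the full algorithm are omitted.

```python
import numpy as np

# Condensed sketch of the acceptance logic around Step 6 of Algorithm 6.27:
# f(u) = (u-b)'A(u-b)/2 on K = [0,1]^n, Phi(u) = u - P_K(u - f'(u)).
# The projected Newton-like step is accepted whenever it reduces the
# residual by the factor nu; otherwise a fallback trial step is used.
rng = np.random.default_rng(2)
n = 30
A = 2.0 * np.eye(n)                        # simple SPD Hessian
b = rng.uniform(-0.5, 1.5, n)              # unconstrained minimizer, partly infeasible

proj = lambda v: np.clip(v, 0.0, 1.0)      # P_K
grad = lambda u: A @ (u - b)               # f'
Phi = lambda u: u - proj(u - grad(u))      # criticality residual

u = proj(rng.uniform(0.0, 1.0, n))
zeta_ref = np.linalg.norm(Phi(u))          # residual at the last accepted Newton step
nu = 0.9

for k in range(100):
    if np.linalg.norm(Phi(u)) < 1e-12:
        break
    s_n1 = -np.linalg.solve(A, grad(u))    # Newton step for the smooth part
    s_n2 = proj(u + s_n1) - u              # projection onto K (cf. Step 4)
    trial = u + s_n2
    if np.linalg.norm(Phi(trial)) <= nu * zeta_ref:   # residual test of Step 6
        u = trial
        zeta_ref = np.linalg.norm(Phi(u))             # store the new zeta_{l_r}
    else:                                  # fallback: Cauchy-type trial step
        u = proj(u - 0.25 * grad(u))

assert np.linalg.norm(Phi(u)) < 1e-10
print("converged, final residual:", np.linalg.norm(Phi(u)))
```

For this separable model the projected Newton step is already exact, so the fallback branch merely illustrates where an "ordinary" trial step of Steps 10/11 would enter.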

The global convergence result of Theorem 6.19 can now easily be generalized to Algorithm 6.27.

Theorem 6.28. Let Assumption 6.25 hold and let γ0 > 0 or ∆min = 0. Assume that Algorithm 6.27 does not terminate after finitely many steps with a critical point uk of (6.6). Then the algorithm generates infinitely many accepted steps (sji). Moreover,

lim inf_{k→∞} χ(uk) = 0.

In particular, if uk converges to ū, then ū is a critical point of (6.6).

Proof. The well-definedness of Algorithm 6.27 follows immediately from the well-definedness of Algorithm 6.9, which was established in Lemma 6.16. Therefore, if Algorithm 6.27 does not terminate finitely, the sequence (sji) of accepted steps is infinite. If r remains bounded during the algorithm, i.e., if only finitely many steps snk pass the test in Step 6, then Algorithm 6.27 eventually turns into Algorithm 6.9. In fact, if Step 9 is never entered, then all accepted steps pass the reduction-ratio-based test and thus Algorithm 6.27 behaves like Algorithm 6.9 from the very beginning. Otherwise, let k′ = ji′ be the last iteration at which Step 9 is entered. Then ∆k′+1 ≥ ∆min and in = i′ + 1 for all k > k′. In particular, mk = 0 for all ji′ < k ≤ ji′+1. Thus, Algorithm 6.27 behaves like an instance of Algorithm 6.9 started at u0 = uk′+1 with ∆0 = ∆k′+1. Hence, the assertion follows from Theorem 6.19. If, on the other hand, r → ∞ during the algorithm, then we have inductively

‖Φ(ulr+1)‖U = ζlr ≤ νζlr−1 ≤ · · · ≤ ν^{r+1}‖Φ(u0)‖U → 0 as r → ∞.

By Assumption 6.25 (ii) this implies χ(ulr+1) → 0. Since χ is continuous, we see that uk → ū implies that ū is a critical point of (6.6). □

Remark 6.29. Various generalizations can be incorporated. For instance, it is possible not to reset mk to zero after acceptance of snk in Step 9. Hereby, we would have to generalize Lemma 6.14 along the lines of [142]. Further, we could allow for nonmonotonicity of the residuals ζlr in a similar way as for the function values f(uji).

We now come to the proof of transition to fast local convergence.

Theorem 6.30. Let Assumption 6.25 hold and let ∆min > 0. Assume that Algorithm 6.27 generates an infinite sequence (uk) of iterates that converges to a point ū ∈ U with Φ(ū) = 0. Let Φ be ∂Φ-semismooth at ū and Lipschitz continuous near ū. Further, assume that Mk is invertible with ‖Mk⁻¹‖U,U ≤ C_{M⁻¹} whenever uk is sufficiently close to ū. Then (uk) converges q-superlinearly to ū. If Φ is even α-order semismooth at ū, 0 < α ≤ 1, then the q-rate of convergence is at least 1 + α.


Proof. Using the assumptions, the abstract local convergence result of Theorem 3.19 for projected semismooth Newton methods is applicable with Sk(u) = u and yields

‖uk + sn,2k − ū‖U = o(‖uk − ū‖U)  (as uk → ū).   (6.29)

Therefore,

‖sn,2k‖U ≤ ‖uk − ū‖U + ‖uk + sn,2k − ū‖U ≤ (3/2)‖uk − ū‖U,   (6.30)

‖sn,2k‖U ≥ ‖uk − ū‖U − ‖uk + sn,2k − ū‖U ≥ (1/2)‖uk − ū‖U   (6.31)

for all uk in a neighborhood of ū, and thus

(1/2)‖uk − ū‖U ≤ ‖sn,2k‖U ≤ ‖sn,1k‖U = ‖Mk⁻¹Φ(uk)‖U ≤ C_{M⁻¹}‖Φ(uk)‖U.

We conclude that for uk near ū there holds

‖Φ(uk + sn,2k)‖U ≤ L‖uk + sn,2k − ū‖U = o(‖uk − ū‖U) = o(‖Φ(uk)‖U),   (6.32)

where L is the Lipschitz constant of Φ near ū. Since uk → ū, we see from (6.30) and (6.32) that there exists K with

‖sn,2k‖U ≤ ∆min,  ‖Φ(uk + sn,2k)‖U ≤ ν‖Φ(uk)‖U  ∀ k ≥ K.

The mechanism of updating ∆k implies ∆k ≥ ∆min whenever k − 1 ∈ S. Hence, for all k ≥ K with k − 1 ∈ S we have snk = sn,2k and thus ζk ≤ ν‖Φ(uk)‖U.

Now assume that none of the steps snk, k ≥ K, passes the test in Step 6. Then r, and thus ζlr > 0, remain unchanged for all k ≥ K. But since Φ(uk) → 0 as k → ∞, there exists k ≥ K with k − 1 ∈ S and ‖Φ(uk)‖U ≤ ζlr. Thus snk would satisfy the test in Step 6, which is a contradiction. Hence, there exists k′ ≥ K for which snk′ satisfies the test in Step 6 and thus is accepted. Then, in iteration k = k′ + 1, we have ∆k ≥ ∆min, snk = sn,2k, and ζk ≤ ν‖Φ(uk)‖U = νζk′, so that snk again passes the test in Step 6 and therefore is accepted. Inductively, all steps snk = sn,2k, k ≥ k′, are accepted. The superlinear convergence now follows from (6.29). If Φ is α-order semismooth, then (6.29) holds with “o(‖uk − ū‖U)” replaced by “O(‖uk − ū‖U^{1+α})”, and the rate of convergence is thus at least 1 + α. □

The reason why we require convergence uk → ū instead of considering an accumulation point ū is that, although we can show that ζk = o(‖Φ(uk)‖U) for k − 1 ∈ S and uk close to ū, it could be that ζlr is so small that nevertheless ζk > νζlr.

However, depending on the choice of the objective function f, it often is easy to establish that there exists a constant CΦ > 0 with

‖Φ(uk)‖U ≤ CΦ‖Φ(ulr)‖U for all iterations k and corresponding r.   (6.33)

This holds, e.g., for f(u) = ‖Φ(u)‖²U/2 if the amount of nonmonotonicity of f(ulr) is slightly restricted. If (6.33) holds, we can prove the following more general result:


Theorem 6.31. Let Assumption 6.25 hold and let ∆min > 0. Assume that Algorithm 6.27 generates an infinite sequence (uk) of iterates that has an accumulation point ū ∈ U with Φ(ū) = 0. Let Φ be ∂Φ-semismooth at ū and Lipschitz continuous near ū. Further, assume that Mk is invertible with ‖Mk⁻¹‖U,U ≤ C_{M⁻¹} whenever uk is sufficiently close to ū. Finally, assume that (6.33) holds. Then (uk) converges q-superlinearly to ū. If Φ is even α-order semismooth at ū, 0 < α ≤ 1, then the q-rate of convergence is at least 1 + α.

Proof. As in the proof of Theorem 6.30 we can show that (6.29) holds. We then can proceed similarly as above to show that there exists δ > 0 such that for all k with k − 1 ∈ S and uk ∈ ū + δBU there holds

snk = sn,2k,  uk + snk ∈ ū + δBU,

ζk = ‖Φ(uk + snk)‖U ≤ (ν/CΦ)‖Φ(uk)‖U ≤ ν‖Φ(ulr)‖U = νζlr,

where we have used (6.33). Let k′ be any of those k. Then the step snk′ satisfies the test in Step 6 and hence is accepted. Furthermore, k = k′ + 1 again satisfies k − 1 ∈ S and uk ∈ ū + δBU, so that also snk is accepted. Inductively, snk is accepted for all k ≥ k′. Superlinear convergence to ū and convergence with rate ≥ 1 + α now follow as in the proof of Theorem 6.30. □

7. Applications

7.1 Distributed Control of a Nonlinear Elliptic Equation

Let Ω ⊂ Rⁿ be a nonempty and bounded open domain with sufficiently smooth boundary and consider the nonlinear control problem

minimize_{y∈H¹₀(Ω), u∈L²(Ω)}  (1/2) ∫Ω (y(x) − yd(x))² dx + (λ/2) ∫Ω (u(x) − ud(x))² dx

subject to  −∆y + ϕ(y) = f + gu on Ω,
            β1 ≤ u ≤ β2 on Ω.   (7.1)

We assume yd ∈ L²(Ω), ud ∈ L∞(Ω) (Lq with q > 2 would also be possible), f ∈ L²(Ω), g ∈ L∞(Ω), −∞ ≤ β1 < β2 ≤ +∞; λ > 0 is the regularization parameter. Further, let ϕ : R → R be nondecreasing and twice continuously differentiable with

|ϕ″(τ)| ≤ c1 + c2|τ|^{s−3},   (7.2)

where c1, c2 ≥ 0 are constants and s > 3 is fixed with s ∈ (3,∞] for n = 1, s ∈ (3,∞) for n = 2, and s ∈ (3, 2n/(n − 2)] for n = 3, 4, 5. For instance, ϕ(τ) = τ³ is nondecreasing and satisfies (7.2) with s = 4, which is admissible for n ≤ 4.

We set U = L²(Ω), Y = H¹₀(Ω), W = H¹₀(Ω), W* = H⁻¹(Ω), C = [β1, β2],

C = {u ∈ U : u(x) ∈ C on Ω},

and define

J(y, u) = (1/2) ∫Ω (y(x) − yd(x))² dx + (λ/2) ∫Ω (u(x) − ud(x))² dx,   (7.3)

E(y, u) = −∆y + ϕ(y) − f − gu.   (7.4)

Then we can write (7.1) in the form

minimize_{y∈Y, u∈U}  J(y, u)  subject to  E(y, u) = 0,  u ∈ C.   (7.5)

We now begin with our investigation of the control problem.

Lemma 7.1. The operator E : Y × U → W* defined in (7.4) is twice continuously differentiable with derivatives

Ey(y, u) = −∆ + ϕ′(y)I,  Eu(y, u) = −gI,
Eyu(y, u) = 0,  Euy(y, u) = 0,  Euu(y, u) = 0,
Eyy(y, u)(v1, v2) = ϕ″(y)v1v2.

Proof. By Proposition A.11 and (7.2), the superposition operator

u ∈ Ls(Ω) ↦ ϕ(u) ∈ Ls′(Ω),  1/s + 1/s′ = 1,

is twice continuously differentiable, since

(s − 2s′)/s′ = s/s′ − 2 = s − 3.

The choice of s implies the embeddings

H¹₀(Ω) ↪ Ls(Ω),  Ls′(Ω) ↪ H⁻¹(Ω).

Therefore, the operator y ∈ H¹₀(Ω) ↦ ϕ(y) ∈ H⁻¹(Ω) is twice continuously differentiable, too, and thus also E. The form of the derivatives is obvious, see Propositions A.10 and A.11. □

Lemma 7.2. For every u ∈ U, the state equation E(y, u) = 0 possesses a unique solution y = y(u) ∈ Y.

Proof. Integrating (7.2) twice, we see that there exist constants C′i, Ci ≥ 0 with

|ϕ′(τ)| ≤ C′1 + C′2|τ|^{s−2},  |ϕ(τ)| ≤ C1 + C2|τ|^{s−1}.   (7.6)

Therefore, by Proposition A.9,

y ∈ Lt(Ω) ↦ ϕ(y) ∈ L^{t/(s−1)}(Ω) is continuous for all s − 1 < t < ∞,   (7.7)

y ∈ Lt(Ω) ↦ ϕ′(y) ∈ L^{t/(s−2)}(Ω) is continuous for all s − 2 < t < ∞.   (7.8)

Now, let

θ(t) = ∫₀ᵗ ϕ(τ) dτ.

Then θ′(t) = ϕ(t), and from (7.6) and Proposition A.11 it follows that the mapping y ∈ Lt ↦ θ(y) ∈ L^{t/s} is twice continuously differentiable for all s ≤ t < ∞ with first derivative v ↦ ϕ(y)v and second derivative (v, w) ↦ ϕ′(y)vw. Since H¹₀ ↪ Ls, this also holds for y ∈ H¹₀ ↦ θ(y) ∈ L¹(Ω). Now consider, for fixed u ∈ C, the function e : H¹₀ → R,

e(y) = (1/2) ∫Ω ∇y(x) · ∇y(x) dx + ∫Ω θ(y(x)) dx − (f + gu, y)L².

This function is twice continuously differentiable with

e′(y) = −∆y + ϕ(y) − f − gu = E(y, u),

e″(y)(v, v) = ⟨−∆v, v⟩_{H⁻¹,H¹₀} + ∫Ω ϕ′(y(x))v(x)v(x) dx ≥ ‖v‖²_{H¹₀}.

Therefore, by standard existence and uniqueness results for strongly convex optimization problems, see, e.g., [147, Prop. 25.22], there exists a unique solution y = y(u) ∈ H¹₀(Ω) of E(y, u) = 0. Thus, for all u, there exists a unique solution y = y(u) of the state equation. □

Next, we discuss the existence of solutions of the control problem for the cases n = 1, 2, 3. To simplify the presentation, we assume s ∈ (3, 4] in the case n = 3.

Lemma 7.3. Let n = 1, 2, 3, and assume s ∈ (3, 4] in the case n = 3. Then the control problem (7.5) admits a solution.

Proof. By Lemma 7.2 there exists a (feasible) minimizing sequence (yk, uk) for the control problem, which, due to the structure of J, is bounded in L² × L². Note that in the case β1, β2 ∈ R the particular form of C even implies that ‖uk‖L∞ ≤ max{|β1|, |β2|}, but we do not need this here. From E(yk, uk) = 0 and (ϕ(y) − ϕ(0))y ≥ 0 we obtain

‖yk‖²_{H¹₀} ≤ ⟨−∆yk, yk⟩_{H⁻¹,H¹₀} + ∫Ω [ϕ(yk)(x) − ϕ(0)]yk(x) dx
= (f + guk − ϕ(0), yk)L²
≤ (‖f‖L² + ‖g‖L∞‖uk‖L² + µ(Ω)^{1/2}|ϕ(0)|)‖yk‖L²
≤ C(‖f‖L² + ‖g‖L∞‖uk‖L² + |ϕ(0)|)‖yk‖_{H¹₀}.

This implies that (yk) is bounded in H¹₀. Since H¹₀ ↪ Lt for all 1 ≤ t ≤ ∞ if n = 1, all 1 ≤ t < ∞ if n = 2, and all 1 ≤ t ≤ 2n/(n − 2) = 6 if n = 3, we conclude from (7.7) that ϕ(yk) is bounded in all spaces Lt with 1 ≤ t ≤ ∞ if n = 1, 1 ≤ t < ∞ if n = 2, and 1 ≤ t ≤ 6/(s − 1) (≥ 2 by the choice of s) if n = 3. Thus, −∆yk is bounded in L², and therefore, using regularity results (we assume that the boundary of Ω is sufficiently “nice”), yk is bounded in H¹₀ ∩ H².

Since H¹₀ ∩ H² is compactly embedded in L∞ and also in H¹₀, we can extract a subsequence with yk → y* strongly in H¹₀ and strongly in L∞, and, due to the boundedness of uk in L² and the weak sequential closedness of C, uk → u* ∈ C weakly in L². Hence, ϕ(yk) → ϕ(y*) strongly in L∞. Now

f + guk → f + gu* weakly in L²,
f + guk = −∆yk + ϕ(yk) → −∆y* + ϕ(y*) strongly in H⁻¹

shows E(y*, u*) = 0. Therefore, (y*, u*) is feasible. Further, J is continuous and convex, and thus weakly lower semicontinuous. From the weak convergence (yk, uk) → (y*, u*) we thus conclude that (y*, u*) solves the problem. □


7.1.1 Black-Box Approach

In Lemma 7.2 it was proved that the state equation admits a unique solution y(u). Therefore, we can introduce the reduced objective function

j(u) = J(y(u), u)

and consider the equivalent reduced problem

minimize_{u∈U}  j(u)  subject to  u ∈ C.   (7.9)

From Lemma 7.1 we know that E is twice continuously differentiable. Our next aim is to apply the implicit function theorem to prove that y(u) is twice continuously differentiable. To this end we observe:

Lemma 7.4. For all y ∈ Y and u ∈ U, the partial derivative

Ey(y, u) = −∆ + ϕ′(y)I ∈ L(Y, W*) = L(H¹₀, H⁻¹)

is a homeomorphism with

‖Ey(y, u)⁻¹‖_{W*,Y} ≤ 1.

Proof. Since ϕ is nondecreasing, we have ϕ′ ≥ 0 and thus for all v ∈ H¹₀

⟨Ey(y, u)v, v⟩_{H⁻¹,H¹₀} = (v, v)_{H¹₀} + ∫Ω ϕ′(y)v² dx ≥ ‖v‖²_{H¹₀}.

Therefore, by the Lax–Milgram theorem, Ey(y, u) ∈ L(H¹₀, H⁻¹) = L(Y, W*) is a homeomorphism with ‖Ey(y, u)⁻¹‖_{W*,Y} ≤ 1. □

Therefore, we can apply the implicit function theorem to obtain

Lemma 7.5. The mapping u ∈ U ↦ y(u) ∈ Y is twice continuously differentiable.

Since the objective function J is quadratic, we thus have

Lemma 7.6. The reduced objective function j : U → R is twice continuously differentiable.

Finally, we establish the following structural result for the reduced gradient:

Lemma 7.7. The reduced gradient j′(u) has the form

j′(u) = λu + G(u),  G(u) = −gw(u) − λud,

where w = w(u) solves the adjoint equation

−∆w + ϕ′(y)w = yd − y(u).   (7.10)

The mapping u ∈ U ↦ G(u) ∈ Lp′(Ω) is continuously differentiable, and thus locally Lipschitz continuous, for all p′ ∈ [2,∞] if n = 1, p′ ∈ [2,∞) if n = 2, and p′ ∈ [2, 2n/(n − 2)] if n ≥ 3. As a consequence, the mapping

u ∈ Lp(Ω) ↦ j′(u) ∈ Lr(Ω)

is continuously differentiable for all p ∈ [2,∞] and all r ∈ [1, min{p, p′}].


Proof. Using the adjoint representation of j′, we see that

j′(u) = Ju(y(u), u) + Eu(y(u), u)*w(u) = λ(u − ud) − gw(u),

where w = w(u) solves the adjoint equation Ey(y(u), u)*w = −Jy(y(u), u), which has the form (7.10). Since Ey(y(u), u)* is a homeomorphism by Lemma 7.4, the adjoint state w(u) is unique. Further, since Ey, y(u), and Jy are continuously differentiable, we can use the implicit function theorem to prove that the mapping u ∈ U ↦ w(u) ∈ W is continuously differentiable, and thus, in particular, locally Lipschitz continuous.

For p′ as given in the lemma, the embedding W = H¹₀ ↪ Lp′ implies that the operator G(u) = −gw(u) − λud is continuously differentiable, and thus locally Lipschitz continuous, as a mapping from U to Lp′. The last assertion of the lemma follows immediately. □

Our aim is to apply our class of semismooth Newton methods to compute critical points of problem (7.9), i.e., to solve the VIP

u ∈ C,  (j′(u), v − u)L² ≥ 0  ∀ v ∈ C.   (7.11)
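The structure of Lemma 7.7 can be illustrated by a finite-difference sketch in one space dimension: one state solve, one adjoint solve as in (7.10), and the gradient j′(u) = λ(u − ud) − gw(u), checked against difference quotients of j. The grid, the data, and the choice ϕ(y) = y³ are illustrative assumptions; the discrete adjoint replaces the continuous one.

```python
import numpy as np

# 1D model of the reduced gradient: state -y'' + y^3 = f + g*u on (0,1),
# y(0)=y(1)=0; adjoint -w'' + 3y^2 w = yd - y; j'(u) = lam*(u-ud) - g*w.
m = 199                        # interior grid points
h = 1.0 / (m + 1)
x = np.linspace(h, 1.0 - h, m)
lam = 1e-2
f, gcoef = np.sin(np.pi * x), np.ones(m)
yd, ud = x * (1 - x), np.zeros(m)

# 1D Dirichlet Laplacian (tridiagonal, stored dense for brevity)
L = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / h**2

def solve_state(u):
    """Newton iteration for -y'' + y^3 = f + g*u."""
    y = np.zeros(m)
    for _ in range(50):
        F = L @ y + y**3 - f - gcoef * u
        if np.linalg.norm(F) < 1e-9:
            break
        y -= np.linalg.solve(L + np.diag(3.0 * y**2), F)
    return y

def reduced_grad(u):
    y = solve_state(u)
    w = np.linalg.solve(L + np.diag(3.0 * y**2), yd - y)   # adjoint, cf. (7.10)
    return lam * (u - ud) - gcoef * w

def j(u):
    y = solve_state(u)
    return 0.5 * h * np.sum((y - yd)**2) + 0.5 * lam * h * np.sum((u - ud)**2)

# directional-derivative check of the gradient representation
u0 = np.cos(np.pi * x)
v = np.sin(3 * np.pi * x)
t = 1e-6
dd_fd = (j(u0 + t * v) - j(u0 - t * v)) / (2 * t)
dd_ad = h * np.sum(reduced_grad(u0) * v)       # (j'(u0), v)_{L^2}
assert abs(dd_fd - dd_ad) < 1e-6 * max(1.0, abs(dd_fd))
print("adjoint gradient matches finite differences:", dd_fd, dd_ad)
```

Since the adjoint is derived from the discretized problem, the agreement is limited only by the Newton tolerance and the O(t²) truncation of the central difference.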

The solutions of (7.11) enjoy the following regularity property:

Lemma 7.8. Every solution u ∈ U of (7.11) satisfies u ∈ L∞(Ω) if β1, β2 ∈ R, and u ∈ Lp′(Ω) with p′ as in Lemma 7.7 otherwise.

Proof. For β1, β2 ∈ R we have C ⊂ L∞(Ω) and the assertion is obvious. For β1 = −∞, β2 = +∞, (7.11) yields

0 = j′(u) = λu + G(u),

and thus u = −λ⁻¹G(u) ∈ Lp′(Ω) by Lemma 7.7. For β1 > −∞, β2 = +∞ we conclude in the same way 1_{u≠β1} j′(u) = 0, and thus

1_{u≠β1} u = −λ⁻¹ 1_{u≠β1} G(u) ∈ Lp′(Ω).

Furthermore, 1_{u=β1} u = β1 1_{u=β1} ∈ L∞(Ω).

The case β1 = −∞, β2 < +∞ can be treated in the same way. □

With the results developed above we have everything at hand to prove the semismoothness of the superposition operator Π arising from equation reformulations

Π(u) = 0,  Π(u) := π(u, j′(u))   (7.12)

of problem (7.11), where π is an MCP-function for the interval [β1, β2]. In the following, we distinguish the two variants of reformulations that were discussed in section 5.1.2.


First Reformulation

Here, we discuss reformulations based on a general MCP-function π = φ[β1,β2] for the interval C = [β1, β2].

Theorem 7.9. The problem assumptions imply that Assumption 5.10 (a), (b) (with Z = 0) is satisfied with F = j′ for any p ∈ [2,∞], any p′ ≤ p with p′ ∈ [2,∞] if n = 1, p′ ∈ [2,∞) if n = 2, and p′ ∈ [2, 2n/(n − 2)] if n ≥ 3, and any r ∈ [1, p′].

In particular, if π satisfies Assumption 5.10 (c), (d), then Theorem 5.11 yields the ∂Π-semismoothness of the operator Π. Hereby, the differential ∂Π(u) consists of all operators M ∈ L(Lp, Lr),

M = d1 I + d2 · j″(u),  d ∈ L∞(Ω)²,  d ∈ ∂π(u, j′(u)) on Ω.   (7.13)

Proof. The assertions follow immediately from the boundedness of Ω, Lemma 7.7, and Theorem 5.11. □

Concerning higher order semismoothness, we have:

Theorem 7.10. Suppose that the operator y ∈ H¹₀(Ω) ↦ ϕ(y) ∈ H⁻¹(Ω) is three times continuously differentiable. This can, e.g., be ensured by suitable properties of ϕ.

Then Assumption 5.12 (a), (b) with Z = 0 and α = 1 is satisfied by F = j′ for r = 2, any p ∈ (2,∞], and all p′ ≤ p with p′ ∈ (2,∞] if n = 1, p′ ∈ (2,∞) if n = 2, and p′ ∈ (2, 2n/(n − 2)] if n ≥ 3.

In particular, if π satisfies Assumption 5.12 (c), (d), then Theorem 5.13 yields the β-order ∂Π-semismoothness of the operator Π(u) = π(u, j′(u)), where β is given by Theorem 3.45. The differential ∂Π(u) consists of all operators M ∈ L(Lp, L²) of the form (7.13).

Proof. If y ∈ H¹₀ ↦ ϕ(y) ∈ H⁻¹ is three times continuously differentiable, then so is E and thus, by the implicit function theorem, also y(u). Hence, j′ : L² → L² is twice continuously differentiable and therefore its derivative is locally Lipschitz continuous. The same then holds true for u ∈ Lp ↦ j′(u) ∈ Lr.

The assertions now follow from the boundedness of Ω, Lemma 7.7, and Theorem 5.13. □

Remark 7.11. The Hessian operator j″ can be obtained via the adjoint representation in appendix A.1. In section 7.1.3 it is described how finite element discretizations of j, j′, j″, ∂Φ, etc., can be computed.

Second Reformulation

We now consider the case where

Π(u) = u − P[β1,β2](u − λ⁻¹j′(u))

is chosen to reformulate the problem as the equation Π(u) = 0.


Theorem 7.12. The problem assumptions imply that Assumption 5.14 (a), (b) (with Z = 0) is satisfied with F = j′ for r = 2 and any p′ ∈ (2,∞] if n = 1, p′ ∈ (2,∞) if n = 2, and p′ ∈ (2, 2n/(n − 2)] if n ≥ 3.

In particular, Theorem 5.15 yields the ∂Π-semismoothness of the operator Π. Hereby, the differential ∂Π(u) consists of all operators M ∈ L(Lr, Lr),

M = I + λ⁻¹ d · Gu(u),  d ∈ L∞(Ω),  d ∈ ∂P[β1,β2](−λ⁻¹G(u)) on Ω.   (7.14)

Proof. The assertions follow immediately from the boundedness of Ω, Lemma 7.7, and Theorem 5.15. □

A higher order semismoothness result analogous to Theorem 7.10 can also be established, but we do not formulate it here.

Remark 7.13. Since j″(u) = λI + Gu(u), the adjoint representation of appendix A.1 can be used to compute Gu(u).

Regularity

For the application of semismooth Newton methods, a regularity condition like the one in Assumption 3.59 (i) has to hold. For the problem under consideration, we can establish regularity by using the sufficient condition of Theorem 4.8. Since this condition was established for NCPs (but can be extended to other situations), we consider the case of the NCP, i.e., β1 = 0, β2 = ∞. To apply Theorem 4.8, we have to verify the conditions of Assumption 4.6.

The assumptions (a)–(d) follow immediately from Lemma 7.7 for p′ as in the lemma and any p ∈ [p′,∞]. Note hereby that G′(u) = j″(u) − λI is selfadjoint. Assumption (e) requires that the Hessian operator j″(u) is coercive on the tangent space of the strongly active constraints, which is an infinite-dimensional analogue of the strong second order sufficient condition for optimality. The remaining assumptions (f)–(h) only concern the NCP-function and are satisfied for φ = φFB as well as for φ(x) = x1 − P[0,∞)(x1 − λ⁻¹x2), the NCP-function used in the second reformulation.

Application of Semismooth Newton Methods

In conclusion, we have shown that problem (7.1) satisfies all assumptions that are required to prove superlinear convergence of our class of (projected) semismooth Newton methods. Hereby, both types of reformulations are appropriate, the one of section 5.1.1 and the semismooth reformulation of section 4.2, the latter yielding a smoothing-step-free method. Numerical results are given in section 7.2.


7.1.2 All-at-Once Approach

We now describe, in somewhat less detail, how mixed semismooth Newton methods can be applied to solve the all-at-once KKT-system. The continuous invertibility of Ey(y, u) = −∆ + ϕ′(y)I ∈ L(H¹₀, H⁻¹) guarantees that Robinson's regularity condition is satisfied, so that every solution (y, u) satisfies the KKT-conditions (5.24)–(5.26), where w ∈ W = H¹₀(Ω) is a multiplier. The Lagrange function L : Y × U × W → R is given by

L(y, u, w) = J(y, u) + ⟨E(y, u), w⟩_{H⁻¹,H¹₀}
= J(y, u) + ⟨−∆w, y⟩_{H⁻¹,H¹₀} + ⟨ϕ(y), w⟩_{H⁻¹,H¹₀} − (f, w)L² − (gu, w)L².

Now, using the results of the previous sections, we obtain

Lemma 7.14. The Lagrange function L is twice continuously differentiable with derivatives

Ly(y, u, w) = Jy(y, u) + Ey(y, u)*w = y − yd − ∆w + ϕ′(y)w,
Lu(y, u, w) = Ju(y, u) + Eu(y, u)*w = λ(u − ud) − gw,
Lw(y, u, w) = E(y, u),
Lyy(y, u, w) = (1 + ϕ″(y)w)I,
Lyu(y, u, w) = 0,  Luy(y, u, w) = 0,  Luu(y, u, w) = λI.

Since Lw = E, we have Lwy = Ey, etc., see Lemma 7.1 for formulas. Furthermore, Lu can be written in the form

Lu(y, u, w) = λu + G(y, u, w),  G(y, u, w) = −gw − λud.

The mapping (y, u, w) ∈ Y × U × W ↦ G(y, u, w) ∈ Lp′(Ω) is continuous affine linear for all p′ ∈ [2,∞] if n = 1, p′ ∈ [2,∞) if n = 2, and p′ ∈ [2, 2n/(n − 2)] if n ≥ 3. As a consequence, the mapping

(y, u, w) ∈ Y × Lp(Ω) × W ↦ Lu(y, u, w) ∈ Lr(Ω)

is continuous affine linear for all p ∈ [2,∞] and all r ∈ [1, min{p, p′}].

Proof. The differentiability properties and the form of the derivatives are an immediate consequence of Lemma 7.1. The mapping properties of Lu are due to the fact that the embedding H¹₀ ↪ Lp′ is continuous. □

For KKT-triples we have the following regularity result:

Lemma 7.15. Every KKT-triple (y, u, w) ∈ Y × U × W of (7.11) satisfies u ∈ L^∞(Ω) if β_1, β_2 ∈ R, and u ∈ L^{p′}(Ω) with p′ as in Lemma 7.14, otherwise.

Proof. The proof of Lemma 7.8 can be easily adjusted. □

7.1 Distributed Control of a Nonlinear Elliptic Equation 129

From Lemma 7.14 we conclude that Assumption 5.17 (a)–(c) is satisfied for r = 2, all p ∈ [2, ∞], and all p′ ≤ p as in the lemma. Hence, using an MCP-function π that satisfies Assumption 5.17 (d), we can write the KKT conditions in the form (5.27), and Theorem 5.19 yields the semismoothness of Σ. Furthermore, Lemma 7.14 implies that Assumption 5.27 is satisfied for p = p′, and we thus can compute smoothing steps as described in Theorem 5.29. Therefore, if the generalized differential is regular near the KKT-triple (y, u, w) ∈ Y × L^p(Ω) × W, p = p′ (cf. Lemma 7.15), the semismooth Newton methods of section 3.2.3 are applicable and converge superlinearly.

In a similar way, we can deal with the second mixed reformulation, which is based on Assumption 5.20.

7.1.3 Finite ElementDiscretization

For the discretization of the state equation, we follow [62, Ch. IV.2.5], [63, App. 1.6.4]. Let Ω ⊂ R^2 be a bounded polygonal domain and let T^h be a regular triangulation of Ω:

• T^h = {T^h_i : T^h_i is a triangle, i = 1, …, m_h}.
• ⋃_{T^h ∈ T^h} T^h = Ω, int T^h_i ∩ int T^h_j = ∅ for all i ≠ j.
• For all i ≠ j, T^h_i ∩ T^h_j is either a common edge or a common vertex or the empty set.
• The parameter h denotes the length of the longest edge of all triangles in the triangulation.

Now, we define

V^h = {v^h ∈ C^0(Ω) : v^h|_T affine linear for all T ∈ T^h},
V^h_0 = {v^h ∈ V^h : v^h|_{∂Ω} = 0}.

Further, denote by Σ^h the set of all vertices in the triangulation T^h and by

Σ^h_0 = {P ∈ Σ^h : P ∉ ∂Ω}

the set of all interior vertices of T^h.

For any P ∈ Σ^h_0 there exists a unique function β^h_P ∈ V^h_0 with β^h_P(P) = 1 and β^h_P(Q) = 0 for all Q ∈ Σ^h, Q ≠ P. The set β^h = {β^h_P : P ∈ Σ^h_0} is a basis of V^h_0, and we can write any v^h ∈ V^h_0 uniquely in the form

v^h = Σ_{P ∈ Σ^h_0} v^h_P β^h_P,  with v^h_P = v^h(P).

The space H^h ⊂ L^∞(Ω) is defined by

H^h = {u^h ∈ L^∞(Ω) : u^h|_T constant for all T ∈ T^h}.


Hereby, the specific values of u^h on the edges of the triangles (which are null sets) are not relevant. The set of functions η^h = {η^h_T : T ∈ T^h}, η^h_T = 1 on T and η^h_T = 0 otherwise, forms a basis of H^h, and for all u^h ∈ H^h holds

u^h = Σ_{T ∈ T^h} u^h_T η^h_T,  where u^h|_T ≡ u^h_T.

For any P ∈ Σ^h_0, let Ω^h_P be the polygon around P whose boundary connects midpoints of edges emanating from P with midpoints of triangles containing P and this edge. By χ^h_P we denote the characteristic function of Ω^h_P, being equal to one on Ω^h_P and vanishing on Ω \ Ω^h_P. Finally, we introduce the linear operator L^h : C^0(Ω) ∩ H^1_0(Ω) → L^∞(Ω),

L^h v = Σ_{P ∈ Σ^h_0} v(P) χ^h_P.

Obviously, L^h v is constant on int Ω^h_P with value v(P).

We choose H^h for the discrete control space and V^h_0 for the discrete state space. Now, we discretize the state equation as follows:

(y^h, v^h)_{H^1_0} + ∫_Ω ϕ(L^h y^h)(L^h v^h) dx = (f + gu^h, v^h)_{L^2}  ∀ v^h ∈ V^h_0.  (7.15)

It is easy to see that

∫_Ω ϕ(L^h y^h)(L^h β^h_P) dx = ϕ(y^h_P)(L^h β^h_P, L^h β^h_P)_{L^2} = μ(Ω^h_P) ϕ(y^h_P) = (1/3) Σ_{T ∋ P} μ(T) ϕ(y^h_P).

The objective function J is discretized by

J^h(y^h, u^h) = (1/2) ∫_Ω (L^h y^h − y_d)^2 dx + (λ/2) ∫_Ω (u^h − u_d)^2 dx.

Remark 7.16. For the first integral in J^h we also could have used ∫_Ω (y^h − y_d)^2 dx, but in coordinate form this would result in a quadratic term of the form (1/2) y^hᵀ M^h y^h with the non-diagonal matrix M^h, M^h_ij = (β^h_i, β^h_j)_{L^2}, which would make the numerical computations more expensive.

The discrete feasible set is C^h = H^h ∩ C. Thus, we can write down the fully discrete control problem:


minimize_{y^h ∈ V^h_0, u^h ∈ H^h}  (1/2) ∫_Ω (L^h y^h − y_d)^2 dx + (λ/2) ∫_Ω (u^h − u_d)^2 dx
subject to  (y^h, v^h)_{H^1_0} + (ϕ(L^h y^h), L^h v^h)_{L^2} = (f + gu^h, v^h)_{L^2}  ∀ v^h ∈ V^h_0,
            u^h ∈ C^h.   (7.16)

Next, we intend to write (7.16) in coordinate form. To this end, let

Σ^h_0 = {P^h_1, …, P^h_{n_h}},  β^h_i = β^h_{P^h_i},  η^h_l = η^h_{T^h_l}.

Further, we write y^h ∈ R^{n_h} for the coordinates of y^h ∈ V^h_0 with respect to the basis β^h = {β^h_i} and u^h ∈ R^{m_h} for the coordinates of u^h ∈ H^h with respect to the basis η^h = {η^h_l}. We define the matrices A^h, S^h ∈ R^{n_h × n_h},

A^h_ij = (β^h_i, β^h_j)_{H^1_0},  S^h_ij = (L^h β^h_i, L^h β^h_j)_{L^2},  (7.17)

(note that S^h is diagonal and positive definite), the vectors f^h, ϕ(y^h) ∈ R^{n_h},

f^h_i = (β^h_i, f)_{L^2},  ϕ(y^h)_i = ϕ(y^h_i),

and the matrix G^h ∈ R^{n_h × m_h},

G^h_il = (β^h_i, g η^h_l)_{L^2}.

Then (7.15) is equivalent to the nonlinear system of equations

A^h y^h + S^h ϕ(y^h) = f^h + G^h u^h.  (7.18)
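The nonlinear system (7.18) can be attacked by an ordinary Newton iteration. The following is a minimal sketch (not the thesis code): the matrices below are illustrative stand-ins — a 1D finite-difference Laplacian for A^h and lumped diagonal matrices for S^h and G^h — and ϕ(y) = y^3 as in the numerical tests of section 7.2.

```python
import numpy as np

def solve_state(A, S, G, f, u, tol=1e-12, maxit=30):
    """Newton iteration for F(y) = A y + S phi(y) - f - G u = 0, phi(y) = y^3."""
    y = np.zeros(len(f))
    for _ in range(maxit):
        F = A @ y + S @ y**3 - f - G @ u
        if np.linalg.norm(F) <= tol:
            break
        # Jacobian: A + S diag(phi'(y)) = A + S diag(3 y^2)
        J = A + S @ np.diag(3.0 * y**2)
        y -= np.linalg.solve(J, F)
    return y

n = 31
h = 1.0 / (n + 1)
# stand-in matrices (hypothetical 1D data, not the FE matrices of (7.17))
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
S = h * np.eye(n)
G = h * np.eye(n)
f = h * np.ones(n)
u = np.ones(n)

y = solve_state(A, S, G, f, u)
print(np.linalg.norm(A @ y + S @ y**3 - f - G @ u))  # residual near machine precision
```

Since ϕ′ ≥ 0 here, the Jacobian A + S diag(3y²) stays positive definite, which is why the undamped iteration is unproblematic for this model data.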

Further, in coordinates we can write J^h as

J^h(y^h, u^h) = (1/2) y^hᵀ S^h y^h − y^h_dᵀ S^h y^h + (λ/2) u^hᵀ M^h u^h − λ u^h_dᵀ M^h u^h + γ,

where the mass matrix M^h ∈ R^{m_h × m_h}, the vectors y^h_d ∈ R^{n_h}, u^h_d ∈ R^{m_h}, and the scalar γ are defined by

M^h_kl = (η^h_k, η^h_l)_{L^2},  (y^h_d)_i = (1/μ(Ω_{P_i})) ∫_{Ω_{P_i}} y_d(x) dx,
(M^h u^h_d)_l = (η^h_l, u_d)_{L^2},  γ = (1/2)‖y_d‖²_{L^2} + (λ/2)‖u_d‖²_{L^2}.

Finally, we note that u^h ∈ C^h if and only if its η^h-coordinates u^h satisfy u^h ∈ C^h, where

C^h = {u^h ∈ R^{m_h} : u^h_l ∈ C, l = 1, …, m_h}.

Thus, we can write down the fully discrete control problem in coordinate form:


minimize_{y^h ∈ R^{n_h}, u^h ∈ R^{m_h}}  J^h(y^h, u^h)
subject to  A^h y^h + S^h ϕ(y^h) = f^h + G^h u^h,  u^h ∈ C^h.   (7.19)

It is advisable to consider problem (7.19) only in conjunction with the coordinate-free version (7.16), since (7.16) still contains all the information on the underlying function spaces while problem (7.19) does not. To explain this in more detail, we give a very simple example (readers familiar with discretizations of control problems can skip the example):

Example 7.17. Let us consider the trivial problem

minimize_{u ∈ L^2(Ω)}  j(u) := (1/2)‖u‖²_{L^2}.

Since j′(u) = u, from any point u ∈ L^2 a gradient step with stepsize 1 brings us to the solution u* ≡ 0. Of course, for a proper discretization of this problem, we expect a similar behavior. Discretizing U = L^2(Ω) by H^h as above, and j by j^h(u^h) = j(u^h) = ‖u^h‖²_{L^2}/2, we have j^h′(u^h) = u^h and thus, after one gradient step with stepsize 1, we have found the solution. Consequently, if u^h are the η^h-coordinates of u^h, then the η^h-coordinates j^h′(u^h) of j^h′(u^h) = u^h are j^h′(u^h) = u^h, and the step −j^h′(u^h) brings us from u^h to the solution 0.

However, the following approach yields a completely different result: In coordinate form, the discretized problem reads

minimize_{u^h ∈ R^{m_h}}  j^h(u^h)  with  j^h(u^h) = (1/2) u^hᵀ M^h u^h.

Differentiating j^h(u^h) with respect to u^h yields

(d/du^h) j^h(u^h) = M^h u^h = M^h j^h′(u^h).

Since ‖M^h‖ = O(h²), this Euclidean gradient is very short and a gradient step of stepsize one will provide almost no progress. Therefore, it is crucial to work with gradients that are represented with respect to the correct inner product, in our case the one induced by the matrix M^h, which corresponds to the inner product of H^h, the discretization of L².
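A small numerical illustration of Example 7.17 (a sketch with a stand-in diagonal mass matrix M^h whose entries are of order h², as for piecewise constant elements on a mesh of size h): the Euclidean gradient M u is O(h²) and a unit step barely moves, while the gradient represented in the H^h (discrete L²) inner product is M⁻¹(M u) = u, so one unit step reaches the solution u = 0 exactly.

```python
import numpy as np

h = 1.0 / 64
m = 100
M = (h**2) * np.eye(m)          # stand-in mass matrix, ||M^h|| = O(h^2)
u0 = np.ones(m)                  # coordinates of the starting control

euclid_grad = M @ u0             # d/du j^h(u) = M u (Euclidean representation)
u_euclid = u0 - euclid_grad      # step of norm O(h^2): almost no progress

riesz_grad = np.linalg.solve(M, M @ u0)   # gradient w.r.t. the M-inner product
u_riesz = u0 - riesz_grad                 # exact solution 0 in one step

print(np.linalg.norm(u_euclid - u0))  # tiny change
print(np.linalg.norm(u_riesz))        # 0 up to roundoff
```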

7.1.4 DiscreteBlack-Box-Approach

We proceed by discussing the black-box approach, applied to the discrete control problem (7.16). It is straightforward to derive analogues of Lemmas 7.1–7.7 for the discrete control problem. In particular, the discrete state equation (7.15) possesses a unique solution operator u^h ∈ H^h ↦ y^h(u^h) ∈ V^h_0 which is twice continuously differentiable. The reduced objective function is j^h(u^h) = J^h(y^h(u^h), u^h), where y^h = y^h(u^h) solves (7.15), or, in coordinate form, j^h(u^h) = J^h(y^h(u^h), u^h), where y^h = y^h(u^h) solves (7.18).

The discrete adjoint equation is given by the variational equation

∀ v^h ∈ V^h_0 :  (v^h, w^h)_{H^1_0} + (ϕ′(L^h y^h) L^h v^h, L^h w^h)_{L^2} = ⟨−J^h_{y^h}(y^h, u^h), v^h⟩_{H^{-1}, H^1_0}.

The coordinates w^h ∈ R^{n_h} of the discrete adjoint state w^h ∈ V^h_0 are thus given by

(A^h + T^h(y^h)) w^h = −S^h(y^h − y^h_d),

where T^h(y^h) = S^h diag(ϕ′(y^h_1), …, ϕ′(y^h_{n_h})).

The discrete reduced gradient j^h′(u^h) ∈ H^h satisfies

(j^h′(u^h), z^h)_{L^2} = (J^h_{u^h}(y^h, u^h), z^h)_{L^2} + (w^h, −g z^h)_{L^2} = (λ(u^h − u_d) − g w^h, z^h)_{L^2}.

Now observe that

(Σ_k ((M^h)^{-1} (G^h)ᵀ w^h)_k η^h_k, Σ_l η^h_l z^h_l)_{L^2} = z^hᵀ M^h (M^h)^{-1} (G^h)ᵀ w^h = z^hᵀ (G^h)ᵀ w^h = (w^h, g z^h)_{L^2} = (g w^h, z^h)_{L^2}.

Hence, the η^h-coordinates of j^h′(u^h) are

j^h′(u^h) = λ(u^h − u^h_d) − (M^h)^{-1} (G^h)ᵀ w^h.

As already illustrated in Example 7.17, the vector j^h′(u^h) is not the usual gradient of j^h(u^h) with respect to u^h, which corresponds to the gradient representation with respect to the Euclidean inner product. In fact, we have

(d/du^h) j^h(u^h) = λ M^h(u^h − u^h_d) − (G^h)ᵀ w^h = M^h j^h′(u^h).  (7.20)

Rather, j^h′(u^h) is the gradient representation with respect to the inner product of H^h, which is represented by the matrix M^h.

Writing down the first-order necessary conditions for the discrete reduced problem (7.16), we obtain

u^h ∈ C^h,  (j^h′(u^h), v^h − u^h)_{L^2} ≥ 0  ∀ v^h ∈ C^h.  (7.21)

In coordinate form, this becomes

u^h ∈ C^h,  j^h′(u^h)ᵀ M^h (v^h − u^h) ≥ 0  ∀ v^h ∈ C^h.  (7.22)


Since M^h is diagonal positive definite, we can write (7.21) equivalently as

u^h_l − P_C(u^h_l − j^h′(u^h)_l) = 0,  l = 1, …, m_h.

This is the discrete analogue of the condition

u − P_C(u − j′(u)) = 0,

which we used to express the continuous problem in the form

Π(u) := π(u, j′(u)) = 0,  (7.23)

where π = φ_[α,β] is a continuous MCP-function for the interval [α, β]. As in the function space context, we apply an MCP-function π = φ_[α,β] to reformulate (7.22) equivalently in the form

Π^h(u^h) := (π(u^h_1, j^h′(u^h)_1), …, π(u^h_{m_h}, j^h′(u^h)_{m_h}))ᵀ = 0.  (7.24)
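The defining property of an MCP-function can be checked directly for the projection-based choice π(x_1, x_2) = x_1 − P_[a,b](x_1 − x_2) (the thesis uses the scaled variant x_1 − P_C(x_1 − λ⁻¹x_2); unit scaling is used here only for illustration): its zeros are exactly the points satisfying the scalar mixed complementarity conditions.

```python
# Sketch: pi(u, g) = u - P_[a,b](u - g) vanishes exactly when
#   u in [a, b],  g = 0 if a < u < b,  g >= 0 if u = a,  g <= 0 if u = b.
def pi(u, g, a, b):
    return u - min(max(u - g, a), b)

a, b = 0.0, 1.0
assert pi(0.5, 0.0, a, b) == 0.0    # interior stationary point: g = 0
assert pi(0.0, 0.7, a, b) == 0.0    # active lower bound: u = a, g >= 0
assert pi(1.0, -0.3, a, b) == 0.0   # active upper bound: u = b, g <= 0
assert pi(0.5, 0.2, a, b) != 0.0    # g != 0 in the interior: residual nonzero
assert pi(-0.1, 0.5, a, b) != 0.0   # u outside [a, b]: residual nonzero
print("MCP equivalence checks passed")
```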

This is the discrete version of the equation reformulation (7.12). If π is semismooth then, due to the continuous differentiability of j^h′, also Π^h is semismooth, and finite-dimensional semismooth Newton methods can be applied. We expect a close relationship between the resulting discrete semismooth Newton method and the semismooth Newton method for the original problem in function space. This relation is established in the following considerations:

First, we have to identify the discrete correspondent to the generalized differential ∂Π(u) in Theorem 7.9. Let B ∈ ∂Π(u). Then there exists d ∈ (L^∞)² with d(x) ∈ ∂π(u(x), j′(u)(x)) on Ω such that B = d_1 I + d_2 · j″(u). Replacing u by u^h and j by j^h, a suitable discretization of B is obtained by

B^h = d^h_1 I + d^h_2 · j^h″(u^h),  (7.25)
d^h_i ∈ H^h,  d^h(x) ∈ ∂π(u^h(x), j^h′(u^h)(x)),  x ∈ Ω.  (7.26)

Since u^h and j^h′(u^h) are elements of H^h, they are constant on any triangle T_l ∈ T^h with values u^h_l and j^h′(u^h)_l, respectively. Denoting by d^h_i the η^h-coordinates of d^h_i ∈ H^h, the functions d^h_i are constant on any triangle T_l with values d^h_il. Therefore, (7.26) is equivalent to

(d^h_{1l}, d^h_{2l}) ∈ ∂π(u^h_l, j^h′(u^h)_l),  1 ≤ l ≤ m_h.

Let j^h″(u^h) ∈ R^{m_h × m_h} denote the matrix representation of j^h″(u^h) with respect to the H^h-inner product. More precisely, j^h″(u^h) z^h are the η^h-coordinates of j^h″(u^h) z^h; thus, for all z^h, z̃^h ∈ H^h and corresponding coordinate vectors z^h, z̃^h, we have

(z̃^h, j^h″(u^h) z^h)_{L^2} = z̃^hᵀ M^h j^h″(u^h) z^h.

The matrix representation of B^h with respect to the H^h inner product is

B^h = D^h_1 + D^h_2 j^h″(u^h),

where D^h_i = diag(d^h_i). In fact,

(η^h_l, B^h z^h)_{L^2} = (η^h_l, d^h_1 z^h)_{L^2} + (η^h_l, d^h_2 j^h″(u^h) z^h)_{L^2}
 = (η^h_l, d^h_{1l} z^h)_{L^2} + (η^h_l, d^h_{2l} j^h″(u^h) z^h)_{L^2}
 = (M^h(d^h_{1l} z^h))_l + (M^h(d^h_{2l} j^h″(u^h) z^h))_l.

Therefore, the matrix representation of the discrete correspondent to ∂Π(u) is ∂Π^h(u^h), the set consisting of all matrices B^h ∈ R^{m_h × m_h} with

B^h = D^h_1 + D^h_2 j^h″(u^h),  (7.27)

where D^h_1 and D^h_2 are diagonal matrices such that

((D^h_1)_ll, (D^h_2)_ll) ∈ ∂π(u^h_l, j^h′(u^h)_l),  l = 1, …, m_h.

Next, we show that there is a very close relationship between ∂Π^h and finite-dimensional subdifferentials of the function Π^h. To establish this relation, let us first note that the coordinate representation j^h″(u^h) of j^h″(u^h) satisfies

j^h″(u^h) = (d/du^h) j^h′(u^h).

In fact, we have for all z^h, z̃^h ∈ H^h and corresponding coordinate vectors z^h, z̃^h

z̃^hᵀ M^h j^h″(u^h) z^h = (z̃^h, j^h″(u^h) z^h)_{L^2} = z̃^hᵀ (d²/d(u^h)²) j^h(u^h) z^h
 = z̃^hᵀ (d/du^h)(M^h j^h′)(u^h) z^h = z̃^hᵀ M^h (d/du^h) j^h′(u^h) z^h,

where we have used (7.20). This shows that for the rows of ∂Π^h holds

∂Π^h_l = ∂π ∘ (d/du^h)(u^h_l, j^h′(u^h)_l)

in the sense of Proposition 3.7, and that, by Propositions 3.3 and 3.7, Π^h_l is ∂Π^h_l-semismooth if π is semismooth. Therefore, Π^h is ∂Π^h-semismooth by Proposition 3.5. If π is α-order semismooth and j^h′ is differentiable with α-Hölder continuous derivative, then the above reasoning yields that Π^h is even α-order ∂Π^h-semismooth.

Finally, there is also a close relationship between ∂Π^h and ∂_C Π^h. In fact, by the chain rule for Clarke's generalized gradient we have

∂_C Π^h(u^h) ⊂ ∂Π^h(u^h).

Under additional conditions (e.g., if π or −π is regular), equality holds. If we do not have equality, working with the differential ∂Π^h has the advantage that ∂π and the derivatives of its arguments can be computed independently of each other, whereas in general the calculation of ∂_C Π^h(u^h) is more difficult.

We collect the obtained results in the following theorem:

Theorem 7.18. The discretization of the equation reformulation (7.23) of (7.1) in coordinate form is given by (7.24). Further, the multifunction ∂Π^h, where ∂Π^h(u^h) consists of all B^h ∈ R^{m_h × m_h} defined in (7.27), is the discrete analogue of the generalized differential ∂Π. We have

∂_C Π^h(u^h) ⊂ ∂Π^h(u^h)

with equality if, e.g., π or −π is regular.

If π is semismooth, then Π^h is ∂Π^h-semismooth and also semismooth in the usual sense. Further, if π is α-order semismooth and if j^h (and thus j^h) is twice continuously differentiable with α-Hölder continuous second derivative, then Π^h is α-order ∂Π^h-semismooth and also α-order semismooth in the usual sense.

Having established the ∂Π^h-semismoothness of Π^h, we can use any variant of the semismooth Newton methods in sections 3.2.3–3.2.5 to solve the semismooth equation (7.24). We stress that in finite dimensions no smoothing step is required to obtain fast local convergence. However, since the finite-dimensional problem (7.24) is a discretization of the continuous problem (7.12), we should, if necessary, incorporate a discrete version of a smoothing step to ensure that the algorithm exhibits mesh-independent behavior.

The resulting instance of Algorithm 3.9 then becomes:

Algorithm 7.19. Inexact Semismooth Newton's Method

0. Choose an initial point u^h_0 ∈ R^{m_h} and set k = 0.
1. Compute the discrete state y^h_k ∈ R^{n_h} by solving the discrete state equation
   A^h y^h_k + S^h ϕ(y^h_k) = f^h + G^h u^h_k.
2. Compute the discrete adjoint state w^h_k ∈ R^{n_h} by solving the discrete adjoint equation
   (A^h + T^h(y^h_k)) w^h_k = −S^h(y^h_k − y^h_d).
3. Compute the discrete reduced gradient
   j^h_k′ = λ(u^h_k − u^h_d) − (M^h)^{-1} (G^h)ᵀ w^h_k
   and the vector Π^h_k ∈ R^{m_h}, (Π^h_k)_l = π((u^h_k)_l, (j^h_k′)_l).
4. If (Π^h_kᵀ M^h Π^h_k)^{1/2} ≤ ε, then STOP with result u^h_* = u^h_k.
5. Compute B^h_k ∈ ∂Π^h(u^h_k) (details are given below).
6. Compute s^h_k ∈ R^{m_h} by solving the semismooth Newton system (details are given below)
   B^h_k s^h_k = −Π^h_k,
   and set u^{h,0}_{k+1} = u^h_k + s^h_k.
7. Perform a smoothing step (if necessary): u^{h,0}_{k+1} ↦ u^h_{k+1}.
8. Increment k by one and go to step 1.
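The core iteration (steps 3–6) can be sketched compactly for a model finite-dimensional VI in which an affine map g(u) = Au − b stands in for the discrete reduced gradient, so the PDE state and adjoint solves of steps 1–2 are omitted; all data below are hypothetical. The generalized derivative of the componentwise projection-based π(x_1, x_2) = x_1 − P_C(x_1 − x_2) gives B = D_1 + D_2 A, as in (7.27).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 40
Q = rng.standard_normal((m, m))
A = np.eye(m) + 0.1 * (Q @ Q.T) / m     # SPD stand-in for the reduced Hessian
b = rng.standard_normal(m)
alpha, beta = -0.5, 0.5                  # box C = [alpha, beta]

u = np.zeros(m)
for k in range(50):
    g = A @ u - b                        # stand-in reduced gradient (step 3)
    Pi = u - np.clip(u - g, alpha, beta) # residual Pi(u) (step 3)
    if np.linalg.norm(Pi) <= 1e-12:      # termination test (step 4)
        break
    t = u - g
    inactive = (t > alpha) & (t < beta)  # projection acts as identity here
    d1 = np.where(inactive, 0.0, 1.0)    # element of the generalized
    d2 = np.where(inactive, 1.0, 0.0)    # differential of pi (step 5)
    B = np.diag(d1) + d2[:, None] * A    # B = D1 + D2 A
    u = u + np.linalg.solve(B, -Pi)      # semismooth Newton step (step 6)

print(k, np.linalg.norm(u - np.clip(u - (A @ u - b), alpha, beta)))
```

For this kind of problem the iteration typically terminates in a handful of steps once the active set settles, mirroring the 3–4 iterations reported in section 7.2.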

Remark 7.20.

(a) We can allow for inexactness in the matrices B^h_k, which results in an instance of Algorithm 3.13. In fact, as was shown in Theorem 3.15, besides the uniformly bounded invertibility of the matrices B^h_k we only need that

inf_{B ∈ ∂Π^h(u^h_k)} ‖(B − B^h_k) s^h_k‖ = o(‖s^h_k‖)  as ‖s^h_k‖ → 0

to achieve superlinear convergence.

(b) We also can achieve that the iteration stays feasible with respect to a closed convex set K^h which contains the solution of (7.24). This can be achieved by incorporating a projection onto K^h in the algorithm after the smoothing step and results in an instance of Algorithm 3.19. In the following, we only consider the projection-free algorithm and the projected version with projection onto C^h, which is given by coordinatewise projection onto C.

(c) The efficiency of the algorithm crucially depends on the efficient solvability of the Newton equation in step 6. We propose an efficient method in section 7.1.5.

(d) We observed in Lemma 7.7 that j′(u) = λu + G(u), where

u ∈ U ↦ G(u) = −g w(u) − λu_d ∈ L^{p′}(Ω)

is locally Lipschitz continuous with p′ > 2. We concluded that a smoothing step is given by the scaled projected gradient step

u ↦ P_C(u − λ^{-1} j′(u)) = P_C(u_d + λ^{-1} g w(u)).

Therefore, a discrete version of the smoothing step is given by

u^h ↦ P_C(u^h − λ^{-1} j^h′(u^h)) = P_C(u^h_d + λ^{-1} (M^h)^{-1} (G^h)ᵀ w^h).  (7.28)

Due to the smoothingpropertyof G we alsocanapply a smoothing-step-freesemismoothNewtonmethodby choosing

π(x) = x1 − PC(x1 − λ−1x2)

for thereformulation,which resultsin

138 7. Applications

Π(u) = u− PC(−λ−1G(u)

)= u− PC

(ud + λ−1gw(u)

).

In thediscretealgorithm,this correspondsto

Πh(uh) = uh − PC(uh − λ−1jh

′(uh)

)= uh − PC

(uhd + λ−1Mh−1

GhTwh).

(7.29)

In section7.2,we presentnumericalresultsfor bothvariants,theonewith gen-eral MCP-functionπ andsmoothingstep(7.28),and the smoothing-step-freealgorithmwith Πh asdefinedin (7.29).

7.1.5 Efficient Solution of the Newton System

We recall that a matrix B^h_k ∈ R^{m_h × m_h} is contained in ∂Π^h(u^h_k) if and only if

B^h_k = D^h_{k1} + D^h_{k2} j^h″(u^h_k),

where D^h_{k1} and D^h_{k2} are diagonal matrices such that

((D^h_{k1})_ll, (D^h_{k2})_ll) ∈ ∂π((u^h_k)_l, j^h′(u^h_k)_l).  (7.30)

Further, for the choices of the function π we are going to use, namely φ^FB_C and φ^{E,σ}_C : x ↦ φ^E_C(x_1, σx_2), σ > 0, the computation of ∂π, and thus of the matrices D^h_{ki}, is straightforward. Concerning the calculation of ∂φ^{E,σ}_C, see Proposition 5.6; for the computation of ∂φ^FB_C, we refer to [54]. In both cases, there exist constants c_i > 0 such that for all x ∈ R² and all d ∈ ∂π(x) holds

0 ≤ d_1, d_2 ≤ c_1,  d_1 + d_2 ≥ c_2.

In particular, the matrices D^h_{ki} are positive semidefinite with uniformly bounded norms, and D^h_{k1} + D^h_{k2} is positive definite with uniformly bounded inverse.
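For the projection-based MCP-function π(x) = x_1 − P_[a,b](x_1 − σx_2), σ > 0, one element (d_1, d_2) of ∂π and the quoted bounds can be read off directly from the two cases of the projection; this is a sketch for that choice only (φ^FB_C would need its own formula, see [54]).

```python
def dpi(x1, x2, a, b, sigma=1.0):
    """An element (d1, d2) of the generalized differential of
    pi(x) = x1 - P_[a,b](x1 - sigma*x2)."""
    t = x1 - sigma * x2
    if a < t < b:                 # projection differentiable with derivative 1
        return (0.0, sigma)       # pi(x) = sigma * x2 locally
    return (1.0, 0.0)             # projection locally constant: pi(x) = x1 - a (or x1 - b)

# bounds 0 <= d1, d2 <= c1 and d1 + d2 >= c2 with c1 = max(1, sigma),
# c2 = min(1, sigma); here sigma = 1, so c1 = c2 = 1
for x in [(-1.0, 2.0), (0.3, 0.1), (2.0, -1.0)]:
    d1, d2 = dpi(x[0], x[1], 0.0, 1.0)
    assert 0.0 <= d1 <= 1.0 and 0.0 <= d2 <= 1.0
    assert d1 + d2 >= 1.0
print("bounds hold")
```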

We observed earlier the relation

j^h″(u^h) = (M^h)^{-1} (d²/d(u^h)²) j^h(u^h).

For the computation of the right hand side we use the adjoint representation of appendix A.1, applied to problem (7.19). The state equation for this problem is E^h(y^h, u^h) = 0 with

E^h(y^h, u^h) = A^h y^h + S^h ϕ(y^h) − f^h − G^h u^h,

and the Lagrange function is given by

L^h(y^h, u^h, w^h) = J^h(y^h, u^h) + w^hᵀ E^h(y^h, u^h).

Observe that

(d/dy^h) E^h(y^h, u^h) = A^h + T^h(y^h),  (d/du^h) E^h(y^h, u^h) = −G^h,

(d²L^h/d(y^h, u^h)²)(y^h, u^h, w^h) =
[ S^h + S^h diag(ϕ″(y^h)) diag(w^h)   0    ]
[ 0                                   λM^h ].

Therefore, introducing the diagonal matrix

Z^h(y^h, w^h) = S^h (I + diag(ϕ″(y^h)) diag(w^h)),

and omitting the arguments for brevity, we obtain by the adjoint formula

(d²/d(u^h)²) j^h(u^h) =
[ (dE^h/dy^h)^{-1} dE^h/du^h ; −I ]ᵀ (d²L^h/d(y^h, u^h)²) [ (dE^h/dy^h)^{-1} dE^h/du^h ; −I ]
 = (G^h)ᵀ (A^h + T^h(y^h))^{-1} Z^h(y^h, w^h) (A^h + T^h(y^h))^{-1} G^h + λM^h.

The Hessian j^h″(u^h) with respect to the inner product of H^h is thus given by

j^h″(u^h) = (M^h)^{-1} (G^h)ᵀ (A^h + T^h(y^h))^{-1} Z^h(y^h, w^h) (A^h + T^h(y^h))^{-1} G^h + λI.
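The adjoint-based Hessian formula can be verified numerically against finite differences of the reduced gradient. The following is a verification sketch with illustrative stand-in matrices (a 1D finite-difference Laplacian and lumped diagonal S^h, G^h, M^h — not the thesis discretization), using ϕ(y) = y³, so T(y) = S diag(3y²) and ϕ″(y) = 6y.

```python
import numpy as np

n = 6
h = 1.0 / (n + 1)
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
S, G, M = h * np.eye(n), h * np.eye(n), h * np.eye(n)
f, yd, ud = h * np.ones(n), 0.1 * np.ones(n), np.zeros(n)
lam = 1e-3

def state(u):                      # Newton solve of A y + S y^3 = f + G u
    y = np.zeros(n)
    for _ in range(50):
        F = A @ y + S @ y**3 - f - G @ u
        if np.linalg.norm(F) < 1e-13:
            break
        y -= np.linalg.solve(A + S @ np.diag(3 * y**2), F)
    return y

def grad(u):                       # reduced gradient via the adjoint state
    y = state(u)
    K = A + S @ np.diag(3 * y**2)          # K = A + T(y)
    w = np.linalg.solve(K, -S @ (y - yd))  # adjoint equation
    return lam * (u - ud) - np.linalg.solve(M, G.T @ w), y, w

u = 0.2 * np.ones(n)
g0, y, w = grad(u)
K = A + S @ np.diag(3 * y**2)
Z = S @ (np.eye(n) + np.diag(6 * y) @ np.diag(w))   # Z = S(I + diag(phi'')diag(w))
Ki = np.linalg.inv(K)
H = np.linalg.solve(M, G.T @ Ki @ Z @ Ki @ G) + lam * np.eye(n)

eps = 1e-5                          # central finite differences of the gradient
H_fd = np.column_stack([(grad(u + eps * e)[0] - grad(u - eps * e)[0]) / (2 * eps)
                        for e in np.eye(n)])
print(np.max(np.abs(H - H_fd)))     # agreement up to FD error
```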

Therefore, the matrices B^h ∈ ∂Π^h(u^h) are given by

B^h = D^h + D^h_2 (M^h)^{-1} (G^h)ᵀ (A^h + T^h(y^h))^{-1} Z^h(y^h, w^h) (A^h + T^h(y^h))^{-1} G^h,

where D^h_1 and D^h_2 satisfy (7.30) and

D^h := D^h_1 + λ D^h_2.

Note that D^h is diagonal and positive definite, and that D^h as well as (D^h)^{-1} are bounded uniformly in u^h.

Since computing (A^h + T^h(y^h))^{-1} v^h means solving the linearized state equation, it is not a priori clear that Newton's equation in step 6 of Algorithm 7.19 can be solved efficiently. It is also important to observe that the main difficulties are caused by the structure of the Hessian j^h″, not so much by the additional factors D^h_1 and D^h_2 appearing in B^h. In other words, it is also not straightforward how the Newton system for the unconstrained reduced control problem can be solved efficiently.

However, the matrix B^h is a discretization of the operator

(d_1 + λd_2)I + d_2 g · (−Δ + ϕ′I)^{-1}[(1 + ϕ″w)I](−Δ + ϕ′I)^{-1}(gI).

Hence, one possibility to solve the discretized semismooth Newton system efficiently is to use the compactness of the operator

(−Δ + ϕ′I)^{-1}[(1 + ϕ″w)I](−Δ + ϕ′I)^{-1}[gI]

to apply multigrid methods of the second kind [72, Ch. 16]. These methods are suitable for solving problems of the form

u = Ku + f,

where K : U → V ↪ U (compact embedding). The application of (−Δ + ϕ′I)^{-1} to a function, i.e., application of (A^h + T^h(y^h))^{-1} to a vector, can be done efficiently by using, once again, multigrid methods. We believe that this approach has computational potential. In our computations, however, we use a different strategy that we describe now.

To develop this approach, we consider the Newton system

B^h s^h = −Π^h(u^h)  (7.31)

and derive an equivalent system of equations that, under certain assumptions, can be solved efficiently. Hereby, we use the relations, observed in section 5.2.3, between the semismooth Newton system of the reduced approach and the semismooth Newton system obtained for the all-at-once approach. To this end, consider the system

[ d²L^h/d(y^h)²                       d²L^h/dy^h du^h                          d²L^h/dy^h dw^h                       |  0    ]
[ D^h_2 (M^h)^{-1} d²L^h/du^h dy^h    D^h_1 + D^h_2 (M^h)^{-1} d²L^h/d(u^h)²   D^h_2 (M^h)^{-1} d²L^h/du^h dw^h      |  −Π^h ]
[ d²L^h/dw^h dy^h                     d²L^h/dw^h du^h                          d²L^h/d(w^h)²                         |  0    ]

Using the particular form of L^h, this becomes

[ Z^h        0     A^h + T^h                |  0    ]
[ 0          D^h   −D^h_2 (M^h)^{-1} (G^h)ᵀ  |  −Π^h ]
[ A^h + T^h  −G^h  0                        |  0    ]

Performing the transformation

Row 1 → Row 1 − Z^h(A^h + T^h)^{-1} × Row 3

yields the equivalent system

[ 0          Z^h(A^h + T^h)^{-1} G^h   A^h + T^h                |  0    ]
[ 0          D^h                      −D^h_2 (M^h)^{-1} (G^h)ᵀ  |  −Π^h ]   (7.32)
[ A^h + T^h  −G^h                     0                        |  0    ]

and by the transformation

Row 2 → Row 2 + (D^h_2 (M^h)^{-1} (G^h)ᵀ)(A^h + T^h)^{-1} × Row 1,

we arrive at

[ 0          Z^h(A^h + T^h)^{-1} G^h   A^h + T^h |  0    ]
[ 0          B^h                      0         |  −Π^h ]
[ A^h + T^h  −G^h                     0         |  0    ]

This shows that B^h appears as a Schur complement in (7.32). Hence, if we solve (7.32), we also have a solution of the Newton system (7.31).

For deriving an efficient strategy for solving (7.32), we first observe that D^h is diagonal and nonsingular. Further, the diagonal matrix Z^h is invertible if and only if

ϕ″(y^h)_i w^h_i ≠ −1  ∀ i = 1, …, n_h.  (7.33)

In particular, this holds true if ϕ″(y^h)_i w^h_i is small for all i. If, e.g., the state equation is linear, then ϕ″ ≡ 0. Further, if y^h is sufficiently close to the data y^h_d, then the right hand side of the adjoint equation is small and thus w^h is small. Both cases result in a positive definite diagonal matrix Z^h. If (7.33) happens to be violated, we can perform a small perturbation of Z^h (but sufficiently large to avoid numerical instabilities) to make it nonsingular.

With D^h and Z^h being invertible, we transform (7.32) according to

Row 3 → −Row 3 + (A^h + T^h)(Z^h)^{-1} × Row 1 − G^h(D^h)^{-1} × Row 2,

and obtain

[ Z^h  0    A^h + T^h                |  0                   ]
[ 0    D^h  −D^h_2 (M^h)^{-1} (G^h)ᵀ  |  −Π^h                ]
[ 0    0    Q^h                      |  G^h (D^h)^{-1} Π^h  ]

where

Q^h = G^h (D^h)^{-1} D^h_2 (M^h)^{-1} (G^h)ᵀ + (A^h + T^h)(Z^h)^{-1}(A^h + T^h).

The matrix (D^h)^{-1} D^h_2 (M^h)^{-1} is diagonal and positive definite. Hence, Q^h is symmetric positive definite if Z^h is positive definite. Furthermore, Q^h can be interpreted as the discretization of the differential operator

(d_2 g² / (d_1 + λd_2)) I + (−Δ + ϕ′(y)I) (1/(1 + ϕ″(y)w)) I (−Δ + ϕ′(y)I),

which is elliptic if (1 + ϕ″(y)w) is positive on Ω.

which is elliptic if (1 + ϕ′′(y)w) is positiveonΩ.Hence,fast solvers(multigrid, preconditionedconjugategradient,etc.) canbe

usedto solve thesystem

Qhvh = GhDh−1Πh. (7.34)

Then,thesolutionsh of theNewtonsystem(7.31)is obtainedas

sh = −Πh + Dh−1Dh

2Mh−1

GhTvh.
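The Schur-complement strategy can be checked on small data. This is a numerical sketch with illustrative random matrices (K standing for A^h + T^h): solving Q v = G D⁻¹ Π and recovering s from the second block row, D s − D_2 M⁻¹ Gᵀ v = −Π, reproduces the solution of the condensed system B s = −Π from (7.31).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, lam = 5, 4, 1e-2
X = rng.standard_normal((n, n))
K = X @ X.T + n * np.eye(n)                       # SPD stand-in for A^h + T^h
Zd = np.diag(rng.uniform(0.5, 1.5, n))            # Z^h, diagonal positive
M = np.diag(rng.uniform(0.5, 1.5, m))             # diagonal mass matrix
G = rng.standard_normal((n, m))
d1 = rng.uniform(0.1, 0.9, m); d2 = 1.0 - d1      # (7.30)-type diagonal pairs
D1, D2 = np.diag(d1), np.diag(d2)
Pi = rng.standard_normal(m)

Ki = np.linalg.inv(K)
jpp = np.linalg.solve(M, G.T @ Ki @ Zd @ Ki @ G) + lam * np.eye(m)
B = D1 + D2 @ jpp                                 # element of dPi^h, see (7.27)
D = D1 + lam * D2                                 # D^h = D1 + lam D2

# Schur-complement solve (7.34) and back-substitution from the second row
Q = G @ np.linalg.solve(D, D2 @ np.linalg.solve(M, G.T)) + K @ np.linalg.solve(Zd, K)
v = np.linalg.solve(Q, G @ np.linalg.solve(D, Pi))
s = np.linalg.solve(D, -Pi + D2 @ np.linalg.solve(M, G.T @ v))

print(np.linalg.norm(B @ s + Pi))                 # ~ 0: s solves B s = -Pi
```

In practice Q would of course not be formed explicitly; the point of the construction is that Q is symmetric positive definite (for positive definite Z), so matrix-free multigrid or preconditioned CG applies.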


7.1.6 Discrete All-at-Once Approach

The detailed considerations of the black-box approach can be carried out in a similar way for semismooth reformulations of the KKT-system of the discretized control problem. We think there is no need to discuss this in detail. In the discrete all-at-once approach, L^h_{u^h} = (M^h)^{-1}(d/du^h)L^h plays the role of j^h′, and the resulting system to solve has the structure

[ Z^h        0     A^h + T^h                |  −(d/dy^h)L^h ]
[ 0          D^h   −D^h_2 (M^h)^{-1} (G^h)ᵀ  |  −Π^h         ]
[ A^h + T^h  −G^h  0                        |  −(d/dw^h)L^h ]

see section 7.1.5. If a globalization is used, it is important to formulate the merit function by means of the correct norms:

(1/2)(dL^h/dy^h)ᵀ (A^h)^{-1} (dL^h/dy^h) + (1/2) Π^hᵀ M^h Π^h + (1/2)(dL^h/dw^h)ᵀ (A^h)^{-1} (dL^h/dw^h),

and to represent gradients with respect to the correct inner products.

7.2 Numerical Results

We now present numerical results for problem (7.1). Hereby, the domain is the unit square Ω = (0, 1) × (0, 1). For ϕ we choose ϕ(y) = y³, which satisfies the growth condition with s = 4. The choice of the other data is oriented on [14, Ex. 5.1.1] (therein, however, the state equation is linear and corresponds to ϕ ≡ 0):

β_1 = −∞,  β_2 = 0,
y_d(x) = (1/6) sin(2πx_1) sin(2πx_2) e^{2x_1},
u_d ≡ 0,  λ = 10^{-3}.   (7.35)

Figure 7.1 shows the computed optimal control on T^{1/32} and Figure 7.2 the corresponding state. The code was implemented in Matlab Version 6 Release 12, using sparse matrix computations. Although Matlab is quite efficient, it usually cannot compete with Fortran or C implementations, which should be kept in mind when evaluating the runtimes given below. The computations were performed under Solaris 8 on a Sun SPARC Ultra workstation with a sparcv9 processor operating at 360 MHz.

We present results for

1. reformulations of the black-box VIP (7.11),
2. reformulations of the all-at-once KKT-system (5.24)–(5.26),

to which we apply two variants of the semismooth Newton method,

Figure 7.1 Optimal control u (h = 1/32).

1. Algorithm 3.9 (no constraints),
2. Algorithm 3.17 with K = C.

In both cases we consider the following choices of MCP-functions:

1. π(x) = x_1 − P_{(−∞,0]}(x_1 − λ^{-1}x_2) (smoothing-step-free algorithm),
2. π(x) = −φ_FB(−x).

We obtain eight (actually six, see below) variants of algorithms, which are denoted by A111–A222, where the three numbers express the choices for the three criteria given above. For instance, A221 stands for Algorithm 3.17, applied to the KKT-system, with K = C and π(x) = x_1 − P_{(−∞,0]}(x_1 − λ^{-1}x_2). Since in the class Axy2 we compute smoothing steps as described in section 4.1, and the smoothing step already contains a projection onto C, we have A112 = A122 and A212 = A222. We will use the names A112 and A212 in the sequel.

7.2.1 Using Multigrid Techniques

For the efficient solution of the discrete state equation (needed in the black-box approach) and the linearized state equation (needed in the all-at-once approach), we use a conjugate gradient method that is preconditioned by one multigrid (MG) V-cycle with one red-black Gauß-Seidel iteration as presmoother and one adjoint red-black Gauß-Seidel iteration as postsmoother. Standard references on multigrid methods include [23, 72, 73, 145]. Our semismooth Newton methods with MG-preconditioned conjugate gradient solver of the Newton systems belong to the class of Newton multilevel methods [44]. For other multigrid approaches to variational inequalities we refer to [21, 82, 83, 99, 100, 101].

Figure 7.2 Optimal state y(u) (h = 1/32).

For the solution of the semismooth Newton system we solve the Schur complement equation (7.34) by a multigrid-preconditioned conjugate gradient method as just described.

The grid hierarchy is generated as follows: The coarsest triangulation T^1 is shown in Figure 7.3. Given T^{2h}, the next finer triangulation T^h is obtained by replacing any triangle in T^{2h} with four triangles, introducing the edge midpoints of the coarse triangles as new vertices, see Figure 7.4, which displays T^{1/2}. Table 7.1 shows the resulting number of interior vertices and the number of triangles for each triangulation level. There is a second strategy to use the multilevel philosophy: We can perform a nested iteration over the discrete control problems on the grid hierarchy: We first (approximately) solve the discrete control problem on the coarsest level. We then interpolate this solution to obtain an initial point for the discrete control problem on the next finer level, which we again solve approximately, and so forth. As we will see, this approach is very efficient.

Figure 7.3 Coarsest triangulation T^1.   Figure 7.4 Second triangulation T^{1/2}.

    h       interior vertices   triangles
    1/16      481                 1024
    1/32      1985                4096
    1/64      8065                16384
    1/128     32513               65536
    1/256     130561              262144

Table 7.1 Degrees of freedom for different mesh sizes.

7.2.2 Black-Box Approach

We now present numerical results for semismooth Newton methods applied to the first-order necessary conditions of the reduced problem (7.9). We thus consider the three algorithms A111, A121, and A112. The initial point is u_0 ≡ −1. We do not use a globalization since (as is often the case for control problems) the undamped semismooth Newton method converges without difficulties. We stress that if the nonmonotone trust-region method of section 6.4 is used, the globalization parameters can be chosen in such a way that the method essentially behaves like the pure Newton method.

To be independent of the choice of the MCP-function, we work with the termination condition

χ(u_k) = ‖u_k − P_C(u_k − j′(u_k))‖_{L^2} ≤ ε,

or, in terms of the discretized problem,

[u^h_k − P_C(u^h_k − j^h_k′)]ᵀ M^h [u^h_k − P_C(u^h_k − j^h_k′)] ≤ ε².

146 7. Applications

    h      k   ‖uk − u‖L2    ‖uk − u‖L∞    χ(uk)
    1/16   0   1.623e+00     6.416e+00     4.794e−03
           1   9.454e−02     1.099e+00     1.022e−04
           2   5.958e−04     1.354e−02     5.949e−04
           3   3.611e−10     1.824e−09     3.552e−13
    1/32   0   1.627e+00     6.477e+00     4.805e−03
           1   9.191e−02     1.098e+00     9.934e−05
           2   1.429e−03     5.833e−02     1.428e−03
           3   8.267e−11     4.141e−10     7.712e−14
    1/64   0   1.628e+00     6.482e+00     4.807e−03
           1   9.052e−02     1.097e+00     9.769e−05
           2   1.347e−03     5.959e−02     1.346e−03
           3   6.616e−11     4.170e−10     1.254e−14
    1/128  0   1.628e+00     6.487e+00     4.808e−03
           1   9.019e−02     1.098e+00     9.732e−05
           2   1.247e−03     6.325e−02     1.246e−03
           3   3.911e−08     1.001e−05     3.911e−08
           4   7.098e−11     4.621e−10     1.285e−15
    1/256  0   1.628e+00     6.488e+00     4.808e−03
           1   8.988e−02     1.098e+00     9.697e−05
           2   1.309e−03     6.469e−02     1.308e−03
           3   1.735e−07     8.885e−05     1.735e−07
           4   8.935e−12     6.650e−11     1.450e−15

Table 7.2 Iteration history of algorithm A111.

Except for this, the method we use agrees with Algorithm 7.19. We work with ε = 10^{-8}. Smaller values can be chosen as well, but it does not appear to be very reasonable to choose ε much smaller than the discretization error. The nonlinear state equation is solved by a Newton iteration, where, in each iteration, a linearized state equation has to be solved. For the computation of j′ we solve the adjoint equation. All PDE solves are done by a multigrid-cg method as described above.

In our first set of tests we choose λ = 0.001 and consider problems on the triangulations T^h for h = 2^{-k}, k = 4, 5, 6, 7, 8. See Table 7.1 for the corresponding number of triangles and interior nodes, respectively.

The results are collected in Tables 7.2–7.4. Hereby, Table 7.2 contains the results for A111, Table 7.3 the results for A121, and Table 7.4 the results for A112. Listed are the iteration k, the L²-distance to the (discrete) solution (‖u_k − u‖_{L^2}), the L∞-distance to the (discrete) solution (‖u_k − u‖_{L^∞}), and the norm of the projected gradient (χ(u_k)). For all three variants of the algorithm we observe mesh-independent convergence behavior and a superlinear rate of convergence of order > 1. Only 3–4 iterations are needed until termination.

Table 7.5 shows for all three algorithms the total number of iterations (Iter.), of state equation solves (State), of linearized state equation solves (Lin. State), and of adjoint equation solves (Adj. State), and the total solution time in seconds (Time). The total number of solves of the semismooth Newton system coincides with the number of iterations Iter. All solves of the linearized state equations are performed

    h      k   ‖uk − u‖L2    ‖uk − u‖L∞    χ(uk)
    1/16   0   1.623e+00     6.416e+00     4.794e−03
           1   9.454e−02     1.099e+00     1.022e−04
           2   3.266e−05     1.309e−04     5.467e−08
           3   2.210e−10     1.115e−09     4.921e−14
    1/32   0   1.627e+00     6.477e+00     4.805e−03
           1   9.191e−02     1.098e+00     9.934e−05
           2   5.613e−05     2.547e−04     9.082e−08
           3   5.024e−11     2.521e−10     1.086e−14
    1/64   0   1.628e+00     6.482e+00     4.807e−03
           1   9.052e−02     1.097e+00     9.769e−05
           2   5.348e−05     2.404e−04     8.643e−08
           3   6.774e−11     4.115e−10     2.206e−15
    1/128  0   1.628e+00     6.487e+00     4.808e−03
           1   9.019e−02     1.098e+00     9.732e−05
           2   4.679e−05     2.091e−04     7.538e−08
           3   7.116e−11     4.612e−10     8.230e−16
    1/256  0   1.628e+00     6.488e+00     4.808e−03
           1   8.988e−02     1.098e+00     9.697e−05
           2   5.097e−05     2.295e−04     8.212e−08
           3   1.405e−07     6.736e−05     1.405e−10

Table 7.3 Iteration history of algorithm A121.

within the Newton method for the solution of the state equation. For algorithms A111 and A121, a total of Iter + 1 state solves and Iter + 1 adjoint state solves are required. Algorithm A112 requires in addition one state solve and one adjoint state solve per iteration for the computation of the smoothing step. We see that usually two Newton iterations are sufficient to solve the nonlinear state equation. Observe that the total computing time increases approximately linearly with the degrees of freedom. This shows that we indeed achieve multigrid efficiency. We note that algorithms A111 and A121 are superior to A112 in computing time. The main reason for this is that A112 requires the extra state equation and adjoint equation solves for the smoothing step.

In a secondtestwe focuson theimportanceof thesmoothingstep.To this end,wehaveruntheAlgorithmsA112andA122withoutsmoothingsteps(A112is with-out projectionwhereasA122 containsa projection).Theresultsareshown in Table7.6. We seethat A112 without smoothingstepsneedsan averageof 7 iterations,whereastheregularAlgorithm A112,seeTable7.5,needsonly 4 iterationsin aver-age.This shows that thesmoothingstephasindeedbenefits,but that thealgorithmstill exhibits reasonableefficiency if the smoothingstepis removed. If we do notperforma smoothingstep,but includea projection(A122 without smoothingstep),theperformanceof thealgorithmis not affectedby omitting thesmoothingstep,atleastfor the problemunderconsideration.We recall that the role of thesmoothingstepis to avoid large discrepanciesbetween‖uk − u‖Lp and‖uk − u‖Lr , i.e., toavoid large (peak-like) deviationsof uk from u on smallsets,seeExample3.52.It

148 7. Applications

h      k   ‖uk − u‖L2    ‖uk − u‖L∞    χ(uk)
1/16   0   1.623e+00     6.416e+00     4.794e−03
       1   6.695e−01     2.447e+00     1.630e−03
       2   8.454e−02     3.252e−01     1.463e−04
       3   9.587e−04     3.444e−03     1.601e−06
       4   1.841e−09     6.849e−09     3.216e−12
1/32   0   1.627e+00     6.477e+00     4.805e−03
       1   6.713e−01     2.466e+00     1.667e−03
       2   9.340e−02     3.645e−01     1.654e−04
       3   1.845e−03     7.274e−03     3.068e−06
       4   1.540e−07     6.725e−07     2.516e−10
1/64   0   1.628e+00     6.482e+00     4.807e−03
       1   6.718e−01     2.479e+00     1.682e−03
       2   9.462e−02     3.731e−01     1.673e−04
       3   1.737e−03     7.040e−03     2.891e−06
       4   2.760e−07     1.316e−06     4.394e−10
1/128  0   1.628e+00     6.487e+00     4.808e−03
       1   6.719e−01     2.483e+00     1.687e−03
       2   9.502e−02     3.794e−01     1.682e−04
       3   1.773e−03     7.063e−03     2.959e−06
       4   4.496e−07     1.930e−06     7.355e−10
1/256  0   1.628e+00     6.488e+00     4.808e−03
       1   6.719e−01     2.485e+00     1.688e−03
       2   9.503e−02     3.796e−01     1.682e−04
       3   1.770e−03     7.067e−03     2.954e−06
       4   6.020e−07     2.442e−06     9.997e−10

Table 7.4 Iteration history of algorithm A112.

is intuitively clear that a projection step can help in cutting off such peaks (but there is no guarantee).

In our next test we show that lack of strict complementarity does not affect the superlinear convergence of the algorithms. Denoting by j the reduced objective function for the data (7.35) and by u the corresponding solution, we now choose ud = λ^{-1} j′(u). With these new data, the (new) gradient vanishes identically on Ω at u, so that strict complementarity is violated. A representative run for this degenerate problem is shown in Table 7.7 (A111, h = 1/128). Here, u^h_d was obtained from the discrete solution and the discrete gradient. Similarly to the nondegenerate case, the algorithms show mesh-independent behavior; see Table 7.8. We have not included further tables for this problem since they would look essentially like those for the nondegenerate problem.

7.2.3 All-at-Once Approach

We now present numerical experiments for semismooth Newton methods applied to the all-at-once approach. Since the state equation is nonlinear, the advantage of this approach is that we do not have to solve the state equation in every iteration. On

7.2 Numerical Results 149

Alg.   h      Iter.  State  Lin. State  Adj. State  Time
A111   1/16   3      4      10          4           3.0s
       1/32   3      4      9           4           8.6s
       1/64   3      4      7           4           32.6s
       1/128  4      5      8           5           187.8s
       1/256  4      5      8           5           935.5s
A121   1/16   3      4      9           4           2.9s
       1/32   3      4      8           4           8.2s
       1/64   3      4      7           4           33.0s
       1/128  3      4      7           4           156.8s
       1/256  3      4      7           4           771.1s
A112   1/16   4      9      21          9           6.0s
       1/32   4      9      19          9           16.0s
       1/64   4      9      19          9           65.3s
       1/128  4      9      19          9           300.7s
       1/256  4      9      18          9           1428.1s

Table 7.5 Performance summary for the algorithms A111, A121, and A112.

h      Iter.  State  Lin. State  Adj. State  Time

Algorithm A112 without smoothing step
1/16   7      8      16          8           6.2s
1/32   7      8      15          8           18.4s
1/64   7      8      15          8           76.7s
1/128  7      8      15          8           366.2s
1/256  7      8      14          8           1834.2s

Algorithm A122 without smoothing step
1/16   4      5      13          5           4.5s
1/32   4      5      11          5           11.2s
1/64   4      5      11          5           50.0s
1/128  4      5      11          5           263.6s
1/256  4      5      10          5           1191.8s

Table 7.6 Performance summary for algorithms A112 and A122 without smoothing step.

the other hand, the main work lies in solving the Newton system, so that an increased number of semismooth Newton iterations can offset this saving.

We choose u0 ≡ −1, y0 ≡ 0, w0 ≡ 0. Better choices for y0 and w0 are certainly possible. Our termination condition is

χ(yk, uk, wk) = (‖uk − P_C(uk − L_u(yk, uk, wk))‖²_{L²} + ‖L_y(yk, uk, wk)‖²_{H^{-1}} + ‖E(yk, uk)‖²_{H^{-1}})^{1/2} ≤ ε

with ε = 10^{−8}. The all-at-once semismooth Newton system is solved by reducing it to the same Schur complement as was used for solving the black-box Newton equation, and by applying MG-preconditioned CG. Only the right-hand side is different. Table 7.9 shows two representative runs of algorithm A212. Furthermore, Table 7.10


h      k  ‖uk − u‖L2   ‖uk − u‖L∞   χ(uk)
1/128  0  1.628e+00    6.487e+00    2.053e−03
       1  9.019e−02    1.098e+00    1.006e−04
       2  1.657e−07    4.789e−07    1.536e−07
       3  8.814e−12    2.260e−11    1.738e−17

Table 7.7 Iteration history of algorithm A111 for a degenerate problem.

h      Iter.  State  Lin. State  Adj. State  Time
1/16   3      4      9           4           2.7s
1/32   3      4      9           4           7.9s
1/64   3      4      7           4           32.7s
1/128  3      4      7           4           157.6s
1/256  3      4      7           4           767.1s

Table 7.8 Performance summary of algorithm A111 for a degenerate problem.

h      k  ‖uk − u‖L2   ‖uk − u‖L∞   χ(yk, uk, wk)
1/128  0  1.628e+00    6.487e+00    1.903e−01
       1  6.797e−01    2.514e+00    3.225e−01
       2  1.176e−01    4.743e−01    4.007e−02
       3  3.025e−03    1.197e−02    1.068e−03
       4  1.756e−06    6.915e−06    6.767e−07
       5  3.000e−13    1.206e−12    8.823e−11
1/256  0  1.628e+00    6.488e+00    1.903e−01
       1  6.797e−01    2.516e+00    3.225e−01
       2  1.156e−01    4.645e−01    3.949e−02
       3  2.935e−03    1.171e−02    1.041e−03
       4  2.079e−06    8.583e−06    7.832e−07
       5  6.203e−14    2.888e−13    5.604e−11

Table 7.9 Iteration history of algorithm A212.

contains information on the performance of the algorithms A211, A221, and A212 for different mesh sizes.

In comparison with the black-box algorithms, we see that the all-at-once approach and the black-box approach are comparably efficient. As an advantage of the all-at-once approach we note that the smoothing step can be performed at minimal additional cost, whereas in the black-box approach it requires one additional solve of both the state and the adjoint equation. We believe that the more expensive the state equation is to solve (due to nonlinearity), the more favorable the all-at-once approach becomes.

7.2.4 Nested Iteration

Next, we present numerical results for the nested iteration approach. Here, we start on the grid T^{1/2}, solve the problem with termination threshold ε = 10^{−5}, and compute from its solution an initial point for the problem on the next finer grid


       A211           A221           A212
h      Iter.  Time    Iter.  Time    Iter.  Time
1/16   5      1.9s    5      1.9s    5      2.4s
1/32   5      6.1s    5      6.1s    5      6.7s
1/64   5      28.0s   5      27.9s   5      30.3s
1/128  5      147.2s  5      147.1s  5      156.3s
1/256  5      750.9s  5      752.5s  5      785.0s

Table 7.10 Performance summary for the algorithms A211, A221, and A212.

T^{1/4}, and so on. On the finest level we solve with termination threshold ε = 10^{−8}. Table 7.11 shows the number of iterations per level and the total execution time

h     Iter.  State  Lin. State  Adj. State    h      Iter.  State  Lin. State  Adj. State
1/2   1      2      6           2             1/32   2      3      6           3
1/4   2      3      7           3             1/64   1      2      4           2
1/8   2      3      6           3             1/128  1      2      4           2
1/16  2      3      6           3             1/256  1      2      4           2

Total Time: 360s

Table 7.11 Performance summary for nested iteration version of algorithm A111.

for the nested version of algorithm A111. Comparison with Table 7.5 shows that the nested version of A111 needs less than half the time of the unnested version (360 vs. 935 seconds). The use of nested iteration is thus very promising. Furthermore, it is very robust since, except for the coarsest problem, the Newton iteration is started with a very good initial point.

7.2.5 Discussion of the Results

From the presented numerical results we draw the following conclusions:

• The proposed methods allow us to use fast iterative solvers for their implementation. This leads to runtimes of optimal order in the sense that they are approximately proportional to the number of unknowns.

• The class of semismooth Newton methods performs very efficiently and exhibits mesh-independent behavior. We observe superlinear convergence as predicted by our theory.

• Both the black-box and the all-at-once approach lead to efficient and robust algorithms which are comparable in runtime. If smoothing steps are used, the all-at-once approach is advantageous since it does not require additional state and adjoint state solves to compute the smoothing step.


• Lack of strict complementarity does not affect the fast convergence of the algorithms. This confirms our theory, which does not require strict complementarity.

• The choice of the MCP-function π(x) = x1 − P_C(x1 − λ^{-1}x2) appears to be preferable to π(x) = −φ_FB(−x) for this class of problems, at least in the black-box approach. The main reason for this is the additional cost of the smoothing step.

• The performance of the φ_FB-based algorithms, which from a theoretical point of view require a smoothing step, degrades by a certain margin if the smoothing step is turned off. This, however, is compensated if we turn on the projection step. Our numerical experience indicates that this effect is problem dependent. It should be mentioned that so far we never observed a severe deterioration of performance when switching off the smoothing step. But we stress that pathological situations like the one in Example 3.52 can occur, and that they result in a stagnation of convergence on fine grids (we have tried this, but do not include numerical results here).
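For concreteness, the two reformulation functions compared in these conclusions can be written down componentwise as follows. This is a minimal NumPy sketch with our own function names; the bound set C = [a, b] is applied componentwise:

```python
import numpy as np

def pi_proj(x1, x2, lam, a=0.0, b=np.inf):
    """Projection-based MCP-function pi(x) = x1 - P_C(x1 - lam^{-1} x2),
    with C = [a, b] applied componentwise."""
    return x1 - np.clip(x1 - x2 / lam, a, b)

def phi_fb(x1, x2):
    """Fischer-Burmeister NCP-function phi(a, b) = sqrt(a^2 + b^2) - a - b;
    it vanishes exactly when a >= 0, b >= 0, and a*b = 0."""
    return np.sqrt(x1**2 + x2**2) - x1 - x2
```

Both functions vanish exactly at points satisfying the complementarity conditions; they differ in their generalized derivatives, which is what drives the smoothing-step requirement for the φ_FB-based methods discussed above.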

We conclude this section by noting that many other control problems can be handled in a similar way. In particular, Neumann boundary control can be used instead of distributed control. Furthermore, the control of other types of PDEs by semismooth Newton methods is possible, e.g., Neumann boundary control of the wave equation [104] and Neumann boundary control of the heat equation [24, 143]. The optimal control of the incompressible Navier–Stokes equations is considered in section 8.

7.3 Obstacle Problems

In this section we study the class of obstacle problems described in section 1.1.2. Obstacle problems of this or similar type arise in many applications, e.g., potential flow of perfect fluids, lubrication, wake problems, etc.; see, e.g., [63] and the references therein. We describe the problem in terms of the obstacle problem for an elastic membrane.

For q ∈ [2,∞), let g ∈ H^{2,q}(Ω) represent a (lower) obstacle located over the nonempty bounded open set Ω ⊂ R² with sufficiently smooth boundary, denote by y ∈ H^1_0(Ω) the position of a membrane, and by f ∈ L^q(Ω) external forces. For compatibility we assume g ≤ 0 on ∂Ω. Then y ∈ H^1_0(Ω) solves the variational inequality

y ≥ g on Ω,
a(y, v − y) − (f, v − y)_{L²} ≥ 0  ∀ v ∈ H^1_0(Ω), v ≥ g on Ω,   (7.36)

where

a : H^1_0(Ω) × H^1_0(Ω) → R,   a(y, z) = ∫_Ω ∑_{i,j} a_{ij} (∂y/∂x_i)(∂z/∂x_j) dx,

a_{ij} = a_{ji} ∈ C^1(Ω̄), and a being H^1_0-elliptic, i.e.,

7.3 Obstacle Problems 153

a(y, y) ≥ ν‖y‖²_{H^1_0}  ∀ y ∈ H^1_0(Ω)

with a constant ν > 0. The bounded bilinear form a induces a bounded linear operator A ∈ L(H^1_0, H^{-1}) via a(v, w) = ⟨v, Aw⟩_{H^1_0, H^{-1}} for all v, w ∈ H^1_0(Ω). The ellipticity of a and the Lax–Milgram theorem imply that A ∈ L(H^1_0, H^{-1}) is a homeomorphism with ‖A^{-1}‖_{H^{-1}, H^1_0} ≤ ν^{-1}, and regularity results imply that A^{-1} ∈ L(L², H²). Introducing the closed convex set

F = {y ∈ H^1_0(Ω) : y ≥ g on Ω}

and the objective function J : H^1_0(Ω) → R,

J(y) := (1/2) a(y, y) − (f, y)_{L²},

we can write (7.36) equivalently as the optimization problem

minimize J(y)  subject to y ∈ F.   (7.37)

The ellipticity of a implies that J is strictly convex with J(y) → ∞ as ‖y‖_{H^1_0} → ∞. Hence, using that F is a closed and convex subset of the Hilbert space H^1_0(Ω), we see that (7.37) possesses a unique solution y ∈ F [49, Prop. II.1.2]. Further, regularity results [22, Thm. I.1] ensure that y ∈ H^1_0(Ω) ∩ H^{2,q}(Ω).

7.3.1 Dual Problem

Since (7.37) is not posed in an L^p-setting, we derive an equivalent dual problem, which, as we will see, is posed in L²(Ω). Denoting by I_F : H^1_0(Ω) → R ∪ {+∞} the indicator function of F, i.e., I_F(y) = 0 for y ∈ F and I_F(y) = +∞ for y ∉ F, we can write (7.37) in the form

inf_{y ∈ H^1_0(Ω)} J(y) + I_F(y).   (7.38)

The corresponding (Fenchel–Rockafellar) dual problem [49, Ch. III.4] (we choose F = I_F, G = J, Λ = I, u = y, and p* = −u in the terminology of [49]) is

sup_{u ∈ H^{-1}(Ω)} −J*(u) − I_F*(−u),   (7.39)

where J* : H^{-1}(Ω) → R ∪ {+∞} and I_F* : H^{-1}(Ω) → R ∪ {+∞} are the conjugate functions of J and I_F, respectively:

J*(u) = sup_{y ∈ H^1_0(Ω)} ⟨y, u⟩_{H^1_0, H^{-1}} − J(y),   (7.40)

I_F*(u) = sup_{y ∈ H^1_0(Ω)} ⟨y, u⟩_{H^1_0, H^{-1}} − I_F(y).   (7.41)


Let y0 ∈ H^1_0(Ω) be such that I_F(y0) = 0, e.g., y0 = y. Then J is continuous at y0 and I_F is bounded at y0. Furthermore, since I_F ≥ 0, the ellipticity implies J(y) + I_F(y) → ∞ as ‖y‖_{H^1_0} → ∞. Therefore, [49, Thm. III.4.2] applies, so that (7.38) and (7.39) possess solutions y (this we knew already) and u, respectively, and for any pair of solutions holds

J(y) + I_F(y) + J*(u) + I_F*(−u) = 0.

Further, the following extremality relations hold:

J(y) + J*(u) − ⟨u, y⟩_{H^{-1}, H^1_0} = 0,   (7.43)

I_F(y) + I_F*(−u) + ⟨u, y⟩_{H^{-1}, H^1_0} = 0.   (7.44)

This implies

u ∈ ∂J(y),   (7.45)

−u ∈ ∂I_F(y).   (7.46)

In our case J is smooth, which yields

u = J′(y) = Ay − f.   (7.47)

We know that the primal solution y is unique, and thus the dual solution u is unique, too, by (7.47). Further, by regularity, y ∈ H^1_0(Ω) ∩ H^{2,q}(Ω), which, via (7.47), implies u ∈ L^q(Ω).

The supremum in the definition of J*, see (7.40), is attained for y = A^{-1}(f + u), with value

J*(u) = ⟨u, y⟩_{H^{-1}, H^1_0} − (1/2)⟨y, Ay⟩_{H^1_0, H^{-1}} + ⟨f, y⟩_{H^{-1}, H^1_0}
      = (1/2)⟨f + u, A^{-1}(f + u)⟩_{H^{-1}, H^1_0}.

For u ∈ L²(Ω) we can write

J*(u) = (1/2)(f + u, A^{-1}(f + u))_{L²}.

Further, see also [22, p. 19] and [49, Ch. IV.4],

I_F*(u) = sup_{y ∈ H^1_0} ⟨u, y⟩_{H^{-1}, H^1_0} − I_F(y) = sup_{y ∈ F} ⟨u, y⟩_{H^{-1}, H^1_0}.

For u ∈ L²(Ω) we have

I_F*(u) = sup_{y ∈ F} (u, y)_{L²} = { (g, u)_{L²} if u ≤ 0 on Ω;  +∞ otherwise. }

Therefore, using the regularity of y and u, we can write (7.39) in the form


maximize_{u ∈ L²(Ω)}  −(1/2)(f + u, A^{-1}(f + u))_{L²} + (g, u)_{L²}  subject to u ≥ 0,   (7.48)

and we know that u ∈ L^q(Ω). We recall that from the dual solution u we can recover the primal solution y from the identity (7.47): y = A^{-1}(f + u).

In the following we prefer to write (7.48) as a minimization problem:

minimize_{u ∈ L²(Ω)}  (1/2)(f + u, A^{-1}(f + u))_{L²} − (g, u)_{L²}  subject to u ≥ 0.   (7.49)

Example 7.21. In the case A = −∆ the primal problem is

minimize_{y ∈ H^1_0(Ω)}  (1/2)‖y‖²_{H^1_0} − (f, y)_{L²}  subject to y ≥ g,

and the dual (minimization) problem reads

minimize_{u ∈ L²(Ω)}  (1/2)‖f + u‖²_{H^{-1}} − (g, u)_{L²}  subject to u ≥ 0,

where ‖u‖_{H^{-1}} = ‖∆^{-1}u‖_{H^1_0} is the norm dual to ‖·‖_{H^1_0}.
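To make the primal–dual link concrete, the following sketch solves a small 1D instance of the dual problem (7.49) with A = −d²/dx² by projected gradient iterations and recovers the membrane via y = A^{-1}(f + u). The data, mesh, and step size are our own illustrative choices, not taken from the text:

```python
import numpy as np

# 1D obstacle problem with A = -d^2/dx^2 (homogeneous Dirichlet), n interior nodes.
n, h = 19, 1.0 / 20
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
x = np.linspace(h, 1.0 - h, n)
f = -50.0 * np.ones(n)                      # downward force on the membrane
g = 0.25 - (x - 0.5) ** 2 - 0.3             # lower obstacle, g <= 0 near the boundary

Ainv = np.linalg.inv(A)
u = np.zeros(n)
tau = 0.5 / np.linalg.norm(Ainv, 2)         # safe step size for the quadratic dual
for _ in range(20000):
    grad = Ainv @ (f + u) - g               # j'(u) for the dual objective (7.49)
    u = np.maximum(u - tau * grad, 0.0)     # projection onto the constraint u >= 0

y = Ainv @ (f + u)                          # primal recovery: Ay = f + u, cf. (7.47)
```

At the solution one observes the complementarity encoded in (7.45)–(7.46): y ≥ g everywhere, u ≥ 0, and u vanishes off the contact region {y = g}.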

We collect our results in the following theorem.

Theorem 7.22. Under the problem assumptions, the obstacle problem (7.36) possesses a unique solution y ∈ H^1_0(Ω), and this solution is contained in H^{2,q}(Ω). The dual problem (7.39) possesses a unique solution u ∈ H^{-1}(Ω) as well. Primal and dual solution are linked via the equation

Ay = f + u.

In particular, u ∈ L^q(Ω), and the dual (minimization) problem can be written in the form (7.49).

7.3.2 Regularized Dual Problem

Problem (7.49) is not coercive in the sense that its objective function need not tend to +∞ as ‖u‖_{L²} → ∞. Hence, we consider the regularized problem

minimize_{u ∈ L²(Ω)}  j_λ(u) := (1/2)(f + u, A^{-1}(f + u))_{L²} + (λ/2)‖u − u_d‖²_{L²} − (g, u)_{L²}
subject to u ≥ 0 on Ω   (7.50)

with u_d ∈ L^{p′}(Ω), p′ ∈ (2,∞), and (small) regularization parameter λ > 0. This problem has the following properties:


Theorem 7.23. The objective function of problem (7.50) is strongly convex and

j_λ(u) → ∞ as ‖u‖_{L²} → ∞.

In particular, (7.50) possesses a unique solution u_λ ∈ L²(Ω), and this solution lies in L^{p′}(Ω). The derivative of j_λ has the form

j′_λ(u) = λ(u − u_d) + A^{-1}(f + u) − g =: λu + G(u).   (7.51)

Here, the mapping G(u) = A^{-1}(f + u) − g − λu_d maps L²(Ω) continuously and affine linearly into L^{p′}(Ω).

Proof. Obviously, j_λ is a smooth quadratic function and, with z = A^{-1}(f + u),

j_λ(u) = (λ/2)‖u − u_d‖²_{L²} + (1/2)a(z, z) − (g, u)_{L²} ≥ (λ/2)‖u − u_d‖²_{L²} − ‖g‖_{L²}‖u‖_{L²} → ∞

as ‖u‖_{L²} → ∞. Therefore, since {u ∈ L²(Ω) : u ≥ 0} is closed and convex, we see that (7.50) possesses a unique solution u_λ ∈ L²(Ω).

Certainly, j′_λ(u) is given by (7.51), and the fact that A ∈ L(H^1_0, H^{-1}) implies that

G : u ∈ L²(Ω) ↦ A^{-1}(f + u) − g − λu_d ∈ H^1_0(Ω) + L^{p′}(Ω) → L^{p′}(Ω)

is continuous affine linear. From the optimality conditions for (7.50) we conclude

j′_λ(u_λ) = 0 on {x ∈ Ω : u_λ(x) ≠ 0}.

Hence,

u_λ = 1_{u_λ≠0} u_λ = −λ^{-1} 1_{u_λ≠0} G(u_λ) ∈ L^{p′}(Ω).  □

Corollary 7.24. Under the problem assumptions, F = j′_λ satisfies Assumption 3.33 (a), (b) for any p ∈ [2,∞), any p′ ∈ [2,∞) with p′ ≤ p and u_d ∈ L^{p′}(Ω), and any 1 ≤ r < p′. Furthermore, F satisfies Assumption 4.1 for r = 2 and all p ∈ (2,∞) with u_d ∈ L^p(Ω). Finally, F also satisfies Assumption 4.6 (a)–(e) for all p ∈ [2,∞) and all p′ ∈ (2,∞).

Proof. The Corollary is an immediate consequence of Theorem 7.23 and the L²-coercivity of j_λ. □

Remark 7.25. Corollary 7.24 establishes all assumptions needed for the semismoothness of NCP-function based reformulations. In fact, for general NCP-functions Theorem 3.45 is applicable, whereas for the special choice π(x) = x1 − P_{[0,∞)}(x1 − λ^{-1}x2) we can use Theorem 4.4. Furthermore, the sufficient condition for regularity of Theorem 4.8 is applicable. Hence, we can apply our class of semismooth Newton methods to solve problem (7.50).
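Evaluating j′_λ, as in (7.51), costs one solve with A per call. A minimal finite-difference sketch, using a dense 1D Dirichlet Laplacian as A; the names and the discretization are our own illustration, not the author's finite element setup:

```python
import numpy as np

def laplacian_1d(n, h):
    """Dense finite-difference A = -d^2/dx^2 with homogeneous Dirichlet BCs."""
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def grad_j_lambda(u, f, g, ud, lam, A):
    """j_lambda'(u) = lam*(u - ud) + A^{-1}(f + u) - g, cf. (7.51).
    Each evaluation needs one solve with A (here a dense solve; the
    text applies multigrid to this system instead)."""
    y = np.linalg.solve(A, f + u)
    return lam * (u - ud) + y - g
```

The affine-linear structure of G(u) = A^{-1}(f + u) − g − λu_d is visible directly: only the solve with A couples the components of u.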


Next, we derive bounds for the approximation errors ‖u_λ − u‖_{H^{-1}} and ‖y_λ − y‖_{H^1_0}, where y_λ = A^{-1}(f + u_λ).

Theorem 7.26. Let u and u_λ denote the solutions of (7.49) and (7.50), respectively. Then y = A^{-1}(f + u) solves the obstacle problem (7.36) and, with y_λ = A^{-1}(f + u_λ), there holds, as λ → 0⁺:

‖u_λ − u‖_{H^{-1}} = o(λ^{1/2}),   (7.52)

‖y_λ − y‖_{H^1_0} = o(λ^{1/2}).   (7.53)

Proof. By Theorems 7.22 and 7.23 we know that the dual problem (7.49) and the regularized dual problem (7.50) possess unique solutions u, u_λ ∈ L^p(Ω). Now

j_λ(u_λ) ≤ j_λ(u) = j(u) + (λ/2)‖u − u_d‖²_{L²} ≤ j(u_λ) + (λ/2)‖u − u_d‖²_{L²}
         = j_λ(u_λ) + (λ/2)(‖u − u_d‖²_{L²} − ‖u_λ − u_d‖²_{L²}).

This proves

‖u_λ − u_d‖_{L²} ≤ ‖u − u_d‖_{L²}.   (7.54)

Further,

j(u) ≤ j(u_λ) = j_λ(u_λ) − (λ/2)‖u_λ − u_d‖²_{L²} ≤ j_λ(u) − (λ/2)‖u_λ − u_d‖²_{L²}
     = j(u) + (λ/2)(‖u − u_d‖²_{L²} − ‖u_λ − u_d‖²_{L²}) ≤ j(u) + (λ/2)‖u − u_d‖²_{L²}.   (7.55)

Therefore,

0 ≤ j(u_λ) − j(u) ≤ (λ/2)‖u − u_d‖²_{L²} = O(λ) as λ → 0⁺.

Now let λ_k → 0⁺. Since

M = {v ∈ L²(Ω) : v ≥ 0, ‖v − u_d‖_{L²} ≤ ‖u − u_d‖_{L²}}

is closed, convex, and bounded, there exist a subsequence (u_{λ_{k′}}) and a point ũ ∈ M such that u_{λ_{k′}} → ũ weakly in L². Since j is convex and continuous, it is weakly lower semicontinuous, so that

j(u) ≤ j(ũ) ≤ lim inf_{k′→∞} j(u_{λ_{k′}}) = lim inf_{k′→∞} [j(u) + O(λ_{k′})] = j(u).

Hence ũ is a solution of (7.49) and therefore ũ = u, since u is the unique solution. By a subsequence-subsequence argument we conclude

u_λ → u weakly in L²(Ω) as λ → 0⁺.   (7.56)

Since v ↦ ‖v − u_d‖_{L²} is convex and continuous, hence weakly lower semicontinuous, we obtain from (7.54) and (7.56)

‖u − u_d‖_{L²} ≤ lim inf_{λ→0⁺} ‖u_λ − u_d‖_{L²},
‖u − u_d‖_{L²} ≥ lim sup_{λ→0⁺} ‖u_λ − u_d‖_{L²},

which proves

‖u_λ − u_d‖_{L²} → ‖u − u_d‖_{L²} as λ → 0⁺.   (7.57)

Since L² is a Hilbert space, (7.56) and (7.57) imply

u_λ → u in L² as λ → 0⁺.   (7.58)

Hence, (7.55) together with (7.57) implies j(u_λ) − j(u) = o(λ).

Since u solves (7.49), there holds (j′(u), u_λ − u)_{L²} ≥ 0. Therefore,

j(u_λ) − j(u) = (j′(u), u_λ − u)_{L²} + (1/2)(u_λ − u, j″(u)(u_λ − u))_{L²}
             ≥ (1/2)(u_λ − u, j″(u)(u_λ − u))_{L²} = (1/2)(u_λ − u, A^{-1}(u_λ − u))_{L²}.

Hence, with v = u_λ − u, w = A^{-1}v, and the ellipticity constant ν of a,

‖v‖²_{H^{-1}} = ‖Aw‖²_{H^{-1}} ≤ ‖A‖²_{H^1_0,H^{-1}} ‖w‖²_{H^1_0} ≤ ‖A‖²_{H^1_0,H^{-1}} ν^{-1} ⟨w, Aw⟩_{H^1_0,H^{-1}}
≤ ν^{-1} ‖A‖²_{H^1_0,H^{-1}} ⟨v, w⟩_{L²} ≤ 2ν^{-1} ‖A‖²_{H^1_0,H^{-1}} (j(u_λ) − j(u)) = 2ν^{-1} ‖A‖²_{H^1_0,H^{-1}} o(λ).

This proves (7.52). The solution of the obstacle problem is y = A^{-1}(f + u). For y_λ = A^{-1}(f + u_λ) there holds:

‖y_λ − y‖²_{H^1_0} = ‖A^{-1}(u_λ − u)‖²_{H^1_0} = ‖w‖²_{H^1_0} ≤ ν^{-1} ⟨w, Aw⟩_{H^1_0,H^{-1}}
= ν^{-1}(u_λ − u, A^{-1}(u_λ − u))_{L²} ≤ 2ν^{-1}(j(u_λ) − j(u)) = 2ν^{-1} o(λ).

The proof is complete. □

Remark 7.27. The parameter λ has to be chosen sufficiently small to ensure that the regularization error is not larger than the discretization error. Our approach will be to successively reduce λ.
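The λ → 0 behavior in Theorem 7.26 can already be observed in a scalar caricature of (7.49)/(7.50) with A = 1 (a toy model of our own; in this strongly convex scalar case the error even decays like O(λ), consistent with the o(λ^{1/2}) bound):

```python
# Scalar model of (7.49)/(7.50) with A = 1:
# j(u) = 0.5*(f + u)**2 - g*u on u >= 0; j_lambda adds 0.5*lam*(u - ud)**2.
f, g, ud = -1.0, 2.0, 0.0
u_star = max(g - f, 0.0)                                  # minimizer of j
errors = []
for lam in (1e-1, 1e-2, 1e-3):
    u_lam = max((g - f + lam * ud) / (1.0 + lam), 0.0)    # minimizer of j_lambda
    errors.append(abs(u_lam - u_star))
print(errors)   # decays roughly proportionally to lam
```

Here u_star = 3 and the error is 3λ/(1 + λ), so dividing λ by 10 shrinks the error by nearly a factor of 10.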

7.3.3 Discretization

We use the same finite element spaces as in section 7.1.3. A straightforward discretization yields the discrete obstacle problem (in coordinate form)

minimize_{y^h ∈ R^{n_h}}  (1/2)(y^h)^T A^h y^h − (f^h)^T y^h  subject to y^h ≥ g^h.   (7.59)


Here, g^h ∈ R^{n_h}, g^h_i = g(P^h_i), approximates the obstacle. Furthermore, f^h_i = (β^h_i, f)_{L²} and A^h_{ij} = ⟨Aβ^h_i, β^h_j⟩_{H^{-1}, H^1_0}. The corresponding dual problem is

minimize_{u^h ∈ R^{n_h}}  (1/2)(f^h + S^h u^h)^T (A^h)^{-1}(f^h + S^h u^h) − (g^h)^T S^h u^h  subject to u^h ≥ 0.   (7.60)

Here, S^h ∈ R^{n_h×n_h} is defined as in (7.17). The discrete regularized dual problem then is given by

minimize_{u^h ∈ R^{n_h}}  j^h_λ(u^h) := (1/2)(f^h + S^h u^h)^T (A^h)^{-1}(f^h + S^h u^h)
    + (λ/2)(u^h − u^h_d)^T S^h (u^h − u^h_d) − (g^h)^T S^h u^h
subject to u^h ≥ 0,   (7.61)

where, e.g., [S^h u^h_d]_i = (L^h β^h_i, L^h u_d)_{L²}. From the solution u^h_λ of (7.61) we compute y^h_λ via A^h y^h_λ = f^h + S^h u^h_λ.

The gradient j^h_λ′ and the Hessian j^h_λ″ of j^h_λ with respect to the S^h-inner product are given by

j^h_λ′(u^h) = (A^h)^{-1}(f^h + S^h u^h) + λ(u^h − u^h_d) − g^h,
j^h_λ″(u^h) = (A^h)^{-1} S^h + λI.

Choosing a Lipschitz continuous and semismooth NCP-function φ, we reformulate (7.61) in the form

Φ^h(u^h) := (φ(u^h_1, j^h_λ′(u^h)_1), …, φ(u^h_{n_h}, j^h_λ′(u^h)_{n_h}))^T = 0.   (7.62)

This is the discrete counterpart of the semismooth reformulation in function space

Φ(u) := φ(u, j′_λ(u)) = 0.

As in section 7.1.4, we can argue that an appropriate discretization of ∂Φ is ∂Φ^h(u^h), the set of all matrices B^h ∈ R^{n_h×n_h} with

B^h = D^h_1 + D^h_2 j^h_λ″(u^h),   (7.63)

where D^h_1 and D^h_2 are diagonal matrices such that

((D^h_1)_{ll}, (D^h_2)_{ll}) ∈ ∂φ(u^h_l, j^h_λ′(u^h)_l),  l = 1, …, n_h.

Again, we have the inclusion

∂_C Φ^h(u^h) ⊂ ∂Φ^h(u^h)

with equality if φ or −φ is regular. With the same argumentation as in the derivation of Theorem 7.18 we can show that Φ^h is ∂Φ^h-semismooth (and thus also semismooth in the usual sense). Semismoothness of higher order can be proved analogously. Hence, we can apply our semismooth Newton methods to solve (7.62). The details of the resulting algorithm, which are not given here, parallel Algorithm 7.62. The central task is to solve the semismooth Newton system (we suppress the subscript k)

[D^h_1 + D^h_2 j^h_λ″(u^h)] s^h = −Φ^h(u^h).

Using the structure of j^h_λ″ and the fact that (D^h_1 + λD^h_2) is diagonal and positive definite for our choices of φ, we see that this is equivalent to s^h = (S^h)^{-1} A^h v^h, where v^h solves

[A^h + S^h (D^h_1 + λD^h_2)^{-1} D^h_2] v^h = −S^h (D^h_1 + λD^h_2)^{-1} Φ^h(u^h).

This can be viewed as a discretization of the PDE

Av + (d₂/(d₁ + λd₂)) v = −(1/(d₁ + λd₂)) Φ(u).

Therefore, we can apply a multigrid method to compute v^h, from which s^h can be obtained easily.
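The reduction just described can be mirrored in a few lines of dense linear algebra. This is a sketch under our own naming; the text solves the reduced system by multigrid rather than a direct solve:

```python
import numpy as np

def semismooth_newton_step(u, jac_grad, S, A, lam, phi, dphi):
    """One semismooth Newton step for Phi^h(u^h) = 0, cf. (7.62)-(7.63).

    jac_grad returns j_lambda'(u); phi is the componentwise NCP-function
    and dphi returns the diagonals (d1, d2) of an element of its
    generalized gradient. All callables are placeholders of our own."""
    g = jac_grad(u)
    d1, d2 = dphi(u, g)                      # diagonals of D1, D2
    Phi = phi(u, g)
    w = 1.0 / (d1 + lam * d2)                # (D1 + lam*D2)^{-1}, diagonal
    # [A + S (D1 + lam*D2)^{-1} D2] v = -S (D1 + lam*D2)^{-1} Phi
    M = A + S @ np.diag(w * d2)
    v = np.linalg.solve(M, -S @ (w * Phi))
    s = np.linalg.solve(S, A @ v)            # s = S^{-1} A v
    return u + s
```

With the min-function as NCP-function and identity mass/stiffness matrices this reproduces the direct Newton step on the system [D1 + D2 j″]s = −Φ, since the Schur reduction is an exact algebraic equivalence.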

7.3.4 Numerical Results

We consider the following problem:

Ω = (0, 1) × (0, 1),
g = −1/4 + (1/2) sin(πx1) sin(πx2),
f = −5 sin(2πx1) sin(2πx2) (1/2 + e^{2x1+x2}).   (7.64)

The triangulation is the same as in section 7.2.1. Again, the code was implemented in Matlab Version 6 Release 12, using sparse matrix computations, and was run under Solaris 8 on a Sun SPARC Ultra workstation with a sparcv9 processor operating at 360 MHz. To obtain sufficiently accurate solutions, the regularization parameter has to be chosen appropriately. Here, we use a nested iteration approach and determine λ in dependence on the current mesh size. It is known [63, App. I.3] that, under appropriate conditions, the described finite element discretization leads to approximation errors ‖y^h − y‖_{H^1_0} = O(h). Since we have shown in Theorem 7.26 that ‖y_λ − y‖_{H^1_0} = o(λ^{1/2}), we choose λ of the order h²; more precisely, we work with

λ = λ_h = h²/10.
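A minimal sketch of a nested iteration driver using this λ_h rule; `solve` and `interpolate` are placeholder callables of our own standing for the level solver and grid prolongation, not the author's code:

```python
import numpy as np

def nested_iteration(solve, interpolate, levels, eps_coarse=1e-5, eps_final=1e-8):
    """Nested iteration over mesh sizes h = 1/2, 1/4, ..., with the
    mesh-dependent regularization lam_h = h**2 / 10 from the text.
    solve(u0, ud, lam, eps) runs the level solver; interpolate(u)
    prolongates a coarse solution to the next finer grid."""
    u = None
    for i, h in enumerate(levels):
        lam = h ** 2 / 10.0
        # the coarse-grid solution serves both as initial point and as ud
        u0 = ud = np.zeros(1) if u is None else interpolate(u)
        eps = eps_final if i == len(levels) - 1 else eps_coarse
        u = solve(u0, ud, lam, eps)
    return u
```

The loose tolerance on coarse levels and the tight one on the finest level reflect that coarse solves only need to deliver good initial points.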


We then solve problem (7.61) for h = 1/2 until

χ(u_k) = ‖u_k − P_{[0,∞)}(u_k − j′_λ(u_k))‖_{L²} ≤ ε   (7.65)

with ε = 10^{−5} (in the corresponding discrete norms), interpolate this coarse solution to obtain an initial point on T^{1/4}, solve this problem (now with λ = λ_{1/4}) until (7.65) is satisfied, interpolate again, and repeat this procedure until we have reached the finest grid, on which we iterate until (7.65) holds with ε = 10^{−8}. To further reduce the effect of regularization, we always use as u_d the interpolated solution from the next coarser grid (the same point that we use as initial point). On T^{1/2} we choose u_d = u_0 ≡ 0. The obstacle is shown in Figure 7.5, the state solution y_λ for

h     λ          Iter.  PDE Solves    h      λ          Iter.  PDE Solves

hfinal = 1/64
1/2   2.500e−02  1      2             1/32   9.766e−05  3      4
1/4   6.250e−03  2      3             1/64   2.441e−05  4      5
1/8   1.563e−03  2      3
1/16  3.906e−04  3      4
‖y∗ − y‖_{H^1_0} = 2.375e−03,  ‖y∗ − yλ‖_{H^1_0} = 1.978e−10,  Total Time: 13.5s

hfinal = 1/128
1/2   2.500e−02  1      2             1/32   9.766e−05  3      4
1/4   6.250e−03  2      3             1/64   2.441e−05  3      4
1/8   1.563e−03  2      3             1/128  6.104e−06  4      5
1/16  3.906e−04  3      4
‖y∗ − y‖_{H^1_0} = 8.671e−04,  ‖y∗ − yλ‖_{H^1_0} = 3.572e−10,  Total Time: 54.6s

hfinal = 1/256
1/2   2.500e−02  1      2             1/32   9.766e−05  3      4
1/4   6.250e−03  2      3             1/64   2.441e−05  3      4
1/8   1.563e−03  2      3             1/128  6.104e−06  3      4
1/16  3.906e−04  3      4             1/256  1.526e−06  4      5
‖y∗ − y‖_{H^1_0} = 3.024e−04,  ‖y∗ − yλ‖_{H^1_0} = 5.594e−11,  Total Time: 245.9s

Table 7.12 Performance summary for nested iteration version of algorithm A111.

λ = λ_{1/64} is displayed in Figure 7.6, and the dual solution u_λ is depicted in Figure 7.7. Note that {x : u(x) ≠ 0} is the contact region, and that for our choice of λ the


Algorithm A111

k  ‖yk − yλ‖_{H^1_0}   ‖yk − y‖_{H^1_0}   χ(uk)
0  1.701e−03           1.862e−03          5.812e−05
1  5.648e−04           6.199e−04          3.273e−01
2  2.682e−05           3.034e−04          1.706e−02
3  2.333e−09           3.024e−04          7.343e−07
4  5.594e−11           3.024e−04          8.139e−11

Table 7.13 Iteration history of algorithm A111 on the final level h = hfinal = 1/256.

Figure 7.5 The obstacle g (h = 1/64).

solution u is approximated up to a fraction of the discretization error by u_λ. It can be seen that u is discontinuous at the boundary of the contact region.

In the numerical tests it turned out that it is not advantageous to let λ^{-1} become too large in the smoothing steps. Hence, we set γ = min{10^5, λ^{-1}} and work with smoothing steps of the form S_k(u) = P_{[0,∞)}(u − γ j′_λ(u)). On the other hand, even very small λ does not cause any problems in the NCP-function φ(x) = x1 − P_{[0,∞)}(x1 − λ^{-1}x2). We consider two methods: the smoothing-step-free Algorithm A111 with φ(x) = x1 − P_{[0,∞)}(x1 − λ^{-1}x2), and Algorithm A112 with φ_FB and smoothing step as just described. It turns out that without globalization the


h     λ          Iter.  PDE Solves    h      λ          Iter.  PDE Solves

hfinal = 1/64
1/2   2.500e−02  4      9             1/32   9.766e−05  4      9
1/4   6.250e−03  3      7             1/64   2.441e−05  5      11
1/8   1.563e−03  4      9
1/16  3.906e−04  4      9
‖y∗ − y‖_{H^1_0} = 2.374e−03,  ‖y∗ − yλ‖_{H^1_0} = 1.631e−07,  Total Time: 29.3s

hfinal = 1/128
1/2   2.500e−02  4      9             1/32   9.766e−05  4      9
1/4   6.250e−03  3      7             1/64   2.441e−05  4      9
1/8   1.563e−03  4      9             1/128  6.104e−06  6      13
1/16  3.906e−04  4      9
‖y∗ − y‖_{H^1_0} = 8.670e−04,  ‖y∗ − yλ‖_{H^1_0} = 3.069e−08,  Total Time: 142.9s

hfinal = 1/256
1/2   2.500e−02  4      9             1/32   9.766e−05  4      9
1/4   6.250e−03  3      7             1/64   2.441e−05  4      9
1/8   1.563e−03  4      9             1/128  6.104e−06  4      9
1/16  3.906e−04  4      9             1/256  1.526e−06  5      11
‖y∗ − y‖_{H^1_0} = 3.024e−04,  ‖y∗ − yλ‖_{H^1_0} = 2.609e−11,  Total Time: 613.9s

Table 7.14 Performance summary for nested iteration version of algorithm A112.

projected variant A121 tends to cycle when λ becomes very small. Since incorporating a globalization requires additional evaluations of j_λ and/or its gradient, which is expensive due to the presence of A^{-1}, we do not present numerical results for a globalized version of A121.
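The capped smoothing step used here, S_k(u) = P_{[0,∞)}(u − γ j′_λ(u)) with γ = min{10⁵, λ^{-1}}, can be sketched as follows (our own minimal version):

```python
import numpy as np

def smoothing_step(u, grad_j_lam, lam, gamma_cap=1e5):
    """S_k(u) = P_[0,inf)(u - gamma * j_lambda'(u)) with
    gamma = min(gamma_cap, 1/lam); capping gamma avoids overly
    aggressive steps when lam is very small."""
    gamma = min(gamma_cap, 1.0 / lam)
    return np.maximum(u - gamma * grad_j_lam(u), 0.0)
```

The projection onto [0, ∞) is what restores feasibility and, in the function-space analysis, the required norm gap between the L^p and L^r errors.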

In Table 7.12 (A111) and Table 7.14 (A112) we show, for each level of the nested iteration, the value of λ, the number of iterations performed on this level (Iter), and the number of PDE solves. Furthermore, the (discrete) distance ‖y∗ − y‖_{H^1_0} of the (discrete) computed solution y∗ to the (discrete) solution y corresponding to λ = 0 and the (discrete) distance ‖y∗ − y_λ‖_{H^1_0} of the (discrete) computed solution y∗ to the (discrete) solution y_λ corresponding to λ = h²_final/10 are shown. The total runtime is also given. We see that on each level only a few Newton iterations are performed. In Table 7.13 the iteration history of A111 on the finest level is shown for hfinal = 1/256. Obviously, the convergence is superlinear with rate > 1, and we observe mesh-independent performance of the methods. Furthermore, the runtime


Figure 7.6 Computed state yλ (h = 1/64).

increases approximately linearly with the number of unknowns. In conclusion, it can be seen that, similarly to the control problem in section 7.1, the algorithms offer all the favorable properties that are predicted by our theory. For this application, the smoothing-step-free algorithm with the projection-based NCP-function leads to significantly shorter solution times than the algorithm with the Fischer–Burmeister function and smoothing step. This is mainly caused by the additional PDE solves needed for the smoothing steps. As for the use of multigrid methods, it would be interesting to investigate whether, instead of multilevel Newton methods, nonlinear multigrid methods can also be used successfully.

Furthermore, we stress that many other variational inequalities can be treated in a similar way. In particular, this applies to certain kinds of the following problems: problems with constraints on the boundary, time-dependent VIPs, quasi-variational inequalities [12, 13], and VIPs of the second kind.


Figure 7.7 Computed dual solution uλ (h = 1/64).

8. Optimal Control of the Incompressible Navier–Stokes Equations

8.1 Introduction

The Navier–Stokes equations describe viscous fluid flow and are thus of central interest for many simulations of practical importance (e.g., in aerodynamics, hydrodynamics, medicine, weather forecasting, environmental and ocean sciences). Currently, significant efforts are being made to develop and analyze optimal control techniques for the Navier–Stokes equations. In particular, control of the incompressible Navier–Stokes equations has been investigated intensively in, e.g., [1, 16, 17, 43, 58, 67, 68, 69, 70, 71, 75, 79, 80]. Our aim is to show that our class of semismooth Newton methods can be applied to the constrained distributed control of the incompressible Navier–Stokes equations.

We consider instationary incompressible flow in two space dimensions. The set Ω ⊂ R² occupied by the fluid is assumed to be nonempty, open, and bounded with sufficiently smooth boundary ∂Ω. By t ∈ [0, T], T > 0, we denote time and by x = (x1, x2)^T the spatial position. For the time-space domain we introduce the notation Q = (0, T) × Ω. The state of the fluid is determined by its velocity field y = (y1, y2)^T and its pressure P, both depending on t and x. Throughout, we work in dimensionless form.

The Navier–Stokes equations can be written in the form

y_t − ν∆y + (y · ∇)y + ∇P = Ru + f in Q,
∇ · y = 0 in Q,
y = 0 on (0, T) × ∂Ω,
y(0, ·) = y0 in Ω.   (8.1)

Here, ν = 1/Re, where Re > 0 is the Reynolds number, y0 is a given initial state at time t = 0 satisfying ∇ · y0 = 0, u(t, x) is the control, R is a linear operator, and f(t, x) are given data. The precise functional analytic setting is given in section 8.2 below. In (8.1) the following notation is used:

∇ · u = (u1)_{x1} + (u2)_{x2},   ∆u = (∆u1, ∆u2)^T = ((u1)_{x1x1} + (u1)_{x2x2}, (u2)_{x1x1} + (u2)_{x2x2})^T,

(u · ∇)v = (u1(v1)_{x1} + u2(v1)_{x2}, u1(v2)_{x1} + u2(v2)_{x2})^T,   ∇P = (P_{x1}, P_{x2})^T.


We perform time-dependent control on the right-hand side. To this end, let a nonempty and bounded open set Ωc ⊂ R^k and a control operator R ∈ L(L²(Ωc)^l, H^{-1}(Ω)²) be given, and choose as control space U = L²(Qc)^l, Qc = (0, T) × Ωc.

Example 8.1. For time-dependent control of the right-hand side on a subset Ωc ⊂ Ω of the spatial domain, we can choose R ∈ L(L²(Ωc)², H^{-1}(Ω)²),

(Rv)(x) = v(x) for x ∈ Ωc,   (Rv)(x) = 0 otherwise.
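Discretely, the extension-by-zero operator of Example 8.1 and its adjoint (restriction to Ωc) can be sketched as follows (our own illustration on a boolean-mask grid):

```python
import numpy as np

def extend_by_zero(v, mask):
    """R: a control living on the subdomain (True entries of mask) is
    extended by zero to the whole grid."""
    Rv = np.zeros(mask.shape)
    Rv[mask] = v
    return Rv

def restrict(w, mask):
    """R*: the adjoint of extension by zero is restriction to the subdomain."""
    return w[mask]
```

The defining adjoint identity (Rv, w) = (v, R*w) holds exactly for the discrete inner products, which is what makes restriction the natural candidate for R* in the continuous setting as well.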

Given a closed convex feasible set C ⊂ U, the control problem consists in finding a control u ∈ C which, together with the corresponding solution (y, P) of the state equation (8.1), minimizes the objective function J(y, u). Specifically, we consider tracking-type objective functions of the form

J(y, u) = (1/2) ∫₀ᵀ ∫_Ω ‖Ny − z_d‖²₂ dx dt + (λ/2) ∫₀ᵀ ∫_{Ωc} ‖u − u_d‖²₂ dω dt.   (8.2)

Here, N : H^1_0(Ω)² → L²(Ω)^m, m ≥ 1, is a bounded linear operator, z_d ∈ L²(Q)^m is a desired state observation toward which we would like to drive Ny by optimal control, λ > 0 is a regularization parameter, and u_d ∈ L^{p′}(Qc)^l, p′ > 2, are given data.

8.2 Functional Analytic Setting of the Control Problem

In our analysis we will consider weak solutions of the Navier–Stokes equations. To make this precise, we first introduce several function spaces which provide a standard framework for the analysis of the Navier–Stokes equations [60, 107, 134].

8.2.1 Function Spaces

We work in the following spaces:

𝒱 = {v ∈ C^∞_0(Ω)² : ∇ · v = 0},
H = closure of 𝒱 in L²(Ω)²,
V = closure of 𝒱 in H^1_0(Ω)²,
L^p(X) = L^p(0, T; X),
W = {v ∈ L²(V) : v_t ∈ L²(V*)},
C(X) = C([0, T]; X) = {v : [0, T] → X, v continuous},

with inner products and norms

(v, w)_H = (v, w)_{L²(Ω)²} = ∫_Ω (∑_i v_i w_i) dx,
(v, w)_V = (v, w)_{H^1_0(Ω)²} = ∫_Ω (∑_{i,j} [v_i]_{x_j} [w_i]_{x_j}) dx,
‖y‖_{L^p(X)} = (∫₀ᵀ ‖y(t)‖^p_X dt)^{1/p},   ‖y‖_{L^∞(X)} = ess sup_{0<t<T} ‖y(t)‖_X,
‖v‖_W = (∫₀ᵀ (‖v‖²_V + ‖v_t‖²_{V*}) dt)^{1/2},   ‖y‖_{C(X)} = sup_{0≤t≤T} ‖y(t)‖_X.

Here, the dual space V* of V is chosen in such a way that

V ↪ H = H* ↪ V*

is a Gelfand triple. The following relations between the introduced spaces hold:

W ↪ C(H) ↪ L^∞(H),
L^p(V)* = L^q(V*),  1/p + 1/q = 1,  1 < p, q < ∞,
L^p(V) ↪ L^q(V),  1 ≤ q ≤ p ≤ ∞.

8.2.2 The Control Problem

For the state space and control space, respectively, we choose

Y = W (state space),   U = L²(Qc)^l (control space).

The data of the control problem are:

• the initial state y0 ∈ H;

• the right-hand side data f ∈ L²(H^{-1}(Ω)²);

• the right-hand side control operator R ∈ L(L²(Ωc)^l, H^{-1}(Ω)²) such that

w ∈ W^{4/3} := {v ∈ L²(V) : v_t ∈ L^{4/3}(V*)} ↦ R*w ∈ L^{p′}(Qc)^l

is well defined and continuous with p′ > 2;

• the objective function J : Y × U → R as defined in (8.2), with data z_d ∈ L²(Q)^m, u_d ∈ L^{p′}(Qc)^l, observation operator N ∈ L(H^1_0(Ω)², L²(Ω)^m), and regularization parameter λ > 0;

• the feasible set C ⊂ U, which is nonempty, closed, and convex. In order to apply the semismooth Newton method, we will assume later in this chapter that

C = {u ∈ U : u(t, ω) ∈ C, (t, ω) ∈ Qc},   (8.3)

where C ⊂ R^l is a closed convex set.

Remark 8.2. For the choice of R discussed in Example 8.1 and 2 < p′ < 7/2, we can use the embedding W^{4/3} ↪ L^{p′}(Q)², established in Lemma 8.12 below, to see that

w ∈ W^{4/3} ↦ R*w = w|_{Qc} ∈ L^{p′}(Qc)²

is continuous.


For the weak formulation of the Navier–Stokes equations it is convenient to introduce the trilinear form

\[
b : V \times V \times V \to \mathbb{R}, \qquad
b(u,v,w) = \int_\Omega w^T (u\cdot\nabla)v\,dx = \int_\Omega w^T v_x u\,dx = \int_\Omega \sum_{i,j} u_i (v_j)_{x_i} w_j\,dx.
\]

The variational form of (8.1) is obtained by applying test functions $v \in V$ to the momentum equation:

\[
\frac{d}{dt}(y,v)_H + \nu (y,v)_V + b(y,y,v) = \langle Ru + f, v \rangle_{H^{-1}(\Omega)^2, H_0^1(\Omega)^2} \quad \forall\, v \in V \ \text{in } (0,T), \tag{8.4}
\]
\[
y(0,\cdot) = y_0 \quad \text{in } \Omega. \tag{8.5}
\]

Note hereby that the incompressibility condition $\nabla\cdot y = 0$ is absorbed in the definition of the state space $W$. Further, the pressure term drops out since $\nabla\cdot v = 0$ and thus integration by parts yields

\[
\langle \nabla P, v \rangle_{H^{-1}(\Omega)^2, H_0^1(\Omega)^2} = -(P, \nabla\cdot v)_{L^2(\Omega)} = 0.
\]

Furthermore, the initial condition (8.5) makes sense for $y \in W$, since $W \hookrightarrow C(H)$. For the well-definedness of (8.4), and also for our analysis, it is important to know the following facts about the trilinear form $b$.

Lemma 8.3. There exists a constant $c > 0$ such that, for all $u, v, w \in V$,

\[
b(u,v,w) = -b(u,w,v), \tag{8.6}
\]
\[
|b(u,v,w)| \le c\, \|u\|_{L^4(\Omega)^2} \|v\|_V \|w\|_{L^4(\Omega)^2}, \tag{8.7}
\]
\[
|b(u,v,w)| \le c\, \|u\|_H^{1/2} \|u\|_V^{1/2} \|v\|_V \|w\|_H^{1/2} \|w\|_V^{1/2} \le c\, \|u\|_V \|v\|_V \|w\|_V. \tag{8.8}
\]

Proof (sketched). Equation (8.6) results from integration by parts and using $\nabla\cdot u = 0$; (8.7) follows by applying Hölder's inequality, see [134, Ch. III, Lem. 3.4]; and (8.8) follows from $V \hookrightarrow H$ and the estimate [134, Ch. III, Lem. 3.3]

\[
\|v\|_{L^4(\Omega)} \le 2^{1/4} \|v\|_{L^2(\Omega)}^{1/2} \|\nabla v\|_{L^2(\Omega)^2}^{1/2} \quad \forall\, v \in H_0^1(\Omega). \tag{8.9}
\]
□
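For instance, the first bound in (8.8) can be traced directly: applying (8.9) componentwise inside (8.7) gives (a sketch; the generic constant $c$ absorbs the resulting factor $2^{1/2}$)

```latex
|b(u,v,w)|
\le c\,\|u\|_{L^4(\Omega)^2}\|v\|_V\|w\|_{L^4(\Omega)^2}
\le c\,\|u\|_{L^2(\Omega)^2}^{1/2}\|\nabla u\|_{L^2}^{1/2}
   \,\|v\|_V\,
   \|w\|_{L^2(\Omega)^2}^{1/2}\|\nabla w\|_{L^2}^{1/2}
 = c\,\|u\|_H^{1/2}\|u\|_V^{1/2}\,\|v\|_V\,\|w\|_H^{1/2}\|w\|_V^{1/2}.
```

The second inequality in (8.8) then follows from $V \hookrightarrow H$.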

Equations (8.4) and (8.5) can be written as the operator equation

\[
E(y,u) = 0 \tag{8.10}
\]

with $E : W \times U \to Z^*$, $Z^* \stackrel{\mathrm{def}}{=} L^2(V^*) \times H$. For convenience, we introduce the following operators: for all $y, v, w \in V$, all $u \in L^2(\Omega_c)^l$, and all $z \in L^2(\Omega)^m$,

\[
\begin{aligned}
A &\in \mathcal{L}(V, V^*), & \langle Av, w \rangle_{V^*,V} &= (v,w)_V, \\
B &\in \mathcal{L}(V, \mathcal{L}(V, V^*)), & \langle B(y)v, w \rangle_{V^*,V} &= b(y,v,w), \\
R_\pi &\in \mathcal{L}(L^2(\Omega_c)^l, V^*), & \langle R_\pi u, v \rangle_{V^*,V} &= \langle Ru, v \rangle_{H^{-1}(\Omega)^2, H_0^1(\Omega)^2}, \\
N_\pi &\in \mathcal{L}(V, L^2(\Omega)^m), & (N_\pi v, z)_{L^2(\Omega)^m} &= (Nv, z)_{L^2(\Omega)^m}.
\end{aligned}
\]

Further, we define $f_\pi \in L^2(V^*)$ by

\[
\langle f_\pi, v \rangle_{V^*,V} = \langle f, v \rangle_{H^{-1}(\Omega)^2, H_0^1(\Omega)^2} \quad \forall\, v \in V.
\]

Using these notations, the operator $E$ assumes the form

\[
E(y,u) = \begin{pmatrix} E_1(y,u) \\ E_2(y,u) \end{pmatrix}
       = \begin{pmatrix} y_t + \nu A y + B(y)y - R_\pi u - f_\pi \\ y(0,\cdot) - y_0 \end{pmatrix}.
\]

Thus, we can write the optimal control problem in abstract form:

\[
\text{minimize } J(y,u) \quad \text{subject to } E(y,u) = 0 \ \text{and}\ u \in \mathcal{C}. \tag{8.11}
\]

8.3 Analysis of the Control Problem

8.3.1 State Equation

Concerning existence and uniqueness of solutions to the state equation (8.4) and (8.5), we have:

Proposition 8.4. For all $u \in U$ and $y_0 \in H$, there exists a unique $y = y(u) \in W$ such that $E(y,u) = 0$. Furthermore, with $r(u) = R_\pi u + f_\pi$,

\[
\|y\|_{C(H)} \le \|y_0\|_H + \frac{1}{\sqrt{\nu}} \|r(u)\|_{L^2(V^*)}, \tag{8.12}
\]
\[
\|y\|_{L^2(V)} \le \frac{1}{\sqrt{\nu}} \|y_0\|_H + \frac{1}{\nu} \|r(u)\|_{L^2(V^*)}, \tag{8.13}
\]
\[
\|y\|_W \le c\, \big( \|y_0\|_H + \|r(u)\|_{L^2(V^*)} + \|y_0\|_H^2 + \|r(u)\|_{L^2(V^*)}^2 \big). \tag{8.14}
\]

The constant $c$ depends only on $\nu$.

Proof. The existence and uniqueness is established in, e.g., [107, Thm. 3.3], together with the following energy equality

\[
\frac12 \|y(t)\|_H^2 + \nu \int_0^t \|y(s)\|_V^2\,ds = \frac12 \|y_0\|_H^2 + \int_0^t \langle r(u)(s), y(s) \rangle_{V^*,V}\,ds, \tag{8.15}
\]

which holds for all $t \in [0,T]$ and is obtained by choosing $v = y(t)$ as test function in (8.4), integrating from $0$ to $t$, and using

\[
2 \int_0^t \langle y_t(s), y(s) \rangle_{V^*,V}\,ds = \|y(t)\|_H^2 - \|y(0)\|_H^2.
\]

By the Cauchy–Schwarz and Young inequalities we have

\[
\int_0^t \big| \langle r(u)(s), y(s) \rangle_{V^*,V} \big|\,ds
\le \int_0^t \|r(u)(s)\|_{V^*} \|y(s)\|_V\,ds
\le \frac{1}{2\nu} \int_0^t \|r(u)(s)\|_{V^*}^2\,ds + \frac{\nu}{2} \int_0^t \|y(s)\|_V^2\,ds.
\]

Hence, (8.15) yields

\[
\|y(t)\|_H^2 + \nu \int_0^t \|y(s)\|_V^2\,ds \le \|y_0\|_H^2 + \frac{1}{\nu} \int_0^t \|r(u)(s)\|_{V^*}^2\,ds,
\]

which proves (8.12) and (8.13). The state equation (8.4) yields, for all $v \in L^2(V)$, using (8.6), (8.8), and Hölder's inequality,

\[
\begin{aligned}
\int_0^T \big| \langle y_t, v \rangle_{V^*,V} \big|\,dt
&\le \int_0^T \big( \nu |(y,v)_V| + |b(y,y,v)| + |\langle r(u), v \rangle_{V^*,V}| \big)\,dt \\
&\le \int_0^T \big( \nu \|y\|_V + c \|y\|_H \|y\|_V + \|r(u)\|_{V^*} \big) \|v\|_V\,dt \\
&\le \big( \nu \|y\|_{L^2(V)} + c \|y\|_{L^\infty(H)} \|y\|_{L^2(V)} + \|r(u)\|_{L^2(V^*)} \big) \|v\|_{L^2(V)}.
\end{aligned}
\]

With the Young inequality, (8.12), and (8.13), (8.14) follows. □

We know already that the state equation possesses a unique solution $y(u)$. Our aim is to show that the reduced control problem

\[
\text{minimize } j(u) \stackrel{\mathrm{def}}{=} J(y(u), u) \quad \text{subject to } u \in B \tag{8.16}
\]

can be solved by the semismooth Newton method. In particular, we must show that $j$ is twice continuously differentiable. This will be done based on the implicit function theorem, which requires investigating the differentiability properties of the operator $E$. In this context, it is convenient to introduce the trilinear form

\[
\beta : V \times V \times V \to \mathbb{R}, \qquad \beta(u,v,w) = b(u,v,w) + b(v,u,w). \tag{8.17}
\]

The following estimates are used several times. In their derivation, and throughout the rest of this chapter (if not stated differently), $c$ denotes a generic constant that may differ from instance to instance.

From (8.6), (8.8), and $V \hookrightarrow H$ follows, for all $u, v, w \in V$,

\[
\begin{aligned}
|\beta(u,v,w)| &\le |b(u,w,v)| + |b(v,w,u)| \\
&\le c\, \|u\|_H^{1/2} \|u\|_V^{1/2} \|v\|_H^{1/2} \|v\|_V^{1/2} \|w\|_V \qquad (8.18) \\
&\le c\, \|u\|_H^{1/2} \|u\|_V^{1/2} \|v\|_V \|w\|_V. \qquad (8.19)
\end{aligned}
\]

Further, (8.18) and Hölder's inequality with exponents $(\infty, 4, \infty, 4, 2)$ yield, for all $u, v \in L^2(V) \cap L^\infty(H) \supset W$ and all $w \in L^2(V)$,

\[
\int_0^T |\beta(u,v,w)|\,dt
\le c \int_0^T \|u\|_H^{1/2} \|u\|_V^{1/2} \|v\|_H^{1/2} \|v\|_V^{1/2} \|w\|_V\,dt
\le c\, \|u\|_{L^\infty(H)}^{1/2} \|u\|_{L^2(V)}^{1/2} \|v\|_{L^\infty(H)}^{1/2} \|v\|_{L^2(V)}^{1/2} \|w\|_{L^2(V)}. \tag{8.20}
\]

In particular, for all $u, v \in W$ and $w \in L^2(V)$,

\[
\int_0^T |\beta(u,v,w)|\,dt \le c\, \|u\|_W \|v\|_W \|w\|_{L^2(V)}. \tag{8.21}
\]

Finally, (8.19) and Hölder's inequality with exponents $(\infty, 4, 4, 2)$ give, for all $u \in L^2(V) \cap L^\infty(H)$, $v \in L^4(V)$, and $w \in L^2(V)$,

\[
\int_0^T |\beta(u,v,w)|\,dt
\le c \int_0^T \|u\|_H^{1/2} \|u\|_V^{1/2} \|v\|_V \|w\|_V\,dt
\le c\, \|u\|_{L^\infty(H)}^{1/2} \|u\|_{L^2(V)}^{1/2} \|v\|_{L^4(V)} \|w\|_{L^2(V)}. \tag{8.22}
\]

We now prove that the state equation is infinitely Fréchet differentiable.

Proposition 8.5. Let $y_0 \in H$ and $(y,u) \in W \times U$. Then the operator $E : W \times U \to Z^*$ is twice continuously differentiable with Lipschitz continuous first derivative, constant second derivative, and vanishing third and higher derivatives. The derivatives are given by:

\[
E_1'(y,u)(v,w) = v_t + \nu A v + B(y)v + B(v)y - R_\pi w, \tag{8.23}
\]
\[
E_2'(y,u)(v,w) = v(0,\cdot), \tag{8.24}
\]
\[
E_1''(y,u)(v,w)(\bar v, \bar w) = B(v)\bar v + B(\bar v)v, \tag{8.25}
\]
\[
E_2''(y,u)(v,w)(\bar v, \bar w) = 0. \tag{8.26}
\]

Proof. Since $E_2$ is linear and continuous, the assertions on $E_2'$ and $E_2''$ are obvious. Thus, we only have to consider $E_1$. If $E_1$ is differentiable, then formal differentiation shows that $E_1'$ has the form stated in (8.23). This operator maps $(v,w) \in W \times U$ continuously to $L^2(V^*)$. In fact, for all $z \in L^2(V)$, we obtain using (8.21)

\[
\begin{aligned}
\int_0^T \big| \langle v_t + \nu A v + B(y)v + B(v)y - R_\pi w, z \rangle_{V^*,V} \big|\,dt
&\le \int_0^T \big( \|v_t\|_{V^*} \|z\|_V + \nu \|v\|_V \|z\|_V + |\beta(y,v,z)| + \|R_\pi w\|_{V^*} \|z\|_V \big)\,dt \\
&\le \big( \|v_t\|_{L^2(V^*)} + \nu \|v\|_{L^2(V)} + c \|y\|_W \|v\|_W + \|R_\pi\|_{U, L^2(V^*)} \|w\|_U \big) \|z\|_{L^2(V)}.
\end{aligned}
\]

Next, we show that $E_1$ is differentiable with its derivative given by (8.23). Using the linearity of $A$, $B(v)$, $v \mapsto B(v)$, and $R_\pi$, we obtain for all $y, v \in W$, $u, w \in U$,

\[
E_1(y+v, u+w) - E_1(y,u) - \big( v_t + \nu A v + B(y)v + B(v)y - R_\pi w \big)
= B(y+v)(y+v) - B(y)y - B(y)v - B(v)y = B(v)v.
\]

For all $z \in L^2(V)$ there holds, by (8.6), (8.8), and Hölder's inequality,

\[
\int_0^T \big| \langle B(v)v, z \rangle_{V^*,V} \big|\,dt
= \int_0^T |b(v,v,z)|\,dt
\le \int_0^T c \|v\|_V \|v\|_H \|z\|_V\,dt
\le c\, \|v\|_{L^2(V)} \|v\|_{L^\infty(H)} \|z\|_{L^2(V)}
\le c\, \|v\|_W^2 \|z\|_{L^2(V)},
\]

which proves the Fréchet differentiability of $E_1$. Note that $E_1'$ depends affine linearly on $(y,u) \in W \times U$. It remains to show that the mapping

\[
E_1' : W \times U \to \mathcal{L}(W \times U, L^2(V^*))
\]

is continuous at $(0,0)$. But this follows from

\[
\big| \langle E_1'(y,u)(v,w) - E_1'(0,0)(v,w), z \rangle_{V^*,V} \big| = |\beta(y,v,z)| \le c\, \|y\|_W \|v\|_W \|z\|_{L^2(V)}
\]

for all $y, v \in W$, all $u, w \in U$, and all $z \in L^2(V)$, where we have used (8.21). As a consequence, $E_1'$ is affine linear and continuous, thus Lipschitz, and $E_1$ is twice continuously differentiable with constant second derivative as given in (8.25). Further, since $E''$ is constant, it follows that $E^{(k)} = 0$ for all $k \ge 3$. □

The next result concerns the linearized state equation. The proof can be obtained by standard methods; the interested reader is referred to [79, 80].

Proposition 8.6. Let $y_0 \in H$ and $(y,u) \in W \times U$. Then the operator $E_y(y,u) \in \mathcal{L}(W, Z^*)$ is a homeomorphism, or, in more detail: for all $y \in W$, $g \in L^2(V^*)$, and $v_0 \in H$, the linearized Navier–Stokes equations

\[
\begin{aligned}
v_t + \nu A v + B(y)v + B(v)y &= g \quad \text{in } L^2(V^*), \\
v(0,\cdot) &= v_0 \quad \text{in } H
\end{aligned} \tag{8.27}
\]

possess a unique solution $v \in W$. Furthermore, the following estimate holds:

\[
\begin{aligned}
\|v_t\|_{L^2(V^*)} + \|v\|_{L^2(V)} + \|v\|_{L^\infty(H)} &\le c\, \|v\|_W \qquad (8.28) \\
&\le c(\|y\|_{L^2(V)}, \|y\|_{L^\infty(H)}) \big( \|g\|_{L^2(V^*)} + \|v_0\|_H \big) \qquad (8.29) \\
&\le c(\|y\|_W) \big( \|g\|_{L^2(V^*)} + \|v_0\|_H \big), \qquad (8.30)
\end{aligned}
\]

where the functions $c(\cdot)$ depend locally Lipschitz on their arguments.

Proposition 8.7. The mapping

\[
(y,u) \in W \times U \ \mapsto\ E_y(y,u)^{-1} \in \mathcal{L}(Z^*, W)
\]

is Lipschitz continuous on bounded sets. More precisely, there exists a locally Lipschitz continuous function $c$ such that, for all $(y_i, u_i) \in W \times U$, $i = 1, 2$, the following holds:

\[
\|E_y(y_1,u_1)^{-1} - E_y(y_2,u_2)^{-1}\|_{Z^*,W} \le c(\|y_1\|_W, \|y_2\|_W) \|y_1 - y_2\|_W.
\]


Proof. Let $z = (g, v_0) \in Z^* = L^2(V^*) \times H$ be arbitrary and set, for $i = 1, 2$, $v_i = E_y(y_i,u_i)^{-1} z$. Then, with $y_{12} = y_1 - y_2$, $u_{12} = u_1 - u_2$, and $v_{12} = v_1 - v_2$, we have $v_{12}(0) = 0$ and

\[
\begin{aligned}
0 &= (E_1)_y(y_1,u_1) v_1 - (E_1)_y(y_2,u_2) v_2 \\
&= (v_{12})_t + \nu A v_{12} + B(y_1)v_1 + B(v_1)y_1 - B(y_2)v_2 - B(v_2)y_2 \\
&= (v_{12})_t + \nu A v_{12} + B(y_2)v_{12} + B(v_{12})y_2 + B(y_{12})v_1 + B(v_1)y_{12} \\
&= (E_1)_y(y_2,u_{12}) v_{12} + B(y_{12})v_1 + B(v_1)y_{12}, \\
0 &= (E_2)_y(y_1,u_1) v_1 - (E_2)_y(y_2,u_2) v_2 = v_{12}(0,\cdot).
\end{aligned}
\]

Therefore,

\[
E_y(y_2, u_{12}) v_{12} = \begin{pmatrix} -B(y_{12})v_1 - B(v_1)y_{12} \\ 0 \end{pmatrix},
\]

and thus, by Proposition 8.6 and (8.21),

\[
\begin{aligned}
\|v_{12}\|_W &\le c(\|y_2\|_W) \|B(y_{12})v_1 + B(v_1)y_{12}\|_{L^2(V^*)} \\
&\le c(\|y_2\|_W) \|v_1\|_W \|y_{12}\|_W \\
&\le c(\|y_2\|_W)\, c(\|y_1\|_W) \big( \|g\|_{L^2(V^*)} + \|v_0\|_H \big) \|y_{12}\|_W \\
&\le c(\|y_1\|_W, \|y_2\|_W) \|y_{12}\|_W \|z\|_{Z^*},
\end{aligned}
\]

where the $c(\cdot)$ are locally Lipschitz continuous functions. □

8.3.2 Control-to-State Mapping

In this section we show that the control-to-state mapping $u \in U \mapsto y(u) \in W$ is infinitely differentiable and that $y(u)$, $y'(u)$, and $y''(u)$ are Lipschitz continuous on bounded sets.

Theorem 8.8. The solution operator $u \in U \mapsto y(u) \in W$ of (8.10) is infinitely continuously differentiable. Further, there exist locally Lipschitz continuous functions $c(\cdot)$ such that for all $u, u_1, u_2, v, w \in U$ there holds

\[
\|y(u)\|_W \le c(\|y_0\|_H, \|r\|_{L^2(V^*)}), \tag{8.31}
\]
\[
\|y'(u)\|_W \le c(\|y_0\|_H, \|r\|_{L^2(V^*)}), \tag{8.32}
\]
\[
\|y_1 - y_2\|_W \le c(\|y_0\|_H, \|r_1\|_{L^2(V^*)}, \|r_2\|_{L^2(V^*)}) \|u_1 - u_2\|_U, \tag{8.33}
\]
\[
\|(y_1' - y_2')v\|_W \le c(\|y_0\|_H, \|r_1\|_{L^2(V^*)}, \|r_2\|_{L^2(V^*)}) \cdot \|R_\pi(u_1 - u_2)\|_{L^2(V^*)} \|R_\pi v\|_{L^2(V^*)}, \tag{8.34}
\]
\[
\|(y_1'' - y_2'')(v,w)\|_W \le c(\|y_0\|_H, \|r_1\|_{L^2(V^*)}, \|r_2\|_{L^2(V^*)}) \cdot \|R_\pi(u_1 - u_2)\|_{L^2(V^*)} \|R_\pi v\|_{L^2(V^*)} \|R_\pi w\|_{L^2(V^*)}, \tag{8.35}
\]

with $r = R_\pi u + f_\pi$, $r_i = R_\pi u_i + f_\pi$, $y_i = y(u_i)$, $y_i' = y'(u_i)$, and $y_i'' = y''(u_i)$.


Proof. Since $E$ is infinitely continuously differentiable by Proposition 8.5 and the partial derivative $E_y(y(u),u) \in \mathcal{L}(W, Z^*)$ is a homeomorphism according to Proposition 8.6, the implicit function theorem yields that $u \in U \mapsto y(u) \in W$ is infinitely continuously differentiable.

The estimate (8.31) is just a restatement of (8.14) in Proposition 8.4. Using (8.31) and Proposition 8.6, we see that the derivative $u \in U \mapsto y'(u) \in \mathcal{L}(U,W)$ satisfies, setting $y = y(u)$, for all $v \in U$,

\[
\|y'(u)v\|_W = \|E_y(y,u)^{-1} E_u(y,u)v\|_W \le \|E_y(y,u)^{-1}\|_{Z^*,W} \|E_u(y,u)v\|_{Z^*}
\le c(\|y\|_W) \|E_u(y,u)v\|_{Z^*} \le c(\|y_0\|_H, \|r\|_{L^2(V^*)}) \|R_\pi v\|_{L^2(V^*)}
\]

with $c(\cdot)$ being locally Lipschitz. This proves (8.32).

Using (8.32), we obtain for all $u_1, u_2 \in U$, setting $u_{12} = u_1 - u_2$ and $u(\tau) = \tau u_1 + (1-\tau) u_2$,

\[
\|y_1 - y_2\|_W \le \int_0^1 \|y'(u(\tau)) u_{12}\|_W\,d\tau
\le \int_0^1 c\big( \|y_0\|_H, \|r(u(\tau))\|_{L^2(V^*)} \big) \|R_\pi u_{12}\|_{L^2(V^*)}\,d\tau
\le c(\|y_0\|_H, \|r_1\|_{L^2(V^*)}, \|r_2\|_{L^2(V^*)}) \|R_\pi(u_1 - u_2)\|_{L^2(V^*)}
\]

with a locally Lipschitz function $c$. Therefore, (8.33) is shown.

From Proposition 8.7, (8.31), and (8.33), we obtain, for all $v \in U$,

\[
\begin{aligned}
\|(y_1' - y_2')v\|_W &= \|E_y(y_1,u_1)^{-1} E_u(y_1,u_1)v - E_y(y_2,u_2)^{-1} E_u(y_2,u_2)v\|_W \\
&\le c(\|y_1\|_W, \|y_2\|_W) \|y_1 - y_2\|_W \|R_\pi v\|_{L^2(V^*)} \\
&\le c(\|y_0\|_H, \|r_1\|_{L^2(V^*)}, \|r_2\|_{L^2(V^*)}) \|R_\pi(u_1 - u_2)\|_{L^2(V^*)} \|R_\pi v\|_{L^2(V^*)}
\end{aligned}
\]

with $c(\cdot)$ being locally Lipschitz continuous. This establishes (8.34).

Finally, differentiating the equation $E(y(u),u) = 0$ twice yields, for all $u, v, w \in U$, with $y = y(u)$,

\[
E_y(y,u) y''(u)(v,w) + E_{yy}(y,u)(y'(u)v, y'(u)w) + E_{yu}(y,u)(y'(u)v, w) + E_{uy}(y,u)(v, y'(u)w) + E_{uu}(y,u)(v,w) = 0.
\]

Now, we use that $E_u v = (-R_\pi v, 0)^T$ is constant to conclude that

\[
y''(u)(v,w) = -E_y(y,u)^{-1} E_{yy}(y,u)(y'(u)v, y'(u)w)
= -E_y(y,u)^{-1} \big( B(y'(u)v)\, y'(u)w + B(y'(u)w)\, y'(u)v \big).
\]

From this, Proposition 8.7, (8.33), and (8.34) we see that (8.35) holds true. □

8.3.3 Adjoint Equation

Next, given a control $u \in U$ and a state $y \in W$, we analyze the adjoint equation


\[
E_y(y,u)^* \begin{pmatrix} w \\ h \end{pmatrix} = g, \tag{8.36}
\]

which can be used for the representation of the gradient $j'(u)$. In fact, see Appendix A.1, we have, with $y = y(u)$,

\[
j'(u) = J_u(y,u) + E_u(y,u)^* \begin{pmatrix} w \\ h \end{pmatrix}, \quad \text{where} \quad
E_y(y,u)^* \begin{pmatrix} w \\ h \end{pmatrix} = -J_y(y,u).
\]

Proposition 8.9. 1. For every $u \in U$ and $y \in W$, the adjoint equation (8.36) possesses a unique solution $(w,h) \in Z = L^2(V) \times H$ for all $g \in W^*$. Moreover,

\[
\|w\|_{L^2(V)} + \|h\|_H \le c\, \|(w,h)\|_Z \le c(\|y\|_W) \|g\|_{W^*}, \tag{8.37}
\]

where $c(\cdot)$ is locally Lipschitz.

2. Assume now that $g \in L^{4/3}(V^*) \cap W^*$. Then the adjoint equation can be written in the form

\[
-\frac{d}{dt}(w,v)_H + \nu (w,v)_V + \beta(y,v,w) = \langle g, v \rangle_{V^*,V} \quad \forall\, v \in V \ \text{on } (0,T), \tag{8.38}
\]
\[
w(T,\cdot) = 0 \quad \text{on } \Omega, \tag{8.39}
\]
\[
h - w(0,\cdot) = 0 \quad \text{on } \Omega. \tag{8.40}
\]

Furthermore, $w_t \in L^{4/3}(V^*) \cap W^*$, $w \in C(V^*)$, and

\[
\|w_t\|_{W^*} \le c(\|y\|_W) \|g\|_{W^*}, \tag{8.41}
\]
\[
\|w_t\|_{L^{4/3}(V^*)} \le c(\|y\|_W) \|g\|_{W^*} + \|g\|_{L^{4/3}(V^*)} \tag{8.42}
\]

with $c(\cdot)$ being locally Lipschitz continuous.

Proof. 1. From Proposition 8.6 we know that $E_y(y,u) \in \mathcal{L}(W, Z^*)$ is a homeomorphism and thus also $E_y(y,u)^* \in \mathcal{L}(Z, W^*)$ is a homeomorphism. Hence, the adjoint equation possesses a unique solution $(w,h) \in Z = L^2(V) \times H$ that depends linearly and continuously on $g \in W^*$. More precisely, Proposition 8.6 yields

\[
\|w\|_{L^2(V)} + \|h\|_H \le c\, \|(w,h)\|_Z = c\, \|(E_y(y,u)^*)^{-1} g\|_Z
\le c\, \|(E_y(y,u)^*)^{-1}\|_{W^*,Z} \|g\|_{W^*}
= c\, \|E_y(y,u)^{-1}\|_{Z^*,W} \|g\|_{W^*} \le c(\|y\|_W) \|g\|_{W^*},
\]

where $c(\cdot)$ depends locally Lipschitz on $\|y\|_W$.

2. For the rest of the proof we assume $g \in W^* \cap L^{4/3}(V^*)$. We proceed by showing that the adjoint equation coincides with (8.38). Using the trilinear form $\beta$ defined in (8.17), the adjoint state $(w,h) \in L^2(V) \times H$ satisfies for all $v \in W$:

\[
\int_0^T \big( \langle v_t, w \rangle_{V^*,V} + \nu (v,w)_V + \beta(y,v,w) - \langle g, v \rangle_{V^*,V} \big)\,dt + (v(0), h)_H = 0. \tag{8.43}
\]


In particular, we obtain, for $v \in W$ replaced by $\varphi v$ with $\varphi \in C_0^\infty(0,T)$ and $v \in V$:

\[
-\frac{d}{dt}(w,v)_H + \nu (w,v)_V + \beta(y,v,w) = \langle g, v \rangle_{V^*,V} \quad \forall\, v \in V \ \text{on } (0,T),
\]

in the sense of distributions, which is (8.38). As a result of (8.22), we have that $z \in L^4(V) \mapsto \beta(y,z,w)$ is linear and continuous and therefore an element of $L^4(V)^* = L^{4/3}(V^*)$. For $v \in V$ this implies $\beta(y,v,w) \in L^{4/3}(0,T)$. Further, $\langle g, v \rangle_{V^*,V} \in L^{4/3}(0,T)$ and $(w,v)_V \in L^2(0,T)$, hence

\[
\frac{d}{dt}(w,v)_H = \nu (w,v)_V + \beta(y,v,w) - \langle g, v \rangle_{V^*,V} \in L^{4/3}(0,T).
\]

This shows that $(w,v)_H \in H^{1,4/3}(0,T)$. For all $v \in V$ and all $\varphi \in C^\infty([0,T])$ there holds $\varphi v \in W$. We choose these particular test functions in (8.43) and integrate by parts (which is allowed since $C^\infty([0,T]) \hookrightarrow H^{1,4}(0,T)$). This gives

\[
\begin{aligned}
0 &= \int_0^T \Big( (v,w)_H \varphi' + \big( \nu (v,w)_V + \beta(y,v,w) - \langle g, v \rangle_{V^*,V} \big) \varphi \Big)\,dt + (v,h)_H \varphi(0) \\
&= \int_0^T \Big( -\frac{d}{dt}(w,v)_H + \nu (w,v)_V + \beta(y,v,w) - \langle g, v \rangle_{V^*,V} \Big) \varphi\,dt \\
&\quad + (v, h - w(0))_H \varphi(0) + (v, w(T))_H \varphi(T).
\end{aligned}
\]

The integral vanishes, since (8.38) was already shown to hold. Considering all $\varphi \in C^\infty([0,T])$ with $\varphi(0) = 0$ proves (8.39), whereas (8.40) follows by considering all $\varphi \in C^\infty([0,T])$ with $\varphi(T) = 0$.

Finally, we solve (8.38) for $w_t$ and apply (8.21) to derive, for all $z \in W$,

\[
\langle w_t, z \rangle_{W^*,W} \le \int_0^T \big( \nu |(w,z)_V| + |\beta(y,z,w)| \big)\,dt + |\langle g, z \rangle_{W^*,W}|
\le \nu \|w\|_{L^2(V)} \|z\|_{L^2(V)} + c \|y\|_W \|w\|_{L^2(V)} \|z\|_W + \|g\|_{W^*} \|z\|_W.
\]

Further, for all $z \in L^4(V)$,

\[
\int_0^T \langle w_t, z \rangle_{V^*,V}\,dt \le \int_0^T \big| \nu (w,z)_V + \beta(y,z,w) - \langle g, z \rangle_{V^*,V} \big|\,dt
\le \big( \nu \|w\|_{L^{4/3}(V)} + c \|y\|_W \|w\|_{L^2(V)} + \|g\|_{L^{4/3}(V^*)} \big) \|z\|_{L^4(V)},
\]

where we have used Hölder's inequality and (8.22). Application of (8.37) completes the proof of (8.41) and (8.42). The assertion $w \in C(V^*)$ follows from the embedding $\{ w \in L^2(V) : w_t \in L^{4/3}(V^*) \} \hookrightarrow C(V^*)$. □

Our next aim is to estimate the distance of two adjoint states $(w_i, h_i)$, $i = 1, 2$, that correspond to different states $y_i$ and right hand sides $g_i$.


Proposition 8.10. For given $y_i \in W$ and $g_i \in W^* \cap L^{4/3}(V^*)$, $i = 1, 2$, let $(w_i, h_i) \in L^2(V) \times H$ denote the corresponding solutions of the adjoint equation (8.36) with state $y_i$ and right hand side $g_i$. Then $w_i \in L^2(V) \cap C(V^*)$, $(w_i)_t \in W^* \cap L^{4/3}(V^*)$, $h_i = w_i(0)$, and

\[
\begin{aligned}
&\|w_1 - w_2\|_{L^2(V)} + \|(w_1 - w_2)_t\|_{L^{4/3}(V^*)} + \|h_1 - h_2\|_H \\
&\quad \le c(\|y_1\|_W, \|y_2\|_W) \big( \|g_1 - g_2\|_{W^*} + \|g_1\|_{W^*} \|y_1 - y_2\|_W \big) + \|g_1 - g_2\|_{L^{4/3}(V^*)},
\end{aligned} \tag{8.44}
\]

where $c(\cdot)$ is locally Lipschitz continuous.

Proof. The existence and regularity results are those stated in Proposition 8.9. Introducing the differences $w_{12} = w_1 - w_2$, $h_{12} = h_1 - h_2$, $y_{12} = y_1 - y_2$, and $g_{12} = g_1 - g_2$, we have $w_{12}(T) = 0$ and $h_{12} = w_{12}(0)$ on $\Omega$ and, on $(0,T)$,

\[
-\frac{d}{dt}(w_{12}, v)_H + \nu (w_{12}, v)_V + \beta(y_1, v, w_1) - \beta(y_2, v, w_2) = \langle g_{12}, v \rangle_{V^*,V}.
\]

Rearranging terms yields

\[
-\frac{d}{dt}(w_{12}, v)_H + \nu (w_{12}, v)_V + \beta(y_2, v, w_{12}) = \langle g_{12}, v \rangle_{V^*,V} - \beta(y_{12}, v, w_1).
\]

Therefore, $(w_{12}, h_{12})$ is the solution of the adjoint equation for the state $y_2$ and the right hand side

\[
g = g_{12} - \ell, \qquad \ell : v \mapsto \beta(y_{12}, v, w_1).
\]

From (8.21), (8.22) we know that $\ell \in W^* \cap L^{4/3}(V^*)$ and

\[
\|\ell\|_{W^*} + \|\ell\|_{L^{4/3}(V^*)} \le c\, \|y_{12}\|_W \|w_1\|_{L^2(V)}.
\]

Therefore, by Proposition 8.9,

\[
\begin{aligned}
\|w_{12}\|_{L^2(V)} + \|(w_{12})_t\|_{L^{4/3}(V^*)} + \|h_{12}\|_H
&\le c(\|y_2\|_W) \|g\|_{W^*} + \|g\|_{L^{4/3}(V^*)} \\
&\le c(\|y_2\|_W) \big( \|g_{12}\|_{W^*} + c \|w_1\|_{L^2(V)} \|y_{12}\|_W \big) + \|g_{12}\|_{L^{4/3}(V^*)} + c \|w_1\|_{L^2(V)} \|y_{12}\|_W \\
&\le c(\|y_2\|_W) \big( \|g_{12}\|_{W^*} + \|w_1\|_{L^2(V)} \|y_{12}\|_W \big) + \|g_{12}\|_{L^{4/3}(V^*)} \\
&\le c(\|y_2\|_W) \big( \|g_{12}\|_{W^*} + c(\|y_1\|_W) \|g_1\|_{W^*} \|y_{12}\|_W \big) + \|g_{12}\|_{L^{4/3}(V^*)} \\
&\le c(\|y_1\|_W, \|y_2\|_W) \big( \|g_{12}\|_{W^*} + \|g_1\|_{W^*} \|y_{12}\|_W \big) + \|g_{12}\|_{L^{4/3}(V^*)},
\end{aligned}
\]

where $c(\cdot)$ is locally Lipschitz. The proof is complete. □

8.3.4 Properties of the Reduced Objective Function

We will now show that the reduced objective function $j$ meets all requirements that are needed to apply semismooth Newton methods for the solution of the control problem (8.16). We have, since $J$ is quadratic,

\[
\begin{aligned}
J_u(y,u) &= \lambda (u - u_d), & J_y(y,u) &= N_\pi^* (N_\pi y - z_d), \\
J_{uu}(y,u) &= \lambda I, & J_{uy}(y,u) &= 0, \\
J_{yu}(y,u) &= 0, & J_{yy}(y,u) &= N_\pi^* N_\pi.
\end{aligned}
\]

Since $u \in U \mapsto y(u) \in W$ is infinitely differentiable and $y$, $y'$, and $y''$ are Lipschitz continuous on bounded sets, see Theorem 8.8, we obtain that $j(u) = J(y(u), u)$ is infinitely differentiable with $j$, $j'$, and $j''$ being Lipschitz continuous on bounded sets.

Further, using the adjoint representation of the gradient, and the fact that $E_u v = (-R_\pi v, 0)^T$, we have, with $y = y(u)$,

\[
j'(u) = J_u(y,u) - R_\pi^* w = \lambda (u - u_d) - R^* w, \tag{8.45}
\]

where $w$ solves the adjoint equation (8.38), (8.39) with right hand side

\[
g = -J_y(y,u) = -N_\pi^* (N_\pi y - z_d) \in L^2(V^*) \hookrightarrow W^* \cap L^{4/3}(V^*). \tag{8.46}
\]

Therefore, we have:

Theorem 8.11. The reduced objective function $j : U = L^2(Q_c)^l \to \mathbb{R}$ is infinitely differentiable with $j$, $j'$, and $j''$ being Lipschitz continuous on bounded sets. The reduced gradient has the form

\[
j'(u) = \lambda u + G(u), \qquad G(u) = -R^* w - \lambda u_d,
\]

where $w$ is the adjoint state. In particular, the operator $G$ maps $L^2(Q_c)^l$ Lipschitz continuously on bounded sets to $L^{p'}(Q_c)^l$. Further, $G : L^2(Q_c)^l \to L^2(Q_c)^l$ is continuously differentiable with $G'(u) = G'(u)^*$ being bounded on bounded sets in $\mathcal{L}(L^2(Q_c)^l, L^{p'}(Q_c)^l)$.

Proof. The properties of $j$ follow from Theorem 8.8 and (8.45). The Lipschitz continuity assertion on $G$ follows from (8.44), (8.33), and (8.46). Further, $G(u) = j'(u) - \lambda u$ is, considered as a mapping $L^2(Q_c)^l \to L^2(Q_c)^l$, continuously differentiable with derivative $G'(u) = j''(u) - \lambda I$. In particular, we see that $G'$ is self-adjoint. Now consider $G'(u)$ for all $u \in B_\rho = \rho B_{L^2(Q_c)^l}$. On this set $G$ maps Lipschitz continuously into $L^{p'}(Q_c)^l$. Denoting the Lipschitz rank by $L_\rho$, we now prove $\|G'(u)\|_{L^2(Q_c)^l, L^{p'}(Q_c)^l} \le L_\rho$ for all $u \in B_\rho$. In fact, for all $u \in B_\rho$ and all $v \in L^2(Q_c)^l$ we have $u + tv \in B_\rho$ for $t > 0$ small enough, and thus

\[
\|G'(u)v\|_{L^{p'}(Q_c)^l} = \lim_{t \to 0^+} t^{-1} \|G(u + tv) - G(u)\|_{L^{p'}(Q_c)^l} \le L_\rho \|v\|_{L^2(Q_c)^l}.
\]
□

For illustration, we consider the case where $\Omega_c \subset \Omega$, $l = 2$, and

\[
(Rv)(x) = v(x) \ \text{for } x \in \Omega_c, \qquad (Rv)(x) = 0 \ \text{otherwise}.
\]

We need the following embedding:


Lemma 8.12. For all $1 \le p < 7/2$ and all $v \in L^2(V)$ with $v_t \in L^{4/3}(V^*)$ there holds

\[
\|v\|_{L^p(Q)^2} \le c\, \big( \|v_t\|_{L^{4/3}(V^*)} + \|v\|_{L^2(V)} \big).
\]

Proof. In [7] it is proved that for all $1 \le q < 8$ there holds

\[
W^{4/3} = \{ v \in L^2(V) : v_t \in L^{4/3}(V^*) \} \hookrightarrow L^q(H)
\]

(the embedding is even compact). We proceed by showing that for all $p \in [1, 7/2)$ there exists $q \in [1, 8)$ such that $L^q(H) \cap L^2(V) \hookrightarrow L^p(Q)^2$. Due to the boundedness of $Q$ it suffices to consider all $p \in [2, 7/2)$. Recall that $V \hookrightarrow L^s(\Omega)^2$ for all $s \in [1, \infty)$. Now let $r = 4$, $r' = 4/3$,

\[
\theta = 1 - \frac{3}{2p} \in [1/4, 4/7) \quad \text{and} \quad s = \frac{6}{7 - 2p} \in [2, \infty).
\]

Then there holds

\[
\frac{\theta}{2} + \frac{1 - \theta}{s} = \frac{1}{p}, \qquad
\frac{1}{r} + \frac{1}{r'} = 1, \qquad
q = \theta p r = 4p - 6 \in [2, 8), \qquad (1 - \theta) p r' = 2.
\]
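These identities can be verified by direct computation; for instance, for the first one,

```latex
\frac{\theta}{2} + \frac{1-\theta}{s}
= \frac12 - \frac{3}{4p} + \frac{3}{2p}\cdot\frac{7-2p}{6}
= \frac12 - \frac{3}{4p} + \frac{7-2p}{4p}
= \frac{4}{4p} = \frac{1}{p},
```

and similarly $q = \theta p r = \big(1 - \tfrac{3}{2p}\big)\cdot 4p = 4p - 6$ and $(1-\theta)p r' = \tfrac{3}{2p}\cdot p\cdot\tfrac{4}{3} = 2$.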

Thus, we can apply the interpolation inequality and Hölder's inequality to conclude

\[
\begin{aligned}
\|v\|_{L^p(Q)^2}^p &= \int_0^T \|v\|_{L^p(\Omega)^2}^p\,dt
\le c \int_0^T \|v\|_{L^2(\Omega)^2}^{\theta p} \|v\|_{L^s(\Omega)^2}^{(1-\theta)p}\,dt \\
&\le c \Big( \int_0^T \|v\|_{L^2(\Omega)^2}^{\theta p r}\,dt \Big)^{1/r}
      \Big( \int_0^T \|v\|_{L^s(\Omega)^2}^{(1-\theta)p r'}\,dt \Big)^{1/r'}
= c\, \|v\|_{L^q(H)}^{\theta p} \|v\|_{L^2(L^s(\Omega)^2)}^{(1-\theta)p} \\
&\le c\, \big( \|v_t\|_{L^{4/3}(V^*)} + \|v\|_{L^2(V)} \big)^{\theta p} \|v\|_{L^2(V)}^{(1-\theta)p}
\le c\, \big( \|v_t\|_{L^{4/3}(V^*)} + \|v\|_{L^2(V)} \big)^p.
\end{aligned}
\]
□

For $2 < p' < 7/2$ we thus have that

\[
w \in W^{4/3} \hookrightarrow L^{p'}(Q)^2 \ \mapsto\ R^* w = w|_{Q_c} \in L^{p'}(Q_c)^2
\]

is continuous, so that Theorem 8.11 is applicable.

8.4 Application of Semismooth Newton Methods

We now consider the reduced problem (8.16) with feasible set of the form (8.3), and reformulate its first order necessary optimality conditions in the form of the nonsmooth operator equation

\[
\Pi(u) = 0, \qquad
\Pi(u)(t,\omega) = u(t,\omega) - P_C\big( u(t,\omega) - \lambda^{-1} j'(u)(t,\omega) \big), \quad (t,\omega) \in Q_c.
\]

Let us assume that $P_C$ is semismooth. Then, for $r = 2$ and any $p'$ as specified, Theorem 8.11 shows that Assumption 5.14 is satisfied by $F = j'$. Therefore, Theorem 5.15 is applicable and yields the $\partial_C \Pi$-semismoothness of $\Pi : L^2(Q_c)^l \to L^2(Q_c)^l$.
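A minimal finite-dimensional sketch may help to fix ideas: for a box constraint the projection $P_C$ is pointwise clipping, and a semismooth Newton step uses an element of the generalized Jacobian of the clipped map. The toy reduced gradient below (of the form $j'(u) = \lambda u + G(u)$ with affine $G$, mirroring Theorem 8.11) is a hypothetical stand-in, not the flow-control gradient:

```python
import numpy as np

# Solve Pi(u) = u - P_[0,1](u - j'(u)/lam) = 0 by semismooth Newton.
rng = np.random.default_rng(0)
n, lam = 5, 1.0
Msym = rng.standard_normal((n, n))
B = 0.05 * (Msym + Msym.T)                # small symmetric part, ||B|| < lam
c = rng.standard_normal(n)
jprime = lambda u: lam * u + B @ u + c    # toy reduced gradient j'(u) = lam*u + G(u)
jhess = lambda u: lam * np.eye(n) + B     # its (constant) Hessian

u = np.zeros(n)
for _ in range(50):
    arg = u - jprime(u) / lam
    Pi = u - np.clip(arg, 0.0, 1.0)
    if np.linalg.norm(Pi) < 1e-12:
        break
    d = ((arg > 0.0) & (arg < 1.0)).astype(float)   # elementwise slope of clip
    # one element of the generalized Jacobian of Pi
    J = np.eye(n) - d[:, None] * (np.eye(n) - jhess(u) / lam)
    u = u + np.linalg.solve(J, -Pi)

# u now satisfies the first-order conditions u = P_C(u - j'(u)/lam)
assert np.allclose(u, np.clip(u - jprime(u) / lam, 0.0, 1.0))
```

For piecewise affine problems such as this one, the iteration coincides with a primal-dual active set strategy and typically terminates after a few steps.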

If we prefer to work with a reformulation by means of a different Lipschitz continuous and semismooth function $\pi$,

\[
\pi(x) = 0 \iff x_1 - P_C(x_1 - x_2) = 0,
\]

in the form

\[
\pi\big( u, j'(u) \big) = 0,
\]

we can use Theorem 5.11 to establish the semismoothness of the resulting operator as a mapping $L^p(Q_c)^l \to L^2(Q_c)^l$ for any $p \ge p'$. Therefore, our class of semismooth Newton methods is applicable to both reformulations.

We can also apply the sufficient condition for regularity of Theorem 4.8. Since this condition was established in the framework of NCPs, we consider now the case $U = L^2(Q_c)$ and $C = [0, \infty)$. Then, we immediately see that Theorem 8.11 provides everything to verify Assumption 4.6, provided that $j''(u)$ is coercive on the tangent space of the strongly active constraints as assumed in (e) and that the used NCP-function $\pi = \phi$ satisfies (f)–(h). The coercivity condition can be interpreted as a strong second order sufficient condition for optimality, see [46, 143].

We are currently working on a finite-element discretization of the flow control problem and hope to have numerical results available soon. In the implementation of the method we plan to use a preconditioned iterative method (GMRES, CG, etc.) for the solution of the semismooth Newton system. Hereby, depending on the particular problem, reduction techniques can be used to symmetrize the semismooth Newton system, which makes conjugate gradient methods applicable. The encouraging numerical results by Hinze and Kunisch [80] for second-order methods applied to the unconstrained control of the Navier–Stokes equations make us confident that the semismooth Newton system can be solved efficiently.

9. Optimal Control of the Compressible Navier–Stokes Equations

9.1 Introduction

In this chapter we apply our class of semismooth Newton methods to a boundary control problem governed by the time-dependent compressible Navier–Stokes equations. The underlying Navier–Stokes solver and the adjoint code for the computation of the reduced gradient were developed in joint work with Scott Collis (Rice University), Matthias Heinkenschloss (Rice University), Kaveh Ghayour (Rice University), and Stefan Ulbrich (TU München) as part of the Rice AeroAcoustic Control (RAAC) project, which was initiated and is directed by Scott Collis and Matthias Heinkenschloss. The major aim of this project is to put forward an optimal control framework for the control of aeroacoustic noise where the acoustic source is predicted by the unsteady, compressible Navier–Stokes equations. A particularly interesting application is the control of the sound arising from Blade–Vortex Interaction (BVI), which can occur for rotorcraft under certain flight conditions (e.g., during landing). Hereby, vortices shed by a preceding blade hit a subsequent blade, which results in a high amplitude, impulsive noise. This loud noise restricts civil rotorcraft use severely, and thus makes active noise control on the blade surface highly desirable. For more details we refer to [34, 35] and the references therein.

9.2 The Flow Control Problem

In the following, we will not consider noise control. Rather, we content ourselves with solving a model problem to investigate the viability of our approach for controlling the compressible Navier–Stokes equations. This model consists of two counter-rotating viscous vortices above an infinite wall which, due to the self-induced velocity field, propagate downward and interact with the wall. As control mechanism we use suction and blowing on part of the wall, i.e., we control the normal velocity of the fluid on this part of the wall. As computational domain we use the rectangle

\[
\Omega = (-L_1, L_1) \times (0, L_2).
\]

The wall is located at $x_2 \equiv 0$, whereas the left, right, and upper parts of the boundary are "transparent" in the sense that we pose nonreflecting boundary conditions there. $\Omega$ is occupied by a compressible fluid whose state is described by $y = (\rho, v_1, v_2, \theta)$ with density $\rho(t,x)$, velocities $v_i(t,x)$, $i = 1, 2$, and temperature $\theta(t,x)$. Hereby, $t \in I \stackrel{\mathrm{def}}{=} (0,T)$ is the time and $x = (x_1, x_2)$ denotes the spatial location. The state satisfies the

Compressible Navier–Stokes Equations (CNS):

\[
\partial_t F^0(y) + \sum_{i=1}^2 \partial_{x_i} F^i(y) = \sum_{i=1}^2 \partial_{x_i} G^i(y, \nabla y) \quad \text{on } I \times \Omega,
\]
\[
y(0,\cdot) = y_0 \quad \text{on } \Omega.
\]

Hereby, we have written CNS in conservative form. Boundary conditions are specified below. We have used the following notation:

\[
F^0(y) = \begin{pmatrix} \rho \\ \rho v_1 \\ \rho v_2 \\ \rho E \end{pmatrix}, \quad
F^1(y) = \begin{pmatrix} \rho v_1 \\ \rho v_1^2 + p \\ \rho v_1 v_2 \\ (\rho E + p) v_1 \end{pmatrix}, \quad
F^2(y) = \begin{pmatrix} \rho v_2 \\ \rho v_1 v_2 \\ \rho v_2^2 + p \\ (\rho E + p) v_2 \end{pmatrix},
\]
\[
G^i(y, \nabla y) = \frac{1}{Re} \begin{pmatrix} 0 \\ \tau_{1i} \\ \tau_{2i} \\ \tau_{1i} v_1 + \tau_{2i} v_2 + \dfrac{\kappa}{(\gamma - 1) M^2 Pr}\, \theta_{x_i} \end{pmatrix}.
\]

The pressure $p$, the total energy per unit mass $E$, and the stress tensor $\tau$ are given by

\[
p = \frac{\rho \theta}{\gamma M^2}, \qquad
E = \frac{\theta}{\gamma (\gamma - 1) M^2} + \frac12 (v_1^2 + v_2^2),
\]
\[
\tau_{ii} = 2\mu (v_i)_{x_i} + \lambda (\nabla\cdot v), \qquad
\tau_{12} = \tau_{21} = \mu \big( (v_1)_{x_2} + (v_2)_{x_1} \big).
\]

Here $\mu$ and $\lambda$ are the first and second coefficients of viscosity, $\kappa$ is the thermal conductivity, $M$ is the reference Mach number, $Pr$ is the reference Prandtl number, and $Re$ is the reference Reynolds number. The boundary conditions on the wall are

\[
\partial\theta/\partial n = 0, \quad v_1 = 0, \quad v_2 = u \quad \text{on } \Sigma_c = I \times (-L_1, L_1) \times \{0\},
\]

and on the rest of the boundary we pose nonreflecting boundary conditions that are derived from inviscid characteristic boundary conditions.

At the initial time $t = 0$ two counter-rotating viscous vortices are located in the center of $\Omega$. Without control ($v_2 = u \equiv 0$), the vortices move downward and interact with the wall, which causes them to bounce back, see Figure 9.1. Our aim is to perform control by suction and blowing on the wall in such a way that the terminal kinetic energy is minimized. To this end, we choose the objective function

\[
J(y,u) = \int_\Omega \Big[ \frac{\rho}{2} (v_1^2 + v_2^2) \Big]_{t=T}\,dx + \frac{\alpha}{2} \|u\|_{H^1(\Sigma_c)}^2.
\]

The first term is the kinetic energy at the final time t = T , whereasthe secondterm is anH1-regularizationwith respectto (t, x1). Here,we write α > 0 for the

9.3 Adjoint-BasedGradientComputation 185

regularizationparameterto avoid confusionwith thesecondcoefficientof viscosity.As controlspace,we chooseU = H1(I,H1

0 (−L1, L1)). We stressthat themathe-maticalexistenceanduniquenesstheoryfor thecompressibleNavier–Stokesequa-tions,see[81, 108, 111] for stateof theart references,seemsnot yet to becompleteenoughto admitarigorouscontroltheory. Therefore,ourchoiceof thecontrolspaceis guidedmoreby formal andheuristicargumentsthanby rigorouscontrol theory.If theH1-regularizationis omittedor replacedby anL2-regularization,thecontrolexhibits increasinglyheavy oscillationsin time andspaceduringthecourseof opti-mization,which indicatesthat theproblemis ill-posedwithout a sufficiently strongregularization.

In the RAAC project, we considered so far only the unconstrained flow control problem and worked with a nonlinear conjugate gradient method for its solution. In the following, we want to solve the same problem, but with the control subject to pointwise bound constraints. We then apply our inexact semismooth Newton methods and use BFGS-updates [41, 42] to approximate the Hessian of the reduced objective function. Therefore, in the following we restrict the control by pointwise bound constraints (with the realistic interpretation that we are only allowed to inject or draw off fluid with a certain maximum speed), and arrive at the following flow control problem:

\[
\begin{aligned}
\text{minimize } \ & J(y,u) \stackrel{\mathrm{def}}{=} \int_\Omega \Big[ \frac{\rho}{2} (v_1^2 + v_2^2) \Big]_{t=T}\,dx + \frac{\alpha}{2} \|u\|_{H^1(\Sigma_c)}^2 \\
\text{subject to } \ & y \ \text{solves CNS for the boundary conditions associated with } u, \\
& u_{\min} \le u \le u_{\max}.
\end{aligned} \tag{9.1}
\]

9.3 Adjoint-Based Gradient Computation

For our computations we use the following results that were obtained jointly with Scott Collis, Kaveh Ghayour, Matthias Heinkenschloss, and Stefan Ulbrich [34, 35]:

1. A Navier–Stokes solver, written in Fortran 90 by Scott Collis [36], was ported to the parallel computer SGI Origin 2000 and adjusted to the requirements of optimal control. For the space discretization, finite differences are used which are sixth order accurate in the interior of the domain. The time discretization is done by an explicit Runge–Kutta method. The code was parallelized on the basis of OpenMP.

2. Two different variants of adjoint-based gradient computation were considered:

(a) The first approach derives the adjoint Navier–Stokes equations including adjoint wall boundary conditions [35]. The derivation of adjoint boundary conditions for the nonreflecting boundary conditions turns out to be a delicate matter and is not yet completely done. Hence, in this approach we have used the (appropriately augmented) adjoint boundary conditions of the Euler equation. The gradient calculation then requires the solution of the Navier–Stokes equations, followed by the solution of the adjoint Navier–Stokes equations backward in time. Since the discretized adjoint equation is usually not the exact adjoint of the discrete state equation, this approach, which is usually called optimize, then discretize (OD), only yields inexact discrete gradients in general.

(b) In a second approach we have investigated the adjoint-based computation of gradients by applying the reverse mode of automatic differentiation (AD). Hereby, we used the AD software TAMC [59], a source-to-source compiler, which translates Fortran 90 routines to their corresponding adjoint Fortran 90 routines. This approach yields exact (up to roundoff errors) discrete gradients and is termed discretize, then optimize (DO).

For the computational results shown below, the DO method described in (b) was used. This approach has the advantage of providing exact discrete gradients, which is very favorable when doing optimization. In fact, descent methods based on inexact gradients require a control mechanism over the amount of inexactness, which is not a trivial task in OD based approaches. Secondly, the use of exact gradients is very helpful in verifying the correctness of the adjoint code, since potential errors can usually be found immediately by comparing directional derivatives with the corresponding finite difference quotients.
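The finite-difference verification just mentioned can be sketched in a few lines. Here a toy objective `j` and its hand-coded gradient `grad_j` are hypothetical stand-ins for the discrete objective and the adjoint/AD gradient routine:

```python
import numpy as np

def j(u):
    return 0.5 * np.dot(u, u) ** 2               # toy objective

def grad_j(u):
    return 2.0 * np.dot(u, u) * u                # its exact gradient

rng = np.random.default_rng(1)
u = rng.standard_normal(8)                       # point of evaluation
v = rng.standard_normal(8)                       # test direction
dd_grad = np.dot(grad_j(u), v)                   # directional derivative via gradient
h = 1e-6
dd_fd = (j(u + h * v) - j(u - h * v)) / (2 * h)  # central difference, O(h^2) accurate
assert abs(dd_fd - dd_grad) <= 1e-6 * max(1.0, abs(dd_grad))
```

If the gradient code contains an error, the two quantities disagree well beyond the discretization and roundoff level, which localizes the bug quickly.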

When working with the OD approach, which has the advantage that the source code of the CNS solver is not required, the discretizations of state equation, adjoint equation, and objective function have to be compatible (in a sense not discussed here, see, e.g., [34, 74]) to obtain gradients that are good approximations (i) of the infinite-dimensional gradients, and (ii) of the exact discrete gradients. Hereby, requirement (ii) is important for a successful solution of the discrete control problem, whereas (i) crucially influences the quality of the computed discrete optimal control, measured in terms of the infinite-dimensional control problem. This second issue also applies to the DO approach, but for DO it is only important to use compatible discretizations for state equation and objective function. With respect to this interesting topic, we have used [74] as a guideline, to which we refer for further reference.

9.4 Semismooth BFGS-Newton Method

The implementation of the semismooth Newton method uses BFGS approximations of the Hessian matrix. The resulting semismooth Newton systems have a structure similar to those arising in the step computation of the successful limited-memory BFGS method L-BFGS-B by Byrd, Lu, Nocedal, and Zhu [25, 148]. Hence, in our implementation we decided to follow the design of L-BFGS-B (the computations for this chapter were done before we developed our trust-region theory in Section 6).

9.4.1 Quasi-Newton BFGS-Approximations

In this section, we focus on the use of BFGS approximations in semismooth Newton methods for the discretized control problem. We stress, however, that convergence results for quasi-Newton methods in infinite-dimensional Hilbert spaces are available [64, 94, 131]. Using a similar notation as in Chapter 7, the semismooth Newton system for the discrete control problem assumes the form (written in coordinates in the discrete $L^2$-space)

\[
\big( [D_1^h]_k + [D_2^h]_k H_k^h \big) s_k^h = -\Phi^h(u_k^h)
\]

with $H_k^h = j^h{}''(u_k^h)$ and diagonal matrices $[D_i^h]_k$, $\big| \big( [D_1^h]_k + [D_2^h]_k \big)_{jj} \big| \ge \kappa$.

For the approximation of the Hessian $H_k^h$ we work with

Limited-Memory BFGS-Matrices ($l \approx 10$):

$$B_k^h = B_0^h - W_k^h Z_k^h (W_k^h)^T \in \mathbb{R}^{n_h \times n_h}, \qquad W_k^h \in \mathbb{R}^{n_h \times 2l}, \quad Z_k^h \in \mathbb{R}^{2l \times 2l},$$

where we have used the compact representation of [26], to which we refer for details. The matrix $B_0^h$ is the initial BFGS-matrix and should be chosen such that (a) the product $(B_0^h)^{-1} v^h$ can be computed reasonably efficiently, since this is needed in the BFGS-updates, and (b) the inner product induced by $B_0^h$ approximates the original infinite-dimensional inner product on U sufficiently well. In the case of our flow control problem, we have $U = H^1(I, H_0^1(-L_1, L_1))$, and use a finite difference approximation of the underlying Laplace operator to obtain $B_0^h$. Compared with the state and adjoint solves, the solution of the 2-D Helmholtz equation required to compute $(B_0^h)^{-1} v^h$ is negligible. The inverse of $M_k^h = [D_1^h]_k + [D_2^h]_k B_k^h$ can be computed by the Sherman–Morrison–Woodbury formula:

$$(M_k^h)^{-1} = C_k^h + C_k^h [D_2^h]_k W_k^h \left(I - Z_k^h (W_k^h)^T C_k^h [D_2^h]_k W_k^h\right)^{-1} Z_k^h (W_k^h)^T C_k^h,$$

where $C_k^h = \left([D_1^h]_k + [D_2^h]_k B_0^h\right)^{-1}$.
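For illustration, the application of $(M_k^h)^{-1}$ via this formula can be sketched in a few lines of NumPy. This is a hypothetical sketch, not the thesis implementation: the names `D1`, `D2`, `B0_diag`, `W`, `Z` stand in for $[D_1^h]_k$, $[D_2^h]_k$, $B_0^h$, $W_k^h$, and $Z_k^h$, and $B_0^h$ is assumed diagonal here so that $C_k^h$ is cheap to form:

```python
import numpy as np

def apply_M_inv(D1, D2, B0_diag, W, Z, v):
    """Apply (D1 + D2*(B0 - W Z W^T))^{-1} to v via the
    Sherman-Morrison-Woodbury formula; only a small 2l x 2l solve is needed.
    D1, D2, B0_diag are 1-D arrays (diagonal matrices), W is n x 2l."""
    C = 1.0 / (D1 + D2 * B0_diag)            # C = (D1 + D2*B0)^{-1}, diagonal
    CW = C[:, None] * (D2[:, None] * W)      # C * D2 * W,  shape n x 2l
    S = np.eye(W.shape[1]) - Z @ (W.T @ CW)  # I - Z W^T C D2 W,  2l x 2l
    Cv = C * v
    return Cv + CW @ np.linalg.solve(S, Z @ (W.T @ Cv))
```

Since only diagonal scalings, products with $W_k^h$, and one $2l \times 2l$ solve appear, the cost per application grows only linearly in $n_h$.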

9.4.2 The Algorithm

We now give a sketch of the algorithm:

1. The Hessian matrix of the discrete objective function is approximated by Limited-Memory BFGS-matrices. Hereby, we choose $B_0^h$ such that it represents a finite difference approximation of the inner product on U.

2. The globalization is similar to that in the well-accepted L-BFGS-B method of Byrd, Lu, Nocedal and Zhu [25, 148]:

i. At the current point $u_k^h \in B^h$, the objective function $j^h$ is approximated by a quadratic model $q_k^h$.

ii. Starting from $u_k^h$, a generalized Cauchy point $u_k^{h,c} \in B^h$ is computed by an Armijo-type line search for $q_k^h$ along the projected gradient path $P_{B^h}(u_k^h - t\,(j^h)'(u_k^h))$, $t \ge 0$.

iii. The semismooth Newton method is used to compute a Newton point $u_k^{h,n}$.


iv. By approximate minimization of $q_k^h$ along the projected path $P_{B^h}(u_k^{h,c} + t(u_k^{h,n} - u_k^{h,c}))$, $t \in [0, 1]$, the point $u_k^{h,q}$ is computed.

v. The new iterate $u_{k+1}^h$ is obtained by approximate minimization of $j^h$ on the line segment $[u_k^h, u_k^{h,q}]$, using the algorithm by Moré–Thuente [114].
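A minimal sketch of the Armijo-type search in step ii (a hypothetical helper, not the thesis code; the box $B^h$ is assumed given by bound vectors `lo`, `hi`, and the quadratic model is passed as a callable `q`):

```python
import numpy as np

def generalized_cauchy_point(u, grad, q, lo, hi,
                             t0=1.0, beta=0.5, sigma=1e-4, max_iter=30):
    """Armijo-type backtracking for the model q along the projected
    gradient path P_B(u - t*grad), where B = {lo <= u <= hi}."""
    q0 = q(u)
    t = t0
    for _ in range(max_iter):
        u_t = np.clip(u - t * grad, lo, hi)  # projection onto the box B
        # Armijo-type sufficient decrease along the projected path
        if q(u_t) <= q0 + sigma * grad @ (u_t - u):
            return u_t
        t *= beta
    return u_t
```

The backtracking factor `beta` and the sufficient-decrease constant `sigma` are generic choices; any variant that guarantees sufficient model decrease along the projected path fits the step above.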

Remark 9.1. We should mention that we have not developed a convergence theory for the above algorithm. We also point out that the control problem under consideration does not fit directly in the framework under which we analyzed semismooth Newton methods. In particular, the problem is not posed in Lp. Nevertheless, we think that the developed theory is encouraging enough to try to apply the method also to problems for which a complete theory is not (yet) available.

9.5 Numerical Results

We now present numerical results for the described semismooth BFGS-Newton method when applied to the flow control problem (9.1). Here are the main facts about problem and implementation:

• The space discretization is done by a high order finite difference method on a 128 × 96 Cartesian mesh.

• For the time discretization the standard 4-stage Runge–Kutta method is used, with 600 time steps and T = 24. This allows parallelization within each time step.

• We compute exact discrete gradients by solving the adjoint of the discrete state equation, which is obtained by the reverse mode of automatic differentiation using TAMC [59].

• As optimization method, we use the semismooth BFGS-Newton method described above.

• Parameters: Re = 50, Pr = 1, M = 0.5, γ = 1.4; regularization parameter α = 0.005; bounds umin = −0.2, umax = 0.2.

• As NCP-function we use a variant of the penalized Fischer–Burmeister function [28].

• The resulting problem has over 75,000 control variables and over 29,000,000 state variables and thus is very large scale.

• The computations were performed on an SGI Origin 2000 with 16 R12000 processors and 10 GB memory. We used four processors.

Figure 9.1 displays the state (the density ρ is shown) of the uncontrolled system (v2|Σc = u ≡ 0). We see that the vortices hit the wall and bounce back. The terminal state, at which we evaluate the kinetic energy, is shown in the last, magnified picture. The resulting terminal kinetic energy in the no-control case is


No control (v2|Σc = u ≡ 0): Ekin|t=T = J(y(0), 0) = 7.9.

Figure 9.2 shows the state (represented by the density ρ) when optimal control is applied. Hereby, the optimal control was obtained by 100 iterations of the BFGS-Newton method. The resulting terminal kinetic energy in the optimal control case and the objective function value (Ekin|t=T + regularization), respectively, are

Optimal control (v2|Σc = u∗): Ekin|t=T = 0.059, J(y(u∗), u∗) = 0.085,

where u∗ denotes the computed optimal control, which is displayed in Figure 9.3. It can be seen in Figure 9.3 that the lower bound becomes active. In fact, the upper bound also is active at a few points, but this is not apparent from the picture. By applying optimal control the vortices are successfully absorbed. If we had displayed the kinetic energy instead of the density, the vortices would be almost invisible at the terminal time in the optimal control case, since the optimal control reduces the terminal kinetic energy to less than one hundredth of its value without control. In comparison with our computational experience for the unconstrained control problem, the semismooth Newton method performs comparably efficiently. This shows the efficiency of semismooth Newton methods for the solution of very large scale problems.

Figure 9.3 Computed optimal control u∗ (axes: x1 ∈ [−15, 15], t ∈ [0, 24], u ∈ [−0.2, 0.2]).

A. Appendix

A.1 Adjoint Approach for Optimal Control Problems

In this appendix we describe the adjoint approach for the computation of gradient and Hessian of the reduced objective function. Hereby, we consider the abstract optimal control problem

$$\min_{y \in Y,\, u \in U} \; J(y, u) \quad \text{subject to} \quad E(y, u) = 0, \quad u \in U_{ad} \tag{A.1}$$

with feasible set Uad ⊂ U, objective function J : Y × U → R, and state equation operator E : Y × U → W∗.

The control space U and the state space Y are Banach spaces, and W∗ is the dual of a reflexive Banach space W. We assume the existence of a neighborhood V of Uad such that, for all u ∈ V, the state equation E(y, u) = 0 possesses a unique solution y = y(u). Then the control problem (A.1) is equivalent to the reduced control problem

minimize j(u) subject to u ∈ Uad, (A.2)

where j : U ⊃ V → R, j(u) = J(y(u), u) is the reduced objective function.

A.1.1 Adjoint Representation of the Reduced Gradient

We now describe the adjoint approach for the computation of j′(u). To this end, we assume that J and E are Fréchet differentiable near (y(u), u) and that u ↦ y(u) is Fréchet differentiable near u. According to the implicit function theorem, the latter holds, e.g., if E is continuously differentiable near (y(u), u) and if the partial derivative Ey(y(u), u) is continuously invertible. Under the given hypotheses the function j is differentiable near u.

We introduce a Lagrange multiplier w ∈ W for the state equation in (A.1) and define the Lagrange function L : Y × V × W → R,

$$L(y, u, w) = J(y, u) + \langle E(y, u), w \rangle_{W^*, W}.$$


Since E(y(u), u) = 0 on V, we have

L(y(u), u, w) = J(y(u), u) = j(u) ∀ u ∈ V, w ∈ W.

Hence,

j′(u) = yu(u)∗Ly(y(u), u, w) + Lu(y(u), u, w) ∀ u ∈ V, w ∈ W. (A.3)

The idea now is to choose w ∈ W such that

Ly(y(u), u,w) = 0.

This equation is called the adjoint equation and its solution w = w(u) ∈ W is the adjoint state. Thus, written in detail, the adjoint state w = w(u) is the solution of the adjoint equation

Jy(y(u), u) + Ey(y(u), u)∗w = 0.

If we assume that Ey(y(u), u) is continuously invertible, the adjoint state w is uniquely determined. For w = w(u) we obtain

j′(u) = yu(u)∗Ly(y(u), u, w(u)) + Lu(y(u), u, w(u)) = Lu(y(u), u, w(u)) = Ju(y(u), u) + Eu(y(u), u)∗w(u).

The identity

$$j'(u) = J_u(y(u), u) + E_u(y(u), u)^* w(u)$$

is called the adjoint representation of the reduced gradient j′(u). Therefore, the derivative j′(u) can be computed as follows:

1. Compute the state y = y(u) ∈ Y by solving the state equation

E(y, u) = 0.

2. Compute the adjoint state w = w(u) ∈ W by solving the adjoint equation

Ey(y, u)∗w = −Jy(y, u).

3. Compute j′(u) = Ju(y, u) + Eu(y, u)∗w.
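For a finite-dimensional linear-quadratic model problem with state equation E(y, u) = Ay − Bu and J(y, u) = ½‖y − y_d‖² + (α/2)‖u‖² (all names here are hypothetical illustration, not the abstract Banach-space setting), the three steps read:

```python
import numpy as np

def reduced_gradient(A, B, yd, alpha, u):
    """Adjoint gradient of j(u) = 0.5*||y(u)-yd||^2 + 0.5*alpha*||u||^2
    for the model state equation E(y, u) = A y - B u = 0."""
    y = np.linalg.solve(A, B @ u)        # 1. state equation:  A y = B u
    w = np.linalg.solve(A.T, -(y - yd))  # 2. adjoint:  E_y^* w = -J_y
    return alpha * u - B.T @ w           # 3. j'(u) = J_u + E_u^* w
```

Note that one state solve and one adjoint solve yield the complete gradient, independently of the dimension of u.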

Remark A.1. If the state equation is an initial value problem, then the adjoint equation is reverse in time. For the derivation of adjoint equations for various types of control problems governed by PDEs, we refer to Lions [106].

A.1.2 Adjoint Representation of the Reduced Hessian

The adjoint approach can be continued to obtain adjoint formulas for the Hessian operator j′′(u). To this end, we assume that J and E are twice differentiable near (y(u), u) and that u ↦ y(u) is twice differentiable near u. By (A.3), we have, for all w ∈ W and all v1, v2 ∈ U, writing y = y(u),

j′′(u)(v1, v2) = Lyu(y, u, w)(yu(u)v1, v2) + Lyy(y, u, w)(yu(u)v1, yu(u)v2) + 〈Ly(y, u, w), yuu(u)(v1, v2)〉Y∗,Y + Luy(y, u, w)(v1, yu(u)v2) + Luu(y, u, w)(v1, v2).

If we choose w = w(u), then Ly(y(u), u, w) = 0, and thus

j′′(u) = T(u)∗ L′′(y,u)(y(u), u, w(u)) T(u), (A.4)

where L′′(y,u) denotes the second partial derivative with respect to (y, u), and

$$T(u) = \begin{pmatrix} y_u(u) \\ I_U \end{pmatrix} = \begin{pmatrix} -E_y(y(u), u)^{-1} E_u(y(u), u) \\ I_U \end{pmatrix}.$$

Hereby, in the second expression for T(u) we assume that Ey(y(u), u) is continuously invertible and use that, since E(y(·), ·) ≡ 0, there holds

$$E_y(y(u), u)\, y_u(u) + E_u(y(u), u) = 0.$$

Remark A.2. It is interesting to note that in the case where Ey(y(u), u) is continuously invertible, the mapping T(u) is a continuous linear homeomorphism from U to the null space of E′(y(u), u). In fact, it is obvious that E′(y(u), u)T(u) = 0. Conversely, if Ey(y(u), u)h + Eu(y(u), u)v = 0, then

$$h = -E_y(y(u), u)^{-1} E_u(y(u), u)\, v, \quad \text{and thus} \quad \begin{pmatrix} h \\ v \end{pmatrix} = T(u)v.$$

Therefore, j′′(u) is the restriction of the Hessian L′′(y,u)(y(u), u, w(u)) of the Lagrangian to the null space of E′(y(u), u), parameterized by v ∈ U ↦ T(u)v.

Usually, the formula (A.4) is not used to compute the complete Hessian operator. Rather, it is used to compute directional derivatives j′′(u)v of j′. Here is the required procedure:

1. Compute the state y = y(u) ∈ Y by solving the state equation E(y, u) = 0.

2. Compute the adjoint state w = w(u) ∈ W by solving the adjoint equation Ey(y, u)∗w = −Jy(y, u).

3. Compute z = z(u) ∈ Y as solution of the linearized state equation Ey(y, u)z = −Eu(y, u)v.

4. Compute h = h(u) ∈ W by solving the adjoint system Ey(y, u)∗h = −Lyy(y, u, w)z − Lyu(y, u, w)v.

5. Set j′′(u)v := Eu(y, u)∗h + Luy(y, u, w)z + Luu(y, u, w)v.
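Continuing the linear-quadratic model problem sketched in A.1.1 (E(y, u) = Ay − Bu, J(y, u) = ½‖y − y_d‖² + (α/2)‖u‖²; hypothetical names, for illustration only), steps 3–5 give the Hessian-vector product. For this particular problem L_yy = I, L_yu = L_uy = 0, and L_uu = αI, so the product does not depend on y and w and steps 1–2 drop out:

```python
import numpy as np

def reduced_hessian_vec(A, B, alpha, v):
    """Hessian-vector product j''(u)v for the linear-quadratic model problem.
    Steps 1-2 (state and adjoint solves) are not needed here because
    L_yy = I and L_yu = L_uy = 0 are constant."""
    z = np.linalg.solve(A, B @ v)  # 3. linearized state: E_y z = -E_u v, i.e. A z = B v
    h = np.linalg.solve(A.T, -z)   # 4. second adjoint:  E_y^* h = -L_yy z - L_yu v = -z
    return -B.T @ h + alpha * v    # 5. j''(u)v = E_u^* h + L_uy z + L_uu v
```

Each product thus costs one linearized state solve and one adjoint solve, in line with the five-step procedure above.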


A.2 Several Inequalities

For convenience, we recall several well-known inequalities, which are frequently used throughout this work.

Lemma A.3 (Hölder's inequality). Let pi ∈ [1,∞], i = 1, . . . , n, and p ∈ [1,∞] satisfy

$$\frac{1}{p_1} + \cdots + \frac{1}{p_n} = \frac{1}{p}.$$

Then, for all fi ∈ Lpi(Ω), there holds f = f1 f2 · · · fn ∈ Lp(Ω) and

$$\|f\|_{L^p} \le \|f_1\|_{L^{p_1}} \cdots \|f_n\|_{L^{p_n}}.$$

The following estimate is frequently used in chapter 3. It follows immediately from Hölder's inequality.

Lemma A.4. Let Ω be bounded, 1 ≤ p ≤ q ≤ ∞, and

$$c_{p,q}(\Omega) \stackrel{\text{def}}{=} \mu(\Omega)^{\frac{q-p}{pq}} \ \text{if } p < q < \infty, \qquad c_{p,\infty}(\Omega) \stackrel{\text{def}}{=} \mu(\Omega)^{1/p} \ \text{if } p < \infty, \qquad c_{p,q}(\Omega) \stackrel{\text{def}}{=} 1 \ \text{if } p = q.$$

Then for all v ∈ Lq(Ω) there holds

$$\|v\|_{L^p} \le c_{p,q}(\Omega)\, \|v\|_{L^q}.$$

Lemma A.5 (Young's inequality). Let a, b ≥ 0, η > 0, and p, q ∈ (1,∞) with 1/p + 1/q = 1 be given. Then, setting 0^p = 0, there holds

$$ab \le \frac{\eta}{p}\, a^p + \frac{\eta^{-q/p}}{q}\, b^q.$$

A.3 Elementary Properties of Multifunctions

A multifunction Γ : X ⊃ V ⇒ Y between Banach spaces X and Y assigns to every x ∈ V a subset Γ(x) ⊂ Y of Y, which can be empty. Γ is called closed-valued (compact-valued, nonempty-valued, etc.) if for all x ∈ V the image set Γ(x) is closed (compact, nonempty, etc.).

Definition A.6. [32, 129] A multifunction Γ : V ⇒ Rl defined on V ⊂ Rk is upper semicontinuous at x ∈ V if for all ε > 0 there exists δ > 0 such that

$$\Gamma(x') \subset \{ z + h : z \in \Gamma(x),\ \|h\| < \varepsilon \} \quad \text{for all } x' \in V,\ \|x' - x\| < \delta.$$


Definition A.7. [32, 129] A multifunction Γ : V ⇒ Rl defined on the measurable set V ⊂ Rk is called measurable [129, p. 160] if it is closed-valued and if for all closed (or open, or compact, see [129, Prop. 1A]) sets C ⊂ Rl the preimage

$$\Gamma^{-1}(C) = \{ x \in V : \Gamma(x) \cap C \neq \emptyset \}$$

is measurable.

The following theorem is important:

Theorem A.8 (Measurable Selection). [32, Thm. 3.1.1] Let Γ : V ⊂ Rk ⇒ Rl be measurable and nonempty-valued. Then there exists a measurable function γ : V → Rl such that γ(x) ∈ Γ(x) for all x ∈ V.

Further results on set-valued analysis can be found in [11, 32, 129].

A.4 Nemytskij Operators

In this appendix we establish several results on superposition (or Nemytskij) operators involving differentiable outer functions. These results are used in the proof of the continuous differentiability of the merit function u ↦ ‖Φ(u)‖²L²/2 in section 6 as well as in the analysis of the nonlinear elliptic control problem in section 7.1. Concerning Nemytskij operators, we also refer to [8, 9, 10].

Proposition A.9. Let Ω ⊂ Rn be measurable with finite measure and 1 ≤ p, q < ∞. Let f : Rm → R be continuous and consider F(u)(x) = f(u(x)) for u ∈ Lp(Ω)m. Assume that

$$|f(u)| \le c_1 + c_2 \|u\|_2^{p/q} \quad \forall\, u \in \mathbb{R}^m \tag{A.5}$$

with constants ci ≥ 0. Then F : Lp(Ω)m → Lq(Ω) is continuous and bounded with

$$\|F(u)\|_{L^q} \le C_1 + C_2 \|u\|_{[L^p]^m}^{p/q}$$

with constants Ci ≥ 0.

Proof. See [147, Prop. 26.6]. □

Proposition A.10. Let Ω ⊂ Rn be measurable with finite measure and 1 ≤ q < p < ∞. Let f : Rm → R be continuously differentiable and consider F(u)(x) = f(u(x)) for u ∈ Lp(Ω)m. Assume that

$$\|f'(u)\|_2 \le c_1' + c_2' \|u\|_2^{\frac{p-q}{q}} \quad \forall\, u \in \mathbb{R}^m \tag{A.6}$$

with constants c′i ≥ 0. Then F : Lp(Ω)m → Lq(Ω) is continuously Fréchet differentiable with F′(u)v = f′(u)v.


Proof. We have

$$|f(u)| \le |f(0)| + \int_0^1 |f'(tu)u|\,dt \le |f(0)| + \|u\|_2 \int_0^1 \left(c_1' + c_2' \|tu\|_2^{\frac{p-q}{q}}\right) dt
\le |f(0)| + c_1' \|u\|_2 + \frac{c_2' q}{p} \|u\|_2^{\frac{p}{q}} \le c_1 + c_2 \|u\|_2^{\frac{p}{q}}$$

with constants ci ≥ 0. Hence, by Proposition A.9, F : Lp → Lq is continuous. Further, with r = pq/(p − q) there holds

$$\frac{p}{r} = \frac{p - q}{q},$$

so that u ∈ Lp(Ω)m ↦ $f_{u_i}(u)$ ∈ Lr(Ω) is continuous by Proposition A.9. Hence,

$$\|f'(u)v\|_{L^q} \le C \|f'(u)\|_{[L^r]^m} \|v\|_{[L^p]^m},$$

showing that M(u) : v ∈ [Lp]m ↦ f′(u)v ∈ Lq satisfies M(u) ∈ L([Lp]m, Lq). The estimate

$$\|f'(u_1)v - f'(u_2)v\|_{L^q} \le C \|f'(u_1) - f'(u_2)\|_{[L^r]^m} \|v\|_{[L^p]^m}$$

proves that M : [Lp]m → L([Lp]m, Lq) is continuous. Further,

$$\begin{aligned}
\|F(u+v) - F(u) - M(u)v\|_{L^q} &= \|f(u+v) - f(u) - f'(u)v\|_{L^q} \\
&= \left\| \int_0^1 [f'(u+tv) - f'(u)]\,v\,dt \right\|_{L^q} \\
&\le \int_0^1 \|[f'(u+tv) - f'(u)]\,v\|_{L^q}\,dt \\
&\le \int_0^1 \|f'(u+tv) - f'(u)\|_{[L^r]^m} \|v\|_{[L^p]^m}\,dt \\
&= o(\|v\|_{[L^p]^m}) \quad \text{as } \|v\|_{[L^p]^m} \to 0,
\end{aligned}$$

so that F is continuously Fréchet differentiable with F′ = M. □

Proposition A.11. Let Ω ⊂ Rn be measurable with finite measure and 1 ≤ p, q < ∞, p > 2q. Let f : R → R be twice continuously differentiable and consider F(u)(x) = f(u(x)) for u ∈ Lp(Ω). Assume that

$$|f''(u)| \le c_1'' + c_2'' |u|^{\frac{p-2q}{q}} \tag{A.7}$$

with constants c′′i ≥ 0. Then F : Lp(Ω) → Lq(Ω) is twice continuously Fréchet differentiable with

$$F'(u)v = f'(u)v, \qquad F''(u)(v, w) = f''(u)vw. \tag{A.8}$$

Proof. As in the proof of Proposition A.10 we obtain constants c′i ≥ 0 with

$$|f'(u)| \le c_1' + c_2' |u|^{\frac{p-q}{q}}.$$


Hence, by Proposition A.10, F : Lp → Lq is continuously differentiable with derivative F′(u)v = f′(u)v. Now consider g(u) = f′(u). From (A.7) and Proposition A.10 we obtain that for r = pq/(p − q) > q the operator

$$G : L^p(\Omega) \to L^r(\Omega), \qquad G(u)(x) = g(u(x)) = f'(u(x)),$$

is continuously differentiable with derivative G′(u)v = g′(u)v = f′′(u)v. Now, define the operator b(u; v, w) = f′′(u)vw. Then

$$\|b(u; v, w)\|_{L^q} \le \|f''(u)v\|_{L^r} \|w\|_{L^p} \le \|G'(u)\|_{L^p, L^r} \|v\|_{L^p} \|w\|_{L^p}.$$

Therefore, b(u; ·, ·) is a continuous bilinear operator Lp × Lp → Lq that depends continuously on u ∈ Lp. Further,

$$\begin{aligned}
\|F'(u+w)v - F'(u)v - b(u; v, w)\|_{L^q} &= \|f'(u+w)v - f'(u)v - f''(u)vw\|_{L^q} \\
&\le \|f'(u+w) - f'(u) - f''(u)w\|_{L^r} \|v\|_{L^p} \\
&= \|G(u+w) - G(u) - G'(u)w\|_{L^r} \|v\|_{L^p} \\
&= o(\|w\|_{L^p}) \|v\|_{L^p} \quad \text{as } \|v\|_{L^p}, \|w\|_{L^p} \to 0.
\end{aligned}$$

This proves that F : Lp → Lq is twice continuously differentiable with derivatives as in (A.8). □

Notations

General Notations

‖ · ‖Y  Norm of the Banach space Y.

(·, ·)Y  Inner product of the Hilbert space Y.

Y∗  Dual space of the Banach space Y.

〈·, ·〉Y∗,Y  Dual pairing of the Banach space Y and its dual space Y∗.

〈·, ·〉  Dual pairing 〈u, v〉 = ∫Ω u(ω)v(ω) dω.

L(X,Y)  Space of bounded linear operators M : X → Y from the Banach space X to the Banach space Y, equipped with the norm ‖ · ‖X,Y.

‖ · ‖X,Y  Strong operator norm on L(X,Y), i.e., ‖M‖X,Y = sup{‖Mx‖Y : x ∈ X, ‖x‖X = 1}.

M∗  Adjoint operator of M ∈ L(X,Y), i.e., M∗ ∈ L(Y∗, X∗) and 〈Mx, y′〉Y,Y∗ = 〈x, M∗y′〉X,X∗ for all x ∈ X, y′ ∈ Y∗.

BY  Open unit ball about 0 in the Banach space Y.

B̄Y  Closed unit ball about 0 in the Banach space Y.

Bnp  Open unit ball about 0 in (Rn, ‖ · ‖p).

B̄np  Closed unit ball about 0 in (Rn, ‖ · ‖p).

∂Ω  Boundary of the domain Ω.

cl M  Topological closure of the set M.

co M  Convex hull of the set M.

c̄o M  Closed convex hull of the set M.

µ  Lebesgue measure.

1Ω′  Characteristic function of a measurable set Ω′ ⊂ Ω, taking the value one on Ω′ and zero on its complement Ω \ Ω′.


Derivatives

F′  Fréchet derivative of the operator F : X → Y, i.e., F′(x) ∈ L(X,Y) and ‖F(x + s) − F(x) − F′(x)s‖Y = o(‖s‖X) as ‖s‖X → 0.

Fx  Partial Fréchet derivative of the operator F : X × Y → Z with respect to x ∈ X.

F′′  Second Fréchet derivative.

Fxy  Second partial Fréchet derivative.

∂Bf  B-differential of the locally Lipschitz function f : Rn → Rm.

∂f  Clarke's generalized Jacobian of the locally Lipschitz continuous function f : Rn → Rm.

∂Cf  Qi's C-subdifferential of the locally Lipschitz function f : Rn → Rm.

∂∗f  Generalized differential of an operator f : X → Y, see section 3.2.

∂Ψ  Generalized differential of a superposition operator Ψ(u) = ψ(G(u)), see section 3.3.

Function Spaces

Lp(Ω)  p ∈ [1,∞); Banach space of equivalence classes of Lebesgue measurable functions u : Ω → R such that $\|u\|_{L^p} \stackrel{\text{def}}{=} \big(\int_\Omega |u(x)|^p\,dx\big)^{1/p} < \infty$. L2(Ω) is a Hilbert space with inner product $(u, v)_{L^2} = \int_\Omega u(x)v(x)\,dx$.

L∞(Ω)  Banach space of equivalence classes of Lebesgue measurable functions u : Ω → R that are essentially bounded on Ω, i.e., $\|u\|_{L^\infty} \stackrel{\text{def}}{=} \operatorname{ess\,sup}_{x \in \Omega} |u(x)| < \infty$.

C∞0(Ω)  Space of infinitely differentiable functions u : Ω → R, Ω ⊂ Rn open, with compact support cl{x : u(x) ≠ 0} ⊂ Ω.

Hk,p(Ω)  k ≥ 0, p ∈ [1,∞]; Sobolev space of functions u ∈ Lp(Ω), Ω ⊂ Rn open, such that Dαu ∈ Lp(Ω) for all weak derivatives up to order k, i.e., for all |α| ≤ k. Hereby $D^\alpha = \frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}} \cdots \frac{\partial^{\alpha_n}}{\partial x_n^{\alpha_n}}$ and |α| = α1 + · · · + αn. Hk,p(Ω) is a Banach space with norm $\|u\|_{H^{k,p}} = \big(\sum_{|\alpha| \le k} \|D^\alpha u\|_{L^p}^p\big)^{1/p}$, and similarly for p = ∞.

Hk(Ω)  k ≥ 0; short notation for the Hilbert space Hk,2(Ω).

Hk0(Ω)  k ≥ 1; closure of C∞0(Ω) in Hk(Ω).

H−k(Ω)  k ≥ 1; dual space of Hk0(Ω) with respect to the distributional dual pairing.

Several vector-valued function spaces are introduced in section 8.2.

References

[1] F. Abergel and R. Temam, On some control problems in fluid mechanics, Theor. Comput. Fluid Dyn., 1 (1986), pp. 303–326.

[2] W. Alt, The Lagrange-Newton method for infinite-dimensional optimization problems, Numer. Funct. Anal. Optim., 11 (1990), pp. 201–224.

[3] ——, Parametric optimization with applications to optimal control and sequential quadratic programming, Bayreuth. Math. Schr., (1991), pp. 1–37.

[4] ——, Sequential quadratic programming in Banach spaces, in Advances in Optimization (Lambrecht, 1991), Springer, Berlin, 1992, pp. 281–301.

[5] W. Alt and K. Malanowski, The Lagrange-Newton method for nonlinear optimal control problems, Comput. Optim. Appl., 2 (1993), pp. 77–100.

[6] W. Alt, R. Sontag, and F. Tröltzsch, An SQP method for optimal control of weakly singular Hammerstein integral equations, Appl. Math. Optim., 33 (1996), pp. 227–252.

[7] H. Amann, Compact embeddings of vector-valued Sobolev and Besov spaces, Glas. Mat. Ser. III, 35(55) (2000), pp. 161–177.

[8] J. Appell, Upper estimates for superposition operators and some applications, Ann. Acad. Sci. Fenn. Ser. A I Math., 8 (1983), pp. 149–159.

[9] ——, The superposition operator in function spaces—a survey, Exposition. Math., 6 (1988), pp. 209–270.

[10] J. Appell and P. P. Zabrejko, Nonlinear superposition operators, Cambridge University Press, Cambridge, 1990.

[11] J.-P. Aubin and H. Frankowska, Set-valued analysis, Birkhäuser Boston Inc., Boston, MA, 1990.

[12] C. Baiocchi and A. Capelo, Variational and quasivariational inequalities, John Wiley & Sons Inc., New York, 1984.

[13] A. Bensoussan and J.-L. Lions, Impulse control and quasivariational inequalities, Gauthier-Villars, Montrouge, 1984.

[14] M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch, A comparison of a Moreau–Yosida-based active set strategy and interior point methods for constrained optimal control problems, SIAM J. Optim., 11 (2000), pp. 495–521.

[15] M. Bergounioux, K. Ito, and K. Kunisch, Primal-dual strategy for constrained optimal control problems, SIAM J. Control Optim., 37 (1999), pp. 1176–1194.

[16] T. Bewley, R. Temam, and M. Ziane, Existence and uniqueness of optimal control to the Navier-Stokes equations, C. R. Acad. Sci. Paris Sér. I Math., 330 (2000), pp. 1007–1011.

[17] T. R. Bewley, R. Temam, and M. Ziane, A general framework for robust control in fluid mechanics, Phys. D, 138 (2000), pp. 360–392.

[18] S. C. Billups, Algorithms for complementarity problems and generalized equations, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1995.


[19] J. F. Bonnans and C. Pola, A trust region interior point algorithm for linearly constrained optimization, SIAM J. Optim., 7 (1997), pp. 717–731.

[20] J. M. Borwein and Q. J. Zhu, A survey of subdifferential calculus with applications, Nonlinear Anal., 38 (1999), pp. 687–773.

[21] A. Brandt and C. W. Cryer, Multigrid algorithms for the solution of linear complementarity problems arising from free boundary problems, SIAM J. Sci. Statist. Comput., 4 (1983), pp. 655–684.

[22] H. Brezis, Problèmes unilatéraux, J. Math. Pures Appl. (9), 51 (1972), pp. 1–168.

[23] W. L. Briggs, V. E. Henson, and S. F. McCormick, A multigrid tutorial, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second ed., 2000.

[24] J. Burger and M. Pogu, Functional and numerical solution of a control problem originating from heat transfer, J. Optim. Theory Appl., 68 (1991), pp. 49–73.

[25] R. H. Byrd, P. Lu, J. Nocedal, and C. Y. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput., 16 (1995), pp. 1190–1208.

[26] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Math. Programming, 63 (1994), pp. 129–156.

[27] P. H. Calamai and J. J. Moré, Projected gradient methods for linearly constrained problems, Math. Programming, 39 (1987), pp. 93–116.

[28] B. Chen, X. Chen, and C. Kanzow, A penalized Fischer-Burmeister NCP-function, Math. Program., 88 (2000), pp. 211–216.

[29] B. Chen and N. Xiu, A global linear and local quadratic noninterior continuation method for nonlinear complementarity problems based on Chen-Mangasarian smoothing functions, SIAM J. Optim., 9 (1999), pp. 605–623.

[30] X. Chen, Z. Nashed, and L. Qi, Smoothing methods and semismooth methods for nondifferentiable operator equations, SIAM J. Numer. Anal., 38 (2000), pp. 1200–1216.

[31] X. Chen, L. Qi, and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational inequalities, Math. Comp., 67 (1998), pp. 519–540.

[32] F. H. Clarke, Optimization and nonsmooth analysis, John Wiley & Sons Inc., New York, 1983.

[33] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski, Nonsmooth analysis and control theory, Springer-Verlag, New York, 1998.

[34] S. S. Collis, K. Ghayour, M. Heinkenschloss, M. Ulbrich, and S. Ulbrich, Towards adjoint-based methods for aeroacoustic control, in 39th Aerospace Science Meeting & Exhibit, January 8–11, 2001, Reno, Nevada, AIAA Paper 2001–0821, 2001.

[35] ——, Numerical solution of optimal control problems governed by the compressible Navier–Stokes equations, in Proceedings of the International Conference on Optimal Control of Complex Structures, G. Leugering, J. Sprekels, and F. Tröltzsch, eds., Birkhäuser Verlag, 2001, to appear.

[36] S. S. Collis and S. K. Lele, A computational investigation of receptivity in high-speed flow near a swept leading-edge, Technical Report TF-71, Flow Physics and Computation Division, Department of Mechanical Engineering, Stanford University, Stanford, California, 1996.

[37] B. D. Craven and B. M. Glover, An approach to vector subdifferentials, Optimization, 38 (1996), pp. 237–251.

[38] T. De Luca, F. Facchinei, and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, Math. Programming, 75 (1996), pp. 407–439.

[39] ——, A theoretical and numerical comparison of some semismooth algorithms for complementarity problems, Comput. Optim. Appl., 16 (2000), pp. 173–205.

[40] J. E. Dennis, Jr. and J. J. Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. Comp., 28 (1974), pp. 549–560.


[41] J. E. Dennis, Jr. and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46–89.

[42] J. E. Dennis, Jr. and R. B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall Inc., Englewood Cliffs, N.J., 1983.

[43] M. Desai and K. Ito, Optimal controls of Navier-Stokes equations, SIAM J. Control Optim., 32 (1994), pp. 1428–1446.

[44] P. Deuflhard and M. Weiser, Local inexact Newton multilevel FEM for nonlinear elliptic problems, in Computational Science for the 21st Century, M.-O. Bristeau, G. Etgen, W. Fitzigibbon, J.-L. Lions, J. Periaux, and M. Wheeler, eds., Wiley, 1997, pp. 129–138.

[45] S. P. Dirkse and M. C. Ferris, The PATH solver: A non-monotone stabilization scheme for mixed complementarity problems, Optimization Methods and Software, 5 (1995), pp. 123–156.

[46] J. C. Dunn and T. Tian, Variants of the Kuhn-Tucker sufficient conditions in cones of nonnegative functions, SIAM J. Control Optim., 30 (1992), pp. 1361–1384.

[47] G. Duvaut and J.-L. Lions, Inequalities in mechanics and physics, Springer-Verlag, Berlin, 1976. Grundlehren der Mathematischen Wissenschaften, 219.

[48] B. C. Eaves, On the basic theorem of complementarity, Math. Programming, 1 (1971), pp. 68–75.

[49] I. Ekeland and R. Temam, Convex analysis and variational problems, North-Holland Publishing Co., Amsterdam, 1976.

[50] F. Facchinei, A. Fischer, and C. Kanzow, Regularity properties of a semismooth reformulation of variational inequalities, SIAM J. Optim., 8 (1998), pp. 850–869.

[51] F. Facchinei, H. Jiang, and L. Qi, A smoothing method for mathematical programs with equilibrium constraints, Math. Program., 85 (1999), pp. 107–134.

[52] F. Facchinei and C. Kanzow, A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems, Math. Programming, 76 (1997), pp. 493–512.

[53] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, SIAM J. Optim., 7 (1997), pp. 225–247.

[54] M. C. Ferris, C. Kanzow, and T. S. Munson, Feasible descent algorithms for mixed complementarity problems, Math. Programming, (1999), pp. 475–497.

[55] A. Fischer, A special Newton-type optimization method, Optimization, 24 (1992), pp. 269–284.

[56] ——, Solution of monotone complementarity problems with locally Lipschitzian functions, Math. Programming, 76 (1997), pp. 513–532.

[57] M. Fukushima and J.-S. Pang, Some feasibility issues in mathematical programs with equilibrium constraints, SIAM J. Optim., 8 (1998), pp. 673–681.

[58] A. V. Fursikov, Optimal control of distributed systems. Theory and applications, American Mathematical Society, Providence, RI, 2000.

[59] R. Giering and T. Kaminski, Recipes for adjoint code construction, ACM Transactions on Mathematical Software, 24 (1998), pp. 437–474.

[60] V. Girault and P.-A. Raviart, Finite element methods for Navier-Stokes equations, Springer-Verlag, Berlin, 1986.

[61] B. M. Glover and D. Ralph, First order approximations to nonsmooth mappings with application to metric regularity, Numer. Funct. Anal. Optim., 15 (1994), pp. 599–620.

[62] R. Glowinski, Numerical methods for nonlinear variational problems, Springer-Verlag, New York, 1984.

[63] R. Glowinski, J.-L. Lions, and R. Trémolières, Numerical analysis of variational inequalities, North-Holland Publishing Co., Amsterdam, 1981.

[64] A. Griewank, The local convergence of Broyden-like methods on Lipschitzian problems in Hilbert spaces, SIAM J. Numer. Anal., 24 (1987), pp. 684–705.


[65] L. Grippo, F. Lampariello, and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM J. Numer. Anal., 23 (1986), pp. 707–716.

[66] W. A. Gruver and E. Sachs, Algorithmic methods in optimal control, Pitman (Advanced Publishing Program), Boston, Mass., 1981.

[67] M. D. Gunzburger, L. Hou, and T. P. Svobodny, Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with distributed and Neumann controls, Math. Comp., 57 (1991), pp. 123–151.

[68] M. D. Gunzburger, L. S. Hou, and T. P. Svobodny, Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with Dirichlet controls, RAIRO Modél. Math. Anal. Numér., 25 (1991), pp. 711–748.

[69] M. D. Gunzburger and S. Manservisi, The velocity tracking problem for Navier-Stokes flows with bounded distributed controls, SIAM J. Control Optim., 37 (1999), pp. 1913–1945.

[70] ——, Analysis and approximation of the velocity tracking problem for Navier-Stokes flows with distributed control, SIAM J. Numer. Anal., 37 (2000), pp. 1481–1512.

[71] ——, The velocity tracking problem for Navier-Stokes flows with boundary control, SIAM J. Control Optim., 39 (2000), pp. 594–634.

[72] W. Hackbusch, Multigrid methods and applications, Springer-Verlag, Berlin, 1985.

[73] W. Hackbusch and U. Trottenberg (eds.), Multigrid methods, Springer-Verlag, Berlin, 1982.

[74] W. W. Hager, Runge-Kutta methods in optimal control and the transformed adjoint system, Numer. Math., 87 (2000), pp. 247–282.

[75] M. Heinkenschloss, Formulation and analysis of a sequential quadratic programming method for the optimal Dirichlet boundary control of Navier-Stokes flow, in Optimal Control (Gainesville, FL, 1997), Kluwer Acad. Publ., Dordrecht, 1998, pp. 178–203.

[76] M. Heinkenschloss and F. Tröltzsch, Analysis of the Lagrange-SQP-Newton method for the control of a phase field equation, Control Cybernet., 28 (1999), pp. 177–211.

[77] M. Heinkenschloss, M. Ulbrich, and S. Ulbrich, Superlinear and quadratic convergence of affine-scaling interior-point Newton methods for problems with simple bounds without strict complementarity assumption, Math. Program., 86 (1999), pp. 615–635.

[78] M. Hintermüller, K. Ito, and K. Kunisch, The primal-dual active set strategy as a semismooth Newton method, Bericht Nr. 214 des Spezialforschungsbereichs F003 Optimierung und Kontrolle, Karl-Franzens-Universität Graz, Austria, 2001.

[79] M. Hinze, Optimal and instantaneous control of the instationary Navier–Stokes equations, Habilitationsschrift, Fachbereich Mathematik, Technische Universität Berlin, Berlin, Germany, 2000.

[80] M. Hinze and K. Kunisch, Second order methods for optimal control of time-dependent fluid flow, Bericht Nr. 165 des Spezialforschungsbereichs F003 Optimierung und Kontrolle, Karl-Franzens-Universität Graz, Austria, 1999.

[81] D. Hoff, Discontinuous solutions of the Navier–Stokes equations for multidimensional flows of heat-conducting fluids, Arch. Rational Mech. Anal., 139 (1997), pp. 303–354.

[82] R. H. W. Hoppe, Une méthode multigrille pour la solution des problèmes d'obstacle, RAIRO Modél. Math. Anal. Numér., 24 (1990), pp. 711–735.

[83] R. H. W. Hoppe and R. Kornhuber, Adaptive multilevel methods for obstacle problems, SIAM J. Numer. Anal., 31 (1994), pp. 301–323.

[84] A. D. Ioffe, Nonsmooth analysis: differential calculus of nondifferentiable mappings, Trans. Amer. Math. Soc., 266 (1981), pp. 1–56.

[85] V. Jeyakumar, Simple characterizations of superlinear convergence for semismooth equations via approximate Jacobians, Applied Mathematics Research Report AMR98/28, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, 1998.


[86] V. Jeyakumar, Solving B-differentiable equations, Applied Mathematics Research Report AMR98/27, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, 1998.

[87] V. Jeyakumar and D. T. Luc, Approximate Jacobian matrices for nonsmooth continuous maps and C1-optimization, SIAM J. Control Optim., 36 (1998), pp. 1815–1832.

[88] H. Jiang, M. Fukushima, L. Qi, and D. Sun, A trust region method for solving generalized complementarity problems, SIAM J. Optim., 8 (1998), pp. 140–157.

[89] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarity problems, SIAM J. Control Optim., 35 (1997), pp. 178–193.

[90] H. Jiang and D. Ralph, Smooth SQP methods for mathematical programs with nonlinear complementarity constraints, SIAM J. Optim., 10 (2000), pp. 779–808.

[91] L. V. Kantorovich and G. P. Akilov, Functional analysis, Pergamon Press, Oxford, second ed., 1982.

[92] C. Kanzow and H. Pieper, Jacobian smoothing methods for nonlinear complementarity problems, SIAM J. Optim., 9 (1999), pp. 342–373.

[93] C. Kanzow and M. Zupke, Inexact trust-region methods for nonlinear complementarity problems, in Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods (Lausanne, 1997), M. Fukushima and L. Qi, eds., Kluwer Acad. Publ., Dordrecht, 1999, pp. 211–233.

[94] C. T. Kelley and E. W. Sachs, A new proof of superlinear convergence for Broyden's method in Hilbert space, SIAM J. Optim., 1 (1991), pp. 146–150.

[95] C. T. Kelley and E. W. Sachs, Multilevel algorithms for constrained compact fixed point problems, SIAM J. Sci. Comput., 15 (1994), pp. 645–667.

[96] C. T. Kelley and E. W. Sachs, A trust region method for parabolic boundary control problems, SIAM J. Optim., 9 (1999), pp. 1064–1081. Dedicated to John E. Dennis, Jr., on his 60th birthday.

[97] N. Kikuchi and J. T. Oden, Contact problems in elasticity: a study of variational inequalities and finite element methods, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1988.

[98] D. Kinderlehrer and G. Stampacchia, An introduction to variational inequalities and their applications, Academic Press Inc., New York, 1980.

[99] R. Kornhuber, Monotone multigrid methods for elliptic variational inequalities. I, Numer. Math., 69 (1994), pp. 167–184.

[100] R. Kornhuber, Monotone multigrid methods for elliptic variational inequalities. II, Numer. Math., 72 (1996), pp. 481–499.

[101] R. Kornhuber, Adaptive monotone multigrid methods for nonlinear variational problems, B. G. Teubner, Stuttgart, 1997.

[102] B. Kummer, Newton's method for nondifferentiable functions, in Advances in Mathematical Optimization, J. Guddat et al., eds., Akademie-Verlag, Berlin, 1988, pp. 114–125.

[103] B. Kummer, Newton's method based on generalized derivatives for nonsmooth functions: convergence analysis, in Advances in Optimization (Lambrecht, 1991), W. Oettli and D. Pallaschke, eds., Springer, Berlin, 1992, pp. 171–194.

[104] I. Lasiecka and R. Triggiani, Regularity theory of hyperbolic equations with nonhomogeneous Neumann boundary conditions. II. General boundary data, J. Differential Equations, 94 (1991), pp. 112–164.

[105] C.-J. Lin and J. J. Moré, Newton's method for large bound-constrained optimization problems, SIAM J. Optim., 9 (1999), pp. 1100–1127. Dedicated to John E. Dennis, Jr., on his 60th birthday.

[106] J.-L. Lions, Optimal control of systems governed by partial differential equations, Springer-Verlag, New York, 1971.

[107] P.-L. Lions, Mathematical topics in fluid mechanics. Vol. 1, The Clarendon Press, Oxford University Press, New York, 1996.


[108] P.-L. Lions, Mathematical topics in fluid mechanics. Vol. 2, The Clarendon Press, Oxford University Press, New York, 1998.

[109] Z.-Q. Luo, J.-S. Pang, and D. Ralph, Mathematical programs with equilibrium constraints, Cambridge University Press, Cambridge, 1996.

[110] O. L. Mangasarian, Equivalence of the complementarity problem to a system of nonlinear equations, SIAM J. Appl. Math., 31 (1976), pp. 89–92.

[111] A. Matsumura and T. Nishida, The initial value problem for the equations of motion of viscous and heat-conductive gases, J. Math. Kyoto Univ., 20 (1980), pp. 67–104.

[112] G. P. McCormick and K. Ritter, Methods of conjugate directions versus quasi-Newton methods, Math. Programming, 3 (1972), pp. 101–116.

[113] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM J. Control Optim., 15 (1977), pp. 959–972.

[114] J. J. Moré and D. J. Thuente, Line search algorithms with guaranteed sufficient decrease, ACM Trans. Math. Software, 20 (1994), pp. 286–307.

[115] T. S. Munson, Algorithms and Environments for Complementarity, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.

[116] T. S. Munson, F. Facchinei, M. C. Ferris, A. Fischer, and C. Kanzow, The semismooth algorithm for large scale complementarity problems, Mathematical Programming Technical Report MP-TR-99-07, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1999.

[117] P. D. Panagiotopoulos, Inequality problems in mechanics and applications. Convex and nonconvex energy functions, Birkhäuser Boston Inc., Boston, Mass., 1985.

[118] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM J. Optim., 3 (1993), pp. 443–465.

[119] H.-D. Qi, L. Qi, and D. Sun, Solving KKT systems via the trust region and the conjugate gradient methods, Applied Mathematics Research Report AMR99/19, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, 1999.

[120] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Math. Oper. Res., 18 (1993), pp. 227–244.

[121] L. Qi, C-differential operators, C-differentiability and generalized Newton methods, Research Report AMR96/5, School of Mathematics, University of New South Wales, Sydney, New South Wales, Australia, 1996.

[122] L. Qi and J. Sun, A nonsmooth version of Newton's method, Math. Programming, 58 (1993), pp. 353–367.

[123] D. Ralph, Rank-1 support functionals and the rank-1 generalized Jacobian, piecewise linear homeomorphisms, PhD thesis, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 1990.

[124] D. Ralph, Global convergence of damped Newton's method for nonsmooth equations via the path search, Math. Oper. Res., 19 (1994), pp. 352–389.

[125] K. Ritter, A quasi-Newton method for unconstrained minimization problems, in Nonlinear Programming, 2 (Proc. Special Interest Group Math. Programming Sympos., Univ. Wisconsin, Madison, Wis., 1974), Academic Press, New York, 1974, pp. 55–100.

[126] S. M. Robinson, Stability theory for systems of inequalities. II. Differentiable nonlinear systems, SIAM J. Numer. Anal., 13 (1976), pp. 497–513.

[127] S. M. Robinson, Normal maps induced by linear transformations, Math. Oper. Res., 17 (1992), pp. 691–714.

[128] S. M. Robinson, Newton's method for a class of nonsmooth functions, Set-Valued Anal., 2 (1994), pp. 291–305.

[129] R. T. Rockafellar, Integral functionals, normal integrands and measurable selections, in Nonlinear Operators and the Calculus of Variations (Summer School, Univ. Libre Bruxelles, Brussels, 1975), J. P. Gossez et al., eds., Springer, Berlin, 1976, pp. 157–207. Lecture Notes in Math., Vol. 543.


[130] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, Springer-Verlag, Berlin, 1998.

[131] E. W. Sachs, Broyden's method in Hilbert space, Math. Programming, 35 (1986), pp. 71–82.

[132] S. Scholtes, Introduction to piecewise differentiable equations, Habilitationsschrift, Institut für Statistik und Mathematische Wirtschaftstheorie, Universität Karlsruhe, Karlsruhe, Germany, 1994.

[133] A. Shapiro, On concepts of directional differentiability, J. Optim. Theory Appl., 66 (1990), pp. 477–487.

[134] R. Temam, Navier–Stokes equations, North-Holland Publishing Co., Amsterdam, third ed., 1984.

[135] L. Thibault, On generalized differentials and subdifferentials of Lipschitz vector-valued functions, Nonlinear Anal., 6 (1982), pp. 1037–1053.

[136] P. L. Toint, Global convergence of a class of trust-region methods for nonconvex minimization in Hilbert space, IMA J. Numer. Anal., 8 (1988), pp. 231–252.

[137] P. L. Toint, Non-monotone trust-region algorithms for nonlinear optimization subject to convex constraints, Math. Programming, 77 (1997), pp. 69–94.

[138] F. Tröltzsch, An SQP method for the optimal control of a nonlinear heat equation, Control Cybernet., 23 (1994), pp. 267–288.

[139] M. Ulbrich, Semismooth Newton methods for operator equations in function spaces, Technical Report TR00-11, Department of Computational and Applied Mathematics, Rice University, Houston, Texas 77005-1892, 2000. Accepted for publication (in revised form) in SIAM J. Optimization.

[140] M. Ulbrich, Non-monotone trust-region methods for bound-constrained semismooth equations with applications to nonlinear mixed complementarity problems, SIAM J. Optim., 11 (2001), pp. 889–917.

[141] M. Ulbrich, On a nonsmooth Newton method for nonlinear complementarity problems in function space with applications to optimal control, in Complementarity: Applications, Algorithms and Extensions, M. C. Ferris, O. L. Mangasarian, and J.-S. Pang, eds., Kluwer Acad. Publ., Dordrecht, 2001, pp. 341–360.

[142] M. Ulbrich and S. Ulbrich, Non-monotone trust region methods for nonlinear equality constrained optimization without a penalty function, Technical Report, Fakultät für Mathematik, Technische Universität München, 80290 München, Germany, 2000.

[143] M. Ulbrich and S. Ulbrich, Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds, SIAM J. Control Optim., 38 (2000), pp. 1938–1984.

[144] M. Ulbrich, S. Ulbrich, and M. Heinkenschloss, Global convergence of trust-region interior-point algorithms for infinite-dimensional nonconvex minimization subject to pointwise bounds, SIAM J. Control Optim., 37 (1999), pp. 731–764.

[145] P. Wesseling, An introduction to multigrid methods, John Wiley & Sons Ltd., Chichester, 1992.

[146] H. Xu, Set-valued approximations and Newton's methods, Math. Program., 84 (1999), pp. 401–420.

[147] E. Zeidler, Nonlinear functional analysis and its applications. II/B, Springer-Verlag, New York, 1990.

[148] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Trans. Math. Software, 23 (1997), pp. 550–560.

[149] W. P. Ziemer, Weakly differentiable functions. Sobolev spaces and functions of bounded variation, Springer-Verlag, Berlin, 1989.

[150] J. Zowe and S. Kurcyusz, Regularity and stability for the mathematical programming problem in Banach spaces, Appl. Math. Optim., 5 (1979), pp. 49–62.