
Jacobian-Free Newton-Krylov Methods: Issues and Solutions

David W. Zingg¹ and Todd T. Chisholm²

¹ University of Toronto Institute for Aerospace Studies, 4925 Dufferin St., Toronto, ON M3H 5T6, Canada

² University of Toronto Institute for Aerospace Studies (currently at MDA Corporation)

1 Introduction

A Newton-Krylov method computes the solution of a system of nonlinear algebraic equations, often arising from a discretization of a system of partial differential equations, using an inexact-Newton method combined with a Krylov subspace method for linear systems. Such methods are very efficient in a variety of applications, as discussed in the review paper by Knoll and Keyes [1]. However, they have not gained widespread acceptance in computational fluid dynamics (CFD), i.e. in the numerical solution of the Reynolds-averaged compressible or incompressible Navier-Stokes equations. Considerable interest was generated during the 1990s [2, 3, 4, 5, 6, 7], primarily using the Krylov method GMRES [8], but current interest is more limited [9, 10, 11, 12, 13, 14, 15].

One of the reasons for the limited popularity of Newton-Krylov methods in CFD is that a number of subtle issues can significantly degrade or prevent the convergence of a Newton-Krylov algorithm. These can be difficult to identify and may be responsible for unexpected failures of current Newton-Krylov algorithms in some cases. Some are neither well understood nor widely recognized. When these issues are successfully addressed, the resulting algorithm can be very efficient. For example, Nemec and Zingg [9, 11] have used the Newton-Krylov approach very successfully in aerodynamic shape optimization, where algorithm speed and reliability are essential. The purpose of this paper is to present and discuss a number of issues that can impair the convergence and efficiency of Jacobian-free Newton-Krylov methods, as well as strategies for identifying and addressing them.

H. Deconinck, E. Dick (eds.), Computational Fluid Dynamics 2006, DOI 10.1007/978-3-540-92779-2_35, © Springer-Verlag Berlin Heidelberg 2009

2 Jacobian-free Newton-Krylov Methods

The basic ideas behind Jacobian-free Newton-Krylov methods are straightforward. Newton’s method is applied to the nonlinear system of equations arising from a spatial discretization of the steady flow equations. At each inexact-Newton iteration, a linear system of equations must be solved to some tolerance. Certain iterative methods, such as GMRES, require only the product of the system Jacobian with a series of vectors; the Jacobian itself is never needed. This permits a matrix-free approximation to the matrix-vector product based on a finite-difference approximation to a Fréchet derivative as follows:

$$A v \approx \frac{R(Q + \varepsilon v) - R(Q)}{\varepsilon} \qquad (1)$$

where R(Q) is the residual function, Q is the current solution vector, A is the Jacobian based on a linearization about the current solution, v is an arbitrary vector, and ε is a small parameter. The Jacobian is generally so poorly conditioned that preconditioning is required. An incomplete lower-upper (ILU) factorization of some approximation to the Jacobian can be an effective preconditioner. Alternatively, another algorithm can be used as a preconditioner; this is usually not as effective as an ILU factorization but saves memory. Development of efficient preconditioners remains an active area of research. Finally, because Newton’s method is not globally convergent, a pseudo-transient continuation method or equivalent must be implemented; the switched-evolution-relaxation (SER) strategy of Mulder and van Leer [16] is popular.
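The finite-difference product in (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-equation residual `toy_residual`, its analytic Jacobian `toy_jacobian`, and the helper `jacobian_free_matvec` are hypothetical, and ε is set with the common choice discussed in Section 4.3. Comparing against the analytic Jacobian shows that only residual evaluations are needed to form the product.

```python
import numpy as np


def toy_residual(Q):
    """Hypothetical two-equation nonlinear residual R(Q), standing in for a flow solver."""
    return np.array([Q[0] ** 2 + Q[1] - 3.0, Q[0] * Q[1] - 2.0])


def toy_jacobian(Q):
    """Analytic Jacobian of toy_residual, used here only to check the approximation."""
    return np.array([[2.0 * Q[0], 1.0], [Q[1], Q[0]]])


def jacobian_free_matvec(residual, Q, v, R_Q=None):
    """Approximate A v by the forward difference (R(Q + eps v) - R(Q)) / eps of eq. (1)."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(Q)
    eps = np.sqrt(np.finfo(float).eps) / norm_v  # eps = sqrt(machine eps) / ||v||_2
    if R_Q is None:
        R_Q = residual(Q)  # fixed during one Newton step, so it can be reused
    return (residual(Q + eps * v) - R_Q) / eps


Q = np.array([1.2, 0.7])
v = np.array([0.3, -0.5])
approx = jacobian_free_matvec(toy_residual, Q, v)
exact = toy_jacobian(Q) @ v
print(approx, exact)
```

Because R(Q) does not change within one Newton iteration, it can be computed once and passed in as `R_Q`, so each GMRES iteration costs a single additional residual evaluation.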

Unfortunately, this basic Newton-Krylov formulation can be extremely inefficient unless certain key parameters and strategies are chosen correctly to maximize speed and reliability while taking memory use into account. For example, the linear system tolerance should be selected to avoid oversolving as much as possible. The fill level permitted in the ILU factorization is another important parameter. The ILU(0) factorization, in which no fill is permitted, is popular but not particularly effective. The SER strategy introduces parameters controlling the increase in the pseudo-time step as the residual decreases. However, this strategy is also not optimal, as a spike in the residual can lead to unnecessarily small time steps. Other factors that affect the efficiency of a Newton-Krylov algorithm are the approximate Jacobian used to form the ILU factorization and the ordering of the variables. Generally, the approximate Jacobian includes nearest-neighbour contributions only. When an upwind spatial discretization is used, it is natural to base the approximate Jacobian on a first-order discretization. When a centered scheme is used with added numerical dissipation, there is some flexibility in the nearest-neighbour Jacobian approximation. Finally, several authors have shown that a reverse Cuthill-McKee (RCM) reordering of the mesh nodes can improve efficiency, especially in conjunction with an ILU factorization.
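The effect of a residual spike on SER can be seen from its update in the simplest form, in which the pseudo-time step grows in inverse proportion to the residual norm, $\Delta t^n = \Delta t^0 \, \|R(Q^0)\| / \|R(Q^n)\|$. The sketch below assumes this basic form; the cap `dt_max`, the initial step, and the residual history are hypothetical, and practical implementations (including the present one) use their own parameters and safeguards.

```python
def ser_time_step(dt0, res0_norm, res_norm, dt_max=1.0e6):
    """Switched evolution relaxation: scale dt0 by the drop in residual norm, capped at dt_max."""
    return min(dt0 * res0_norm / res_norm, dt_max)


dt0 = 0.1
# Hypothetical residual-norm history with a transient spike at iteration 3
residual_history = [1.0, 0.1, 0.01, 0.5, 0.005]
steps = [ser_time_step(dt0, residual_history[0], r) for r in residual_history]
for n, (r, dt) in enumerate(zip(residual_history, steps)):
    print(f"iteration {n}: ||R|| = {r:g}, dt = {dt:g}")
```

In this example the spike at iteration 3 reduces the time step from 10 to 0.2 for that iteration, even though nothing about the solution requires such a small step.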

Fortunately, the parameters and strategies described in the previous paragraph have been studied by many researchers, and, although the pseudo-transient stage remains a challenge, good choices are available. However, there are three subtle issues that are rarely discussed: 1) the effect of the root node in the RCM reordering, 2) misleading convergence of the linear iterations, and 3) inaccurate Jacobian-free approximate matrix-vector products. These can be difficult to identify and may be responsible for unexpected failures of current Newton-Krylov algorithms in some cases. These issues will be discussed further after a brief description of the present Newton-Krylov algorithm and some sample results.

3 Present Algorithm and Results

The present algorithm originated with the work of Pueyo and Zingg [6], who developed a Newton-Krylov algorithm applicable to the computation of aerodynamic flows on single-block structured meshes using scalar artificial dissipation and the Baldwin-Lomax turbulence model. The algorithm has since been extended to incorporate matrix dissipation, the Spalart-Allmaras one-equation turbulence model (including trip terms for laminar-turbulent transition), and multi-block meshes.

Our algorithm is based on a Jacobian-free inexact-Newton strategy using ILU-preconditioned GMRES, as described above. There are numerous parameters involved, such as the tolerance to which the inner iterations are solved, the amount of fill permitted in the incomplete factorization, and a parameter used in forming the approximate Jacobian upon which the ILU factorization is based. These have been carefully chosen based on numerous experiments to maximize efficiency while maintaining robustness, such that a fixed set of parameters can be used for a broad range of flow conditions, including complex flows over multi-element geometries. Considerable effort has been put into optimizing the pseudo-transient phase. In the original algorithm of Pueyo and Zingg [6], an approximate-factorization algorithm was used initially. This has been replaced by a Newton-Krylov pseudo-transient scheme with gradually increasing, spatially varying time steps. In this phase it is critical to maintain positivity of the Spalart-Allmaras turbulence model variable, since the model can become unstable when negative values occur. The technique suggested by Spalart and Allmaras [17] for maintaining positivity, which involves M-matrices, is incompatible with Newton-like convergence. We instead use a separate local time step for the turbulence model, designed to maintain positivity during the pseudo-transient phase. Full details of the present algorithm are deferred to a forthcoming paper and thesis [18].

Two examples are shown in Fig. 1. Convergence of the error in the lift coefficient³ is plotted as a function of both CPU time (top axis label) and equivalent function evaluations, i.e. CPU time divided by the time required for one right-hand-side evaluation (bottom axis label). The present results are labelled “NK” and, to provide a well-known reference, are contrasted with the convergence of an approximate-factorization algorithm (labelled “AF”) driving the identical right-hand side to zero. The approximate-factorization algorithm is run as efficiently as possible, using the diagonal form and an optimal local time step. Both solvers use the same spatial discretization and hence produce identical converged solutions. The CPU times are obtained on an Intel 2800 Pentium 4 desktop computer. For a subsonic turbulent flow over the NACA 0012 airfoil, the lift coefficient is fully converged in roughly 40 seconds on a mesh with 17,385 nodes. The complex three-element airfoil flowfield converges in roughly 8 minutes on a mesh with 71,868 nodes. Both meshes have cells with aspect ratios greater than 10,000. Similar performance is obtained for transonic flows. The equivalent function evaluations provide a measure that is somewhat independent of hardware. For convergence to two or more significant figures in lift coefficient, the Newton-Krylov algorithm is much faster than the approximate-factorization algorithm in both cases.

³ Defined as the difference between the current lift coefficient and the fully converged lift coefficient.

Fig. 1. Sample convergence histories (Cl error vs. equivalent RHS evaluations and CPU time in seconds; curves AF and NK): (a) NACA 0012 airfoil, M∞ = 0.15, Re = 9 × 10⁶; (b) three-element configuration, M∞ = 0.197, Re = 3.52 × 10⁶.

4 Issues and Solutions

4.1 RCM Root Node

The RCM reordering is a heuristic for reducing the bandwidth of a sparse matrix. It is generally believed that this improves the accuracy of an ILU factorization by reducing the fill required for a complete lower-upper factorization. While this is a reasonable conjecture, it does not fully explain the observations. For example, RCM has been shown to outperform algorithms designed specifically to minimize fill. Nor does it explain the sensitivity to the root node used for the RCM ordering. Fig. 2 shows the CPU time required for convergence of the single-element airfoil case described in the preceding section for various root nodes lying on the outer boundary of the mesh. The “root angle” is measured from the positive x-axis. Hence an angle of zero corresponds to a root node directly downstream of the airfoil, ninety degrees is above the airfoil, 180 degrees lies upstream, etc. A CPU time of zero indicates nonconvergence.

Fig. 2. CPU time plotted vs. far-field root node location (CPU time in seconds vs. root angle in degrees)

The figure shows that root nodes upstream of the airfoil can lead to varied performance of the algorithm, while root nodes located above and below the airfoil often lead to nonconvergence. Clearly, it is best to select a root node directly downstream of the airfoil. This choice has been used for the results shown previously.
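The root node enters RCM as the starting point of a breadth-first traversal, as in the minimal pure-Python sketch below. It assumes a hypothetical rectangular nx-by-ny mesh with a five-point stencil rather than the C-meshes used here, and the helper names are illustrative. It only shows where the root choice enters and how it changes the bandwidth; as noted above, bandwidth alone does not explain the convergence behaviour seen in Fig. 2.

```python
from collections import deque


def grid_graph(nx, ny):
    """Adjacency lists of a hypothetical nx-by-ny mesh (5-point stencil), row-major node ids."""
    adj = {j * nx + i: [] for j in range(ny) for i in range(nx)}
    for j in range(ny):
        for i in range(nx):
            node = j * nx + i
            if i + 1 < nx:  # east neighbour
                adj[node].append(node + 1)
                adj[node + 1].append(node)
            if j + 1 < ny:  # north neighbour
                adj[node].append(node + nx)
                adj[node + nx].append(node)
    return adj


def reverse_cuthill_mckee(adj, root):
    """Cuthill-McKee breadth-first ordering from `root`, reversed (assumes a connected graph)."""
    order, visited, queue = [root], {root}, deque([root])
    while queue:
        node = queue.popleft()
        # Number unvisited neighbours in order of increasing degree
        for nbr in sorted(adj[node], key=lambda n: len(adj[n])):
            if nbr not in visited:
                visited.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order[::-1]


def bandwidth(adj, order):
    """Largest |position(u) - position(w)| over all edges (u, w) for a given ordering."""
    position = {node: k for k, node in enumerate(order)}
    return max(abs(position[u] - position[w]) for u in adj for w in adj[u])


nx, ny = 10, 4
adj = grid_graph(nx, ny)
natural = list(range(nx * ny))
print("natural ordering bandwidth:", bandwidth(adj, natural))
for root in (0, nx - 1, (ny // 2) * nx + nx // 2):  # two corners and an interior node
    order = reverse_cuthill_mckee(adj, root)
    print(f"root {root}: bandwidth {bandwidth(adj, order)}")
```

Library routines such as SciPy's `reverse_cuthill_mckee` choose the starting node internally, so experimenting with the root as in Fig. 2 requires an ordering routine that accepts it explicitly, as this one does.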

4.2 Misleading Convergence of the Linear Iterations

Another problem arises if the magnitude of the residual of one equation (e.g., the turbulence model) is much larger than that of the others. The exit tolerance for the linear solver can then be satisfied once the larger residuals are reduced, even though the other equations have not converged sufficiently, which prevents convergence of the Newton iterations. This problem can be diagnosed by examining the residual norms of each equation separately. The equations can then be scaled so that their residuals are of the same order. We also perform a minimum of five inner iterations per outer iteration.
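The diagnosis can be illustrated with per-block norms of a linear residual. In this minimal sketch the residual values, the split into a four-entry mean-flow block and a one-entry turbulence block, and the choice of scaling each block by its initial norm are all hypothetical; the point is only that the global norm can meet a relative tolerance while one block has not decreased at all. The helper `inner_converged` mirrors the minimum of five inner iterations described above.

```python
import numpy as np

BLOCKS = {"mean_flow": slice(0, 4), "turbulence": slice(4, 5)}  # hypothetical equation blocks


def block_norms(r):
    """2-norm of the residual restricted to each equation block."""
    return {name: float(np.linalg.norm(r[sl])) for name, sl in BLOCKS.items()}


def inner_converged(iteration, rel_residual, tol, min_iterations=5):
    """Exit test for the inner (linear) iterations, enforcing a minimum iteration count."""
    return iteration >= min_iterations and rel_residual <= tol


r0 = np.array([1e-3, 2e-3, 1e-3, 2e-3, 1e2])  # initial linear residual: turbulence dominates
rk = np.array([1e-3, 2e-3, 1e-3, 2e-3, 1e0])  # turbulence reduced 100x, mean flow unchanged

global_ratio = np.linalg.norm(rk) / np.linalg.norm(r0)
n0, nk = block_norms(r0), block_norms(rk)
block_ratios = {name: nk[name] / n0[name] for name in BLOCKS}
print(f"global reduction {global_ratio:.3g}, per block {block_ratios}")

# Scale each block by its initial norm so that all equations carry comparable weight
scale = np.ones_like(r0)
for name, sl in BLOCKS.items():
    scale[sl] = 1.0 / n0[name]
scaled_ratio = np.linalg.norm(scale * rk) / np.linalg.norm(scale * r0)
print(f"scaled global reduction {scaled_ratio:.3g}")
```

Unscaled, a relative tolerance of 0.1 is met (reduction of about 0.01) even though the mean-flow block has not moved; after scaling, the reduction is only about 0.71 and the same tolerance is not met.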

4.3 Inaccurate Jacobian-Free Approximate Matrix-Vector Products

The choice of the parameter ε in (1) is a balance between round-off and truncation errors. Many authors use

$$\varepsilon = \frac{\sqrt{\varepsilon_m}}{\|v\|_2} \qquad (2)$$

where ε_m is the value of machine zero. Problems arise when there are large differences in the magnitudes of the entries in Q but not in v, or vice versa. The former occurs when the variables are poorly scaled, the latter when the residual equations are poorly scaled. Poor scaling of the variables can occur for various reasons. In our case, nondimensionalizing the turbulent viscosity by the actual viscosity leads to values of the turbulence parameter that greatly exceed the nondimensional mean-flow variables. It can then be difficult or impossible to find a value of ε that keeps both truncation error and round-off error sufficiently low. This difficulty is particularly insidious because the linear iterations appear to converge. However, since the matrix-vector products are not accurate, GMRES is converging to the solution of a different linear problem. The problem can be diagnosed by recalculating the linear residual after the GMRES iterations have terminated. Note that GMRES does not directly evaluate the residual at each iteration, but instead finds it indirectly, assuming that the matrix-vector products are accurate. If the residual calculated subsequently differs from that reported by GMRES, then the Jacobian-free matrix-vector products are inaccurate. The solution is to rescale both the variables and the residual equations on the fly so that both are well scaled. In addition, we use a value of ε that is two orders of magnitude greater than that given by (2) to reduce round-off error.
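The diagnostic described above can be sketched with a small unrestarted GMRES written in NumPy, so that the residual GMRES reports (inferred from its Hessenberg least-squares problem) can be compared with a residual recomputed from a fresh Jacobian-free product. Everything here is an illustrative assumption: the residual `toy_residual` and its coefficients, the helper names, and the iteration count. The default `eps_factor=100` reflects the increase of two orders of magnitude over (2) described above. For this well-scaled example the two residuals should agree closely; a large gap between them is the symptom of inaccurate products.

```python
import numpy as np


def make_fd_matvec(residual, Q, eps_factor=100.0):
    """Jacobian-free product of eq. (1), with eps = eps_factor * sqrt(eps_m) / ||v||_2."""
    R_Q = residual(Q)
    sqrt_eps_m = np.sqrt(np.finfo(float).eps)

    def matvec(v):
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            return np.zeros_like(v)
        eps = eps_factor * sqrt_eps_m / norm_v
        return (residual(Q + eps * v) - R_Q) / eps

    return matvec


def gmres(matvec, b, m):
    """Unrestarted GMRES(m) from x0 = 0. Returns x and the residual norm inferred from
    the Hessenberg least-squares problem, which assumes exact matrix-vector products."""
    beta = np.linalg.norm(b)
    V = np.zeros((b.size, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    k_used = m
    for k in range(m):
        w = matvec(V[:, k])
        for i in range(k + 1):  # modified Gram-Schmidt
            H[i, k] = V[:, i] @ w
            w = w - H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] <= 1e-14 * beta:  # Krylov space exhausted
            k_used = k + 1
            break
        V[:, k + 1] = w / H[k + 1, k]
    Hk = H[: k_used + 1, :k_used]
    rhs = np.zeros(k_used + 1)
    rhs[0] = beta
    y = np.linalg.lstsq(Hk, rhs, rcond=None)[0]
    return V[:, :k_used] @ y, float(np.linalg.norm(rhs - Hk @ y))


M = 4.0 * np.eye(6) - np.eye(6, k=1) - np.eye(6, k=-1)  # hypothetical coupling matrix


def toy_residual(Q):
    """Hypothetical, well-scaled nonlinear residual R(Q) = M Q + 0.1 Q**2."""
    return M @ Q + 0.1 * Q ** 2


Q = np.linspace(0.5, 1.5, 6)
matvec = make_fd_matvec(toy_residual, Q)
b = -toy_residual(Q)  # Newton right-hand side
x, reported = gmres(matvec, b, m=6)
recomputed = float(np.linalg.norm(b - matvec(x)))  # fresh Jacobian-free product
print(f"reported {reported:.2e}, recomputed {recomputed:.2e}")
```

The check itself is the last two lines and applies unchanged to a poorly scaled Q; if `recomputed` greatly exceeds `reported`, the products are inaccurate and the variables and residual equations should be rescaled.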

5 Conclusions

A number of subtle problems that can arise when Newton-Krylov algorithms are applied to the Reynolds-averaged Navier-Stokes equations have been discussed. Strategies have been presented for identifying and addressing these issues.

References

1. Knoll, D.A., and Keyes, D.E., J. Comp. Phys., 193:357-397, 2004.
2. Venkatakrishnan, V., and Mavriplis, D.J., J. Comp. Phys., 105:83-91, 1993.
3. Degrez, G., and Issman, E., VKI Lecture Notes 1994-05, 1994.
4. Barth, T.J., and Linton, S.W., AIAA Paper 95-0221, 1995.
5. Nielsen, E.J., Anderson, W.K., Walters, R.W., and Keyes, D.E., AIAA Paper 95-1733, 1995.
6. Pueyo, A., and Zingg, D.W., AIAA J., 36:1991-1997, 1998.
7. Geuzaine, P., Lepot, I., Meers, F., and Essers, J.A., AIAA Paper 99-3341, 1999.
8. Saad, Y., and Schultz, M.H., SIAM J. Sci. Stat. Comput., 7:856-869, 1986.
9. Nemec, M., and Zingg, D.W., AIAA J., 40:1146-1154, 2002.
10. Chisholm, T.T., and Zingg, D.W., AIAA Paper 2003-3708, 2003.
11. Nemec, M., Zingg, D.W., and Pulliam, T.H., AIAA J., 42:1057-1065, 2004.
12. Isono, S., and Zingg, D.W., AIAA Paper 2004-0433, 2004.
13. Smith, T.M., Hooper, R.W., Ober, C.C., Lorber, A.A., and Shadid, J.N., AIAA Paper 2004-0743, 2004.
14. Luo, H., Baum, J.D., and Lohner, R., AIAA Paper 2004-1103, 2004.
15. Nichols, J.C., and Zingg, D.W., AIAA Paper 2005-5230, 2005.
16. Mulder, W.A., and van Leer, B., J. Comp. Phys., 59:232-246, 1985.
17. Spalart, P., and Allmaras, S., AIAA Paper 92-0439, 1992.
18. Chisholm, T.T., Ph.D. Thesis, University of Toronto Institute for Aerospace Studies, 2006.
