JSIAM Letters

The Japan Society for Industrial and Applied Mathematics

Vol.4 (2012) pp.1-44

Editorial Board

Chief Editor Hideyuki Azegami (Nagoya University)

Vice-Chief Editor Yoshimasa Nakamura (Kyoto University)

Secretary Editors Reiji Suda (University of Tokyo)

Kenji Shirota (Aichi Prefectural University)

Tomohiro Sogabe (Aichi Prefectural University)

Associate Editors Kazuo Kishimoto (University of Tsukuba)

Satoshi Tsujimoto (Kyoto University)

Masashi Iwasaki (Kyoto Prefectural University)

Norikazu Saito (University of Tokyo)

Koh-ichi Nagao (Kanto Gakuin University)

Koichi Kato (Japan Institute for Pacific Studies)

Atsushi Nagai (Nihon University)

Takeshi Mandai (Osaka Electro-Communication University)

Ryuichi Ashino (Osaka Kyoiku University)

Tamotu Kinoshita (University of Tsukuba)

Yuzuru Sato (Hokkaido University)

Ken Umeno (Kyoto University)

Katsuhiro Nishinari (University of Tokyo)

Tetsu Yajima (Utsunomiya University)

Narimasa Sasa (Japan Atomic Energy Agency)

Fumiko Sugiyama (Kyoto University)

Hiroko Kitaoka (JSOL Corporation)

Hitoshi Imai (University of Tokushima)

Nobito Yamamoto (University of Electro-Communications)

Daisuke Furihata (Osaka University)

Takahiro Katagiri (The University of Tokyo)

Tetsuya Sakurai (University of Tsukuba)

Takayasu Matsuo (The University of Tokyo)

Yoshitaka Watanabe (Kyushu University)

Katsuhisa Ozaki (Shibaura Institute of Technology)

Kenta Kobayashi (Hitotsubashi University)

Takaaki Nara (University of Electro-Communications)

Takashi Suzuki (Osaka University)

Tetsuo Ichimori (Osaka Institute of Technology)

Tatsuo Oyama (National Graduate Institute for Policy Studies)

Eiji Katamine (Gifu National College of Technology)

Masami Hagiya (University of Tokyo)

Maki Yoshida (Osaka University)

Hideki Sakurada (NTT Communication Science Laboratories)

Naoyuki Ishimura (Hitotsubashi University)

Jiro Akahori (Ritsumeikan University)

Kiyomasa Narita (Kanagawa University)

Ken Nakamura (Tokyo Metropolitan University)

Miho Aoki (Shimane University)


Kazuto Matsuo (Kanagawa University)

Naoshi Nishimura (Kyoto University)

Hiromichi Itou (Gunma University)

Keiko Imai (Chuo University)

Ichiro Kataoka (Hitachi)

Shin-Ichi Nakano (Gunma University)

Akiyoshi Shioura (Tohoku University)


Contents

Elliptic theta function and the best constants of Sobolev-type inequalities ・・・ 1-4

Hiroyuki Yamagishi, Yoshinori Kametaka, Atsushi Nagai, Kohtaro Watanabe, Kazuo Takemura

A conservative compact finite difference scheme for the KdV equation ・・・ 5-8

Hiroki Kanazawa, Takayasu Matsuo, Takaharu Yaguchi

Algorithm for solving Jordan problem of block Schur form ・・・ 9-12

Takuya Matsumoto, Kenji Kudo, Yutaka Kuwajima, Takaomi Shigehara

A fast wavelet expansion technique for Vasicek multi-factor model of portfolio credit risk ・・・ 13-16

Kensuke Ishitani

Fourier estimation method applied to forward interest rates ・・・ 17-20

Nien-Lin Liu, Maria Elvira Mancino

An integer factoring algorithm based on elliptic divisibility sequences ・・・ 21-23

Naotoshi Sakurada, Junichi Yarimizu, Naoki Ogura, Shigenori Uchiyama

A modified Block IDR(s) method for computing high accuracy solutions ・・・ 25-28

Michihiro Naito, Hiroto Tadano, Tetsuya Sakurai

An alternating discrete variational derivative method for coupled partial differential equations ・・・ 29-32

Hiroaki Kuramae, Takayasu Matsuo

The existence of solutions to topology optimization problems ・・・ 33-36

Satoshi Kaizu

An exhaustive search method to find all small solutions of a multivariate modular linear equation ・・・ 37-40

Hui Zhang, Tsuyoshi Takagi

A parameter optimization technique for a weighted Jacobi-type preconditioner ・・・ 41-44

Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru


JSIAM Letters Vol.4 (2012) pp.1–4 © 2012 Japan Society for Industrial and Applied Mathematics

Elliptic theta function and the best constants

of Sobolev-type inequalities

Hiroyuki Yamagishi1, Yoshinori Kametaka2, Atsushi Nagai3, Kohtaro Watanabe4

and Kazuo Takemura3

1 Tokyo Metropolitan College of Industrial Technology, 1-10-40 Higashi-ooi, Shinagawa, Tokyo 140-0011, Japan

2 Osaka University, 1-3 Machikaneyama-cho, Toyonaka 560-8531, Japan

3 Nihon University, 2-11-1 Shinei, Narashino 275-8576, Japan

4 National Defense Academy, 1-10-20 Yokosuka 239-8686, Japan

E-mail yamagisi@s.metro-cit.ac.jp

Received November 12, 2011, Accepted December 9, 2011

Abstract

We obtained the best constants of Sobolev-type inequalities corresponding to the higher-order partial differential operators $L = (\partial_t - \Delta + a_0)\cdots(\partial_t - \Delta + a_{M-1})$ and $L_0 = (-\Delta + a_0)\cdots(-\Delta + a_{M-1})$ with positive distinct characteristic roots $a_0, \dots, a_{M-1}$, under a suitable assumption on $M$ and $n$. The best constants are given by the $L^2$-norms of the Green's functions of the boundary value problems $Lu = f(x,t)$ and $L_0 u = f(x)$. The Green's functions are expressed in terms of the elliptic theta function.

Keywords Sobolev-type inequality, best constant, elliptic theta function, Green’s function

Research Activity Group Applied Integrable Systems

1. Conclusion

For $M = 1, 2, 3, \dots$, we assume $0 < a_0 < a_1 < \cdots < a_{M-1}$. Let $x = (x_1, \dots, x_n) \in I^n$, $I = (0,1)$, $t \in \mathbb{R}$, $l = (l_1, \dots, l_n) \in \mathbb{Z}^n$. We consider the higher-order partial differential operators $L = P(\partial_t - \Delta)$ and $L_0 = P(-\Delta)$ with the characteristic polynomial
\[
P(z) = \prod_{j=0}^{M-1} (z + a_j).
\]

For arbitrary bounded continuous functions $f(x,t)$ and $f(x)$, let us consider the following boundary value problems on an $n$-dimensional torus:
\[
\mathrm{BVP}:\quad Lu = f(x,t)\ \ ((x,t) \in I^n \times \mathbb{R}), \qquad u(x,t) \text{ has period } 1 \text{ with respect to } x_j,
\]
\[
\mathrm{BVP}_0:\quad L_0 u = f(x)\ \ (x \in I^n), \qquad u(x) \text{ has period } 1 \text{ with respect to } x_j,
\]
which have unique solutions given by
\[
u(x,t) = \int_{I^n \times \mathbb{R}} G(x-y, t-s) f(y,s) \, dy \, ds, \tag{1}
\]
\[
u(x) = \int_{I^n} G_0(x-y) f(y) \, dy, \tag{2}
\]
respectively. The Green's functions $G$ and $G_0$ are given by
\[
G(x,t) = Y(t) e(t) g(x,t), \tag{3}
\]
\[
G_0(x) = \int_0^\infty e(t) g(x,t) \, dt. \tag{4}
\]

$Y(t) = 1$ ($0 \le t < \infty$), $0$ ($-\infty < t < 0$) is the Heaviside step function. $e(t) = \sum_{j=0}^{M-1} b_j e^{-a_j t}$, where the coefficients $b_j = 1/P'(-a_j)$ appear in the partial fraction expansion $P(z)^{-1} = \sum_{j=0}^{M-1} b_j (z + a_j)^{-1}$. $g(x,t)$ is given by
\[
g(x,t) = \prod_{j=1}^n h(x_j, t), \qquad h(x_j, t) = \vartheta_3(x_j, \sqrt{-1}\, 4\pi t),
\]
\[
\vartheta_3(x_j, \sqrt{-1}\, 4\pi t) = \sum_{l \in \mathbb{Z}} \exp(-4\pi^2 l^2 t + \sqrt{-1}\, 2\pi l x_j),
\]
where $\vartheta_3$ is the elliptic theta function [1, Section 12]. Throughout this paper, we use the following two kinds of norms:
\[
\|u\|^2 = \int_{I^n \times \mathbb{R}} |u(x,t)|^2 \, dx \, dt, \qquad \|u\|_0^2 = \int_{I^n} |u(x)|^2 \, dx.
\]

The conclusions are as follows.

Theorem 1 We assume $1 \le n \le 4M-3$. For any $u = u(x,t)$ which satisfies the condition that $Lu$ is bounded continuous and has period 1 with respect to $x_j$ ($1 \le j \le n$), there exists a positive constant $C$, independent of $u$, such that the Sobolev-type inequality
\[
\Bigl( \sup_{(y,s) \in I^n \times \mathbb{R}} |u(y,s)| \Bigr)^2 \le C \|Lu\|^2 \tag{5}
\]
holds. Among such $C$, the best constant $C(\boldsymbol{a})$ is given by
\[
C(\boldsymbol{a}) = \|G\|^2 = \sum_{j,k=0}^{M-1} \sum_{l \in \mathbb{Z}^n} \frac{b_j b_k}{a_j + a_k + 8\pi^2 |l|^2}. \tag{6}
\]
If one replaces $C$ by $C(\boldsymbol{a})$ in the above inequality (5), the equality holds for $u(x,t) = c U(x - x_0, t - t_0)$ with arbitrary $c \in \mathbb{C}$ and $(x_0, t_0) \in I^n \times \mathbb{R}$. $U(x,t)$ is given by
\[
U(x,t) = \int_{I^n \times \mathbb{R}} G(x-y, t-s) G(-y, -s) \, dy \, ds. \tag{7}
\]

Theorem 2 We assume $1 \le n \le 4M-1$. For any $u = u(x)$ which satisfies the condition that $L_0 u$ is bounded continuous and has period 1 with respect to $x_j$ ($1 \le j \le n$), there exists a positive constant $C$, independent of $u$, such that the Sobolev-type inequality
\[
\Bigl( \sup_{y \in I^n} |u(y)| \Bigr)^2 \le C \|L_0 u\|_0^2 \tag{8}
\]
holds. Among such $C$, the best constant $C_0(\boldsymbol{a})$ is given by
\[
C_0(\boldsymbol{a}) = \|G_0\|_0^2 = \sum_{j,k=0}^{M-1} \sum_{l \in \mathbb{Z}^n} \frac{b_j b_k}{(a_j + 4\pi^2 |l|^2)(a_k + 4\pi^2 |l|^2)}. \tag{9}
\]
If one replaces $C$ by $C_0(\boldsymbol{a})$ in the above inequality (8), the equality holds for $u(x) = c U_0(x - x_0)$ with arbitrary $c \in \mathbb{C}$ and $x_0 \in I^n$. $U_0(x)$ is given by
\[
U_0(x) = \int_{I^n} G_0(x-y) G_0(-y) \, dy. \tag{10}
\]
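As an aside (not in the paper), the lattice sum (9) converges quickly and can be evaluated directly by truncation. In the sketch below, the truncation radius `lmax` and the sample roots `a` are illustrative choices, and the inner double sum over $j, k$ is computed in its factorized form $\bigl( \sum_j b_j / (a_j + 4\pi^2 |l|^2) \bigr)^2$.

```python
import itertools
import numpy as np

def best_constant_C0(a, n, lmax=20):
    """Truncated lattice-sum approximation of C_0(a) in Eq. (9).

    a    : distinct positive characteristic roots a_0 < ... < a_{M-1}
    n    : spatial dimension (requires 1 <= n <= 4M-1 for convergence)
    lmax : truncation radius of the lattice Z^n (illustrative choice;
           cost grows like (2*lmax+1)^n)
    """
    a = np.asarray(a, dtype=float)
    M = len(a)
    # b_j = 1/P'(-a_j) = 1/prod_{i != j}(a_i - a_j)
    b = np.array([1.0 / np.prod(np.delete(a, j) - a[j]) for j in range(M)])
    total = 0.0
    for l in itertools.product(range(-lmax, lmax + 1), repeat=n):
        c = 4.0 * np.pi**2 * sum(x * x for x in l)
        # the j,k double sum factorizes: (sum_j b_j/(a_j + 4 pi^2 |l|^2))^2
        total += np.sum(b / (a + c)) ** 2
    return total

print(best_constant_C0(a=[1.0, 2.0, 3.0], n=2))  # M = 3, so n <= 11 is allowed
```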

In the previous paper [2], we found the Green's function for the ordinary differential operator $P(d/dt)$ and its $L^2$-norm. The $L^2$-norm is the best constant of a Sobolev-type inequality, which estimates the square of the supremum of the absolute value of the output voltage from above by the power of the input voltage. The purpose of this paper is to extend this result to partial differential operators with periodic boundary conditions. We find the Green's functions for $L$ and $L_0$. The values $\|G\|^2$ and $\|G_0\|_0^2$ are the best constants of the Sobolev-type inequalities corresponding to $L$ and $L_0$.

2. The properties of functions

In this section, we list important properties of the functions $g(x,t)$ and $e(t)$ defined in the previous section. From the properties of the elliptic theta function [1, Section 12], we have
\[
\int_I h(x-y, t) h(y, s) \, dy = h(x, t+s),
\]
\[
h(-x, t) = h(x, t), \qquad h(1-x, t) = h(x, t) \qquad (0 < x < 1,\ -\infty < t < \infty).
\]
Using these properties, we have the following properties of $g(x,t)$:
\[
\int_{I^n} g(x-y, t) g(y, s) \, dy = g(x, t+s), \tag{11}
\]
\[
g(-x, t) = g(x, t), \qquad (x,t) \in I^n \times \mathbb{R}.
\]

At first, the value $g(0,t)$ is given by
\[
g(0,t) = \prod_{j=1}^n h(0,t) = \Bigl( \sum_{l \in \mathbb{Z}} \exp(-4\pi^2 l^2 t) \Bigr)^n = \prod_{j=1}^n \sum_{l_j \in \mathbb{Z}} \exp(-4\pi^2 l_j^2 t)
= \sum_{l = (l_1, \dots, l_n) \in \mathbb{Z}^n} \exp\Bigl( -4\pi^2 \sum_{j=1}^n l_j^2 \, t \Bigr)
= \sum_{l \in \mathbb{Z}^n} \exp(-4\pi^2 |l|^2 t).
\]
From these properties, we can calculate the $L^2$-norm of $g(x,t)$ as
\[
\int_{I^n} g^2(x,t) \, dx = \int_{I^n} g(x,t) g(-x,t) \, dx = g(0, 2t) = \sum_{l \in \mathbb{Z}^n} \exp(-8\pi^2 |l|^2 t). \tag{12}
\]

Here, for any fixed $m = 1, 2, 3, \dots$, the Euler-Maclaurin formula [3, Theorem 7.6]
\[
\int_0^m u(x) \, dx = \frac{1}{2} u(0) + \sum_{k=1}^{m-1} u(k) + \frac{1}{2} u(m)
- \sum_{j=1}^{[n/2]} b_{2j}(0) \bigl( u^{(2j-1)}(m) - u^{(2j-1)}(0) \bigr)
+ \int_0^1 \bigl( b_{n+1}(z) - b_{n+1}(0) \bigr) \sum_{k=0}^{m-1} u^{(n+1)}(k+1-z) \, dz \tag{13}
\]
holds, where $b_j(x)$ are the Bernoulli polynomials [3, 4] given by
\[
b_0(x) = 1, \qquad b_j'(x) = b_{j-1}(x), \qquad \int_0^1 b_j(x) \, dx = 0 \quad (j = 1, 2, 3, \dots).
\]

Putting $u(x) = \exp(-4\pi^2 x^2 t)$ in (13) and taking the limit $m \to \infty$, we have
\[
\sum_{l=1}^\infty u(l) = \int_0^\infty u(x) \, dx - \frac{1}{2} u(0) - \frac{1}{2} u(\infty)
+ \sum_{j=1}^{[n/2]} b_{2j}(0) \bigl( u^{(2j-1)}(\infty) - u^{(2j-1)}(0) \bigr)
- \int_0^1 \bigl( b_{n+1}(z) - b_{n+1}(0) \bigr) \sum_{k=0}^\infty u^{(n+1)}(k+1-z) \, dz
= \frac{1}{4\sqrt{\pi}}\, t^{-1/2} + \cdots = O(t^{-1/2}) \quad (t \to +0).
\]

Hence we have
\[
g(0,t) = \Bigl( \sum_{l \in \mathbb{Z}} \exp(-4\pi^2 l^2 t) \Bigr)^n = O(t^{-n/2}) \quad (t \to +0). \tag{14}
\]
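As a quick numerical sanity check (ours, not in the paper), the one-dimensional theta sum can be compared with the leading small-$t$ behavior $\sum_{l \in \mathbb{Z}} e^{-4\pi^2 l^2 t} \approx \frac{1}{2\sqrt{\pi t}}$, which is consistent with the $O(t^{-1/2})$ estimate above.

```python
import numpy as np

def theta_sum(t, lmax=1000):
    """One-dimensional theta sum  sum_{l in Z} exp(-4 pi^2 l^2 t)."""
    l = np.arange(-lmax, lmax + 1)
    return np.sum(np.exp(-4.0 * np.pi**2 * l**2 * t))

for t in [1e-2, 1e-3, 1e-4]:
    exact = theta_sum(t)
    leading = 1.0 / (2.0 * np.sqrt(np.pi * t))  # predicted O(t^{-1/2}) term
    print(f"t={t:.0e}  sum={exact:.6f}  leading={leading:.6f}")
```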

Next we investigate $e(t)$. Since $e(t)$ is a linear combination of $e^{-a_j t}$, we have $P(d/dt)\, e(t) = 0$. Moreover, we have the following relation. Note that $e^{(k)}(0) = \sum_{j=0}^{M-1} b_j (-a_j)^k$, and that the partial fraction coefficients satisfy
\[
V \begin{pmatrix} b_0 \\ \vdots \\ b_{M-1} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},
\]
where $V$ is the $M \times M$ matrix with entries $(-a_j)^i$ ($0 \le i, j \le M-1$). Expressing $e^{(k)}(0)$ by Cramer's rule, the numerator determinant contains the row $(\dots, (-a_j)^k, \dots)$ together with the rows $(-a_j)^i$ ($0 \le i \le M-2$), while the denominator is $\det V$ ($0 \le i, j \le M-1$); for $0 \le k \le M-2$ two rows of the numerator coincide and the determinant vanishes, and for $k = M-1$ the numerator equals the denominator. So we have
\[
e^{(k)}(0) = \sum_{j=0}^{M-1} b_j (-a_j)^k =
\begin{cases}
0 & (0 \le k \le M-2), \\
1 & (k = M-1).
\end{cases}
\]

Using this relation, the Taylor expansion of $e(t)$ is given by
\[
e(t) = \sum_{j=0}^\infty \frac{1}{j!} e^{(j)}(0)\, t^j = \frac{1}{(M-1)!}\, t^{M-1} + \sum_{j=M}^\infty \frac{1}{j!} e^{(j)}(0)\, t^j.
\]
From this relation, we have
\[
e(t) = O(t^{M-1}) \quad (t \to +0). \tag{15}
\]

3. Green's functions

We introduce the eigenfunction
\[
\varphi(l, x) = \exp(\sqrt{-1}\, 2\pi \langle l, x \rangle), \qquad (l, x) \in \mathbb{Z}^n \times I^n,
\]
where $\langle l, x \rangle = \sum_{j=1}^n l_j x_j$ and $|x|^2 = \langle x, x \rangle$. For $l, m \in \mathbb{Z}^n$, $\varphi(l, x)$ satisfies
\[
\int_{I^n} \varphi(l, x) \overline{\varphi(m, x)} \, dx =
\begin{cases}
1 & (l = m), \\
0 & (l \ne m).
\end{cases}
\]
For arbitrary $f(x,t)$ having period 1 with respect to $x_j$ ($j = 1, 2, \dots, n$), we consider its Fourier transformation
\[
\hat{f}(l, \omega) = \int_{I^n \times \mathbb{R}} e^{-\sqrt{-1}\, \omega t} f(x,t) \overline{\varphi(l, x)} \, dx \, dt,
\]
\[
f(x,t) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{\sqrt{-1}\, t \omega} \sum_{l \in \mathbb{Z}^n} \hat{f}(l, \omega) \varphi(l, x) \, d\omega.
\]

Through the above relation, BVP is rewritten as
\[
\mathrm{BVP}:\quad P(\sqrt{-1}\, \omega + 4\pi^2 |l|^2)\, \hat{u}(l, \omega) = \hat{f}(l, \omega) \qquad (l, \omega) \in \mathbb{Z}^n \times \mathbb{R},
\]
which is solved as
\[
\hat{u}(l, \omega) = \hat{G}(l, \omega) \hat{f}(l, \omega), \qquad \hat{G}(l, \omega) = \frac{1}{P(\sqrt{-1}\, \omega + 4\pi^2 |l|^2)}.
\]
$u(x,t)$ is the inverse Fourier transformation of $\hat{u}(l, \omega)$, so BVP possesses the unique solution (1). $G(x,t)$ is the inverse Fourier transformation of $\hat{G}(l, \omega)$. In fact,
\[
\hat{G}(l, \omega) = \sum_{j=0}^{M-1} b_j \frac{1}{\sqrt{-1}\, \omega + 4\pi^2 |l|^2 + a_j}
= \sum_{j=0}^{M-1} b_j \int_0^\infty e^{-(\sqrt{-1}\, \omega + 4\pi^2 |l|^2 + a_j) t} \, dt
= \int_{\mathbb{R}} e^{-\sqrt{-1}\, \omega t} Y(t) e(t) e^{-4\pi^2 |l|^2 t} \, dt
= \int_{I^n \times \mathbb{R}} e^{-\sqrt{-1}\, \omega t} \bigl( Y(t) e(t) g(x,t) \bigr) \overline{\varphi(l, x)} \, dx \, dt,
\]
where we used the relation
\[
e^{-4\pi^2 |l|^2 t} = \int_{I^n} g(x,t) \overline{\varphi(l, x)} \, dx. \tag{16}
\]

Thus we have (3).

For arbitrary $f(x)$ having period 1 with respect to $x_j$ ($j = 1, 2, \dots, n$), we consider its Fourier transformation
\[
\hat{f}(l) = \int_{I^n} f(x) \overline{\varphi(l, x)} \, dx, \qquad f(x) = \sum_{l \in \mathbb{Z}^n} \hat{f}(l) \varphi(l, x).
\]
Then BVP$_0$ is rewritten as
\[
\mathrm{BVP}_0:\quad P(4\pi^2 |l|^2)\, \hat{u}(l) = \hat{f}(l) \qquad (l \in \mathbb{Z}^n),
\]
which is solved as
\[
\hat{u}(l) = \hat{G}_0(l) \hat{f}(l), \qquad \hat{G}_0(l) = \frac{1}{P(4\pi^2 |l|^2)}.
\]
$u(x)$ is the inverse Fourier transformation of $\hat{u}(l)$, so BVP$_0$ possesses the unique solution (2). $G_0(x)$ is the inverse Fourier transformation of $\hat{G}_0(l)$. In fact,
\[
\hat{G}_0(l) = \sum_{j=0}^{M-1} b_j \frac{1}{4\pi^2 |l|^2 + a_j}
= \sum_{j=0}^{M-1} b_j \int_0^\infty e^{-(4\pi^2 |l|^2 + a_j) t} \, dt
= \int_0^\infty e(t) e^{-4\pi^2 |l|^2 t} \, dt
= \int_{I^n} \Bigl( \int_0^\infty e(t) g(x,t) \, dt \Bigr) \overline{\varphi(l, x)} \, dx,
\]
where we used (16). Thus we have (4).

4. Proof of Theorem 1

Using (3) and (12), we have
\[
\|G\|^2 = \int_0^\infty e^2(t) g(0, 2t) \, dt. \tag{17}
\]
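For the reader's convenience, the "direct calculations" by which (6) follows from (17) (performed but not displayed in the paper) amount to inserting the series $e(t) = \sum_j b_j e^{-a_j t}$ and $g(0, 2t) = \sum_{l \in \mathbb{Z}^n} e^{-8\pi^2 |l|^2 t}$ into (17) and integrating term by term:
\[
\|G\|^2 = \sum_{j,k=0}^{M-1} \sum_{l \in \mathbb{Z}^n} b_j b_k \int_0^\infty e^{-(a_j + a_k + 8\pi^2 |l|^2) t} \, dt
= \sum_{j,k=0}^{M-1} \sum_{l \in \mathbb{Z}^n} \frac{b_j b_k}{a_j + a_k + 8\pi^2 |l|^2},
\]
which is exactly (6).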

From (14) and (15), if $1 \le n \le 4M-3$, then the integral (17) is finite; Eq. (6) then follows from (17) by the direct calculation displayed above. Exchanging $(x,t)$ and $(y,s)$ in (1), we have
\[
u(y,s) = \int_{I^n \times \mathbb{R}} G(y-x, s-t) f(x,t) \, dx \, dt. \tag{18}
\]
Considering that
\[
\int_{I^n \times \mathbb{R}} |G(y-x, s-t)|^2 \, dx \, dt = \|G\|^2
\]
and applying the Schwarz inequality to (18), we have
\[
|u(y,s)|^2 \le \|G\|^2 \|f\|^2 = \|G\|^2 \|Lu\|^2.
\]
Taking the supremum with respect to $(y,s)$, we obtain a Sobolev-type inequality,
\[
\Bigl( \sup_{(y,s) \in I^n \times \mathbb{R}} |u(y,s)| \Bigr)^2 \le \|G\|^2 \|Lu\|^2. \tag{19}
\]
Putting $u(x,t) = U(x,t)$ in (7), we have
\[
\Bigl( \sup_{(y,s) \in I^n \times \mathbb{R}} |U(y,s)| \Bigr)^2 \le \|G\|^2 \|LU\|^2 = \|G\|^4, \tag{20}
\]
where we used the relation $LU = G(-y, -s)$. Noting $U(0,0) = \|G\|^2$ and combining the trivial inequality
\[
\|G\|^4 = (U(0,0))^2 \le \Bigl( \sup_{(y,s) \in I^n \times \mathbb{R}} |U(y,s)| \Bigr)^2
\]
and (20), we show the equality of (19). This completes the proof of Theorem 1.

5. Proof of Theorem 2

Using (4) and (11), we have
\[
\|G_0\|_0^2 = \int_0^\infty \int_0^\infty e(t) e(s) g(0, t+s) \, dt \, ds. \tag{21}
\]
Here, $e(t) e(s) g(0, t+s) \to 0$ ($t, s \to \infty$) is obvious, so we will show
\[
\int\!\!\int_{0 \le t, s, t+s \le 1} e(t) e(s) (t+s)^{-n/2} \, dt \, ds < +\infty.
\]
For any $0 \le t, s, t+s \le 1$, because of (14) and (15), there exists a positive constant $\mathrm{const.}$ such that the inequalities $0 \le e(t) \le \mathrm{const.}\, t^{M-1}$ and $0 \le g(0, t+s) \le \mathrm{const.}\, (t+s)^{-n/2}$ hold. Hence, we have
\[
0 \le e(t) e(s) g(0, t+s) \le \mathrm{const.}\, (ts)^{M-1} (t+s)^{-n/2}
= \mathrm{const.} \Bigl( \frac{t}{t+s} \Bigr)^{M-1} \Bigl( \frac{s}{t+s} \Bigr)^{M-1} (t+s)^{2M-2-n/2}
\le \mathrm{const.}\, (t+s)^{(4M-n)/2 - 2}. \tag{22}
\]
Using the relation (22) and the change of variables $\tau = t+s$, $\sigma = t-s$, we have
\[
0 \le \int\!\!\int_{0 \le t, s, t+s \le 1} e(t) e(s) g(0, t+s) \, dt \, ds
\le \mathrm{const.} \int\!\!\int_{0 \le t, s, t+s \le 1} (t+s)^{(4M-n)/2 - 2} \, dt \, ds
= \mathrm{const.}\, \frac{1}{2} \int_0^1 \int_{-\tau}^{\tau} \tau^{(4M-n)/2 - 2} \, d\sigma \, d\tau
= \mathrm{const.} \int_0^1 \tau^{(4M-n)/2 - 1} \, d\tau.
\]
The last integral is finite if $4M - n > 0$, that is, $1 \le n \le 4M-1$. Applying the change of variables $\tau = t+s$, $\sigma = t-s$ in (21), we have
\[
\|G_0\|_0^2 = \frac{1}{2} \int_0^\infty \int_{-\tau}^{\tau} e\Bigl( \frac{\tau+\sigma}{2} \Bigr) e\Bigl( \frac{\tau-\sigma}{2} \Bigr) \, d\sigma \, g(0, \tau) \, d\tau.
\]
Eq. (9) is shown through direct calculations from (21). Exchanging $x$ and $y$ in (2), we have
\[
u(y) = \int_{I^n} G_0(y-x) f(x) \, dx. \tag{23}
\]
Considering that
\[
\int_{I^n} |G_0(y-x)|^2 \, dx = \|G_0\|_0^2
\]
and applying the Schwarz inequality to (23), we have
\[
|u(y)|^2 \le \|G_0\|_0^2 \|f\|_0^2 = \|G_0\|_0^2 \|L_0 u\|_0^2.
\]
Taking the supremum with respect to $y$, we obtain a Sobolev-type inequality,
\[
\Bigl( \sup_{y \in I^n} |u(y)| \Bigr)^2 \le \|G_0\|_0^2 \|L_0 u\|_0^2. \tag{24}
\]
Putting $u(x) = U_0(x)$ in (10), we have
\[
\Bigl( \sup_{y \in I^n} |U_0(y)| \Bigr)^2 \le \|G_0\|_0^2 \|L_0 U_0\|_0^2 = \|G_0\|_0^4, \tag{25}
\]
where we used the relation $L_0 U_0 = G_0(-y)$. Noting $U_0(0) = \|G_0\|_0^2$ and combining the trivial inequality
\[
\|G_0\|_0^4 = (U_0(0))^2 \le \Bigl( \sup_{y \in I^n} |U_0(y)| \Bigr)^2
\]
and (25), we show the equality of (24). This completes the proof of Theorem 2.

References

[1] S. Moriguchi, K. Udagawa and S. Hitotsumatsu, Iwanami Sugaku Koshiki III (in Japanese), Iwanami, Tokyo, 1960.

[2] Y. Kametaka, K. Takemura, H. Yamagishi, A. Nagai and K. Watanabe, Heaviside cable, Thomson cable and the best constant of a Sobolev-type inequality, Sci. Math. Jpn., 68 (2008), 63–79.

[3] T. Yamamoto, Suchikaisekinyumon (in Japanese), Saiensusha, Tokyo, 1976.

[4] H. Yamagishi, Y. Kametaka, A. Nagai, K. Watanabe and K. Takemura, Riemann zeta function and the best constants of five series of Sobolev inequalities, RIMS Kokyuroku Bessatsu, B13 (2009), 125–139.


JSIAM Letters Vol.4 (2012) pp.5–8 © 2012 Japan Society for Industrial and Applied Mathematics

A conservative compact finite difference scheme

for the KdV equation

Hiroki Kanazawa1, Takayasu Matsuo1 and Takaharu Yaguchi2

1 The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan

2 Kobe University, Rokkodai-cho 1-1, Nada-ku, Kobe 657-8501, Japan

E-mail hiroki kanazawa mist.i.u-tokyo.ac.jp

Received October 3, 2011, Accepted January 10, 2012

Abstract

We propose a new structure-preserving integrator for the Korteweg-de Vries (KdV) equation. In this integrator, two independent structure-preserving techniques are newly combined: the "discrete variational derivative method" for constructing invariant-preserving integrators, and the "compact finite difference method," which is widely used in the area of numerical fluid dynamics for resolving wave propagation phenomena. Numerical experiments show that the new integrator is in fact more advantageous than the existing integrators.

Keywords discrete variational derivative method, compact finite difference method, conservative scheme

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

In this report, we consider the numerical integration of the Korteweg-de Vries (KdV) equation
\[
\frac{\partial u}{\partial t} + u \frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0 \tag{1}
\]

on the torus of length $L$ (i.e., we assume the periodic boundary condition). It is an integrable soliton equation describing shallow water waves.

For such an integrable equation, certain structure-preserving numerical methods, for example the "discrete variational derivative method" (DVDM) [1], are generally advantageous. In fact, Furihata [2] constructed a conservative scheme for KdV using DVDM, and confirmed that it gave stable and qualitatively better numerical solutions.

On the other hand, in the field of numerical fluid dynamics, it is a common practice to use the so-called "compact finite difference method" for wave equations when correct wave behaviors are of importance; in this method, a numerical scheme is constructed so that it retains as correct a dispersion relation as possible while the "stencil" (the width of a difference operator) is kept "compact" (i.e., narrow). For KdV, a compact finite difference scheme was tested in [3] and proved to be actually suitable for the equation.

Although the above two methodologies share the same target (wave equations) and goal (better qualitative behaviors), it seems that attempts to combine them have rarely been made so far, except for the simple cases where only linear or quadratic invariants are of interest and thus conservation can be relatively easily accomplished without utilizing any structure-preserving methods such as DVDM: we can find a number of "conservative" compact finite difference schemes for the so-called "conservation laws" of the form $u_t + (f(u))_x = 0$ (which trivially preserve $\int u \, dx$); as an example of the quadratic cases, we refer to [4], where a Strang splitting compact finite difference scheme for the nonlinear Schrodinger equation preserving $\int |u|^2 dx$ was proposed. In more general cases, however, the help of structure-preserving methods is indispensable. In this report, we show, taking KdV as our example, that the above-mentioned two methodologies can in fact be combined.

2. Compact finite difference operators

Below, the idea of the compact finite difference operator is summarized based on Lele [5]. Given a smooth function $f(x)$, we approximate it by $f_i$ ($i = 0, \dots, N-1$) on the equispaced mesh with mesh size $\Delta x = L/N$. Hereafter we always assume the discrete periodic boundary condition $f_{i \pm N} = f_i$, and also that the values outside $i = 0, \dots, N-1$ are periodically defined.

Typical compact finite difference operators for $\partial/\partial x$ are defined in the following form:
\[
\delta_c^{\langle 1 \rangle} f_i + \alpha \bigl( \delta_c^{\langle 1 \rangle} f_{i+1} + \delta_c^{\langle 1 \rangle} f_{i-1} \bigr) + \beta \bigl( \delta_c^{\langle 1 \rangle} f_{i+2} + \delta_c^{\langle 1 \rangle} f_{i-2} \bigr)
= a \frac{f_{i+1} - f_{i-1}}{2\Delta x} + b \frac{f_{i+2} - f_{i-2}}{4\Delta x} + c \frac{f_{i+3} - f_{i-3}}{6\Delta x}, \tag{2}
\]

where $\alpha, \beta, a, b$, and $c$ are real constants that characterize the compact finite difference operator. Note that, when $\alpha = \beta = 0$, the definition (2) reduces to the standard central difference operators; for example, when
\[
\alpha = \beta = 0, \quad a = \frac{3}{2}, \quad b = -\frac{3}{5}, \quad c = \frac{1}{10}, \tag{C6}
\]
the sixth order (i.e., $O(\Delta x^6)$) standard central finite difference operator is recovered. Otherwise, the values of

the compact differences $\delta_c^{\langle 1 \rangle} f_i$ are determined only implicitly: a tri- (when $\beta = 0$) or penta-diagonal (otherwise) linear system should be solved to obtain the values of $\delta_c^{\langle 1 \rangle} f_i$ (in this sense, the operator is global). An example of the compact finite difference operator is
\[
\alpha = \frac{1}{6}, \quad \beta = 0, \quad a = \frac{14}{9}, \quad b = \frac{1}{9}, \quad c = 0, \tag{T6}
\]
which attains the same order of accuracy as (C6) (i.e., $O(\Delta x^6)$) while referring to only three grid points ($i$, $i \pm 1$). This contrasts with (C6), which requires five points for the same accuracy; the name "compact" finite difference comes from this property. Another interesting choice of the parameters is (in double precision)
\[
\alpha = 0.5381301488732363, \quad \beta = 0.066633190123881123,
\]
\[
a = 1.367577724399269, \quad b = 0.8234281701082790, \quad c = 0.018520783486686603, \tag{S6}
\]
which also attains $O(\Delta x^6)$. Since (S6) refers to five points, it apparently does not seem "compact" in the present context. Still it is called so, for the following reason. (S6) refers to five grid points on both the left and right hand sides of (2); in this setting, the best attainable order of accuracy is $O(\Delta x^{10})$. The choice (S6), however, stays only at $O(\Delta x^6)$, and instead uses the remaining degrees of freedom of the coefficients to replicate the dispersion relation of waves as well as possible, practically at a level quite close to the so-called spectral difference operator, which is a purely global operator involving the FFT (recall that (S6) only involves a penta-diagonal linear system). Due to this, (S6) is called the "spectral-like" (sixth order) compact finite difference operator (see [6, 7] for the details).

Next, let us consider the skew symmetry of the difference operators, which plays a crucial role in the subsequent section. As is well known, the standard central difference operator (C6) is skew-symmetric: for any $N$-periodic sequences $f_i, g_i$,
\[
\sum_{i=0}^{N-1} f_i \, \delta_c^{\langle 1 \rangle} g_i \, \Delta x = - \sum_{i=0}^{N-1} \bigl( \delta_c^{\langle 1 \rangle} f_i \bigr) g_i \, \Delta x \tag{3}
\]
(see [8] for a proof). (T6) and (S6) also enjoy this property.

Lemma 1 The compact finite difference operators characterized by (T6) and (S6) are also skew-symmetric.

Proof Let us write $\boldsymbol{f} = (f_0, \dots, f_{N-1})^\top$ and $\delta_c^{\langle 1 \rangle} \boldsymbol{f} = (\delta_c^{\langle 1 \rangle} f_0, \dots, \delta_c^{\langle 1 \rangle} f_{N-1})^\top$, and rewrite (2) in matrix-vector form: $T \delta_c^{\langle 1 \rangle} \boldsymbol{f} = S \boldsymbol{f} / \Delta x$, where $T$ and $S$ are the coefficient matrices determined by (2). The matrix $T$ is invertible in (T6) and (S6). $T$ and $S$ are circulant matrices, which means they are commutative. Furthermore, $T$ is obviously symmetric ($T^\top = T$), and $S$ skew-symmetric ($S^\top = -S$). Gathering these facts, we conclude $(T^{-1} S)^\top = S^\top T^{-\top} = T^{-\top} S^\top = -T^{-1} S$, which is the desired skew symmetry.

(QED)
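The matrix-vector form $T \delta_c^{\langle 1 \rangle} \boldsymbol{f} = S \boldsymbol{f} / \Delta x$ used in this proof is easy to check numerically. The sketch below (ours, not from the paper; dense circulant matrices are used for simplicity) builds (T6), differentiates a test function, and verifies the skew symmetry of Lemma 1.

```python
import numpy as np

def circulant_ts(N, alpha, beta, a, b, c, dx):
    """Circulant matrices of the compact difference (2): T d = (S/dx) f."""
    T = np.zeros((N, N))
    S = np.zeros((N, N))
    for i in range(N):
        T[i, i] = 1.0
        T[i, (i + 1) % N] = T[i, (i - 1) % N] = alpha
        T[i, (i + 2) % N] = T[i, (i - 2) % N] = beta
        S[i, (i + 1) % N] = a / 2.0;  S[i, (i - 1) % N] = -a / 2.0
        S[i, (i + 2) % N] = b / 4.0;  S[i, (i - 2) % N] = -b / 4.0
        S[i, (i + 3) % N] = c / 6.0;  S[i, (i - 3) % N] = -c / 6.0
    return T, S / dx

N, L = 64, 2.0 * np.pi
dx = L / N
x = dx * np.arange(N)
T, S = circulant_ts(N, alpha=1/6, beta=0.0, a=14/9, b=1/9, c=0.0, dx=dx)  # (T6)

f = np.sin(x)
df = np.linalg.solve(T, S @ f)          # delta_c f: the implicit (tridiagonal) solve
print(np.max(np.abs(df - np.cos(x))))   # O(dx^6) error against the exact derivative

D = np.linalg.solve(T, S)               # full operator T^{-1} S
print(np.max(np.abs(D + D.T)))          # ~ machine epsilon: skew symmetry of Lemma 1
```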

3. A conservative compact finite difference scheme for the KdV equation

Now we are in a position to demonstrate how we can construct a conservative scheme using the compact finite difference operators. Let us consider KdV (1) as our example. In what follows, we basically follow the procedure of the discrete variational derivative method (DVDM) [1]. In this case, KdV (1) should first be rewritten into the variational form
\[
\frac{\partial u}{\partial t} = \frac{\partial}{\partial x} \Bigl( -\frac{u^2}{2} - \frac{\partial^2 u}{\partial x^2} \Bigr) = \frac{\partial}{\partial x} \frac{\delta G}{\delta u}, \qquad G = -\frac{u^3}{6} + \frac{u_x^2}{2},
\]

where $G$ is the energy density function. Then it is straightforward to show that KdV is conservative in the following sense:
\[
\frac{d}{dt} \int_0^L G \, dx = \int_0^L \frac{\delta G}{\delta u} u_t \, dx = \int_0^L \frac{\delta G}{\delta u} \Bigl( \frac{\partial}{\partial x} \frac{\delta G}{\delta u} \Bigr) dx = 0. \tag{4}
\]

In the DVDM, we try to mimic the variational form in the discrete setting. Let us denote the approximate solution by $U_i^m \simeq u(i \Delta x, m \Delta t)$ ($\Delta t$ is the size of the time mesh). We also write $\boldsymbol{U}^m = (U_0^m, \dots, U_{N-1}^m)^\top$ to save space. We then commence by defining a discrete energy function with $\delta_c^{\langle 1 \rangle}$:
\[
(G_d(\boldsymbol{U}^m))_i = -\frac{1}{6} (U_i^m)^3 + \frac{1}{2} \bigl( \delta_c^{\langle 1 \rangle} U_i^m \bigr)^2.
\]
There is a degree of freedom in this definition, but in this short report, we only consider this simplest case (see also the concluding remark below). Next, we define a discrete version of the variational derivative by
\[
\frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} = -\frac{(U_i^{m+1})^2 + U_i^{m+1} U_i^m + (U_i^m)^2}{6} - \bigl( \delta_c^{\langle 1 \rangle} \bigr)^2 \Bigl( \frac{U_i^{m+1} + U_i^m}{2} \Bigr), \tag{5}
\]

which is obviously an approximation to the true variational derivative. It is an easy exercise to show that the discrete derivative (5) satisfies
\[
\sum_{i=0}^{N-1} \bigl( G_d(\boldsymbol{U}^{m+1})_i - G_d(\boldsymbol{U}^m)_i \bigr) \Delta x
= \sum_{i=0}^{N-1} \frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} \bigl( U_i^{m+1} - U_i^m \bigr) \Delta x. \tag{6}
\]
Finally, we define a scheme as follows: for $m = 0, 1, 2, \dots$,
\[
\frac{U_i^{m+1} - U_i^m}{\Delta t} = \delta_c^{\langle 1 \rangle} \frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} \qquad (i = 0, \dots, N-1). \tag{7}
\]

The scheme is conservative in the following sense, which corresponds to (4).

Theorem 1 The solutions of the scheme (7) enjoy
\[
\sum_{i=0}^{N-1} G_d(\boldsymbol{U}^{m+1})_i \, \Delta x = \sum_{i=0}^{N-1} G_d(\boldsymbol{U}^m)_i \, \Delta x \qquad (m = 0, 1, 2, \dots).
\]

Proof From (6) and (7), we see
\[
\frac{1}{\Delta t} \sum_{i=0}^{N-1} \bigl( G_d(\boldsymbol{U}^{m+1})_i - G_d(\boldsymbol{U}^m)_i \bigr) \Delta x
= \sum_{i=0}^{N-1} \frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} \Bigl( \frac{U_i^{m+1} - U_i^m}{\Delta t} \Bigr) \Delta x
= \sum_{i=0}^{N-1} \frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} \, \delta_c^{\langle 1 \rangle} \frac{\delta G_d}{\delta(\boldsymbol{U}^{m+1}, \boldsymbol{U}^m)_i} \, \Delta x = 0. \tag{8}
\]
The last equality follows from the skew symmetry of $\delta_c^{\langle 1 \rangle}$ (Lemma 1).

(QED)
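Since (7) is implicit in $\boldsymbol{U}^{m+1}$, a nonlinear solve is needed at each step. The paper does not specify its nonlinear solver; the sketch below (ours, assuming the matrix `D` of $\delta_c^{\langle 1 \rangle}$ from the previous snippet) uses plain fixed-point iteration as one simple choice.

```python
import numpy as np

def dvdm_step(U, D, dt, tol=1e-12, maxit=200):
    """One step of scheme (7) by fixed-point iteration on U^{m+1}.

    U  : current solution U^m (length-N array)
    D  : matrix of delta_c^<1> (e.g. D = solve(T, S) from the previous sketch)
    Note: fixed-point iteration is our illustrative choice of solver and
    may need a small dt to converge; the paper does not specify a solver.
    """
    D2 = D @ D                      # (delta_c^<1>)^2 as used in Eq. (5)
    V = U.copy()                    # initial guess for U^{m+1}
    for _ in range(maxit):
        dGd = -(V**2 + V * U + U**2) / 6.0 - D2 @ ((V + U) / 2.0)  # Eq. (5)
        V_new = U + dt * (D @ dGd)                                  # Eq. (7)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("fixed-point iteration did not converge")
```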

We show a numerical example. We take $L = 50$ and employ the initial condition $u(x,0) = 3\,\mathrm{sech}^2(0.5x)$ (strictly speaking, we truncate it and place it on the torus at $x = 0$). Other parameters are set to $\Delta x = 0.5$ (i.e., $N = 100$), $\Delta t = 1/50$. Then scheme (7) is tested with (C6) and (S6) as $\delta_c^{\langle 1 \rangle}$. Note that, as mentioned above, (C6) is just a standard central difference operator, and in this case the conservation has already been proved in [8]. Since the scheme (7) is $O(\Delta t^2)$, we also try the Heun method as the time stepping applied to the ordinary differential equation
\[
\frac{d \boldsymbol{U}}{dt} = -\boldsymbol{U} * \delta_c^{\langle 1 \rangle} \boldsymbol{U} - \delta_c^{\langle 1 \rangle} \delta_c^{\langle 2 \rangle} \boldsymbol{U}, \tag{9}
\]
where $\boldsymbol{U}(t) = (u_0(t), \dots, u_{N-1}(t))^\top$ is the semi-discretization of $u(x,t)$, and the symbol $*$ represents the Hadamard product (the elementwise product). $\delta_c^{\langle 2 \rangle}$ is the difference operator for $\partial^2/\partial x^2$, which is chosen to be the standard sixth order central difference operator when $\delta_c^{\langle 1 \rangle}$ is approximated by (C6), and the sixth order spectral-like compact finite difference operator when $\delta_c^{\langle 1 \rangle}$ is taken to be (S6) (see [7] for the definition; we omit its description here due to the restriction of space; see also the concluding remark below). In summary, we test the following four schemes:

• Heun method applied to (9) with (C6) as $\delta_c^{\langle 1 \rangle}$,

• Heun method applied to (9) with (S6),

• Scheme (7) with (C6),

• Scheme (7) with (S6).

Only the last two are conservative.

Fig. 1 shows the evolution of the numerical solutions. The results by the Heun method (the top two graphs) are catastrophic; they obviously need a much finer time mesh for stable computation. This instability can also be understood from Fig. 2, which shows the energy evolutions: in the Heun schemes, the energies rapidly diverge, which agrees with the severe instability. On the other hand, the results by the scheme (7) successfully preserve the energy as planned (Fig. 2), and capture the soliton propagation at a satisfactory level in both cases of $\delta_c^{\langle 1 \rangle}$ ((C6) and (S6), the bottom two graphs in Fig. 1). This means that the special structure-preserving time stepping of scheme (7) is in fact more advantageous than the generic Heun method.

Next, let us have a closer look at the difference between (C6) and (S6) to see if the compact finite difference operator (S6) is in fact more advantageous than (C6). In order to see this, we try a coarser space mesh: the parameters are set to $L = 90$, $\Delta x = 0.9$ (i.e., $N = 100$),

Fig. 1. Evolution of the soliton solution: (top) Heun+(9)+(C6), (2nd) Heun+(9)+(S6), (3rd) scheme (7)+(C6), (bottom) scheme (7)+(S6).

$\Delta t = 1/40$. The initial data is chosen to be the same as before. Fig. 3 is the magnified detail of the soliton profiles by scheme (7) with (C6) (shown in red) and (S6) (in blue), around $u = 0$ at $t = 10$. In the figure, it can be clearly observed that the result by (C6) exhibits undesirable small oscillations in the right half of the space interval, i.e., at the tail of the moving soliton. This should be attributed to the fact that the standard central finite difference operator (C6) does not preserve the correct


Fig. 2. Evolution of the discrete energies for the four schemes: (7) with (C6), (7) with (S6), Heun (9) with (C6), Heun (9) with (S6).

Fig. 3. Soliton profile detail around $u = 0$ at $t = 10$; (red) detail by (C6), (blue) by (S6).

dispersion relation. The result by (S6) is far better, from which we conclude that the compact finite difference method is really suitable for wave propagation phenomena.

Wrapping up the above observations, we conclude that the combination of the structure-preserving method (DVDM) and the compact finite difference method yields a strong new integrator for KdV. It is expected that the combination is also useful for other wave equations.

4. Further discussions

In this report, we showed that the so-called compact finite difference method can be incorporated into the discrete variational derivative method (DVDM) to construct a conservative numerical scheme which well replicates wave behaviors. The key was the skew symmetry of the compact finite difference operators. Although this can be easily understood, as was shown in Lemma 1, the authors do not know of any reference in which this fact was explicitly written. We also showed several numerical examples, which confirmed the effectiveness of the conservative compact finite difference scheme for KdV.

We would like to make several remarks on this work. Firstly, although in this report we concentrated mainly on the compact finite difference operator (S6), the story can be easily extended to other compact finite difference operators of the form (2) (and further general forms with wider stencils).

Secondly, notice that in the scheme (7) (with the discrete variational derivative (5)), $\partial^3/\partial x^3$ in KdV was approximated by $(\delta_c^{\langle 1 \rangle})^3$. This is, however, obviously not optimal. To understand this, let us first consider the operator $\partial^2/\partial x^2$. Usually, when the approximation is done by the standard finite differences (i.e., not by the compact finite differences), the operator is approximated by $\delta^{\langle 2 \rangle} f_i = (f_{i+1} - 2f_i + f_{i-1})/\Delta x^2$, instead of using the product of the first order difference operator $\delta^{\langle 1 \rangle} f_i = (f_{i+1} - f_{i-1})/(2\Delta x)$. This is absolutely the preferable choice, because the former has a narrower and thus better stencil than the latter. Similarly, the operator $\partial^3/\partial x^3$ is usually approximated by $\delta^{\langle 3 \rangle} = \delta^{\langle 1 \rangle} \delta^{\langle 2 \rangle}$, instead of $(\delta^{\langle 1 \rangle})^3$. In fact, in Furihata [2], $\delta^{\langle 1 \rangle} \delta^{\langle 2 \rangle}$ was employed in the conservative scheme for KdV. Getting back to the compact finite difference case, there are also compact finite difference operators for $\partial^2/\partial x^2$, denoted by $\delta_c^{\langle 2 \rangle}$ here, which are generally preferable to $(\delta_c^{\langle 1 \rangle})^2$. Accordingly, $\partial^3/\partial x^3$ in KdV should be approximated by $\delta_c^{\langle 1 \rangle} \delta_c^{\langle 2 \rangle}$, instead of $(\delta_c^{\langle 1 \rangle})^3$ as in the scheme (7). This, however, seriously complicates the situation, where we would need to reconstruct the system of the compact finite difference operators so that it fits better to DVDM.

The above points will be discussed in detail in our forthcoming paper [9] (see also [10]).

Acknowledgments

This work was partly supported by Grant-in-Aid for Scientific Research (C) and for Young Scientists (B).

References

[1] D. Furihata and T. Matsuo, Discrete Variational Derivative Method—A Structure-Preserving Numerical Method for Partial Differential Equations, CRC Press, Boca Raton, 2011.

[2] D. Furihata, Finite difference schemes for ∂u/∂t = (∂/∂x)^α (δG/δu) that inherit energy conservation or dissipation property, J. Comput. Phys., 156 (1999), 181–205.

[3] J. Li and M. R. Visbal, High-order compact schemes for nonlinear dispersive waves, J. Sci. Comput., 26 (2006), 1–23.

[4] M. Dehghan and A. Taleei, A compact split-step finite difference method for solving the nonlinear Schrodinger equations with constant and variable coefficients, Comput. Phys. Comm., 181 (2010), 43–51.

[5] S. K. Lele, Compact finite difference schemes with spectral-like resolution, J. Comput. Phys., 103 (1992), 16–42.

[6] T. Colonius and S. K. Lele, Computational aeroacoustics: progress on nonlinear problems of sound generation, Prog. Aerosp. Sci., 40 (2004), 345–416.

[7] C. Lui and S. K. Lele, Direct numerical simulation of spatially developing, compressible, turbulent mixing layers, AIAA Paper, 2001-0291 (2001).

[8] T. Matsuo, M. Sugihara, D. Furihata and M. Mori, Spatially accurate dissipative or conservative finite difference schemes derived by the discrete variational method, Japan J. Indust. Appl. Math., 19 (2002), 311–330.

[9] H. Kanazawa, T. Matsuo and T. Yaguchi, Discrete variational derivative method based on the compact finite differences (in Japanese), in preparation.

[10] H. Kanazawa, Application of the compact-difference method to a structure-preserving numerical method (in Japanese), bachelor's thesis, The Univ. of Tokyo, March 2011.


JSIAM Letters Vol.4 (2012) pp.9–12 © 2012 Japan Society for Industrial and Applied Mathematics

Algorithm for solving Jordan problem of block Schur form

Takuya Matsumoto1, Kenji Kudo2, Yutaka Kuwajima1 and Takaomi Shigehara1

1 Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan

2 Infoscience Corporation, Infoscience Bldg., 2-4-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan

E-mail sigehara@mail.saitama-u.ac.jp

Received September 29, 2011, Accepted December 24, 2011

Abstract

We propose a numerical algorithm to compute a Jordan basis as well as the Jordan canonical form for matrices of block Schur form (BSF). Combining it with the standard preprocessing which reduces a square matrix to BSF, we establish an efficient numerical algorithm, using only unitary processes, that reveals the full detail of the Jordan structure of an arbitrarily given square matrix.

Keywords Jordan canonical form, Jordan basis, block Schur form

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

It is one of the most difficult issues in numerical linear algebra to reveal the Jordan structure of square matrices. This is mainly because the matrices under consideration are ill-conditioned in general and nonunitary processes are required to determine the full detail of the Jordan structure [1–4]. In [5], we proposed a new numerical algorithm (JBA) for the Jordan problem. JBA is based on unitary deflation [2], and it computes not only the Jordan canonical form (JCF) but also a Jordan basis (JB) by using only successive unitary processes including the singular value decomposition (SVD).

JBA is based on the fact that for an $n \times n$ input square matrix $A$, the Jordan structure of the generalized eigenspace $G_A(\lambda)$ corresponding to the eigenvalue $\lambda$ of $A$ is completely determined by the Jordan structure of $G_{F'}(0)$ for $F' = SI$, where $A - \lambda E_n = IS$ is a decomposition of $A - \lambda E_n$ with an injective matrix $I$ and a surjective matrix $S$. Recursive applications of this feature make it possible to construct a JB of $G_A(\lambda)$ only with unitary processes, since the IS decomposition is obtained by the SVD. However, JBA has the following problems to be settled: (a) the eigenvalues of $A$ are just inputs for JBA; namely, they should be computed outside the framework of JBA in advance; (b) the SVD of $A - \lambda E_n$ is required for each eigenvalue $\lambda$ of $A$; (c) JBA requires an automatic handling mechanism to control tiny singular values, and although it enhances the numerical accuracy drastically, it causes a considerable increase of computational cost.

To remedy these problems, we propose in this paper an efficient algorithm which computes the JCF and a JB for matrices of block Schur form (BSF). Combining it with the standard preprocessing by the QR method, which reduces the input matrix to BSF only with unitary processes, we establish an economical numerical algorithm (JBA-BSF) for the Jordan problem for any input square matrix. Numerical experiments show that the introduction of the reduction process to BSF serves not only to reduce the computational cost substantially, but also to keep the numerical accuracy. Indeed, the handling mechanism of tiny singular values is not needed in the revised algorithm.

of the reduction process to BSF serves not only to re-duce computational cost substantially, but also to keepthe numerical accuracy. Indeed, the handling mechanismof tiny singular values is not needed in the revised algo-rithm.

2. Notations and definitions

Denote the set of n-dimensional complex vectors andthe set of n1×n2 complex matrices by Cn and Cn1×n2 ,respectively. Denote the n-dimensional identity matrix

by En and the ith column vector of En by e(n)i .

Let λ be a complex constant. For A ∈ Cn×n, an or-dered sequence (x1, . . . ,xl) of l vectors in Cn with theproperty

(A− λEn)xk = xk−1 (k = 1, . . . , l), x0 ≡ 0 (1)

is called a Jordan sequence (JS) of length l associatedwith the eigenvalue λ of A. The set of JSs associatedwith the eigenvalue λ of A is denoted by JSA(λ). A setof JSs for A such that the vectors in the JSs compose abasis of Cn is called a Jordan basis (JB) for A. A Jordancell of size l associated with the eigenvalue λ is denotedby Jl(λ).For A ∈ Cn×n, the eigenspace and the generalized

eigenspace associated with an eigenvalue λ of A are de-noted by EA(λ) and GA(λ), respectively.Let A ∈ Cn×n be a singular matrix. A decomposition

A = IS with an injective I ∈ Cn×r and a surjectiveS ∈ Cr×n is called an IS decomposition of A, where r(< n) is the rank of A.

3. Theoretical aspects

3.1 Brief review of JBA

Let $A \in \mathbb{C}^{n \times n}$ be an input matrix. Assume that $A$ has $m$ distinct eigenvalues $\lambda_i$ ($i = 1, \dots, m \le n$), which are known in advance. For each eigenvalue $\lambda_i$, a JB of $G_A(\lambda_i)$ is computed by the following procedure. Set $F \equiv A - \lambda_i E_n$; then $F$ has the eigenvalue zero and hence the rank $r$ of $F$ is less than $n$. By using an IS decomposition $F = IS$ of $F$, define $F' \equiv SI \in \mathbb{C}^{r \times r}$.

Theorem 1 If $G_{F'}(0)$ has a JB
\[
(x'_{j;1}, \dots, x'_{j;l_j}) \in \mathrm{JS}_{F'}(0) \quad (j = 1, \dots, s),
\]
then $G_F(0) = G_A(\lambda_i)$ has a JB
\[
(x_{j;1}, \dots, x_{j;l_j}, x_{j;l_j+1}) \in \mathrm{JS}_F(0) \quad (j = 1, \dots, s),
\]
\[
(x_{s+j;1}) \in \mathrm{JS}_F(0) \quad (j = 1, \dots, t).
\]
Here $x_{j;l} \equiv I x'_{j;l}$ ($j = 1, \dots, s$; $l = 1, \dots, l_j$), and $x_{j;l_j+1}$ ($j = 1, \dots, s$) is a solution of the linear system $S x_{j;l_j+1} = x'_{j;l_j}$, which is solvable in general. The vectors $x_{s+j;1}$ ($j = 1, \dots, t$) are chosen such that the vectors $x_{j;1}$ ($j = 1, \dots, s+t$) form a basis of $E_F(0) = E_A(\lambda_i)$.

A recursive usage of Theorem 1 makes it possible to construct a JB of $G_A(\lambda_i)$. More precisely, we repeat the deflation process $F^{(k)} = (F^{(k-1)})'$ ($k = 1, 2, \dots, k'$) with $F^{(0)} \equiv F$ until $F^{(k')}$ is reduced to a nonsingular (invertible) matrix. Then $F^{(k'-1)}$ has a trivial Jordan structure: $G_{F^{(k'-1)}}(0) = E_{F^{(k'-1)}}(0)$. From this, we construct the Jordan structures of $F^{(k'-2)}, \dots, F^{(0)} = F$ successively by Theorem 1. By applying this procedure to all the eigenvalues of $A$, one can obtain a JB of $\mathbb{C}^n = \oplus_{i=1}^m G_A(\lambda_i)$ for $A$.

In [5], an IS decomposition of $F$ is obtained by setting $I = U \in \mathbb{C}^{n \times r}$ and $S = DV^* \in \mathbb{C}^{r \times n}$, where $F = UDV^*$ is the SVD of $F$. In order to determine the rank $r$ of $F$ numerically, the tiny singular values less than a threshold value are set to zero. To keep the numerical accuracy, JBA has an automatic mechanism to move the threshold value within a certain range which substantially covers the entire region of tiny singular values.
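The SVD-based IS decomposition is simple to express in code; the following sketch (ours, with an illustrative rank threshold `eps`) mirrors the construction $I = U$, $S = DV^*$:

```python
import numpy as np

def is_decomposition(F, eps=1e-10):
    """SVD-based IS decomposition F = I S, following the construction of [5].

    F = U D V*;  I = U_r (injective, n x r),  S = D_r V_r* (surjective, r x n),
    where r is the numerical rank: singular values below eps * sigma_max are
    treated as zero (the threshold eps is an illustrative choice).
    """
    U, d, Vh = np.linalg.svd(F)
    r = int(np.sum(d > eps * d[0]))       # numerical rank
    I = U[:, :r]
    S = d[:r, None] * Vh[:r, :]           # = D_r V_r*
    return I, S

# one deflation step of JBA: F' = S I
F = np.array([[0.0, 1.0], [0.0, 0.0]])
I, S = is_decomposition(F)
Fp = S @ I
print(np.max(np.abs(I @ S - F)))  # ~0: reconstruction check
```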

3.2 IS decomposition of BSF

Consider a matrix
\[
B = \begin{pmatrix}
B_{1,1} & B_{1,2} & \cdots & B_{1,m} \\
O & B_{2,2} & \cdots & B_{2,m} \\
\vdots & \ddots & \ddots & \vdots \\
O & \cdots & O & B_{m,m}
\end{pmatrix} \in \mathbb{C}^{n \times n} \tag{2}
\]
of BSF, where each diagonal block $B_{i,i} \in \mathbb{C}^{n_i \times n_i}$ ($\sum_{i=1}^m n_i = n$) has a single eigenvalue $\lambda_i$ and the eigenvalues $\lambda_i$ ($i = 1, \dots, m$) are distinct. The block size $n_i$ of $B_{i,i}$ is the multiplicity of the eigenvalue $\lambda_i$ of $B$. We set $n'_i = \sum_{j=1}^i n_j$ ($i = 1, \dots, m$).

For each eigenvalue $\lambda_i$, denote $F \equiv B - \lambda_i E_n$ by
\[
F = \begin{pmatrix}
F_{1,1} & F_{1,2} & \cdots & F_{1,m} \\
O & F_{2,2} & \cdots & F_{2,m} \\
\vdots & \ddots & \ddots & \vdots \\
O & \cdots & O & F_{m,m}
\end{pmatrix} \in \mathbb{C}^{n \times n}
\]
in block form. This means $F_{k,k} = B_{k,k} - \lambda_i E_{n_k}$ ($k = 1, \dots, m$) and $F_{k,l} = B_{k,l}$ ($1 \le k < l \le m$). Note that $F_{i,i}$ is singular, while $F_{j,j}$ ($j \ne i$) is nonsingular.

Proposition 2 Let
\[
A = \begin{pmatrix} A_{1,1} & A_{1,2} \\ O & A_{2,2} \end{pmatrix} \in \mathbb{C}^{n \times n}
\]
be a matrix such that $A_{1,1} \in \mathbb{C}^{n_1 \times n_1}$ is singular, while $A_{2,2} \in \mathbb{C}^{n_2 \times n_2}$ is nonsingular. Here $n_1 + n_2 = n$. Then $G_A(0) \subset \mathrm{span}(e_1^{(n)}, \dots, e_{n_1}^{(n)})$ holds.

Proof Obviously, the powers of $A$ have the form
\[
A^k = \begin{pmatrix} A_{1,1}^k & A_{1,2}^{(k)} \\ O & A_{2,2}^k \end{pmatrix} \quad (k = 1, 2, \dots)
\]
with a certain matrix $A_{1,2}^{(k)} \in \mathbb{C}^{n_1 \times n_2}$. Since $A_{2,2}^k$ is nonsingular by assumption, any vector $x \in \mathbb{C}^n$ such that $A^k x = 0$ for some $k$ satisfies $x \in \mathrm{span}(e_1^{(n)}, \dots, e_{n_1}^{(n)})$.

(QED)

Since the lower right $(n - n'_i)$-dimensional part of $F$ is nonsingular, Proposition 2 shows that a JB of $G_B(\lambda_i) = G_F(0)$ is determined by the upper left $n'_i$-dimensional part
\[
F_i \equiv \begin{pmatrix}
F_{1,1} & F_{1,2} & \cdots & F_{1,i} \\
O & F_{2,2} & \cdots & F_{2,i} \\
\vdots & \ddots & \ddots & \vdots \\
O & \cdots & O & F_{i,i}
\end{pmatrix} \in \mathbb{C}^{n'_i \times n'_i} \tag{3}
\]
of $F$. If we know an IS decomposition of $F_{i,i}$,
\[
F_{i,i} = I_{i,i} S_{i,i} \qquad (I_{i,i} \in \mathbb{C}^{n_i \times r_i},\ S_{i,i} \in \mathbb{C}^{r_i \times n_i}), \tag{4}
\]
a possible IS decomposition $F_i = I_i S_i$ of $F_i$ is given by
\[
I_i = \begin{pmatrix} E_{n'_{i-1}} & O \\ O & I_{i,i} \end{pmatrix} \in \mathbb{C}^{n'_i \times n''_i},
\qquad
S_i = \begin{pmatrix}
F_{1,1} & \cdots & F_{1,i-1} & F_{1,i} \\
O & \ddots & \vdots & \vdots \\
\vdots & \ddots & F_{i-1,i-1} & F_{i-1,i} \\
O & \cdots & O & S_{i,i}
\end{pmatrix} \in \mathbb{C}^{n''_i \times n'_i},
\]
where $n''_i \equiv n'_{i-1} + r_i < n'_i$. Thus, by Theorem 1, the Jordan structure of $G_B(\lambda_i)$ is determined by examining the Jordan structure of the deflated matrix
\[
F'_i \equiv S_i I_i = \begin{pmatrix}
F_{1,1} & \cdots & F_{1,i-1} & F_{1,i} I_{i,i} \\
O & \ddots & \vdots & \vdots \\
\vdots & \ddots & F_{i-1,i-1} & F_{i-1,i} I_{i,i} \\
O & \cdots & O & S_{i,i} I_{i,i}
\end{pmatrix} \in \mathbb{C}^{n''_i \times n''_i}.
\]
Note that the upper left $n'_{i-1}$-dimensional part of $F'_i$ is exactly the same as that of $F_i$, and thus we only have to manage the last block column to determine $F'_i$ from $F_i$. Incorporating this procedure into the JBA algorithm, one can construct an efficient algorithm to reveal the Jordan structure of matrices of BSF.

3.3 Example

Consider the Jordan problem of a matrix of BSF:
\[
B = \begin{pmatrix} 1 & 1 & 2 \\ 0 & -6 & 4 \\ 0 & -9 & 6 \end{pmatrix}
\equiv \begin{pmatrix} B_{1,1} & B_{1,2} \\ O & B_{2,2} \end{pmatrix}.
\]
Obviously, the upper left block $B_{1,1}$ has the single eigenvalue $\lambda_1 = 1$, and $e_1^{(3)}$ is a JB of $G_B(1) = E_B(1)$. The lower right block $B_{2,2}$ has the single eigenvalue $\lambda_2 = 0$ of


multiplicity two, and let us make clear the Jordan structure of $G_B(0)$. We may set $F_2 = B$, since $\lambda_2 = 0$. A possible IS decomposition of $F_{2,2} = B_{2,2}$ is
\[
F_{2,2} = \begin{pmatrix} -6 & 4 \\ -9 & 6 \end{pmatrix}
= \begin{pmatrix} 2 \\ 3 \end{pmatrix} \begin{pmatrix} -3 & 2 \end{pmatrix}
\equiv I_{2,2} S_{2,2},
\]
leading to an IS decomposition
\[
F_2 = \begin{pmatrix} E_1 & O \\ O & I_{2,2} \end{pmatrix} \begin{pmatrix} B_{1,1} & B_{1,2} \\ O & S_{2,2} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 2 \\ 0 & -3 & 2 \end{pmatrix}
\equiv I_2 S_2
\]
of $F_2$. From this, we obtain the deflated matrix
\[
F'_2 = S_2 I_2 = \begin{pmatrix} 1 & 8 \\ 0 & 0 \end{pmatrix}.
\]
Clearly, $x'_1 \equiv (-8, 1)^T$ is a basis of $E_{F'_2}(0)$. (A recursive application of the above procedure to $F'_2$ leads to $F''_2 \equiv (F'_2)' = (1) = B_{1,1}$. Namely, the lower right block disappears and the recursive process terminates at the next step.) Theorem 1 indicates that a JB for $G_B(0) = G_{F_2}(0)$ is given by a single JS $(x_1, x_2) \in \mathrm{JS}_B(0)$ with $x_1 \equiv (-8, 2, 3)^T$ and $x_2 \equiv (-9, 0, 1/2)^T$, where $x_1 = I_2 x'_1$, and $x_2$ is a solution of the linear system $S_2 x_2 = x'_1$. We see from the above that $P^{-1} B P = J_1(1) \oplus J_2(0)$ with the nonsingular matrix $P = (e_1^{(3)}, x_1, x_2)$.
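This worked example is easily verified numerically; the following check (ours, not part of the paper) confirms $P^{-1} B P = J_1(1) \oplus J_2(0)$:

```python
import numpy as np

B = np.array([[1.0, 1.0, 2.0],
              [0.0, -6.0, 4.0],
              [0.0, -9.0, 6.0]])
x1 = np.array([-8.0, 2.0, 3.0])
x2 = np.array([-9.0, 0.0, 0.5])
P = np.column_stack((np.array([1.0, 0.0, 0.0]), x1, x2))  # (e_1^{(3)}, x1, x2)
J = np.linalg.solve(P, B @ P)                              # P^{-1} B P
print(np.round(J, 12))  # diag(1, 0, 0) with a 1 in the (2,3) slot: J_1(1) + J_2(0)
```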

4. Framework of JBA-BSF

We show the framework of the algorithm which computes the JCF and a JB of a given square matrix $A$ via the BSF $B$ of $A$.

JBA-BSF

input: $A \in \mathbb{C}^{n \times n}$.

1) Compute the BSF $B$ in (2) of $A$: $B = Q^* A Q$, where $Q$ is a unitary matrix. The eigenvalues $\lambda_i$ of $A$ are obtained from the corresponding diagonal blocks $B_{i,i}$ of $B$ ($i = 1, \dots, m$).

2) For each eigenvalue $\lambda_i$ ($i = 1, \dots, m$), repeat 2-1)–2-4).

2-1) Compute $F_i$ in (3) and set $F_i^{(0)} \equiv F_i$.

2-2) For $k = 1, \dots, k'$, define $F_i^{(k)} \equiv S_i^{(k-1)} I_i^{(k-1)}$ by using an IS decomposition $F_i^{(k-1)} = I_i^{(k-1)} S_i^{(k-1)}$ of $F_i^{(k-1)}$. Here $k'$ is the minimum integer such that the lower right block of $F_i^{(k')}$ becomes the zero matrix.

2-3) Compute a basis of $E_{F_i^{(k')}}(0)$.

2-4) For $k = k'-1, \dots, 0$, construct a JB of $G_{F_i^{(k)}}(0)$ successively by Theorem 1. The JB of $G_{F_i^{(0)}}(0)$ gives a JB of $G_B(\lambda_i)$.

3) Construct the JCF $J$ of $B$ and a nonsingular matrix $S$ such that $B = S J S^{-1}$.

4) Compute $P = QS$.

output: JCF $J$ of $A$ and a nonsingular matrix $P$ such that $A = P J P^{-1}$.

Table 1. Experimental environment.

CPU: Intel Core(TM)2 Duo E7400 2.80GHz
Memory: 2GB
OS: Windows Vista Home Basic
Compiler: cygwin gcc version 3.4.4
LAPACK: version 3.2.1-1

5. Numerical experiment

The numerical environment is summarized in Table 1. In the present implementation of JBA-BSF, we use the ZGEES routine in LAPACK to obtain the BSF $B$ of $A$ in step 1). The routine reduces $A$ to an upper-triangular form with the eigenvalues $\mu_j$ ($j = 1, \dots, n$) of $A$ as diagonal elements, which are numerically distinct in general. To recover the Jordan structure of $A$, neighboring eigenvalues are clustered as follows [5]:

1) Set $\Lambda = \{\mu_1, \dots, \mu_n\}$ and $m = 0$.

2) Repeat i)–iii) until $\Lambda$ becomes the empty set.

i) Set $m = m + 1$.

ii) Let $\mu$ be the eigenvalue with the maximal absolute value in $\Lambda$. Define
\[
\Lambda_m \equiv \{ \mu_j \in \Lambda \mid |(\mu_j - \mu)/\mu| < 10^{-\beta} \}.
\]

iii) Define $\mu'_m$ as the average of the eigenvalues in $\Lambda_m$. Set $\Lambda = \Lambda - \Lambda_m$.

We use the output $\mu'_1, \dots, \mu'_m$ as the input eigenvalues for step 2). In numerics, we set $\beta = 2$, since some of the eigenvalues output from the ZGEES routine often have a relative numerical error of this order.
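A direct transcription of this clustering procedure (ours; the paper states it only as pseudocode) reads:

```python
import numpy as np

def cluster_eigenvalues(mu, beta=2):
    """Cluster the numerically distinct ZGEES eigenvalues as in [5].

    mu   : computed eigenvalues mu_1, ..., mu_n
    beta : relative tolerance exponent (the paper uses beta = 2)
    Returns the cluster averages mu'_1, ..., mu'_m.
    """
    remaining = list(mu)
    clusters = []
    while remaining:
        pivot = max(remaining, key=abs)          # maximal absolute value in Lambda
        group = [z for z in remaining
                 if abs((z - pivot) / pivot) < 10.0**(-beta)]
        clusters.append(np.mean(group))          # mu'_m = average over Lambda_m
        remaining = [z for z in remaining if z not in group]
    return clusters

print(cluster_eigenvalues([1.0, 1.0 + 1e-8, 10.0, 10.0 - 1e-9]))  # -> two clusters
```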

(k−1)i in step 2-2),

we need an IS decomposition of the lower right block,

say F(k−1)i,i , of F

(k−1)i as in (4). This is done by using

ZGESVD routine in LAPACK for SVD. Precisely, we set

I(k−1)i,i = U

(k−1)i,i and S

(k−1)i,i = D

(k−1)i,i V

(k−1)∗i,i by using

SVD F(k−1)i,i = U

(k−1)i,i D

(k−1)i,i V

(k−1)∗i,i . In numerics, tiny

singular values σ such that σ < εσmax are set to zero,where σmax is the largest singular value of Fi,i and ε isa cut-off parameter.Numerical tests are performed by using complex ma-

trices with a form

A = PJP−1 ∈ Cn×n, (5)

where n is a multiple of three and

J =3⊕

i=1

(n(i)⊕j=1

Jl(i)j

(λi)

)(6)

is a JCF with the eigenvalues (λ1, λ2, λ3) = (1, 1 +10−α, 10), each of which has multiplicity n/3. Here αis a nonnegative parameter and λ1, λ2 are nearly de-generate for large α. In (5), P is an invertible complexmatrix with the elements pkl = xkl+ iykl, where xkl andykl are uniform random numbers in the range [−1, 1].In (6), l

(i)j ∈ [1, 5] are random integers and n(i) satis-

fies∑n(i)

j=1 l(i)j = n/3. The JCF as well as a JB for A in

(5) is obvious by construction, and a comparison withnumerical results is easily made.


Let $J'$ and $P'$ be the numerical results for $J$ and $P$, respectively. Set
\[
W_{1;2} \equiv G_A(\lambda_1) \oplus G_A(\lambda_2), \qquad W_3 \equiv G_A(\lambda_3).
\]
Both $W_{1;2}$ and $W_3$ are determined from $P$ in (5). The numerical counterparts of $W_{1;2}$ and $W_3$, determined from $P'$, are denoted by $W'_{1;2}$ and $W'_3$, respectively. We estimate two kinds of numerical errors:
\[
E_1 = \|A P' - P' J'\|_\infty / \|A P'\|_\infty,
\]
\[
E_2 = \max\{ \sin \theta\{W_{1;2}, W'_{1;2}\},\ \sin \theta\{W_3, W'_3\} \},
\]
where $\theta\{V_1, V_2\}$ is in general the largest canonical angle between the subspaces $V_1$ and $V_2$ [6]. $E_1$ measures whether the numerical JSs indeed satisfy the relation in (1) or not, while $E_2$ measures whether the numerical JB indeed spans the input generalized eigenspaces or not.

Fig. 1 shows the results for JBA (left column) and

JBA-BSF (right column) for $n = 30$ and $\alpha = 0, 4, 8, 12$. The bar and solid line graphs show $E_1$ and $E_2$, respectively, for 50 examples. The automatic mechanism to move $\varepsilon$ within a certain range is not introduced; instead, $\varepsilon$ is fixed at $\varepsilon = 10^{-7}$ for $\alpha = 0, 8, 12$, while $\varepsilon = 10^{-3}$ for $\alpha = 4$. In the cases $\alpha = 4, 8, 12$, the two neighboring eigenvalues $\lambda_1$ and $\lambda_2$ come close and cannot be separated from each other numerically. As a result, we have two numerical eigenvalues $\mu'_1 \simeq 1$ of multiplicity 20 and $\mu'_2 \simeq 10$ of multiplicity 10 for each input matrix. In the case $\alpha = 4$, a relatively large cut-off parameter $\varepsilon = 10^{-3}$ is required, since $|\lambda_1 - \lambda_2| = 10^{-4}$ has the same order of magnitude. One can see from Fig. 1 that JBA fails to reproduce the generalized eigenspaces in the ill-conditioned cases ($\alpha = 4, 8, 12$), due to the use of a fixed cut-off parameter. Contrarily, JBA-BSF succeeds in reproducing the Jordan structure, including the generalized eigenspaces, in all cases. The main reason why JBA-BSF keeps numerical stability is that the computation of the kernel of $F_i^{(k)}$ ($k = k', \dots, 0$) required in steps 2-3) and 2-4) of JBA-BSF can be carried out by solving a simple linear system with a BSF coefficient matrix. Contrarily, JBA needs a hard task such as the SVD to determine the kernel of the corresponding matrix, which causes a considerable loss of numerical accuracy. Experiments so far have shown that JBA-BSF works well even with a fixed cut-off parameter, though its value should be adjusted according to the distribution of the eigenvalues as well as the accuracy of the numerical eigenvalues.

JBA-BSF also serves to reduce the numerical cost significantly. Fig. 2 shows a comparison between JBA and JBA-BSF in execution time. Here the average execution time over 100 examples in the case $\alpha = 0$ is displayed for each matrix size. It is worth stressing that, contrary to JBA, JBA-BSF manages only the upper left part $F_i$ in (3) of $F$ for each eigenvalue $\lambda_i$, and furthermore the SVD is required only for the last diagonal block of the submatrix.

Acknowledgments

This work was partially supported by Grant-in-Aid for Scientific Research (C) No. 19560058.

Fig. 1. $E_1$ (bar graph) and $E_2$ (solid line) for 50 examples for JBA and JBA-BSF; panels show JBA (left column) and JBA-BSF (right column) for $\alpha = 0, 4, 8, 12$, with $\varepsilon = 10^{-7}$ except $\varepsilon = 10^{-3}$ for $\alpha = 4$.

Fig. 2. Dependence of the average execution time over 100 examples on the matrix size $n$. The input matrices have three eigenvalues 1, 2 and 10, each of which has multiplicity $n/3$.

References

[1] V. N. Kublanovskaya, On a method of solving the complete eigenvalue problem for a degenerate matrix, USSR Comput. Math. Math. Phys., 6 (1966), 1–14.

[2] G. H. Golub and J. H. Wilkinson, Ill-conditioned eigensystems and the computation of the Jordan canonical form, SIAM Rev., 18 (1976), 578–619.

[3] B. Kagstrom and A. Ruhe, An algorithm for numerical computation of the Jordan normal form of a complex matrix, ACM Trans. Math. Software, 6 (1980), 398–419.

[4] D. S. Watkins, The Matrix Eigenvalue Problem, SIAM, Philadelphia, 2007.

[5] K. Kudo, Y. Kakinuma, K. Hiraoka, H. Hashiguchi, Y. Kuwajima and T. Shigehara, Algorithm for computing Jordan basis, JSIAM Letters, 2 (2010), 119–122.

[6] G. W. Stewart, Matrix Algorithms, Vol. II: Eigensystems, SIAM, Philadelphia, 2001.


JSIAM Letters Vol.4 (2012) pp.13–16 © 2012 Japan Society for Industrial and Applied Mathematics

A fast wavelet expansion technique for Vasicek

multi-factor model of portfolio credit risk

Kensuke Ishitani1

1 Mitsubishi UFJ Trust Investment Technology Institute Co., Ltd. (MTEC), 2-6, Akasaka 4-Chome, Minato-ku, Tokyo 107-0052, Japan

E-mail ishitani@mtec-institute.co.jp

Received January 19, 2012, Accepted February 22, 2012

Abstract

This paper presents a new methodology to compute VaR in a portfolio credit loss model. The wavelet approximation can be useful for computing non-smooth distributions, which often arise in small or concentrated portfolios. We contribute to this technique by extending the wavelet approximation from the Vasicek one-factor model to the multi-factor model. Key features of our new algorithm are: (i) a finite series expansion of the wavelet scaling coefficients, (ii) Wynn's epsilon-algorithm to accelerate convergence of those series, and (iii) an efficient spline interpolation to calculate the Laplace transforms. We illustrate the effectiveness of our algorithm through numerical examples.

Keywords Vasicek multi-factor model, Haar wavelets, finite series expansion, Wynn's epsilon-algorithm, spline interpolation method

Research Activity Group Mathematical Finance

1. Introduction

Credit risk models are usually classified as structural or reduced-form models (see, for example, recent research [1]). In the present paper, we consider the structural model called the Vasicek multi-factor model.

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Consider a credit portfolio consisting of $N$ obligors. Define the exposure weight of obligor $i$ by $w_i = E_i / \sum_{j=1}^{N} E_j$, where $E_i$ is the exposure of obligor $i$, and denote its probability of default by $p_i$. The Vasicek model assumes that the standardized asset log-return $Z_i$ of obligor $i$ is standard normally distributed and that obligor $i$ defaults when $Z_i$ falls below a pre-specified threshold $\Phi^{-1}(p_i)$, where $\Phi(x)$ is the standard normal cumulative distribution function and $\Phi^{-1}(p)$ is its inverse. The default can therefore be modeled as a Bernoulli random variable $D_i$ such that

$$D_i = \begin{cases} 1, & Z_i \le \Phi^{-1}(p_i), \\ 0, & Z_i > \Phi^{-1}(p_i). \end{cases}$$

It follows that the portfolio loss is given by

$$L = \sum_{i=1}^{N} w_i D_i.$$

Let $\mathrm{VaR}_\alpha(L)$ be the $\alpha$-quantile of the loss distribution of $L$, defined by

$$\mathrm{VaR}_\alpha(L) = \inf\{x : P(L > x) \le 1 - \alpha\}. \quad (1)$$
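As a concrete reading of definition (1), the following minimal Python sketch (function name and interface ours, not from the paper) extracts the empirical α-quantile VaR from simulated losses, as is done for the MC benchmark in Section 8:

```python
import numpy as np

def var_alpha(losses, alpha):
    """Empirical VaR of definition (1): the smallest x with P(L > x) <= 1 - alpha."""
    x = np.sort(np.asarray(losses))
    # at least ceil(alpha * n) of the sorted losses must lie at or below x
    return x[int(np.ceil(alpha * len(x))) - 1]
```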

The modeling of the dependence structure among counterparties in the portfolio is simplified by the introduction of systematic risk factors $Y = (Y_1, \dots, Y_M)$. For each obligor $i$, $Z_i$ is represented by the standard normally distributed systematic risk factors $Y$ and an idiosyncratic noise component $\epsilon_i$:

$$Z_i = \alpha_i \cdot Y + \sqrt{1 - |\alpha_i|^2}\,\epsilon_i, \quad (2)$$

where $Y$ and $\epsilon_1, \dots, \epsilon_N$ are independent and normally distributed, and the parameter $\alpha_i = (\alpha_{i,s})_{s=1}^{M} \in [0,1)^M$ is called the loading vector of obligor $i$.

Monte Carlo (MC) simulation is a standard method for measuring the risk of a credit portfolio. However, this method becomes very time-consuming as the size of the portfolio increases. For this reason, analytical and fast numerical techniques have been developed in recent years. One such approach is to give an analytical approximation of the Bromwich integral of the moment generating function [2, 3]. Another approach, studied in [4, 5], is to numerically invert the Laplace transform via the wavelet approximation (WA) method [6]. Under the Vasicek one-factor model ($M = 1$), [4, 5] show accurate and fast results for a wide range of portfolios at very high loss levels.

In the present paper, we contribute to this technique by extending the WA method to the Vasicek multi-factor model ($M \ge 2$).

2. Moment generating function (MGF) of the portfolio loss

Recall that in the Vasicek model framework, if the systematic risk factors $Y$ are fixed, defaults occur independently because the only remaining uncertainty is the idiosyncratic noise $(\epsilon_i)_i$. The MGF conditional on $Y$ is thus given by the product of each obligor's MGF:

$$M_L(s; Y) \equiv E(e^{-sL} \mid Y) = \prod_{i=1}^{N} E(e^{-s w_i D_i} \mid Y) = \prod_{i=1}^{N} \left(1 - p_i(Y) + p_i(Y)\,e^{-s w_i}\right),$$

where $p_i(y)$ is the probability of obligor $i$'s default conditional on a realization $Y = y$, given by

$$p_i(y) \equiv P(Z_i \le T_i \mid Y = y) = \Phi\!\left(\frac{T_i - \alpha_i \cdot y}{\sqrt{1 - |\alpha_i|^2}}\right)$$

for $y = (y_1, \dots, y_M)$, and $T_i \equiv \Phi^{-1}(p_i)$.

Taking the expectation of this conditional MGF yields the unconditional MGF:

$$M_L(s) \equiv E(e^{-sL}) = E(E(e^{-sL} \mid Y)) = E\!\left[\prod_{i=1}^{N}\left(1 - p_i(Y) + p_i(Y)\,e^{-s w_i}\right)\right]$$
$$= \int_{\mathbb{R}^M} \frac{1}{(2\pi)^{M/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}\,y^t \Sigma^{-1} y\right) \prod_{i=1}^{N}\left(1 - p_i(y) + p_i(y)\,e^{-s w_i}\right) dy, \quad (3)$$

where $\Sigma$ is the correlation matrix of the multivariate random vector $Y = (Y_1, \dots, Y_M)$.

3. Wavelet approximation of cumulative distribution functions (CDF)

Let $K = L + \eta$ ($\eta > 0$) be a random variable; then the CDF $F_L(\cdot)$ equals $F_K(\cdot + \eta)$, and $F_K(x) = 0$ for $x \in [0, \eta)$. Let $\phi$ be the scaling function of the Haar wavelet,

$$\phi(x) = \begin{cases} 1, & 0 \le x < 1, \\ 0, & \text{otherwise}, \end{cases}$$

define $\{\phi_{j,k}(x)\}_{k \in \mathbb{Z}}$ by $\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k)$, and define a CDF $F_K^m(x) = \sum_{k=0}^{\infty} c_{m,k}\,\phi_{m,k}(x)$, where $c_{m,k} = 2^{m/2}\int_{k/2^m}^{(k+1)/2^m} F_K(y)\,dy$. Recall that $F_K(\cdot)$ is a monotone nondecreasing function. We can show that for each $x \in Q_d \equiv \bigcup_{m \in \mathbb{N}}\{k/2^m;\ k \in \mathbb{Z}\}$ there exists some $m_0 \in \mathbb{N}$ satisfying

$$F_K^m(x) \ge F_K^{m+1}(x) \ge F_K(x) \quad (m \ge m_0), \qquad F_K(x) = \lim_{m\to\infty} F_K^m(x).$$

The sequence $(F_K^m)_m$ converges in distribution to $F_K$, since $Q_d$ is a dense subset of $[0, \infty)$.

The main idea of this section is to approximate the MGF $M_K(\cdot)$ of $F_K$ by the MGF $M_K^m(\cdot)$ of $F_K^m$ [4, 6]. Integration by parts on the integral in $M_K^m$ gives

$$M_K(s) \approx M_K^m(s) \equiv \int_0^{\infty} e^{-sx}\,dF_K^m(x) = s\int_0^{\infty} e^{-sx} F_K^m(x)\,dx = (1 - z)\,P_m\!\left(\exp\!\left(-\frac{s}{2^m}\right)\right)$$

for $s \in D \equiv \{s \in \mathbb{C};\ \mathrm{Re}(s) \ge 0\}$ and $m \ge -\log_2(\eta)$, where $P_m(z) \equiv 2^{m/2}\sum_{k=0}^{\infty} c_{m,k} z^k$. The residue theorem then gives an approximation of $F_K(x)$:

$$F_K(x) \approx F_K^m(x) = 2^{m/2} c_{m,k} = \frac{1}{2\pi i}\int_{C_r} \frac{P_m(z)}{z^{k+1}}\,dz \approx \frac{1}{2\pi i}\int_{C_r} \frac{\widehat{P}_m(z)}{z^{k+1}}\,dz, \quad x \in \left[\frac{k}{2^m}, \frac{k+1}{2^m}\right), \quad (4)$$

where $\widehat{P}_m(z) \equiv \dfrac{M_L(-2^m \log z)\,e^{2^m \eta \log z}}{1 - z}$ and $C_r \equiv \{z \in \mathbb{C};\ |z| = r\}$ $(0 < r < 1)$.

Therefore, by a change of variables in the integral in (4), we have the following approximation:

$$F_K(x) \approx F_K^m(x) \equiv \frac{1}{\pi r^k}\int_0^{\pi} \mathrm{Re}\!\left(\widehat{P}_m(r e^{i\theta})\,e^{-ik\theta}\right) d\theta \quad (5)$$

for $k/2^m \le x < (k+1)/2^m$.
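To make the route from the MGF to the CDF concrete, the following sketch evaluates (5) directly with the trapezoidal rule. It assumes a vectorized callable mgf(s) for $M_L(s)$ (a hypothetical interface) and illustrates only the raw formula, not the accelerated algorithm of Sections 5 and 6:

```python
import numpy as np

def cdf_wavelet(mgf, k, m, eta, r=0.99, n_theta=4000):
    """Approximate F_K(x) for x in [k/2^m, (k+1)/2^m) via formula (5)."""
    theta = np.linspace(0.0, np.pi, n_theta)
    z = r * np.exp(1j * theta)                       # points on the circle C_r
    # hat{P}_m(z) = M_L(-2^m log z) * exp(2^m * eta * log z) / (1 - z)
    P = mgf(-2**m * np.log(z)) * np.exp(2**m * eta * np.log(z)) / (1.0 - z)
    integrand = np.real(P * np.exp(-1j * k * theta))
    return np.trapz(integrand, theta) / (np.pi * r**k)
```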

4. A fast algorithm for wavelet coefficient calculation

In the case of the Vasicek one-factor model ($M = 1$), [4, 5] approximate the integral in (5) using the ordinary trapezoidal rule and compute the MGF (3) quickly and accurately by using a Gauss-Hermite quadrature formula. In the case of the Vasicek multi-factor model ($M \ge 2$), on the other hand, Monte Carlo integration is one of the most accurate methods to compute the MGF. Thus, in this paper, we use Monte Carlo integration to compute the MGF (3). However, as is well known, Monte Carlo integration is very time-consuming. We therefore need a more efficient method to approximate the integral in (5) than the trapezoidal rule used in [4, 5].

We introduce fast and efficient methods for calculating the integral in (5) by using a convergence acceleration scheme (Wynn's epsilon-algorithm) and a cubic spline interpolation.

5. Finite series expansion and Wynn's epsilon-algorithm

Let us consider a finite series expansion of (5) obtained by rescaling the variable $\theta$:

$$F_K^m(x) = \sum_{j=0}^{2^m x_k^{(m)} - 1} a_j(x_k^{(m)}), \quad x \in [x_k^{(m)}, x_{k+1}^{(m)}), \quad (6)$$

where

$$a_j(x) \equiv \frac{e^{\gamma x}}{\pi 2^m x}\int_{j\pi}^{(j+1)\pi} \mathrm{Re}\!\left(I_x^{(m)}(\theta)\,e^{-i\theta}\right) d\theta,$$

$$I_x^{(m)}(\theta) \equiv \frac{M_L\!\left(\gamma - i\frac{\theta}{x}\right)\exp\!\left(-\eta\!\left(\gamma - i\frac{\theta}{x}\right)\right)}{1 - \exp\!\left(-\frac{\gamma}{2^m}\right)\exp\!\left(\frac{i\theta}{2^m x}\right)},$$

$$x_k^{(m)} \equiv \frac{k}{2^m}, \qquad \gamma \equiv -2^m \log r.$$

The purpose of this finite series expansion is to apply Wynn's epsilon-algorithm as a method for accelerating the convergence of a complex-valued series. Wynn's epsilon-algorithm is the following nonlinear recursive scheme:

$$\epsilon_{k+1}^{n} = \epsilon_{k-1}^{n+1} + \frac{1}{\epsilon_{k}^{n+1} - \epsilon_{k}^{n}}, \quad (7)$$
$$\epsilon_{-1}^{n} = 0, \qquad \epsilon_{0}^{n} = S_n,$$

where $S_n$ is the $(n+1)$-th partial sum of the series (6). For $q \in \mathbb{N}$, it is known that the transformed sequence $(\epsilon_{2q}^{n})_n$ converges dramatically faster than the original series. Assume that $S_0, S_1, \dots, S_{n(\epsilon)-1}$ $(n(\epsilon) < k)$ are given; then we can calculate

$$\epsilon_{2q}^{0}, \epsilon_{2q}^{1}, \dots, \epsilon_{2q}^{n(\epsilon)-2q-1}$$

by using the recursive equation (7), and approximate $F_K^m(x)$ by $F_K^{m,(\epsilon)}(x) \equiv \epsilon_{2q}^{n(\epsilon)-2q-1}$ for $k/2^m \le x < (k+1)/2^m$. Therefore, the original CDF $F_K(x)$ can be approximated by $F_K^{m,(\epsilon)}(x)$:

$$F_K(x) \approx F_K^{m,(\epsilon)}(x), \quad x \in \left[\frac{k}{2^m}, \frac{k+1}{2^m}\right).$$

In order to calculate $F_K^m(x)$, we need to calculate $M_L(\gamma - i\theta)$ for all $\theta \in [0, 2^m\pi)$. In order to calculate $F_K^{m,(\epsilon)}(x)$, on the other hand, we only need $M_L(\gamma - i\theta)$ for $\theta \in [0, 2^m\pi n(\epsilon)/k)$. We can thus expect Wynn's epsilon-algorithm to be quite efficient in calculating the integral in (5).
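A compact sketch of scheme (7), building the epsilon table column by column from the partial sums of (6) (a generic implementation of the algorithm, not the paper's own code):

```python
import numpy as np

def wynn_epsilon(S, q):
    """Apply Wynn's epsilon-algorithm (7) to partial sums S_0..S_{n-1}
    and return the accelerated column (eps^n_{2q})_n."""
    prev = np.zeros(len(S) + 1, dtype=complex)   # column eps_{-1} = 0
    curr = np.asarray(S, dtype=complex)          # column eps_0 = S_n
    for _ in range(2 * q):
        # eps^n_{k+1} = eps^{n+1}_{k-1} + 1 / (eps^{n+1}_k - eps^n_k)
        nxt = prev[1:len(curr)] + 1.0 / (curr[1:] - curr[:-1])
        prev, curr = curr, nxt
    return curr   # last entry is eps^{n(eps)-2q-1}_{2q}
```

With the terms $a_j$ of (6) in hand, $F_K^{m,(\epsilon)}(x)$ is then `wynn_epsilon(np.cumsum(a), q)[-1]`.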

6. Cubic spline interpolation

It is required to calculate the moment generating function $M_L(\gamma - i\theta)$, $\theta \in [0, 2^m\pi n(\epsilon)/k_0)$, in order to obtain the CDF $F_K^{m,(\epsilon)}(x)$ for $x \in [k_0/2^m, 1)$. If the cubic spline interpolation introduced below is used, the computation time for the moment generating function can be shortened significantly.

First, we calculate the MGF at grid points $\Delta_\xi$,

$$\Delta_\xi \equiv \{0 = \xi_0 < \xi_1 < \cdots < \xi_{N_X}\}, \quad \text{where } \frac{2^m\pi n(\epsilon)}{k_0} < \xi_{N_X-1}, \quad (8)$$

using the Monte Carlo method:

$$M_L(\gamma - i\xi_i) \approx \frac{1}{N_I}\sum_{k=1}^{N_I} M_L(\gamma - i\xi_i; Y^{(k)}), \quad i \le N_X, \quad (9)$$

where $N_I$ is the sample size of the Monte Carlo integration and $Y^{(1)}, \dots, Y^{(N_I)}$ are samples from the probability distribution of the random vector $Y = (Y_1, \dots, Y_M)$.

For $i = 1, \dots, N_X - 1$, we then find a quadratic function $f_i(\theta)$ whose graph $\{(\theta, f_i(\theta));\ \theta \ge 0\}$ contains all three data points $(\xi_j, M_L(\gamma - i\xi_j))_{j=i-1}^{i+1}$. Thus, we obtain the cubic spline interpolation

$$M_L(\gamma - i\theta) \approx \frac{\xi_{i+1} - \theta}{\xi_{i+1} - \xi_i}\,f_i(\theta) + \frac{\theta - \xi_i}{\xi_{i+1} - \xi_i}\,f_{i+1}(\theta)$$

for $\theta \in [\xi_i, \xi_{i+1}]$ $(i \ge 1)$, and the quadratic spline interpolation $M_L(\gamma - i\theta) \approx f_1(\theta)$ for $\theta \in [\xi_0, \xi_1]$.

Therefore, we can calculate the approximate MGF $M_L(\gamma - i\theta)$, $\theta \in [0, 2^m\pi n(\epsilon)/k_0)$, and compute the integrals $\int_{j\pi}^{(j+1)\pi} \mathrm{Re}(I_x^{(m)}(\theta)e^{-i\theta})\,d\theta$, $x \in [k_0/2^m, 1)$, $j < n(\epsilon)$, via the trapezoidal rule with a partition of the interval $[j\pi, (j+1)\pi]$ into $N_T^{(j)}$ equal parts.
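The following sketch shows the interpolation just described, assuming the MGF values at the grid points have already been estimated by (9); np.polyfit/np.polyval stand in for the quadratic fits $f_i$ (the exact fitting routine is not specified in the paper):

```python
import numpy as np

def blended_mgf(xi, g):
    """Interpolant of Section 6: quadratic f_i through the three points at
    xi_{i-1}, xi_i, xi_{i+1}, blended linearly on [xi_i, xi_{i+1}];
    f_1 alone on [xi_0, xi_1]."""
    quads = [np.polyfit(xi[i-1:i+2], g[i-1:i+2], 2)
             for i in range(1, len(xi) - 1)]          # quads[i-1] is f_i

    def eval_at(theta):
        i = int(np.clip(np.searchsorted(xi, theta) - 1, 0, len(xi) - 2))
        if i == 0:                                    # quadratic piece
            return np.polyval(quads[0], theta)
        f_i = np.polyval(quads[i - 1], theta)
        f_ip1 = np.polyval(quads[min(i, len(quads) - 1)], theta)
        w = (xi[i + 1] - theta) / (xi[i + 1] - xi[i])
        return w * f_i + (1 - w) * f_ip1

    return eval_at
```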

7. Computation of VaR

We use bisection to invert the CDF $F_K^{m,(\epsilon)}(\cdot)$ numerically. Suppose $F_K^{m,(\epsilon)}(k_0/2^m) < \alpha$; then there exists $j^*$ such that

$$F_K^{m,(\epsilon)}\!\left(\frac{j^*}{2^m}\right) \le \alpha < F_K^{m,(\epsilon)}\!\left(\frac{j^*+1}{2^m}\right),$$

and we approximate the Value-at-Risk by

$$\mathrm{VaR}_\alpha(L) = \mathrm{VaR}_\alpha(K) - \eta \approx \frac{2j^* + 1}{2^{m+1}} - \eta.$$
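A sketch of this inversion on the dyadic grid, assuming a callable F for $F_K^{m,(\epsilon)}$ and a bracketing pair j_lo < j_hi (names ours):

```python
def var_from_cdf(F, alpha, m, eta, j_lo, j_hi):
    """Bisection for j* with F(j*/2^m) <= alpha < F((j*+1)/2^m),
    then the VaR approximation of Section 7."""
    while j_hi - j_lo > 1:
        j_mid = (j_lo + j_hi) // 2
        if F(j_mid / 2**m) <= alpha:
            j_lo = j_mid
        else:
            j_hi = j_mid
    return (2 * j_lo + 1) / 2**(m + 1) - eta
```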

8. Numerical examples

In this section we illustrate the performance of our method through examples. All experiments are performed on a personal computer with an Intel(R) Core(TM) 2 Duo CPU (3.00 GHz) and 2.00 GB RAM.

We assume that the number of systematic risk factors is set to $M = 33$, the number of TOPIX sector indices; each obligor $i$ belongs to an industrial sector $\mathrm{sec}(i) \in \{1, \dots, 33\}$, and $Z_i$ is represented by a standard normally distributed sectoral factor component $Y_{\mathrm{sec}(i)}$ and an idiosyncratic noise component $\epsilon_i$:

$$Z_i = \sqrt{\rho_i}\, Y_{\mathrm{sec}(i)} + \sqrt{1 - \rho_i}\,\epsilon_i. \quad (10)$$

In this multi-sector model (10), we can use an efficient Monte Carlo sampling algorithm [3] for the calculation of (9), whose computational time depends only moderately on portfolio size. Note that efficient Monte Carlo sampling algorithms for the general multi-factor model (2) have not been established; the reason is that the discretization method of [3] for fast MGF computation does not work effectively outside the multi-sector setting. The factor correlations $\Sigma = (\Sigma_{s,t})_{s,t=1}^{33}$ are estimated from the historical correlation of TOPIX sector index returns, using monthly data of TOPIX sector indices from January 2007 to December 2011.

We consider eight sample portfolios, as shown in the following table:

Portfolio  N       wi    ρi
P1         1,000   C/i   0.2
P2         1,000   C/i   0.25
P3         1,000   C/i   0.3
P4         1,000   C/i   0.35
P5         1,000   C/i   0.4
P6         3,000   1/N   0.3
P7         10,000  C/i   0.3
P8         30,000  C/i   0.3

where $C$ is a positive constant such that $\sum_{i=1}^{N} w_i = 1$ holds. The exposure distribution indicates that portfolios P1, P2, P3, P4, P5, P7 and P8 are concentrated according to a power-law distribution, whereas portfolio P6 is completely diversified.

We consider a rating system with 20 ratings, and assume their PD to be given by the following table.


ri     1      2      3      4      5
PDri   0.19%  0.21%  0.23%  0.25%  0.27%

ri     6      7      8      9      10
PDri   0.29%  0.30%  0.33%  0.35%  0.39%

ri     11     12     13     14     15
PDri   0.42%  0.5%   0.6%   0.9%   1.1%

ri     16     17     18     19     20
PDri   1.3%   1.73%  1.9%   3.0%   10.0%

PD for 20 ratings

In all portfolios, the rating $r(i) \in \{1, \dots, 20\}$ of obligor $i$ is chosen such that $r(i) \equiv i \pmod{20}$ holds for $i = 1, \dots, N$. Similarly to the rating allocation explained above, the sector $\mathrm{sec}(i) \in \{1, \dots, 33\}$ (and with it the correlation parameter $\rho_i$) of obligor $i$ is chosen such that $\mathrm{sec}(i) \equiv i \pmod{33}$ holds for $i = 1, \dots, N$.

We refer to all other parameters as algorithm parameters; they determine the performance of the numerical approximation but do not affect the risk profile of the portfolios. These algorithm parameters are listed in the following table.

Image resolution parameter          m = 13
Parallel shift parameter            η = 0.0003
Real part of integration path       γ = −0.25 × log 10^{−14}
Acceleration index                  q = 4
Truncation parameter                n(ε) = 14
Number of integration points        N_I = 1,000,000
Partition size of [jπ, (j+1)π]      N_T^{(j)} = 190,000

The grid points (8) are set as $\Delta_\xi = \{i_1\}_{i_1=0}^{99} \cup \{100 + 4 i_2\}_{i_2=0}^{99}$.

We examine the performance of our method (WA) by computing VaR for the sample portfolios. The Monte Carlo simulations used as benchmarks (MC) are performed with 1 million loss scenarios. We present the $\mathrm{VaR}_{0.999}(L)$ measurement results obtained by our method and by Monte Carlo simulation in the following table:

Portfolio  VaR(WA)  VaR(MC)  RE
P1         0.1454   0.1453   0.09%
P2         0.1471   0.1473   0.16%
P3         0.1495   0.1499   0.27%
P4         0.1531   0.1541   0.68%
P5         0.1586   0.1594   0.53%
P6         0.1001   0.0997   0.44%
P7         0.1228   0.1227   0.05%
P8         0.1148   0.1154   0.55%

where the relative error (RE) is defined by

$$\mathrm{RE} = \frac{|\mathrm{VaR(WA)} - \mathrm{VaR(MC)}|}{\mathrm{VaR(MC)}}.$$

All relative errors are at most 0.68%. We also provide the computational time in minutes for both the WA method and the MC method in the following table:

N       Portfolio  Time(WA)  Time(MC)
1,000   P1–P5      3.2 min   5.3 min
3,000   P6         3.3 min   17.9 min
10,000  P7         3.5 min   51.7 min
30,000  P8         3.9 min   158.3 min

The WA method takes only a few minutes even for a portfolio with 30,000 obligors, whereas the MC method requires computation time roughly proportional to the portfolio size. We have thus shown the suitability of the fast wavelet expansion method, using the epsilon-algorithm and cubic spline interpolation, for measuring credit portfolio risk under the multi-factor Vasicek model ($M \ge 2$).

Acknowledgments

We thank Masaaki Otaka, Tetsu Kukita, HideyukiTanaka, Hiroshi Minaguchi, Takeshi Hirose, TakayoshiYamamoto, Yuki Tai and Mari Ozeki for helpful discus-sions and comments.

References

[1] S. Yamanaka, M. Sugihara and H. Nakagawa, Analysis of credit event impact with self-exciting intensity model, JSIAM Letters, 3 (2011), 49–52.
[2] P. Glasserman and J. Ruiz-Mata, Computing the credit loss distribution in the Gaussian copula model: a comparison of methods, J. Credit Risk, 2 (2007), 33–66.
[3] Y. Takano and J. Hashiba, A novel methodology for credit portfolio analysis: numerical approximation approach, available at www.defaultrisk.com.
[4] J. J. Masdemont and L. O. Gracia, Haar wavelets-based approach for quantifying credit portfolio losses, Quantitative Finance, DOI:10.1080/14697688.2011.595731.
[5] L. O. Gracia and J. J. Masdemont, Credit risk contributions under the Vasicek one-factor model: a fast wavelet expansion approximation, available at www.defaultrisk.com.
[6] G. G. Walter, Wavelets and Other Orthogonal Systems With Applications, CRC Press, Boca Raton, FL, USA, 1994.


JSIAM Letters Vol.4 (2012) pp.17–20 ©2012 Japan Society for Industrial and Applied Mathematics

Fourier estimation method applied to forward interest rates

Nien-Lin Liu1 and Maria Elvira Mancino2

1 Research Organization of Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu-shi, Shiga 525-8577, Japan

2 Department of Mathematics for Decisions, University of Florence, Via delle Pandette 32, 50127 Firenze, Italy

E-mail nlt09001@fc.ritsumei.ac.jp

Received January 12, 2012, Accepted February 24, 2012

Abstract

Principal component analysis (PCA) is a general method to analyse the factors of the term structure of interest rates; there are usually two or three factors. However, it was shown by Liu that when PCA is applied to forward rates rather than spot rates, more factors are needed to explain 95% of the variability. In order to verify the robustness of this result, we introduce another method based on Fourier series, proposed by Malliavin and Mancino. The results reconfirm the observation of Liu with different data sets. In particular, the Fourier series method gives results similar to PCA.

Keywords term structure of interest rates, principal component analysis, Fourier series method

Research Activity Group Mathematical Finance

1. Introduction

The use of principal component analysis (PCA) on the term structure of interest rates is a common way to reduce the dimensionality of the vector space of the original variables. It is a well-known result that three factors are sufficient to explain most of the spot rate variability (see e.g. [1]). Nevertheless, the empirical results of [2] show that the number of factors for the forward rates is much greater than generally believed. Briefly speaking, for the 40 maturities of real market data in [2], more than 20 factors are required to explain at least 95% of the variability.

In order to verify the validity of PCA applied to the term structure of interest rates, we introduce another method, proposed by Malliavin and Mancino [3] and developed by Barucci and Reno [4], Malliavin and Thalmaier [5], and Malliavin and Mancino [6]. They presented a method to compute the volatility based on Fourier series. This method is nonparametric and can be applied to high-frequency financial data. Thus, we apply this method to the term structure of forward rates.

In this paper, we compute the volatility matrix of the forward rates by using the Fourier estimation methodology, and then compare this with the results of applying PCA. The organization of the present paper is as follows. First, we summarize the Fourier series methodology of Malliavin and Mancino [3] in Section 2. Then, the method of estimating the volatility by using the Fourier series method is given in Section 3. Next, we perform a numerical study and give the results in Section 4. Finally, we summarize our findings in Section 5.

2. Fourier series method

We briefly recall the Fourier series method introduced by Malliavin and Mancino [3]. Let $X$ be a $d$-dimensional stochastic process defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), P)$ given by

$$dX^i(t) = \mu^i(t)\,dt + B^{i,j}(t)\,dW^j(t), \quad 0 \le t \le T,$$

where $W$ is a $d_1$-dimensional standard Brownian motion, $\mu^i$ is a $d$-dimensional drift process and $B^{i,j}$ is an $\mathbb{R}^{d\times d_1}$-valued cadlag volatility process, both of which are adapted to $(\mathcal{F}_t)$. The volatility matrix $\Sigma = (\Sigma^{i,j})_{1\le i,j\le d}$ of the process $X$ is an adapted process defined by

$$\Sigma^{i,j}(t) = \sum_{k=1}^{d_1} B^{i,k}(t)\,B^{j,k}(t), \quad 0 \le t \le T.$$

Suppose that $T = 2\pi$. We now show how the Fourier series method reconstructs $\Sigma(t)$ for all $t \in (0, 2\pi)$. Let us denote the (random) Fourier coefficients of $dX^j$, $j = 1, \dots, d$, by

$$a_k(dX^j) := \frac{1}{\pi}\int_{(0,2\pi)} \cos(kt)\,dX^j(t), \qquad b_k(dX^j) := \frac{1}{\pi}\int_{(0,2\pi)} \sin(kt)\,dX^j(t).$$

The Fourier coefficients of each cross volatility $\Sigma^{i,j}$, $1 \le i, j \le d$, are defined by

$$a_k(\Sigma^{i,j}) = \frac{1}{\pi}\int_{(0,2\pi)} \cos(kt)\,\Sigma^{i,j}(t)\,dt, \qquad b_k(\Sigma^{i,j}) = \frac{1}{\pi}\int_{(0,2\pi)} \sin(kt)\,\Sigma^{i,j}(t)\,dt.$$


It follows from the Fourier-Fejer inversion formula that we can reconstruct $\Sigma$ from its Fourier coefficients by

$$\Sigma^{i,j}(t) = \lim_{N\to\infty}\sum_{k=0}^{N}\left(1 - \frac{k}{N}\right)\left(a_k(\Sigma^{i,j})\cos(kt) + b_k(\Sigma^{i,j})\sin(kt)\right).$$

Based on the observations of $X$ at times $t_i = 2\pi i/n$, $i = 0, \dots, n$, and by fixing some positive integer $N$, we can approximate $\Sigma$ as follows:

(1) The Fourier coefficients $a_k(dX^j)$, $b_k(dX^j)$, $k = 0, \dots, 2N$, are approximated by

$$a_k(dX^j) = \frac{1}{\pi}\sum_{i=1}^{n}\left(\cos(k t_{i-1}) - \cos(k t_i)\right)X^j(t_{i-1}) + \frac{1}{\pi}\left(X^j(t_n) - X^j(t_0)\right),$$
$$b_k(dX^j) = \frac{1}{\pi}\sum_{i=1}^{n}\left(\sin(k t_{i-1}) - \sin(k t_i)\right)X^j(t_{i-1}).$$

(2) The Fourier coefficients of each cross volatility $\Sigma^{i,j}$, $1 \le i, j \le d$, are approximated by

$$a_0(\Sigma^{i,j}) = \frac{\pi}{2(N+1-n_0)}\sum_{s=n_0}^{N}\left(a_s(dX^i)\,a_s(dX^j) + b_s(dX^i)\,b_s(dX^j)\right),$$
$$a_k(\Sigma^{i,j}) = \frac{\pi}{N+1-n_0}\sum_{s=n_0}^{N}\left(a_s(dX^i)\,a_{s+k}(dX^j) + a_s(dX^j)\,a_{s+k}(dX^i)\right)$$

for $k = 1, 2, \dots, N$, and

$$b_k(\Sigma^{i,j}) = \frac{\pi}{N+1-n_0}\sum_{s=n_0}^{N}\left(a_s(dX^i)\,b_{s+k}(dX^j) + a_s(dX^j)\,b_{s+k}(dX^i)\right)$$

for $k = 0, 1, \dots, N$.

(3) The volatilities $\Sigma^{i,j}(t)$ are approximated by

$$\Sigma^{N,n}_{i,j}(t) = \sum_{k=0}^{N}\left(1 - \frac{k}{N}\right)\left(a_k(\Sigma^{i,j})\cos(kt) + b_k(\Sigma^{i,j})\sin(kt)\right). \quad (1)$$
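For concreteness, a numpy sketch of steps (1)-(3) for equally spaced observations; the array layout, the default cutoff n0 and the vectorization are our own choices, not prescribed by the paper:

```python
import numpy as np

def fourier_volatility(X, N, n0=1):
    """Steps (1)-(3): estimate Sigma^{N,n}_{i,j}(t) from observations
    X of shape (n+1, d) at times t_i = 2*pi*i/n; returns a callable."""
    n, d = X.shape[0] - 1, X.shape[1]
    t = 2 * np.pi * np.arange(n + 1) / n
    k = np.arange(2 * N + 1)[:, None]
    # step (1): coefficients a_k(dX^j), b_k(dX^j) for k = 0..2N
    a = ((np.cos(k * t[:-1]) - np.cos(k * t[1:])) @ X[:-1]
         + (X[-1] - X[0])) / np.pi
    b = (np.sin(k * t[:-1]) - np.sin(k * t[1:])) @ X[:-1] / np.pi
    # step (2): coefficients of the cross volatilities
    s = np.arange(n0, N + 1)
    c = np.pi / (N + 1 - n0)
    aS = np.empty((N + 1, d, d)); bS = np.empty((N + 1, d, d))
    aS[0] = 0.5 * c * (a[s].T @ a[s] + b[s].T @ b[s])
    for kk in range(N + 1):
        bS[kk] = c * (a[s].T @ b[s + kk] + b[s + kk].T @ a[s])
        if kk >= 1:
            aS[kk] = c * (a[s].T @ a[s + kk] + a[s + kk].T @ a[s])
    # step (3): Fejer-summed reconstruction (1)
    def sigma(tt):
        w = 1.0 - np.arange(N + 1) / N
        cos_kt = np.cos(np.arange(N + 1) * tt)
        sin_kt = np.sin(np.arange(N + 1) * tt)
        return (np.tensordot(w * cos_kt, aS, axes=1)
                + np.tensordot(w * sin_kt, bS, axes=1))
    return sigma
```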

3. Method of estimating the volatility

Let $r_t(T)$ denote the spot rate for the period $[t, T]$, i.e. the rate of spot borrowing until the maturity $T$. The forward rate at $t$ for the period $[T_i, T_{i+1}]$ is given by

$$F_t(T_i, T_{i+1}) = \frac{(T_{i+1} - t)\,r_t(T_{i+1}) - (T_i - t)\,r_t(T_i)}{T_{i+1} - T_i} =: F_i(t), \quad i = 1, \dots, d. \quad (2)$$

We will write $\mathbf{F}(t) = (F_1(t), \dots, F_d(t))$.

Suppose that sample data of the forward rate curve $\mathbf{F}(t)$, $t = t_0, t_1, \dots, t_N$, are given. From the data we can calculate $\Delta\mathbf{F}(1), \dots, \Delta\mathbf{F}(N)$, where for each $l = 1, \dots, N$, $\Delta\mathbf{F}(l) := \mathbf{F}(t_l) - \mathbf{F}(t_{l-1})$ is a $d$-dimensional vector $(\Delta F_1(l), \dots, \Delta F_d(l))$.
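As a small illustration of formula (2) (array layout assumed, not from the paper):

```python
import numpy as np

def forward_rates(t, T, r):
    """F_i(t) of formula (2) from spot rates r = (r_t(T_1), ..., r_t(T_{d+1}))
    at maturities T_1 < ... < T_{d+1}, all in year fractions."""
    T, r = np.asarray(T, float), np.asarray(r, float)
    return ((T[1:] - t) * r[1:] - (T[:-1] - t) * r[:-1]) / (T[1:] - T[:-1])
```

Stacking these vectors over observation dates and differencing consecutive rows then yields the increments $\Delta\mathbf{F}(l)$.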

Remark 1 Assume that the forward rate process $\mathbf{F}(t) = (F_1(t), \dots, F_d(t))$ is a Brownian semimartingale given by

$$dF_i(t) = \mu_i(t)\,dt + \sum_{j=1}^{r}\sigma_{i,j}(t)\,dW^j(t), \quad i = 1, \dots, d,$$

where $W = (W^1, \dots, W^r)$ is an $r$-dimensional Brownian motion and $\sigma_{i,j}$ and $\mu_i$ are adapted processes. Then we define the time-dependent volatility matrix by

$$\Sigma_{i,j}(t) = \sum_{l=1}^{r}\sigma_{i,l}(t)\,\sigma_{j,l}(t).$$

According to [6, Theorem 3.4], under a suitable condition the following convergence holds in probability:

$$\lim_{n,N\to\infty}\sup_t \left|\Sigma^{N,n}_{i,j}(t) - \Sigma_{i,j}(t)\right| = 0.$$

4. Numerical study

We use time series of American zero rates and Japanese zero rates from May 2005 to May 2008 and from June 2005 to June 2008, respectively, with a total of 777 and 723 daily observations. The maturities of the American zero rates are $T_1$ = 2009/5/15, $T_2$ = 2010/5/15, $T_3$ = 2011/5/15, . . . , $T_{13}$ = 2021/5/15, and the maturities of the Japanese zero rates are $T_1$ = 2009/6/20, $T_2$ = 2010/6/20, $T_3$ = 2011/6/20, . . . , $T_{13}$ = 2021/6/20.

We use these data to calculate the forward rates by formula (2). Here we have $H_A$ = 777 observations for the American data and $H_J$ = 723 observations for the Japanese data. We follow the steps introduced previously to approximate the volatility matrix using the Fourier series method: we calculate the Fourier coefficients with $N_1 = H/2$ points and estimate the Fourier coefficients of the cross volatility with $N_2 = H/4$. We smooth the Fejer kernel in (1) by replacing $(1 - k/N)$ with $\sin^2(\delta k)/(\delta k)^2$ for an appropriate parameter $\delta > 0$; in this study, we use $\delta_A = 2\pi/259$ and $\delta_J = 2\pi/241$ for the American and Japanese data, respectively. We use both the Fourier series method and PCA to analyze the interest rates; the results are as follows.

4.1 Analysis of spot rates

In order to compare our results with general beliefs, in this section we perform empirical studies with both methods to analyse the spot rates.

Fig. 1. Percentage of variance explained by the first, the first two, and the first three eigenvalues, as a function of time, for the American spot rate.

Fig. 2. Estimated eigenvalues using PCA, the cumulative proportion of principal component contributions, and the eigenvectors of the first three factors, for the American spot rate.

4.1.1 American interest rate data

We apply the Fourier series method and PCA in turn to the American spot rates. Fig. 1 shows the result of the Fourier series method, that is, the percentage of the spectrum described by the first three eigenvalues as a function of time over the trading days. Fig. 2 shows the result of PCA. In the Fourier series method, the first eigenvalue describes more than 98% of the spectrum except at a few points. This is similar to the result of PCA, where one eigenvalue describes 92% and two eigenvalues describe over 95% of the variability.

4.1.2 Japanese interest rate data

The results for the Japanese spot rates are organized similarly. Fig. 3 shows the result of the Fourier series method and Fig. 4 shows the result of PCA. Both results are similar to those in the previous section: the first eigenvalue, excluding a few points, describes more than 98% of the spectrum in the Fourier series method, and one eigenvalue describes around 94% of the variability in PCA.

4.2 Analysis of forward rates

We now explain the empirical studies of both methods applied to the forward rates.

4.2.1 American interest rate data

First we analyse with the Fourier series method. Fig. 5 shows the percentage described by the first three eigenvalues. As can be seen, the first eigenvalue describes only 30% to 60% of the spectrum of the volatility matrix. Even if we consider three eigenvalues, the result is still not significant, as three eigenvalues describe only 70% to 90%. This result is similar to [2]: we need more factors to explain most of the forward rate variability. We note that if we want to describe more than 95% of the spectrum, we need at least the first six eigenvalues. We also apply PCA to the term structure of American forward rates, and the result is shown in Fig. 6; three eigenvalues describe only 50%.

Fig. 3. Percentage of variance explained by the first, the first two, and the first three eigenvalues, as a function of time, for the Japanese spot rate.

Fig. 4. Estimated eigenvalues using PCA, the cumulative proportion of principal component contributions, and the eigenvectors of the first three factors, for the Japanese spot rate.

4.2.2 Japanese interest rate data

We use the same arrangement as in the previous section. Fig. 7 shows the percentage described by the first three eigenvalues. The first eigenvalue describes only 30% to 80% of the spectrum of the volatility matrix; furthermore, three eigenvalues describe only 70% to 90%. If we want to describe more than 95% of the spectrum, we need at least the first six eigenvalues. The result of applying PCA to the term structure of Japanese forward rates is shown in Fig. 8; three eigenvalues describe 70%.

Remark 2 There is no significant difference between the Fourier series method and PCA applied to the Japanese forward rate, while the two methods do not give very close results for the American forward rate.

5. Conclusions

In this paper we applied two methods to the term structure of interest rates. The numerical studies show that the results of [2] are reconfirmed with different data sets and different methods. In short, if we want to explain up to 95% of the forward rate variability, both methods seem to require strictly more than three eigenvalues, while a few eigenvalues are sufficient for spot rates. In future work, we plan to construct a different estimator by using the Fourier series method to verify whether this result is robust.

Fig. 5. Percentage of variance explained by the first, the first two, and the first three eigenvalues, as a function of time, for the American forward rate.

Fig. 6. Estimated eigenvalues using PCA, the cumulative proportion of principal component contributions, and the eigenvectors of the first three factors, for the American forward rate.

References

[1] R. Litterman and J. Scheinkman, Common factors affecting bond returns, J. Fixed Income, 1 (1991), 54–61.
[2] N. L. Liu, A comparative study of principal component analysis on term structure of interest rates, JSIAM Letters, 2 (2010), 57–60.
[3] P. Malliavin and M. E. Mancino, Fourier series method for measurement of multivariate volatilities, Financ. Stoch., 6 (2002), 49–61.
[4] E. Barucci and R. Reno, On measuring volatility of diffusion processes with high frequency data, Econ. Lett., 74 (2002), 371–378.
[5] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer Finance, Berlin, 2006.
[6] P. Malliavin and M. E. Mancino, A Fourier transform method for nonparametric estimation of multivariate volatility, Ann. Statist., 37 (2009), 1983–2010.

Fig. 7. Percentage of variance explained by the first, the first two, and the first three eigenvalues, as a function of time, for the Japanese forward rate.

Fig. 8. Estimated eigenvalues using PCA, the cumulative proportion of principal component contributions, and the eigenvectors of the first three factors, for the Japanese forward rate.


JSIAM Letters Vol.4 (2012) pp.21–23 ©2012 Japan Society for Industrial and Applied Mathematics

An integer factoring algorithm based on elliptic divisibility sequences

Naotoshi Sakurada1, Junichi Yarimizu1, Naoki Ogura1 and Shigenori Uchiyama1

1 Tokyo Metropolitan University, Tokyo 192-0397, Japan

E-mail sakurada-naotoshi@ed.tmu.ac.jp

Received February 22, 2012, Accepted March 21, 2012

Abstract

In 1948, Ward defined elliptic divisibility sequences satisfying a certain recurrence relation. An elliptic divisibility sequence arises from any choice of elliptic curve and initial point on that curve. In this paper, we propose a factorization algorithm based on elliptic divisibility sequences. We then discuss our implementation of the algorithm and its optimization, and estimate the computational complexity.

Keywords elliptic divisibility sequence, elliptic curve, elliptic curve method

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In 1948, Ward defined the concept of an elliptic divisibility sequence (EDS for short) [1]. This is a sequence of integers satisfying a certain divisibility property and a non-linear recurrence relation, which is related to a division polynomial. Ward's results on the arithmetic of EDSs are summarized in [1]. In this paper, we propose a new integer factoring algorithm based on the properties of EDSs. We discuss the algorithm, its speed-up, and its computation.

In Section 2, we begin with an introduction to EDSs and how to calculate general terms of EDSs; we also verify a speed-up of the EDS calculation. In Section 3, we introduce the proposed integer factoring algorithm (we call this the EDS method), and we explain why the EDS method succeeds in finding a divisor. We then estimate its computational complexity, implement it, and compare the EDS method with the elliptic curve method (ECM for short). Our conclusion is presented in Section 4.

2. Elliptic divisibility sequences

In this section, we briefly review EDSs. See [1–3] for the details.

2.1 Elliptic divisibility sequences

Let us begin with the definition of an EDS.

Definition 1 ([1]) An EDS $(h_n)$ is a sequence of integers satisfying

$$h_{m+n}\,h_{m-n} = h_{m+1}\,h_{m-1}\,h_n^2 - h_{n+1}\,h_{n-1}\,h_m^2 \quad (\forall m, n \in \mathbb{Z})$$

and the divisibility property that $h_n$ divides $h_m$ whenever $n$ divides $m$.

EDSs satisfy a relation which the division polynomials of elliptic curves also satisfy. We now recall two theorems related to the integer factoring algorithm:

Theorem 2 ([2]) If $(h_n)$ is a non-trivial EDS, then $h_0 = 0$, $h_1 = \pm 1$, and $h_{-n} = -h_n$ $(\forall n \in \mathbb{Z})$.

This theorem means that we only need to consider positive-subscript terms of an EDS with $h_1 = 1$; we assume this throughout this paper.

Theorem 3 ([3]) If the initial five terms $h_0, h_1, h_2, h_3, h_4$ of an EDS $(h_n)$ are known, then the whole sequence is well defined.

Since we always have $h_0 = 0$, $h_1 = 1$, this is equivalent to knowing the three terms $h_2, h_3, h_4$.

2.2 Calculating a general term

It is important to know how to calculate a general term of an EDS defined by the three terms $h_2, h_3, h_4$. For this purpose, we use the recurrence relations below.

Definition 4 By Definition 1, we have the following recurrence relations for all $k \in \mathbb{Z}$:

• $h_{2k+1}\,h_1 = h_{k+2}\,h_k^3 - h_{k-1}\,h_{k+1}^3$.
• $h_{2k}\,h_2 = h_k\,(h_{k+2}\,h_{k-1}^2 - h_{k-2}\,h_{k+1}^2)$.

These formulae are called the doubling formulae.

Let us also consider the following set.

Definition 5 For an EDS $(h_n)$, we define $\langle\langle h_i \rangle\rangle$ to be the set of 8 consecutive terms centered at $h_i$:

$$\langle\langle h_i \rangle\rangle := \{h_{i-3}, h_{i-2}, \dots, h_{i+3}, h_{i+4}\}.$$

It is possible to calculate $\langle\langle h_{2i} \rangle\rangle$ and $\langle\langle h_{2i+1} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$ using the doubling formulae. In fact, for the case of calculating $\langle\langle h_{2i} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$, we see that:

$$h_{2i+4} = h_{i+2}(h_{i+4}\,h_{i+1}^2 - h_i\,h_{i+3}^2)\,h_2^{-1}, \qquad h_{2i+3} = h_{i+3}\,h_{i+1}^3 - h_i\,h_{i+2}^3,$$
$$h_{2i+2} = h_{i+1}(h_{i+3}\,h_i^2 - h_{i-1}\,h_{i+2}^2)\,h_2^{-1}, \qquad h_{2i+1} = h_{i+2}\,h_i^3 - h_{i-1}\,h_{i+1}^3,$$
$$h_{2i} = h_i(h_{i+2}\,h_{i-1}^2 - h_{i-2}\,h_{i+1}^2)\,h_2^{-1}, \qquad h_{2i-1} = h_{i+1}\,h_{i-1}^3 - h_{i-2}\,h_i^3,$$
$$h_{2i-2} = h_{i-1}(h_{i+1}\,h_{i-2}^2 - h_{i-3}\,h_i^2)\,h_2^{-1}, \qquad h_{2i-3} = h_i\,h_{i-2}^3 - h_{i-3}\,h_{i-1}^3,$$


and for the case of calculating $\langle\langle h_{2i+1} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$, we obtain:

$$h_{2i+5} = h_{i+4}\,h_{i+2}^3 - h_{i+1}\,h_{i+3}^3, \qquad h_{2i+4} = h_{i+2}(h_{i+4}\,h_{i+1}^2 - h_i\,h_{i+3}^2)\,h_2^{-1},$$
$$h_{2i+3} = h_{i+3}\,h_{i+1}^3 - h_i\,h_{i+2}^3, \qquad h_{2i+2} = h_{i+1}(h_{i+3}\,h_i^2 - h_{i-1}\,h_{i+2}^2)\,h_2^{-1},$$
$$h_{2i+1} = h_{i+2}\,h_i^3 - h_{i-1}\,h_{i+1}^3, \qquad h_{2i} = h_i(h_{i+2}\,h_{i-1}^2 - h_{i-2}\,h_{i+1}^2)\,h_2^{-1},$$
$$h_{2i-1} = h_{i+1}\,h_{i-1}^3 - h_{i-2}\,h_i^3, \qquad h_{2i-2} = h_{i-1}(h_{i+1}\,h_{i-2}^2 - h_{i-3}\,h_i^2)\,h_2^{-1}.$$

According to the above, we need 48 multiplications to calculate $\langle\langle h_{2i} \rangle\rangle$ or $\langle\langle h_{2i+1} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$. We call this the naive calculation. It is already indicated in [3] how to calculate an EDS; however, the calculating algorithm and the consecutive terms $\langle h_i \rangle$ in [3] are slightly different from ours.

2.3 Speeding up

We discuss speeding up the naive calculation. We define the following quantities:

$$a_1 = h_{i-3}\,h_{i-1}, \quad b_1 = h_{i-2}^2, \qquad a_2 = h_{i-2}\,h_i, \quad b_2 = h_{i-1}^2,$$
$$a_3 = h_{i-1}\,h_{i+1}, \quad b_3 = h_i^2, \qquad a_4 = h_i\,h_{i+2}, \quad b_4 = h_{i+1}^2,$$
$$a_5 = h_{i+1}\,h_{i+3}, \quad b_5 = h_{i+2}^2, \qquad a_6 = h_{i+2}\,h_{i+4}, \quad b_6 = h_{i+3}^2.$$

Then calculating $\langle\langle h_{2i} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$ becomes:

$$h_{2i+4} = (a_6 b_4 - a_4 b_6)\,h_2^{-1}, \qquad h_{2i+3} = a_5 b_4 - a_4 b_5,$$
$$h_{2i+2} = (a_5 b_3 - a_3 b_5)\,h_2^{-1}, \qquad h_{2i+1} = a_4 b_3 - a_3 b_4,$$
$$h_{2i} = (a_4 b_2 - a_2 b_4)\,h_2^{-1}, \qquad h_{2i-1} = a_3 b_2 - a_2 b_3,$$
$$h_{2i-2} = (a_3 b_1 - a_1 b_3)\,h_2^{-1}, \qquad h_{2i-3} = a_2 b_1 - a_1 b_2,$$

and thus we need 32 multiplications for calculating $\langle\langle h_{2i} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$. Similarly, for $\langle\langle h_{2i+1} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$:

$$h_{2i+5} = a_6 b_5 - a_5 b_6, \qquad h_{2i+4} = (a_6 b_4 - a_4 b_6)\,h_2^{-1},$$
$$h_{2i+3} = a_5 b_4 - a_4 b_5, \qquad h_{2i+2} = (a_5 b_3 - a_3 b_5)\,h_2^{-1},$$
$$h_{2i+1} = a_4 b_3 - a_3 b_4, \qquad h_{2i} = (a_4 b_2 - a_2 b_4)\,h_2^{-1},$$
$$h_{2i-1} = a_3 b_2 - a_2 b_3, \qquad h_{2i-2} = (a_3 b_1 - a_1 b_3)\,h_2^{-1}.$$

Thus, we need 32 multiplications for calculating $\langle\langle h_{2i+1} \rangle\rangle$ from $\langle\langle h_i \rangle\rangle$, so we can speed up the calculation by about 33 percent. Note that we can calculate $\langle\langle h_k \rangle\rangle$ from $\langle\langle h_1 \rangle\rangle$ for all $k \in \mathbb{N}$, because we can use the doubling formulae like repeated squaring.
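A sketch of one doubling step with the 12 shared products $a_t, b_t$; the window layout (a Python list of the 8 terms) and the reduction modulo N are our own conventions:

```python
def eds_double_step(H, odd, h2_inv, N):
    """From <<h_i>> = [h_{i-3}, ..., h_{i+4}] (mod N) compute
    <<h_{2i}>> (odd=False) or <<h_{2i+1}>> (odd=True)."""
    a = [H[j] * H[j + 2] % N for j in range(6)]      # a_1..a_6
    b = [H[j + 1] * H[j + 1] % N for j in range(6)]  # b_1..b_6
    even = [
        (a[1] * b[0] - a[0] * b[1]) % N,             # h_{2i-3}
        (a[2] * b[0] - a[0] * b[2]) * h2_inv % N,    # h_{2i-2}
        (a[2] * b[1] - a[1] * b[2]) % N,             # h_{2i-1}
        (a[3] * b[1] - a[1] * b[3]) * h2_inv % N,    # h_{2i}
        (a[3] * b[2] - a[2] * b[3]) % N,             # h_{2i+1}
        (a[4] * b[2] - a[2] * b[4]) * h2_inv % N,    # h_{2i+2}
        (a[4] * b[3] - a[3] * b[4]) % N,             # h_{2i+3}
        (a[5] * b[3] - a[3] * b[5]) * h2_inv % N,    # h_{2i+4}
    ]
    if not odd:
        return even                                  # window around h_{2i}
    extra = (a[5] * b[4] - a[4] * b[5]) % N          # h_{2i+5}
    return even[1:] + [extra]                        # window around h_{2i+1}
```

Scanning the binary expansion of k and choosing odd according to each bit walks from $\langle\langle h_1 \bmod N \rangle\rangle$ to $\langle\langle h_k \bmod N \rangle\rangle$, exactly as in repeated squaring.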

2.4 Implementation

Theoretically, we have shown that a speed-up of about 33 percent is possible. To check that this holds in practice, we used a Windows 7 system (32-bit) with a 2.8 GHz CPU (Intel Core i7), 4 GB of memory, and a 500 GB hard disk; we used Python (Ver. 2.7) for writing the programs.

At this point, we define the set

$$\langle\langle h_i \bmod p \rangle\rangle := \{h_{i-3} \bmod p, \dots, h_{i+4} \bmod p\}$$

for a prime number $p$.

Our implementation consisted of recording the time it took to calculate $\langle\langle h_k \bmod p \rangle\rangle$ from $\langle\langle h_1 \bmod p \rangle\rangle$, using the naive method (48 multiplications) and the speed-up method (32 multiplications), for 50 randomly chosen natural numbers $k \in \mathbb{N}$ less than some $n$ (representing the bit length), and then comparing the average running time of each method. Table 1 lists the results for $n$ = 500, 600, 700, which appear to validate the theoretical prediction of a ~33% increase in speed.

Table 1. Experimental results.

n [bit]       500     600     700
Naive [s]     0.4674  0.7834  1.2104
Speed up [s]  0.3261  0.5344  0.8074

Table 2. EDS method.

Input: Composite number N, integers B1, B2
Output: Non-trivial divisor of N
Step 1. Put k = k(B1, B2).
Step 2. Randomly choose h2, h3, h4 ∈ Z/NZ, where h2 divides h4.
Step 3. If gcd(6 h2 h3, N) ≠ 1, N, output gcd(6 h2 h3, N) and finish the algorithm.
Step 4. If gcd(Δ, N) ≠ 1, N, output gcd(Δ, N) and finish the algorithm.
Step 5. Compute ⟨⟨h1 mod N⟩⟩ from h2, h3, h4.
Step 6. Compute ⟨⟨hk mod N⟩⟩ from ⟨⟨h1 mod N⟩⟩.
Step 7. If gcd(hk, N) ≠ 1, N, output gcd(hk, N) and finish the algorithm; else go back to Step 2.

3. EDS method

In this section, we introduce the EDS method algorithm and its implementation.

3.1 Algorithm

First we recall the discriminant $\Delta$ of an EDS [2]:

Definition 6 Let $(h_n)$ be an EDS with $h_2 h_3 \ne 0$. The discriminant of $(h_n)$ is

$$\Delta := \left[-h_4^4 - 3 h_2^5 h_4^3 + (-3 h_2^{10} - 8 h_2^2 h_3^3)\,h_4^2 + (-h_2^{15} + 20 h_2^7 h_3^3)\,h_4 + h_2^{12} h_3^3 - 16 h_2^4 h_3^6\right] / (h_2^8 h_3^3).$$

The sequence $(h_n)$ is said to be singular if $\Delta = 0$.

For natural numbers $B_1$ and $B_2$, we define $k(B_1, B_2) = \prod_{r \le B_1} r^{e_r}$ ($r$: prime, $e_r \in \mathbb{N}$, $r^{e_r} \le B_2 < r^{e_r+1}$). We use this notation in the following algorithm, the EDS method, shown in Table 2. Note that we can calculate $h_5$ from $h_2, h_3, h_4$ in Step 4 by putting $m = 3$, $n = 2$ in Definition 1.
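A Python sketch of Table 2, assuming the eds_double_step routine sketched in Section 2.3 and a precomputed list of primes up to B1; the singularity test of Step 4 (gcd(Δ, N)) is elided for brevity:

```python
from math import gcd
from random import randrange

def eds_factor(N, B1, B2, primes):
    """EDS method (Table 2), Steps 1-3 and 5-7."""
    k = 1                                   # Step 1: k(B1, B2)
    for r in primes:                        # primes r <= B1
        e = 1
        while r ** (e + 1) <= B2:
            e += 1
        k *= r ** e
    while True:
        # Step 2: random initial terms with h2 | h4
        h2, h3 = randrange(1, N), randrange(1, N)
        h4 = h2 * randrange(1, N) % N
        g = gcd(6 * h2 * h3, N)             # Step 3
        if g != 1:
            if g != N:
                return g
            continue
        # Step 5: <<h1 mod N>>, using h_{-n} = -h_n and
        # h5 = h2^3 h4 - h3^3 (Definition 1 with m = 3, n = 2)
        h5 = (h2 ** 3 * h4 - h3 ** 3) % N
        H = [(-h2) % N, N - 1, 0, 1, h2, h3, h4, h5]
        h2_inv = pow(h2, -1, N)             # exists since gcd(h2, N) = 1
        for bit in bin(k)[3:]:              # Step 6: binary ladder
            H = eds_double_step(H, bit == '1', h2_inv, N)
        g = gcd(H[3], N)                    # H[3] holds h_k mod N
        if g != 1 and g != N:               # Step 7
            return g
```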

3.2 Implementation

We implemented the EDS method and the ECM using the same computer as in Section 2.4. Table 3 shows the average running time of the algorithms for 50 randomly chosen composite numbers $N = p \times q$ ($p$, $q$ prime) of each bit length; "N [bit]" denotes the bit length of the composite number $N$. In this implementation, the EDS method is about 20% slower than the ECM on average.

Table 3. EDS method and ECM.

N [bit]  40     80     120     160
EDS [s]  0.015  3.013  114.49  4906.76
ECM [s]  0.020  2.092  97.78   4317.45

3.3 EDS method

We explain why it is possible to find a non-trivial factor by the EDS method. We recall the relationship between EDSs and elliptic curves, because the EDS method is analogous to the ECM. First, recall that an elliptic curve induces an EDS:

Theorem 7 ([3]) Let $E$ be an elliptic curve $E: y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x$, where $a_1, a_2, a_3, a_4 \in \mathbb{Q}$, and consider a non-singular rational point $P = (x_1, y_1)$. We can also write $P$ in Jacobian coordinates as $P = [X_1, Y_1, Z_1]$, where $X_1, Y_1, Z_1$ are integers with $\gcd(X_1, Z_1) = \gcd(Y_1, Z_1) = 1$, $x_1 = X_1/Z_1^2$ and $y_1 = Y_1/Z_1^3$. Here $[i]P$ denotes the point obtained by adding $P$ to itself $i$ times using the operations of the elliptic curve, and $[i]P = [X_i, Y_i, Z_i]$. In particular, let $P = [0, 0, 1]$; then the sequence $(Z_i)$ is in fact an EDS.

Hence, an EDS is determined by an elliptic curve and a point on the curve. Conversely, an EDS induces an elliptic curve by the following:

Theorem 8 ([2]) Let $(h_n)$ be an EDS in which neither $h_2$ nor $h_3$ is zero. Then there exist an elliptic curve $E: y^2 = x^3 + ax + b$, where $a, b \in \mathbb{Q}$, and a non-singular rational point $P = (x_1, y_1)$ on $E$ such that $\psi_n(x_1, y_1) = h_n$ for all $n \in \mathbb{Z}$, where $\psi_n$ is the $n$-th division polynomial of $E$.

In the ECM, a trial consists of choosing a coefficient $a$ of an elliptic curve and an initial point $(x, y)$. By Theorems 7 and 8, choosing the initial terms $(h_2, h_3, h_4)$ of an EDS is equivalent to choosing a coefficient $a$ of an elliptic curve and an initial point $(x, y)$. Moreover, by Theorem 8, if the greatest common divisor of a general term $h_k$ of an EDS and the composite number $N$ is non-trivial, it yields a non-trivial divisor of $N$, just as in the ECM. In other words, if we calculate general terms $h_k$ of an EDS from the initial terms $(h_2, h_3, h_4)$ and compute the greatest common divisor of $h_k$ and $N$, then it is possible to find a non-trivial divisor of $N$. This is the principle of the proposed EDS method.

3.4 Complexity

In this section, we estimate the computational complexity of the EDS method. For the estimation, we define the following parameters:

$$L_p[s, \alpha] := \exp\left((\alpha + o(1))(\log p)^s(\log\log p)^{1-s}\right),$$
$$B_1 = L_p[s, \alpha], \quad B_2 = L_p[s', \alpha'], \quad k = \prod_{r \le B_1} r^{e_r} \quad (r:\ \text{prime},\ r^{e_r} \le B_2).$$

$L_p$ is commonly used in estimating the computational complexity of integer factoring algorithms such as the ECM. The computational complexity of the EDS method is dominated by Step 6; the estimate relies on Lenstra [4].

In Step 7, the algorithm may be repeated until a prime factor of $N$ is obtained. The expected number of repetitions is $O(L_p[1-s, (1-s)/\alpha])$, because choosing the initial terms $(h_2, h_3, h_4)$ of an EDS is equivalent to choosing a coefficient $a$ of an elliptic curve and an initial point $(x, y)$.

In Step 6, by Section 2, we can calculate $\langle\langle h_{2i} \bmod N \rangle\rangle$ or $\langle\langle h_{2i+1} \bmod N \rangle\rangle$ from $\langle\langle h_i \bmod N \rangle\rangle$ with at most 48 multiplications, so each step costs $O((\log N)^2)$. We calculate $\langle\langle h_k \bmod N \rangle\rangle$ from $\langle\langle h_1 \bmod N \rangle\rangle$ as in the repeated squaring method, so we repeat $O(\log k) = O(L_p[s, \alpha])$ times. The computational complexity of calculating $\langle\langle h_k \bmod N \rangle\rangle$ from $\langle\langle h_1 \bmod N \rangle\rangle$ is therefore $O(L_p[s, \alpha](\log N)^2)$.

Thus, the computational complexity of the EDS method is $O(L_p[1-s, (1-s)/\alpha] \times L_p[s, \alpha](\log N)^2) = O(L_p[\max\{1-s, s\}, \alpha''](\log N)^2)$, where $\alpha'' = (1-s)/\alpha$ if $s < 1/2$, $\alpha'' = \alpha$ if $s > 1/2$, and $\alpha'' = \alpha + 1/(2\alpha)$ if $s = 1/2$. Optimization then gives $s = 1/2$ and $\alpha'' = \sqrt{2}$. Therefore the optimized computational complexity of the EDS method is $O(L_p[1/2, \sqrt{2}](\log N)^2)$, which is the same as the estimate for the ECM in [4]. We thus have the following theorem:

Theorem 9 The optimized complexity of the EDS method is $O(L_p[1/2, \sqrt{2}](\log N)^2)$.
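For reference, the scale function itself (dropping the $o(1)$ term) is one line of Python, and can be used to pick the smoothness bound at the optimum of Theorem 9 (the choice of $B_1$ below is our illustration, not prescribed by the paper):

```python
from math import exp, log, sqrt

def L(p, s, alpha):
    """L_p[s, alpha] without the o(1) term."""
    return exp(alpha * log(p) ** s * log(log(p)) ** (1 - s))

# at the optimum s = 1/2 one would take, e.g., B1 = L(p, 0.5, 1 / sqrt(2))
```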

4. Conclusion

We proposed an integer factoring algorithm, discussed its optimization, and estimated its computational complexity. Future work includes improvements and further optimizations of the EDS method.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable comments. This work was supported in part by Grant-in-Aid for Scientific Research (C) (20540125).

References

[1] M. Ward, Memoir on elliptic divisibility sequences, Amer. J. Math., 70 (1948), 31–74.
[2] C. S. Swart, Elliptic curves and related sequences, Ph.D. thesis, The Univ. of London, London, 2003.
[3] R. Shipsey, Elliptic divisibility sequences, Ph.D. thesis, The Univ. of London, London, 2000.
[4] H. W. Lenstra, Jr., Factoring integers with elliptic curves, Ann. Math., 126 (1987), 649–673.


JSIAM Letters Vol.4 (2012) pp.25–28 ©2012 Japan Society for Industrial and Applied Mathematics

A modified Block IDR(s) method for computing high accuracy solutions

Michihiro Naito1, Hiroto Tadano1 and Tetsuya Sakurai1,2

1 Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

2 JST CREST, 4-1-8 Hon-cho, Kawaguchi-shi, Saitama 332-0012, Japan

E-mail michihiro@mma.cs.tsukuba.ac.jp

Received January 18, 2012, Accepted March 12, 2012

Abstract

In this paper, the difference between the residual and the true residual, caused by the computation errors that arise in matrix multiplications, is analyzed for solutions generated by the Block IDR(s) method. Moreover, in order to reduce the difference between the residual and the true residual, a modified Block IDR(s) method is proposed. Numerical experiments demonstrate that the difference under the proposed method is smaller than that of the conventional Block IDR(s) method.

Keywords Block Krylov subspace methods, Block IDR(s) method, linear systems with multiple right-hand sides, high accuracy solutions

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Linear systems with multiple right-hand sides of the form

$$AX = B,$$

where the coefficient matrix $A \in \mathbb{C}^{n\times n}$, $B \in \mathbb{C}^{n\times L}$, and $X \in \mathbb{C}^{n\times L}$, appear in many problems, including lattice quantum chromodynamics calculations of physical quantities [1] and an eigensolver using contour integration [2]. To solve these linear systems, Block Krylov subspace methods such as Block BiCG [3] and Block BiCGSTAB [4] have been proposed. These methods can solve linear systems with multiple right-hand sides more efficiently than Krylov subspace methods for a single right-hand side.

We consider the Block IDR(s) method [5] as a Block Krylov subspace method. A difference arises between the residual generated by the Block IDR(s) method and the true residual $B - AX$ obtained from the approximate solution. When such a difference occurs, even if the residual generated by the Block IDR(s) method satisfies the convergence criterion, high accuracy approximate solutions cannot be obtained. In this paper, we analyze the difference between the residual and the true residual and, based on the results of the analysis, propose a remedy for reducing the difference.

The composition of this paper is as follows. In Section 2, the algorithm of the Block IDR(s) method is described. In Section 3, the difference between the residual and the true residual caused by the computation errors that arise in matrix multiplications is analyzed. In Section 4, to reduce this difference, a modified Block IDR(s) method is proposed; we show that the errors which arise in the matrix multiplications of the proposed method do not affect the difference between the residual and the true residual. In Section 5, numerical experiments comparing the conventional and the proposed Block IDR(s) methods are described. Section 6 concludes the paper.

2. The Block IDR(s) method

In this section, we show the algorithm of the Block IDR(s) method [5]. Given $A \in \mathbb{C}^{n\times n}$ and $R_0 \in \mathbb{C}^{n\times L}$, and assuming that the residuals $R_{i-s}, \dots, R_i$ belong to the subspace $\mathcal{G}_j$, the residual $R_{i+1}$, which belongs to the subspace $\mathcal{G}_{j+1}$, is constructed by setting

$$R_{i+1} = (I - \omega_{j+1} A)\,V_i,$$

where $V_i \in \mathbb{C}^{n\times L}$. Then let

$$\Delta R_k = R_{k+1} - R_k, \qquad \Delta X_k = X_{k+1} - X_k,$$
$$G_k = (\Delta R_{k-s}, \Delta R_{k-s+1}, \dots, \Delta R_{k-1}), \qquad U_k = (\Delta X_{k-s}, \Delta X_{k-s+1}, \dots, \Delta X_{k-1}).$$

Then $V_i$ can be written as

$$V_i = R_i - G_i C_i. \quad (1)$$

Moreover, the condition on $V_i$ can be written as

$$P^H V_i = O, \quad (2)$$

where $P \in \mathbb{C}^{n\times sL}$, and $C_i$ can be obtained from (1) and (2). The approximate solution $X_{i+1}$ can be written as

$$X_{i+1} = X_i + \omega_{j+1} V_i - U_i C_i,$$


X0 ∈ C^{n×L} is an initial guess
R0 = B − A X0,  P ∈ C^{n×sL}
for i = 0 to s − 1 do
    Vi = A Ri,  ω = Tr(Vi^H Ri) / Tr(Vi^H Vi)
    ΔXi = ω Ri,  ΔRi = −ω Vi
    Xi+1 = Xi + ΔXi,  Ri+1 = Ri + ΔRi
end for
Gi+1 = (ΔRi−s+1, . . . , ΔRi),  Ui+1 = (ΔXi−s+1, . . . , ΔXi)
M = P^H Gi+1,  F = P^H Ri+1
i = s
while ∥Ri∥F > ε∥B∥F do
    for k = 0 to s do
        solve Ci from M Ci = F
        Vi = Ri − Gi Ci
        if k = 0 then
            Ti = A Vi
            ω = Tr(Ti^H Vi) / Tr(Ti^H Ti)
            ΔRi = −Gi Ci − ω A Vi
            ΔXi = −Ui Ci + ω Vi
        else
            ΔXi = −Ui Ci + ω Vi
            ΔRi = −A ΔXi
        end if
        Xi+1 = Xi + ΔXi,  Ri+1 = Ri + ΔRi
        Gi+1 = (ΔRi−s+1, . . . , ΔRi),  Ui+1 = (ΔXi−s+1, . . . , ΔXi)
        M = P^H Gi+1,  F = P^H Ri+1
        i = i + 1
    end for
end while

Fig. 1. Algorithm of the Block IDR(s) method.

where the scalar parameter $\omega_{j+1}$ is

$$\omega_{j+1} = \mathrm{Tr}[(A V_i)^H V_i]\,/\,\mathrm{Tr}[(A V_i)^H A V_i].$$

The algorithm of the Block IDR(s) method is shown in Fig. 1. Here, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and $\mathrm{Tr}[\,\cdot\,]$ denotes the trace of a matrix.
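In a matrix language this parameter is a one-liner; a numpy sketch (ours, not from the paper), using $\mathrm{Tr}(X^H Y) = \sum \overline{X} \circ Y$ to avoid forming the trace products explicitly:

```python
import numpy as np

def omega(A, V):
    """omega_{j+1} = Tr[(AV)^H V] / Tr[(AV)^H AV] for a block V."""
    AV = A @ V
    return np.sum(AV.conj() * V) / np.sum(AV.conj() * AV)
```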

3. Analysis of the difference between the residual and the true residual

The relation between the residual $R_k$ and the approximate solution $X_k$ can be written as

$$R_k = B - A X_k. \quad (3)$$

However, a difference between the residual generated by the Block IDR(s) method and the true residual obtained from the approximate solution occurs. In this section, we analyze this difference based on the analysis method of the Block BiCGGR method [6]. We define $\widetilde{X}_0$ and $\widetilde{R}_0$ as

$$\widetilde{X}_0 = X_0 + \Delta X_0 + \Delta X_1 + \dots + \Delta X_{s-1}, \qquad \widetilde{R}_0 = R_0 + \Delta R_0 + \Delta R_1 + \dots + \Delta R_{s-1}.$$

The residual $R_{i+1}$ and the approximate solution $X_{i+1}$ generated by the Block IDR(s) method are written as

$$X_{i+1} = X_i + \omega_{j+1} V_i - U_i C_i = \widetilde{X}_0 + \sum_{k=s}^{i} \omega_m V_k - \sum_{k=s}^{i} U_k C_k, \quad (4)$$

and

$$R_{i+1} = R_i - \omega_{j+1} A V_i - G_i C_i = \widetilde{R}_0 - \sum_{k=s}^{i} \omega_m A V_k - \sum_{k=s}^{i} G_k C_k, \quad (5)$$

where $m = \lfloor (k+1)/(s+1) \rfloor$. From (4) and (5), the true residual $B - A X_{i+1}$ for the Block IDR(s) method is given by

$$B - A X_{i+1} = \widetilde{R}_0 - \sum_{k=s}^{i} A(\omega_m V_k) - \sum_{k=s}^{i} A(U_k C_k)$$
$$= R_{i+1} + \sum_{k=s}^{i} \left[\omega_m (A V_k) - A(\omega_m V_k)\right] + \sum_{k=s}^{i} \left[G_k C_k - A(U_k C_k)\right]. \quad (6)$$

From (3) and (6), the difference between the residual and the true residual is given by the terms $\sum_{k=s}^{i} [\omega_m(A V_k) - A(\omega_m V_k)] + \sum_{k=s}^{i} [G_k C_k - A(U_k C_k)]$ in (6).
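When reproducing the experiments of Section 5, this difference can be monitored directly; a small diagnostic helper (ours, not from the paper):

```python
import numpy as np

def residual_gap(A, B, X, R):
    """Frobenius-norm gap between the recursively updated residual R
    and the true residual B - A X, relative to ||B||_F."""
    return np.linalg.norm(B - A @ X - R, 'fro') / np.linalg.norm(B, 'fro')
```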

4. Derivation of a modified Block IDR(s) method

In this section, based on the analysis of the difference between the residual generated by the Block IDR(s) method and the true residual obtained from the approximate solution, a modified Block IDR(s) method is proposed to reduce this difference.

To reduce the difference, the proposed method negates the influence of the computation error generated by the multiplication with $C_i$ in the Block IDR(s) method; that is, the proposed method satisfies

$$G_k C_k - A(U_k C_k) = O.$$

We define the following quantity:

$$Q_k = -U_k - \omega_{j+1} G_k. \quad (7)$$

Xi+1 = Xi + ωj+1Ri +QiCi

= X0 +i∑

k=s

ωmRk +i∑

k=s

QkCk. (8)

Ri+1 = Ri − ωj+1ARi −A(QiCi)

= R0 −i∑

k=s

ωm(ARk)−i∑

k=s

A(QkCk). (9)

From (8) and (9), the true residual B−AXk is written

– 26 –

Page 32: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.25–28 Michihiro Naito et al.

X0 ∈ Cn×L is an initial guessR0 = B −AX0, P ∈ Cn×sL

for i = 0 to s− 1 doVi = ARi, ω = Tr(V H

i Ri)/Tr(VHi Vi)

∆Xi = ωRi,∆Ri = ωViXi+1 = Xi +∆Xi, Ri+1 = Ri +∆Ri

end forGi+1 = (∆Ri−s+1,∆Ri−s+2, . . . ,∆Ri)Ui+1 = (∆Xi−s+1,∆Xi−s+2, . . . ,∆Xi)M = PHGi+1, F = PHRi+1

i = swhile ∥Ri∥F < ϵ∥B∥F dofor k = 0 to s do

solve Ci from MCi = Fif k = 0 thenQi = −Ui − ωGi

W = ARi

ω = Tr(WHRi)/Tr(WHW )

∆Ri = −ωW −A(QiCi)∆Xi = ωRi +QiCi

elseVi = Ri −GiCi

∆Xi = −UiCi + ωVi∆Ri = −A∆Xi

end ifXi+1 = Xi +∆Xi, Ri+1 = Ri +∆Ri

M = PHGi, F = PHRi+1

Gi+1 = (∆Ri−s+1,∆Ri−s+2, . . . ,∆Ri)Ui+1 = (∆Xi−s+1,∆Xi−s+2, . . . ,∆Xi)i = i+ 1

end forend while

Fig. 2. Algorithm of the proposed method.

as

B −AXi+1 = R0 −i∑

k=s

A(ωmRk)−i∑

k=s

A(QkCk)

= Ri+1 +i∑

k=s

[ωm(ARk)−A(ωmRk)]

+

i∑k=s

[A(QkCk)−A(QkCk)]

= Ri+1 +i∑

k=s

[ωm(ARk)−A(ωmRk)].

By comparing (6) with the above equation, we see thatthe influence of the computation error generated by themultiplication with Ci in the Block IDR(s) method isnegated.The algorithm of the proposed Block IDR(s) method

is shown in Fig. 2.

5. Numerical experiments

In this section, we verify through comparative experiments that the proposed Block IDR(s) method can reduce the difference between the residual and the true residual relative to the conventional Block IDR(s) method.

The test matrices used in the numerical experiments are poisson2D and CONF5.0-00L8X8-1000 from the Matrix Market collection [7]; their size and number of nonzero elements are shown in Table 1. The matrix CONF5.0-00L8X8-1000 is constructed as $I_n - \kappa D$, where $D \in \mathbb{C}^{n\times n}$ is a non-Hermitian matrix and $\kappa$ is a real-valued parameter; $\kappa$ was set to 0.1782.

The initial solution $X_0$ was set to the zero matrix. The right-hand side $B$ is given by $B = [e_1, e_2, \dots, e_L]$, where $e_j$ is the $j$-th unit vector. The convergence criterion for the residual was set to $1.0 \times 10^{-14}$. All experiments were performed on an Intel Core i7 2.8 GHz CPU with 8 GB of memory using MATLAB 7.12.0.635 (R2011a).

The results of the conventional Block IDR(s) method are shown in Tables 2 and 3. In these tables, Iter., Res., and True Res. denote the number of iterations, the relative residual norm $\|R_k\|_F/\|B\|_F$, and the true relative residual norm $\|B - AX_k\|_F/\|B\|_F$, respectively. As shown in Tables 2 and 3, the relative residual norms of the conventional Block IDR(s) method satisfy the convergence criterion. However, because of the difference between the true residual and the residual generated by the Block IDR(s) method, the true residual norms do not satisfy the convergence criterion.

Table 1. Size and number of nonzero elements of test matrices.

Matrix name          Size    Number of nonzero elements
poisson2D            367     2,417
CONF5.0-00L8X8-1000  49,152  1,916,928

Table 2. Results of the Block IDR(s) method for poisson2D.

s   L  Iter.  Res.           True Res.
1   1  140    8.73×10^{-15}  9.84×10^{-15}
1   2  105    1.39×10^{-14}  1.44×10^{-14}
1   4  79     1.86×10^{-15}  4.87×10^{-15}
8   1  100    2.92×10^{-15}  4.52×10^{-14}
8   2  79     7.30×10^{-15}  3.15×10^{-13}
8   4  57     9.89×10^{-15}  6.34×10^{-14}
16  1  99     6.30×10^{-15}  2.34×10^{-13}
16  2  77     4.89×10^{-15}  2.58×10^{-13}
16  4  57     2.88×10^{-15}  4.59×10^{-13}
32  1  98     3.77×10^{-15}  1.47×10^{-10}
32  2  75     9.10×10^{-15}  6.72×10^{-11}
32  4  57     3.30×10^{-15}  1.86×10^{-11}

Table 3. Results of the Block IDR(s) method for CONF5.0-00L8X8-1000.

s   L  Iter.  Res.           True Res.
1   1  1140   7.43×10^{-15}  1.12×10^{-14}
1   2  895    1.85×10^{-14}  3.95×10^{-14}
1   4  847    8.57×10^{-15}  2.47×10^{-11}
8   1  904    7.67×10^{-15}  1.09×10^{-14}
8   2  710    1.89×10^{-14}  5.04×10^{-14}
8   4  550    9.70×10^{-15}  1.42×10^{-12}
16  1  867    5.75×10^{-15}  2.69×10^{-14}
16  2  697    2.71×10^{-15}  4.08×10^{-13}
16  4  539    3.31×10^{-15}  2.58×10^{-13}
32  1  851    9.16×10^{-15}  1.05×10^{-5}
32  2  692    2.59×10^{-15}  1.14×10^{-3}
32  4  527    6.08×10^{-15}  1.40×10^{-4}

The results of the proposed Block IDR(s) method are shown in Tables 4 and 5. As shown, the relative residual norms of the proposed Block IDR(s) method satisfy the convergence criterion. Moreover, the proposed Block IDR(s) method reduced the differences between the residual and the true residual relative to the conventional Block IDR(s) method.

Table 4. Results of the proposed Block IDR(s) method for poisson2D.

s   L  Iter.  Res.           True Res.
1   1  139    8.47×10^{-15}  8.95×10^{-15}
1   2  103    3.99×10^{-15}  4.61×10^{-15}
1   4  77     6.93×10^{-15}  7.31×10^{-15}
8   1  100    2.65×10^{-15}  1.53×10^{-15}
8   2  79     6.34×10^{-16}  1.85×10^{-15}
8   4  60     2.43×10^{-16}  1.89×10^{-15}
16  1  100    6.72×10^{-16}  1.27×10^{-15}
16  2  77     5.02×10^{-15}  5.22×10^{-15}
16  4  59     4.48×10^{-15}  4.67×10^{-15}
32  1  97     9.35×10^{-15}  9.50×10^{-15}
32  2  77     8.14×10^{-16}  1.48×10^{-15}
32  4  61     5.10×10^{-15}  5.25×10^{-15}

Table 5. Results of the proposed Block IDR(s) method for CONF5.0-00L8X8-1000.

s   L  Iter.  Res.           True Res.
1   1  1127   9.24×10^{-15}  1.74×10^{-14}
1   2  991    6.86×10^{-15}  7.15×10^{-15}
1   4  813    3.10×10^{-14}  3.19×10^{-14}
8   1  901    5.86×10^{-15}  6.04×10^{-15}
8   2  710    1.08×10^{-14}  1.09×10^{-14}
8   4  557    1.58×10^{-14}  1.62×10^{-14}
16  1  868    7.61×10^{-15}  7.69×10^{-15}
16  2  703    8.11×10^{-15}  8.18×10^{-15}
16  4  530    9.14×10^{-15}  9.21×10^{-15}
32  1  689    8.20×10^{-15}  8.36×10^{-15}
32  2  524    8.57×10^{-15}  8.67×10^{-15}
32  4  404    7.71×10^{-15}  8.57×10^{-15}

6. Conclusion

A difference between the residual generated by the Block IDR(s) method and the true residual $B - AX$ may occur. If so, even if the residual generated by the Block IDR(s) method satisfies the convergence criterion, high accuracy approximate solutions cannot be obtained.

Therefore, in this paper, we analyzed the difference between the residual generated by the Block IDR(s) method and the true residual. From the analysis results, we proposed a modified Block IDR(s) method that negates the influence of the computation error generated by the multiplication with $C_i$ in the Block IDR(s) method. Through numerical experiments, we verified that the proposed method can reduce the difference relative to the conventional method.

Acknowledgments

This research was supported in part by a Grant-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology, Japan, Grant numbers 21246018, 21105502 and 22700003.

References

[1] PACS-CS Collaboration, S. Aoki et al., 2+1 flavor lattice QCD toward the physical point, arXiv:0807.1661v1 [hep-lat], 2008.
[2] T. Sakurai, H. Tadano, T. Ikegami and U. Nagashima, A parallel eigensolver using contour integration for generalized eigenvalue problems in molecular simulation, Taiwanese J. Math., 14 (2010), 855–867.
[3] D. P. O'Leary, The block conjugate gradient algorithm and related methods, Lin. Alg. Appl., 29 (1980), 293–322.
[4] A. El Guennouni, K. Jbilou and H. Sadok, A block version of BiCGSTAB for linear systems with multiple right-hand sides, Elec. Trans. Numer. Anal., 16 (2003), 129–142.
[5] L. Du, T. Sogabe, B. Yu, Y. Yamamoto and S.-L. Zhang, A block IDR(s) method for nonsymmetric linear systems with multiple right-hand sides, J. Comput. Appl. Math., 235 (2011), 4095–4106.
[6] H. Tadano, T. Sakurai and Y. Kuramashi, Block BiCGGR: a new Block Krylov subspace method for computing high accuracy solutions, JSIAM Letters, 1 (2009), 44–47.
[7] Matrix Market, http://math.nist.gov/MatrixMarket/

– 28 –

Page 34: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol.4 (2012) pp.29–32 c⃝2012 Japan Society for Industrial and Applied Mathematics J S I A MLetters

An alternating discrete variational derivative method for

coupled partial differential equations

Hiroaki Kuramae1 and Takayasu Matsuo1

1 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1Hongo, Bunkyo-ku, Tokyo 113-8654, Japan

E-mail hiroaki kuramae mist.i.u-tokyo.ac.jp

Received November 11, 2011, Accepted April 3, 2012

Abstract

A new procedure to design numerical schemes for coupled partial differential equations isproposed. The resulting schemes have discrete counterparts of conservative or dissipativequantity in original system. They also enjoy another welcome feature that they are constructedon staggered time meshes, by which each variables can be computed alternately with lesscomputational costs than usual schemes. The procedure is demonstrated in the case of thecoupled KdV equations.

Keywords coupled partial differential equations, conservation law, discrete variationalderivative method

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

We propose a new procedure to design numeri-cal schemes for coupled partial differential equations(PDEs) that have two welcome features; the resultingschemes inherit the energy conservation or dissipationproperties from the original equations in some relaxedsenses, and are defined on staggered time meshes so thatsolution variables can be alternately computed with lesscomputational effort.For the first feature, it is known that the inheri-

tance of the conservation/dissipation properties gener-ally makes the numerical scheme stable and qualitativelybetter. A general procedure to derive such a conser-vative/dissipative numerical scheme—the so-called dis-crete variational derivative method (DVDM)—has beenalready proposed, and intensively investigated in the lasttwo decades [1–4].The second feature, i.e. the idea of staggered meshes,

dates back to the time of the old “leapfrog method,”which is for integrating Hamiltonian systems. There,positions and velocities are calculated in an interleavedway, using “staggered meshes”; positions are defined oninteger time steps, and velocities on integer plus halftime steps. An important outcome of this method is thatthe two variables can be computed alternately, and thusthe systems to be solved is half the size of the originalsystem. This idea easily extends to general evolutionarycoupled PDEs, and several “alternating” time steppinghave been tried; see [5,6], for example. The systems de-rived by some space discretization of PDEs tend to bequite large, and thus the size reduction can be furtherbeneficial than the case of Hamiltonian systems.As far as the authors know, however, it has not

ever been tried to combine the above two features,except for some limited studies devoted to specificPDEs: Glassey [7, 8] proposed an alternating conser-

vative finite difference scheme for the Zakharov equa-tions, and some modification was proposed by Changand Jiang [9]. Mu and Huang [10] proposed an alter-nating dissipative finite element scheme for the timedependent (non-equilibrium) Ginzburg–Landau equa-tions. Although they were successful on their own,no general principle to find such alternating conserva-tive/dissipative schemes has been known so far.The aim of this paper is to provide this missing prin-

ciple: more precisely, we show that the DVDM proce-dure for coupled PDEs problems [11] can be modified sothat alternating conservative/dissipative schemes can beconstructed. The keypoint of the new method—the “al-ternating” DVDM—is a new construction of “discretevariational derivatives.” It will then turn out that theabove specific alternating schemes can be derived as thespecial cases of the new method.Due to the restriction of space, in this paper we set

the following two restrictions. First, we focus on the timediscretization and leave the space variable untouched.Second, we consider only the simplest real-valued con-servative coupled PDEs defined afterward.The rest of this paper is organized as follows: in Sec-

tion 2, we define the target systems. The existing ap-proach of DVDM for coupled PDEs systems is brieflyreviewed in Section 3. We introduce the new procedurein Section 4. In Section 5, an example is shown for thecase of the coupled KdV equations. Finally, in Section6, some additional remarks are given.

2. Target PDEs and their properties

Let u and v be real-valued functions of (t, x) ∈ [0, T ]×[0, L] for finite L and T , with the periodic boundary con-dition: u(t, x) = u(t, x + L). We consider a real-valuedfunction G(t, x) = G(u(t, x), v(t, x), ux(t, x), vx(t, x)),which is often called “local energy,” and its integral H

– 29 –

Page 35: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.29–32 Hiroaki Kuramae et al.

over [0, L]:

H(t) =

∫ L

0

G(u(t, x), v(t, x), ux(t, x), vx(t, x))dx,

which is called “energy.” The variational derivatives ofG are defined by

δG

δu:≡ ∂G

∂u− ∂

∂x

∂G

∂ux,

δG

δv:≡ ∂G

∂v− ∂

∂x

∂G

∂vx.

Let us then consider the following PDE system:

∂u

∂t= Au

δG

δu,

∂v

∂t= Av

δG

δv, (1)

where Au and Av are operators which are skew-symmetric with respect to the L2 inner product.

Theorem 1 The system (1) is conservative: dH/dt =0.

Proof

dH

dt=

∫ L

0

∂G

∂tdx

=

∫ L

0

(∂G

∂u

∂u

∂t+∂G

∂ux

∂ux∂t

+∂G

∂v

∂v

∂t+∂G

∂vx

∂vx∂t

)dx

=

∫ L

0

(δG

δu

∂u

∂t+δG

δv

∂v

∂t

)dx

=

∫ L

0

δG

δuAu

δG

δudx+

∫ L

0

δG

δvAv

δG

δvdx = 0.

On third equality, we use the integration by parts andthe periodic boundary condition.

(QED)

3. The standard DVDM

We briefly review the procedure of the discrete varia-tional derivative method [4, 11]. Suppose that the localenergy function G is of the form:

G(u, v, ux, vx) =

M∑l=1

fl(u)gl(v)ϕl(ux)ψl(vx).

We describe a numerical solution by u(k)(x) ≃ u(k∆t, x)where ∆t is the time mesh size. We also use the followingnotation:

f(k)l :≡ fl(u(k)), g

(k)l :≡ gl(v(k)),

ϕ(k)l :≡ ϕl(ux(k)), ψ

(k)l :≡ ψl(vx

(k)),

δku(k) :≡ u(k) − u(k−1)

∆t,

Mf(k)l :≡ fl(u

(k)) + fl(u(k−1))

2,

Df (k)l :≡

fl(u

(k))− fl(u(k−1))

u(k) − u(k−1), (u(k) = u(k−1)),

dfl(a)

da

∣∣∣∣a=u(k)

, (u(k) = u(k−1)).

Now we define a discrete version of an energy and partialderivatives by

Gd(k) :≡

M∑l=1

f(k)l g

(k)l ϕ

(k)l ψ

(k)l , Hd

(k) :≡∫ L

0

Gd(k)dx,

∂Gd

∂(u(k), u(k−1))

:≡M∑l=1

[(Df (k)l )(Mg

(k)l )(Mϕ

(k)l )(Mψ

(k)l )],

∂Gd

∂(u(k)x , u

(k−1)x )

:≡M∑l=1

[(Mf

(k)l )(Mg

(k)l )(Dϕ(k)l )(Mψ

(k)l )].

We then define the discrete version of the variationalderivative by

δGd

δ(u(k), u(k−1)):≡ ∂Gd

∂(u(k), u(k−1))− ∂

∂x

∂Gd

∂(u(k)x , u

(k−1)x )

.

We also define δGd/δ(v(k), v(k−1)

)in like manner. Now

we define the conservative numerical scheme.

Scheme 2 (Conservative scheme by the standardDVDM) For given initial data u(0) and v(0), we com-pute u(k) and v(k) (k = 1, 2, . . . ) by

δku(k) = Au

δGd

δ(u(k), u(k−1)),

δkv(k) = Av

δGd

δ(v(k), v(k−1)).

(2)

Lemma 3 The solutions of Scheme 2, if exist, satisfy

δkHd(k) =

∫ L

0

(δGd

δ(u(k), u(k−1))δku

(k)

+δGd

δ(v(k), v(k−1))δkv

(k)

)dx.

This is the key identity of the DVDM, which correspondsto the chain rule of differentiation. It is easily verified,and hence the proof is omitted (see [4] for a relatedproof).Notice that δGd/δ

(u(k), u(k−1)

)is time-symmetric:

δGd/δ(u(k), u(k−1)

)= δGd/δ

(u(k−1), u(k)

), and so is

δGd/δ(v(k), v(k−1)

), which guarantees the resulting

scheme (more precisely the ODE obtained by the spacialdiscretization) being O(∆t2) accurate.The scheme retains the desired conservation property

as follows.

Theorem 4 The solutions of Scheme 2, if exist, satisfyHd

(k) = Hd(k−1).

Proof

δkHd(k) =

∫ L

0

δGd

δ(u(k), u(k−1))Au

δGd

δ(u(k), u(k−1))dx

+

∫ L

0

δGd

δ(v(k), v(k−1))Av

δGd

δ(v(k), v(k−1))dx

= 0.

– 30 –

Page 36: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.29–32 Hiroaki Kuramae et al.

Lemma 3 and Scheme 2 are used.(QED)

Note that the discrete variational derivatives, derivedby the procedure above, are functions of u(k−1), u(k),v(k−1) and v(k). Therefore, the scheme is implicit withrespect both to u and v, and we must solve the wholesystem simultaneously at every time step.

4. An “alternating” DVDM

We modify the procedure above so that staggered timemeshes can be utilized. We set u on integer meshes asbefore, and v on integer plus half meshes: v(k+1/2) ≃v((k + 1/2)∆t, x). We then define the modified discreteenergy:

G(k,k+ 1

2 )

d :≡M∑l=1

f(k)l g

(k+ 12 )

l ϕ(k)l ψ

(k+ 12 )

l ,

H(k,k+ 1

2 )

d :≡∫ L

0

G(k,k+ 1

2 )

d dx, (3)

where g(k+1/2)l = gl(v

(k+1/2)) and ψ(k+1/2)l = ψl(v

(k+1/2)x ).

Next, corresponding to the modified energy (3), we de-fine the modified discrete partial derivatives as follows.

∂Gd

∂(u(k), u(k−1))

:≡M∑l=1

[(Df (k)l )(g

(k− 12 )

l )(Mϕ(k)l )(ψ

(k− 12 )

l )], (4)

∂Gd

∂(u(k)x , u

(k−1)x )

:≡M∑l=1

[(Mf

(k)l )(g

(k− 12 )

l )(Dϕ(k)l )(ψ(k− 1

2 )

l )], (5)

∂Gd

∂(v(k+12 ), v(k−

12 ))

:≡M∑l=1

[(f

(k)l )(Dg(k+

12 )

l )(ϕ(k)l )(Mψ

(k+ 12 )

l )], (6)

∂Gd

∂(v(k+ 1

2 )x , v

(k− 12 )

x )

:≡M∑l=1

[(f

(k)l )(Mg

(k+ 12 )

l )(ϕ(k)l )(Dψ(k+ 1

2 )

l )]. (7)

Then the discrete variational derivatives are defined likebefore (see (2)), except that for v the staggered variablesshould be used instead.The new scheme reads as follows.

Scheme 5 (An alternating conservative scheme)For given initial data u(0) and v(1/2), we compute u(k)

and v(k+1/2) (k = 1, 2, . . . ) byδku

(k) = AuδGd

δ(u(k), u(k−1)),

δkv(k+ 1

2 ) = AvδGd

δ(v(k+12 ), v(k−

12 ))

.

(8)

Carefully observing (4)–(7), we see that Scheme 5 is

alternating. In fact, since (4) and (5) are functions ofu(k−1), u(k) and v(k−1/2), the first equation of (8) pro-vides u(1) only from u(0) and v(1/2). Then, in turn, thesecond equation gives v(3/2) from u(1) and v(1/2) as seenin (6) and (7). Notice also that the scheme is symmet-ric with respect to time, which immediately implies thatthe scheme is O(∆t2) accurate.The conservation property of Scheme 5 is immediate

from the following key identity, which is analogous toLemma 3.

Lemma 6 The solutions of Scheme 5, if exist, satisfy

δkH(k,k+ 1

2 )

d =

∫ L

0

(δGd

δ(u(k), u(k−1))δku

(k)

+δGd

δ(v(k+12 ), v(k−

12 ))

δkv(k+ 1

2 )

)dx.

Theorem 7 The solutions of Scheme 5, if exist, satisfy

H(k,k+1/2)d = H

(k−1,k−1/2)d .

Proof The same argument as that in Theorem 4.(QED)

Scheme 5 demands the initial data v(1/2). This can beobtained by Scheme 2 which is guaranteed to be con-servative, or simply by an arbitrary numerical methodimplemented with sufficiently small time steps.At first glance, the modified discrete energy func-

tion (3) might seem weird, since it is ambiguous exactlyat what time t the energy is defined (it refers to two dif-ferent t’s and not symmetric). One excuse (of not onlythis work, but also all the related “alternating” schemes)may be that when ∆t is sufficiently small, this ambiguityshould not be serious because it is to some extent similarto the original one. A better solution is to “symmetrize”the energy function while keeping the resulting schemesame. See Remark 8 below for the detail.

5. Example

We demonstrate the above new procedure, taking theconservative system of Ito’s coupled KdV equations [12]:

G(u, v, ux, vx) = u3 − 1

2(ux)

2 + uv2,

ut =∂

∂x

δG

δu=

∂x(3u2 + uxx + v2),

vt =∂

∂x

δG

δv=

∂x(2uv).

The standard DVDM yields the following scheme:

δku(k) =

∂x

[(u(k))2 + u(k)u(k−1) + (u(k−1))2

]+M(uxx

(k)) +M[(v(k))2

], (9)

δkv(k) =

∂x

[2(Mu(k))(Mv(k))

], (10)

which conserves the discrete energy:

H(k) =

∫ L

0

[(u(k))3 − 1

2(ux

(k))2 + u(k)(v(k))2]dx.

Eqs. (9) and (10) form a system of nonlinear equationswith respect to u(k) and v(k), which means we have to

– 31 –

Page 37: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.29–32 Hiroaki Kuramae et al.

Table 1. Computation time of two schemes.

Scheme CPU time (second) Ratio

Standard DVDM 54.776 1Alternating DVDM 30.053 0.5487

solve a system of 2N nonlinear equations when the spacevariable is discretized with N points.With the new approach, the following alternating

scheme is obtained:

δku(k) =

∂x

[(u(k))2 + u(k)u(k−1) + (u(k−1))2

]+M(uxx

(k)) + (v(k−12 ))2

, (11)

δkv(k+ 1

2 ) =∂

∂x

[2u(k)(Mv(k+

12 ))], (12)

which conserves the modified discrete energy:

H(k,k+ 12 )

=

∫ L

0

[(u(k))3 − 1

2(ux

(k))2 + u(k)(v(k+12 ))2

]dx. (13)

In contrast to the previous case, the equations of thescheme can be computed alternately: u(1) can be ob-tained from u(0) and v(1/2) via (11); then v(3/2) fromv(1/2) and u(1) via (12). After the space discretization,we only have to solve two systems of dimension N , in-stead of one 2N system, which implies less computa-tional cost. A computation time comparison is shown inTable 1. Space is discretized by the standard finite dif-ference setting [4]. We set to N = 256 and T/∆t = 100(the number of time steps), and used Intel Core 2 P9400,Python 2.6.5, and scipy.optimize.fsolve, which is awrapper around MINPACK’s hybrd and hybrj.

Remark 8 We have proved the conservation of (13)by the procedure in Section 4. Additionally, with a care-ful observation on the discrete variational property, wecan also prove the conservation of the following “sym-metrized” discrete energy:

Hs(k+1,k,k+ 1

2 )

=

∫ L

0

[M(u(k+1))3 − M(ux

(k+1))2

2

+ (Mu(k+1))(v(k+12 ))2

]dx. (14)

We omit the proof of this due to the limitation of space.The symmetrized discrete energy (14) is exactly “on timek+1/2” and therefore it has no ambiguity. The same ideacan be found in [9] in the case of the Zakharov equations.

6. Concluding remarks

In this paper, we proposed a new method by which“alternating” conservative schemes can be automaticallyderived. We also demonstrated the method for the caseof the coupled KdV. Although in this report we consid-ered only the conservative case, the procedure can beeasily extended to dissipative cases, where the operatorAu and Av are, for example, negative semidefinite.Finally we like to make comments on some issues

that could not be discussed in this short report. First,the existing alternating conservative/dissipative schemes(those mentioned in Introduction, and also some relatedstudies [13,14]) can be understood to be the special casesof this new method. In other words, we have revealed thehidden “principle” behind the existing schemes.Next, note that the schemes derived by the above pro-

cedure are generally still nonlinear (recall the coupledKdV case); what the new method achieves is just tohalve the system size. On the other hand, in the liter-ature of DVDM, a general linearization technique hasbeen proposed [2, 4]. Combining this technique and theproposed alternating technique, we can construct lin-ear, alternating and conservative/dissipative schemes.We have already tested this approach for the Ginzburg–Landau equations, and found that the scheme is thefastest among all, while stability is successfully kept.The above issues and the more details of the method,

together with further application to various coupledPDEs, will be reported elsewhere soon.

References

[1] D. Furihata, Finite difference schemes for∂u

∂t=

(∂

∂x

)α δG

δuthat inherit energy conservation or dissipation property, J.

Comput. Phys., 156 (1999), 181–205.[2] T. Matsuo and D. Furihata, Dissipative or conservative finite-

difference schemes for complex-valued nonlinear differential

equations, J. Comput. Phys., 171 (2001), 425–447.[3] T. Matsuo, Dissipative/conservative Galerkin method using

discrete partial derivatives for nonlinear evolution equations,J. Comput. Appl. Math., 218 (2008), 506–521.

[4] D. Furihata and T. Matsuo, Discrete Variational DerivativeMethod: A Structure-Preserving Numerical Method for Par-tial Differential Equations, Chapman & Hall/CRC, Boca Ra-ton, 2011.

[5] K. S. Yee, Numerical solution of initial boundary value prob-lems involving Maxwell’s equations in isotropic media, IEEETrans. Antennas Propag., 14 (1966), 302–307.

[6] M. Ghrist, B. Fornberg and T. A. Driscoll, Staggered time

integrators for wave equations, SIAM J. Numer. Anal., 38(2000), 718–741.

[7] R. T. Glassey, Approximate solutions to the Zakharov equa-

tions via finite differences, J.Comput.Phys., 100 (1992), 377–383.

[8] R. T. Glassey, Convergence of an energy-preserving schemefor the Zakharov equations in one space dimension, Math.

Comp., 58 (1992), 83–102.[9] Q. Chang and H. Jiang, A conservative difference scheme for

the Zakharov equations, J. Comput. Phys., 113 (1994), 309–319.

[10] M. Mu and Y. Huang, An alternating Crank–Nicolson meth-od for decoupling the Ginzburg–Landau equations, SIAM J.Numer. Anal., 35 (1998), 1740–1761.

[11] T. Matsuo, M. Sugihara, D. Furihata and M. Mori, The dis-

crete variational derivative method for systems of equations(in Japanese), RIMS Kokyuroku, 1198 (2001), 128–136.

[12] M. Ito, Symmetries and conservation laws of a coupled non-linear wave equation, Phys. Lett. A, 91 (1982), 335–338.

[13] L. Zhang, Convergence of a conservative difference scheme fora class of Klein–Gordon–Schrodinger equations in one spacedimension, Appl. Math. Comp., 163 (2005), 343–355.

[14] L. Zhang, D. Bai and S. Wang, Numerical analysis fora conservative difference scheme to solve the Schrodinger–Boussinesq equation, J. Comput. Appl. Math., 235 (2011),4899–4915.

– 32 –

Page 38: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol.4 (2012) pp.33–36 c⃝2012 Japan Society for Industrial and Applied Mathematics

The existence of solutions to topology

optimization problems

Satoshi Kaizu1

1 College of Science and Technology, Nihon University, 8-14, Kanda-Surugadai 1-chome,Chiyoda-ku, Tokyo 101-8308, Japan

E-mail kaizusatoshi gmail.com

Received April 5, 2012, Accepted May 16, 2012

Abstract

Topology optimization is to determine a shape or topology, having minimum cost. We aredevoted entirely to minimum compliance (maximum stiffness) as minimum cost. An optimalshape Ω is realized as a distribution of material on a reference domain D, strictly larger than Ωin general. The optimal shape Ω and an equilibrium u(Ω) on Ω are approximated by materialdistributions on the domain D and equilibriums also on D, respectively. This note gives asufficient setting to the existence of an optimal material distribution.

Keywords topology optimization, the Hausdorff metric, minimum compliance, maximumstiffness, material distribution problem

Research Activity Group Mathematical Design

1. Introduction

Let D be a bounded domain of Rd, d = 2, 3, and ω bean open subset included in D together with its closure ω,i.e., ω ⊂ ω ⊂ D. The set ω may have several connectedcomponents. Let Ω = D \ ω. The domain Ω could bemultiply connected in general. Some material with den-sity value one is filled into Ω and D, so the weights ofΩ and D are given by |Ω| =

∫Ωdx and |D|, respectively.

We assume that Ω has the weight |Ω| (≤ cv), where cv isstrictly smaller than |D|. Let Γ = ∂D be the boundaryof D such that Γ = ΓD ∪ ΓN , Γo

D ∩ ΓoN = ∅, where Γo

D

and ΓoN are the interiors of ΓD and ΓN , respectively. Let

H1(Ω) and H1/2(ΓN ) be the usual Sobolev spaces andlet H−1/2(ΓN ) be the dual space of H1/2(ΓN ).For f ∈ L2(D), f = 0 and g ∈ H−1/2(ΓN ) we consider

the problem BVP(Ω): Find uΩ ∈ H1(Ω) such that−∆uΩ + uΩ = f in Ω,uΩ = 0 on ΓD,∂νu

Ω = g on ΓN ,∂νu

Ω = 0 on Γω(= ∂ω ∩ ∂Ω).

(1)

Let J(Ω) be cost of Ω(⊂ D) given by

J(Ω) =∫ΩfuΩdx+

∫ΓN

guΩdΓ. (2)

After giving an admissible family U of domains Ω ad-equately, we consider a minimizing problem, called thetopology optimization problem (confer [1,2] and referecestherein), TOP(D): Find Ω∗ ∈ U such that

J(Ω∗) = infΩ∈U J(Ω). (3)

Our aim is to assure the existence of a solution Ω∗ ∈ Uof TOP(D). For this aim a choice of admissible setsis significant (confer Theorem 10). Another aim is toapproximate such a solution Ω∗ by density functions ϕ ∈L∞(D). Before precise description of U we approximate

the problem (1) by boundary value problems (5).

2. Approximation

We owe the idea of the approximation (5) to the spiritin the SIMP model by Bendsøe and Sigmund [2]. Let OD

be a family of open, connected sets Ω (⊂ D,Γ ⊂ Ω, |Ω| ≤cv) and let Φ = χΩ | Ω ∈ OD, where χΩ denotes thecharacteristic function of Ω. For Ω of OD, χΩ (∈ Φ) canbe obviously approximated by simple functions χΩ,ω

κ =χΩ + κχω using small κ (> 0). Let Φκ = χΩ,ω

κ | Ω ∈OD and V (Ω) = v ∈ H1(Ω) | v = 0 on ΓD. Theproblem BVP(Ω) (1) is equivalent to the problem: FinduΩ ∈ V (D) such that

aΩ(uΩ, v) = (f, v)Ω + ⟨g, v⟩ΓN , v ∈ V (Ω),aΩ(v, w) =

∫DχΩ(∇v · ∇w + vw)dx,

(f, v)Ω =∫DχΩfvdx, ⟨g, v⟩ΓN

=∫ΓN

gvdΓ.(4)

Let V = v ∈ H1(D) | v = 0 on ΓD. Replacingthe function χΩ by a function χΩ,ω

κ in the equality (4)implies BVPΩ,ω

κ (D): Find uΩ,ωκ ∈ V such that

aΩ,ωκ (uΩ,ω

κ , v) = (f, v)Ω + κ(f, v)ω+ ⟨g, v⟩ΓN

, v ∈ V,aΩ,ωκ (v, w) =

∫Ω(∇v · ∇w + vw)dx+ κ

∫ω(∇v · ∇w + vw)dx.

(5)

For simplicity let uΩκ = uΩ,ωκ |Ω, uωκ = uΩ,ω

κ |ω and ∂Gν vbe the outer normal derivative of v on a smooth opendomain G. Further if uΩ,ω

κ is smooth enough, then uΩ,ωκ

is a unique solution of the following problem:−∆uΩ,ω

κ + uΩ,ωκ = f in Ω ∪ ω,

uΩκ = uωκ on ∂Ω ∩ ∂ω,∂Ων u

Ωκ + κ∂ων u

ωκ = 0 on ∂Ω ∩ ∂ω,

∂Ων uΩκ = g on ΓN ,

uΩκ = 0 on ΓD.

– 33 –

Page 39: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.33–36 Satoshi Kaizu

We shall approximate the cost function J(Ω) by

J (χΩ,ωκ ) = (f, uΩ,ω

κ )Ω + κ(f, uΩ,ωκ )ω + ⟨g, uΩ,ω

κ ⟩ΓN. (6)

3. Admissible family

Our one aim is to prescribe U precisely and anotheraim is to assure approximation of uΩ and J(Ω) by uΩ,ω

κ

and J (χΩ,ωκ ) as κ → +0, respectively. A family U =

Lip(k, r), constructed by uniformly Lipschitz continuousdomains, due to Chenais [3], is a good admissible family.Here k > 0 and r > 0. For simplicity we define Lip(k, r)with a slight modification from the original.

Definition 1 A domain Ω (∈ OD) is admissible andbelongs to Lip(k, r), if and only if, for any x ∈ ∂Ω, thereexists a local coordinate system with a real valued func-tion φ of d− 1 variables with Lipschitz constant k suchthat B(x, r) ∩ Ω = (x, xd) ∈ Ω | xd < φ(x), wherex ∈ Rd−1, B(x, r) = y ∈ Rd | |y − x| ≤ r and | · |denotes the Euclid norm in Rd.

The advantage of using U = Lip(k, r) is that The-orems 2, 3 and 10, a target in this paper, are avail-able. We write Φ(k, r) = χΩ | Ω ∈ Lip(k, r) andΦκ(k, r) = χΩ,ω

κ | Ω ∈ Lip(k, r). For m ∈ N, the usualSobolev norm of v ∈ Hm(G) is denoted by ∥v∥m,G.

Theorem 2 (Chenais [3]) For Ω ∈ Lip(k, r) thereexists a linear continuous extension operator pΩ :Hm(Ω) ∋ v 7→ v = pΩ(v) ∈ Hm(Rd) with operatornorm ∥pΩ∥ ≤ c(k, r), where c(k, r) depends only on m,k, r and d.

Recall that we have set uΩκ = uΩ,ωκ |Ω and uωκ = uΩ,ω

κ |ω.Theorem 3 Let Ω ∈ OD and 0 < κ ≤ 1. Then wehave

∥uΩκ ∥21,Ω + κ∥uωκ∥21,ω ≤ 2(∥f∥20,D + ∥g∥2−1/2,ΓN). (7)

Let c(k, r) be a constant in Theorem 2 withm = 1 and letc1(k, r) = 1+2c(k, r)2. Further we assume Ω ∈ Lip(k, r).Then we have

∥uΩκ − uΩ∥21,Ω + κ∥uωκ∥21,ω≤ 3κ(c1(k, r)∥f∥20,D + 2c(k, r)2∥g∥2−1/2,ΓN

). (8)

Proof Let ∥v∥1,Ω,ω,κ = (∥v∥21,Ω + κ∥v∥21,ω)1/2. Thenmax|aΩ,ω

κ (uΩ,ωκ , v)| | ∥v∥1,Ω,ω,κ = 1 and v ∈ V ≤

21/2(∥f∥20,D + ∥g∥2−1/2,ΓN)1/2. Moreover the maximum

of the left-hand side attains at v = uΩ,ωκ . This shows

(7). Putting v = uΩ into (4) implies

∥uΩ∥21,Ω ≤ 2(∥f∥20,D + ∥g∥2−1/2,ΓN). (9)

We show (8). Connecting (9) with Theorem 2 with m =1 we have an extension uΩ(∈ V ) of uΩ such that

∥uΩ∥21,ω ≤ 2c(k, r)2(∥f∥20,D + ∥g∥2−1/2,ΓN). (10)

Recall that notation aG(·, ·), where G = Ω, ω. Puttingthe right-hand side of (4) into the sum of the first andthe last terms in the right-hand side of (5) implies

aΩ(uΩκ − uΩ, v) + κaω(uωκ , v) = κ(f, v)ω, v ∈ V. (11)

Putting v = uΩ,ωκ − uΩ into (11) gives

∥uΩκ − uΩ∥21,Ω + κ∥uωκ∥21,ω ≤ 3κ(∥f∥20,ω + ∥uΩ∥21,ω).(12)

The inequalities (9), (10) and (12) imply (8).(QED)

Definition 4 Let E be a bounded closed subset of Rd

and F be the totality of compact subsets of E. For F ∈ Fwe set [F ]c = x ∈ Rd | ∃y ∈ F such that |y − x| ≤ c.For Fi ∈ F , i = 1, 2, the Hausdorff metric d(F1, F2)between F1 and F2 is defined by

d(F1, F2) = infc > 0 | F1 ⊂ [F2]c, F2 ⊂ [F1]c.

We define an equivalent relation Ω1 ∼ Ω2 for Ωi ∈OD, i = 1, 2, defined by d(Ω1,Ω2) = 0. The equivalentrelation ∼ determines a metric d(·, ·) by d(Ω1,Ω2) =d(Ω1,Ω2) on OD/ ∼. Hereafter we use a notation d(Ω1,Ω2) instead of d(Ω1,Ω2), if no confusion occurs.

Theorem 5 (Blachke selection theorem [4]) Thefamily F with topology TH induced by metric d is com-pact.

Due to Chenais [3, Theorems III.1 and III.2], the setΦ(k, r) = χΩ ∈ L2(D) | Ω ∈ Lip(k, r) with the topol-ogy TC induced from L2(D) is compact. Theorem 5 isapplied to E = D. We see that the identity map I :

(Φ(k, r), TH)I7→ (Φ(k, r), TC), is continuous by the defi-

nition of the Hausdorff metric. Further, I−1 is also con-tinuous, because any closed set F in (Φ(k, r), TH) is alsoclosed in (Φ(k, r), TC). In fact, the set F of (Φ(k, r), TH)is compact in (Φ(k, r), TH) clearly. So it is compact in(Φ(k, r), TC), because I is continuous. Thus the set F isclosed in (Φ(k, r), TC). This means that the inverse I−1

is continuous. Therefore the map I is homeomorphismbetween (Φ(k, r), TH) and (Φ(k, r), TC).Theorem 6 Topologies TC and TH are compact andequivalent to each other.

4. The continuity of the cost function

and solvability of TOP(D)

We consider the convergence of Jn = J (χΩ,ωκn

) to J(Ω)as κn → +0. The estimate (8) shows uΩ,ω

κn→ uΩ strongly

in V (Ω) as κn → +0. Actually, by (8) we have

|Jn − J(Ω)|≤ (∥f∥0,Ω + ∥g∥−1/2,ΓN

)∥uΩ,ωκn− uΩ∥1,Ω

+ κ1/2n ∥f∥0,D · κ1/2n ∥uΩ,ω

κn∥1,ω → 0 as κn→+0.

Lemma 7 Let Ω ∈ Lip(k, r). And let κnn be asequence tends to 0 as n → ∞. Then two sequencesuΩ,ω

κnn and J (χΩ,ω

κn)n converge to uΩ strongly in

V (Ω) and to J(Ω), respectively, as n→∞. Both the se-quences converge to their limits uniformly over Lip(k, r)or Φ(k, r).

Lemma 8 Let κ be fixed and let Ωn,Ω ∈ OD such thatd(Ωn,Ω) → 0 as n → ∞. Then we have uΩn,ωn

κ → uΩ,ωκ

strongly in V as n→∞.

Proof Let un = uΩn,ωnκ and u = uΩ,ω

κ . Since unn isbounded in V by Theorem 3, we have a subsequence ofunn, still denoted by unn, having a weak limit u ∈V . To see u = u it suffices to show Rn → 0, where Rn isgiven by aΩ,ω

κ (un, ζ) = (f, ζ)Ω+κ(f, ζ)ω+⟨g, ζ⟩ΓN+Rn.

Here ζ denotes any smooth function of V ∩ C1(D)with notation: ∥ζ∥C0(D) = maxx∈D |ζ(x)| and ∥ζ∥C1(D)

– 34 –

Page 40: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.33–36 Satoshi Kaizu

= ∥ζ∥C0(D) +maxx∈D,1≤i≤d |∂ζ(x)/∂xi|. A tedious cal-culation gives

Rn = R1n +R2

n,R1

n = (aΩ\Ωn(un, ζ)− aΩn\Ω(un, ζ))+ κ(aω\ωn(un, ζ)− aωn\ω(un, ζ))

= aΩ\Ωn,ω\ωnκ (un, ζ)− aΩn\Ω,ωn\ω

κ (un, ζ),

R2n = l

Ωn\Ω,ωn\ωκ (ζ)− lΩ\Ωn,ω\ωn

κ (ζ),

lΩn\Ω,ωn\ωκ (ζ) =

∫Ωn\Ω fζdx+ κ

∫ωn\ω fζdx,

lΩ\Ωn,ω\ωnκ (ζ) =

∫Ω\Ωn

fζdx+ κ∫ω\ωn

fζdx.

Since |F | = 0 for a measurable set F ⊂ Rd such that(F )o = ∅, we see that |∂G| = 0, where ∂G = G \ G foran open subset G in Rd. We have ωn \ ω = Ω \ Ωn a.e. in D,

ω \ ωn = Ωn \ Ω a.e. in D,ω ⊖ ωn = Ω⊖ Ωn a.e. in D.

(13)

Let ∥v∥1,Ω\Ωn,ω\ωn,κ = (∥v∥21,Ω\Ωn+ κ∥v∥21,ω\ωn

)1/2 for

v ∈ V . Let c2(f, g)2 be a constant described as the right-

hand side of the inequality (7). Applying the Schwarz

inequality to aΩ\Ωn,ω\ωnκ (un, ζ), we have

|aΩ\Ωn,ω\ωnκ (un, ζ)(un, ζ)|≤ ∥un∥1,Ω\Ωn,ω\ωn,κ∥ζ∥1,Ω\Ωn,ω\ωn,κ

≤ c2(f, g)∥ζ∥C1(D)

√|Ω \ Ωn|.

All the estimates of remaining terms in R1n and R2

n havethe same upper bounds as above deriving by similar con-sidering. Thus we have

|Rn| ≤ 4c2(f, g)∥ζ∥C1(D)

√|Ω⊖ Ωn|.

Let δn = d(Ωn,Ω), then the definition of the Hausdorffmetric implies Ωn\Ω ⊂ [Ω]δn\Ω and Ω\Ωn ⊂ [Ωn]δn\Ωn.The assumption says δn → 0. Thus we have |Ωn⊖Ω| → 0as n → ∞. So we see Rn → 0 as n → ∞. Thus the fullsequence uΩn,ωn

κ n converges to u weakly in V .Next we show un → u strongly in V . For the aim we

notice that the bilinear form aΩ,ωκ (·, ·) is an inner prod-

uct equivalent to the usual one, because κ is a positiveconstant. So the value (aΩ,ω

κ (v, v))1/2 could play as anorm on V . Since V is a Hilbert space, so if we can showthat

limn→∞ aΩ,ωκ (un, un) = aΩ,ω

κ (u, u), (14)

then un converges to u strongly in V . Actually it is true.In fact, let lΩ,ω

κ (v) = lΩ,ωκ (v) + ⟨g, v⟩ΓN

. Then lΩ,ωκ (v)

belongs to the dual space V ′ of V and we notice that unconverges to u weakly in V , thus we have

limn→∞ lΩ,ωκ (un) = lΩ,ω

κ (u). (15)

Beside we see limn→∞ aΩ,ωκ (un, un) = limn→∞ lΩ,ω

κ (un)and lΩ,ω

κ (u) = aΩ,ωκ (u, u). The equality (14) holds true.

(QED)

Lemma 9 Let Ωn,Ω ∈ Lip(k, r) such that d(Ωn,Ω)→0 as n→∞. Then we have J(Ωn)→ J(Ω) as n→∞.

Proof Since we have J(Ωn) − J(Ω) = (J(Ωn) −J (χΩn,ωn

κ )) + (J (χΩn,ωnκ ) − J (χΩ,ω

κ )) + (J (χΩ,ωκ ) −

J(Ω)) = j1(n, κ) + j2(n, κ) + j3(κ), for any ϵ > 0 itsuffices to show the existence of nϵ and κϵ such that, for

all n(≥ nϵ), we have

max|j1(n, κϵ)|, |j3(n, κϵ)| ≤ϵ

3, (16)

|j2(n, κϵ)| ≤ϵ

3. (17)

Let 3κc3(f, g)2 be a constant described at the right-

hand side of (8). Then, applying the Schwarz inequalityand the definition of ∥ · ∥−1/2,ΓN

to j1(n, κ) and j3(κ),we have

max|j1(n, κ)|, |j3(κ)|≤√3κc3(f, g)[(1 +

√κ)∥f∥0,D + ∥g∥−1/2,ΓN

].

Thus there exists κϵ(> 0) satisfying (16), independentof n ∈ N. Finally we have nϵ ∈ N such that (17) holdsby Lemma 8 with κ = κϵ. The proof is completed.

(QED)

Although the problem TOP(D) is known well, wedon’t assure the existence of solutions to the problemtogether with its uniqueness generally.

Theorem 10 For U = Lip(k, r) there exists a solutionΩ∗ ∈ Lip(k, r) of TOP(D) (3).

Proof First, we see infΩ∈Lip(k,r) J(Ω) ≥ 0, becausewe see J(Ω) = ∥uΩ∥21,Ω. So there exists a minimiz-ing sequence of J(Ωn)n,Ωn ∈ Lip(k, r). ApplyingTheorem 6 together with Lemma 9 to J(Ωn)n, wehave the assertion. Here we notice J(Ω∗) > 0 provided∥f∥0,D + ∥g∥−1/2,ΓN

> 0.(QED)

5. Approximation of TOP(D) by simple

functions

We consider the minimizing problem Pκ(D): Findϕ∗ ∈ Φκ such that

J (ϕ∗κ) = infϕ∈Φκ J (ϕ). (18)

We show that the existence of a solution of Pκ(D)and that the problem TOP(D) is approximated by theproblems Pκn(D) as κn → 0 provided that Φκ is re-placed by Φκ(k, r). Now, we denote a topology on theset Φκ induced through L2(D) by T Φκ

L2 .

Lemma 11 Let ϕn = χΩn,ωnκ , ϕ = χΩ,ω

κ ∈ Φκ forn ∈ N. Then ϕn → ϕ as n→∞ with respect to T Φκ

L2 , ifand only if Ωn → Ω as n→∞ with respect to TH . ThusΦκ is compact with respect to T Φκ

L2 .

Proof The relation (13) implies |ϕn − ϕ| = (1 −κ)(|χΩn\Ω|+ |χΩ\Ωn |) a.e. in D. So ∥ϕn−ϕ∥2L2(D) = (1−κ)2|Ωn ⊖ Ω|. Since d(Ωn,Ω) → 0 implies |Ωn ⊖ Ω| → 0,then ∥ϕn − ϕ∥2L2(D) → 0.

Next we show the inverse assertion. Assume ∥ϕn −ϕ∥2L2(D) → 0 doesn’t imply d(Ωn,Ω)→ 0. Then, by The-

orem 5 with E = D there exists a subsequence Ωnmmof Ωnn, where d(Ωnm , Ω) → 0 for some Ω ∈ OD,

Ω = Ω. Let ϕ = χΩ,ωκ . So ∥ϕnm − ϕ∥L2(D) → 0. It con-

tradicts to ϕ = ϕ and limm→∞ϕnm = ϕ.(QED)

It is to be noticed that the equivalence between THand T Φκ

L2 restricted to Φ(k, r) is shown already by The-orem 6.

– 35 –

Page 41: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.33–36 Satoshi Kaizu

Theorem 12 Let κ be fixed. Then the minimizationproblem Pκ(D) (18) admits a solution ϕ∗ ∈ Φκ.

Proof Lemmas 8 and 11 imply the existence of a so-lution ϕ∗ ∈ Φκ of the problem Pκ(D).

(QED)

Lemma 13 Let ϕn = χΩn,ωnκn

∈ Φκn(k, r), ϕ = χΩ ∈Φ(k, r) and let un = uΩn,ωn

κn∈ V . We assume that κn →

0 and d(Ωn,Ω) → 0. Then unn weakly converges tou ∈ V , where u|Ω and u|ω, ω = D \ Ω are given by (4)and the equalities below, respectively.

u|ω = U |ω, (19)

where U ∈ V is determined by aD(U, v) = (f, v)ω, v ∈ V .Further we have

limn→∞ J (ϕn) = J(Ω). (20)

Proof Recall that c2(f, g)2 and 3κc3(f, g)

2 be con-stants used in the right-hand sides of (7) and (8) (cf.the proofs of Lemmas 8 and 9), respectively. The esti-mate (7) shows that c2(f, g) denotes an upper bound of∥un∥1,Ωnn. After applying (8) to Ωn, ωn, devidingthe both hands side of (8) by κn, we see

∥un−uΩn∥21,Ωn

κn+ ∥un∥21,ωn

≤ 3c3(f, g)2. (21)

So 31/2c3(f, g) denotes an upper bound of ∥un∥1,ωnn.Thus unn is bounded in V . We have a weakly conver-gent subsequence, still denoted by unn, with its weaklimit u. We show u|Ω = uΩ. For this aim it suffices toshow Rn → 0 as n → ∞, where Rn denotes a constantgiven by below.

aΩ(un, ζ) = (f, ζ)Ω + ⟨g, ζ⟩ΓN+Rn,

Rn = R1n +R2

n,R1

n = aΩ\Ωn(un, ζ)− aΩn\Ω(un, ζ)− κnaωn(un, ζ),R2

n = (f, ζ)Ωn\Ω − (f, ζ)Ω\Ωn+ κn(f, ζ)ωn ,

where ζ denotes a function belonging to V ∩ C1(D). Itis shown similarly as in the proof of Lemma 8. Thus,u|Ω satisfies (4). Now we show (19). We consider spacesVn = v ∈ L2(D) | v|Ωn ∈ V (Ωn), v|ωn ∈ H1(ωn), n ∈N and orthogonal projections pnv = v from Vn onto thespace V defined by aD(v, w) = aΩn(v, w)+aωn(v, w) forall w ∈ V . Applying (11) to v = ζ ∈ V ∩C1(D), we have

aΩn(un−uΩn

κn, ζ) + aωn(un, ζ)

= aΩn(pn(un−uΩn

κn), ζ) + aωn(pn(un), ζ)

= (f, ζ)ωn . (22)

Because of (22) it suffices to Rn → 0, where Rn is givenby

aω(un, ζ) + aΩ(U, ζ)− (f, ζ)ω = Rn,Rn = R1

n +R2n +R3

n +R4n,

R1n = aω\ωn(un, ζ)− aωn\ω(un, ζ),

R2n = −aΩ\Ωn(U, ζ)− aΩn\Ω(U, ζ),

R3n = −aΩn(U − pn(un−uΩn

κn)), ζ),

R4n = −(f, ζ)ωn\ω + (f, ζ)ω\ωn

.

It is shown also similarly as in the proof of Lemma 8together with a fact such that pn(zn) weakly convergesto U , where zn(x) = (un − uΩn)/κn for x ∈ Ωn andzn(x) = un(x) for x ∈ ωn. Finally we shall show (20).

First we see

J (ϕn)− J(Ω)= (f, un)Ωn\Ω − (f, u)Ω\Ωn

+ (f, un − u)Ωn∩Ω

+ κn(f, un)ωn + ⟨g, un − u⟩ΓN. (23)

The last two terms of the right hand side of (23) vanishrespectively, because of the weak convergence of un tou in V , g ∈ V ′ and (8). For the third term we have|(f, un − u)Ωn∩Ω| ≤ ∥f∥0,D∥un − u∥0,D → 0, becauseof the Rellich theorem. The Sobolev imbedding theoremimplies that ∥v∥L2(G) ≤ c∥v∥V |G|1/3, where G and c

denote an open subset of Rd and a constant independentof v ∈ V , respectively. Thus, we have

|(f, un)Ωn\Ω − (f, u)Ω\Ωn|

≤ ∥f∥0,D(∥un∥0,Ωn\Ω + ∥u∥0,Ω\Ωn)

≤ c∥f∥0,D max∥un∥V , ∥u∥V |Ωn ⊖ Ω|1/3.

The estimate (7) and d(Ωn,Ω)→ 0 imply that the sumof the first two terms goes to zero.

(QED)

The uniqueness of solutions of TOP(D) is not knownin general, we have to rely on subsequences of a mini-mizing sequence for cost as follows.

Theorem 14 Let κnn be a sequence decreasing tozero. We assume that ϕn = χΩn,ωn

κn∈ Φκn(k, r) satisfies

J (ϕn) = infϕ∈Φκn (k,r) J (ϕ), where un and J (ϕn) aregiven by (5) and (6) with ϕ = ϕn, respectively. Then wehave lim infn→∞ J (ϕn)= infΩ∈Lip(k,r) J(Ω). Moreoverwe have Ω∗ ∈ Lip(k, r) and a subsequence ϕnmm ofϕnn such that lim infn→∞ J (ϕn)= limm→∞ J (ϕnm)and d(Ωnm ,Ω

∗)→ 0, where Ω∗ is a solution of the prob-lem TOP(D).

References

[1] H. Azegami, S. Kaizu and K. Takeuchi, Regular solution totopology optimization problems of continua, JSIAM Letters,3 (2011), 1–4.

[2] M.P.Bendsøe and O.Sigmund, Toplology Optimization: The-ory, Methods and Applications, Springer-Verlag, Berlin, 2003.

[3] D. Chenais, On the existence of a solution in a domain iden-tification problem, J. Math. Anal. Appl., 52 (1975), 189–219.

[4] K. J. Falconer, The Geometry of Fractal Sets, CambridgeUniv. Press, Cambridge, 1985.

– 36 –

Page 42: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol.4 (2012) pp.37–40 c⃝2012 Japan Society for Industrial and Applied Mathematics

An exhaustive search method to find all small solutions of

a multivariate modular linear equation

Hui Zhang1 and Tsuyoshi Takagi2

1 Graduate School of Mathematics, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

2 Institute of Mathematics for Industry, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka819-0395, Japan

E-mail h-zhang math.kyushu-u.ac.jp

Received April 2, 2012, Accepted May 30, 2012

Abstract

We present an exhaustive search method to find all small solutions of a multivariate modularlinear equation over the integers on the basis of lattice enumeration technique. Previous meth-ods become ineffective when the bound in the definition of small solutions becomes large. Ouralgorithm can find all the solutions in a given bound; therefore, it can cope with problemswith large bounds. We demonstrate the superiority of our algorithm by applying it to theattack on the RSA-CRT with small secret exponent.

Keywords multivariate modular linear equation, lattice enumeration, cryptanalysis on RSA

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In this paper, we shall focus on the problem of findinginteger solutions to the modular linear equation

a1x1 + a2x2 + · · ·+ anxn = c mod N, (1)

where N is a known integer modulus; a1, a2, . . . , an, care known integers in [0, N − 1], and x1, x2, . . . , xnare unknowns. Note that such an equation usually hasmany integer solutions. However, we are interested inthe solutions in the bounded region BR, where BR =(x1, x2, . . . , xn) ∈ Rn : |xi| ≤ Bi for i = 1, . . . , n, fora bound B = (B1, . . . , Bn) ∈ Zn. We call such solu-tions small solutions. The problem of finding a certainsmall solution of (1) is crucial in many cryptographicapplications. For example, it is a key component in thesecurity proofs of the RSA-OAEP encryption scheme [1]and the number field sieve [2]. In addition, it lies at theheart of many cryptanalytic results, such as attacks onRSA with small secret CRT-exponents [3], RSA signa-tures with redundancy [4], and the ElGamal signaturein the GNU Privacy Guard (GPG) system [5], etc.In the previous work [1, 4, 5], this problem is treated

as the closest vector problem in a lattice defined by thelinear homogenous equation of (1), and a small solu-tion can be probably found by using the Babai’s near-est plane algorithm (NPA) [6]. However, the NPA-basedmethod does not guarantee that a small solution canalways be found. In addition, the NPA-based methodcan only output at most one small solution, thus itbecomes ineffective when it is used to find a specificsmall solution in a large bounded region, as the num-ber of small solutions increases with the size of thebounded region. Kurosawa et al. [7] proposed a methodbased on the Euclidian algorithm which can find all the

small solutions of a bivariate modular equation. How-ever, their method only works for bivariate equations,namely n = 2, and the bounded region is also limited to(x1, x2) ∈ R2 : 0 < x1 ≤ N1/2, 0 < x2 ≤ N1/2.In this paper, we propose an exhaustive search method

which can find all the small solutions to equations withn (n ≥ 2) unknowns. It is a development of the NPA-based method. In our method, we exhaustively searchall the possible solutions in an n-dimensional rectangularparallelepiped which covers the bounded region by usingKannan’s enumeration strategy [8]. In above mentionedcryptographic applications, n is small (n ≤ 5), while themodular N is quite large (N is an integer of hundreds orthousands of bits length). Our exhaustive search takesexponential time in n and polynomial time in logN when(B2

1 +B22 + · · ·+B2

n)n/2 ≤ O(logN)N .

2. Preliminaries and previous methods

2.1 Some basic knowledge on lattice

First, some descriptions of the notations are needed.Vectors are written in bold type according to the follow-ing conventions:

u = (u1, u2, . . . , un), ek = (ek,1, ek,2, . . . , ek,n).

All the lattices we consider are full rank integer lattices.Let ∥u∥ be the Euclidean norm of vector u. The notation⌈x⌉ (⌊x⌋) denotes the nearest integer greater (smaller)than or equals to x, respectively.Next, we recall some basic notions and theorems from

the algorithmic geometry of numbers.Let v1,v2, . . . ,vn be a set of linearly independent

vectors for vi ∈ Rn, 1 ≤ i ≤ n. The lattice L gener-ated by v1,v2, . . . ,vn is the set of linear combinations

– 37 –

Page 43: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.37–40 Hui Zhang et al.

of v1,v2, . . . ,vn with coefficients in Z:

L = a1v1 + a2v2 + · · ·+ anvn : a1, a2, . . . , an ∈ Z.

Any set of independent vectors that generates L is calleda basis for L. We write basis vectors for L as the rows ofan n-by-nmatrixM . The determinant of a lattice det(L)is by definition the absolute value of the determinant ofany lattice basis which corresponds to the volume of theparallelepiped spanned by the basis vectors. A latticehas many bases. The goal of lattice reduction is to findbases consisting of reasonably short and almost orthogo-nal vectors. LLL [9] is one of the most famous reductionalgorithms.CVP is the problem of finding the closest lattice vector

to a given space vector. Babai’s nearest plane algorithm(NPA) [6] is an approximation algorithm for solving theCVP. Babai also gave the following theorem:

Theorem 1 Suppose L is an n-dimensional lattice. Letu ∈ L be the nearest lattice point of the target t, wheret ∈ Rn. The nearest plane algorithm finds a lattice pointv such that ∥t− v∥ ≤ 2n/2∥t− u∥.

2.2 Previous methods

The NPA-based method can be sketched as follows: Itis easy to figure out an arbitrary integer solution s of(1). Let Lhomo be the lattice which is composed of allthe integer solutions of the homogeneous linear equationa1x1 + · · ·+ anxn = 0 mod N . Then, the NPA can findu′,u′ ∈ Lhomo, which is a ‘closest’ vector to s. Notethat s − u′ is a solution of (1). Since u′ is close to s,x′ = s− u′ must be a fairly small solution of (1).

Theorem 2 The upper bound on the size of the so-lution found by the NPA-based method is Πn

i=1|x′i| ≤(2(n/2)(n−logn)/Cn)N , where Cn = πn/2/Γ(n/2 + 1) isthe proportional constant in the volume calculating for-mula of the n-dimensional sphere.

Proof If we assume gcd(a1, . . . , an, N) = 1, then byNguyen’s work [5], we get det(Lhomo) = N . Suppose Sis the set of integer solutions of (1) and r is an arbitraryinteger solution of (1), then we have S = Lhomo + r.Let SN be an n-dimension sphere with center 0 andvolume N , then there must be one solution s ∈ S inSN . The radius of SN , denoted as R, satisfies Rn =N/Cn. By Theorem 1, we get ∥x′∥ ≤ 2n/2∥s∥ ≤ 2n/2R.Therefore, Πn

i=1|x′i| ≤ (∥x′∥/n1/2)n ≤ (2n/2R/n1/2)n ≤(2(n/2)(n−logn)/Cn)N .

(QED)

Theorem 2 shows that the NPA-based method al-ways outputs a solution x′ which is relatively small.However, as the NPA is an approximate algorithm,when the bounded region is small, i.e. Πn

i=1 Bi ≤(2(n/2)(n−logn)/Cn)N , the NPA-based method can notguarantee that the output is always a small solution; onthe other hand, when the bounded region is large, theremust be some small solutions that can never be foundby the NPA-based method. Moreover, the NPA-basedmethod can only output one solution which is unrelatedto the choice of the arbitrary solution s. Thus, it is in-effective to find a specific small solution when the smallsolution is not unique. However, finding a specific small

−14 −12 −10 −8 −6 −4 −2 0 2 40

10

20

30

40

50

60

70

80

90

100

r

Su

cces

s R

ate

(%)

n=2

n=3

n=4

n=5

Fig. 1. Success rates of the NPA-based method to find a specificsmall solution (2 ≤ n ≤ 5).

solution is precisely expected in some cryptanalysis ap-plications.We implement the NPA-based method and test its

performance. The bound is set as Πni=1Bi = 2rN where r

is a parameter to control the size of the bounded region.The equation a1x1+ · · ·+anxn = c mod N is generatedin the following way: The module N is a randomly cho-sen 1024-bit integer, and the coefficients ai (i = 1, . . . , n)are integers randomly chosen from [1, N − 1]. The val-ues Bi (i = 1, . . . , n) are set to be equal in the ex-periment. Thus, for fixed n and r, Bi are determinedonce N is chosen. We randomly generate an integer so-lution xi ∈ [0, Bi] and compute the constant term c asc = a1x1 + a2x2 + · · · + anxn mod N . In our experi-ment, we suppose that (x1, . . . , xn) is the specific smallsolution. If the program outputs (x1, . . . , xn), we say itsucceeds in finding a specific small solution. We ran-domly generate 1000 equations for each n and r, and getthe success rates, shown in Fig. 1. We only investigated2 ≤ n ≤ 5 which are the most usual cases encounteredin applications. In our experiment the success rates keep100% when r ≤ −12 and they decrease as r increases.When r ≥ 4, the success rates fall to 0.

3. Our exhaustive search method

In this section, we present our exhaustive searchmethod to find all the small solutions of (1). Our methoddevelops the NPA-based method by adding an exhaus-tive search module. The basic idea is to completelyenumerate the solutions of (1) in a rectangular paral-lelepiped which covers the bounded region, then pick outall the small solutions. We adopt Kannan’s enumerationstrategy in this procedure.The NPA-based method can always output a rela-

tively small solution, supposing x′. All the solutions inthe bounded region can be found just by tiling the funda-mental domains of Lhomo around x′ until the boundedregion is completely covered. Note that the choice ofthe basis affects the number of fundamental domains in-volved, and we use the LLL-reduced one, denoted byu1, . . . ,un. Accordingly, the solutions of (1) can beexpressed as x = x′+

∑ni=1 αiui, where α1, . . . , αn ∈ Z.

The solutions that lie in the bounded region must have

– 38 –

Page 44: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.37–40 Hui Zhang et al.

length shorter than B = (∑n

i=1B2i )

1/2, and we can findall the possible coefficient vectors (α1, . . . , αn) whichmake ∥x∥ ≤ B hold in the following way:Let u∗

1, . . . ,u∗n be the Gram-Schmidt orthogonal-

ization of u1, . . . ,un. If we give an index i ≤ n andvalues αi+1, . . . , αn ∈ Z, then we want to find a range ofpossible values of αi. We write x in the form

x =∑i−1

j=1 αjuj + αiui + x′′,

x′′ =∑n

j=i+1 αjuj + x′.

The component of x in the direction of u∗i is

αiu∗i + tu∗

i (t ∈ Q),

where tu∗i is the component of x′′ in the direction of u∗

i .Then we have

∥x∥ ≤ B ⇒ ∥αiu∗i + tu∗

i ∥ ≤ B ⇒ |αi + t| ≤ B∥u∗

i ∥⇒ −t− B

∥u∗i ∥≤ αi ≤ −t+ B

∥u∗i ∥.

In this way, we can recursively get all the possible coef-ficient vectors (α1, α2, . . . , αn).We shall give a self-contained description of our ex-

haustive search method in the form of an algorithm. Itis intended to be sufficiently detailed to allow a straight-forward implementation.

Algorithm Exhaustive Search

Input: Number of unknowns n ∈ Z+, (Z+ is the set ofall the positive integers), modulus N ∈ Z+, coefficientsai ∈ Z/NZ, constant item c ∈ Z/NZ and bound Bi ∈Z+, i = 1, 2, . . . , n.

Output: Set of all the small solutions Ssmall: Ssmall =(x1, x2, . . . , xn) ∈ Zn : a1x1 + a2x2 + · · · + anxn = cmod N, |xi| ≤ Bi.1. Find n independent vectors g1, . . . , gn which satisfya1gi1 + a2gi2 + · · ·+ angin = 0 mod N .

2. M ′ ←

g11 g12 · · · g1ng21 g22 · · · g2n...

.... . .

...gn1 gn2 · · · gnn

.

3. k ← gcd(a1, a2, . . . , an, N),check if the determinant of M ′ equals to N/k.If det(M ′) = N/k, then redo steps 1–3.

4. M ← LLL(M ′).5. If c = 0 then6. si ← random integer in [1, N), for i = 1, . . . , n− 1;sn ← a−1

n (c− a1s1 + · · ·+ an−1sn−1) mod N .7. u′ ← NPA (M, s).8. x′ ← s− u′.9. else10. x′ = 011. end if.12. u∗

1, . . . ,u∗n ← Gram-Schmidt(u1, . . . ,un).

13. B ← (∑n

i=1B2i )

1/2, x′′ ← x′;14. t← (u∗

n,x′′)/(u∗

n,u∗n);

uppern ← ⌈−t+B/∥u∗n∥⌉;

lowern ← ⌊−t−B/∥u∗n∥⌋;

15. For αn = lowern to uppern Do:x′′ ← x′ + αnun;t← (u∗

n−1,x′′)/(u∗

n−1,u∗n−1);

uppern−1 ← ⌈−t+B/∥u∗n−1∥⌉;

lowern−1 ← ⌊−t−B/∥u∗n−1∥⌋;

For αn−1 = lowern−1 to uppern−1 Do:· · · · · · · · · · · · · · · · · · · · · · · · · · ·(ommit the middle terms)· · · · · · · · · · · · · · · · · · · · · · · · · · ·For α2 = lower2 to upper2 Do:

x′′ ← x′ +∑n

i=2 αiui;t← (u∗

1,x′′)/(u∗

1,u∗1);

upper1 ← ⌈−t+B/∥u∗1∥⌉;

lower1 ← ⌊−t−B/∥u∗1∥⌋;

For α1 = lower1 to upper1 Do:x′′ ← x′ +

∑ni=1 αiui;

If |x′′i | ≤ Bi for all i, append x′′ to Ssmall

end if.16. Return Ssmall

Steps 1–3 try to find a basis of Lhomo. Accordingto Nguyen’s work [5], det(Lhomo) = N/k. We check ifdet(M ′) equals to N/k in step 3. If not, then that meansthe vectors chosen in step 1 do not constitute a basis ofLhomo and we should redo step 1. Step 4 reduces the ba-sis by using the LLL algorithm. Step 5 tests if c is equalto 0. If c = 0, then step 6 finds an arbitrary solution s of(1). Step 7 finds the ‘closest’ lattice vector u′ by usingthe nearest plane algorithm and step 8 gets a relativelysmall solution x′. If c = 0, we can simply let x′ = 0.Steps 12–16 are the exhaustive search procedures.Note that we conduct the exhaustive search based on

the result of the NPA-based method. It makes the co-efficients αi small. As the elements of ui are of similarmagnitude with large modulus N , small αi can speedup the exhaustive search, especially for the case of largebounded region.

4. Complexity of the algorithm

Our algorithm consists of two parts: i) Steps 1–11:compute a relatively small solution x′; ii) Steps 12–16:exhaustively search all the small solutions. The maincomponents of the first part are the LLL reductionalgorithm and the nearest plane algorithm. Accordingto the literature [6, 9], the complexity of this part isO(n5(logN)2). The complexity of the second part de-pends strongly on the distribution of the coefficients in(1). More precisely, the first minimum of the lattice canbe arbitrary small, leading to weaker bounds on the com-plexity. We give a result of an average case by assumingthat the bounded region is large enough to contain atleast one fundamental domain of Lhomo.We use the number of the solutions Num that should

be checked to measure the complexity of the exhaustivesearch. It is clear that

Num =∏n

i=1

(1 + 2 B

∥u∗i ∥

)=∏n

i=1

(∥u∗

i ∥+2B∥u∗

i ∥

).

According to the assumption, we have ∥u∗i ∥ < B. In

addition∏n

i=1 ∥u∗i ∥ = det(Lhomo), therefore

Num <∏n

i=1

(3B

∥u∗i ∥

)= (3B)n

det(Lhomo).

If we assume that gcd(a1, . . . , an, N) = 1, then wehave Num < 3nBn/N and the complexity in terms

– 39 –

Page 45: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.37–40 Hui Zhang et al.

of the required number of arithmetic operations isO(3n(Bn/N)(logN)2). Therefore, for fixed n, if

Bn =(B2

1 +B22 + · · ·+B2

n

)n/2 ∼ O(logN) ·N,then the complexity will be polynomial in logN .

5. Application: attack the RSA-CRT

We shall demonstrate the superiority of our algorithmby applying it to the attack on RSA scheme with smallCRT-exponents.Let N = p1p2 be an RSA modulus. The public ex-

ponent e and the secret exponent d satisfy the equationed = 1 mod ϕ(N), where ϕ(N) = (p1 − 1)(p2 − 1) isEuler’s totient function. The small CRT-exponents, i.e.,exponent d such that d1 = d mod (p1 − 1) and d2 = dmod (p2−1) are both small, enable us to efficiently raiseto the dth power modulo p1 and p2, respectively; the re-sults can then be combined using the Chinese RemainderTheorem(CRT), yielding a solution modulo N . However,there is a lattice-based attack [3] on RSA with smallCRT-exponents as follows:We assume that e < ϕ(N) and 1/2 < p1/p2 < 2,

i.e. p1 and p2 have about the same size. Equationsed1 = 1 mod (p1−1) and ed2 = 1 mod (p2−1) can berewritten as

ed1 = 1 + k1(p1 − 1), ed2 = 1 + k2(p2 − 1), (2)

where k1 and k2 are positive integers. Multiplying thesetwo equations gives

(ed1 + k1 − 1)(ed2 + k2 − 1) = k1k2N. (3)

Eq. (3) can be linearized as

ex+ (1−N)y + z + e2w = −1 (4)

with unknowns x = d1(k2 − 1) + d2(k1 − 1), y = k1k2,z = k1 + k2, w = d1d2.Notice that, from y and z, one can recover the un-

knowns k1 and k2 and then recover the unknown pa-rameters d1 and d2 from x and w. After that, we canfigure out p1 and p2 from (2).Bleichenbacher and May [3] attacked one of the pa-

rameters suggested by Galbraith et al. [10] and wecall this attack BM’s attack. Denote the bit length ofN, e,maxdi by nN , ne, nd, then in the case of nN =1024 and ne = 512, the success rate of BM’s attackreaches 100% when nd ≤ 200. However, when nd ≤ 204the success rate falls to about 90% and the it keepsfalling as nd increases.In our experiment, we regarded (4) as a modular equa-

tion ex+(1−N)y+z = −1 mod e2 which can be solvedby using our algorithm. We compared our results withBleichenbacher and May’s in the case of nN = 1024 andne = 512 in the following table. The success rates ofBleichenbacher and May’s work which fall to 0 whennd ≤ 208. Our algorithm, by contrast, ensures the suc-cess rate to be 100% even for a larger nd which pro-vides an efficient approach to attack this kind of schemefor large nd. The experiment results can provide refer-ence for setting the security parameters of RSA-CRTschemes.

Table 1. Comparison on attacking the RSA-CRT.

nd BM’s SR Our SR Our RT

≤ 200 bits 100% 100% 0.070s

≤ 202 bits 91.1% 100% 0.073s

≤ 204 bits 68.3% 100% 0.405s

≤ 206 bits 11.7% 100% 5.253s

≤ 208 bits 0% 100% 21.278s

nd: bit length of the secret keysSR: success rateRT: running time

6. Conclusions

We described an exhaustive search method for find-ing all the small solutions of a multivariate modular lin-ear equation. Compared with the previous methods, ourmethod can solve the problem in some cryptographic ap-plications with a larger bound. The experiment resultsshow that it successfully improves the attack on RSA-CRT with small secrete exponents.

References

[1] E. Fujisaki, T. Okamoto, D. Pointcheval and J. Stern, RSA-OAEP is secure under the RSA assumption, in: Proc. of CRYPTO 2001, J. Kilian ed., LNCS, Vol. 2139, pp. 260–274, Springer-Verlag, Berlin, 2001.

[2] A. Joux and R. Lercier, Improvements to the general number field sieve for discrete logarithms in prime fields. A comparison with the Gaussian integer method, Math. Comput., 72 (2003), 953–967.

[3] D. Bleichenbacher and A. May, New attacks on RSA with small secret CRT-exponents, in: Proc. of PKC 2006, M. Yung et al. eds., LNCS, Vol. 3958, pp. 1–13, Springer-Verlag, Berlin, 2006.

[4] J. Misarsky, A multiplicative attack using LLL algorithm on RSA signatures with redundancy, in: Proc. of CRYPTO 1997, B. S. Kaliski ed., LNCS, Vol. 1294, pp. 221–234, Springer-Verlag, Berlin, 1997.

[5] P. Q. Nguyen, Can we trust cryptographic software? Cryptographic flaws in GNU Privacy Guard v1.2.3, in: Proc. of EUROCRYPT 2004, C. Cachin and J. Camenisch eds., LNCS, Vol. 3027, pp. 555–570, Springer-Verlag, Berlin, 2004.

[6] L. Babai, On Lovász' lattice reduction and the nearest lattice point problem, Combinatorica, 6 (1986), 1–13.

[7] K. Kurosawa, K. Schmidt-Samoa and T. Takagi, A complete and explicit security reduction algorithm for RSA-based cryptosystems, in: Proc. of ASIACRYPT 2003, C. S. Laih ed., LNCS, Vol. 2894, pp. 474–491, Springer-Verlag, Berlin, 2003.

[8] R. Kannan, Improved algorithms for integer programming and related lattice problems, in: Proc. of STOC 1983, D. S. Johnson et al. eds., pp. 193–206, ACM, New York, 1983.

[9] A. K. Lenstra, H. W. Lenstra and L. Lovász, Factoring polynomials with rational coefficients, Math. Ann., 261 (1982), 515–534.

[10] S. D. Galbraith, C. Heneghan and J. F. McKee, Tunable balancing of RSA, in: Proc. of ACISP 2005, C. Boyd and J. M. González Nieto eds., LNCS, Vol. 3574, pp. 280–292, Springer-Verlag, Berlin, 2005.


JSIAM Letters Vol.4 (2012) pp.41–44 ©2012 Japan Society for Industrial and Applied Mathematics

A parameter optimization technique

for a weighted Jacobi-type preconditioner

Akira Imakura1, Tetsuya Sakurai1, Kohsuke Sumiyoshi2 and Hideo Matsufuru3

1 University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
2 Numazu College of Technology, Ooka 3600, Numazu, Shizuoka 410-8501, Japan
3 High Energy Accelerator Research Organization, 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan

E-mail imakura@ccs.tsukuba.ac.jp

Received May 9, 2012, Accepted June 7, 2012

Abstract

The Jacobi preconditioner is well known as a preconditioner with high parallel efficiency for solving very large linear systems. However, the Jacobi preconditioner does not always greatly improve the convergence rate, because of the poor convergence property of the Jacobi method. In this paper, in order to improve the quality of the Jacobi preconditioner without losing its parallel efficiency, we introduce a weighted Jacobi-type preconditioner, and propose an optimization technique for its weight parameter. The numerical experiments indicate that the proposed preconditioner has higher quality and is more efficient than the traditional Jacobi preconditioner.

Keywords large linear systems, weighted Jacobi-type preconditioner, parameter optimization, highly parallel computation

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

In this paper, we consider preconditioning techniques for the Krylov subspace methods to solve very large, but sparse, linear systems of the form:

Ax = b, A ∈ R^{n×n}, x, b ∈ R^n, (1)

where the coefficient matrix A is assumed to be nonsymmetric and nonsingular. These linear systems often arise from the discretization of partial differential equations in the fields of computational science and engineering.

Recent large-scale simulations require solving very large linear systems (1), and this is often one of the most time-consuming parts of the simulations. In this case, high parallel efficiency is recognized as even more important than high speed and/or high accuracy for the Krylov subspace methods and also for the preconditioning techniques.

The Jacobi preconditioner, which is the variable-type preconditioning technique using some iterations of the Jacobi method, shows especially high parallel efficiency, because the Jacobi method does not have any sequential operations like forward and/or backward substitution in each iteration. The Jacobi preconditioner also has some advantages for solving very large linear systems (1): it is not required to construct any preconditioning matrices, and it is available even if the coefficient matrix is accessible only through matrix-vector products.

However, since the Jacobi method has a strict convergence condition and a poor convergence property, the Jacobi preconditioner does not always greatly improve the convergence rate. In this paper, in order to improve the quality of the Jacobi preconditioner without loss of its parallel efficiency, we introduce a weighted Jacobi-type preconditioner, and propose an optimization technique for the weight parameter of the weighted Jacobi-type preconditioner.

This paper is organized as follows. In the next section, we briefly describe the Jacobi preconditioner and introduce a weighted Jacobi-type preconditioner. In Section 3, we propose a parameter optimization technique for the weighted Jacobi-type preconditioner. Then, we test the performance of the proposed preconditioner in some numerical experiments in Section 4, and finally we draw some conclusions in Section 5.

2. The Jacobi preconditioner and a weighted Jacobi-type preconditioner

Preconditioning techniques play a very important role in improving the convergence rate of the Krylov subspace methods. They transform the linear systems (1) into systems more suitable for the Krylov subspace methods, i.e.,

K_1^{-1} A K_2^{-1} y = K_1^{-1} b,  x = K_2^{-1} y,

where K = K_1 K_2 is called the preconditioning matrix, and is generally required to satisfy K_1^{-1} A K_2^{-1} ≈ I in some sense. For details we refer to [1] and references therein.

The incomplete factorization-type preconditioners, typified by the ILU(0) preconditioner, construct the preconditioning matrix K ≈ A such that the systems Kz = w appearing in each iteration of the Krylov subspace methods are easy to solve, e.g., by using the incomplete LU decomposition.

On the other hand, the variable-type preconditioners


roughly solve the system

Az = w (2)

by an iterative method to obtain an approximation of A^{-1}w, instead of solving Kz = w. For solving (2), the stationary iterative methods, such as the Jacobi method, the Gauss-Seidel method and the SOR method, are widely used, and their efficiency has been reported [2].

In what follows, we briefly describe and introduce two variable-type preconditioners: the Jacobi preconditioner and a weighted Jacobi-type preconditioner.

2.1 The Jacobi preconditioner

The stationary iterative methods are based on transforming (1) into a fixed-point equation:

x = f(x),

and the solution of the linear system (1) is computed as the fixed-point of the vector-valued function f by an iterative method.

Let M be a nonsingular matrix and N be a matrix such that A = M − N. Then, by setting f(x) = M^{-1}Nx + M^{-1}b, the fixed-point (the solution of (1)) can be computed from the recurrence formula

x_{k+1} = f(x_k) = M^{-1} N x_k + M^{-1} b,  k = 0, 1, 2, . . .

with an initial vector x0.

Let G := M^{-1}N be the iteration matrix and ρ(G) its spectral radius. Then the stationary iterative methods converge to the exact solution x∗ = A^{-1}b for any initial vector x0 if and only if the spectral radius of the iteration matrix satisfies ρ(G) < 1. For the convergence rate of the stationary iterative methods, we also have the following relation:

lim_{k→∞} ( max_{x0 ∈ R^n} ∥e_k∥2 / ∥e_0∥2 )^{1/k} = ρ(G),

where e_0 := x_0 − x∗ and e_k := x_k − x∗ are the initial error vector and the error vector at the kth step, respectively [3].

The Jacobi method is the simplest stationary iterative method. Let A_D, −A_L, −A_U be the diagonal part, the strict lower triangular part and the strict upper triangular part of A, respectively. Then the Jacobi method defines the matrices M := A_D, N := A_L + A_U, and its recurrence formula is

x_{k+1} = A_D^{-1}(A_L + A_U) x_k + A_D^{-1} b. (3)

The Jacobi preconditioner is the variable-type preconditioner that uses some iterations of the Jacobi method to solve (2). Since no sequential operations like forward and/or backward substitution appear in the recurrence formula (3), both the Jacobi method and the Jacobi preconditioner have high parallel efficiency.
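For illustration, a minimal dense NumPy sketch of the recurrence (3), assuming A is given as a square array with nonzero diagonal (the function name is ours; a sparse implementation would only need the matrix-vector product):

import numpy as np

def jacobi_steps(A, b, m):
    # m iterations of x_{k+1} = A_D^{-1}(A_L + A_U) x_k + A_D^{-1} b, x_0 = 0.
    d = np.diag(A)                 # diagonal part A_D, stored as a vector
    R = A - np.diag(d)             # off-diagonal part of A, i.e. -(A_L + A_U)
    x = np.zeros_like(b, dtype=float)
    for _ in range(m):
        x = (b - R @ x) / d        # elementwise division by the diagonal
    return x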

2.2 A weighted Jacobi-type preconditioner

The Jacobi preconditioner has high parallel efficiency; however, it does not always greatly improve the convergence rate, because the spectral radius of the iteration matrix of the Jacobi method often satisfies ρ(G) > 1.

In this section, in order to improve the quality of the Jacobi preconditioner, we introduce an improvement of the Jacobi method, named a weighted Jacobi-type method. We also introduce a weighted Jacobi-type preconditioner.

The weighted Jacobi method is a well-known improvement of the Jacobi method, and its recurrence formula is

x_{k+1} = ω[ A_D^{-1}(A_L + A_U) x_k + A_D^{-1} b ] + (1 − ω) x_k
        = x_k + ω A_D^{-1}(b − A x_k),

where ω ∈ R is called the weight parameter [3, Section 13.2]. This method is also called the damped Jacobi method or the relaxed Jacobi method.

Here, we note that the high parallel efficiency of the (weighted) Jacobi method comes from the fact that the matrix M is diagonal. From this observation, we can extend the (weighted) Jacobi method without loss of its high parallel efficiency as follows:

x_{k+1} = x_k + ω D^{-1}(b − A x_k), (4)

where D ∈ R^{n×n} is any nonsingular diagonal matrix.

In this paper, we call the method based on recurrence formula (4) a weighted Jacobi-type method. We also call the variable-type preconditioner using some iterations of the weighted Jacobi-type method to solve (2) a weighted Jacobi-type preconditioner.
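A minimal sketch of the weighted Jacobi-type preconditioner: m steps of recurrence (4) applied to (2) return an approximation z ≈ A^{-1}w (the function name and the dense storage are our assumptions for illustration):

import numpy as np

def weighted_jacobi_precond(A, D_diag, w, omega, m=20):
    # m steps of z_{k+1} = z_k + omega * D^{-1}(w - A z_k), from z_0 = 0,
    # where D = diag(D_diag) is a nonsingular diagonal matrix.
    z = np.zeros_like(w, dtype=float)
    for _ in range(m):
        z = z + omega * (w - A @ z) / D_diag
    return z

Only the matrix-vector product A @ z and elementwise operations appear, which is the source of the high parallel efficiency noted above.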

3. A parameter optimization technique

The weighted Jacobi-type preconditioner introduced in Section 2 has high potential to greatly improve the Krylov subspace method. However, its quality depends strongly on the weight parameter ω and also on the diagonal matrix D.

In this section, we analyse the relationship between the weight parameter and the convergence rate of the weighted Jacobi-type method. Then, from this analysis, we propose a parameter optimization technique for the weighted Jacobi-type preconditioner.

3.1 Convergence analysis of the weighted Jacobi-type method

The weighted Jacobi-type method can also be regarded as a stationary iterative method with the matrix partition

A = M_ω − N_ω,  M_ω = (1/ω) D,  N_ω = (1/ω) D − A,

and its iteration matrix can be written as

G_ω := M_ω^{-1} N_ω = I − ω D^{-1} A. (5)

Therefore, the convergence rate of the weighted Jacobi-type method is determined by ρ(G_ω).

For the relationship between the weight parameter ω and the corresponding spectral radius ρ(G_ω), we derive the following theorem.

Theorem 3.1 Let A ∈ R^{n×n} be a nonsingular matrix, and D ∈ R^{n×n} be a nonsingular diagonal matrix. We also let C(γ, ρ) be the inner region of the circle with center γ ∈ R and radius ρ ∈ R on the complex plane,


and the pair of γ∗, ρ∗ be defined by

(γ∗, ρ∗) := arg min_{γ,ρ∈R} |ρ/γ|  such that  λ_i(D^{-1}A) ∈ C(γ, ρ), i = 1, 2, . . . , n,

where λ_i(D^{-1}A) are the eigenvalues of D^{-1}A.

Then we have

arg min_{ω∈R} ρ(G_ω) = 1/γ∗,  min_{ω∈R} ρ(G_ω) = |ρ∗/γ∗|. (6)

Proof  From (5), the spectral radius ρ(G_ω) can be rewritten as follows:

ρ(G_ω) = ρ(I − ω D^{-1} A)
       = max_i |1 − ω λ_i(D^{-1}A)|
       = |ω| max_i |1/ω − λ_i(D^{-1}A)|.

Then, we have

min_{ω∈R} ρ(G_ω) = min_{ω,ρ∈R} |ωρ|  s.t.  λ_i(D^{-1}A) ∈ C(1/ω, ρ)
                 = min_{γ,ρ∈R} |ρ/γ|  s.t.  λ_i(D^{-1}A) ∈ C(γ, ρ),

where we substituted γ = 1/ω. Therefore, (6) is proved. (QED)

We also derive the following theorem for the convergence condition of the weighted Jacobi-type method with the optimized weight parameter.

Theorem 3.2 Let ωopt := 1/γ∗ be the optimized weight parameter of the weighted Jacobi-type method. Then, the spectral radius of the weighted Jacobi-type method with the optimized weight parameter ωopt satisfies the inequality

ρ(G_{ωopt}) = |ρ∗/γ∗| < 1 (7)

if and only if λ_i(D^{-1}A) satisfies

Re(λ_i(D^{-1}A)) > 0, i = 1, 2, . . . , n, (8)

or

Re(λ_i(D^{-1}A)) < 0, i = 1, 2, . . . , n. (9)

Proof  We first prove (8) or (9) ⇒ (7). From the relations

{z ∈ C | Re(z) > 0} ⊂ lim_{γ→∞} C(γ, γ),
{z ∈ C | Re(z) < 0} ⊂ lim_{γ→∞} C(−γ, γ),

there exists a circle C(γ, ρ) such that

λ_i(D^{-1}A) ∈ C(γ, ρ),  |ρ∗/γ∗| ≤ |ρ/γ| < 1

for any λ_i(D^{-1}A) satisfying (8) or (9).

Next, we prove (7) ⇒ (8) or (9). From (7), there is a circle C(γ, ρ) such that

λ_i(D^{-1}A) ∈ C(γ, ρ),  |ρ/γ| < 1.

Fig. 1. Relationship between eigenvalues of D^{-1}A and γ∗, ρ∗. The symbols ⋆ and • denote the extreme and interior eigenvalues, respectively.

We also have

C(γ, ρ) ⊂ {z ∈ C | Re(z) > 0},
C(−γ, ρ) ⊂ {z ∈ C | Re(z) < 0},

for any γ, ρ ∈ R with 0 < ρ < γ. Therefore, if ρ(G_{ωopt}) < 1, then all eigenvalues λ_i(D^{-1}A) satisfy (8) or (9). (QED)

Theorems 3.1 and 3.2 mean that the weight parameter ω of the weighted Jacobi-type method can be optimized from the extreme eigenvalues of D^{-1}A; see Fig. 1. For example, if all eigenvalues of D^{-1}A are real and lie in an interval [λmin, λmax] with λmin > 0, the smallest enclosing circle has center γ∗ = (λmin + λmax)/2 and radius ρ∗ = (λmax − λmin)/2, so that ωopt = 2/(λmin + λmax).

3.2 An optimization technique for the weight parameter of the weighted Jacobi-type preconditioner

Based on the results of Theorems 3.1 and 3.2, we propose an optimization technique for the weight parameter of the weighted Jacobi-type preconditioner.

The basic idea of our optimization technique is the so-called off-line tuning, and it is shown as follows:

Algorithm 1 A parameter optimization technique

1: Initialization: Set an initial guess x0 and a diagonal matrix D.

2: Optimization:

a: Compute (approximately) the extreme eigenvalues of D^{-1}A.

b: Optimize (approximately) the weight parameter ωopt from the computed extreme eigenvalues.

3: Application: Apply the weighted Jacobi-type preconditioned Krylov subspace method with the optimized weight parameter ωopt to the linear systems (1).

Here, the extreme eigenvalues can be efficiently computed by some iterations of the Arnoldi method. Parallelization techniques of the Arnoldi method have been widely studied, and are implemented in the Parallel ARPACK [4].

In Algorithm 1, at the same time as optimizing ωopt, we obtain the corresponding spectral radius ρ(G_{ωopt}). Therefore, if we have several choices for the diagonal matrix D, we can also select the best one by applying the optimization to each diagonal matrix D. It is also expected that the optimization technique can be naturally extended, in much the same way, to variable-type preconditioners using other weighted stationary iterative methods.
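As a rough serial illustration of steps 2a-2b of Algorithm 1, the sketch below uses SciPy's ARPACK interface (scipy.sparse.linalg.eigs, not the Parallel ARPACK of [4]) and assumes the spectrum of D^{-1}A is close to real and positive, so that the real-interval formula ωopt = 2/(λmin + λmax) noted at the end of Section 3.1 applies; general complex spectra require minimizing |ρ/γ| over all enclosing circles.

import scipy.sparse.linalg as spla

def optimize_omega(A, D_diag, ncv=20):
    # Approximate the extreme eigenvalues of D^{-1} A (D = diag(D_diag))
    # by the Arnoldi method, then return (omega_opt, predicted rho(G_omega)).
    n = A.shape[0]
    DA = spla.LinearOperator((n, n), matvec=lambda v: (A @ v) / D_diag)
    lmax = spla.eigs(DA, k=1, which='LR', ncv=ncv,
                     return_eigenvectors=False)[0].real
    lmin = spla.eigs(DA, k=1, which='SR', ncv=ncv,
                     return_eigenvectors=False)[0].real
    gamma, rho = (lmin + lmax) / 2.0, (lmax - lmin) / 2.0
    return 1.0 / gamma, abs(rho / gamma)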

– 43 –

Page 49: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol4-2012.pdf · Akira Imakura, Tetsuya Sakurai, Kohsuke Sumiyoshi, Hideo Matsufuru . JSIAM Letters Vol.4 (2012) pp.1{4 ⃝c 2012 Japan

JSIAM Letters Vol. 4 (2012) pp.41–44 Akira Imakura et al.

Table 1. Characteristics of the test problems.

Matrix name   n        Nnz       Application area
AF23560       23560    484256    Fluid dynamics
CHIPCOOL1     20082    281150    Model reduction
POISSON3DA    13514    352762    Fluid dynamics
XENON2        157464   3866688   Materials

4. Numerical experiments and results

In this section, we evaluate the performance of the proposed preconditioner, and compare it with the Jacobi preconditioner on test problems from [5].

The characteristics of the coefficient matrices of the test problems are shown in Table 1. The values n and Nnz denote the dimension and the number of nonzero elements, respectively. We set b = [1, 1, . . . , 1]^T as the right-hand side and x0 = [0, 0, . . . , 0]^T as the initial guess, and the stopping criterion was ∥rk∥2/∥b∥2 ≤ 10^{-10}.

nal matrix D = AD as well as the Jacobi precondi-tioner. The number of iterations of the Jacobi methodand the weighted Jacobi-type method for the precon-ditioners are set as 20. The number of iterations of theArnoldi method for optimization is also set as 20. We usethe GMRES method for the Krylov subspace method.The numerical experiments were implemented with

the standard Fortran 77 in double precision arithmeticon the Intel Xeon X5550 (2.67GHz).
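For readers sketching the same setup in a scripting language, a hypothetical SciPy version of the preconditioned solve is given below; the original experiments were in Fortran 77, so every name here is ours, and the weighted Jacobi-type preconditioner is simply wrapped as a LinearOperator for GMRES.

import numpy as np
import scipy.sparse.linalg as spla

def solve_weighted_jacobi_gmres(A, omega, m=20):
    # Wrap m steps of recurrence (4), with D = A_D, as a preconditioner
    # M ~ A^{-1} and pass it to GMRES.
    n = A.shape[0]
    D_diag = A.diagonal()
    def apply_prec(w):
        z = np.zeros_like(w)
        for _ in range(m):
            z = z + omega * (w - A @ z) / D_diag
        return z
    M = spla.LinearOperator((n, n), matvec=apply_prec)
    b = np.ones(n)                      # b = [1, 1, ..., 1]^T as in the text
    return spla.gmres(A, b, M=M)        # x0 = 0 is the default initial guess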

4.1 Numerical results

We present the numerical results in Table 2. In this table, the symbol † denotes that the method did not converge within 1000 iterations.

Firstly, we consider the relationship between the spectral radius ρ(G) and the number of iterations Iter. The spectral radius of the Jacobi preconditioner shows ρ(G) > 1, except for CHIPCOOL1. For these problems, the Jacobi preconditioned GMRES method shows a poor convergence property, because the convergence condition of the Jacobi method was not satisfied.

On the other hand, by optimizing the weight parameter ω, the spectral radius of the proposed preconditioner satisfies ρ(G) < 1 for all test problems. The proposed preconditioner therefore leads to a better convergence property than the Jacobi preconditioner.

Next, we consider the computation time for the optimization, toptimize, and the total computation time, ttotal. We can see from Table 2 that toptimize is much smaller than ttotal for all test problems. This is based on the fact that the most time-consuming part of the optimization is computing the extreme eigenvalues by 20 iterations of the Arnoldi method; this computational cost is almost comparable with the cost of one iteration of the Jacobi or the weighted Jacobi-type preconditioned GMRES method.

From the better convergence and the negligibly small computation time for the optimization, the proposed preconditioner can solve the linear systems in much smaller computation time than the Jacobi preconditioner.

Table 2. Convergence results of the preconditioned GMRES method (Precond: preconditioner; Iter: number of iterations; toptimize: computation time for optimization; ttotal: total computation time).

AF23560
Precond    ω      ρ(G)    Iter   toptimize [sec.]   ttotal [sec.]
Non        −−     −−      †      −−                 †
Jacobi     −−     6.491   †      −−                 †
Proposed   0.046  0.999   660    6.30×10^−2         2.69×10^1

CHIPCOOL1
Precond    ω      ρ(G)    Iter   toptimize [sec.]   ttotal [sec.]
Non        −−     −−      †      −−                 †
Jacobi     −−     0.994   45     −−                 7.35×10^−1
Proposed   1.077  0.994   71     4.40×10^−2         1.23×10^0

POISSON3DA
Precond    ω      ρ(G)    Iter   toptimize [sec.]   ttotal [sec.]
Non        −−     −−      184    −−                 7.77×10^−1
Jacobi     −−     1.259   239    −−                 4.90×10^0
Proposed   0.885  0.998   33     4.40×10^−2         6.06×10^−1

XENON2
Precond    ω      ρ(G)    Iter   toptimize [sec.]   ttotal [sec.]
Non        −−     −−      †      −−                 †
Jacobi     −−     2.154   †      −−                 †
Proposed   0.634  0.999   386    5.06×10^−1         1.09×10^2

5. Conclusions

In this paper, in order to improve the quality of the Jacobi preconditioner without loss of its parallel efficiency, we have introduced the weighted Jacobi-type preconditioner, and proposed the optimization technique for the weight parameter of the preconditioner.

From our numerical experiments, we have learned that the proposed preconditioner has higher quality and is more efficient than the traditional Jacobi preconditioner for solving very large but sparse linear systems.

As future work, we should apply the proposed preconditioner to problems from real applications and evaluate its efficiency in highly parallel computation.

Acknowledgments

This work is supported in part by Strategic Programs for Innovative Research Field 5 “The origin of matter and the universe”, CREST and KAKENHI (Grant Nos. 20105004, 20105005, 21246018, 22540296 and 23105702).

References

[1] M. Benzi, Preconditioning techniques for large linear systems: a survey, J. Comput. Phys., 182 (2002), 418–477.

[2] K. Abe and S.-L. Zhang, A variable preconditioning using the SOR method for GCR-like methods, Int. J. Numer. Anal. Mod., 2 (2005), 147–161.

[3] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, 2003.

[4] Parallel ARPACK, http://www.caam.rice.edu/~kristyn/parpack_home.html.

[5] The University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/.


JSIAM Letters Vol.4 (2012)

ISBN : 978-4-9905076-3-3

ISSN : 1883-0609

©2012 The Japan Society for Industrial and Applied Mathematics

Publisher :

The Japan Society for Industrial and Applied Mathematics

4F, Nihon Gakkai Center Building

2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan

tel. +81-3-5684-8649 / fax. +81-3-5684-8663
