J S I A M

The Japan Society for Industrial and Applied Mathematics

Vol.2 (2010) pp.1-134


Editorial Board

Chief Editor Yoshimasa Nakamura (Kyoto University)

Vice-Chief Editor Kazuo Kishimoto (Tsukuba University)

Associate Editors Reiji Suda (University of Tokyo)

Satoshi Tsujimoto (Kyoto University)

Masashi Iwasaki (Kyoto Prefectural University)

Norikazu Saito (University of Tokyo)

Koh-ichi Nagao (Kanto Gakuin University)

Koichi Kato (Japan Institute for Pacific Studies)

Saburo Kakei (Rikkyo University)

Atsushi Nagai (Nihon University)

Takeshi Mandai (Osaka Electro-Communication University)

Ryuichi Ashino (Osaka Kyoiku University)

Ken Umeno (NiCT)

Yuzuru Sato (Hokkaido University)

Daisuke Takahashi (Waseda University)

Katsuhiro Nishinari (University of Tokyo)

Hitoshi Imai (University of Tokushima)

Nobito Yamamoto (University of Electro-Communications)

Takahiro Katagiri (University of Tokyo)

Tetsuya Sakurai (Tsukuba University)

Yoshitaka Watanabe (Kyushu University)

Takeshi Ogita (Tokyo Woman's Christian University)

Takashi Suzuki (Osaka University)

Yoshihiro Shikata

Tatsuo Oyama (National Graduate Institute for Policy Studies)

Tetsuo Ichimori (Osaka Institute of Technology)

Masami Hagiya (University of Tokyo)

Yasuyuki Tsukada (NTT Communication Science Laboratories)

Hideyuki Azegami (Nagoya University)

Kenji Shirota (Ibaraki University)

Naoyuki Ishimura (Hitotsubashi University)

Jiro Akahori (Ritsumeikan University)

Ken Nakamula (Tokyo Metropolitan University)

Miho Aoki (Shimane University)

Keiko Imai (Chuo University)

Ichiro Kataoka (HITACHI)

Shin-Ichi Nakano (Gunma University)

Maiko Shigeno (Tsukuba University)

Ichiro Hagiwara (Tokyo Institute of Technology)

Fumiko Sugiyama (Kyoto University)

Naoshi Nishimura (Kyoto University)

Hiromichi Itou (Gunma University)

Contents

Shape optimization problem of elastic bodies for controlling contact pressure ・・・ 1-4
Takahiro Iwai, Akinobu Sugimoto, Taiki Aoyama and Hideyuki Azegami

An ID-based key sharing scheme based on discrete logarithm problem over a product of three primes ・・・ 5-8
Yasuyuki Murakami and Masao Kasahara

Inner angle of triangle on unit circle made of consecutive three points generated by chaotic map ・・・ 9-12
Ryo Takahashi, Etsushi Nameda and Ken Umeno

Error estimates with explicit constants for the tanh rule and the DE formula for indefinite integrals ・・・ 13-16
Tomoaki Okayama, Takayasu Matsuo and Masaaki Sugihara

Improvements in the computation of the Hasse-Witt matrix ・・・ 17-20
Hiroki Komoto, Shunji Kozaki and Kazuto Matsuo

On the convergence of the V-type hyperplane constrained method for singular value decomposition ・・・ 21-24
Kenichi Yadani, Koichi Kondo and Masashi Iwasaki

Mixed double-multiple precision version of hyperplane constrained method for singular value decomposition ・・・ 25-28
Kenichi Yadani, Koichi Kondo and Masashi Iwasaki

A knapsack public-key cryptosystem with cyclic code over GF(2) ・・・ 29-32
Yasuyuki Murakami and Takeshi Nasako

New apportionment methods and their quota property ・・・ 33-36
Tetsuo Ichimori

Numerical solution to shape optimization problems for non-stationary Navier-Stokes problems ・・・ 37-40
Yutaro Iwata, Hideyuki Azegami, Taiki Aoyama and Eiji Katamine

A block sparse approximate inverse with cutoff preconditioner for semi-sparse linear systems derived from Molecular Orbital calculations ・・・ 41-44
Ikuro Yamazaki, Masayuki Okada, Hiroto Tadano, Tetsuya Sakurai and Keita Teranishi

Finite element computation for scattering problems of micro-hologram using DtN map ・・・ 45-48
Yosuke Mizuyama, Takamasa Shinde, Masahisa Tabata and Daisuke Tagami

Discontinuous Galerkin FEM of hybrid type ・・・ 49-52
Issei Oikawa and Fumio Kikuchi

A circular and radial slit mapping of unbounded multiply connected domains ・・・ 53-56
Kaname Amano and Dai Okano

A comparative study of principal component analysis on term structure of interest rates ・・・ 57-60
Nien-Lin Liu

Excluded volume effect in queueing theory ・・・ 61-64
Daichi Yanagisawa, Akiyasu Tomoeda, Rui Jiang and Katsuhiro Nishinari

Modeling of contagious downgrades and its application to multi-downgrade protection ・・・ 65-68
Hidetoshi Nakagawa

Differential qd algorithm for totally nonnegative Hessenberg matrices: introduction of origin shifts and relationship with the discrete hungry Lotka-Volterra system ・・・ 69-72
Yusaku Yamamoto and Takeshi Fukaya

Solutions of Sakaki-Kakei equations of type 3, 5 and 6 ・・・ 73-76
Koichi Kondo

A strategy of reducing the inner iteration counts for the variable preconditioned GCR(m) method ・・・ 77-80
Kensuke Aihara, Emiko Ishiwata and Kuniyoshi Abe

On a knapsack based cryptosystem using real quadratic and cubic fields ・・・ 81-84
Keiichiro Nishimoto and Ken Nakamula

Cryptanalysis of the birational permutation signature scheme over a non-commutative ring ・・・ 85-88
Naoki Ogura and Uchiyama Shigenori

Erratum to “Cryptanalysis of the birational permutation signature scheme over a non-commutative ring” [JSIAM Letters, 2 (2010), 85-88] ・・・ 89
Naoki Ogura and Shigenori Uchiyama

Proposal and efficient implementation of multiple division divide-and-conquer algorithm for SVD ・・・ 91-94
Yutaka Kuwajima, Youichiro Shimizu and Takaomi Shigehara

Box-ball systems related to the nonautonomous ultradiscrete Toda equation on the finite lattice ・・・ 95-98
Kazuki Maeda and Satoshi Tsujimoto

Hybridized discontinuous Galerkin method with lifting operator ・・・ 99-102
Issei Oikawa

Testing whether the Nikkei225 best bid/ask price path follows the first order discrete Markov chain - an approach in terms of the total “ρ-variation” - ・・・ 103-106
Meng Li and Kazuo Kishimoto

Numerical identification of nonhyperbolicity of the Lorenz system through Lyapunov vectors ・・・ 107-110
Yoshitaka Saiki and Miki U. Kobayashi

Mean breakdown points for compressed sensing by uniformly distributed matrices ・・・ 111-114
Ryuichi Ashino and Rémi Vaillancourt

A quadrature-based eigensolver with a Krylov subspace method for shifted linear systems for Hermitian eigenproblems in lattice QCD ・・・ 115-118
Hiroshi Ohno, Yoshinobu Kuramashi, Tetsuya Sakurai and Hiroto Tadano

Algorithm for computing Jordan basis ・・・ 119-122
Kenji Kudo, Yoshiaki Kakinuma, Kazuyuki Hiraoka, Hiroki Hashiguchi, Yutaka Kuwajima and Takaomi Shigehara

On the pass rate of NIST statistical test suite for randomness ・・・ 123-126
Akihiro Yamaguchi, Takaaki Seo and Keisuke Yoshikawa

Parallel stochastic estimation method of eigenvalue distribution ・・・ 127-130
Yasunori Futamura, Hiroto Tadano and Tetsuya Sakurai

Error-controlling algorithm for simultaneous block-diagonalization and its application to independent component analysis ・・・ 131-134
Takanori Maehara and Kazuo Murota

JSIAM Letters Vol.2 (2010) pp.1–4 ©2010 Japan Society for Industrial and Applied Mathematics

Shape optimization problem of elastic bodies for controlling contact pressure

Takahiro Iwai¹, Akinobu Sugimoto¹, Taiki Aoyama¹ and Hideyuki Azegami¹

¹ Graduate School of Information Science, Nagoya University, A4-2(780) Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

E-mail [email protected]

Received September 30, 2009, Accepted December 18, 2009

Abstract

The present paper describes a numerical solution to shape optimization problems of contacting elastic bodies for controlling contact pressure. The contacting elastic problem is formulated as the minimization of potential energy with a constraint for penetration based on the large deformation theory. The contact pressure is defined as a Lagrange multiplier for the constraint of penetration in the minimization problem. An error norm of the contact pressure to a desired distribution is chosen as an objective functional. The shape derivative of the functional is theoretically evaluated. Numerical solutions are constructed by the traction method.

Keywords calculus of variations, shape optimization, elastic contact, shape derivative, traction method

Research Activity Group Mathematical Design

1. Introduction

The problem of determining the deformation and contact pressure in contacting elastic bodies appears in the design of products in which the role of contact is critical, such as tires, shoes, and implants. In order to improve the contact pressure and efficiency in the design process, the development of a numerical solution to shape optimization problems of contacting elastic bodies is necessary.

The elastic contact problem can be formulated as a boundary-value problem involving a geometrically nonlinear elastic equation with a constraint inequality for penetration [1]. An algorithm of the finite element method that passes the patch test for the contact problem was developed by Chen and Hisada [2].

Shape optimization problems for domains in which boundary value problems of partial differential equations are defined have been investigated extensively. General theories on shape derivatives are described in a number of studies [3–7]. A reshaping algorithm for shape optimization problems having a smoothing operation to compensate for a lack of regularity in the shape derivatives was previously presented by the authors [8,9]. This algorithm is referred to as the traction method because domain variations are obtained by solving the boundary value problem of an elliptic partial differential equation, such as the elastic problem, using the shape derivative for the Neumann condition [8], which is the traction condition in the elastic problem, or the Robin condition [9], which is the traction condition with a distributed spring. The mathematical considerations for the traction method are described in [10]. Another algorithm for the moving boundary using the Laplace operator on the boundary was proposed by Mohammadi and Pironneau [11].

Therefore, if a method by which to evaluate the shape derivatives for shape optimization problems could be determined, the shape optimization problems could be solved. In the present paper, we construct a shape optimization problem for controlling contact pressure as a minimization problem of the error norm between the contact pressure and a desired distribution. The goals of the present paper are to demonstrate how to evaluate the shape derivative of the error norm theoretically and to present the results obtained in a numerical example.

2. Elastic problem including contact

Throughout this paper, $D$ denotes a fixed bounded domain (connected open set) in $\mathbb{R}^d$, $d = 2, 3$, as shown in Fig. 1. For $s = 1, 2, 3$, $r > 0$ and $M > 0$, we introduce a class of sub-domains of $D$, which is called the admissible set of domains $\mathcal{W}^{s,\infty}(r,M) = \mathcal{W}^{s,\infty}(D,r,M)$, in the following definition.

Definition 1 (Admissible set of domains) A sub-domain $\Omega$ of $D$, such that $\Omega \subseteq D$, belongs to $\mathcal{W}^{s,\infty}(r,M)$ if and only if the following conditions are satisfied. $\Omega$ is composed of disjoint subdomains $\Omega_A$, $\Omega_B$, and $\Omega_C$ such that $\Omega_A \cap \Omega_B = \Omega_B \cap \Omega_C = \Omega_C \cap \Omega_A = \emptyset$ and $\Omega = \Omega_A \cup \Omega_B \cup \Omega_C$. The boundaries $\partial\Omega_A$, $\partial\Omega_B$, and $\partial\Omega_C$ are of class $W^{s,\infty}$ in the sense of [3]. That is, a finite number of open balls of radius $r$ covers the boundary, and, in each open ball, the boundary is identified with a level set of a $W^{s,\infty}$ function defined on the open ball. Moreover, all of the $W^{s,\infty}$ norms of these functions are bounded from above by $M$. With $\partial\Omega_A$ and $\partial\Omega_C$, we associate $\Gamma_p \subset \partial\Omega_A$, $\Gamma_c \subset \partial\Omega_A \setminus \Gamma_p \cup \Gamma_{AB}$, and $\Gamma_0 \subset \partial\Omega_C \setminus \Gamma_{BC}$, where $\Gamma_{AB} = \partial\Omega_A \cap \partial\Omega_B$ and $\Gamma_{BC} = \partial\Omega_B \cap \partial\Omega_C$. Finally, suppose that $\operatorname{meas}\Gamma_{BC}, \operatorname{meas}\Gamma_0 > 0$.

Fig. 1. Contacting elastic bodies and distance vector $d(x, \Omega_{Bt})$.

Let $p : \Gamma_p \times (0,T) \to \mathbb{R}^d$ be the nonzero traction. By $p$, the elastic bodies are translated statically by $x(X,t) : \Omega \times (0,T) \ni (X,t) \mapsto x \in \mathbb{R}^d$ and are deformed by $u = x - X$ under $u = 0$ on $\Gamma_0$. For $t \in (0,T)$, let $\Omega_t = \{X + u \mid \forall X \in \Omega\}$. In $\Omega_t$, penetration $g(u)$ can be defined as

$$g(u) = -d(x,\Omega_{Bt}) \cdot \nu_{At}(x) \le 0 \quad \text{on } \Gamma_c, \tag{1}$$

where $d(x,\Omega_{Bt}) \in \mathbb{R}^d$ denotes the distance vector from $x$ to $\Omega_{Bt}$, and $\nu_{At}$ denotes the normal on $\Omega_{At}$ (Fig. 1).

Let us assume that the material is a Saint-Venant material, such that the second Piola-Kirchhoff stress $S(u) \in \mathbb{R}^{d\times d}$ is related to the Green-Lagrange strain $E(u) \in \mathbb{R}^{d\times d}$ by

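$$S(u) = C_m E(u),$$

where, using the notation $F(u) = \nabla x = (\partial x_i/\partial X_j)_{ij}$, we have

$$E(u) = \frac{1}{2}\left(F(u)F^{\mathrm{T}}(u) - I\right) = E_L(u) + \frac{1}{2}E_{BL}(u,u),$$

$$E_L(u) = \frac{1}{2}\left(F(u) + F^{\mathrm{T}}(u)\right),$$

$$E_{BL}(u,v) = \frac{1}{2}\left(F(u)F^{\mathrm{T}}(v) + F(u)F^{\mathrm{T}}(v)\right),$$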

and $C_m \in L^\infty(D; \mathbb{R}^{d\times d\times d\times d})$, $m \in \{A, B, C\}$, is the stiffness having ellipticity, that is, there exists $\alpha_m > 0$ such that $\xi \cdot C_m \xi \ge \alpha_m |\xi|^2$ for all $\xi \in \{\xi \in \mathbb{R}^{d\times d} \mid \xi = \xi^{\mathrm{T}}\}$.

Since Saint-Venant materials have potential energy, the elastic problem including contact is given by the equilibrium equation at $t = T$. Hereinafter, let $p$ and $u$ denote the $p$ and $u$ at $t = T$. The equilibrium equation at $t = T$ is originally given by the Cauchy stress. Using $S(u)$ and $E(u)$, we can convert the equilibrium equation to the weak form of the equilibrium equation written in the total Lagrange description by multiplying the equilibrium equation by a variational displacement $v$ and integrating over $\Omega$.

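Problem 2 (Elastic problem including contact) Let $\Omega \in \mathcal{W}^{s,\infty}(r,M)$, and let $\lambda$ be the Lagrange multiplier, having the meaning of the contact pressure, for the constraint of penetration $g$ in (1). Then, find $(u, \lambda) \in H^1(\Omega; \mathbb{R}^{d+1})$ with respect to $p \in H^1(D; \mathbb{R}^d)$ such that

$$\int_\Omega S(u) \cdot \delta E(u,v)\, dX = \int_{\Gamma_p} p \cdot v\, d\gamma + \int_{\Gamma_c} \left(\lambda g_u(u) \cdot v + \mu g(u)\right) d\gamma + \int_{\Gamma_0} \left(u \cdot \delta S(u,v)\nu + v \cdot S(u)\nu\right) d\gamma,$$

$$g(u) \le 0, \quad g(v) \le 0, \quad \lambda \ge 0, \quad \mu \ge 0 \quad \text{on } \Gamma_c$$

for all $(v, \mu) \in H^1(\Omega; \mathbb{R}^{d+1})$, where $\nu$ denotes the normal, $g_u = \partial g/\partial u$, $\delta F(v) = (\partial v_i/\partial X_j)_{ij}$, and

$$\delta E(u,v) = \frac{1}{2}\left(\delta F^{\mathrm{T}}(v)F(u) + F^{\mathrm{T}}(u)\delta F(v)\right) = E_L(v) + E_{BL}(u,v),$$

$$\delta S(u,v) = C_m \delta E(u,v).$$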

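Problem 2 can be converted into velocity representation as follows.

Problem 3 (Velocity form of Problem 2) Let $(u, \lambda)$ be the solution of Problem 2. Find $(\dot{u}, \dot{\lambda}) \in H^1(\Omega; \mathbb{R}^{d+1})$ with respect to $\dot{p} \in H^1(D; \mathbb{R}^d)$ such that

$$\int_\Omega \left(\dot{S}(u) \cdot \delta E(u,v) + S(u) \cdot \delta\dot{E}(u,v)\right) dX = \int_{\Gamma_p} \dot{p} \cdot v\, d\gamma + \int_{\Gamma_c} \left(\dot{\lambda} g_u(u) \cdot v + \mu g_u(u) \cdot \dot{u}\right) d\gamma + \int_{\Gamma_0} \left(\dot{u} \cdot \delta S(u,v)\nu + v \cdot \dot{S}(u)\nu\right) d\gamma$$

for all $(v, \mu) \in H^1(\Omega; \mathbb{R}^{d+1})$, where $\dot{(\,\cdot\,)} = \partial(\,\cdot\,)/\partial t$, and

$$\dot{S}(u) = C_m \dot{E}(u), \quad \dot{E}(u) = E_L(\dot{u}) + E_{BL}(u,\dot{u}), \tag{2}$$

$$\delta\dot{E}(u,v) = E_L(\dot{v}) + E_{BL}(\dot{u},v) + E_{BL}(u,\dot{v}) = E_{BL}(\dot{u},v). \tag{3}$$

Since Problem 3 is formulated in bilinear form for $(\dot{u}, \dot{\lambda})$ and $(v, \mu)$, its Galerkin approximation is readily considered. Then, we can apply the Newton-Raphson method to solve Problem 2 using the Galerkin approximation.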

3. Shape optimization problem

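As described in the introduction, let us define cost functionals and a shape optimization problem.

Definition 4 (Cost functionals $J^0$ and $J^1$) Let $(u, \lambda)$ be a solution to Problem 2 for $\Omega \in \mathcal{W}^{s,\infty}(r,M)$. Let $J^0$ be the functional for the error norm between the contact pressure $\lambda$ and $\alpha\lambda_0$ using a fixed element $\lambda_0 \in W^{2,\infty}(D; \mathbb{R})$ having a shape of distribution of desired contact pressure and a variable $\alpha \in \mathbb{R}$ controlling the magnitude, and let $J^1$ be the functional for a domain measure constraint:

$$J^0(\Omega, \lambda, \alpha) = \int_{\Gamma_c} |\lambda - \alpha\lambda_0|^2\, d\gamma,$$

$$J^1(\Omega) = m_0 - \int_{\Omega_B} dX,$$

where $m_0 > 0$ is a constant such that $J^1(\Omega_0) \le 0$ for some $\Omega_0 \in \mathcal{W}^{s,\infty}(r,M)$.

Problem 5 (Shape optimization) Let $(u, \lambda)$ be a solution to Problem 2 for $\Omega \in \mathcal{W}^{s,\infty}(r,M)$ with respect to fixed $p \in W^{2,\infty}(D; \mathbb{R}^d)$. Find $\Omega$, for $J^0$ and $J^1$ as in Definition 4 with $\lambda_0 \in W^{2,\infty}(D; \mathbb{R})$, such that

$$\min_{\Omega \in \mathcal{W}^{s,\infty}(r,M),\ \alpha \in \mathbb{R}} \left\{ J^0(\Omega, \lambda, \alpha) \mid J^1(\Omega) \le 0 \right\}.$$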

Since, as a result of the correspondence with the characteristic functions for domains, $\mathcal{W}^{s,\infty}(r,M)$ is compact with respect to the $L^2(D)$ topology [3], we can approach a local solution by constructing a series of domains from some $\Omega_0$ such that $J^1(\Omega_0) \le 0$ by looking for descent domain variations under $J^1(\Omega) \le 0$. Therefore, we define the following set of domain variations.

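Definition 6 (Domain variations) Let

$$\mathcal{U}^{s,\infty} = \left\{ \rho \in W_0^{s,\infty}(D; \mathbb{R}^d) \mid \|\rho\| \le 1 \right\}$$

be a set of domain variations, and let the new domain $\Omega_{\epsilon\rho}$ from $\Omega \in \mathcal{W}^{s,\infty}(r,M)$ be constructed with domain variation $\rho \in \mathcal{U}^{s,\infty}$ and a small constant $\epsilon > 0$ as

$$\Omega_{\epsilon\rho} = \{ x + \epsilon\rho \mid \forall x \in \Omega \}.$$

Note 7 (Domain variations) To guarantee that $\Omega_{\epsilon\rho} \in \mathcal{W}^{s,\infty}(r,M)$, we need more constraints using a formulation similar to the contact problem. For the sake of simplicity, in the present paper, we assume that the $W^{s,\infty}$ norm for $\partial\Omega$ is sufficiently smaller than $M$ and that $\epsilon$ is sufficiently small such that $\Omega_{\epsilon\rho} \in \mathcal{W}^{s,\infty}(r,M)$.

To determine $\rho$, let us construct the following problem.

Problem 8 (Optimum domain variation) Let $(u, \lambda)$ and $(u_{\epsilon\rho}, \lambda_{\epsilon\rho})$ be solutions to Problem 2 for $\Omega$ and $\Omega_{\epsilon\rho} \in \mathcal{W}^{s,\infty}(r,M)$ with a small fixed constant $\epsilon > 0$ and $\rho \in \mathcal{U}^{s,\infty}$ by the fixed $p \in W^{2,\infty}(D; \mathbb{R}^d)$. With $J^0$ and $J^1$ as in Definition 4 with $\lambda_0 \in W^{2,\infty}(D; \mathbb{R})$, find $\rho$ such that

$$\min_{\rho \in \mathcal{U}^{s,\infty},\ \alpha \in \mathbb{R}} \left\{ J^0(\Omega_{\epsilon\rho}, \lambda_{\epsilon\rho}, \alpha) \mid J^1(\Omega_{\epsilon\rho}) \le 0 \right\}.$$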

4. Solution to Problem 8

Next, we evaluate the shape derivatives in the same manner as [12] and present the solution.

4.1 Shape derivatives

We define the shape derivative of $J^l$, $l = 0, 1$, as follows.

Definition 9 (Shape derivatives) For $J^0$ in Problem 8, we define the shape derivative of $J^0$ with respect to $\rho \in \mathcal{U}^{s,\infty}$ by

$$J^{0\prime}(\Omega, \lambda, \alpha)(\rho) = \lim_{\epsilon \to +0} \frac{J^0(\Omega_{\epsilon\rho}, \lambda_{\epsilon\rho}, \alpha) - J^0(\Omega, \lambda, \alpha)}{\epsilon}.$$

$J^{1\prime}(\Omega)(\rho)$ is also defined in the same manner.

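In order to evaluate $J^{0\prime}$, we introduce the Lagrangian $\mathscr{L}^0(\Omega, u, \lambda, v^0, \mu^0, \alpha)$ for the minimization problem of $J^0$ subject to Problem 2, using the Lagrange multipliers $(v^0, \mu^0) \in H^1(\Omega; \mathbb{R}^{d+1})$ for Problem 2, as

$$\mathscr{L}^0 = J^0(\Omega, \lambda, \alpha) - \int_\Omega S(u) \cdot \delta E(u,v^0)\, dX + \int_{\Gamma_p} p \cdot v^0\, d\gamma + \int_{\Gamma_c} \left(\lambda g_u(u) \cdot v^0 + \mu^0 g(u)\right) d\gamma + \int_{\Gamma_0} \left(u \cdot \delta S(u,v^0)\nu + v^0 \cdot S(u)\nu\right) d\gamma.$$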

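The stationary condition of $\mathscr{L}^0$ can be determined as follows. If $\Omega$ is a local minimum point in Problem 8 and $(u, \lambda)$ is the solution of Problem 2, we have $\alpha \in \mathbb{R}$ from $\partial J^0/\partial\alpha = 0$ as

$$\alpha = \int_{\Gamma_c} \lambda\lambda_0\, d\gamma \bigg/ \int_{\Gamma_c} \lambda_0^2\, d\gamma. \tag{4}$$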
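As a quick numerical check of the stationarity condition $\partial J^0/\partial\alpha = 0$, the following minimal sketch discretizes $\Gamma_c$ by quadrature nodes with weights `w` (an assumption made only for illustration). For the discrete functional $\sum_i w_i(\lambda_i - \alpha\lambda_{0,i})^2$, setting the derivative with respect to $\alpha$ to zero gives $\alpha = \sum_i w_i\lambda_i\lambda_{0,i} / \sum_i w_i\lambda_{0,i}^2$. The function names and the sample values in `lam`, `lam0` and `w` are hypothetical and are not results from the paper.

```python
def error_norm(lam, lam0, w, alpha):
    """Discrete J^0: sum_i w_i * (lam_i - alpha * lam0_i)**2."""
    return sum(wi * (li - alpha * l0) ** 2 for li, l0, wi in zip(lam, lam0, w))


def optimal_alpha(lam, lam0, w):
    """Stationary point of the discrete J^0 with respect to alpha."""
    num = sum(wi * li * l0 for li, l0, wi in zip(lam, lam0, w))
    den = sum(wi * l0 * l0 for l0, wi in zip(lam0, w))
    return num / den


# Hypothetical contact-pressure samples on four quadrature nodes of Gamma_c.
lam = [0.8, 1.1, 1.3, 0.9]     # made-up contact pressure values
lam0 = [1.0, 1.0, 1.0, 1.0]    # desired distribution shape, lambda_0 = 1
w = [0.25, 0.25, 0.25, 0.25]   # quadrature weights (assumed uniform)

alpha = optimal_alpha(lam, lam0, w)
```

With $\lambda_0 = 1$, as in the numerical example of Section 5, `alpha` reduces to the weighted mean of the sampled contact pressure, so $J^0$ measures the deviation of the pressure from its own mean level.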

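Moreover, if $(\acute{u}, \acute{\lambda}) \in H^1(\Omega; \mathbb{R}^{d+1})$ denotes the arbitrary variations of $(u, \lambda)$ at a fixed $\Omega$, we have

$$\mathscr{L}^{0\prime}(\Omega, u, \lambda, v^0, \alpha)(\acute{u}, \acute{\lambda}) = J^{0\prime}(\Omega, \lambda, \alpha)(\acute{\lambda}) - \int_\Omega \acute{S}(u) \cdot \delta E(u,v^0)\, dX - \int_\Omega S(u) \cdot \delta\acute{E}(u,v^0)\, dX + \int_{\Gamma_c} \left(\acute{\lambda} g_u(u) \cdot v^0 + \mu^0 g_u(u) \cdot \acute{u}\right) d\gamma + \int_{\Gamma_0} \left(\acute{u} \cdot \delta S(u,v^0)\nu + v^0 \cdot \acute{S}(u)\nu\right) d\gamma = 0, \tag{5}$$

where $\acute{(\,\cdot\,)}$ are defined by replacing $\dot{(\,\cdot\,)}$ with $\acute{(\,\cdot\,)}$ in (2) and (3).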

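If we set the adjoint problem for $J^0$ as follows, this solution satisfies (5).

Problem 10 (Adjoint problem for $J^0$) Let $(u, \lambda)$ be the solution of Problem 2. Find $(v^0, \mu^0) \in H^1(\Omega; \mathbb{R}^{d+1})$ such that

$$\int_\Omega \acute{S}(u) \cdot \delta E(u,v^0)\, dX + \int_\Omega S(u) \cdot \delta\acute{E}(u,v^0)\, dX = \int_{\Gamma_c} 2(\lambda - \alpha\lambda_0)\acute{\lambda}\, d\gamma + \int_{\Gamma_c} \left(\acute{\lambda} g_u(u) \cdot v^0 + \mu^0 g_u(u) \cdot \acute{u}\right) d\gamma + \int_{\Gamma_0} \left(\acute{u} \cdot \delta S(u,v^0)\nu + v^0 \cdot \acute{S}(u)\nu\right) d\gamma$$

for all $(\acute{u}, \acute{\lambda}) \in H^1(\Omega; \mathbb{R}^{d+1})$.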

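Comparing Problem 10 with Problem 3 reveals that $(v^0, \mu^0)$ is computed with the coefficient matrix, which is constructed by the Galerkin method for Problem 3 and is transposed, and with the force term of $2(\lambda - \alpha\lambda_0)$ at the adjoint position to $\lambda$.

In addition, let $\rho \in \mathcal{U}^{s,\infty}$ be the arbitrary variation of $\Omega$ at a fixed $(u, \lambda, v^0, \mu^0)$, which we can extend to $H^1(D; \mathbb{R}^{2(d+1)})$ [3]. Then, based on Lemmas 3 and 4 in [12], we have

$$\mathscr{L}^{0\prime}(\Omega, u, \lambda, v^0, \alpha)(\rho) = \int_{\Gamma_c} G^0_J \nu \cdot \rho\, d\gamma + \int_{\partial\Omega} G^0_a \nu \cdot \rho\, d\gamma + \int_{\Gamma_p} G^0_p \nu \cdot \rho\, d\gamma + \int_{\Gamma_c} G^0_c \nu \cdot \rho\, d\gamma + \int_{\Gamma_0} G^0_0 \nu \cdot \rho\, d\gamma,$$

where $\nabla_\nu = \nabla \cdot \nu$, $\kappa = \Delta\nu$, and

$$G^0_J = \nabla_\nu |\lambda - \alpha\lambda_0|^2 + \kappa |\lambda - \alpha\lambda_0|^2, \tag{6}$$

$$G^0_a = -S(u) \cdot \delta E(u,v^0) = \begin{cases} -S(u) \cdot \delta E(u,v^0) & \text{on } \partial\Omega \setminus \Gamma_0, \\ -E_L(u)\nu \cdot \sigma(v^0)\nu & \text{on } \Gamma_0, \end{cases} \tag{7}$$

$$G^0_p = \nabla_\nu (p \cdot v^0) + \kappa\, p \cdot v^0, \tag{8}$$

$$G^0_c = (\nabla_\nu + \kappa)\left(\lambda g_u(u) \cdot v^0 + \mu^0 g(u)\right), \tag{9}$$

$$G^0_0 = 2E_L(u)\nu \cdot \sigma(v^0)\nu.$$

Fig. 2. Finite element model.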

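Therefore, we have the following result.

Theorem 11 (Shape derivative for $J^0$) Let $\Omega \in \mathcal{W}^{2,\infty}(r,M)$ and assume that $\Gamma_c \cup \Gamma_p$ is of class $W^{3,\infty}$. Suppose that $(u, \lambda)$ is the solution of Problem 2 with respect to $p \in W^{2,\infty}(D; \mathbb{R}^d)$ and $C_m \in W^{2,\infty}(D; \mathbb{R}^{d\times d\times d\times d})$, and that $(v^0, \mu^0)$ is the solution of Problem 10. Recall that $\alpha$ is defined by (4). Then, the shape derivative $J^{0\prime}(\Omega, \lambda, \alpha)(\rho)$ is given as

$$J^{0\prime}(\Omega, \lambda, \alpha)(\rho) = \int_{\partial\Omega} G^0 \nu \cdot \rho\, d\gamma = \int_{\Gamma_c} (G^0_J + G^0_a + G^0_c)\nu \cdot \rho\, d\gamma + \int_{\Gamma_p} (G^0_a + G^0_p)\nu \cdot \rho\, d\gamma - \int_{\Gamma_0} G^0_a \nu \cdot \rho\, d\gamma + \int_{\partial\Omega \setminus \Gamma_c \cup \Gamma_p \cup \Gamma_0} G^0_a \nu \cdot \rho\, d\gamma,$$

where $G^0_J$, $G^0_a$, $G^0_p$, and $G^0_c$ are defined as (6) through (9). Furthermore, the shape gradient $G^0\nu$ belongs to $W^{1,\infty}(D; \mathbb{R}^d)$.

The shape gradient for $J^1$ is obtained as $G^1\nu = \nu$.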

4.2 Solution

Using the shape gradients above, we can solve Problem 8 by means of the algorithm using the sequential quadratic programming method [13], in which the traction method is used to obtain the descent domain variations for $J^0$ and $J^1$.

5. Numerical example

Following [2], we have developed a program based on an algorithm used in the finite-element method with eight-node hexahedral elements. Figure 2 shows a quarter of the target model in which $\Omega_A$ is a half ellipsoid with axes of 500 mm, 250 mm, and 500 mm, and $\Omega_B$ and $\Omega_C$ are rectangular bodies of 500 mm × 50 mm × 500 mm and 500 mm × 100 mm × 500 mm, respectively. We used Young's moduli of $5 \times 10^9$, $5 \times 10^5$, and $5 \times 10^6$ Pa for $\Omega_A$, $\Omega_B$, and $\Omega_C$, respectively, and a Poisson's ratio of 0.3 for $\Omega$. We assumed that $p$ is a nodal force of $|p| = 1$ kN and that $\Gamma_{AB}$ is the point of contact in the initial state. We also assumed $u_2 = 0$ and symmetry conditions on $\Gamma_0$. Finally, we assumed that only $\Gamma_{BC}$ is variable, that $\Gamma_c$ is a quadratic curved surface of approximately 350 mm × 350 mm, and that $\lambda_0 = 1$.

Figure 3 shows that the distribution of contact pressure became uniform. Furthermore, Fig. 4 shows that $J^0$ decayed monotonically and then converged, whereas $J^1$ remained unchanged.

Fig. 3. Shapes of $\Omega_B$ and $\Omega_C$ and contact pressures on $\partial\Omega_B$ (panels: initial shape, optimized shape).

Fig. 4. Iteration history with respect to reshaping (horizontal axis: iteration number of reshaping; vertical axis: rates to initial values; curves: error norm of pressure $J^0$, domain measure of $\Omega_B$ $J^1$).

Acknowledgments

This research was supported by JSPS KAKENHI (20540113).

References

[1] P. Wriggers, Computational Contact Mechanics, 2nd ed., Springer-Verlag, Heidelberg, 2006.

[2] X. Chen and T. Hisada, Development of a finite element contact analysis algorithm to pass the patch test, JSME Int. J. Ser. A, 49 (2006), 483–491.

[3] D. Chenais, On the existence of a solution in a domain identification problem, J. Math. Anal. Appl., 52 (1975), 189–219.

[4] J. Simon, Differentiation with respect to the domain in boundary value problems, Numer. Funct. Anal. Opt., 2 (1980), 649–687.

[5] O. Pironneau, Optimal Shape Design for Elliptic Systems, Springer-Verlag, New York, 1984.

[6] J. Sokolowski and J.-P. Zolesio, Introduction to Shape Optimization: Shape Sensitivity Analysis, Springer-Verlag, New York, 1992.

[7] J. Haslinger and R. A. E. Mäkinen, Introduction to Shape Optimization: Theory, Approximation, and Computation, SIAM, Philadelphia, 2003.

[8] H. Azegami, A solution to domain optimization problems (in Japanese), Trans. JSME Ser. A, 60 (1994), 1479–1486.

[9] H. Azegami and K. Takeuchi, A smoothing method for shape optimization: traction method using the Robin condition, Int. J. Comput. Meth., 3 (2006), 21–33.

[10] S. Kaizu and H. Azegami, Optimal shape problems and traction method (in Japanese), Trans. JSIAM, 16 (2006), 277–290.

[11] B. Mohammadi and O. Pironneau, Applied Shape Optimization for Fluids, Oxford Univ. Press, Oxford, 2001.

[12] G. Allaire, F. Jouve and A.-M. Toader, Structural optimization using sensitivity analysis and a level-set method, J. Comput. Phys., 194 (2004), 363–393.

[13] H. Azegami, Solution to boundary shape optimization problems, in: High Performance Structures and Materials II, C. A. Brebbia and W. P. de Wilde eds., pp. 589–598, WIT Press, Southampton, 2004.


An ID-based key sharing scheme based on discrete logarithm problem over a product of three primes

Yasuyuki Murakami¹ and Masao Kasahara²

¹ Department of Telecommunications and Computer Networks, Faculty of Information and Communication Engineering, Osaka Electro-Communication University, 18-8, Hatsu-cho, Neyagawa-shi, Osaka 572-8530, Japan

² Faculty of Informatics, Osaka Gakuin University, 2-36-1, Kishibe minami, Suita-shi, Osaka 564-8511, Japan


Received November 2, 2009, Accepted December 30, 2009

Abstract

In 1990, the present authors proposed the first ID-based non-interactive key sharing scheme (ID-NIKS) based on the discrete logarithm problem (DLP) over a composite number n. With the rapid progress of computer systems over the last two decades, ID-NIKS based on DLP over n would have more chance to be applied practically. However, there existed no ID-NIKS based on DLP over n that is secure against the square-root attack when n is a product of three prime numbers. In this paper, we propose an ID-NIKS based on DLP over a product of three prime numbers which can circumvent the square-root attack.

Keywords ID-based cryptosystem, non-interactive key sharing, discrete logarithm problem, factoring problem

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

The discrete logarithm problem (DLP) has been extensively studied and successfully applied to various cryptographic technologies such as the Diffie-Hellman public key distribution scheme [1].

In the conventional DLP, a prime number is usually used for the modulus. However, DLP can be considered in a more general setting where the modulus is a composite number, although in such a case the discrete logarithm does not necessarily exist. Hereinafter we shall denote DLP over a composite number n by DLP(n).

In Sept. 1990, the present authors first discussed DLP over a composite number and presented an ID-based non-interactive key sharing scheme (ID-NIKS) referred to as MK1 [2]. In Dec. 1990, they presented an improved version of MK1, referred to as MK2 [3]. In 1991, Maurer and Yacobi presented a scheme referred to as MY [4], which is similar to our scheme, MK1. Maurer and Yacobi proposed improved versions of their scheme later [5, 6]. All of these schemes can be regarded as generalized versions of the Diffie-Hellman key sharing scheme using ID as a public key. Unfortunately, these schemes except MK2 cannot circumvent the square-root attack, as was discussed in [7]. In MK2, a product of two prime numbers is used as the modulus n.

With the rapid progress of computer systems over the last two decades, ID-NIKS based on DLP over a composite modulus would have more chance to be applied practically. In SCIS2005, Abe, Kunihiro and Ohta discussed the practical parameters of an ID-based key sharing scheme using DLP over a composite modulus n [8]. They suggested that the modulus n should be a product of three prime numbers in MY for a practical realization. Their suggestion is very interesting. Unfortunately, however, it could not be realized at that time, because there existed no ID-NIKS using DLP(n) that is secure against the square-root attack for the case where n is a product of three prime numbers. From the practical viewpoint, it is very important to construct a scheme that is secure against the square-root attack when using a product of three prime numbers as the modulus n.

In this paper, we shall first discuss in detail the discrete logarithm problem over n in the case where n is a product of three prime numbers. We give the conditions that are required for designing ID-NIKS over DLP(n). We also show that, for an arbitrary element e such that (e/n) = 1, either e or −e has the discrete logarithm over n under the proposed conditions. We then present a new ID-NIKS based on DLP(n) over a product of three prime numbers which can circumvent the square-root attack.

2. Preliminaries

2.1 Definitions

Several definitions are given first.

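Definition 1 Additive group $\mathbb{Z}_n$, and multiplicative groups $\mathbb{Z}_n^*$ and $\mathbb{Z}_n^\sharp$ are defined as follows:

$$\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\},$$

$$\mathbb{Z}_n^* = \{x \mid x \in \mathbb{Z}_n,\ \gcd(x,n) = 1\},$$

$$\mathbb{Z}_n^\sharp = \left\{x \,\middle|\, x \in \mathbb{Z}_n^*,\ \left(\frac{x}{n}\right) = 1\right\},$$

where $(x/n)$ denotes the Jacobi symbol.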

JSIAM Letters Vol. 2 (2010) pp.5–8 Yasuyuki Murakami and Masao Kasahara

Definition 2 The cyclic multiplicative group generated by $g \in \mathbb{Z}_n^*$ is denoted by $\langle g\rangle_n$. That is, the cyclic multiplicative group $\langle g\rangle_n$ for an arbitrary element $g \in \mathbb{Z}_n^*$ is represented as follows:

$$\langle g\rangle_n = \{y \in \mathbb{Z}_n^* \mid y \equiv g^x \pmod{n},\ x \in \mathbb{Z}\},$$

where $\mathbb{Z}$ denotes the integer set.

Definition 3 The maximum generator etc. are defined as follows:

$\varphi(n)$: Euler function, i.e., the order of $\mathbb{Z}_n^*$,

$\mathrm{ord}_n(a)$: the minimum positive integer $e$, which is called the order, such that $a^e \equiv 1 \pmod{n}$ for an integer $a$,

$\lambda(n)$: Carmichael function of $n$, i.e., $\max\{\mathrm{ord}_n(a) \mid a \in \mathbb{Z}_n^*\}$,

Maximum generator: elements of order $\lambda(n)$ in $\mathbb{Z}_n^*$,

$S_n$: the set of maximum generators in $\mathbb{Z}_n^*$.

2.2 DLP over composite number n

The problem of determining $x$ such that $y \equiv g^x$ from the given $y$ and $g$ is called the discrete logarithm problem. In this problem, in general, a prime number is used as the modulus. However, it is possible to consider a more general discrete logarithm problem using a composite number as the modulus.

As is well known, the multiplicative group $\mathbb{Z}_n^*$ is a cyclic multiplicative group only when $n$ is 2, 4, a power of an odd prime number, or twice a power of an odd prime number. The primitive element exists only in those cases. When a composite number is used as the modulus, the maximum generator is used instead of the primitive element.

Let us consider the following relation,

$$y \equiv g^x \pmod{n}, \tag{1}$$

for $g \in S_n$. In general, for any $x \in \mathbb{Z}_{\lambda(n)}$ there exists $y \in \mathbb{Z}_n^*$ satisfying (1). However, it is not always true that, for any $y \in \mathbb{Z}_n^*$, there exists $x \in \mathbb{Z}_{\lambda(n)}$ satisfying (1). We shall refer to the problem of determining $x$ from given $y$ and $g$ over $n$ as DLP(n).
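The definitions of Section 2.1 and the remark that a discrete logarithm need not exist over a composite modulus can be checked on a toy modulus. The sketch below is illustrative only: it uses brute force, which is feasible only for very small n, and the function names are hypothetical. The modulus 15 is chosen for readability and is a product of two primes, not three.

```python
from math import gcd


def units(n):
    """Elements of Z*_n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def order(a, n):
    """ord_n(a): smallest e > 0 with a^e = 1 (mod n); a must be coprime to n."""
    e, y = 1, a % n
    while y != 1:
        y = y * a % n
        e += 1
    return e


def carmichael(n):
    """lambda(n), computed directly as the maximum order in Z*_n."""
    return max(order(a, n) for a in units(n))


def maximum_generators(n):
    """S_n: the elements of order lambda(n)."""
    lam = carmichael(n)
    return [a for a in units(n) if order(a, n) == lam]


def discrete_log(g, y, n):
    """Brute-force x with g^x = y (mod n), or None if y is not in <g>_n."""
    value = 1
    for x in range(order(g, n)):
        if value == y % n:
            return x
        value = value * g % n
    return None


print(carmichael(15))            # 4
print(maximum_generators(15))    # [2, 7, 8, 13]
print(discrete_log(2, 8, 15))    # 3
print(discrete_log(2, 7, 15))    # None: 7 is a unit but lies outside <2>_15
```

Here $\langle 2\rangle_{15} = \{1, 2, 4, 8\}$ has only $\lambda(15) = 4$ of the $\varphi(15) = 8$ units, so half of $\mathbb{Z}_{15}^*$ has no discrete logarithm to the base 2.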

2.3 Square-root attack

If DLP(n) can be solved with the base $g$ in polynomial time, then the factoring problem of $n$ can be solved in expected polynomial time [9]. Indeed, any attacker who is able to compute the discrete logarithm $x$ of an arbitrary element $e \in \mathbb{Z}_n^*$ can find a factor of $n$ with the following algorithm:

Square-Root Attack

Step 1: Choose $e'$ randomly from $\mathbb{Z}_n^*$.

Step 2: Let $e \equiv e'^2 \pmod{n}$.

Step 3: Compute the discrete logarithm $x = \log_g e$ determined from given $e$ and $g$ over $n$. If $e$ does not have a discrete logarithm, then go to Step 1.

Step 4: If $g^{x/2} \equiv \pm e' \pmod{n}$, then go to Step 1.

Step 5: Factors of $n$ can be obtained as $\gcd(g^{x/2} \pm e', n)$.
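The steps above can be sketched in Python on a toy modulus. This is a minimal illustration, not the authors' implementation: the DLP(n) oracle is replaced by a brute-force search (`dlog_oracle`, a hypothetical stand-in), the modulus $7 \cdot 11 \cdot 23$ is an arbitrary small example, and the check that $x$ is even is an added guard so that $g^{x/2}$ is well defined.

```python
import random
from math import gcd


def order_of(g, n):
    """Multiplicative order of g modulo n (g must be coprime to n)."""
    e, y = 1, g % n
    while y != 1:
        y = y * g % n
        e += 1
    return e


def dlog_oracle(g, e, n):
    """Brute-force stand-in for a DLP(n) oracle: x with g^x = e (mod n), or None."""
    y = 1
    for x in range(order_of(g, n)):
        if y == e:
            return x
        y = y * g % n
    return None


def square_root_attack(n, g, rng, max_tries=1000):
    """Return a nontrivial factor of n, or None if every try fails."""
    for _ in range(max_tries):
        e1 = rng.randrange(2, n)            # Step 1: candidate e'
        if gcd(e1, n) != 1:
            continue                        # keep e' inside Z*_n
        e = e1 * e1 % n                     # Step 2: e = e'^2 mod n
        x = dlog_oracle(g, e, n)            # Step 3: ask the oracle
        if x is None or x % 2 == 1:
            continue                        # no logarithm, or x/2 undefined
        h = pow(g, x // 2, n)               # another square root of e
        if h == e1 or h == n - e1:
            continue                        # Step 4: h = +-e' gives nothing
        return gcd(h - e1, n)               # Step 5: h^2 = e'^2 but h != +-e'
    return None


n = 7 * 11 * 23
unit_list = [a for a in range(2, n) if gcd(a, n) == 1]
g = max(unit_list, key=lambda a: order_of(a, n))   # a maximum generator
factor = square_root_attack(n, g, random.Random(0))
```

Step 5 works because $h^2 \equiv e'^2 \pmod{n}$ with $h \not\equiv \pm e'$ means that $n$ divides $(h - e')(h + e')$ without dividing either factor, so $\gcd(h - e', n)$ is a proper divisor of $n$.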

It was shown that the scheme using DLP(n) with a powered element in $\mathbb{Z}_n^*$ is not secure against the square-root attack [7].

When applying DLP(n) to an ID-based key sharing scheme, it should be noted that the trusted center (TC) can be used as an oracle for solving DLP(n). Namely, an attacker presents his/her forged ID to the TC to obtain the discrete logarithm of the wanted value. This means that the use of a one-way hash function is essential for being secure against the square-root attack.

3. DLP over product of 3 primes

Here, we discuss DLP(n) when $n$ is written as $n = pqr$. Further, we assume that the factors of $n$ satisfy the following condition.

Condition 4 Odd prime numbers $p$, $q$ and $r$ satisfy the following relations:

$$p = 2p' + 1, \quad q = 2q' + 1, \quad r = 2r' + 1,$$

where $\gcd(p', q') = \gcd(q', r') = \gcd(r', p') = 1$.

It should be noted that $p'$, $q'$ and $r'$ are not necessarily required to be prime numbers.

We also assume that the following conditions are satisfied.

Condition 5 The modulus $n$ satisfies the following relation:

$$\left(\frac{-1}{n}\right) = 1. \tag{2}$$

Condition 6 The maximum generator $g \in \mathbb{Z}_n^*$ satisfies the following relations:

$$-1 \notin \langle g\rangle_n, \tag{3}$$

$$\left(\frac{g}{n}\right) = 1. \tag{4}$$

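The following lemmas on the Legendre symbol are well-known, where $p$ is a prime number.

Lemma 7 For an integer $a$, it follows that

$$\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod{p}.$$

Lemma 8

$$\left(\frac{-1}{p}\right) = (-1)^{\frac{p-1}{2}} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod{4}, \\ -1 & \text{if } p \equiv 3 \pmod{4}. \end{cases}$$

Lemma 9

$$\left(\frac{2}{p}\right) = (-1)^{\frac{p^2-1}{8}} = \begin{cases} 1 & \text{if } p \equiv 1, 7 \pmod{8}, \\ -1 & \text{if } p \equiv 3, 5 \pmod{8}. \end{cases}$$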

Relating to Conditions 5 and 6, the following lemmas are important.

Lemma 10 Let $p$, $q$ and $r$ be odd prime numbers that satisfy Condition 4. The necessary and sufficient condition for the composite number $n = pqr$ to satisfy Condition 5 is the following:

$$(p, q, r) \equiv (1, 3, 3),\ (3, 1, 3),\ (3, 3, 1) \pmod{4}. \tag{5}$$

Proof From (2), it follows that:

$$\left(\left(\frac{-1}{p}\right), \left(\frac{-1}{q}\right), \left(\frac{-1}{r}\right)\right) = (1,-1,-1),\ (-1,1,-1),\ (-1,-1,1),\ (1,1,1).$$

Only the last value $(1, 1, 1)$ does not satisfy Condition 4, because it implies $4 \mid p-1$, $4 \mid q-1$ and $4 \mid r-1$, so that $p'$, $q'$ and $r'$ are all even and cannot be pairwise coprime. Consequently, (5) is obtained from Lemma 8. The converse is straightforward.

(QED)

Lemma 11 Let $g \in Z_n^{*}$ be a maximum generator. The necessary and sufficient condition for g to satisfy the relation (g/n) = 1 is the following:

$$\left(\left(\frac{g}{p}\right), \left(\frac{g}{q}\right), \left(\frac{g}{r}\right)\right) = (1,-1,-1),\ (-1,1,-1),\ (-1,-1,1).$$

Proof From (4), it follows that:

$$\left(\left(\frac{g}{p}\right), \left(\frac{g}{q}\right), \left(\frac{g}{r}\right)\right) = (1,-1,-1),\ (-1,1,-1),\ (-1,-1,1),\ (1,1,1).$$

However, it is impossible that g is a maximum generator for the last value (1, 1, 1) for the following reason. When (g/p) = 1 holds, it holds that $\mathrm{ord}_p(g) \mid (p-1)/2$, because $g^{(p-1)/2} \equiv 1 \pmod p$ holds from Lemma 7. Similarly, $\mathrm{ord}_q(g) \mid (q-1)/2$ and $\mathrm{ord}_r(g) \mid (r-1)/2$ also hold. Thus, it follows that $\mathrm{ord}_n(g) = \mathrm{lcm}(\mathrm{ord}_p(g), \mathrm{ord}_q(g), \mathrm{ord}_r(g)) \mid \varphi(n)/8 < \lambda(n)$. This concludes the proof. The converse is straightforward.

(QED)

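Corollary 12 Letting $g_p \in S_p$, $g_q \in S_q$ and $g_r \in S_r$, if an element $g \in Z_n^{*}$ satisfies one of the following congruences, then g is the maximum generator that satisfies the relation (g/n) = 1:

$$g \equiv \begin{cases} g_p^2 \pmod p, \\ g_q \pmod q, \\ g_r \pmod r, \end{cases} \qquad g \equiv \begin{cases} g_p \pmod p, \\ g_q^2 \pmod q, \\ g_r \pmod r, \end{cases} \qquad g \equiv \begin{cases} g_p \pmod p, \\ g_q \pmod q, \\ g_r^2 \pmod r. \end{cases}$$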

Lemma 13 If Conditions 4 to 6 are satisfied, then $Z_n^{\sharp}$ can be decomposed into residue classes of $\langle g \rangle_n$ with 1, −1 as coset leaders (see Table 1).

Proof $Z_n^{\sharp}$ is a multiplicative group of order $\varphi(n)/2$. From Condition 6, $\langle g \rangle_n$ forms a subgroup of $Z_n^{\sharp}$. Consequently, $Z_n^{\sharp}$ can be decomposed into residue classes of $\langle g \rangle_n$. Since Conditions 5 and 6 are satisfied, 1 and −1 can be used as coset leaders. The order of $Z_n^{\sharp}$ is $\varphi(n)/2$. Since $\lambda(n) = \varphi(n)/4$ from Condition 4, it follows that $2|\langle g \rangle_n| = |Z_n^{\sharp}|$. Then all the elements are exhausted. (QED)

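Table 1. Residue class decomposition of $Z_n^{\sharp}$ (n = pqr).

coset                     elements
$\langle g \rangle_n$     $1,\ g,\ g^2,\ \ldots,\ g^{\lambda(n)-1}$
$-\langle g \rangle_n$    $-1,\ -g,\ -g^2,\ \ldots,\ -g^{\lambda(n)-1}$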

We show a small example of the residue class decomposition of $Z_n^{\sharp}$ in Table 2.

The following theorem can be derived from Lemma 13.

Theorem 14 If Conditions 4 to 6 are satisfied, then for $e \in Z_n^{\sharp}$, either e or −e has a discrete logarithm over n with g as the base.
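The decomposition behind Theorem 14 can be checked numerically for the small parameters n = 3 · 5 · 11 and g = 112. The sketch below assumes that $Z_n^{\sharp}$ denotes the units with Jacobi symbol +1, which is how the symbol is read here; the `jacobi` helper is a standard binary algorithm written for this illustration and is not taken from the paper.

```python
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n; returns 0 when gcd(a, n) > 1."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

n, g, lam = 3 * 5 * 11, 112, 20      # lam = lambda(n) = lcm(2, 4, 10)

# Assumed reading of Z_n^sharp: units of Z_n with Jacobi symbol +1.
z_sharp = {a for a in range(1, n) if gcd(a, n) == 1 and jacobi(a, n) == 1}
coset_pos = [pow(g, i, n) for i in range(lam)]   # <g>_n
coset_neg = [(-v) % n for v in coset_pos]        # -<g>_n

print(len(z_sharp), len(set(coset_pos)), len(set(coset_neg)))
print(set(coset_pos) | set(coset_neg) == z_sharp)
print(set(coset_pos).isdisjoint(coset_neg))
```

Both cosets have λ(n) = 20 elements, they are disjoint, and together they exhaust the 40 elements with Jacobi symbol +1, so every such e is either a power of g or the negative of one.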

4. Proposed scheme

We shall propose here a new ID-NIKS using DLP(n), where n is a product of three prime numbers.

Let us denote the identity information of User k as $ID_k$. Let $e_k \in Z_n^{*}$ be the public key of User k, which corresponds to $ID_k$, and $s_k$ the secret key of User k. We assume that TC can solve the DLP over each prime factor of n. TC can then compute the discrete logarithm of $e_k$ over n from the discrete logarithms over all the prime factors of n with the Chinese remainder theorem. Let $K_{AB}$ denote the shared key between Users A and B.

4.1 Preparation of TC

TC generates a composite modulus n = pqr and a maximum generator g so that they satisfy Conditions 4 to 6. TC publicizes α satisfying (α/n) = −1. TC also publicizes a one-way hash function h(·) which maps bit-strings of arbitrary finite length to elements in $Z_n^{*}$, so that anyone can compute $e_k$ from $ID_k$.

4.2 Registration of User

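From Theorem 14, one and only one of $e_k$, $-e_k$, $\alpha e_k$ and $-\alpha e_k$ has the discrete logarithm over n = pqr for any α such that (α/n) = −1. TC computes the secret key $s_k$ of User k as the discrete logarithm over n as follows:

$$e_k = h(ID_k),$$

$$e'_k = \begin{cases} e_k & \text{if } \left(\frac{e_k}{n}\right) = 1, \\ \alpha e_k & \text{if } \left(\frac{e_k}{n}\right) = -1, \end{cases}$$

$$s_k \equiv \begin{cases} \log_g e'_k \pmod{\lambda(n)} & \text{if } e'_k \in \langle g \rangle_n, \\ \log_g (-e'_k) \pmod{\lambda(n)} & \text{if } e'_k \notin \langle g \rangle_n. \end{cases}$$

TC sends $s_k$ to User k over a secure channel. It should be noted that $s_k$ can be computed with the Chinese remainder theorem from the discrete logarithms of $e'_k$ over p, q and r.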

4.3 Non-interactive key sharing

User A can generate the shared key KAB as follows:

eB = h(IDB),



Table 2. Small example of residue class decomposition of $Z_n^{\sharp}$ when n = 3 · 5 · 11 (g = 112).

i       0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
g^i     1 112   4 118  16 142  64  73  91 127  34  13 136  52  49  43  31   7 124  28
−g^i  164  53 161  47 149  23 101  92  74  38 131 152  29 113 116 122 134 158  41 137

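$$e'_B = \begin{cases} e_B & \text{if } \left(\frac{e_B}{n}\right) = 1, \\ \alpha e_B & \text{if } \left(\frac{e_B}{n}\right) = -1, \end{cases}$$

$$K_{AB} \equiv {e'_B}^{2 s_A} \equiv g^{2 s_A s_B} \pmod n.$$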
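The registration and key-sharing formulas can be exercised end to end on the toy parameters of Table 2. This is only an illustrative sketch: the hash-to-group mapping `hash_to_unit` (SHA-256 with a counter) is a hypothetical choice, the brute-force logarithm table stands in for TC's ability to solve the DLP, α = 2 is used because (2/165) = −1, and a 165-element modulus is of course not secure.

```python
import hashlib
from math import gcd

N_MOD, G, ALPHA, LAMBDA = 3 * 5 * 11, 112, 2, 20   # toy parameters, not secure

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

# Stand-in for TC's discrete-logarithm capability (brute-force table).
DLOG = {pow(G, i, N_MOD): i for i in range(LAMBDA)}

def hash_to_unit(identity):
    """Hypothetical h(.): map an ID string to an element of Z_n^*."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{identity}:{counter}".encode()).digest()
        e = int.from_bytes(digest, "big") % N_MOD
        if gcd(e, N_MOD) == 1:
            return e
        counter += 1

def adjusted_public_key(identity):
    """e'_k: e_k itself if (e_k/n) = 1, otherwise alpha * e_k."""
    e = hash_to_unit(identity)
    return e if jacobi(e, N_MOD) == 1 else ALPHA * e % N_MOD

def secret_key(identity):
    """s_k computed by TC: log_g e'_k if it exists, otherwise log_g(-e'_k)."""
    e_adj = adjusted_public_key(identity)
    return DLOG[e_adj] if e_adj in DLOG else DLOG[(-e_adj) % N_MOD]

def shared_key(own_secret, peer_identity):
    """K = e'_peer^(2 * s_own) mod n, computed without interaction."""
    return pow(adjusted_public_key(peer_identity), 2 * own_secret, N_MOD)

s_a, s_b = secret_key("alice"), secret_key("bob")
print(shared_key(s_a, "bob"), shared_key(s_b, "alice"))
```

Squaring removes the possible sign in $e'_B \equiv \pm g^{s_B}$, which is why both users arrive at $g^{2 s_A s_B}$.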

4.4 Theorems for particular cases

In the following theorems, prime numbers (p, q, r) ≡ (3, 3, 1) (mod 4) are assumed to be used in the proposed scheme without loss of generality.

The maximum generator g is assumed to satisfy the following congruences:

$$g \equiv \begin{cases} g_p^2 \pmod p, \\ g_q \pmod q, \\ g_r \pmod r, \end{cases}$$

where $g_p \in S_p$, $g_q \in S_q$ and $g_r \in S_r$.

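Theorem 15 Let $e \in Z_n^{\sharp}$. Then we have $e \in \langle g \rangle_n$ if and only if (e/p) = 1.

Proof From Theorem 14, e belongs to either $\langle g \rangle_n$ or $-\langle g \rangle_n$. If $e \in \langle g \rangle_n$, e can be uniquely represented as $e \equiv g^i \pmod n$ where $i \in Z_{\lambda(n)}$. Thus,

$$\left(\frac{e}{p}\right) = \left(\frac{g^i}{p}\right) = \left(\frac{g_p^i}{p}\right)^2 = 1.$$

If $e \notin \langle g \rangle_n$, e can be uniquely represented as $e \equiv -g^i \pmod n$ where $i \in Z_{\lambda(n)}$. Thus,

$$\left(\frac{e}{p}\right) = \left(\frac{-g^i}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{g^i}{p}\right) = -1.$$

Consequently, if (e/p) = 1 then $e \in \langle g \rangle_n$, which is the converse.

(QED)

From Theorem 15, it is evident that TC can determine whether $e'_k \in \langle g \rangle_n$ or not without computing the discrete logarithm of $e'_k$.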

Theorem 16 If p, q and r satisfy one of the following congruences, the relation (2/n) = −1 holds.

$$(p, q, r) \equiv (3, 3, 5),\ (7, 7, 5),\ (7, 3, 1) \pmod 8.$$

Proof From Lemma 9,

$$\left(\frac{2}{n}\right) = \left(\frac{2}{p}\right)\left(\frac{2}{q}\right)\left(\frac{2}{r}\right) = -1.$$

(QED)

Theorem 16 yields the condition under which α = 2 can be used.

5. Conclusions

We have discussed in detail the discrete logarithm problem over n where n is a product of three prime numbers. We have shown the theorem that either e or −e has the discrete logarithm over n for an arbitrary element e such that (e/n) = 1. We have then proposed a new ID-NIKS using the discrete logarithm problem over a product of three prime numbers based on this theorem. It should be noted that the proposed scheme can circumvent the square-root attack.

References

[1] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Trans. Inform. Theory, 22 (1976), 644–654.

[2] Y. Murakami and M. Kasahara, An ID-based key distribution system (in Japanese), IEICE Tech. Rep. ISEC, 90 (1990), 29–36.

[3] Y. Murakami and M. Kasahara, The discrete logarithm problem under a composite modulus (in Japanese), IEICE Tech. Rep. ISEC, 90 (1990), 33–40.

[4] U. M. Maurer and Y. Yacobi, Non-interactive public key cryptography, Advances in Cryptology – EUROCRYPT'91, Lecture Notes in Computer Science, Springer-Verlag, Vol. 547, pp. 498–507, 1991.

[5] U. M. Maurer and Y. Yacobi, A remark on non-interactive public-key distribution system, Advances in Cryptology – EUROCRYPT'92, Lecture Notes in Computer Science, Springer-Verlag, Vol. 658, pp. 458–460, 1992.

[6] U. M. Maurer and Y. Yacobi, A non-interactive public-key distribution system, Designs, Codes and Cryptography, 9 (1996), 305–316.

[7] Y. Murakami and M. Kasahara, Murakami-Kasahara ID-based key sharing scheme revisited, – in comparison with Maurer-Yacobi scheme –, IEICE Tech. Rep. ISEC, 105 (2005), 9–16.

[8] W. Abe, N. Kunihiro and K. Ohta, Maurer-Yacobi ID-based encryption scheme revisited (in Japanese), Proc. of the 2005 Symposium on Cryptography and Information Security (2005), 2011–2016.

[9] A. J. Menezes, P. C. Oorschot and S. A. Vanstone, Handbook of applied cryptography, USA, CRC Press, 1996.



Inner angle of triangle on unit circle made of

consecutive three points generated by chaotic map

Ryo Takahashi1,2, Etsushi Nameda1,3 and Ken Umeno1,4

Next Generation Mobile Communications Laboratory, Center for Intellectual Property Strategies, RIKEN, 2-1, Hirosawa, Wako-shi, Saitama 351-0198, Japan1

Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, 2-1, Hirosawa, Wako-shi, Saitama 351-0198, Japan2

Flucto-Order Functions Research Team, Advanced Science Institute, RIKEN, Hanyang University, Fusion Technology Center 5F, 17 Haendang-dong, Seongdong-gu, Seoul 133-791, Korea3

National Institute of Information and Communications Technology, 4-2-1 Nukui-Kitamachi, Koganei-shi, Tokyo 184-8795, Japan4



Abstract

The inner angle of a triangle made of three consecutive points on the unit circle is investigated. We use two methods to plot points. One is plotting random numbers whose distribution is uniform. The other is plotting consecutive numbers obtained by a map which generates complex chaotic sequences with constant power. We focus on the inner angle at the middle point of the three consecutive points. The angle for the chaotic sequence is found to be different from the one for the uniform random sequence.

Keywords solvable chaos, Chebyshev map, Bernoulli map

Research Activity Group Applied Chaos

1. Introduction

Several solvable chaos models have been proposed and investigated [1–3]. Using these models, many properties of chaos have been newly found. These solvable models also enable us to investigate the performance of applications using chaotic properties [4–6]. However, it is still difficult to judge whether observed data are chaotic or purely random. This problem still attracts the interest of many researchers.

On the other hand, many chaotic sequences have been proposed for use as spreading sequences in Code Division Multiple Access (CDMA), which is one of the radio data transmission techniques [5–9]. Many researchers who have investigated chaos focused on how to use these chaotic properties in telecommunication systems. Recently, a complex chaotic spreading sequence with constant power was proposed for use in CDMA [9] and its performance was investigated [6]. Each of the real and imaginary parts is a chaotic sequence and the invariant measure is already known. In addition, one of the unique features of this complex chaotic sequence is that it realizes constant power. It means that, in the complex plane, the data of the complex chaotic sequence lie on the unit circle.

In this paper, we investigate differences in properties between chaotic number sequences and random number sequences. For this aim, we focus on an inner angle of the triangle made of three consecutive points on the unit circle. In particular, we focus on the inner angle at the middle point of the three consecutive points. Here, we use two methods to plot points. One is plotting points on the unit circle randomly and uniformly. The other is plotting points obtained by a map which generates the complex chaotic sequence with constant power mentioned above. This is one of the solvable chaotic models. The details of this chaotic sequence are given below. We compare the angle for chaotic points with the one for uniform random points.

2. Complex chaotic sequence with constant power

Complex chaotic sequence with constant power can be obtained by the Chebyshev polynomial [6, 9].

First, we consider the real part of this sequence. The Chebyshev chaotic sequence $X_j$ is well known as follows:

$$X_{p,j+1} = T_p(X_{p,j}), \quad p \ge 2. \qquad (1)$$

Here, $T_p(x)$ is the p-th order Chebyshev polynomial defined by

$$T_p(\cos\phi) = \cos(p\phi) \qquad (2)$$

and j is time or position in this sequence. It is known that this Chebyshev map is ergodic and it has an ergodic invariant measure given by

$$\rho(x)\,dx = \frac{d\phi}{2\pi} = \frac{dx}{\pi\sqrt{1 - x^2}}. \qquad (3)$$

It satisfies the orthogonal relation

$$\int_{-1}^{1} T_i(x)\,T_j(x)\,\rho(x)\,dx = \delta_{i,j}\,\frac{1 + \delta_{i,0}}{2}, \qquad (4)$$


JSIAM Letters Vol. 2 (2010) pp.9–12 Ryo Takahashi et al.

Fig. 1. An example of complex chaotic spreading sequence with constant power, plotted in the complex plane (axes: Re, Im). Lines are drawn between the consecutive data.

where $\delta_{i,j}$ is the Kronecker delta function. A Lyapunov exponent of the sequence generated by $T_p(x)$ is given by log p.

Next, we consider the imaginary part sequence which realizes a constant power of the complex chaotic sequences with the above real part. The imaginary part is obtained by the following map:

$$Y_{p,j+1} = T'_p(X_{p,j}, Y_{p,j}), \qquad (5)$$

where $T'_p(x, y)$ is defined by

$$T'_p(\cos\phi, \sin\phi) = \sin(p\phi), \qquad (6)$$

and p and φ are the same as those of the real part. Finally, the complex chaotic sequence $Z_{p,j}$ is defined as

$$Z_{p,j+1} = F_p(Z_{p,j}), \quad Z_{p,j} = X_{p,j} + iY_{p,j}. \qquad (7)$$

Here, the sequence $X_{p,j}$ generated by the Chebyshev map is set as the real part and $Y_{p,j}$ as the imaginary part. The signal power of this complex sequence is given as

$$|Z_{p,j}|^2 = X_{p,j}^2 + Y_{p,j}^2 = \cos^2(p\phi) + \sin^2(p\phi) = 1. \qquad (8)$$

This means that these complex sequences have a constant signal power, namely unity. The imaginary part $Y_{p,j}$ has the same invariant measure as $X_{p,j}$. Eventually, this complex chaotic sequence $Z_{p,j}$ can also be described simply as

$$F(e^{i\phi}) = e^{ip\phi}, \qquad (9)$$

whose invariant measure is given by the following uniform measure:

$$\rho(z)\,dz = \frac{d\phi}{2\pi}. \qquad (10)$$

Fig. 1 shows one example of this sequence on the complex plane, whose order is p = 6 and length is 100. It is found that each of the data in this sequence is on the unit circle. It means that this sequence realizes the constant power.
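The map (9) is easy to iterate numerically. The following is a minimal sketch, assuming an arbitrary initial angle of 0.3 rad (not a value from the paper): it iterates the angle rather than the complex number so that rounding errors cannot change the modulus, and it checks the constant power (8) and one step of the Chebyshev recursion (1). In double precision the computed orbit departs from the exact one after a few tens of iterations, as expected for a chaotic map, but every point stays on the unit circle.

```python
import cmath
import math

def chaotic_sequence(phi0, p, length):
    """Return Z_j = exp(i * phi_j) with phi_{j+1} = p * phi_j (mod 2*pi)."""
    points = []
    phi = phi0
    for _ in range(length):
        points.append(cmath.exp(1j * phi))
        phi = (p * phi) % (2.0 * math.pi)
    return points

p = 6
seq = chaotic_sequence(phi0=0.3, p=p, length=100)   # phi0 is an arbitrary choice

# Constant power: |Z_j|^2 should equal 1 for every j.
max_power_error = max(abs(abs(z) ** 2 - 1.0) for z in seq)

# The real part follows X_{j+1} = T_p(X_j) = cos(p * arccos(X_j)).
chebyshev_step_error = abs(seq[1].real - math.cos(p * math.acos(seq[0].real)))

print(max_power_error, chebyshev_step_error)
```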

Fig. 2. An inner angle which we focus on here is obtained as in this figure. Focusing on the three consecutive points $Z_{i-1}$, $Z_i$ and $Z_{i+1}$, the objective inner angle in this step is α. The inner angle α′ in the next step is obtained from the next consecutive points, namely $Z_i$, $Z_{i+1}$ and $Z_{i+2}$.

3. Inner angle of triangle made of consecutive three points on unit circle

The inner angle which we focus on here is defined as follows. We plot points consecutively on the unit circle under each of the two procedures mentioned in the above section. Then, we make a triangle by using three consecutive points, namely $Z_{i-1} Z_i Z_{i+1}$. The angle which we focus on here is the inner angle at the middle point of the three consecutive points, namely $\alpha = \angle Z_i$. Fig. 2 shows an example. We investigate the expected value of this angle.

First, we investigate the case where points on the unit circle are plotted randomly and uniformly. Here, we denote the angle between the segments $OZ_{i-1}$ and $OZ_{i+1}$ by θ. This angle is measured as positive in the counterclockwise direction from $OZ_{i-1}$ to $OZ_{i+1}$. By using the angle θ, the angle α can be represented as follows:

$$\alpha = \begin{cases} \dfrac{2\pi - \theta}{2}, & Z_{i-1} \to Z_i \to Z_{i+1}, \\[2mm] \dfrac{\theta}{2}, & Z_{i-1} \to Z_{i+1} \to Z_i, \end{cases} \qquad (11)$$

where these expressions for α hold in the cases where the points are in the order $Z_{i-1} \to Z_i \to Z_{i+1}$ and $Z_{i-1} \to Z_{i+1} \to Z_i$ counterclockwise, respectively. Using these relations, the expected value of α when the central angle is θ, namely $\langle\alpha\rangle_\theta$, is obtained as

$$\langle\alpha\rangle_\theta = \frac{2\pi - \theta}{2}\cdot\frac{\theta}{2\pi} + \frac{\theta}{2}\cdot\frac{2\pi - \theta}{2\pi} = \frac{1}{2\pi}\,(2\pi\theta - \theta^2). \qquad (12)$$

Finally, under the assumption that points are uniformly distributed on the unit circle, the expected value $\langle\alpha\rangle$ can be calculated as follows:

$$\langle\alpha\rangle = \frac{1}{2\pi}\int_0^{2\pi} \langle\alpha\rangle_\theta\,d\theta = \frac{\pi}{3}. \qquad (13)$$

Table 1. Case $Z_{p,i-1} \to Z_{p,i} \to Z_{p,i+1}$.

order of values             $\theta_p(x)$                  $\alpha_p(x)$
$x < f_p(x) < f_p^2(x)$     $2\pi\{f_p^2(x) - x\}$         $\pi\{1 + x - f_p^2(x)\}$
$f_p^2(x) < x < f_p(x)$     $2\pi\{f_p^2(x) + 1 - x\}$     $\pi\{x - f_p^2(x)\}$
$f_p(x) < f_p^2(x) < x$     $2\pi\{f_p^2(x) + 1 - x\}$     $\pi\{x - f_p^2(x)\}$

Table 2. Case $Z_{p,i-1} \to Z_{p,i+1} \to Z_{p,i}$.

order of values             $\theta_p(x)$                  $\alpha_p(x)$
$x < f_p^2(x) < f_p(x)$     $2\pi\{f_p^2(x) - x\}$         $\pi\{f_p^2(x) - x\}$
$f_p(x) < x < f_p^2(x)$     $2\pi\{f_p^2(x) - x\}$         $\pi\{f_p^2(x) - x\}$
$f_p^2(x) < f_p(x) < x$     $2\pi\{f_p^2(x) + 1 - x\}$     $\pi\{1 - x + f_p^2(x)\}$

On the other hand, we investigate this value in the case where points generated by the map, namely (9), are plotted. Instead of investigating properties in this chaotic map, we investigate the following Bernoulli map:

$$f_p(x) = px - \lfloor px \rfloor, \quad p \ge 2. \qquad (14)$$

Here, x is a real number in [0, 1), p is an integer, and ⌊·⌋ is the floor function. The quantity x in (14) and the angle φ in (9) satisfy the following relation: φ = 2πx. The points $Z_{p,i-1}$, $Z_{p,i}$ and $Z_{p,i+1}$ correspond to the real numbers x, $f_p(x)$ and $f_p^2(x) = f_p(f_p(x))$, respectively. The angles $\theta_p$ and $\alpha_p$ obtained by using these points x, $f_p(x)$ and $f_p^2(x)$ can be defined as functions of x. Similarly to (11), we can find the following relation:

$$\alpha_p(x) = \begin{cases} \dfrac{2\pi - \theta_p(x)}{2}, & Z_{p,i-1} \to Z_{p,i} \to Z_{p,i+1}, \\[2mm] \dfrac{\theta_p(x)}{2}, & Z_{p,i-1} \to Z_{p,i+1} \to Z_{p,i}. \end{cases} \qquad (15)$$

Considering the order of the points $Z_{p,i-1}$, $Z_{p,i}$ and $Z_{p,i+1}$ and the corresponding magnitudes of the real values x, $f_p(x)$ and $f_p^2(x)$, these cases can be classified into six cases. Tables 1 and 2 show the values of $\theta_p(x)$ and $\alpha_p(x)$ in these six cases. Using these relations, we can write out the dynamics of these three points and the dynamics of $\alpha_p$ while the point x moves from 0 to 1. Figs. 3-(a) and (b) show these dynamics for p = 3. As shown in Fig. 3-(a), the dynamics of the points x, $f_p(x)$ and $f_p^2(x)$ for arbitrary p can be represented as

$$y = x, \qquad (16)$$

$$y = px - j, \qquad (17)$$

$$y = p^2 x - i, \qquad (18)$$

respectively. Here j = 0, 1, ..., p − 1, i = 0, 1, ..., p² − 1 and 0 ≤ y ≤ 1. Since the point x is distributed uniformly in [0, 1), the expected $\alpha_p$ can be obtained by $\int_0^1 \alpha_p(x)\,dx$. It means that we can obtain that value as the sum of the areas under $\alpha_p(x)$ in Fig. 3-(b). For example, in the case p = 3 shown in Fig. 3, the expected value of $\alpha_p$ is given as

$$\langle\alpha_{p=3}\rangle = \frac{7}{18}\pi. \qquad (19)$$

Fig. 3. (a): dynamics of the consecutive three points x, $f_p(x)$ and $f_p^2(x)$ for p = 3. (b): dynamics of the inner angle obtained from the dynamics of the three points in (a).

From this result, it is found that the angle $\alpha_p$ in the case where three points are given by the map (14) for p = 3 is different from the one in the case where these points are set randomly. Focusing on these figures for arbitrary p, we can find several relations. From the figures which show the dynamics of the three points like Fig. 3-(a), it is found that there are points where the three points take the same value, namely triplet points. We can find that the permutation among these three points is reset at each triplet point. By using (16)–(18), it can be obtained that triplet points appear (p − 1) times except a starting point. Correspondingly, the dynamics of $\alpha_p$ is also found to be cyclic with period 1/(p − 1). When x is equal to an integral multiple of 1/(p − 1), the three points take the same value, as they should. It means that focusing on the dynamics of $\alpha_p$ in one period only is sufficient to investigate the whole. It is also found that focusing on the dynamics for the parameters j = 0, i = 0, 1, ..., p is sufficient. Furthermore, we focus on each of the “V”-shape areas in one period in the figure of the dynamics of $\alpha_p$. A dotted-line box in Fig. 3-(b) represents the part covered by one “V”-shape. This part is defined between $x'_i$ and $x'_{i+1}$, where the graphs of $f_p(x)$ and $f_p^2(x)$ cross for the i-th and (i + 1)-th time, respectively. The graphs of $f_p(x)$ and $f_p^2(x)$ cross p times except a starting point in one period. The quantity $x'_i$ is obtained as

$$x'_i = \frac{i}{p(p - 1)}. \qquad (20)$$

Using the tables and (18), the left-side and right-side lines in the “V”-shape can be obtained as

$$\frac{\alpha_p}{2\pi} = -\frac{1}{2}(p^2 - 1)x + \frac{i + 1}{2}, \qquad (21)$$

$$\frac{\alpha_p}{2\pi} = \frac{1}{2}(p^2 - 1)x - \frac{i + 1}{2}, \qquad (22)$$

respectively. A point where $\alpha_p$ is equal to zero in the “V”-shape appears when the graphs of x and $f_p^2(x)$ cross. This point $x''_i$ is obtained as

$$x''_i = \frac{i + 1}{p^2 - 1}. \qquad (23)$$

It is found that one point $x''_i$ appears inevitably between the points $x'_i$ and $x'_{i+1}$. This relation can be obtained as follows. We set lengths h and k as the distances $|x'_{i+1} - x'_i|$ and $|x''_i - x'_i|$, respectively. We can obtain the difference (h − k) as

$$h - k = \frac{i + 1}{p(p - 1)(p + 1)}. \qquad (24)$$

Since p ≥ 2, (h − k) is always positive. From the above results, it is found that the dynamics of $\alpha_p$ is represented by consecutive “V”-shapes as in Fig. 3-(b). The area of one “V”-shape, namely $S_i$, is given by

$$S_i = \frac{(p - i)^2 + (i + 1)^2}{4p^2(p^2 - 1)}. \qquad (25)$$

Finally, since p “V”-shapes are contained in one period and (p − 1) periods are included in [0, 1), the total area of $\alpha_p/2\pi$ in [0, 1), namely S, is obtained as

$$S = (p - 1)\sum_{i=0}^{p-1} S_i = \frac{1}{12}\left(2 + \frac{1}{p}\right). \qquad (26)$$
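The closed form in (26) follows from (25) by a short computation; the sketch below spells out the intermediate step, using only the standard formula for the sum of the first p squares.

```latex
\begin{align*}
\sum_{i=0}^{p-1}\bigl[(p-i)^2 + (i+1)^2\bigr]
  &= 2\sum_{m=1}^{p} m^2 = \frac{p(p+1)(2p+1)}{3},\\
S = (p-1)\sum_{i=0}^{p-1} S_i
  &= \frac{(p-1)\,p\,(p+1)(2p+1)}{3\cdot 4p^2(p^2-1)}
   = \frac{2p+1}{12p}
   = \frac{1}{12}\Bigl(2+\frac{1}{p}\Bigr).
\end{align*}
```

For p = 3 this gives S = 7/36, in agreement with (19) after multiplying by 2π.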

It means that $\langle\alpha_p\rangle$ is given by

$$\langle\alpha_p\rangle = 2\pi S = \frac{\pi}{6}\left(2 + \frac{1}{p}\right). \qquad (27)$$

From these results, it is found that this $\langle\alpha_p\rangle$ for the chaotic map (14) is obviously different from $\langle\alpha\rangle$ for the uniform random plotting, namely (13). In addition, it is also found that $\langle\alpha_p\rangle$ in the limit p → ∞ is equal to $\langle\alpha\rangle$ for the uniform random plotting.
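Both expected values can be checked numerically. The following sketch is an illustration rather than the authors' computation: positions on the circle are represented by x = φ/2π in [0, 1), the inner angle is evaluated with the two cases of (11), the Bernoulli-map average is approximated by a midpoint rule over x, and the uniform case by a seeded Monte Carlo estimate. The sample sizes and the seed are arbitrary choices.

```python
import math
import random

def inner_angle(a, b, c):
    """Inner angle at the middle point Z_i; a, b, c are the positions
    (angle / 2*pi, in [0, 1)) of Z_{i-1}, Z_i and Z_{i+1}."""
    theta = 2.0 * math.pi * ((c - a) % 1.0)   # counterclockwise from Z_{i-1} to Z_{i+1}
    if (b - a) % 1.0 < (c - a) % 1.0:         # order Z_{i-1} -> Z_i -> Z_{i+1}
        return (2.0 * math.pi - theta) / 2.0
    return theta / 2.0                        # order Z_{i-1} -> Z_{i+1} -> Z_i

def bernoulli(x, p):
    """Bernoulli map f_p(x) = p*x - floor(p*x)."""
    return (p * x) % 1.0

def mean_angle_bernoulli(p, samples=200_000):
    """Midpoint-rule estimate of the integral of alpha_p(x) over [0, 1)."""
    total = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples
        fx = bernoulli(x, p)
        total += inner_angle(x, fx, bernoulli(fx, p))
    return total / samples

def mean_angle_uniform(samples=200_000, seed=0):
    """Monte Carlo estimate for three independent uniform points."""
    rng = random.Random(seed)
    return sum(inner_angle(rng.random(), rng.random(), rng.random())
               for _ in range(samples)) / samples

for p in (2, 3, 6):
    print(p, mean_angle_bernoulli(p), math.pi / 6 * (2 + 1 / p))
print("uniform", mean_angle_uniform(), math.pi / 3)
```

The printed pairs compare the numerical estimate with the closed form (27) for each p, and the last line compares the Monte Carlo estimate with π/3 from (13).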

4. Conclusions and discussions

In the present paper, we have focused on an inner angle generated by three consecutive points on the unit circle. Here, we have used two methods to plot three points on the unit circle. One is plotting random numbers distributed uniformly on the unit circle. The other is plotting the points obtained by the map which can generate the complex chaotic sequence with constant power. Instead of using this chaotic map, the Bernoulli map can be used for this investigation. We have calculated the expected angles in both cases. In particular, we obtained the analytical expression of the angle in the case where the Bernoulli map for arbitrary p is used. We have found that the angle obtained from the chaotic map is different from the one from the uniform random numbers. In addition, using the analytical expression of the angle, we have also found that the angle for the chaotic map approaches asymptotically the one for uniform random numbers. The former angle in the limit p → ∞ is equal to the latter.

Previously, we investigated the performance of the above chaotic sequences with constant power as the spreading sequences in complex CDMA, which is one of the data transmission techniques [6]. In [6], the Signal-to-Interference Ratio (SIR), which is one of the values used to evaluate the performance, is calculated analytically. Then, we found that the SIR in the case where the chaotic sequences are applied is the same as the one where the uniform random numbers are applied. However, the results in the present paper show differences in properties between the chaotic number sequence and the random number sequence. These days, many methods are proposed for judging whether observed sequences are chaotic or purely random. However, the judgment by these methods alone is still difficult. The property of the inner angle investigated here has potential for this judgment. The frequency of appearance of the six cases shown in Tables 1 and 2 for the three consecutive points is considered essential for showing the differences. The judgment by this property together with other conventional methods will be investigated in future work.

Acknowledgments

This work was supported in part by the Japan Society for the Promotion of Science, Grant-in-Aid for Young Scientists (Start-up), 20800074, 2009.

References

[1] K. Umeno, Method of constructing exactly solvable chaos, Phys. Rev. E, 55 (1997), 5280–5284.

[2] J. A. Gonzalez and R. Pino, Chaotic and stochastic functions, Physica A, 276 (2000), 425–440.

[3] J. A. Gonzalez and L. Trujillo, Statistical independence of generalized chaotic sequences, J. Phys. Soc. Jpn., 75 (2006), 023002-1–023002-4.

[4] K. Umeno, Chaotic Monte Carlo computation: A dynamical effect of random-number generations, Jpn. J. Appl. Phys., 39 (2000), 1442–1456.

[5] K. Umeno and K. Kitayama, Improvement of SNR with chaotic spreading sequences for CDMA, Proc. of the 1999 IEEE Inform. Theory Workshop, p. 106, 1999.

[6] R. Takahashi and K. Umeno, Performance analysis of complex CDMA using complex chaotic spreading sequence with constant power, IEICE Trans. Fundamentals, E92-A (2009), 3394–3397.

[7] T. Kohda and A. Tsuneda, Pseudonoise sequences by chaotic nonlinear maps and their correlation properties, IEICE Trans. Commun., E76-B (1993), 855–862.

[8] G. Mazzini, G. Setti and R. Rovatti, Chaotic complex spreading sequences for asynchronous DS-CDMA. I: System modeling and results, IEEE Trans. Circuits and Systems I, 44 (1997), 937–947.

[9] K. Umeno, CDMA and OFDM communications systems based on 2D exactly solvable chaos (in Japanese), Proc. 55th Natl. Cong. of Theoretical and Applied Mechanics, pp. 191–192, 2006.



Error estimates with explicit constants for the tanh rule

and the DE formula for indefinite integrals

Tomoaki Okayama1, Takayasu Matsuo1 and Masaaki Sugihara1

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo, Tokyo 113-8656, Japan1

E-mail Tomoaki [email protected]

Received September 25, 2009, Accepted January 8, 2010

Abstract

The tanh rule and the double-exponential (DE) formula are known as efficient quadrature rules for definite integrals over a finite interval (a, b). In this note we consider a numerical method for indefinite integrals obtained by applying the tanh rule or the DE formula to the integration over the interval (a, x) for each x. For these methods the conventional error analyses yield error estimates depending on x, which are impractical. We here present error estimates that do not depend on x, and furthermore, with explicit constants.

Keywords error estimates, numerical indefinite integration, tanh rule, DE formula

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

The tanh rule and the double-exponential (DE) formula are quadrature rules for definite integrals $\int_a^b f(t)\,dt$ by means of the trapezoidal formula incorporated with a variable transformation $t = \psi_{a,b}(\tau)$. That is,

$$\int_a^b f(t)\,dt = \int_{-\infty}^{\infty} f(\psi_{a,b}(\tau))\,\psi'_{a,b}(\tau)\,d\tau \approx h\sum_{k=-N}^{N} f(\psi_{a,b}(kh))\,\psi'_{a,b}(kh),$$

where $\psi_{a,b} : \mathbb{R} \to (a, b)$. In the tanh rule and the DE formula, the SE transformation $\psi^{\mathrm{SE}}_{a,b}$ and the DE transformation $\psi^{\mathrm{DE}}_{a,b}$ are employed [1,2] respectively, which are defined by

$$\psi^{\mathrm{SE}}_{a,b}(\tau) = \frac{b - a}{2}\tanh\left(\frac{\tau}{2}\right) + \frac{b + a}{2},$$

$$\psi^{\mathrm{DE}}_{a,b}(\tau) = \frac{b - a}{2}\tanh\left(\frac{\pi}{2}\sinh(\tau)\right) + \frac{b + a}{2}.$$

It is known that the tanh rule and the DE formula are quite efficient quadrature rules, which has also been proved by theoretical error analyses [3–6].

These quadrature rules can also be applied to indefinite integrals $\int_a^x f(t)\,dt$, $x \in (a, b)$, by simply adjusting the variable transformation as follows:

$$\int_a^x f(t)\,dt = \int_{-\infty}^{\infty} f(\psi_{a,x}(\tau))\,\psi'_{a,x}(\tau)\,d\tau \approx h\sum_{k=-N}^{N} f(\psi_{a,x}(kh))\,\psi'_{a,x}(kh). \qquad (1)$$

This approach has been taken by several authors [7–9]. Although this approach works pretty well in practice (see the discussion below), there has been a difficulty in its error analysis; if we simply apply the error analyses mentioned above, the assumptions on the function f in the resulting theorems would depend on x, which varies continuously in [a, b] (compare Theorem 4 (or 5, respectively) and Theorem 6 (or 7)). In practice, it is nearly impossible to check whether the assumptions are fulfilled for all x or not.

There is another approximation formula for indefinite integrals, which is called the Sinc indefinite integration. In the formula, the Sinc approximation is utilized with a variable transformation as follows:

$$\int_a^x f(t)\,dt = \int_{-\infty}^{\psi^{-1}_{a,b}(x)} f(\psi_{a,b}(\tau))\,\psi'_{a,b}(\tau)\,d\tau \approx \sum_{k=-N}^{N} f(\psi_{a,b}(kh))\,\psi'_{a,b}(kh)\,J(k, h)(\psi^{-1}_{a,b}(x)),$$

where J(k, h)(x) is defined with the aid of the sine integral function $\mathrm{Si}(x) = \int_0^x [\sin(\sigma)/\sigma]\,d\sigma$ as

$$J(k, h)(x) = h\left\{\frac{1}{2} + \frac{1}{\pi}\,\mathrm{Si}\left[\pi\left(\frac{x}{h} - k\right)\right]\right\}.$$

There are also two types of variable transformations, i.e. $\psi^{\mathrm{SE}}_{a,b}$ and $\psi^{\mathrm{DE}}_{a,b}$ [10,11]. We call these formulas the SE-Sinc indefinite integration and the DE-Sinc indefinite integration, respectively.

The main advantage of the Sinc indefinite integration is that the variable transformation is fixed: $\psi_{a,b}(\tau)$, whereas $\psi_{a,x}(\tau)$ depends on x in the case of the tanh rule or the DE formula. As a consequence, it is possible to give an error analysis that does not depend on x in the case of the Sinc indefinite integration. It has, however, its own drawbacks: it is indispensable to compute a special function (the sine integral), and it has been numerically observed that the actual convergence rates of the SE/DE-Sinc indefinite integration are much lower than those of the tanh rule/DE formula (see Fig. 1).


JSIAM Letters Vol. 2 (2010) pp.13–16 Tomoaki Okayama et al.

In this note we will give new error analyses for the tanh rule and the DE formula such that the assumptions do not depend on x, which explain the numerical observation. The key here is the result of the present authors [12], which enables us to replace the assumptions depending on x with those which do not depend on x. Furthermore, we give explicit forms of all constants appearing in the error analyses, based on the estimates by the present authors [6]. Then we present numerical examples demonstrating the error estimates. Finally we extend these results to the case of integrals $\int_{p(x)}^{q(x)} f(t)\,dt$ (A ≤ x ≤ B).

2. Preliminaries: existing error estimates

Let us first introduce the following function space.

Definition 1 Let $\mathscr{D}$ be a simply-connected domain which satisfies $(a, b) \subset \mathscr{D}$, and let K be a positive constant. Then $H^{\infty}_K(\mathscr{D})$ denotes the family of all functions f that are analytic on $\mathscr{D}$, and satisfy for all z in $\mathscr{D}$ the condition |f(z)| ≤ K.

In what follows $\mathscr{D}$ is a domain transformed by either $\psi^{\mathrm{SE}}_{a,b}$ or $\psi^{\mathrm{DE}}_{a,b}$ from a strip domain $\mathscr{D}_d = \{\zeta \in \mathbb{C} : |\mathrm{Im}\,\zeta| < d\}$, where d is a positive constant. That is, the transformed domains are expressed as

$$\psi^{\mathrm{SE}}_{a,b}(\mathscr{D}_d) = \{z = \psi^{\mathrm{SE}}_{a,b}(\zeta) : \zeta \in \mathscr{D}_d\},$$

$$\psi^{\mathrm{DE}}_{a,b}(\mathscr{D}_d) = \{z = \psi^{\mathrm{DE}}_{a,b}(\zeta) : \zeta \in \mathscr{D}_d\},$$

respectively. With these notations, the error of the SE/DE-Sinc indefinite integration has been estimated as follows.

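Theorem 2 (Okayama et al. [6, Theorem 2.7]) Let $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,b}(\mathscr{D}_d))$ for d with 0 < d < π, and let h be selected as $h = \sqrt{\pi d/N}$. Then it holds for all N that

$$\max_{a \le x \le b}\left|\int_a^x f(t)\,dt - \sum_{k=-N}^{N} f(\psi^{\mathrm{SE}}_{a,b}(kh))\,[\psi^{\mathrm{SE}}_{a,b}]'(kh)\,J(k, h)([\psi^{\mathrm{SE}}_{a,b}]^{-1}(x))\right| \le C_1\left(1.1 + \frac{C_2}{1 - e^{-2\sqrt{\pi d}}}\sqrt{\frac{\pi}{d}}\right)e^{-\sqrt{\pi d N}},$$

where $C_1$ and $C_2$ are constants defined by

$$C_1 = 2K(b - a), \qquad (2)$$

$$C_2 = \frac{1}{\cos^2\left(\frac{d}{2}\right)}. \qquad (3)$$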


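Theorem 3 Let $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,b}(\mathscr{D}_d))$ for d with 0 < d < π/2, and let h be selected as h = log(2dN)/N. Then it holds for all N with N ≥ e/(2d) that

$$\max_{a \le x \le b}\left|\int_a^x f(t)\,dt - \sum_{k=-N}^{N} f(\psi^{\mathrm{DE}}_{a,b}(kh))\,[\psi^{\mathrm{DE}}_{a,b}]'(kh)\,J(k, h)([\psi^{\mathrm{DE}}_{a,b}]^{-1}(x))\right| \le \frac{C_1}{d}\left(e^{\pi} + \frac{C_3}{1 - e^{-\pi e}}\right)\frac{\log(2dN)}{N}\,e^{-\pi d N/\log(2dN)},$$

where $C_1$ is defined by (2) and $C_3$ is defined by

$$C_3 = \frac{1}{\cos^2\left(\frac{\pi}{2}\sin d\right)\cos d}. \qquad (4)$$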

The error estimates for the tanh rule and the DE formula have been given as stated below.

Theorem 4 Let $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,b}(\mathscr{D}_d))$ for d with 0 < d < π, and let h be selected as

$$h = \sqrt{\frac{2\pi d}{N}}. \qquad (5)$$

Then it holds for all N that

$$\left|\int_a^b f(t)\,dt - h\sum_{k=-N}^{N} f(\psi^{\mathrm{SE}}_{a,b}(kh))\,[\psi^{\mathrm{SE}}_{a,b}]'(kh)\right| \le C_1\left(1 + \frac{2C_2}{1 - e^{-\sqrt{2\pi d}}}\right)e^{-\sqrt{2\pi d N}},$$

where $C_1$ and $C_2$ are defined by (2) and (3), respectively.


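Theorem 5 Let $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,b}(\mathscr{D}_d))$ for d with 0 < d < π/2, and let h be selected as

$$h = \frac{\log(4dN)}{N}. \qquad (6)$$

Then it holds for all N with N ≥ e/(4d) that

$$\left|\int_a^b f(t)\,dt - h\sum_{k=-N}^{N} f(\psi^{\mathrm{DE}}_{a,b}(kh))\,[\psi^{\mathrm{DE}}_{a,b}]'(kh)\right| \le C_1\left[e^{\pi/2} + \frac{2C_3}{1 - e^{-(\pi/2)e}}\right]e^{-2\pi d N/\log(4dN)},$$

where $C_1$ and $C_3$ are defined by (2) and (4), respectively.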


3. Error estimates for the tanh rule/DE formula for indefinite integrals

Simply applying Theorem 4 or 5 to the case of the approximation (1), we have the next results.

Theorem 6 Let 0 < d < π and $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,x}(\mathscr{D}_d))$ for all x ∈ [a, b] uniformly, and let h be selected by (5). Then it holds for all N and x ∈ [a, b] that

$$\left|\int_a^x f(t)\,dt - h\sum_{k=-N}^{N} f(\psi^{\mathrm{SE}}_{a,x}(kh))\,[\psi^{\mathrm{SE}}_{a,x}]'(kh)\right| \le 2K(x - a)\left(1 + \frac{2C_2}{1 - e^{-\sqrt{2\pi d}}}\right)e^{-\sqrt{2\pi d N}},$$

where $C_2$ is defined by (3).

Theorem 7 Let 0 < d < π/2 and $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,x}(\mathscr{D}_d))$ for all x ∈ [a, b] uniformly, and let h be selected as (6). Then it holds for all N with N ≥ e/(4d) and x ∈ [a, b] that

$$\left|\int_a^x f(t)\,dt - h\sum_{k=-N}^{N} f(\psi^{\mathrm{DE}}_{a,x}(kh))\,[\psi^{\mathrm{DE}}_{a,x}]'(kh)\right| \le 2K(x - a)\left[e^{\pi/2} + \frac{2C_3}{1 - e^{-(\pi/2)e}}\right]e^{-2\pi d N/\log(4dN)},$$

where $C_3$ is defined by (4).

In these theorems, the assumptions on f depend on x, like $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,x}(\mathscr{D}_d))$ or $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,x}(\mathscr{D}_d))$, but such a condition is not easy to check for all x ∈ [a, b]. In this paper, we replace the condition with one that does not depend on x by using the following results.

Lemma 8 (Okayama et al. [12, Lemma 2]) Let $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,b}(\mathscr{D}_d))$ for d with 0 < d < π. Then $f \in H^{\infty}_K(\psi^{\mathrm{SE}}_{a,x}(\mathscr{D}_d))$ uniformly for all x ∈ [a, b].

Lemma 9 (Okayama et al. [12, Lemma 9]) Let $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,b}(\mathscr{D}_d) \cup \{b\})$ for d with 0 < d < π/2. Then $f \in H^{\infty}_K(\psi^{\mathrm{DE}}_{a,x}(\mathscr{D}_d))$ uniformly for all x ∈ [a, b].

These lemmas state that it is sufficient to check only the case of x = b. From this and $2K(x - a) \le C_1$, Theorems 6 and 7 can be rewritten as follows.

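Theorem 10 Let the assumptions in Theorem 4 be fulfilled. Then it holds for all $N$ that

\[ \max_{a \le x \le b} \left| \int_a^x f(t)\,dt - h \sum_{k=-N}^{N} f(\psi^{SE}_{a,x}(kh))\,[\psi^{SE}_{a,x}]'(kh) \right| \le C_1 \left( 1 + \frac{2C_2}{1 - e^{-\sqrt{2\pi d}}} \right) e^{-\sqrt{2\pi d N}}, \]

where $C_1$ and $C_2$ are defined by (2) and (3), respectively.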



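Theorem 11 Let the assumptions in Theorem 5 be fulfilled. Furthermore, let $f$ be analytic at $b$ and $|f(b)| \le K$. Then it holds for all $N$ with $N \ge e/(4d)$ that

\[ \max_{a \le x \le b} \left| \int_a^x f(t)\,dt - h \sum_{k=-N}^{N} f(\psi^{DE}_{a,x}(kh))\,[\psi^{DE}_{a,x}]'(kh) \right| \le C_1 \left[ e^{\pi/2} + \frac{2C_3}{1 - e^{-(\pi/2)e}} \right] e^{-2\pi d N/\log(4dN)}, \]

where $C_1$ and $C_3$ are defined by (2) and (4), respectively.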



Note that the assumptions in these theorems do not depend on $x$ any more. Furthermore, the assumptions on $f$ in Theorem 10 are exactly the same as in Theorem 2, but the convergence rate in Theorem 10 is higher than that in Theorem 2. The same can be said of Theorems 3 and 11.

4. Numerical examples

In this section we show numerical examples that demonstrate the theorems described above. Computation programs were written in C with double-precision floating point arithmetic. The GNU Scientific Library was used for evaluating the function Si(x).

Let us consider the test function $f(t) = \sqrt{1 + t^2}$ and its indefinite integral for $x \in [-1, 1]$:

\[ \int_{-1}^{x} f(t)\,dt = \frac{\sqrt{2} + x\sqrt{1 + x^2} + \operatorname{arcsinh}(1) + \operatorname{arcsinh}(x)}{2}. \]

This function $f$ satisfies the assumptions in Theorem 2 and Theorem 10 with $K = 2$ and $d = \pi/2$, and also satisfies the assumptions in Theorem 3 and Theorem 11 with $K = 2$ and $d = \pi/6$. The error of each approximation is investigated on 2001 equally spaced points, i.e. $x = -1, -0.999, -0.998, \ldots, 0.999, 1$, and the maximum of the absolute errors is plotted in Fig. 1. From this figure we can see that the tanh rule/DE formula converges faster than the SE/DE-Sinc indefinite integration. Furthermore, from Fig. 2 we can see that the error estimates in Theorems 10 and 11 are actually sharp upper bounds of the observed error.

Fig. 1. Comparison of convergence between the SE/DE-Sinc indefinite integration and the tanh rule/DE formula. (Plot: maximum error on a logarithmic scale from 1e-16 to 1 versus N from 0 to 140; curves: SE-Sinc indefinite integration, tanh rule, DE-Sinc indefinite integration, DE formula.)

Fig. 2. Error estimates of the tanh rule and the DE formula. (Plot: maximum error on a logarithmic scale from 1e-16 to 1 versus N from 0 to 140; curves: error estimate of tanh rule, observed error of tanh rule, error estimate of DE formula, observed error of DE formula.)
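The experiment in this section can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' C program. The transformations $\psi^{SE}_{a,x}(t) = \frac{x-a}{2}\tanh(t/2) + \frac{x+a}{2}$ and $\psi^{DE}_{a,x}(t) = \frac{x-a}{2}\tanh(\frac{\pi}{2}\sinh t) + \frac{x+a}{2}$, and the step size $h = \sqrt{2\pi d/N}$ for the tanh rule, are assumptions here, since their definitions (including formula (5)) lie outside this excerpt; the step size (6) for the DE formula is taken from the text. All function names are hypothetical.

```python
import math

def f(t):
    """Test function f(t) = sqrt(1 + t^2) from the text."""
    return math.sqrt(1.0 + t * t)

def exact_integral(x):
    """Closed form of the integral of f over [-1, x] given in the text."""
    return (math.sqrt(2.0) + x * math.sqrt(1.0 + x * x)
            + math.asinh(1.0) + math.asinh(x)) / 2.0

def tanh_rule(func, a, x, n, d):
    """Tanh rule on [a, x]; h = sqrt(2*pi*d/n) is an ASSUMED form of (5)."""
    h = math.sqrt(2.0 * math.pi * d / n)
    half, mid = 0.5 * (x - a), 0.5 * (x + a)
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        node = half * math.tanh(0.5 * t) + mid          # assumed psi^SE_{a,x}(t)
        weight = 0.5 * half / math.cosh(0.5 * t) ** 2   # derivative of the map above
        total += func(node) * weight
    return h * total

def de_formula(func, a, x, n, d):
    """DE formula on [a, x] with h = log(4*d*n)/n as in (6); needs n >= e/(4d)."""
    h = math.log(4.0 * d * n) / n
    half, mid = 0.5 * (x - a), 0.5 * (x + a)
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        s = 0.5 * math.pi * math.sinh(t)
        node = half * math.tanh(s) + mid                # assumed psi^DE_{a,x}(t)
        # cosh(s)**2 stays finite only for moderate n; fine for this sketch
        weight = half * 0.5 * math.pi * math.cosh(t) / math.cosh(s) ** 2
        total += func(node) * weight
    return h * total

def max_error(rule, n, d, points=2001):
    """Maximum absolute error over equally spaced x in [-1, 1]."""
    worst = 0.0
    for i in range(points):
        x = -1.0 + 2.0 * i / (points - 1)
        worst = max(worst, abs(rule(f, -1.0, x, n, d) - exact_integral(x)))
    return worst

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, max_error(tanh_rule, n, math.pi / 2),
              max_error(de_formula, n, math.pi / 6))
```

Note that every value of $x$ triggers a fresh set of $2N+1$ evaluations of $f$, because the nodes depend on $x$.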

5. Extension to the general interval

The results above for the integral $\int_a^x f(t)\,dt$ can be extended to the integral $\int_{p(x)}^{q(x)} f(t)\,dt$, where $a \le p(x) \le q(x) \le b$ for $x \in [A, B]$. To this end, the following lemma is essential in the case of the tanh rule.

Lemma 12 Let $f \in H^\infty_K(\psi^{SE}_{a,b}(D_d))$ for $d$ with $0 < d < \pi$. Then it holds uniformly for all $x_1$ and $x_2$ with $a \le x_1 \le x_2 \le b$ that $f \in H^\infty_K(\psi^{SE}_{x_1,x_2}(D_d))$.

This lemma follows from Lemma 8 and the next lemma.

Lemma 13 Let $f \in H^\infty_K(\psi^{SE}_{a,b}(D_d))$ for $d$ with $0 < d < \pi$. Then $f \in H^\infty_K(\psi^{SE}_{x,b}(D_d))$ uniformly for all $x \in [a, b]$.

We omit the proof since it is almost the same as that of Lemma 8. Thus we have the error estimates for the tanh rule in the case of the integral $\int_{p(x)}^{q(x)} f(t)\,dt$ as follows.


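Theorem 14 Let the assumptions in Theorem 4 be fulfilled. Then it holds for all $N$ that

\[ \max_{A \le x \le B} \left| \int_{p(x)}^{q(x)} f(t)\,dt - h \sum_{k=-N}^{N} f(\psi^{SE}_{p(x),q(x)}(kh))\,[\psi^{SE}_{p(x),q(x)}]'(kh) \right| \le C_1 \left( 1 + \frac{2C_2}{1 - e^{-\sqrt{2\pi d}}} \right) e^{-\sqrt{2\pi d N}}, \]

where $C_1$ and $C_2$ are defined by (2) and (3), respectively.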


In the case of the DE formula, the following lemma is essential.

Lemma 15 Let $f \in H^\infty_K(\{a\} \cup \psi^{DE}_{a,b}(D_d) \cup \{b\})$ for $d$ with $0 < d < \pi/2$. Then it holds uniformly for all $x_1$ and $x_2$ with $a \le x_1 \le x_2 \le b$ that $f \in H^\infty_K(\psi^{DE}_{x_1,x_2}(D_d))$.

This lemma follows from Lemma 9 and the next lemma.

Lemma 16 Let $f \in H^\infty_K(\psi^{DE}_{a,b}(D_d) \cup \{a\})$ for $d$ with $0 < d < \pi/2$. Then $f \in H^\infty_K(\psi^{DE}_{x,b}(D_d))$ uniformly for all $x \in [a, b]$.

We omit the proof since it is almost the same as that of Lemma 9. Thus we have the error estimates for the DE formula in the case of the integral $\int_{p(x)}^{q(x)} f(t)\,dt$ as follows.


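Theorem 17 Let the assumptions in Theorem 5 be fulfilled. Furthermore, let $f$ be analytic at $a$ and $b$, and satisfy $|f(a)| \le K$ and $|f(b)| \le K$. Then it holds for all $N$ with $N \ge e/(4d)$ that

\[ \max_{A \le x \le B} \left| \int_{p(x)}^{q(x)} f(t)\,dt - h \sum_{k=-N}^{N} f(\psi^{DE}_{p(x),q(x)}(kh))\,[\psi^{DE}_{p(x),q(x)}]'(kh) \right| \le C_1 \left[ e^{\pi/2} + \frac{2C_3}{1 - e^{-(\pi/2)e}} \right] e^{-2\pi d N/\log(4dN)}, \]

where $C_1$ and $C_3$ are defined by (2) and (4), respectively.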

This completes the desired extension.

6. Concluding remarks

We here summarize the differences between the SE/DE-Sinc indefinite integration (referred to as [A]) and the tanh rule/DE formula (referred to as [B]).

(i) [B] converges faster than [A] under (almost) the same assumptions (showing this theoretically is the main contribution of this paper).

(ii) As shown in [12], the results for [B] can be extended to the integral $\int_a^x [f(t)/\sqrt{x-t}]\,dt$ by using Lemma 8 and Theorem 6, or Lemma 9 and Theorem 7. However, [A] cannot handle such an integral since the integrand has a singular point at $t = x$, which means the assumptions of Theorems 2 and 3 are not satisfied.

(iii) Computation of the special function Si(x) is needed in the case of [A], but not in the case of [B].

(iv) All the points above are advantages of [B], but [B] also has a disadvantage. In [A], for all $x \in [a, b]$ the evaluation points of the function $f$ are fixed: $f(\psi_{a,b}(kh))$. In contrast, in the case of [B], the evaluation points change with $x$: $f(\psi_{a,x}(kh))$. This means that for each $x$ the function $f$ must be re-evaluated at $2N+1$ different points, which increases the computational cost.

Future work includes the following issues. Similar analysis of the cases in [7–9] is also possible by using the results in the present paper. Furthermore, as described above in (ii), those results can be further extended to the integral $\int_{p(x)}^{q(x)} f(x,t)\,dt$, and accordingly to the repeated integral $\int_A^B [\int_{p(x)}^{q(x)} f(x,t)\,dt]\,dx$. Constructing libraries with guaranteed accuracy based on the techniques in [13] is also desired. We are now working on these issues, and the results will be reported in a forthcoming paper.

Acknowledgments

This work was supported by the Grant-in-Aid for Scientific Research, MEXT, Japan. The first author was supported by the Research Fellowship of JSPS.

References

[1] C. Schwartz, Numerical integration of analytic functions, J. Comput. Phys., 4 (1969), 19–29.

[2] H. Takahasi and M. Mori, Double exponential formulas for numerical integration, Publ. Res. Inst. Math. Sci., 9 (1974), 721–741.

[3] F. Stenger, Numerical Methods Based on Sinc and Analytic Functions, Springer-Verlag, New York, 1993.

[4] M. Sugihara, Optimality of the double exponential formula – functional analysis approach –, Numer. Math., 75 (1997), 379–395.

[5] K. Tanaka, M. Sugihara, K. Murota and M. Mori, Function classes for double exponential integration formulas, Numer. Math., 111 (2009), 631–655.

[6] T. Okayama, T. Matsuo and M. Sugihara, Error estimates with explicit constants for Sinc approximation, Sinc quadrature and Sinc indefinite integration, Mathematical Engineering Tech. Rep. 2009-01, The Univ. of Tokyo, 2009, http://www.keisu.t.u-tokyo.ac.jp/research/techrep/.

[7] B. V. Riley, The numerical solution of Volterra integral equations with nonsmooth solutions based on sinc approximation, Appl. Numer. Math., 9 (1992), 249–257.

[8] M. Mori, A. Nurmuhammad and T. Murai, Numerical solution of Volterra integral equations with weakly singular kernel based on the DE-sinc method, Jpn J. Indust. Appl. Math., 25 (2008), 165–183.

[9] T. Okayama, T. Matsuo and M. Sugihara, Sinc-collocation methods for weakly singular Fredholm integral equations of the second kind, J. Comput. Appl. Math., in press.

[10] S. Haber, Two formulas for numerical indefinite integration, Math. Comp., 60 (1993), 279–296.

[11] M. Muhammad and M. Mori, Double exponential formulas for numerical indefinite integration, J. Comput. Appl. Math., 161 (2003), 431–448.

[12] T. Okayama, T. Matsuo and M. Sugihara, Approximate formulae for fractional derivatives by means of Sinc methods, J. Concr. Appl. Math., 8 (2010), 470–488.

[13] N. Yamanaka, T. Okayama, S. Oishi and T. Ogita, A fast verified automatic integration algorithm using double exponential formula, RIMS Kokyuroku, 1638 (2009), 146–158.



Improvements in the computation of the Hasse-Witt matrix

Hiroki Komoto¹, Shunji Kozaki¹ and Kazuto Matsuo¹

¹ Institute of Information Security, 2-14-1, Tsuruya-cho Kanagawa-ku, Yokohama 221-0835, Japan


Received November 17, 2009, Accepted January 8, 2010

Abstract

The Hasse-Witt matrix of a hyperelliptic curve gives partial information on the order of the Jacobian of the curve; therefore the Hasse-Witt matrices can be used for point counting of hyperelliptic curves. Bostan, Gaudry and Schost improved the Chudnovsky-Chudnovsky algorithm and computed the Hasse-Witt matrices by using their improved algorithm for constructing hyperelliptic cryptosystems. Both algorithms need p-adic integers with finite precision for the base operations. This paper shows improvements in the computation of the Hasse-Witt matrix that reduce the required precision of the p-adic integers.

Keywords hyperelliptic curves, hyperelliptic curve cryptosystems, point counting, Hasse-Witt matrices


1. Introduction

Hyperelliptic curve cryptosystems are constructed on rational point groups of the Jacobians of hyperelliptic curves defined over finite fields. Their security depends on the difficulty of the discrete logarithm problems on the rational point groups. The complexity of the problems is strongly affected by the group orders. Therefore, in order to construct secure hyperelliptic curve cryptosystems, one needs to know the group orders. The order can be obtained from the characteristic polynomial of the Frobenius map on the Jacobian. The residues of the coefficients in the characteristic polynomial modulo p can be derived from the Hasse-Witt matrix, where p is the characteristic of the field over which the curve is defined. Therefore, the Hasse-Witt matrices can be used for computing the orders of the Jacobians of hyperelliptic curves and more general curves [1–5].

The Hasse-Witt matrix consists of coefficients of a power of a polynomial defining the curve. These coefficients can be computed by using the Chudnovsky-Chudnovsky algorithm [6] for a linear recurrence with polynomial coefficients. Bostan, Gaudry and Schost [3, 5] improved the Chudnovsky-Chudnovsky algorithm and computed the Hasse-Witt matrices for constructing hyperelliptic curve cryptosystems over finite fields of relatively large characteristics. These algorithms essentially need p-adic integers with finite precision.

This paper shows a method to speed up the computation of the Hasse-Witt matrix. The proposed method reduces the required precision of the p-adic integers by using the reversals of polynomials.

This paper is organized as follows. Section 2 defines the Hasse-Witt matrix and describes the computation of the Hasse-Witt matrix. Section 3 shows improvements in the computation of the Hasse-Witt matrix and Section 4 shows experimental results for the improvements. Finally, Section 5 concludes this paper.

2. Computation of the Hasse-Witt matrix using a linear recurrence

This section defines the Hasse-Witt matrix of a hyperelliptic curve and summarizes the computation of the Hasse-Witt matrix using a linear recurrence.

Let $p$ be an odd prime and $\mathbb{F}_p$ be a finite field of order $p$. A hyperelliptic curve $C$ over $\mathbb{F}_p$ of genus $g$ is defined by

\[ C : Y^2 = F(X), \quad F(X) = \sum_{i=0}^{2g+1} f_i X^i \in \mathbb{F}_p[X], \tag{1} \]

where $F(X)$ is a monic (i.e. $f_{2g+1} = 1$) square-free polynomial. For simplicity, $g \ll p$ and $f_0 \ne 0$ are assumed in the following.

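Definition 1 (Hasse-Witt matrix) Let $h_k$ denote the coefficient of $X^k$ in the polynomial $(F(X))^{(p-1)/2}$ for $C$. The Hasse-Witt matrix of $C$ is defined as the $g \times g$ matrix over $\mathbb{F}_p$ whose $(i, j)$-th component is $h_{jp-i}$:

\[ H = \begin{pmatrix} h_{p-1} & h_{2p-1} & \cdots & h_{gp-1} \\ h_{p-2} & h_{2p-2} & \cdots & h_{gp-2} \\ \vdots & \vdots & \ddots & \vdots \\ h_{p-g} & h_{2p-g} & \cdots & h_{gp-g} \end{pmatrix}. \]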
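For small $p$, Definition 1 can be evaluated directly by raising $F(X)$ to the power $(p-1)/2$ modulo $p$ and reading off the coefficients $h_{jp-i}$. The sketch below does exactly that; it is a naive reference implementation for illustration, not the recurrence-based algorithm discussed in this paper, and the function name `hasse_witt_naive` is hypothetical. The polynomial is passed as the coefficient list `[f_0, ..., f_{2g+1}]`.

```python
def hasse_witt_naive(f, p):
    """Hasse-Witt matrix of Y^2 = F(X) over F_p by direct polynomial powering.

    f is the coefficient list [f_0, ..., f_{2g+1}] of F; entry (i, j) of the
    result (1-based) is h_{jp-i}, the coefficient of X^(jp-i) in F^((p-1)/2).
    """
    g = (len(f) - 2) // 2
    power = [1]
    for _ in range((p - 1) // 2):
        new = [0] * (len(power) + len(f) - 1)
        for a_idx, a in enumerate(power):
            for b_idx, b in enumerate(f):
                new[a_idx + b_idx] = (new[a_idx + b_idx] + a * b) % p
        power = new

    def h(k):
        # Coefficients outside the stored range are zero.
        return power[k] if 0 <= k < len(power) else 0

    return [[h(j * p - i) for j in range(1, g + 1)] for i in range(1, g + 1)]


if __name__ == "__main__":
    # Genus 1 example: F = X^3 + X + 1 over F_5, where F^2 has X^4 coefficient 2.
    print(hasse_witt_naive([1, 1, 0, 1], 5))
```

The cost grows with the degree $(2g+1)(p-1)/2$ of the power, which is why this approach is only practical for small $p$.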

The following theorem is known for the Hasse-Witt matrix.

Theorem 2 (Manin [7]) Let $\chi_p(X)$ denote the characteristic polynomial of the $p$-power Frobenius map on the Jacobian of $C$ and $H$ denote the Hasse-Witt matrix of $C$, then

\[ \chi_p(X) \equiv (-1)^g X^g \det(H - XI) \bmod p, \]

where $I$ is the $g \times g$ unit matrix.

JSIAM Letters Vol. 2 (2010) pp.17–20 Hiroki Komoto et al.

Therefore, the residues of the coefficients in the characteristic polynomial $\chi_p(X)$ modulo $p$ can be computed from the Hasse-Witt matrix $H$ by using Theorem 2.
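As a small sanity check of Theorem 2, consider genus 1, where $H = (h_{p-1})$ and $\chi_p(X) = X^2 - tX + p$ with $t = p + 1 - \#C(\mathbb{F}_p)$. The theorem then reduces to $t \equiv h_{p-1} \pmod p$. The sketch below verifies this by brute-force point counting; it is an illustration only, restricted to $g = 1$, and the function names are hypothetical.

```python
def hw_entry_genus1(f, p):
    """Coefficient h_{p-1} of F^((p-1)/2) mod p for a cubic F = [f_0, f_1, f_2, f_3]."""
    power = [1]
    for _ in range((p - 1) // 2):
        new = [0] * (len(power) + len(f) - 1)
        for a_idx, a in enumerate(power):
            for b_idx, b in enumerate(f):
                new[a_idx + b_idx] = (new[a_idx + b_idx] + a * b) % p
        power = new
    return power[p - 1]


def frobenius_trace(f, p):
    """t = p + 1 - #C(F_p) for Y^2 = F(X), counting points by brute force."""
    count = 1  # the point at infinity
    for x in range(p):
        value = sum(c * pow(x, i, p) for i, c in enumerate(f)) % p
        if value == 0:
            count += 1                              # single point with Y = 0
        elif pow(value, (p - 1) // 2, p) == 1:      # Euler's criterion: value is a square
            count += 2
    return p + 1 - count


if __name__ == "__main__":
    f, p = [1, 1, 0, 1], 5          # Y^2 = X^3 + X + 1 over F_5
    print(frobenius_trace(f, p) % p, hw_entry_genus1(f, p))
```

For this curve the brute-force count gives 9 points, so $t = -3 \equiv 2 \pmod 5$, which agrees with $h_4 = 2$.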

A usual polynomial powering, such as the binary method, can compute all the coefficients of $(F(X))^{(p-1)/2}$ within $O(\mathsf{M}(gp) \log p)$ operations in $\mathbb{F}_p$, where $\mathsf{M}(n)$ denotes the cost of a multiplication of polynomials of degree less than $n$ in $\mathbb{F}_p[X]$. On the other hand, the Chudnovsky-Chudnovsky algorithm in [6, Section 6] can compute the Hasse-Witt matrix within $O(g^{\omega+1} \mathsf{M}(\sqrt{p}) + g^3 \mathsf{M}(\sqrt{p}) \log p)$ operations, where $O(g^\omega)$ is the complexity of a $g \times g$ matrix multiplication. Bostan, Gaudry and Schost [3, 5] improved the Chudnovsky-Chudnovsky algorithm and showed that, by using their improved algorithm, the Hasse-Witt matrix can be computed within $O(g^{\omega+1} \sqrt{p} + g^3 \mathsf{M}(\sqrt{p}))$ operations. Therefore, these algorithms are asymptotically faster than the algorithms using polynomial powering. In the following, we describe the computation of the Hasse-Witt matrix in [3, 5], restricting ourselves to what we need for our improvements in the later section.

Let rational functions $r_i(X)$ with a variable $X$ for $1 \le i \le 2g+1$ be

\[ r_i(X) = \frac{f_i \left( \frac{i(p+1)}{2} - X \right)}{f_0 X}, \tag{2} \]

and a $(2g+1) \times (2g+1)$ matrix $A(X)$ be

\[ A(X) = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & 1 \\ r_{2g+1}(X) & r_{2g}(X) & \cdots & r_1(X) \end{pmatrix}. \tag{3} \]

For any positive integer $k$, $A(k)$ denotes the matrix which is obtained by substituting $X = k$. Let $h_k$ be as in Definition 1 and a $(2g+1)$-dimensional column vector $U_k$ be

\[ U_k = {}^t\left( h_{k-2g} \ \cdots \ h_{k-1} \ h_k \right), \tag{4} \]

where $h_{-2g} = \cdots = h_{-1} = 0$. Then one can obtain a linear recurrence given by

\[ U_k = A(k) U_{k-1} = A(k) A(k-1) \cdots A(1) U_0 \tag{5} \]

from [8, Chapter IV] (see also [9, Problem 4] and [3, p. 52]). Therefore, one can compute

\[ U_{jp-1} = {}^t\left( h_{jp-2g-1} \ \cdots \ h_{jp-2} \ h_{jp-1} \right) \]

for $1 \le j \le g$ by using the linear recurrence (5) starting from

\[ U_0 = {}^t\left( 0 \ \cdots \ 0 \ f_0^{(p-1)/2} \right), \]

which can be obtained from the constant term $h_0 = f_0^{(p-1)/2}$ in $(F(X))^{(p-1)/2}$. Then the Hasse-Witt matrix $H$ can be obtained from the components of the vectors $U_{jp-1}$ for $1 \le j \le g$.
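The recurrence behind (5) can be checked numerically. Writing $G = F^n$ with $n = (p-1)/2$, the identity $F G' = n F' G$ gives $f_0 k h_k = \sum_{i \ge 1} f_i \left( \tfrac{i(p+1)}{2} - k \right) h_{k-i}$, i.e. $h_k = \sum_i r_i(k) h_{k-i}$. The sketch below runs this scalar form of the recurrence over the rationals for an integer lift of $F$ and compares it with direct powering over the integers. It is a verification aid under these assumptions, not the algorithm of [3, 5], and the function names are hypothetical.

```python
from fractions import Fraction


def poly_pow(f, n):
    """Coefficients of F^n over the integers by repeated multiplication."""
    power = [1]
    for _ in range(n):
        new = [0] * (len(power) + len(f) - 1)
        for a_idx, a in enumerate(power):
            for b_idx, b in enumerate(f):
                new[a_idx + b_idx] += a * b
        power = new
    return power


def coefficients_by_recurrence(f, p, kmax):
    """h_0..h_kmax of F^((p-1)/2) via h_k = sum_i r_i(k) h_{k-i} over the rationals."""
    half = (p + 1) // 2
    h = [Fraction(f[0]) ** ((p - 1) // 2)]          # h_0 = f_0^((p-1)/2)
    for k in range(1, kmax + 1):
        total = Fraction(0)
        for i in range(1, len(f)):
            if k - i >= 0:                          # h with negative index is zero
                total += Fraction(f[i] * (i * half - k), f[0] * k) * h[k - i]
        h.append(total)
    return h


if __name__ == "__main__":
    f, p = [1, 2, 0, 3, 5, 1], 7                    # integer lift of a genus 2 example
    direct = poly_pow(f, (p - 1) // 2)
    print(coefficients_by_recurrence(f, p, len(direct) - 1) == direct)
```

Working over the rationals hides the divisions by $k$; the difficulty addressed in the text appears only when $k$ is a multiple of $p$ and the arithmetic is done modulo a power of $p$.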

However, since the components in $A(X)$ contain the rational functions (2), divisions by $p$ are involved in computing $U_{jp-1}$. Therefore, those algorithms cannot be executed over $\mathbb{F}_p$. The algorithm in [3, 5] used the p-adic integers $\mathbb{Z}_p$ for computing the Hasse-Witt matrix, because the p-adic numbers, which contain $\mathbb{Z}_p$, permit divisions by $p$ and the residues of elements in $\mathbb{Z}_p$ modulo $p$ give elements in $\mathbb{F}_p$. The Hasse-Witt matrix can be obtained using $\mathbb{Z}_p$ as follows. First, one lifts the coefficients of $F(X)$ to $\mathbb{Z}_p$ (i.e. one considers $F(X)$ in $\mathbb{Z}_p[X]$) and computes all the vectors $U_{p-1}, U_{2p-1}, \ldots, U_{gp-1}$ over $\mathbb{Z}_p$. Then, by reducing the components of these vectors modulo $p$, one can obtain the Hasse-Witt matrix over $\mathbb{F}_p$.

The authors of [3, 5] computed those vectors over $\mathbb{Z}_p$ with finite precision as follows. Let a matrix $B(k) := f_0 k A(k)$ over $\mathbb{Z}_p$, then the linear recurrence (5) implies

\[ U_{jp-1} = \frac{1}{f_0^{jp-1} (jp-1)!} B(jp-1) \cdots B(1) U_0 \]

for $1 \le j \le g$. By the assumption $g \ll p$, the factorial term $(jp-1)!$ in the denominator is exactly divisible by $p^{j-1}$. So, $U_{gp-1}$ for the case $j = g$ requires the division by the highest power of $p$, i.e. $p^{g-1}$. Therefore, in order to obtain the Hasse-Witt matrix over $\mathbb{F}_p$, it is enough to compute the vectors $U_{p-1}, U_{2p-1}, \ldots, U_{gp-1}$ over $\mathbb{Z}_p$ with the finite precision up to $g$, whose arithmetic can be done in $\mathbb{Z}/p^g\mathbb{Z}$.

The next section shows improvements in the computation of the Hasse-Witt matrix by reducing the required precision.

3. Improvements

This section shows that the precision of the p-adic integers in the algorithms using the linear recurrence can be reduced. Moreover, one can further reduce the precision by using the linear recurrence with the coefficients of the reversal of a polynomial.

3.1 Reducing the precision of the p-adic integers

In the algorithms using the linear recurrence (5), the divisions by $p$ occur at the computation in

\[ U_{jp} = A(jp) U_{jp-1}, \quad 1 \le j < g. \]

The $(2g+1)$-th component of $U_{jp}$ is

\[ h_{jp} = \frac{w_j}{p}, \]

where

\[ w_j = \frac{f_{2g+1} \left( \frac{(2g+1)(p+1)}{2} - jp \right)}{f_0 j} h_{jp-2g-1} + \cdots + \frac{f_1 \left( \frac{p+1}{2} - jp \right)}{f_0 j} h_{jp-1} \in \mathbb{Z}_p. \]

If $w_j$ is computed over $\mathbb{Z}/p^g\mathbb{Z}$, then the significant precision of $h_{jp}$ is reduced from $g$ to $g-1$, i.e. $h_{jp}$ should be in $\mathbb{Z}/p^{g-1}\mathbb{Z}$. Note that $h_{jp}$ is in $\mathbb{Z}_p$, because $h_{jp}$ is a coefficient of $(F(X))^{(p-1)/2}$. Therefore, if $U_{jp}$ is given, then $U_{kp}$ for $k \ge j$ can be computed with the same precision as $U_{jp}$, whose precision is lower than that of $U_{jp-1}$. On the other hand, the components in the Hasse-Witt matrix can be obtained from the components of

\[ U_{jp} = {}^t\left( h_{jp-2g} \ \cdots \ h_{jp-1} \ h_{jp} \right) \]

for $1 \le j < g$. Therefore, one can reduce the precision by computing $U_p, U_{2p}, \ldots, U_{(g-1)p}$ and $U_{gp-1}$ as follows.

In order to obtain the Hasse-Witt matrix over $\mathbb{F}_p$, it is enough to compute

\[ U_{gp-1} = A(gp-1) \cdots A((g-1)p+1) U_{(g-1)p} \]

over $\mathbb{Z}/p\mathbb{Z}$. Moreover, since the division by $p$ occurs at the computation in $U_{(g-1)p} = A((g-1)p) U_{(g-1)p-1}$, it is enough to compute

\[ U_{(g-1)p} = A((g-1)p) \cdots A((g-2)p+1) U_{(g-2)p} \]

over $\mathbb{Z}/p^2\mathbb{Z}$. Similarly, it is enough to compute

\[ U_{jp} = A(jp) \cdots A((j-1)p+1) U_{(j-1)p} \]

over $\mathbb{Z}/p^{g-j+1}\mathbb{Z}$ for $1 \le j < g$. Consequently, one can reduce the precision by computing the vectors such that

\[ \begin{aligned} U_p &= A(p) \cdots A(1) U_0 && \text{over } \mathbb{Z}/p^g\mathbb{Z}, \\ U_{2p} &= A(2p) \cdots A(p+1) U_p && \text{over } \mathbb{Z}/p^{g-1}\mathbb{Z}, \\ &\ \ \vdots \\ U_{gp-1} &= A(gp-1) \cdots A((g-1)p+1) U_{(g-1)p} && \text{over } \mathbb{Z}/p\mathbb{Z} \end{aligned} \]

for obtaining the Hasse-Witt matrix.
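The staged precision schedule of this subsection can be sketched as follows. The sketch steps through the scalar recurrence $f_0 k h_k = \sum_i f_i \left( \tfrac{i(p+1)}{2} - k \right) h_{k-i}$ one index at a time (it does not use the baby-step/giant-step products of [3, 5]), starts in $\mathbb{Z}/p^g\mathbb{Z}$, and drops one digit of precision at each index $k = jp$, where the right-hand side is divided exactly by $p$. The function names are hypothetical, the naive powering routine is included only as a cross-check, and `pow(x, -1, m)` for modular inverses assumes Python 3.8 or later.

```python
def hasse_witt_staged(f, p):
    """Hasse-Witt matrix via the linear recurrence with decreasing p-adic precision.

    f = [f_0, ..., f_{2g+1}] with entries in [0, p), f_0 != 0, and g much smaller than p.
    """
    g = (len(f) - 2) // 2
    half = (p + 1) // 2
    e = g                                   # current precision: work modulo p^e
    mod = p ** e
    h = {0: pow(f[0], (p - 1) // 2, mod)}
    for k in range(1, g * p):
        s = sum(f[i] * (i * half - k) * h[k - i]
                for i in range(1, len(f)) if k - i >= 0) % mod
        if k % p:
            h[k] = s * pow(f[0] * k, -1, mod) % mod      # f_0 * k is a unit mod p^e
        else:
            # k = j*p: s equals f_0 * j * p * h_k modulo p^e, so s is divisible by p
            # and s // p determines f_0 * j * h_k modulo p^(e-1).
            e -= 1
            mod = p ** e
            h[k] = (s // p) * pow(f[0] * (k // p), -1, mod) % mod
    return [[h[j * p - i] % p for j in range(1, g + 1)] for i in range(1, g + 1)]


def hasse_witt_by_powering(f, p):
    """Cross-check: read the entries from F^((p-1)/2) computed modulo p."""
    g = (len(f) - 2) // 2
    power = [1]
    for _ in range((p - 1) // 2):
        new = [0] * (len(power) + len(f) - 1)
        for a_idx, a in enumerate(power):
            for b_idx, b in enumerate(f):
                new[a_idx + b_idx] = (new[a_idx + b_idx] + a * b) % p
        power = new
    return [[power[j * p - i] for j in range(1, g + 1)] for i in range(1, g + 1)]


if __name__ == "__main__":
    f, p = [1, 2, 0, 3, 5, 1], 13
    print(hasse_witt_staged(f, p) == hasse_witt_by_powering(f, p))
```

The values stored before a precision drop are simply reused modulo the smaller power of $p$, which mirrors the observation that $U_{kp}$ for $k \ge j$ only needs the precision of $U_{jp}$.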

3.2 Using the reversal of F (X)

The precision of the p-adic integers in Section 3.1 can be further reduced by using the reversal of $F(X)$. This section shows a method to compute the Hasse-Witt matrix using the reversal and estimates the efficiency of the method.

Let the reversal [10, p. 254] of $F(X)$ be denoted by

\[ \mathrm{rev}(F(X)) := X^{\deg F} F(1/X). \]

Then, we can see that

\[ \mathrm{rev}\left( (F(X))^{(p-1)/2} \right) = \left( \mathrm{rev}(F(X)) \right)^{(p-1)/2}. \tag{6} \]

Therefore, the coefficients of the higher degree in $(F(X))^{(p-1)/2}$ can be obtained by computing the coefficients of the lower degree in $(\mathrm{rev}(F(X)))^{(p-1)/2}$ using a linear recurrence similar to (5).
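Identity (6) is easy to confirm on coefficient lists: reversing the list of a polynomial with nonzero leading coefficient gives the coefficient list of its reversal. The short sketch below checks this for one example over the integers; the helper names are hypothetical.

```python
def poly_pow(f, n):
    """Coefficients of F^n over the integers by repeated multiplication."""
    power = [1]
    for _ in range(n):
        new = [0] * (len(power) + len(f) - 1)
        for a_idx, a in enumerate(power):
            for b_idx, b in enumerate(f):
                new[a_idx + b_idx] += a * b
        power = new
    return power


def rev(poly):
    """Reversal X^deg * poly(1/X); assumes the last list entry (leading coefficient) is nonzero."""
    return poly[::-1]


if __name__ == "__main__":
    f, n = [3, 1, 4, 1, 5, 1], 3           # monic quintic, n plays the role of (p-1)/2
    top = poly_pow(f, n)
    low = poly_pow(rev(f), n)
    print(rev(top) == low)                 # identity (6)
    print(top[-1 - 2] == low[2])           # coefficient of degree (deg - 2) vs degree 2
```

In other words, the coefficient of $X^{N-i}$ in $F^n$, with $N = n \deg F$, equals the coefficient of $X^i$ in $\mathrm{rev}(F)^n$.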

Let $\tilde{A}(X)$ denote a $(2g+1) \times (2g+1)$ matrix with the components defined by (2) for $(\mathrm{rev}(F(X)))^{(p-1)/2}$ similar to (3). Let $\tilde{h}_k$ denote the coefficient of $X^k$ in $(\mathrm{rev}(F(X)))^{(p-1)/2}$ and a $(2g+1)$-dimensional column vector $\tilde{U}_k$ be

\[ \tilde{U}_k = {}^t\left( \tilde{h}_{k-2g} \ \cdots \ \tilde{h}_{k-1} \ \tilde{h}_k \right) \]

similar to (4). Since (6) implies

\[ h_{\frac{(2g+1)(p-1)}{2} - i} = \tilde{h}_i \]

for $0 \le i \le (2g+1)(p-1)/2$, one can obtain the $g$-th column components $h_{gp-1}, h_{gp-2}, \ldots, h_{gp-g}$ in the Hasse-Witt matrix by computing

\[ \tilde{U}_{\frac{p-1}{2}} = {}^t\left( \tilde{h}_{\frac{p-1}{2}-2g} \ \cdots \ \tilde{h}_{\frac{p-1}{2}-1} \ \tilde{h}_{\frac{p-1}{2}} \right) = {}^t\left( h_{gp+g} \ \cdots \ h_{gp-g+1} \ h_{gp-g} \right) \]

from $\tilde{U}_0 = {}^t\left( 0 \ \cdots \ 0 \ f_{2g+1}^{(p-1)/2} \right)$ using the linear recurrence. Similarly, one can obtain the other components in the Hasse-Witt matrix by computing $\tilde{U}_{\frac{p-1}{2}+ip}$ for $0 < i < \lfloor g/2 \rfloor$. Consequently, one can obtain the Hasse-Witt matrix by computing

\[ U_p, U_{2p}, \ldots, U_{\lceil g/2 \rceil p - 1} \quad \text{and} \quad \tilde{U}_{\frac{p-1}{2}}, \tilde{U}_{\frac{p-1}{2}+p}, \ldots, \tilde{U}_{\lfloor g/2 \rfloor p - \frac{p+1}{2}}, \]

instead of computing $U_p, \ldots, U_{(g-1)p}, U_{gp-1}$. Applying the result in Section 3.1 to these vectors, one can reduce the precision for obtaining the Hasse-Witt matrix. That is, one can compute

\[ U_p \text{ over } \mathbb{Z}/p^{\lceil g/2 \rceil}\mathbb{Z}, \ \ldots, \ U_{\lceil g/2 \rceil p - 1} \text{ over } \mathbb{Z}/p\mathbb{Z} \]

for $F(X)$ and

\[ \tilde{U}_{\frac{p-1}{2}} \text{ over } \mathbb{Z}/p^{\lfloor g/2 \rfloor}\mathbb{Z}, \ \ldots, \ \tilde{U}_{\lfloor g/2 \rfloor p - \frac{p+1}{2}} \text{ over } \mathbb{Z}/p\mathbb{Z} \]

for $\mathrm{rev}(F(X))$.

In the following, we roughly estimate the efficiency of the proposed method for $g = 2$ and $3$. Let $S(j)$ denote the cost in bit operations of computing $U_{k+p}$ from $U_k$ over $\mathbb{Z}/p^j\mathbb{Z}$ for any integer $k \ge 0$. In the following discussion, we assume that a multiplication in $\mathbb{Z}/p^j\mathbb{Z}$ costs $m j^\alpha$ for a constant $m$ and a real number $\alpha$ with $1 < \alpha \le 2$.

In the case of $g = 2$, the proposed method computes $U_{p-1}$ from $U_0$ over $\mathbb{Z}/p\mathbb{Z}$ within $S(1)$ bit operations and $\tilde{U}_{\frac{p-1}{2}}$ from $\tilde{U}_0$ over $\mathbb{Z}/p\mathbb{Z}$ within $S(1)$ bit operations. So, the proposed method needs about $2S(1)$ bit operations. On the other hand, the previous method in [3, 5] computes $U_{p-1}$ from $U_0$ over $\mathbb{Z}/p^2\mathbb{Z}$ within $S(2)$ bit operations and $U_{2p-1}$ from $U_{p-1}$ over $\mathbb{Z}/p^2\mathbb{Z}$ within $S(2)$ bit operations. So, the previous method needs about $2S(2)$ bit operations in total. Consequently, the proposed method can compute the Hasse-Witt matrix $S(2)/S(1)$ times faster than the previous method. Assuming that $S(j)$ is dominated by multiplications in $\mathbb{Z}/p^j\mathbb{Z}$, we have $S(2)/S(1) = 2^\alpha/1^\alpha = 2^\alpha$. Therefore, we can expect that the proposed method is about 2 to 4 times faster than the previous method.

In the case of $g = 3$, the proposed method computes $U_p$ from $U_0$ over $\mathbb{Z}/p^2\mathbb{Z}$ within $S(2)$ bit operations, $U_{2p-1}$ from $U_p$ over $\mathbb{Z}/p\mathbb{Z}$ within $S(1)$ bit operations and $\tilde{U}_{\frac{p-1}{2}}$ from $\tilde{U}_0$ over $\mathbb{Z}/p\mathbb{Z}$ within $S(1)$ bit operations. So, the proposed method needs about $S(2) + 2S(1)$ bit operations. On the other hand, the previous method computes $U_{p-1}$ from $U_0$ over $\mathbb{Z}/p^3\mathbb{Z}$ within $S(3)$ bit operations, $U_{2p-1}$ from $U_{p-1}$ over $\mathbb{Z}/p^3\mathbb{Z}$ within $S(3)$ bit operations and $U_{3p-1}$ from $U_{2p-1}$ over $\mathbb{Z}/p^3\mathbb{Z}$ within $S(3)$ bit operations. So, the previous method needs about $3S(3)$ bit operations. Consequently, the proposed method can compute the Hasse-Witt matrix $3S(3)/(S(2) + 2S(1))$ times faster than the previous method. Assuming that $S(j)$ is dominated by multiplications in $\mathbb{Z}/p^j\mathbb{Z}$, we have $3S(3)/(S(2) + 2S(1)) = 3 \cdot 3^\alpha/(2^\alpha + 2 \cdot 1^\alpha) = 3^{\alpha+1}/(2^\alpha + 2)$. Therefore, we can expect that the proposed method is about 2 to 4.5 times faster than the previous method.
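The two ratios above can be evaluated directly from the cost model $S(j) \propto j^\alpha$. The sketch below tabulates them for a few values of $\alpha$; the function names are hypothetical and the model is the rough one stated in the text, not a measured cost. At the limiting value $\alpha = 1$ the $g = 3$ ratio evaluates to $9/4 = 2.25$, and at $\alpha = 2$ to $27/6 = 4.5$.

```python
def speedup_genus2(alpha):
    """Estimated ratio S(2)/S(1) = 2^alpha under the cost model S(j) ~ j^alpha."""
    return 2.0 ** alpha


def speedup_genus3(alpha):
    """Estimated ratio 3*S(3) / (S(2) + 2*S(1)) = 3^(alpha+1) / (2^alpha + 2)."""
    return 3.0 ** (alpha + 1) / (2.0 ** alpha + 2.0)


if __name__ == "__main__":
    for alpha in (1.0, 1.585, 2.0):        # 1.585 is roughly a Karatsuba-like exponent
        print(alpha, speedup_genus2(alpha), speedup_genus3(alpha))
```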

4. Experimental results

This section shows experimental results of the computation of the Hasse-Witt matrices of hyperelliptic curves of genus 2 and 3 by using the proposed method in Section 3.2.

We implemented the algorithm shown in [3] using Magma V2.15-10 [11]. We computed the Hasse-Witt matrices of hyperelliptic curves defined over $\mathbb{F}_p$ for $16 \le \log_2 p \le 32$ by using the proposed method. We also computed the Hasse-Witt matrices with the fixed precision shown in [3] for comparison. These experiments were run on an AMD Opteron 246 2.0GHz.

Fig. 1 shows the result for hyperelliptic curves of genus 2. The vertical axis denotes time in seconds to compute the Hasse-Witt matrix, and the horizontal axis denotes the bit length of $p$. In Fig. 1, "Original" denotes the time to compute the Hasse-Witt matrix by using the method in [3], and "This work" denotes the time to compute the Hasse-Witt matrix by using the proposed method in Section 3.2. Similarly, Fig. 2 shows the result for hyperelliptic curves of genus 3.

The result for genus 2 hyperelliptic curves shows that the proposed method can compute the Hasse-Witt matrix about 1.5 to 2.2 times faster than the previous method. The result for genus 3 hyperelliptic curves shows that the proposed method can compute the Hasse-Witt matrix about 1.6 to 2.0 times faster than the previous method. These results show that the proposed method computes the Hasse-Witt matrices more efficiently than the previous method.

However, the observed speedups of the proposed method over the previous method are smaller than the estimates in Section 3.2. One of the reasons for the difference is that the cost of multiplications in $\mathbb{Z}/p^j\mathbb{Z}$ is strongly affected by word operations rather than bit operations, because the bit length of $p$ is less than the size of a word on the CPU so that a multiplication in $\mathbb{Z}/p^j\mathbb{Z}$ is executed in a few words for $1 \le j \le 3$.

5. Conclusion

This paper proposes improvements in the computation of the Hasse-Witt matrix of a hyperelliptic curve using a linear recurrence. The proposed method uses the reversal of a polynomial in order to reduce the precision of the p-adic integers for computing the Hasse-Witt matrix, so that the proposed method can speed up the computation of the Hasse-Witt matrix. The experimental results show that the proposed method can compute the Hasse-Witt matrices of hyperelliptic curves of genus 2 about 1.5 to 2.2 times faster than the previous method [3], and those of hyperelliptic curves of genus 3 about 1.6 to 2.0 times faster than the previous method.

Fig. 1. Time for computing the Hasse-Witt matrices of hyperelliptic curves of genus 2 using Magma on AMD Opteron 246 2GHz. (Plot: timing in seconds, 0 to 6000, versus log2(p) in bits, 20 to 30; curves: Original, This work.)

Fig. 2. Time for computing the Hasse-Witt matrices of hyperelliptic curves of genus 3 using Magma on AMD Opteron 246 2GHz. (Plot: timing in seconds, 0 to 20000, versus log2(p) in bits, 20 to 30; curves: Original, This work.)

References

[1] P. Gaudry and R. Harley, Counting points on hyperelliptic curves over finite fields, in: Proc. of the 4th Int. Symposium on Algorithmic Number Theory, W. Bosma ed., Lect. Notes Comput. Sci., Vol. 1838, pp. 313–332, Springer-Verlag, Berlin, 2000.

[2] K. Matsuo, J. Chao and S. Tsujii, Baby step giant step algorithms in point counting of hyperelliptic curves, IEICE Trans., E86-A (2003), 1127–1134.

[3] A. Bostan, P. Gaudry and E. Schost, Linear recurrences with polynomial coefficients and computation of the Cartier-Manin operator on hyperelliptic curves, in: Proc. of Finite Fields and Applications: 7th Int. Conf., Fq7, G. L. Mullen, A. Poli, and H. Stichtenoth, eds., Lect. Notes Comput. Sci., Vol. 2948, pp. 40–58, Springer-Verlag, Berlin, 2004.

[4] M. Bauer, E. Teske and A. Weng, Point counting on Picard curves in large characteristic, Math. Comp., 74 (2005), 1983–2005.

[5] A. Bostan, P. Gaudry and E. Schost, Linear recurrences with polynomial coefficients and application to integer factorization and Cartier-Manin operator, SIAM J. Comput., 36 (2007), 1777–1806.

[6] D. V. Chudnovsky and G. V. Chudnovsky, Approximations and complex multiplication according to Ramanujan, in: Ramanujan Revisited, pp. 375–472, Academic Press, Boston, 1988.

[7] J. I. Manin, The Hasse-Witt matrix of an algebraic curve, Trans. AMS, 45 (1965), 245–264.

[8] L. Euler, Introduction to Analysis of the Infinite, Book I (translation by J. D. Blanton), Springer-Verlag, New York, 1988.

[9] P. Flajolet and B. Salvy, The SIGSAM challenges: symbolic asymptotics in practice, ACM SIGSAM Bull., 31 (1997), 36–47.

[10] J. von zur Gathen and J. Gerhard, Modern Computer Algebra, 2nd Edition, Cambridge Univ. Press, Cambridge, 2003.

[11] The Magma computational algebra system, http://magma.maths.usyd.edu.au/magma/.


On the convergence of the V-type hyperplane constrained method for singular value decomposition

Kenichi Yadani¹, Koichi Kondo² and Masashi Iwasaki³

¹ Graduate School of Informatics, Kyoto University, Yoshida-Hommachi, Sakyo-ku, Kyoto 606-8501, Japan

² Faculty of Science and Engineering, Doshisha University, 1-3, Tatara Miyakodani, Kyotanabe City, Kyoto 610-0394, Japan

³ Department of Informatics and Environmental Science, Kyoto Prefectural University, 1-5, Nagaragi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan

Received November 14, 2009, Accepted January 23, 2010

Abstract

In this paper, we investigate the convergence of the V-type hyperplane constrained method for singular value decomposition. The V-type method involves employing the Newton type iteration to solve the nonlinear systems with the searching range of right singular vectors constrained on a hyperplane. First, we discuss the nonsingularity of the Jacobian matrix appearing in the Newton type iteration. Next, we clarify the convergence of the Newton type iteration. Finally, we prove that singular value decomposition is computable by the V-type method.

Keywords singular value decomposition, nonlinear system, Newton’s iterative method

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

For a rectangular matrix $A \in \mathbb{R}^{m \times n}$ with $m \le n$, there exist suitable orthogonal matrices $U = (u_1 \cdots u_m) \in \mathbb{R}^{m \times m}$ and $V = (v_1 \cdots v_n) \in \mathbb{R}^{n \times n}$ such that

\[ A = U \Sigma V^T, \quad \Sigma = (S \ \ O_{m,n-m}), \quad S = (\sigma_i \delta_{ij})_{i,j=1,\ldots,m}, \tag{1} \]

where $V^T$, $O_{m,n-m}$ and $\delta_{ij}$ denote the transpose of $V$, the $m \times (n-m)$ zero matrix and the Kronecker delta, respectively. The nonnegative $\sigma_k$ are called singular values. The columns $u_k$ of $U$ and $v_k$ of $V$ are called left and right singular vectors, respectively. Eq. (1) is then called the singular value decomposition (SVD) of $A$. In this paper, for simplicity, let us call the triples $(\sigma_k, u_k, v_k)$ for $k = 1, \ldots, m$ the singular pairs. The SVD of $A$ with $m > n$ has the same form as that of $A^T$ with $m \le n$. So, it is enough to discuss the case where $m \le n$.

In [1], some of the authors proposed six types of SVD methods, collectively named SN-SVD. The six types of SN-SVD are based on solving the nonlinear systems related to SVD. Of them, three types employ the Newton type iteration for solving the nonlinear systems. The rest make use of the inverse iteration instead of the Newton type iteration. The methods using the Newton type iteration are named in [1] the PJU-type, the PJV-type and the SJ-type methods. In particular, the solution of every nonlinear system appearing in the PJU-type and the PJV-type methods is strictly constrained on a hyperplane. Hence, the PJU-type method is also called the hyperplane constrained method [2, 3]. The searching range of the left singular vector $u_k$ is directly constrained in the PJU-type. The PJV-type, however, differs from the PJU-type in that the searching range of the right singular vector $v_k$ is constrained. Although the PJU-type and the PJV-type methods should be precisely called the U-type and the V-type hyperplane constrained method, respectively, in this paper, we simply call them the U-type method and the V-type method.

Several numerical experiments in [1] demonstrate that the U-type and the V-type are superior to the other types with respect to numerical accuracy. Some convergence theorems of the U-type method are proved in [2]. It is shown in [3] that the U-type method generates a more accurate SVD than some standard methods. A hybrid method of the U-type method is also proposed. The V-type method, however, has not been investigated, except for the elementary numerical results. The aim of this paper is to theoretically discuss some of the basic properties related to the convergence of the V-type method.

2. The V-type Newton type iteration

In this section, we first explain an iterative method employed in the V-type method for a singular pair, and next we investigate some of its properties.

Let us begin our analysis by considering the nonlinear system,

\[ Av = \sigma u, \quad A^T u = \sigma v, \quad \|u\|_2 = 1, \quad \|v\|_2 = 1, \tag{2} \]

\[ A v_k = 0, \quad k = m+1, \ldots, n. \tag{3} \]

Note here that the solutions of (2) become the singular pairs $(\sigma_k, u_k, v_k)$ for $k = 1, \ldots, m$. Let $\mathcal{V}_k = \mathrm{span}(v_1, \ldots, v_k)$. Moreover, let $\mathcal{V}_k^\perp$ denote the orthogonal complement of $\mathcal{V}_k$. Then all of $v_k$ in (3) are given by an orthonormal basis of $\mathcal{V}_m^\perp$. For example, the singular vectors $v_{m+1}, \ldots, v_n$ are computable by employing the Gram-Schmidt method if $v_1, \ldots, v_m$ are as given in (2). Therefore, in this paper, we mainly discuss an iterative method for computing the solutions of (2).

JSIAM Letters Vol. 2 (2010) pp.21–24 Kenichi Yadani et al.

Let us replace (2) with the nonlinear system

\[ Av = \sigma u, \quad A^T u = \sigma v, \quad (z, v) = C, \tag{4} \]

with an arbitrary vector $z \in \mathbb{R}^n$ and a constant $C \in \mathbb{R} \setminus \{0\}$, where $(z, v) = z^T v$. Then we immediately have the following theorem.

Theorem 1 If $(z, v_k) \ne 0$ for $k = 1, \ldots, m$, the solutions of (4) are $(\sigma, u, v) = (\sigma_k, \alpha_k u_k, \alpha_k v_k)$ for $k = 1, \ldots, m$, where $\alpha_k = C/(z, v_k)$.

From the 2nd and the 3rd equations in (4), it is obvious that $\sigma$ is a function of $u$, namely, $\sigma = \sigma(u) = (w, u) C^{-1}$, $w = Az$. Hence, (4) yields the nonlinear system,

\[ H(x) = 0, \tag{5} \]

\[ H(x) := \begin{pmatrix} Av - \sigma(u) u \\ A^T u - \sigma(u) v \end{pmatrix}, \quad x := \begin{pmatrix} u \\ v \end{pmatrix}. \tag{6} \]
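The step from (4) to $\sigma(u) = (w, u) C^{-1}$ can be written out explicitly. Taking the inner product of the second equation in (4) with $z$ and then using the third equation gives

\[ (w, u) = (Az, u) = (z, A^T u) = (z, \sigma v) = \sigma (z, v) = \sigma C, \qquad \text{so} \qquad \sigma = \frac{(w, u)}{C}, \]

which is well defined because $C \ne 0$. In this way the unknown $\sigma$ is eliminated and only $x = (u, v)$ remains as the unknown in (5).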

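Applying Newton's iterative method to (5), we obtain the iteration,

\[ x^{(\ell+1)} = \Phi(x^{(\ell)}), \quad \ell = 0, 1, \ldots, \tag{7} \]

\[ x^{(\ell)} := \begin{pmatrix} u^{(\ell)} \\ v^{(\ell)} \end{pmatrix}, \tag{8} \]

\[ \Phi(x) := x - (J(x))^{-1} H(x), \tag{9} \]

\[ J(x) := \begin{pmatrix} -\sigma(u) I_m - u w^T C^{-1} & A \\ A^T - v w^T C^{-1} & -\sigma(u) I_n \end{pmatrix}. \tag{10} \]

Let $\Gamma$ be the hyperplane $\Gamma = \{ v \mid (z, v) = C \}$. Then a theorem for the existing range of $v^{(\ell)}$ is as follows.

Theorem 2 If $v^{(0)} \in \Gamma$, namely, $(z, v^{(0)}) = C$, then $v^{(\ell)} \in \Gamma$ for $\ell = 1, 2, \ldots$ in (7).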

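Proof From (7) and (9), it is obvious that $J(x^{(\ell)})(x^{(\ell+1)} - x^{(\ell)}) = -H(x^{(\ell)})$. Combining it with (6), (8) and (10), we derive

$$\begin{aligned}
\left( -\sigma(u^{(\ell)}) I_m - u^{(\ell)} w^T C^{-1} \right) u^{(\ell+1)} + A v^{(\ell+1)} &= -\sigma(u^{(\ell)}) u^{(\ell)}, \\
\left( A^T - v^{(\ell)} w^T C^{-1} \right) u^{(\ell+1)} - \sigma(u^{(\ell)}) v^{(\ell+1)} &= -\sigma(u^{(\ell)}) v^{(\ell)}.
\end{aligned} \tag{11}$$

The elimination of $v^{(\ell+1)}$ from the above two leads to $u^{(\ell+1)} = -\sigma(u^{(\ell)})(B + pq^T)^{-1} p$, where $B = AA^T - \sigma(u^{(\ell)})^2 I_m$, $p = Av^{(\ell)} + \sigma(u^{(\ell)}) u^{(\ell)}$ and $q = -wC^{-1}$. With the help of the Sherman-Morrison formula (cf. [4]),

$$(B + pq^T)^{-1} = \left( I_m - \frac{B^{-1} p q^T}{1 + (q, B^{-1}p)} \right) B^{-1},$$

we derive

$$u^{(\ell+1)} = \frac{-\sigma(u^{(\ell)}) \tilde{u}^{(\ell)}}{1 - (w, \tilde{u}^{(\ell)}) C^{-1}}, \quad \tilde{u}^{(\ell)} = B^{-1} p. \tag{12}$$

Substituting (12) into (11), we have

$$v^{(\ell+1)} = \frac{v^{(\ell)} - A^T \tilde{u}^{(\ell)}}{1 - (w, \tilde{u}^{(\ell)}) C^{-1}}. \tag{13}$$

Suppose that $v^{(\ell)} \in \Gamma$ for some $\ell$, namely, $(z, v^{(\ell)}) = C$. Then from (13) and $w = Az$, it follows that $(z, v^{(\ell+1)}) = (C - (w, \tilde{u}^{(\ell)}))/(1 - (w, \tilde{u}^{(\ell)})C^{-1}) = C$, which implies that $v^{(\ell+1)} \in \Gamma$. By induction, it is concluded that $v^{(\ell)} \in \Gamma$ for $\ell = 1, 2, \dots$ if $v^{(0)} \in \Gamma$.

(QED)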
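Theorem 2 can also be checked numerically on a small case. The following is a minimal sketch, not the authors' implementation: the 2x3 matrix `A`, the vector `z`, the starting point and the helper names `dot`, `solve` and `newton_step` are all illustrative assumptions, and the linear system is solved by a plain Gaussian elimination written out here so that only the standard library is needed. It performs a single step of (7) and evaluates $(z, v^{(1)})$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * y[k] for k in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y

def newton_step(A, z, C, u, v):
    """One step of (7): x <- x - J(x)^{-1} H(x), with H from (6) and J from (10)."""
    m, n = len(A), len(A[0])
    w = [dot(row, z) for row in A]                 # w = A z
    sigma = dot(w, u) / C                          # sigma(u) = (w, u) / C
    Av = [dot(row, v) for row in A]
    ATu = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
    H = [Av[i] - sigma * u[i] for i in range(m)] + \
        [ATu[j] - sigma * v[j] for j in range(n)]
    J = []
    for i in range(m):   # block row (-sigma I_m - u w^T / C,  A)
        J.append([-sigma * (i == k) - u[i] * w[k] / C for k in range(m)]
                 + list(A[i]))
    for j in range(n):   # block row (A^T - v w^T / C,  -sigma I_n)
        J.append([A[i][j] - v[j] * w[i] / C for i in range(m)]
                 + [-sigma * (j == k) for k in range(n)])
    d = solve(J, H)
    x_new = [xi - di for xi, di in zip(u + v, d)]
    return x_new[:m], x_new[m:]

A = [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0]]    # hypothetical test matrix, m = 2, n = 3
z, C = [1.0, 1.0, 1.0], 1.0
u0, v0 = [0.6, 0.8], [0.5, 0.3, 0.2]      # (z, v0) = C, so v0 lies on the hyperplane
u1, v1 = newton_step(A, z, C, u0, v0)
print(dot(z, v1))                          # agrees with C up to rounding error
```

The check relies on $J(x^{(0)})$ being nonsingular and $\sigma(u^{(0)}) \neq 0$, which holds for this particular data; for other inputs the elimination may break down.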

In [1], some of the authors propose the iteration

$$x^{(\ell+1)} = \Psi(\Phi(x^{(\ell)})), \quad \ell = 0, 1, \dots, \tag{14}$$

$$\Psi(x) := \frac{C}{(z, v)}\, x$$

in place of (7). We call the iteration (14) the V-type Newton type iteration. The function $\Psi$ is employed for scaling the singular vectors in order to keep $v^{(\ell)} \in \Gamma$ for $\ell = 1, 2, \dots$. However, it turns out from Theorem 2 that $\Psi$ acts as an identity mapping. In other words, the Newton type iteration (14) is theoretically reduced to the iteration (7).

Theorem 3 If $v^{(0)} \in \Gamma$, then the Newton type iteration (14) is equivalent to Newton's iteration (7) for $\ell = 0, 1, \dots$.

In the case of the Newton type iteration (14), it is necessary to compute the inverse of the Jacobian matrix $J(x)$. Accordingly, it is important to guarantee the nonsingularity of $J(x)$ and the boundedness of its inverse. Some properties of $J(x)$ are shown in the following theorems.

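Theorem 4 The Jacobian matrix $J(x)$ is nonsingular if and only if

$$\sigma(u)^{n-m} \left\{ -\sum_{i=1}^{m} \Bigl[ \sigma_i (v_i, z) \bigl( \sigma_i (v_i, v) + \sigma(u)(u_i, u) \bigr) \prod_{j \neq i} \bigl( \sigma_j^2 - \sigma(u)^2 \bigr) \Bigr] C^{-1} + \prod_{i=1}^{m} \bigl( \sigma_i^2 - \sigma(u)^2 \bigr) \right\} \neq 0,$$

where $\prod_{j \neq i}$ denotes the product over $j = 1, \dots, i-1, i+1, \dots, m$.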

Proof As a formula for the determinant of a block matrix, it is shown in [4] that

$$\det \begin{pmatrix} B_1 & B_2 \\ B_3 & B_4 \end{pmatrix} = \det(B_4) \det(B_1 - B_2 B_4^{-1} B_3). \tag{15}$$

Let $B_1 = -\sigma(u) I_m - u w^T C^{-1}$, $B_2 = A$, $B_3 = A^T - v w^T C^{-1}$ and $B_4 = -\sigma(u) I_n$ in (15); then the left-hand side is just equal to the determinant of $J(x)$ in (10). Hence it follows that

$$\det(J(x)) = (-1)^n \sigma(u)^{n-m} D, \tag{16}$$

$$D := \det\bigl( AA^T - \sigma(u)^2 I_m - (Av + \sigma(u)u) w^T C^{-1} \bigr).$$

Let $U^T u = \hat{u}$, $V^T v = \hat{v}$ and $V^T z = \hat{z}$. Note here that $AA^T = U S^2 U^T$, $Av = U \Sigma \hat{v}$ and $w = Az = U \Sigma \hat{z}$. By taking into account that $\det(UU^T) = 1$, we derive

$$D = \det\bigl( S^2 - \sigma(u)^2 I_m - (\Sigma \hat{v} + \sigma(u) \hat{u}) \hat{z}^T \Sigma^T C^{-1} \bigr).$$

Let $B = S^2 - \sigma(u)^2 I_m$, $p = \Sigma \hat{v} + \sigma(u) \hat{u}$ and $q = -\Sigma \hat{z} C^{-1}$. Then $D$ becomes $\det(B + pq^T)$. From the formula for rank one update (cf. [4]), namely, $\det(B + pq^T) = \det(B)(1 + (q, B^{-1}p))$, we have

$$D = \det(B) + (q, \mathrm{adj}(B) p), \tag{17}$$

where $\mathrm{adj}(B)$ is the adjugate matrix of $B$. Since it is obvious that $B$ is the diagonal matrix with the diagonal entries $\sigma_i^2 - \sigma(u)^2$, $\mathrm{adj}(B)$ is also diagonal with the $(i, i)$th element given by $\prod_{j \neq i} (\sigma_j^2 - \sigma(u)^2)$. Noting that the $i$th entries of $\hat{u}$, $\hat{v}$ and $\hat{z}$ are given by $(u_i, u)$, $(v_i, v)$ and $(v_i, z)$, respectively, the $i$th entries of $p$ and $q$ become $\sigma_i (v_i, v) + \sigma(u)(u_i, u)$ and $-\sigma_i (v_i, z) C^{-1}$, respectively. Hence, from (17), it follows that

$$D = -\sum_{i=1}^{m} \Bigl[ \sigma_i (v_i, z) \bigl( \sigma_i (v_i, v) + \sigma(u)(u_i, u) \bigr) \prod_{j \neq i} \bigl( \sigma_j^2 - \sigma(u)^2 \bigr) \Bigr] C^{-1} + \prod_{i=1}^{m} \bigl( \sigma_i^2 - \sigma(u)^2 \bigr). \tag{18}$$

Thus, it is proved that the theorem holds.

(QED)

In the case where $x = \alpha_k x_k$, we have a particular theorem concerning the nonsingularity of $J(x)$.

Theorem 5 If the singular values of $A$ are distinct and not zero, then $J(\alpha_k x_k)$ for $k = 1, \dots, m$ are nonsingular.

Proof Let $x = \alpha_k x_k$ in (16), namely, $u = \alpha_k u_k$ and $v = \alpha_k v_k$ where $\alpha_k = C/(z, v_k)$. By taking into account that $\sigma(\alpha_k u_k) = \sigma_k$, $(u_i, \alpha_k u_k) = \alpha_k \delta_{ik}$, $(v_i, \alpha_k v_k) = \alpha_k \delta_{ik}$ and $(v_k, z) = C/\alpha_k$ in (18), we have

$$\det(J(\alpha_k x_k)) = 2(-1)^{n+1} \sigma_k^{n-m+2} \prod_{j \neq k} (\sigma_j^2 - \sigma_k^2).$$

Since $\sigma_k \neq 0$ and $\sigma_j \neq \sigma_k$, it is obvious that the right-hand side is not zero. Hence $J(\alpha_k x_k)$ is nonsingular.

(QED)

The inequality $\|(J(\alpha_k x_k))^{-1}\| \le \|\mathrm{adj}(J(\alpha_k x_k))\| / |\det(J(\alpha_k x_k))|$ and the boundedness of $\mathrm{adj}(J(\alpha_k x_k))$ yield the following theorem for the boundedness of $(J(\alpha_k x_k))^{-1}$.

Theorem 6 If the singular values of $A$ are distinct and not zero, then there exists a positive constant $M_1$ such that

$$\|(J(\alpha_k x_k))^{-1}\| \le M_1. \tag{19}$$

It is well-known that Newton's iterative method has the quadratic convergence under some regularity assumptions. The convergence theorem for (7) or (14) is as follows.

Theorem 7 Suppose that $(z, v_k) \neq 0$, $v^{(0)} \in \Gamma$ and $J(x^{(\ell)})$ are nonsingular for $\ell = 0, 1, \dots$, where $x^{(0)}$ is close to $\alpha_k x_k$, then the vector sequence $\{x^{(\ell)}\}_{\ell=0}^{\infty}$ generated by (7) or (14) converges to $\alpha_k x_k$ quadratically as $\ell \to \infty$.

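Proof From Theorem 3, the iteration (14) is equivalent to (7). We prove the convergence of (7).

Let $\Delta x = (\Delta u^T \ \Delta v^T)^T$. Using (6), $H(x + \Delta x)$ can be rewritten as

$$H(x + \Delta x) = \begin{pmatrix} A(v + \Delta v) - \sigma(u + \Delta u)(u + \Delta u) \\ A^T(u + \Delta u) - \sigma(u + \Delta u)(v + \Delta v) \end{pmatrix}.$$

Note that $\sigma(u + \Delta u) = \sigma(u) + \sigma(\Delta u)$ and $\sigma(\Delta u) = (w^T C^{-1}) \Delta u$. Then it follows from (10) that

$$H(x + \Delta x) = H(x) + J(x) \Delta x - \sigma(\Delta u) \Delta x. \tag{20}$$

Moreover, $J(x + \Delta x)$ is expressed as

$$J(x + \Delta x) = J(x) - J'(\Delta x), \tag{21}$$

$$J'(\Delta x) := \sigma(\Delta u) I_{m+n} + \Delta x \, (w^T C^{-1} \ \ O_{1,n}). \tag{22}$$

Eq. (9) immediately gives us

$$\Phi(x + \Delta x) = x + \Delta x - (J(x + \Delta x))^{-1} H(x + \Delta x). \tag{23}$$

From (21), the inverse of $J(x + \Delta x)$ becomes

$$(J(x + \Delta x))^{-1} = [I_{m+n} - (J(x))^{-1} J'(\Delta x)]^{-1} (J(x))^{-1}. \tag{24}$$

The formula $(I - X)^{-1} = \sum_{i=0}^{\infty} X^i$ yields

$$(J(x + \Delta x))^{-1} = (J(x))^{-1} + (J(x))^{-1} J'(\Delta x)(J(x))^{-1} + [(J(x))^{-1} J'(\Delta x)]^2 (J(x))^{-1} + O(\|\Delta x\|^3).$$

Here, $O(\|\Delta x\|^k)$ denotes an $(m+n)$ dimensional vector whose entries are $O(\|\Delta x\|^k)$. Substituting (20) and (24) into (23), we derive

$$\begin{aligned}
\Phi(x + \Delta x) = {} & \Phi(x) - (J(x))^{-1} J'(\Delta x)(J(x))^{-1} H(x) \\
& + \sigma(\Delta u)(J(x))^{-1} \Delta x - (J(x))^{-1} J'(\Delta x) \Delta x \\
& - [(J(x))^{-1} J'(\Delta x)]^2 (J(x))^{-1} H(x) + O(\|\Delta x\|^3).
\end{aligned}$$

Let $x = \alpha_k x_k$, $\Delta x = \Delta x^{(\ell)} := x^{(\ell)} - \alpha_k x_k$ and noting that $H(\alpha_k x_k) = 0$, $\Phi(\alpha_k x_k) = \alpha_k x_k$, then from (22), we have

$$\Phi(x^{(\ell)}) = \alpha_k x_k + \Delta\Phi^{(\ell)}, \tag{25}$$

where $\Delta\Phi^{(\ell)} = -\sigma(\Delta u^{(\ell)})(J(\alpha_k x_k))^{-1} \Delta x^{(\ell)} + O(\|\Delta x^{(\ell)}\|^3)$ and $\Delta x^{(\ell)} = ((\Delta u^{(\ell)})^T \ (\Delta v^{(\ell)})^T)^T$.

Note here that there exists a positive constant $M_2$ such that

$$\left| \sigma(\Delta u^{(\ell)}) \right| = \left| \begin{pmatrix} w \\ O_{n,1} \end{pmatrix}^T \Delta x^{(\ell)} C^{-1} \right| \le M_2 \|\Delta x^{(\ell)}\|. \tag{26}$$

Therefore, from (7), (19), (25) and (26), it is concluded that $\|x^{(\ell+1)} - \alpha_k x_k\| \le M_1 M_2 \|x^{(\ell)} - \alpha_k x_k\|^2$. This proves the quadratic convergence of (7).

(QED)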

Let us recall that, from Theorem 2, $v^{(\ell)}$ for all $\ell$ given in (7) lie on $\Gamma$. On a computer, however, $v^{(\ell)}$ are not always on $\Gamma$ because of numerical error. Let us here introduce the following Newton type iteration

$$x^{(\ell+1)} = \Psi(\Phi(x^{(\ell)}) + e^{(\ell)}), \quad \ell = 0, 1, \dots, \tag{27}$$

where $e^{(\ell)} \in \mathbb{R}^{m+n}$. In (27), $\Psi$ plays an important role for keeping $v^{(\ell)} \in \Gamma$. The definition of $\Psi$ immediately leads to the following theorem.

Theorem 8 The vectors $v^{(\ell)}$ for $\ell = 1, 2, \dots$ in (27) satisfy $v^{(\ell)} \in \Gamma$.

It is worth noting that (27) does not have the same convergence as (7) or (14). Therefore, we obtain the following theorem for (27).

Theorem 9 Suppose that $(z, v_k) \neq 0$ and $J(x^{(\ell)})$ are nonsingular for $\ell = 0, 1, \dots$. Let $M_3$ be a positive constant such that

$$\left\| I_{m+n} - \alpha_k x_k \tilde{z}^T C^{-1} \right\| \le M_3, \tag{28}$$

where $\tilde{z} = (O_{m,1}^T \ z^T)^T$. If $x^{(0)}$ is close to $\alpha_k x_k$ and $e^{(\ell)}$ satisfies the condition that either (i) there exists a positive constant $M_4$ such that $\|e^{(\ell)}\| \le M_4 \|x^{(\ell)} - \alpha_k x_k\|$ for $\ell = 0, 1, \dots$ where $M_3 M_4 < 1$, or (ii) there exists a positive constant $M_5$ such that $\|e^{(\ell)}\| \le M_5 \|x^{(\ell)} - \alpha_k x_k\|^2$ for $\ell = 0, 1, \dots$, then the vector sequence $\{x^{(\ell)}\}_{\ell=0}^{\infty}$ generated by (27) converges to $\alpha_k x_k$ as $\ell \to \infty$. The order of convergence is either linear if (i) holds, or quadratic if (ii) holds.

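Proof Let us evaluate $\Psi(x + \Delta x)$. Since $(z, v + \Delta v) = (z, v)[1 + (z, \Delta v)/(z, v)]$, it holds that

$$\frac{C}{(z, v + \Delta v)} = \frac{C}{(z, v)} \left[ 1 - \frac{(z, \Delta v)}{(z, v)} + O(\|\Delta x\|^2) \right].$$

Then we obtain

$$\Psi(x + \Delta x) = \frac{C}{(z, v)} \left[ x + \left( I_{m+n} - \frac{x}{(z, v)} \tilde{z}^T \right) \Delta x \right] + O(\|\Delta x\|^2). \tag{29}$$

Substituting $x = \alpha_k x_k$ and $\Delta x = \Delta\Phi^{(\ell)} + e^{(\ell)}$ into (29) and using (25), $(z, \alpha_k v_k) = C$ and the assumption for $e^{(\ell)}$, we derive

$$\begin{aligned}
\Psi(\Phi(x^{(\ell)}) + e^{(\ell)}) = {} & \alpha_k x_k + (I_{m+n} - \alpha_k x_k \tilde{z}^T C^{-1}) \left[ -(J(\alpha_k x_k))^{-1} \Delta x^{(\ell)} \sigma(\Delta u^{(\ell)}) + e^{(\ell)} \right] \\
& + O(\|\Delta x^{(\ell)}\|^3 + \|e^{(\ell)}\|^2).
\end{aligned} \tag{30}$$

If $e^{(\ell)}$ satisfies either (i) or (ii), then it follows from (19), (26), (27), (28) and (30) that $\|x^{(\ell+1)} - \alpha_k x_k\| \le M_3 M_4 \|x^{(\ell)} - \alpha_k x_k\|$, or $\|x^{(\ell+1)} - \alpha_k x_k\| \le M_3 (M_1 M_2 + M_5) \|x^{(\ell)} - \alpha_k x_k\|^2$, which implies the linear or the quadratic convergence of (27).

(QED)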

3. The V-type hyperplane constrained method

In this section, we show the convergence of the SVD method by using the V-type Newton type iteration sequentially.

According to the following theorem, it turns out that the normal vector $z$ of the hyperplane $\Gamma$ is an important parameter that determines which singular pairs appear as solutions of (4).

Theorem 10 If $z$ satisfies $(z, v_i) = 0$ for $i = 1, \dots, k$, then the solutions of (4) are $(\sigma, u, v) = (\sigma_i, \alpha_i u_i, \alpha_i v_i)$ for $i = k+1, \dots, m$.

Proof Suppose that $z$ satisfies $(z, v_i) = 0$ for $i = 1, \dots, k$. Then the pairs $(\sigma, u, v) = (\sigma_i, \alpha_i u_i, \alpha_i v_i)$ do not satisfy (4) for $i = 1, \dots, k$, since $C \neq 0$. On the other hand, $(\sigma, u, v) = (\sigma_i, \alpha_i u_i, \alpha_i v_i)$ for $i = k+1, \dots, m$ are the solutions of (4).

(QED)

In general, for Newton's iterative method, it is not easy to determine the initial values for computing all solutions. Theorem 10 implies that if we select $z$ from a suitable subspace, then we are able to restrict the limit of the vector sequence generated by the V-type Newton type iteration.

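Theorem 11 Suppose that there exists the limit $x^* = ((u^*)^T \ (v^*)^T)^T$ of $\{x^{(\ell)}\}_{\ell=0}^{\infty}$ generated by either (14) or (27), for $z \in \mathcal{V}_k^{\perp}$, then $v^* \notin \mathcal{V}_k$ for any initial value $x^{(0)}$.

Proof Since $z \in \mathcal{V}_k^{\perp}$ and $C \neq 0$, it holds that $\Gamma \cap \mathcal{V}_k = \emptyset$. Let us recall that $v^{(\ell)} \in \Gamma$ in either Theorem 2 or 8. Then the completeness of $\Gamma$ gives us $v^* \in \Gamma$. Hence it is concluded that $v^* \notin \mathcal{V}_k$.

(QED)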

In other words, all singular pairs are computable by the V-type Newton type iteration if $z$ is given from the orthogonal complement of the subspace spanned by the already computed right singular vectors. Accordingly, an algorithm of the V-type hyperplane constrained method is as follows.

1. for $k = 1, \dots, m$
2.   Randomly select $u^{(0)} \in \mathbb{R}^m$, $v^{(0)} \in \mathbb{R}^n$, $z \in \mathcal{V}_{k-1}^{\perp}$.
3.   Compute a solution $u^*$, $v^*$ of (5) by the V-type Newton type iteration.
4.   If the V-type Newton type iteration does not converge, then go to 2.
5.   Set $(\sigma_k, u_k, v_k) = (\sigma(u^*), u^*/\|u^*\|_2, v^*/\|v^*\|_2)$.
6. end for
7. Compute the orthonormal basis of $\mathcal{V}_m^{\perp}$ and adopt them as $v_{m+1}, \dots, v_n$.

With the help of Theorem 11, we obtain a theorem for the convergence of the V-type hyperplane constrained method for SVD.

Theorem 12 If all the V-type Newton type iterations converge, then the V-type hyperplane constrained method for SVD computes the SVD of $A$.

4. Conclusion

In this paper, we have discussed the convergence of the V-type hyperplane constrained method for SVD, which employs the V-type Newton type iteration for computing each singular pair. First, we have clarified the nonsingularity of the Jacobian matrix appearing in the Newton type iteration, and then shown the convergence of the Newton type iteration without and with numerical error.

Next, we have proved that the computed right singular vector differs from any of those obtained in the preceding iteration steps, if the normal vector of the hyperplane is selected from the orthogonal complement of the subspace spanned by the right singular vectors from the preceding steps. Finally, we have reached the conclusion that the SVD is computable by the V-type hyperplane constrained method.

Acknowledgments

The authors would like to thank the reviewer for his or her careful reading and insightful suggestions.

References

[1] K. Kondo, S. Sugimoto and M. Iwasaki, An SVD algorithm based on solving nonlinear systems (in Japanese), Trans. JSIAM, 19 (2009), 81–103.

[2] K. Yadani, K. Kondo and M. Iwasaki, A singular value decomposition algorithm based on solving hyperplane constrained nonlinear systems, Appl. Math. Comput., 216 (2010), 779–790.

[3] K. Yadani, K. Kondo and M. Iwasaki, Numerical performance of hyperplane constrained method and its hybrid method for singular value decomposition, submitted.

[4] R. Piziak and P. L. Odell, Matrix Theory: From Generalized Inverses to Jordan Form, Chapman & Hall/CRC, Boca Raton, 2007.



Mixed double-multiple precision version of hyperplane constrained method for singular value decomposition

Kenichi Yadani¹, Koichi Kondo² and Masashi Iwasaki³

¹ Graduate School of Informatics, Kyoto University, Yoshida-Hommachi, Sakyo-ku, Kyoto 606-8501, Japan

² Faculty of Science and Engineering, Doshisha University, 1-3, Tatara Miyakodani, Kyotanabe City, Kyoto 610-0394, Japan

³ Department of Informatics and Environmental Science, Kyoto Prefectural University, 1-5, Nagaragi-cho, Shimogamo, Sakyo-ku, Kyoto 606-8522, Japan

E-mail yadani amp.i.kyoto-u.ac.jp

Received December 19, 2009, Accepted January 24, 2010

Abstract

In this paper, we design a mixed double-multiple precision version of the hyperplane constrained method for singular value decomposition (SVD), which is based on solving nonlinear systems with the solutions constrained on hyperplanes. We also propose its hybrid method in order to shorten the running time. Through some numerical examples for matrices with small singular values, it is shown that, with the new versions, the SVD is computable with high relative accuracy.

Keywords mixed precision, singular value decomposition, hyperplane constrained method


1. Introduction

The singular value decomposition (SVD) of a rectangular matrix $A \in \mathbb{R}^{m \times n}$ with $m \le n$ is defined as

$$A = U \Sigma V^T, \quad \Sigma = (\mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_m) \ \ O_{m,n-m}),$$

$$U = (u_1 \ u_2 \ \cdots \ u_m), \quad V = (v_1 \ v_2 \ \cdots \ v_n),$$

where $\sigma_k$ are positive, and $U$, $V$ are orthogonal square matrices. We call $\sigma_k$, $u_k$ and $v_k$ the singular value, the left singular vector and the right singular vector, respectively. The pairs $(\sigma_k, u_k, v_k)$ for $k = 1, 2, \dots, m$ are called singular pairs for simplicity.

In [1], a numerical SVD method named the hyperplane constrained method is proposed. This method generates the SVD of a given matrix by solving nonlinear systems whose solutions are constrained on hyperplanes. In [2, 3], we show the convergence of the hyperplane constrained method. In [4], we demonstrate that the computed SVD is highly accurate in terms of residual error. We also propose the hybrid method of the hyperplane constrained method and another fast SVD method in order to shorten the running time. The purpose of this paper is to propose mixed double-multiple precision versions of the hyperplane constrained method and its hybrid method for obtaining a more accurate SVD of a matrix given in double precision format.

In Section 2, we first explain the original version of the hyperplane constrained method. By numerical examples, it is shown that the singular values computed by the original version are highly accurate in terms of absolute error. In double precision arithmetic, however, it is expected that the computed singular values do not always have high relative accuracy. In Section 3, we design a mixed precision version of the hyperplane constrained method. Some numerical examples are also shown. In Section 4, we propose a mixed precision version of the hybrid method, which combines the hyperplane constrained method with another fast SVD method. Through numerical examples, it is observed that, in mixed precision arithmetic, the hybrid method runs faster than the hyperplane constrained method while keeping the accuracy. In Section 5, we conclude this paper.

2. Hyperplane constrained method

We start by introducing the original hyperplane constrained method for SVD proposed in [1, 2, 4].

The original version employs the Newton type iteration for solving the nonlinear system with arbitrary parameters $C \in \mathbb{R} \setminus \{0\}$ and $z \in \mathbb{R}^m$,

$$H(u, v) = 0, \tag{1}$$

where

$$H(u, v) = \begin{pmatrix} Av - \sigma(v)u \\ A^T u - \sigma(v)v \end{pmatrix}, \quad \sigma(v) = \frac{w^T v}{C}, \quad w = A^T z. \tag{2}$$

Here, the solution of (1) becomes $(u, v) = (\alpha_k u_k, \alpha_k v_k)$ with $\alpha_k = C/(z, u_k)$, and then $\sigma(\alpha_k v_k)$ becomes the singular value of $A$. Note that $\alpha_k u_k$ lies on the hyperplane $\Gamma = \{ u \mid (z, u) = C \}$. One of the singular pairs is then given by the following Newton type iteration.

1. function (σ, u, v, ℓ) = hppair(A, z, u, v, C, ℓmax)
2. w := A^T z
3. α := C/(z^T u); u := αu; v := αv
4. σ := w^T v/C
5. ℓ := 0
6. while (ℓ ≤ ℓmax) do
7.   H := $\begin{pmatrix} Av - \sigma u \\ A^T u - \sigma v \end{pmatrix}$
8.   J := $\begin{pmatrix} -\sigma I_m & A - u w^T C^{-1} \\ A^T & -\sigma I_n - v w^T C^{-1} \end{pmatrix}$
9.   e := J^{-1} H
10.   $\begin{pmatrix} u \\ v \end{pmatrix} := \begin{pmatrix} u \\ v \end{pmatrix} - e$
11.   α := C/(z^T u); u := αu; v := αv
12.   σ := w^T v/C
13.   if (‖e‖∞ < 2^{-46} ‖u‖∞) then break; end if
14.   ℓ := ℓ + 1
15. end while
16. u := u/‖u‖2; v := v/‖v‖2; σ := u^T (Av)
17. if (σ < 0) then σ := −σ; v := −v; end if

Moreover, using hppair with suitable z, all singularpairs are computed as follows.

1. function (Σ, U, V) = hpsvd(A)
2. C := 1; ℓmax := 30
3. for k = 1, 2, . . . , m do
4.   z := select($\mathcal{U}_{k-1}^{\perp}$)
5.   u := select($\mathcal{U}_{k-1}^{\perp}$); v := select($\mathcal{V}_{k-1}^{\perp}$)
6.   (σ_k, u_k, v_k, ℓ) := hppair(A, z, u, v, C, ℓmax)
7.   if (ℓ > ℓmax) then
8.     u_k := gs(u_k, $\mathcal{U}_{k-1}$); v_k := gs(v_k, $\mathcal{V}_{k-1}$)
9.     σ_k := u_k^T (A v_k)
10.   end if
11. end for
12. (v_{m+1}, v_{m+2}, . . . , v_n) := kernel($\mathcal{V}_m$)
13. Σ := (diag(σ_1, σ_2, . . . , σ_m) O_{m,n−m})
14. U := (u_1 u_2 · · · u_m); V := (v_1 v_2 · · · v_n)

Let $\mathcal{U}_k = \mathrm{span}(u_1, u_2, \dots, u_k)$ and $\mathcal{V}_k = \mathrm{span}(v_1, v_2, \dots, v_k)$. The functions select($\mathcal{U}_{k-1}^{\perp}$) and select($\mathcal{V}_{k-1}^{\perp}$) generate vectors that are randomly selected from the orthogonal complements of $\mathcal{U}_{k-1}$ and $\mathcal{V}_{k-1}$, respectively, with the help of the Gram-Schmidt process. The functions gs($u_k$, $\mathcal{U}_{k-1}$) and gs($v_k$, $\mathcal{V}_{k-1}$) orthonormalize $u_k$ and $v_k$ to $u_1, u_2, \dots, u_{k-1}$ and $v_1, v_2, \dots, v_{k-1}$ by the Gram-Schmidt process, respectively. The function kernel($\mathcal{V}_m$) returns the orthonormal basis of the orthogonal complement of $\mathcal{V}_m$. It is proved in [2] that the iteration from the 6th line to the 15th line in hppair has quadratic convergence and the SVD of $A$ is computable by hpsvd.
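The role of select can be illustrated with a short sketch. The paper does not give its implementation, so the code below is only one plausible reading: draw a random vector, remove its components along the already computed (orthonormal) vectors by Gram-Schmidt, and normalize. The function name `select`, the two-pass orthogonalization and the retry threshold are assumptions made for this illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def select(basis, dim, rng):
    """Random unit vector in the orthogonal complement of span(basis).

    `basis` is assumed orthonormal and to contain fewer than `dim` vectors.
    """
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(2):                 # a second pass reduces rounding error
            for q in basis:                # Gram-Schmidt: remove the component along q
                c = dot(q, x)
                x = [xi - c * qi for xi, qi in zip(x, q)]
        norm = math.sqrt(dot(x, x))
        if norm > 1e-8:                    # retry in the unlikely degenerate case
            return [xi / norm for xi in x]

rng = random.Random(0)
U_prev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # stands in for u_1, u_2 (illustrative)
z = select(U_prev, 3, rng)                     # normal vector for the next hyperplane
```

Choosing $z$ this way makes $(z, u_i) = 0$ for the vectors already found, so the corresponding singular pairs cannot satisfy the hyperplane constraint with $C \neq 0$.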

We next show some numerical examples for hpsvd. We here prepare the test matrices $A_1$ and $A_2$ with double precision as $A_i = U (S_i \ \ O_{m,n-m}) V^T$, where

$$S_1 = \mathrm{diag}\left( 1, \ \varepsilon + \frac{(m-2)(1-\varepsilon)}{m-1}, \ \dots, \ \varepsilon + \frac{1-\varepsilon}{m-1}, \ \varepsilon \right),$$

$$S_2 = \mathrm{diag}\left( 1, \ \varepsilon^{1/(m-1)}, \ \dots, \ \varepsilon^{(m-2)/(m-1)}, \ \varepsilon \right),$$

and the orthogonal matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$ are randomly given. Let us set $m = 50$, $n = 100$ and $\varepsilon = 10^{-13}$. Then the condition number, namely, the ratio of the maximal singular value to the minimal one, for $A_i$ becomes $10^{13}$. With respect to the numerical error, we compare our hpsvd and the routine dgesvd in LAPACK [5]. Let $E_H$ be the residual error defined as

$$E_H = \left\| \begin{pmatrix} Av - \sigma u \\ A^T u - \sigma v \end{pmatrix} \right\|_{\infty}.$$

Moreover, let $E_\sigma$ and $E_\sigma^r$ be the absolute error and the relative error of the computed singular value $\sigma$, namely, $E_\sigma = |\sigma - \sigma^*|$, $E_\sigma^r = |\sigma - \sigma^*|/\sigma^*$, where $\sigma^*$ denotes the singular value with double precision format given by casting the result from performing the hyperplane constrained method in 224-bit precision arithmetic. We use the multiple precision arithmetic library GMP 4.2.2 [6]. Numerical experiments are carried out on our computer with CPU: Intel Core i7 3.20GHz, memory: 3GB, OS: Linux kernel 2.6.26, compiler: gcc 4.3.2 and LAPACK 3.2.1.

Figs. 1-(a) and 2-(a) show the residual errors $E_H$. In dgesvd, the largest $E_H$ for $A_1$ and $A_2$ becomes $2^{-49}$. In hpsvd, by contrast, all $E_H$ for both matrices are smaller than $2^{-51}$. Figs. 1-(b) and 2-(b) give the graphs of $E_\sigma$. Though in dgesvd $\max E_\sigma = 2^{-49}$ for $A_1$ and $\max E_\sigma = 2^{-51}$ for $A_2$, in hpsvd $\max E_\sigma = 2^{-53}$ for both matrices. It is remarkable that all singular values computed by hpsvd have 52-bit accuracy in terms of absolute error.

Figs. 1-(c) and 2-(c) show the graphs of $E_\sigma^r$. We observe that, in hpsvd, the relative error of the computed small singular values is not small in spite of the fact that $E_H$ and $E_\sigma$ are small. Let us recall that neither matrix has a small condition number. In double precision arithmetic, it seems to be not easy to compute the singular values of such matrices with 53-bit relative accuracy.
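The gap between absolute and relative accuracy follows from simple arithmetic. As a rough illustration (the helper name `relative_bits` is made up for this sketch), an absolute error of $2^{-53}$ on a singular value of size $10^{-13}$ leaves only about ten correct bits in the relative sense:

```python
import math

def relative_bits(abs_err, sigma):
    """Number of correct leading bits implied by an absolute error on sigma."""
    return -math.log2(abs_err / sigma)

abs_err = 2.0 ** -53                       # absolute error level reported for hpsvd
for sigma in (1.0, 1e-6, 1e-13):           # largest to smallest singular value scale
    print(sigma, relative_bits(abs_err, sigma))
```

For $\sigma = 1$ the two notions coincide, while for $\sigma = \varepsilon = 10^{-13}$ the relative error is of order $10^{-3}$.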

3. Mixed precision version

In this section, we propose a mixed precision version of the hyperplane constrained method for SVD. The key point is not to fix the precision of arithmetic, but to switch between double and multiple precision at suitable points. The mixed precision version of hppair is as follows.

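1. function (σ, u, v) = hppair_mix_*(A, z, u, v, C, ℓmax)
2. $\bar{A}$ := mp(A); $\bar{z}$ := mp(z); $\bar{C}$ := mp(C)
3. $\bar{w} := \bar{A}^T \bar{z}$; w := double($\bar{w}$)
4. $\bar{u}$ := mp(u); $\bar{v}$ := mp(v)
5. $\bar{\alpha} := \bar{C}/(\bar{z}^T \bar{u})$; $\bar{u} := \bar{\alpha}\bar{u}$; $\bar{v} := \bar{\alpha}\bar{v}$
6. u := double($\bar{u}$); v := double($\bar{v}$)
7. $\bar{\sigma} := \bar{w}^T \bar{v}/\bar{C}$; σ := double($\bar{\sigma}$)
8. ℓ := 0
9. while (ℓ ≤ ℓmax) do
10.   $\bar{H} := \begin{pmatrix} \bar{A}\bar{v} - \bar{\sigma}\bar{u} \\ \bar{A}^T \bar{u} - \bar{\sigma}\bar{v} \end{pmatrix}$; H := double($\bar{H}$)
11.   J := $\begin{pmatrix} -\sigma I_m & A - u w^T C^{-1} \\ A^T & -\sigma I_n - v w^T C^{-1} \end{pmatrix}$
12.   e := J^{-1} H; $\bar{e}$ := mp(e)
13.   $\begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix} := \begin{pmatrix} \bar{u} \\ \bar{v} \end{pmatrix} - \bar{e}$
14.   $\bar{\alpha} := \bar{C}/(\bar{z}^T \bar{u})$; $\bar{u} := \bar{\alpha}\bar{u}$; $\bar{v} := \bar{\alpha}\bar{v}$
15.   u := double($\bar{u}$); v := double($\bar{v}$)
16.   $\bar{\sigma} := \bar{w}^T \bar{v}/\bar{C}$; σ := double($\bar{\sigma}$)
17.   E_H := ‖H‖∞/‖u‖∞
18.   if (‖e‖∞ < 2^{-53} ‖u‖∞) then break; end if
19.   ℓ := ℓ + 1
20. end while
21. u := u/‖u‖2; v := v/‖v‖2
22. if (σ < 0) then σ := −σ; v := −v; end if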

Here, the variables with bar and without bar are stored as multiple and double precision floating point numbers, respectively. The function mp casts a double precision number into a multiple precision one, while the function double reduces a multiple precision number to a double precision one. Of course, double discards some bits of the multiple precision number. The input and output of hppair_mix_* are in double precision. In this paper, we discuss two cases where the significand of the multiple precision number has 64 bits and 96 bits. As the multiple precision arithmetic, hppair_mix_64 and hppair_mix_96 employ 64-bit precision arithmetic and 96-bit one, respectively.

The vector $H$ is computed with multiple precision, and then reduced to the double precision format. It takes $O((m+n)^3)$ operations for computing $e$ such that $e = J^{-1}H$. This process, which is the dominant part of hppair, is carried out with double precision. The computed $e$ is cast to the multiple precision format, and then $u$ and $v$ are updated as multiple precision vectors.
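The same precision-switching pattern can be shown on a much simpler problem. The sketch below is not hppair_mix_*; it applies the idea to a hypothetical 2x2 linear system, with Python's exact `Fraction` type standing in for multiple precision (mp), `float` standing in for double, and Cramer's rule standing in for the expensive double precision solve. Only the residual and the update are carried out in high precision.

```python
from fractions import Fraction

A = [[4.0, 1.0], [1.0, 3.0]]      # hypothetical double precision data
b = [1.0, 2.0]                    # exact solution is (1/11, 7/11)

def solve2_double(A, r):
    """Solve the 2x2 system A d = r in double precision (Cramer's rule)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * r[0] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]

def refine(A, b, steps):
    A_mp = [[Fraction(a) for a in row] for row in A]    # plays the role of mp(A)
    b_mp = [Fraction(t) for t in b]
    x_mp = [Fraction(0), Fraction(0)]                   # iterate kept in high precision
    for _ in range(steps):
        # residual in high precision, then rounded to double
        r_mp = [b_mp[i] - sum(A_mp[i][j] * x_mp[j] for j in range(2))
                for i in range(2)]
        r = [float(t) for t in r_mp]
        d = solve2_double(A, r)                         # dominant cost, double precision
        x_mp = [x_mp[i] + Fraction(d[i]) for i in range(2)]   # high precision update
    return x_mp

x = refine(A, b, 3)
err = max(abs(x[0] - Fraction(1, 11)), abs(x[1] - Fraction(7, 11)))
print(float(err))                 # far below the double precision unit roundoff
```

Each pass gains roughly as many digits as the double precision solve delivers, which is why the correction $e$ itself never needs to be computed in multiple precision.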

By replacing hppair with hppair_mix_* in hpsvd, we have the following routine.

1. function (Σ, U, V) = hpsvd_mix_*(A)
2. C := 1; ℓmax := 30
3. for k = 1, 2, . . . , m do
4.   z := select($\mathcal{U}_{k-1}^{\perp}$)
5.   u := select($\mathcal{U}_{k-1}^{\perp}$); v := select($\mathcal{V}_{k-1}^{\perp}$)
6.   (σ_k, u_k, v_k) := hppair_mix_*(A, z, u, v, C, ℓmax)
7. end for
8. (v_{m+1}, v_{m+2}, . . . , v_n) := kernel($\mathcal{V}_m$)
9. Σ := (diag(σ_1, σ_2, . . . , σ_m) O_{m,n−m})
10. U := (u_1 u_2 · · · u_m); V := (v_1 v_2 · · · v_n)

It is emphasized here that hpsvd_mix_* requires multiple precision arithmetic only in hppair_mix_*. Let us recall that hpsvd is equipped with an additional process for orthogonalizing the computed singular vectors in order to improve their accuracy. The routine hpsvd_mix_* differs from the original hpsvd in that it does not employ the orthogonalization process.

Let $E_u = \|u - u^*\|_{\infty}/\|u^*\|_2$ and $E_v = \|v - v^*\|_{\infty}/\|v^*\|_2$ be the relative errors of the computed singular vectors $u$ and $v$, respectively. Similar to $\sigma^*$ in Section 2, we compute $u^*$ and $v^*$ with the help of 224-bit precision arithmetic. With respect to $E_\sigma^r$, $E_u$ and $E_v$, we compare the mixed precision hpsvd_mix_* with the original hpsvd, the LAPACK routine dgesvd and the routine iisvd in [4] based on the inverse iteration method.

Figs. 1-(c) and 2-(c) illustrate the graphs of the relative error $E_\sigma^r$. In hpsvd_mix_64, some of the computed $E_\sigma^r$ for $A_2$ are larger than $2^{-53}$. In hpsvd_mix_96, all $E_\sigma^r$ for both $A_1$ and $A_2$ become less than $2^{-53}$. Figs. 1-(d), 2-(d) and 1-(e), 2-(e) show the errors $E_u$ and $E_v$, respectively. In hpsvd_mix_64, some $E_u$ for $A_2$ and some $E_v$ for $A_1$, $A_2$ are larger than $2^{-53}$. In hpsvd_mix_96, all $E_u$ and $E_v$ for $A_1$ and $A_2$ are smaller than $2^{-53}$. It is concluded that all singular pairs computed by hpsvd_mix_96 have 53-bit accuracy. In hpsvd, dgesvd and iisvd, most of $E_u$ and $E_v$ for $A_1$ and $A_2$ are larger than $2^{-53}$.

4. Mixed precision hybrid version

In [4], the hybrid method is designed by combining the hyperplane constrained method with another fast SVD method. Based on the routine of the hybrid method, we give the following routine as its mixed precision version.

1. function (Σ, U, V) = hybridsvd_mix_*(A)
2. C := 1; ℓmax := 20
3. (σ_1, . . . , σ_m, u_1, . . . , u_m, v_1, . . . , v_m) := fast_svd(A)
4. for k = 1, 2, . . . , m do
5.   z := select($\mathcal{U}_{k-1}^{\perp}$)
6.   (σ_k, u_k, v_k) := hppair_mix_*(A, z, u_k, v_k, C, ℓmax)
7. end for
8. (v_{m+1}, v_{m+2}, . . . , v_n) := kernel($\mathcal{V}_m$)
9. Σ := (diag(σ_1, σ_2, . . . , σ_m) O_{m,n−m})
10. U := (u_1 u_2 · · · u_m); V := (v_1 v_2 · · · v_n)

Here the original hybrid routine hybridsvd shown in [4] is given by replacing hppair_mix_* with hppair in hybridsvd_mix_*. The routine hybridsvd_mix_* begins by solving the SVD roughly with a fast SVD routine. In this paper, we adopt the LAPACK routine dgesvd as fast_svd. The singular pairs computed by fast_svd are used as the initial guesses of hppair_mix_*. The function hppair_mix_* allows us to improve the accuracy of the singular pairs computed by fast_svd. The approximate singular vectors from fast_svd also play a key role in reducing the number of iterations from the 9th line to the 20th line in hppair_mix_*. Similar to hpsvd_mix_*, the mixed precision hybrid version hybridsvd_mix_* does not require multiple precision arithmetic except for hppair_mix_*.

From Figs. 1-(c), 1-(f), 2-(c) and 2-(f), it turns out that the graphs of $E_\sigma^r$ in hybridsvd_mix_64 and hybridsvd_mix_96 are almost the same as those of hpsvd_mix_64 and hpsvd_mix_96, respectively. Also, the graphs of $E_u$ and $E_v$ in hybridsvd_mix_* become similar to those of hpsvd_mix_*. Table 1 shows the average number of iterations, for computing 50 singular pairs, from the 9th line to the 20th line in hppair_mix_* and the running time. The iteration number and the running time of hybridsvd_mix_96 for both matrices are less than those of hpsvd_mix_96. It is concluded that hybridsvd_mix_96 is a speed-up version derived from hpsvd_mix_96 with virtually no change in accuracy.

5. Conclusion

In this paper, the mixed double-multiple precision version of the hyperplane constrained method and its hybrid method are proposed. Numerically, it is observed that the SVD computed by the new versions is accurate to double precision in the relative sense.

References

[1] K. Kondo, S. Sugimoto and M. Iwasaki, An SVD algorithm based on solving nonlinear systems (in Japanese), Trans. JSIAM, 19 (2009), 81–103.

[2] K. Yadani, K. Kondo and M. Iwasaki, A singular value decomposition algorithm based on solving hyperplane constrained nonlinear systems, Appl. Math. Comput., 216 (2010), 779–790.


[Figure 1: six panels (a)–(f); horizontal axis from 0 to 1, vertical axes on a logarithmic scale in powers of 2.]

Fig. 1. Graphs of singular values (horizontal axis) and the values of $E_H$ in (a), $E_\sigma$ in (b), $E_\sigma^r$ in (c) and (f), $E_u$ in (d) and $E_v$ in (e) (vertical axis with logarithmic scale) for the SVD of $A_1$. Methods plotted: dgesvd (×), iisvd (+), hpsvd, hpsvd_mix_64, hpsvd_mix_96 (○), hybridsvd, hybridsvd_mix_64, hybridsvd_mix_96.

[Figure 2: six panels (a)–(f); horizontal axis on a logarithmic scale from 10^{-12} to 10^0, vertical axes on a logarithmic scale in powers of 2.]

Fig. 2. Graphs of singular values (horizontal axis with logarithmic scale) and the values of $E_H$ in (a), $E_\sigma$ in (b), $E_\sigma^r$ in (c) and (f), $E_u$ in (d) and $E_v$ in (e) (vertical axis with logarithmic scale) for the SVD of $A_2$. Methods plotted: dgesvd (×), iisvd (+), hpsvd, hpsvd_mix_64, hpsvd_mix_96 (○), hybridsvd, hybridsvd_mix_64, hybridsvd_mix_96.

[3] K. Yadani, K. Kondo and M. Iwasaki, On the convergence of the V-type hyperplane constrained method for singular value decomposition, JSIAM Letters, 2 (2010), 21–24.

[4] K. Yadani, K. Kondo and M. Iwasaki, Numerical performance of hyperplane constrained method and its hybrid method for singular value decomposition, submitted.

[5] LAPACK, http://www.netlib.org/lapack/.

[6] GMP, http://gmplib.org/.

Table 1. The average of iteration number ℓ and running time t (in seconds).

                      A1               A2
Method                ℓ       t        ℓ       t
dgesvd                –       0.02     –       0.01
hpsvd                 10.54   4.12     26.28   11.35
hpsvd_mix_64          10.36   5.19     12.76   7.03
hpsvd_mix_96          10.14   5.14     11.68   6.26
hybridsvd             2.30    1.08     0.10    0.09
hybridsvd_mix_64      2.38    1.48     5.52    3.07
hybridsvd_mix_96      2.06    1.29     2.54    1.61



A knapsack public-key cryptosystem with cyclic code over GF(2)

Yasuyuki Murakami1 and Takeshi Nasako2

1 Department of Telecommunications and Computer Networks, Faculty of Information and Communication Engineering, Osaka Electro-Communication University, 18–8, Hatsu-Cho, Neyagawa-shi, Osaka 572–8530, Japan

2 Graduate School of Engineering, Osaka Electro-Communication University, 18–8, Hatsu-Cho, Neyagawa-shi, Osaka 572–8530, Japan

E-mail yasuyuki isc.osakac.ac.jp

Received November 8, 2009, Accepted February 8, 2010

Abstract

A public-key cryptosystem (PKC) based on an NP-hard problem is needed in case a quantum computer is realized. The knapsack PKC is based on the subset sum problem, which is NP-hard. In this paper, we propose a knapsack PKC with a cyclic code over GF(2) using the Chinese remainder theorem. The proposed scheme is secure against Shamir's attack and Adleman's attack and invulnerable to the low-density attack. Furthermore, the proposed scheme can reduce the size of the public key by almost 25% to 50% relative to the conventional scheme using a linear code.

Keywords knapsack public-key cryptosystem, subset sum problem, cyclic code, Chineseremainder theorem, low-density attack


1. Introduction

It has been shown that a quantum computer can solve the factoring problem, the discrete logarithm problem and the elliptic curve discrete logarithm problem in polynomial time [1]. However, it is believed that even a quantum computer cannot solve NP-hard problems in polynomial time. Thus, a public-key cryptosystem (PKC) based on an NP-hard problem is needed in case a quantum computer is realized. The subset sum problem is one of the NP-hard problems.

The subset sum problem is to find the solution $(x_1, x_2, \dots, x_n) \in \{0, 1\}^n$ such that

$$C = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$

for the given positive integers $a_1, a_2, \dots, a_n$ and the given sum $C$. The public-key cryptosystem using the subset sum problem has been conventionally called the knapsack cryptosystem. The knapsack cryptosystem has a remarkable feature that the encryption can be performed very fast.
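The asymmetry between encryption and attack can be seen in a toy sketch. The weights below are illustrative only and far too small to be secure, and the function names `encrypt` and `solve_subset_sum` are made up for this example. Encryption is a single pass of additions, whereas the naive solver enumerates up to $2^n$ candidates.

```python
from itertools import product

def encrypt(a, x):
    """Knapsack encryption: C = a_1 x_1 + ... + a_n x_n."""
    return sum(ai * xi for ai, xi in zip(a, x))

def solve_subset_sum(a, C):
    """Exhaustive search over {0,1}^n; the cost grows like 2^n."""
    for x in product((0, 1), repeat=len(a)):
        if encrypt(a, x) == C:
            return x
    return None

a = [3, 5, 9, 20, 44]          # toy weights, not a secure public key
x = (1, 0, 1, 1, 0)            # plaintext bits
C = encrypt(a, x)              # 3 + 9 + 20
recovered = solve_subset_sum(a, C)
```

Because these toy weights happen to be super-increasing, the solution is unique and `recovered` equals `x`; for general weights several bit vectors may give the same sum.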

The first knapsack PKC was proposed by Merkle and Hellman [2]. However, the secret key can be disclosed by Shamir's attack [3] or Adleman's attack [4] because the public key is generated with a linear transformation of a super-increasing sequence. The plaintext message can also be disclosed with the low-density attack (LDA) [5, 6] because the density is not sufficiently high. These attacks have given the impression that knapsack PKCs are insecure. It is, however, difficult to conclude that all knapsack PKCs are insecure.

In LDA, the subset sum problem is converted into the problem of finding the shortest vector in a lattice. LDA was proposed by Lagarias and Odlyzko for solving the subset sum problem of low density [5]. The density, an important parameter in knapsack schemes, is defined by

$$d = \frac{n}{\log_2 [\max(a_1, a_2, \dots, a_n)]}.$$

Coster et al. improved LDA so that it can solve almost all subset sum problems of density less than 0.9408 [6]. Nguyen and Stern proposed an adapted density attack for low-weight knapsack PKCs [7], which will be referred to as the low-weight attack (LWA). They showed that LWA could solve the subset sum problem with high probability when the Hamming weight is low.
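The density is straightforward to evaluate. The sketch below simply codes the definition above; the function name `density` and the sample weights are illustrative and do not come from any real key.

```python
import math

def density(a):
    """d = n / log2(max(a_1, ..., a_n)) for positive integer weights a."""
    return len(a) / math.log2(max(a))

# four weights of about 40 bits each: the density is close to 4 / 40
low = [2**40 + 17, 2**39 + 5, 2**38 + 3, 2**40 - 9]
print(density(low))
```

Keeping $d$ above the 0.9408 threshold with few weights forces the weights to be short, which is why the dimension $n$ and the bit length of the $a_i$ have to be balanced against each other.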

Murakami and Nasako proposed the knapsack PKC with the Chinese remainder theorem (CRT) [8]. The knapsack PKC with CRT can avoid Shamir's attack and Adleman's attack. However, the knapsack PKC with CRT needs a large dimension $n$ for realizing the density invulnerable to LDA. They also proposed the method of encoding the plaintext before encryption with a linear code in order to realize a high density above 1 [9]. However, the size of the public key is significantly large when using a linear code as the encoding.

In this paper, we shall propose a knapsack PKC using CRT which uses a cyclic code over GF(2) as the encoding. The proposed scheme is secure against Shamir's attack and Adleman's attack and invulnerable to LDA. The proposed scheme has an advantage that the size of the public key can be made much smaller than the conventional scheme using a linear code.


JSIAM Letters Vol. 2 (2010) pp.29–32 Yasuyuki Murakami and Takeshi Nasako

[Figure 1: bit layouts, from LSB to MSB, of the trapdoor sequences $S^{(P)}_1, \ldots, S^{(P)}_n$ (mod $P$) and $S^{(Q)}_1, \ldots, S^{(Q)}_n$ (mod $Q$); the legend distinguishes bit regions marked 0, 0 or 1, and 1.]

Fig. 1. Trapdoor of the proposed scheme.

2. Proposed scheme

In this section, we propose a knapsack PKC using a cyclic code over GF(2) with CRT. The proposed scheme adopts the trapdoor sequence proposed in [10], but it is not limited to that sequence.

The keys of the proposed knapsack PKC are as follows:

Public key PK: $PK = \{a, G(x)\}$.

Secret key SK: $SK = \{s^{(P)}, s^{(Q)}, s, P, Q, \sigma, T_P, T_Q\}$.

2.1 Key generation

Bob creates a public key PK and a corresponding secret key SK by doing the following:

Algorithm K

(1) Decide the dimensions n and u such that n > u.

(2) Define the sets $T_P$ and $T_Q$ such that $T_P \cup T_Q = \{1, 2, \ldots, u\}$ and $T_P \cap T_Q = \emptyset$.

(3) For i = n downto u + 1 do: Generate r-bit random positive integers $s^{(P)}_i$ and $s^{(Q)}_i$.

(4) For i = u downto 1 do: Generate random positive integers $s^{(P)}_i$ and $s^{(Q)}_i$ such that

$$s^{(P)}_i > \sum_{k=i+1}^{n} s^{(P)}_k \ \text{ if } i \in T_P, \qquad s^{(P)}_i < s^{(P)}_{i+1} \ \text{ otherwise},$$

$$s^{(Q)}_i > \sum_{k=i+1}^{n} s^{(Q)}_k \ \text{ if } i \in T_Q, \qquad s^{(Q)}_i < s^{(Q)}_{i+1} \ \text{ otherwise}.$$

(5) Choose integers P and Q such that

$$P > \sum_{k=1}^{n} s^{(P)}_k, \qquad Q > \sum_{k=1}^{n} s^{(Q)}_k,$$

and gcd(P, Q) = 1.

(6) Compute $s = (s_1, s_2, \ldots, s_n) \in \mathbb{Z}^n_{PQ}$ such that

$$s_i \equiv \begin{cases} s^{(P)}_i \pmod{P}, \\ s^{(Q)}_i \pmod{Q}, \end{cases}$$

for i = 1 to n with CRT.

(7) Generate a polynomial F(x) of period n over GF(2).

(8) Generate a polynomial $G(x) = g_0 + g_1 x + g_2 x^2 + \cdots + g_{n-1} x^{n-1}$ of degree n − 1 such that F(x) | G(x) over GF(2).

(9) Let $S_n$ denote the set of permutations of the integers 1, 2, . . . , n. Let the generator matrix G be

$$G = [\xi_1 \mid \xi_2 \mid \cdots \mid \xi_n] = \begin{pmatrix} g_0 & g_1 & g_2 & \cdots & g_{n-1} \\ g_{n-1} & g_0 & g_1 & \cdots & g_{n-2} \\ g_{n-2} & g_{n-1} & g_0 & \cdots & g_{n-3} \\ \vdots & \vdots & \vdots & & \vdots \\ g_{n-u+1} & g_{n-u+2} & g_{n-u+3} & \cdots & g_{n-u} \end{pmatrix}.$$

Select a random permutation $\sigma \in S_n$ such that $\det(\hat{G}) \neq 0$ over GF(2), where $\hat{G} = [\xi_{\sigma(1)} \mid \xi_{\sigma(2)} \mid \cdots \mid \xi_{\sigma(u)}]$.

(10) Obtain $a = (a_1, a_2, \ldots, a_n) \in \mathbb{Z}^n_{PQ}$ such that

$$a_i = s_{\sigma(i)}$$

for i = 1 to n with the permutation σ.

(11) Publicize the public key $\{a, G(x)\}$ and the public information on the dimensions u and n.

Fig. 1 illustrates the trapdoor of the proposed scheme.
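The trapdoor part of Algorithm K, steps (2) to (6), can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the alternating choice of $T_P$ (odd indices) and $T_Q$ (even indices) used later in the density estimate, requires r ≥ 2, and omits the cyclic code and the permutation σ of steps (7) to (10). The names `crt_pair` and `generate_trapdoor`, and the way the random offsets are drawn, are hypothetical.

```python
import math
import random

def crt_pair(x_p, x_q, P, Q):
    """Return the unique x in [0, P*Q) with x = x_p (mod P) and x = x_q (mod Q)."""
    return x_p + P * (((x_q - x_p) * pow(P, -1, Q)) % Q)

def generate_trapdoor(n, u, r, rng):
    """Steps (2)-(6) of Algorithm K with T_P = odd and T_Q = even indices.

    Lists are 1-based like the paper; entry 0 is unused padding (value 0).
    """
    T_P = set(range(1, u + 1, 2))
    sP, sQ = [0] * (n + 1), [0] * (n + 1)
    # Step (3): r-bit random positive integers for i = n down to u + 1.
    for i in range(n, u, -1):
        sP[i] = rng.randrange(1 << (r - 1), 1 << r)
        sQ[i] = rng.randrange(1 << (r - 1), 1 << r)
    # Step (4): one sequence dominates the tail sum, the other stays below
    # its successor.
    for i in range(u, 0, -1):
        dominant, other = (sP, sQ) if i in T_P else (sQ, sP)
        dominant[i] = sum(dominant[i + 1:]) + rng.randrange(1, 1 << r)
        other[i] = rng.randrange(1, other[i + 1])
    # Step (5): coprime moduli exceeding the total sums.
    P = sum(sP) + rng.randrange(1, 1 << r)
    Q = sum(sQ) + 1
    while math.gcd(P, Q) != 1:
        Q += 1
    # Step (6): combine the two residues with CRT.
    s = [0] + [crt_pair(sP[i], sQ[i], P, Q) for i in range(1, n + 1)]
    return {"sP": sP, "sQ": sQ, "s": s, "P": P, "Q": Q, "T_P": T_P}

key = generate_trapdoor(n=8, u=4, r=8, rng=random.Random(0))
print(key["P"], key["Q"])
```

The toy parameters are far below the sizes recommended in the paper and serve only to make the structure visible. The three-argument `pow` with exponent −1 needs Python 3.8 or later.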



2.2 Encryption

Alice encrypts a message $m = (m_1, m_2, \ldots, m_u) \in \{0, 1\}^u$ into the ciphertext $C \in \mathbb{Z}$ by doing the following:

Algorithm E

(1) Encode the message m into $m' = (m'_1, m'_2, \ldots, m'_n) \in \{0, 1\}^n$ as follows:

$$M'(x) = M(x)G(x) \bmod (x^n - 1)$$

where $M(x) = m_1 + m_2 x + \cdots + m_u x^{u-1}$ and $M'(x) = m'_1 + m'_2 x + \cdots + m'_n x^{n-1}$ are the polynomial representations of m and m′ over GF(2), respectively. It should be noted that m′ is a codeword of the cyclic code generated by F(x).

(2) Compute the ciphertext $C \in \mathbb{Z}$ as follows:

$$C = \sum_{i=1}^{n} a_i m'_i.$$

(3) Send the ciphertext C to Bob.
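Algorithm E amounts to a polynomial multiplication modulo $x^n - 1$ over GF(2) followed by a subset sum. A minimal sketch is given below; the function names and the toy values of G(x) and a are illustrative assumptions, not parameters from the paper.

```python
def cyclic_encode(m, g, n):
    """Coefficients of M(x) G(x) mod (x^n - 1) over GF(2).

    m and g are coefficient lists in increasing degree; reduction modulo
    x^n - 1 simply wraps exponents around modulo n.
    """
    out = [0] * n
    for i, mi in enumerate(m):
        if mi:
            for j, gj in enumerate(g):
                if gj:
                    out[(i + j) % n] ^= 1  # addition in GF(2) is XOR
    return out

def encrypt(m, g, a):
    """Ciphertext C = sum of a_i over the positions where m'_i = 1."""
    m_prime = cyclic_encode(m, g, len(a))
    return sum(ai for ai, bit in zip(a, m_prime) if bit)

# Toy example with n = 7, u = 4 and a hypothetical G(x) = 1 + x + x^3.
g = [1, 1, 0, 1, 0, 0, 0]
a = [1, 2, 4, 8, 16, 32, 64]   # placeholder weights, not a real public key
print(cyclic_encode([1, 0, 1, 0], g, 7))   # (1 + x^2)(1 + x + x^3) = 1 + x + x^2 + x^5
print(encrypt([1, 0, 1, 0], g, a))
```

Encryption needs only XOR operations and integer additions, which is consistent with the fast encryption claimed for knapsack schemes.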

2.3 Decryption

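Bob decrypts the message $m = (m_1, m_2, \ldots, m_u) \in \{0, 1\}^u$ from the ciphertext $C \in \mathbb{Z}$ by doing the following:

Algorithm D

(1) Compute $C_P \in \mathbb{Z}_P$ and $C_Q \in \mathbb{Z}_Q$ as follows:

$$C_P = C \bmod P, \qquad C_Q = C \bmod Q.$$

(2) For i = 1 to u do:

If $i \in T_P$,

$$\hat{m}_i = \begin{cases} 0 & \text{if } C_P < s^{(P)}_i, \\ 1 & \text{if } C_P \geq s^{(P)}_i, \end{cases}$$

else

$$\hat{m}_i = \begin{cases} 0 & \text{if } C_Q < s^{(Q)}_i, \\ 1 & \text{if } C_Q \geq s^{(Q)}_i, \end{cases}$$

and then

$$C_P \leftarrow C_P - \hat{m}_i s^{(P)}_i, \qquad C_Q \leftarrow C_Q - \hat{m}_i s^{(Q)}_i,$$

where $\hat{m} = (\hat{m}_1, \hat{m}_2, \ldots, \hat{m}_u) \in \{0, 1\}^u$.

(3) Obtain the message m as follows:

$$m = \hat{m} \hat{G}^{-1} \bmod 2,$$

where $\hat{G}$ is the u × u matrix of columns of G selected by σ in key generation.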
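Steps (1) and (2) of Algorithm D can be illustrated on a toy trapdoor. The sketch below is only an assumption-laden illustration: it uses the identity permutation and skips the cyclic code, so it recovers the first u coefficient bits of a subset sum rather than a decoded message, and the small sequences `S_P`, `S_Q` and moduli are hand-picked to satisfy the conditions of Algorithm K with n = 5, u = 3, $T_P$ = {1, 3} and $T_Q$ = {2}.

```python
from itertools import product

# Hypothetical toy trapdoor (0-based lists hold s_1, ..., s_5).
S_P = [16, 4, 6, 2, 3]   # s_1 > 4+6+2+3 and s_3 > 2+3 (indices in T_P)
S_Q = [5, 8, 2, 3, 2]    # s_2 > 2+3+2 (index in T_Q)
P, Q = 37, 23            # P > sum(S_P) = 31, Q > sum(S_Q) = 20, gcd(P, Q) = 1
T_P, U = {1, 3}, 3

def crt_pair(x_p, x_q):
    """Unique x in [0, P*Q) with x = x_p (mod P) and x = x_q (mod Q)."""
    return x_p + P * (((x_q - x_p) * pow(P, -1, Q)) % Q)

S = [crt_pair(x_p, x_q) for x_p, x_q in zip(S_P, S_Q)]

def greedy_decode(C):
    """Steps (1)-(2) of Algorithm D: recover the first U coefficient bits."""
    c_p, c_q = C % P, C % Q
    bits = []
    for i in range(1, U + 1):
        if i in T_P:
            bit = 1 if c_p >= S_P[i - 1] else 0
        else:
            bit = 1 if c_q >= S_Q[i - 1] else 0
        # Both residues are updated, whichever one was used for the decision.
        c_p -= bit * S_P[i - 1]
        c_q -= bit * S_Q[i - 1]
        bits.append(bit)
    return bits

# Every 5-bit coefficient vector is decoded correctly in its first 3 positions.
ok = all(
    greedy_decode(sum(s * b for s, b in zip(S, bits))) == list(bits[:U])
    for bits in product((0, 1), repeat=5)
)
print(ok)
```

In the full scheme the recovered bits would still have to be multiplied by the inverse matrix of step (3) to undo the cyclic encoding.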

3. Discussions

3.1 Security of secret key

Several attacks that compute the secret key from the public key have been proposed against knapsack PKCs, such as Shamir’s attack [3] and Adleman’s attack [4]. These attacks are effective only when the public key is generated by a modular multiplication of a super-increasing sequence. However, the proposed scheme uses CRT instead of modular multiplication to generate the public key from the secret key. Thus, these attacks are not applicable to the proposed scheme.

3.2 Security against exhaustive search

Exhaustive search is an attack that tries every possible plaintext. Searching an 80-bit space requires an enormous amount of time even on the latest computers. Thus, u ≥ 80 is required in order to be secure against exhaustive search.

3.3 Security against space-time tradeoff attack

In general, computation time can be reduced by using more memory. This type of attack is called the space-time tradeoff attack. We can reasonably assume that a time complexity of $O(N)$ can be traded for a time complexity of $O(\sqrt{N})$ and a space complexity of $O(\sqrt{N})$. Thus, n ≥ 160 is required in order to be secure against the space-time tradeoff attack.

3.4 Security against low-density attack

3.4.1 Density of proposed scheme

LDA works effectively against low-density knapsack PKCs, irrespective of the trapdoor. This attack converts the subset sum problem into the problem of finding a short vector in a lattice. The LDA proposed by Coster et al. can solve almost all subset sum problems when the density is less than 0.9408 [6].

The density d of the proposed knapsack scheme is given by

$$d \simeq \frac{n}{\log_2 PQ}.$$

For simplicity, we assume that $u = 2u'$, $T_P = \{1, 3, \ldots, 2u' - 1\}$ and $T_Q = \{2, 4, \ldots, 2u'\}$. Then we can estimate that $\log_2 P \simeq \log_2 Q \simeq r + u/2 + \log_2(n - u)$, and we have

$$\log_2 PQ \simeq 2r + u + 2\log_2(n - u). \tag{1}$$

Therefore the density d can be estimated by

$$d \simeq \frac{n}{2r + u + 2\log_2(n - u)}.$$

For example, d > 1 can be achieved when r = 40, u = 80, n = 173. Therefore, the proposed scheme is invulnerable to LDA, because a high density above 1 can be realized by encoding the plaintext before encryption. We recommend r ≥ 40, u ≥ 80 and n ≥ 160 in order to realize high security.

3.4.2 Effect of encoding

In the proposed scheme, m′ can be represented as

$$m' = mG \ \text{ over } GF(2). \tag{2}$$

If m′ can be represented as

$$m' = mG \ \text{ over } \mathbb{Z}, \tag{3}$$

then C can be represented by

$$C = \sum_{i=1}^{u} a'_i m_i, \tag{4}$$

where we let the u-dimensional integer vector a′ be $a' = aG^T$ over $\mathbb{Z}$. Indeed, there are several cases in which (3) holds. For example, (3) holds when the generator matrix G is sparse, as when G(x) = 1.

The bit length of each $a'_i$ can be estimated as at most $\lceil \log_2 PQ + \log_2 n \rceil$ bits. Thus, the density d′ of the knapsack a′ can be estimated by

$$d' < \frac{u}{\log_2 PQ + \log_2 n}.$$

This means that the proposed scheme would not be secure against LDA if (3) holds. However, (3) rarely holds when G is not sparse. To make the generator matrix G non-sparse, we have only to make the number of terms of G(x) sufficiently large. We strongly recommend that the number of terms of G(x) be approximately n/2. In this case, the number of non-zero elements of G can be estimated as un/2, which is sufficiently large. Thus, we can conclude that it is difficult to convert the proposed scheme into a subset sum problem of low density.

3.5 Security against low-weight attack

The pseudo-density κ is defined by

$$\kappa = \frac{k \log_2 n}{\log_2[\max(a_1, a_2, \ldots, a_n)]}$$

where k is the Hamming weight of the encoded message m′. LWA can solve a subset sum problem with high probability when the pseudo-density κ is lower than 1, even if the density d is higher than 1 [7].

Let the number of terms of G(x) be n/2, which is the recommended value. Assuming that the plaintext message m is randomly generated, the Hamming weight of m′ can be estimated as k = n/2. In this case, $\kappa = (d \log_2 n)/2 > d > 1$ can be realized. Consequently, we can conclude that the proposed scheme is invulnerable to LWA.

3.6 Size of public key

Let the function I(·) return the total amount of data, in bits, of its argument. From (1), the size of the public key a, I(a), can be estimated by

$$I(a) = n[2r + u + 2\log_2(n - u)].$$

In the proposed scheme, the size required for representing G(x) is only I(G(x)) = n. Thus, the total size of the public key in the proposed scheme can be estimated as

$$I(a, G(x)) = n[2r + u + 1 + 2\log_2(n - u)].$$

On the other hand, in the conventional scheme, which uses a linear code as the encoding, I(G) = un is required for representing G. Thus, the total size of the public key in the conventional scheme can be estimated as

$$I(a, G) = n[2r + 2u + 2\log_2(n - u)].$$

The ratio of I(a, G(x)) to I(a, G) can be estimated as

$$\frac{I(a, G(x))}{I(a, G)} \simeq \frac{2r + u}{2r + 2u}$$

when n and u are sufficiently large. Since we usually choose r such that 0 < r < u, the proposed scheme reduces the size of the public key by roughly 25% to 50% relative to the conventional scheme. For example, the proposed scheme reduces the size of the public key by almost 30% when r = 40, u = 80 and n = 255. We can conclude that the size of the public key in the proposed scheme is sufficiently practical.
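The size estimates above are easy to evaluate numerically. The following sketch simply codes the two formulas for I(a, G(x)) and I(a, G); the function name `public_key_sizes` is an illustrative assumption.

```python
import math

def public_key_sizes(n, u, r):
    """Estimated public-key sizes in bits: (cyclic-code scheme, linear-code scheme)."""
    per_weight = 2 * r + u + 2 * math.log2(n - u)   # bits per a_i, from estimate (1)
    cyclic = n * (per_weight + 1)   # a plus the n coefficients of G(x)
    linear = n * (per_weight + u)   # a plus the u-by-n generator matrix G
    return cyclic, linear

cyclic, linear = public_key_sizes(n=255, u=80, r=40)
print(f"ratio = {cyclic / linear:.3f}")
```

For r = 40, u = 80 and n = 255 the ratio evaluates to about 0.69, matching the reduction of almost 30% quoted in the text.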

4. Conclusion

In this paper, we have proposed a knapsack PKC with a cyclic code over GF(2) which uses CRT as the trapdoor. The proposed scheme is secure against Shamir’s attack and Adleman’s attack, which compute the secret key. Moreover, the proposed scheme is invulnerable to both LDA and LWA because it can realize a high density above 1 and a high pseudo-density above 1. Furthermore, the proposed scheme reduces the size of the public key by roughly 25% to 50% relative to the conventional scheme using a linear code.

Acknowledgments

This work was supported by SCOPE (Strategic Information and Communications R&D Promotion Programme) from the Ministry of Internal Affairs and Communications of Japan.

References

[1] P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, in: Proc. of the 35th Annual Symposium on Foundations of Computer Science, pp. 124–134, IEEE Computer Society Press, 1994.

[2] R. C. Merkle and M. E. Hellman, Hiding information and signatures in trapdoor knapsacks, IEEE Trans. Inform. Theory, IT-24 (1978), 525–530.

[3] A. Shamir, A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystems, in: Proc. of Crypto’82, pp. 279–288, Plenum Press, 1982.

[4] L. M. Adleman, On breaking the iterated Merkle-Hellman public-key cryptosystem, in: Proc. of Crypto’82, pp. 303–308, Plenum Press, 1982.

[5] J. C. Lagarias and A. M. Odlyzko, Solving low density subset sum problems, J. Assoc. Comp. Mach., 32 (1985), 229–246.

[6] M. J. Coster, B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr, An improved low-density subset sum algorithm, in: Proc. of Eurocrypt’91, D. W. Davies ed., Lect. Notes Comput. Sci., Vol. 547, pp. 54–67, Springer-Verlag, Berlin, 1991.

[7] P. Q. Nguyen and J. Stern, Adapting density attacks to low-weight knapsacks, in: Proc. of Asiacrypt 2005, B. Roy ed., Lect. Notes Comput. Sci., Vol. 3788, pp. 41–58, Springer-Verlag, Berlin, 2005.

[8] Y. Murakami and T. Nasako, Knapsack public-key cryptosystem using Chinese remainder theorem, in: Proc. of the 29th Symposium on Information Theory and Its Applications, pp. 207–210, 2006.

[9] T. Nasako and Y. Murakami, A high-density knapsack cryptosystem using combined trapdoor (in Japanese), Trans. JSIAM, 16 (2006), 591–605.

[10] Y. Murakami and T. Nasako, A new class of knapsack public-key cryptosystems using modular multiplication, in: Proc. of the 1st Joint Workshop on Information Security, pp. 351–354, 2006.



New apportionment methods and their quota property

Tetsuo Ichimori1

1 Department of Information Systems, Osaka Institute of Technology, 1-79-1, Kitayama, Hirakata City, Osaka 573-0196, Japan

E-mail ichimori is.oit.ac.jp

Received October 6, 2009, Accepted February 28, 2010

Abstract

In this paper, we discuss the apportionment problem of distributing seats in a legislature, based proportionally on the population of electoral districts or on the vote totals of political parties. If an apportionment method can be defined via discrete optimization, then its continuous relaxation should have an ideally proportional solution (i.e., the quota) at optimality. First, we propose a new class of reasonable methods of apportionment satisfying such a property. Then we study symmetries of five apportionment methods in the new class. Finally, we estimate how often the five methods stay within the quota.

Keywords apportionment, representation, divisor methods

Research Activity Group Mathematical Politics

1. Introduction

In most countries, seats are allocated to parties or regions in proportion to their respective vote totals or populations. In Japan, we have 480 seats in the House of Representatives. Out of the 480 seats, 300 seats are for the single-seat constituency system and are allocated to the 47 prefectures in proportion to their populations. The remaining 180 seats are for the proportional representation system and are allocated to the 11 electoral districts in proportion to their populations. In the U.S., the 435 seats of the House of Representatives are allocated to the 50 states in proportion to their populations.

Allocating seats proportionally may seem easy, but in fact it is not. Ideally, for example, Iowa is entitled to 4.532 seats according to the 2000 populations. If Iowa gets 5 seats, then it is overrepresented; if Iowa gets 4 seats, then it is underrepresented. Since the number of seats must be integral, perfect proportionality cannot be achieved. Some people might then come up with a rule that rounds these perfectly proportional values up or down to a neighboring integer so that all the seats are allocated. Some countries use apportionment methods obeying such a rule, one of which is the Hamilton method, or the method of greatest remainders. In Japan, we have used the Hamilton method as a “proportional” apportionment method since 1947. In these apportionment methods, including the Hamilton method, each state, region or party receives a number of seats that differs by at most one from its perfectly proportional value (conventionally called the quota). In other words, any reasonable apportionment method should “stay within the quota.” Expressed mathematically, staying within the quota requires

$$\lfloor q_i \rfloor \leq a_i \leq \lceil q_i \rceil \quad \text{for all } i$$

where $q_i$ is the quota of state or party i and $a_i$ is the number of seats assigned to state or party i. However, the Hamilton method gives rise to the so-called “Alabama paradox,” which would have given Alabama 8 seats with a house size of 299 and 7 seats with a house size of 300. At the very least, the Hamilton method is not proportional in the light of the Alabama paradox. Surprisingly, all such apportionment methods that round quotas up or down turn out to suffer from nonsensical paradoxes.
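The Hamilton method and the quota condition can be sketched in a few lines. The example below is a minimal illustration, not taken from the paper: the three populations are hypothetical and were chosen so that enlarging the house from 10 to 11 seats takes a seat away from the smallest state, which reproduces the Alabama paradox in miniature. Exact rational arithmetic avoids rounding issues in the remainders.

```python
from fractions import Fraction
from math import ceil, floor

def hamilton(pops, h):
    """Hamilton (greatest remainders) apportionment; returns (seats, quotas)."""
    total = sum(pops)
    quotas = [Fraction(h * p, total) for p in pops]
    seats = [floor(q) for q in quotas]
    # Hand the leftover seats to the largest fractional remainders
    # (ties are broken by index, an arbitrary choice made for this sketch).
    by_remainder = sorted(range(len(pops)),
                          key=lambda i: quotas[i] - seats[i], reverse=True)
    for i in by_remainder[: h - sum(seats)]:
        seats[i] += 1
    return seats, quotas

def stays_within_quota(seats, quotas):
    """Check floor(q_i) <= a_i <= ceil(q_i) for all i."""
    return all(floor(q) <= a <= ceil(q) for a, q in zip(seats, quotas))

pops = [6, 6, 2]                      # hypothetical populations
seats10, quotas10 = hamilton(pops, 10)
seats11, quotas11 = hamilton(pops, 11)
print(seats10, seats11)               # the third state drops from 2 seats to 1
```

Both apportionments stay within the quota, yet the third state loses a seat when the house grows, which is the kind of paradox that motivates divisor methods.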

Balinski and Young [1] proved that no apportionment method can avoid nonsensical paradoxes except for so-called “divisor” methods. However, we encounter another difficulty: there is no divisor method that always stays within the quota.

Section 2 describes divisor methods and Section 3 defines a new class of “relaxedly proportional” methods. In Section 4 we study the chance of violating quota for five relaxedly proportional methods and try to find which one stays within the quota most often.

2. Divisor methods

We define a real-valued function d(a) on the nonnegative integers a ≥ 0. The function d(a) is strictly increasing in a, satisfies a ≤ d(a) ≤ a + 1, and moreover satisfies d(b) = b and d(c) = c + 1 for no pair of integers b ≥ 1 and c ≥ 0. The function d(a) is called a rounding criterion.

Let z be a positive real number and let [z] denote an integer satisfying the following. (i) For d with d(0) = 0: If d(a − 1) < z < d(a) for some positive integer a ≥ 1, then [z] = a. If z = d(a) for some positive integer a ≥ 1, then [z] = a − 1 or a. (ii) For d with d(0) > 0: Additionally, define d(−1) = 0. If d(a − 1) < z < d(a) for some nonnegative integer a ≥ 0, then [z] = a. If z = d(a) for some nonnegative integer a ≥ 0, then [z] = a or a + 1.

JSIAM Letters Vol. 2 (2010) pp.33–36 Tetsuo Ichimori

Let s denote the number of states, h ≥ s + 1 the total number of seats to be apportioned, or house size, $p = (p_1, \ldots, p_s) > 0$ a vector of populations, and $a = (a_1, \ldots, a_s) \geq 0$ a vector of nonnegative integers. The vector a is called an apportionment of h if $\sum_i a_i = h$.

We next introduce a divisor method M and a divisor x > 0. A divisor is a notional ratio of population to seats. If we have $\sum_{i=1}^{s} [p_i/x] = h$ for some divisor x > 0, state i receives $a_i = [p_i/x]$ seats, where $p_i/x$ is referred to as the quotient of state i. Given p, h ≥ s + 1, and d(a), we define a divisor method M as the set of apportionments

$$\left\{ a : a_i = \left[\frac{p_i}{x}\right] \text{ and } \sum_{i=1}^{s} \left[\frac{p_i}{x}\right] = h \text{ for some } x > 0 \right\}.$$

If d(0) = 0, then the assumption h ≥ s + 1 means $a_i \geq 1$ for all i and $a_j \geq 2$ for at least one j. This leads to the famous min-max relation due to Balinski and Young [1]. If d(0) = 0, then M is equivalent to the set

$$\left\{ a : \min_{i \in I} \frac{p_i}{d(a_i - 1)} \geq \max_{1 \leq j \leq s} \frac{p_j}{d(a_j)} \text{ and } \sum_{i=1}^{s} a_i = h \right\}$$

where $I = \{i : a_i \geq 2\}$. If d(0) > 0, then redefine $I = \{i : a_i \geq 1\}$. Then the relation above still holds.

Though we can define innumerable divisor methods, the following methods are called the five historical methods and have received special treatment for a long time:

• the Adams method with $d(a) = a$,

• the Dean method with $d(a) = a(a + 1)/(a + 0.5)$,

• the Hill method with $d(a) = \sqrt{a(a + 1)}$,

• the Webster method with $d(a) = a + 0.5$,

• the Jefferson method with $d(a) = a + 1$.
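A divisor apportionment can be computed without searching for the divisor x explicitly, by awarding seats one at a time to the state with the largest value of $p_i/d(a_i)$; this sequential formulation is consistent with the min-max relation above. The sketch below is an illustration under assumptions: the names `ROUNDING` and `divisor_apportion` are hypothetical, ties are broken by state index, and floating-point priorities are used, so exact ties in real data would need more careful handling.

```python
import heapq
import math

# Rounding criteria d(a) of the five historical methods listed above.
ROUNDING = {
    "Adams": lambda a: a,
    "Dean": lambda a: a * (a + 1) / (a + 0.5),
    "Hill": lambda a: math.sqrt(a * (a + 1)),
    "Webster": lambda a: a + 0.5,
    "Jefferson": lambda a: a + 1,
}

def priority(p, d, a):
    """Priority p / d(a) for the next seat; infinite when d(a) = 0."""
    da = d(a)
    return math.inf if da == 0 else p / da

def divisor_apportion(pops, h, d):
    """Award h seats one by one to the state with the highest priority."""
    seats = [0] * len(pops)
    heap = [(-priority(p, d, 0), i) for i, p in enumerate(pops)]
    heapq.heapify(heap)
    for _ in range(h):
        _, i = heapq.heappop(heap)
        seats[i] += 1
        heapq.heappush(heap, (-priority(pops[i], d, seats[i]), i))
    return seats

# Hypothetical populations in exact ratio 1 : 2 : 3 with h = 6 seats.
for name, d in ROUNDING.items():
    print(name, divisor_apportion([100, 200, 300], 6, d))
```

When d(0) = 0, every state has infinite priority for its first seat, so each state receives at least one seat, in line with the remark on h ≥ s + 1 above.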

In his article [2] published in 2008, Agnew claims to propose two new divisor methods. However, Theil had already proposed one of them in [3, footnote 1], published in 1969, and moreover Theil and Schrage had studied the other method and given its rounding criterion in [4, p. 262], published in 1977.

3. Relaxedly proportional methods

Let $\mathbb{N}$ denote the set of nonnegative integers and p the total of all populations: $p = \sum_i p_i$. It is well known that the Webster method yields solutions that minimize $\sum_i p_i (a_i/p_i - h/p)^2$ subject to $\sum_i a_i = h$ and $a_i \in \mathbb{N}$ for all i. Conversely, any solution that minimizes $\sum_i p_i (a_i/p_i - h/p)^2$ subject to the same constraints is an apportionment in the Webster method; for the details see [1].

Noticing that

$$\sum_{i=1}^{s} p_i \left( \frac{a_i}{p_i} - \frac{h}{p} \right)^2 = \sum_{i=1}^{s} \frac{a_i^2}{p_i} - \frac{2h}{p} \sum_{i=1}^{s} a_i + \frac{h^2}{p^2} \sum_{i=1}^{s} p_i = \sum_{i=1}^{s} \frac{a_i^2}{p_i} - \frac{h^2}{p},$$

we have the Webster problem which minimizes

$$\sum_{i=1}^{s} \frac{a_i^2}{p_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N} \text{ for all } i.$$

Now consider its continuous relaxation minimizing

$$\sum_{i=1}^{s} \frac{a_i^2}{p_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{R}_+ \text{ for all } i,$$

where $\mathbb{R}_+$ denotes the set of positive real numbers. Then there is some λ > 0 such that $\frac{\mathrm{d}}{\mathrm{d}a_i}(a_i^2/p_i) = 2(a_i/p_i) = \lambda$ for all i at optimality, which means $a_i$ is proportional to $p_i$ for all i at optimality. In other words, $a_i = (\lambda/2)p_i = (h/p)p_i = q_i$ at optimality. We then say that the Webster method is relaxedly proportional.

Inversely, if we consider the relation $p_i/a_i = \text{const.}$, which undoubtedly means that $p_i$ is proportional to $a_i$, then we can obtain another objective function: $\int p_i/a_i \, \mathrm{d}a_i = p_i \log a_i + C$, where C is a constant of integration. Since the function $p_i \log a_i$ is concave, we have an apportionment problem which maximizes

$$\sum_{i=1}^{s} p_i \log a_i \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N}_+ \text{ for all } i,$$

where $\mathbb{N}_+$ denotes the set of positive integers. This apportionment problem was considered by Theil and Schrage, as mentioned above, so we call it T&S for short. The apportionment method (also denoted by T&S) defined by this problem is a divisor method with the rounding criterion $d(0) = 0$ and $d(a) = 1/\log((a + 1)/a)$ for all integers a ≥ 1.

The relation $a_i/p_i = \text{const.}$ is equivalent to the relation $\log(a_i/p_i) + 1 = \text{const.}$, which yields $\int (\log(a_i/p_i) + 1)\, \mathrm{d}a_i = a_i \log(a_i/p_i) + C$. Since the function $a_i \log(a_i/p_i)$ is convex, we have another apportionment problem which minimizes

$$\sum_{i=1}^{s} a_i \log \frac{a_i}{p_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \text{ integral for all } i.$$

This apportionment problem was proposed by Theil, as mentioned above. The Theil method defined by this problem is a divisor method with the rounding criterion $d(0) = 1/e \approx 0.37$ and $d(a) = (1/e)(a + 1)^{a+1}/a^a$ for all integers a ≥ 1.

Moreover, it is also known that the Hill method yields solutions that minimize $\sum_i a_i (p_i/a_i - p/h)^2$ subject to $\sum_i a_i = h$ and $a_i \in \mathbb{N}_+$ for all i. Conversely, any solution that minimizes $\sum_i a_i (p_i/a_i - p/h)^2$ subject to the same constraints is an apportionment in the Hill method.

Since

$$\sum_{i=1}^{s} a_i \left( \frac{p_i}{a_i} - \frac{p}{h} \right)^2 = \sum_{i=1}^{s} \frac{p_i^2}{a_i} - \frac{p^2}{h},$$

we obtain the Hill problem which minimizes

$$\sum_{i=1}^{s} \frac{p_i^2}{a_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N}_+ \text{ for all } i,$$

and its continuous relaxation minimizing

$$\sum_{i=1}^{s} \frac{p_i^2}{a_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{R}_+ \text{ for all } i.$$

Then there is some λ < 0 such that $\frac{\mathrm{d}}{\mathrm{d}a_i}(p_i^2/a_i) = -(p_i/a_i)^2 = \lambda$ for all i at optimality, which means $a_i$ is proportional to $p_i$ for all i at optimality; namely, the Hill method is also relaxedly proportional.

And again, inversely, if we consider the relation $(a_i/p_i)^2 = \text{const.}$, then we have, up to the constant factor 1/3, which does not affect the minimizer, the following objective function: $\int (a_i/p_i)^2 \, \mathrm{d}a_i \propto a_i^3/p_i^2 + C$. Hence we can obtain still another apportionment problem which minimizes

$$\sum_{i=1}^{s} \frac{a_i^3}{p_i^2} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N} \text{ for all } i.$$

We can prove that the apportionment method defined by this problem is a divisor method with the rounding criterion $d(a) = \sqrt{a^2 + a + 1/3}$ for all integers a ≥ 0. As far as the author knows, this rounding criterion is new. We call this divisor method the “1/3” method. From what is said above, we have the following theorem:

Theorem 1 The Webster, T&S, Theil, Hill and “1/3” methods are all relaxedly proportional.

Next we will state another theorem with its proof.

Theorem 2 The “1/3” method is a divisor method.

Proof Let a minimize $\sum_i x_i^3/p_i^2$ subject to the constraints $\sum_i x_i = h$ and $x_i \in \mathbb{N}$ for all i. Then we have the relation

$$\frac{(a_i - 1)^3}{p_i^2} + \frac{(a_j + 1)^3}{p_j^2} \geq \frac{a_i^3}{p_i^2} + \frac{a_j^3}{p_j^2}$$

for any i with $a_i \geq 1$ and any j with j ≠ i. Or,

$$\frac{a_j^2 + a_j + 1/3}{p_j^2} \geq \frac{a_i^2 - a_i + 1/3}{p_i^2}.$$

Since $a_i^2 - a_i + 1/3 > 0$, we can rewrite this as follows:

$$\frac{p_i^2}{a_i^2 - a_i + 1/3} \geq \frac{p_j^2}{a_j^2 + a_j + 1/3},$$

or

$$\frac{p_i}{\sqrt{a_i^2 - a_i + 1/3}} \geq \frac{p_j}{\sqrt{a_j^2 + a_j + 1/3}}.$$

Since $a^2 - a + 1/3 = (a - 1)^2 + (a - 1) + 1/3$, we have

$$\frac{p_i}{d(a_i - 1)} \geq \frac{p_j}{d(a_j)}$$

for any i with $a_i \geq 1$ and any j with j ≠ i, which means that a is an apportionment in the 1/3 method.

Conversely, assume that a is an apportionment in the 1/3 method, which satisfies the above relations. Namely,

$$\frac{3a_j^2 + 3a_j + 1}{p_j^2} \geq \frac{3a_i^2 - 3a_i + 1}{p_i^2}$$

where $a_i \geq 1$ and j ≠ i. Obviously, $3a_j^2 + 3c_j a_j + c_j^2 \geq 3a_j^2 + 3a_j + 1$ for any positive integer $c_j \geq 1$, and we can easily verify that $3a_i^2 - 3c_i a_i + c_i^2 \leq 3a_i^2 - 3a_i + 1$ for any positive integer $c_i$ with $1 \leq c_i \leq a_i$. Hence it follows from the above relations that

$$\frac{3a_j^2 + 3c_j a_j + c_j^2}{p_j^2} \geq \frac{3a_i^2 - 3c_i a_i + c_i^2}{p_i^2}$$

for any integer $c_j \geq 1$ and any integer $1 \leq c_i \leq a_i$, where $a_j \geq 1$, $a_i \geq 1$ and j ≠ i.

Let b minimize $\sum_i x_i^3/p_i^2$ subject to the same constraints as before. If b = a, then the proof is done. Assume otherwise, and let $G = \{j : b_j > a_j\}$ and $L = \{i : b_i < a_i\}$. In addition, let $b_j = a_j + c_j$ with $c_j \geq 1$ for j ∈ G and $b_i = a_i - c_i$ with $c_i \geq 1$ for i ∈ L. Then we have $\sum_{j \in G} c_j = \sum_{i \in L} c_i$; let $\alpha = \sum_{j \in G} c_j$. Then it follows from the preceding α inequalities that

$$\sum_{j \in G} \frac{(3a_j^2 + 3c_j a_j + c_j^2)c_j}{p_j^2} \geq \sum_{i \in L} \frac{(3a_i^2 - 3c_i a_i + c_i^2)c_i}{p_i^2}.$$

Then we get the following:

$$\sum_{k=1}^{s} \frac{b_k^3}{p_k^2} - \sum_{k=1}^{s} \frac{a_k^3}{p_k^2} = \sum_{j \in G} \left[ \frac{(a_j + c_j)^3}{p_j^2} - \frac{a_j^3}{p_j^2} \right] - \sum_{i \in L} \left[ \frac{a_i^3}{p_i^2} - \frac{(a_i - c_i)^3}{p_i^2} \right]$$

$$= \sum_{j \in G} \frac{(3a_j^2 + 3c_j a_j + c_j^2)c_j}{p_j^2} - \sum_{i \in L} \frac{(3a_i^2 - 3c_i a_i + c_i^2)c_i}{p_i^2} \geq 0.$$

This means that the apportionment a minimizes $\sum_i x_i^3/p_i^2$ subject to the same constraints as before. (QED)
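Theorem 2 can be checked numerically on small instances by comparing the seat-by-seat divisor rule with a brute-force search over all apportionments. The sketch below is only a sanity check under assumptions: the populations are hypothetical, the function names are illustrative, and exact rational arithmetic is used so that the comparison of objective values is not affected by rounding. The priority $p_i^2/(a_i^2 + a_i + 1/3)$ is the square of $p_i/d(a_i)$, so it induces the same ordering.

```python
from fractions import Fraction
from itertools import product

def one_third_apportion(pops, h):
    """Award seats one by one to the state maximizing p_i^2 / (a_i^2 + a_i + 1/3)."""
    seats = [0] * len(pops)
    for _ in range(h):
        best = max(
            range(len(pops)),
            key=lambda i: Fraction(pops[i] ** 2)
            / (seats[i] ** 2 + seats[i] + Fraction(1, 3)),
        )
        seats[best] += 1
    return seats

def objective(seats, pops):
    """Objective sum of a_i^3 / p_i^2 of the "1/3" problem."""
    return sum(Fraction(a ** 3, p ** 2) for a, p in zip(seats, pops))

def brute_force_min(pops, h):
    """Minimum objective over all nonnegative integer vectors summing to h."""
    return min(
        objective(x, pops)
        for x in product(range(h + 1), repeat=len(pops))
        if sum(x) == h
    )

pops, h = [53, 31, 16], 7             # hypothetical populations and house size
seats = one_third_apportion(pops, h)
print(seats, objective(seats, pops) == brute_force_min(pops, h))
```

The brute-force search grows exponentially with the number of states, so this check is only practical for very small s and h.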

Theorem 3 The Adams method is not relaxedly proportional.

Proof The Adams method can be defined by the optimization problem which minimizes

$$\sum_{i=1}^{s} \frac{a_i^2 - a_i}{p_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N} \text{ for all } i,$$

with its continuous relaxation minimizing

$$\sum_{i=1}^{s} \frac{a_i^2 - a_i}{p_i} \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{R}_+ \text{ for all } i,$$

where we have some λ such that $(2a_i - 1)/p_i = \lambda$ for all i at optimality, or $a_i = (\lambda/2)p_i + 0.5$ for all i at optimality. This means that the Adams method is not relaxedly proportional and that it favors small states or parties, because state or party i gets an additional 0.5 seats regardless of its population size.

(QED)
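As a worked step for the Adams relaxation, and assuming the positivity constraints are inactive at the optimum, the multiplier λ can be eliminated using $\sum_i a_i = h$ and the quota $q_i = (h/p)p_i$:

```latex
\sum_{i=1}^{s}\left(\frac{\lambda}{2}\,p_i + \frac{1}{2}\right)
  = \frac{\lambda}{2}\,p + \frac{s}{2} = h
\;\Longrightarrow\;
\frac{\lambda}{2} = \frac{h - s/2}{p},
\qquad
a_i = \left(1 - \frac{s}{2h}\right) q_i + \frac{1}{2}.
```

Under this assumption $a_i > q_i$ exactly when $q_i < h/s$, so the relaxed solution lies above the quota for states below the average quota and below it for larger states.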

Similarly, we have the following theorems:

Theorem 4 The Jefferson and Dean methods are not relaxedly proportional.

Theorem 5 The Dean method favors the small states or parties.

Theorem 6 The Hamilton method is relaxedly proportional.

Proof The Hamilton method minimizes

$$\sum_{i=1}^{s} (a_i - q_i)^2 \quad \text{s.t. } \sum_i a_i = h \text{ and } a_i \in \mathbb{N} \text{ for all } i,$$

and its continuous relaxation has an optimal solution a = q.

(QED)

Since the Hamilton method suffers from nonsensical paradoxes, we strongly claim that an apportionment method should be not only a relaxedly proportional method but also a divisor method.



Table 1. Expected number of violating quota per 1,000 problems, quoted from [1].

| Adams | Dean | Hill | Webster | Jefferson |
|---|---|---|---|---|
| 1,000 | 15.40 | 2.86 | 0.61 | 1,000 |

4. Staying within the quota

Balinski and Young study how often the five historical divisor methods produce apportionments that violate quota. They assume the 50 states and a fixed apportionment of the 435 seats according to the 1970 populations for each of the five methods. Let method M define an apportionment a = a(M) and a divisor x = x(a(M)). If $P_i$ is uniformly distributed on the interval $d(a_i(M) - 1)\,x(a(M)) \leq P_i \leq d(a_i(M))\,x(a(M))$, then the apportionment method M gives the same apportionment a(M) for the populations $P_1, \ldots, P_s$ as for the 1970 populations. To avoid the unrealistic assumption of very small states, they assume in estimating the likelihood of violating quota that no state’s quotient is less than 0.5. In other words, they assume that the populations are uniformly distributed on the interval $\max\{0.5,\, d(a_i(M) - 1)\}\,x(a(M)) \leq P_i \leq d(a_i(M))\,x(a(M))$. Especially in proportional representation systems, many nations explicitly prescribe that parties with very small vote totals should not obtain any seat at all, in order to keep a stable administration. They estimate the probability of violating quota for each of the five historical divisor methods; see Table 1, where the instances in which some state violates quota are counted by Monte Carlo simulation and each entry denotes the expected number per 1,000 problems. The methods of Adams and Jefferson can be seen to violate quota virtually all the time. Considering that they are not relaxedly proportional, this result seems reasonable. The Dean method is not relaxedly proportional either; its expected number is rather small, but still much larger than those of the methods of Hill and Webster. By contrast, the Webster method virtually never violates quota, with a probability of 0.00061.

We will do almost the same as they do. But we will first remove the three methods of Adams, Jefferson and Dean, which are not relaxedly proportional, from the five historical methods. Instead of them we will add the three methods of T&S, Theil and “1/3”, where T&S and “1/3” are mirror images of the methods of Webster and Hill, respectively. The meaning of mirror images is as follows. The objective function of the Webster problem is $\sum a_i^2/p_i$, whose relaxation has the relations $a_i/p_i = \text{const.}$ for all i at optimality, whereas the T&S problem has the objective function $\sum p_i \log a_i$, whose relaxation has the relations $p_i/a_i = \text{const.}$ for all i at optimality, where we note that $p_i/a_i$ is the inverse of $a_i/p_i$. In addition, the Hill problem has the objective function $\sum p_i^2/a_i$, whose relaxation has the relations $(p_i/a_i)^2 = \text{const.}$ or $p_i/a_i = \text{const.}$ for all i at optimality, whereas the “1/3” problem has the objective function $\sum a_i^3/p_i^2$, whose relaxation has the relations $(a_i/p_i)^2 = \text{const.}$ or $a_i/p_i = \text{const.}$ for all i at optimality. The Theil problem has the objective function $\sum a_i \log(a_i/p_i)$, whose relaxation has the relations $\log(a_i/p_i) + 1 = \text{const.}$ or $a_i/p_i = \text{const.}$ for all i at optimality. Because the relation $\log(a_i/p_i) + 1 = \text{const.}$ is equivalent to $1 - \log(p_i/a_i) = \text{const.}$ or $p_i/a_i = \text{const.}$, the Theil method is the mirror image of itself.

Table 2. Expected number of violating quota per 1,000 problems.

| Census | Hill | T&S | Theil | Webster | “1/3” |
|---|---|---|---|---|---|
| 1910 | 1.150 | 0.652 | 0.400 | 0.274 | 0.276 |
| 1920 | 1.286 | 0.717 | 0.452 | 0.300 | 0.364 |
| 1930 | 2.081 | 1.126 | 0.657 | 0.480 | 0.526 |
| 1940 | 2.267 | 1.290 | 0.698 | 0.491 | 0.530 |
| 1950 | 1.570 | 0.797 | 0.451 | 0.309 | 0.306 |
| 1960 | 2.126 | 1.010 | 0.539 | 0.344 | 0.370 |
| 1970 | 2.752 | 1.385 | 0.753 | 0.538 | 0.558 |
| 1980 | 2.681 | 1.492 | 0.857 | 0.587 | 0.647 |
| 1990 | 6.311 | 3.838 | 2.427 | 1.882 | 2.072 |
| 2000 | 6.962 | 4.225 | 2.715 | 2.142 | 2.453 |
| means | 2.919 | 1.653 | 0.995 | 0.735 | 0.810 |

We use ten sets of populations from 1910 through 2000 and produce $10^6$ problems for each method and each set of populations. We estimate the probability of violating quota for each of the new five methods, which are not only divisor methods but also relaxedly proportional methods. See Table 2, where each entry denotes the expected number per 1,000 problems.
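The sampling experiment described in this section can be sketched as follows. This is a simplified illustration, not the program used for Table 2: the apportionment `a` is a hypothetical vector rather than census data, the divisor is normalized to x = 1 (the quotas are scale-invariant), and the function name `violation_rate` is an assumption. Populations are drawn uniformly from the interval that keeps the given apportionment unchanged, with the lower bound of 0.5 on every quotient.

```python
import math
import random

def webster(a):
    """Rounding criterion of the Webster method."""
    return a + 0.5

def violation_rate(a, d, trials, rng, x=1.0):
    """Fraction of sampled problems in which some state violates quota."""
    h = sum(a)
    violations = 0
    for _ in range(trials):
        pops = []
        for ai in a:
            lower = d(ai - 1) if ai >= 1 else 0.0   # d(-1) is taken as 0
            lo = max(0.5, lower) * x
            hi = d(ai) * x
            pops.append(rng.uniform(lo, hi))
        total = sum(pops)
        quotas = [h * P / total for P in pops]
        if any(ai < math.floor(q) or ai > math.ceil(q)
               for ai, q in zip(a, quotas)):
            violations += 1
    return violations / trials

a = [1, 2, 3, 5, 8, 13]   # hypothetical fixed apportionment
rate = violation_rate(a, webster, trials=10_000, rng=random.Random(1))
print(f"estimated violations per 1,000 problems: {1000 * rate:.2f}")
```

With real census apportionments and $10^6$ trials per method, the same loop would yield estimates comparable in kind to the entries of Table 2, although the exact figures depend on details not reproduced here.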

In general, these five methods give very small probabilities, from $0.735 \times 10^{-3}$ (Webster) through $2.919 \times 10^{-3}$ (Hill) on average; the latter probability is almost 4 times larger than the former. This result might indicate that we have found the relaxedly proportional divisor method with the smallest probability of violating quota, namely, the Webster method.

5. Conclusion

We have proposed relaxedly proportional apportionment methods. They satisfy an exceedingly natural property: if an apportionment method can be described in the form of discrete optimization, then the continuous relaxation should have an optimal solution identical to the quota. Surprisingly, some famous methods (including the methods of Adams, Dean and Jefferson) are found not to be relaxedly proportional. Since the class of relaxedly proportional methods includes infinitely many apportionment methods, we have selected five of them, considering their symmetry, and we have studied how often these five methods stay within the quota. The Webster method turns out to have the largest probability of staying within the quota among them.

References

[1] M. L. Balinski and H. P. Young, Fair Representation, Yale Univ. Press, New Haven, 1982.

[2] R. A. Agnew, Optimal congressional apportionment, American Math. Monthly, 115 (2008), 297–303.

[3] H. Theil, The desired political entropy, American Political Science Review, 63 (1969), 521–525.

[4] H. Theil and L. Schrage, The apportionment problem and the European Parliament, European Economic Review, 9 (1977), 247–263.



Numerical solution to shape optimization problems for non-stationary Navier-Stokes problems

Yutaro Iwata1, Hideyuki Azegami1, Taiki Aoyama1 and Eiji Katamine2

1 Graduate School of Information Science, Nagoya University, A4-2(780) Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

2 Department of Mechanical Engineering, Gifu National College of Technology, Kamimakuwa 2236-2, Motosu City, Gifu 501-0495, Japan

E-mail azegami is.nagoya-u.ac.jp

Received November 2, 2009, Accepted February 6, 2010

Abstract

The present paper describes a numerical solution of shape optimization problems for non-stationary Navier-Stokes problems. As a concrete example, we consider the problem of finding the shape of an obstacle in a flow field in order to minimize the energy loss integral for an assigned time interval. The primary goal of the present paper is to demonstrate the evaluation of the shape derivative of the energy loss. The traction method is used for the reshaping algorithm. Numerical results show that the shapes of the circle obstacle converge to wedge shapes for the cases of Reynolds numbers of 100 and 250.

Keywords calculus of variations, shape optimization, Navier-Stokes problem, shape derivative, traction method

Research Activity Group Mathematical Design

1. Introduction

Shape optimization problems for flow fields arise in the design of fluid machines. In the present paper, we consider flow fields of incompressible viscous fluids obtained as the solutions to non-stationary Navier-Stokes problems. As a concrete example, we consider a flow field in which an obstacle exists and minimize the energy loss integral in the flow field for an assigned time interval under the domain measure constraint.

A theoretical framework on shape derivatives for stationary Stokes and Navier-Stokes problems has been investigated since the 1970s [1–7] based on general theories on shape derivatives [8]. Numerous numerical analyses of stationary problems have been conducted.

In these theoretical and numerical studies, a difficulty arose concerning the lack of regularity of the shape derivatives, which causes oscillation of boundaries in numerical analyses. This means that shape derivatives cannot be used directly as reshaping vectors. To compensate for the lack of regularity, the authors proposed the use of solutions to boundary value problems of elliptic partial differential equations using the shape derivatives for the Neumann condition [9] or the Robin condition [10]. This reshaping method is referred to as the traction method. Applications of the traction method to the shape optimization problems for stationary Navier-Stokes problems were presented in previous studies [11,12]. Another method by which to overcome the irregularity of the shape derivatives for a moving boundary was proposed using the Laplace operator on the boundary [5].

As described above, shape optimization problems for stationary Navier-Stokes problems have been investigated extensively compared with those for non-stationary Navier-Stokes problems, and there seem to be no references to shape derivatives for non-stationary Navier-Stokes problems as a distributed parameter system, although numerical results obtained using the definition of derivatives of discrete systems have been reported. In general, we cannot obtain solutions to stationary Navier-Stokes problems for high Reynolds numbers. Therefore, the main objective of the present paper is to derive a theoretical result for computing the shape derivatives in non-stationary Navier-Stokes problems. Then, based on these results, we present a numerical scheme and numerical results for the obstacle problem using a newly developed program based on the algorithm of the traction method.

Fig. 1. Flow field Ω with obstacle S. [Labels in figure: Γ0, u0, Ω = Σ \ S, S, D.]

2. Non-stationary Navier-Stokes problem

Let us define the domains depicted in Fig. 1. $D$ denotes a fixed bounded domain in $\mathbb{R}^d$, $d = 2, 3$. For $s = 1, 2, 3$, $r > 0$ and $M > 0$, we introduce a class of sub-domains of $D$, denoted by $\mathcal{W}^{s,\infty}(r, M)$, in the following definition.

Definition 1 (Admissible set of domains) A sub-domain $\Omega$ of $D$, such that $\Omega \subseteq D$, belongs to $\mathcal{W}^{s,\infty}(r, M)$ if and only if the following conditions are satisfied. $\Omega$ is composed of $\Sigma \setminus S$ such that $\Sigma$ and $S \subset \Sigma$ are open sub-domains of $D$. The boundaries $\partial\Sigma$ and $\partial S$ are of class $W^{s,\infty}$ in the sense of [8]. That is, a finite number of open balls of radius $r$ covers the boundary, and the boundary is identified with a level set of a $W^{s,\infty}$ function defined on each open ball. Moreover, the set of these functions is bounded with respect to the $W^{s,\infty}$ norm by $M$.

JSIAM Letters Vol. 2 (2010) pp.37–40 Yutaro Iwata et al.

The weak formulation of the non-stationary Navier-Stokes problem is defined as follows by referring to [13] with respect to the Dirichlet conditions.

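Problem 2 (Navier-Stokes problem) Let $\Omega \in \mathcal{W}^{s,\infty}(r, M)$, and let $\Gamma_0 \subset \partial\Sigma$. In addition, $\rho$ and $\mu$ are supposed to be positive constants, and

$$
f \in L^2\big((0,T); W^{1,\infty}(D; \mathbb{R}^d)\big), \qquad
p_0 \in L^2\big((0,T); W^{2,\infty}(D; \mathbb{R}^d)\big),
$$

$$
u_0 \in \big\{ v \in H^1\big((0,T); W^{3,\infty}(D; \mathbb{R}^d)\big) \;\big|\; \nabla\cdot v = 0,\ v|_{\partial S} = 0 \big\}.
$$

Find velocity and pressure $(u, p) \in U \times Q$ such that

$$
\int_0^T\!\!\int_\Omega \big[ \rho\,\dot{u}\cdot v + \rho\,(u\cdot\nabla)u\cdot v + \mu\nabla(u - u_0)\cdot\nabla v - p\,\nabla\cdot v \big]\,dx\,dt
= \int_0^T \Big[ \int_\Omega \big( f\cdot v - \mu\nabla u_0\cdot\nabla v \big)\,dx + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} p_0\cdot v\,d\gamma \Big]\,dt,
$$

$$
\int_0^T\!\!\int_{\Gamma_0\cup\partial S} (u - u_0)\cdot(\mu\nabla_\nu v)\,d\gamma\,dt = 0,
$$

$$
\int_0^T\!\!\int_{\Gamma_0\cup\partial S} \mu\nabla_\nu u\cdot v\,d\gamma\,dt = 0,
$$

$$
\int_0^T\!\!\int_\Omega q\,\nabla\cdot u\,dx\,dt = 0,
$$

for all $(v, q) \in V \times Q$, where

$$
U = \big\{ v \in H^1\big((0,T); H^1(\Omega; \mathbb{R}^d)\big) \;\big|\; v = 0 \text{ at } t = 0 \big\},
$$

$$
V = \big\{ v \in H^1\big((0,T); H^1(\Omega; \mathbb{R}^d)\big) \;\big|\; v = 0 \text{ at } t = T \big\},
$$

$$
Q = \Big\{ q \in H^1\big((0,T); L^2(\Omega; \mathbb{R})\big) \;\Big|\; \int_\Omega q\,dx = 0 \Big\},
$$

$\dot{(\,\cdot\,)} = \partial(\,\cdot\,)/\partial t$ for time $t \in \mathbb{R}$, and $\nabla u = (\partial u_i/\partial x_j)_{ij}$ for $x \in \mathbb{R}^d$. The vector $\nu \in \mathbb{R}^d$ denotes the outer unit normal on $\partial\Omega$, $\nabla_\nu = \nu\cdot\nabla$, and $\nabla_\nu u = \big(\sum_{j=1}^{d} (\partial u_i/\partial x_j)\,\nu_j\big)_i$.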

3. Shape optimization problem

Let us consider the following concrete problem.

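Definition 3 (Cost functionals: $J^0$ and $J^1$) Let $(u, p)$ be the solution to Problem 2 for $\Omega \in \mathcal{W}^{s,\infty}(r, M)$. Let $J^0(\Omega, u, p)$ and $J^1(\Omega)$ be the energy loss and the functional for the domain measure constraint as

$$
J^0(\Omega, u, p) = \int_{t_0}^{T} \Big( -\int_{\Gamma_0\cup\partial S} \mu\nabla_\nu u\cdot u_0\,d\gamma + \int_\Omega f\cdot u\,dx + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} p_0\cdot u\,d\gamma \Big)\,dt + \frac{1}{2}\int_\Omega |u(x, T)|^2\,dx,
$$

$$
J^1(\Omega) = m_0 - \int_\Omega d\Omega,
$$

respectively, where $u_0$ and $p_0$ are defined in Problem 2. Here $t_0 \in (0, T)$ is a constant given by the designer, and $m_0 > 0$ is a constant such that $J^1(\Omega_0) \le 0$ for some $\Omega_0 \in \mathcal{W}^{s,\infty}(r, M)$.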

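Problem 4 (Shape optimization problem) Let $(u, p)$ be the solution to Problem 2 for $\Omega \in \mathcal{W}^{s,\infty}(r, M)$. For $J^0$ and $J^1$ as given in Definition 3, find $\Omega$ such that

$$
\min_{\Omega \in \mathcal{W}^{s,\infty}(r, M)} \big\{ J^0(\Omega, u, p) \;\big|\; J^1(\Omega) \le 0 \big\}.
$$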

Since $\mathcal{W}^{s,\infty}(r, M)$ is compact with respect to the $L^2(D)$ topology [8], we can approach a local solution by constructing a sequence of domains from some $\Omega_0$ such that $J^1(\Omega_0) \le 0$ by looking for descent domain variations among the admissible set $\mathcal{U}^{s,\infty}$, which is defined as follows.

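Definition 5 (Domain variations) Let

$$
\mathcal{U}^{s,\infty} = \big\{ \rho \in W_0^{s,\infty}(D; \mathbb{R}^d) \;\big|\; \|\rho\| \le 1 \big\}
$$

be a set of domain variations, and let the new domain $\Omega_{\epsilon\rho}$ from $\Omega \in \mathcal{W}^{s,\infty}(r, M)$ be constructed with domain variation $\rho \in \mathcal{U}^{s,\infty}$ and a small constant $\epsilon > 0$, as follows:

$$
\Omega_{\epsilon\rho} = \{ x + \epsilon\rho \mid \forall x \in \Omega \}.
$$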
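As a minimal illustration of Definition 5, the sketch below moves sample points by x + ερ(x). The unit radial field `radial_field`, the sampled unit circle, and the helper names are hypothetical choices made only for this example; they are not taken from the paper, and the field is evaluated only at the sample points rather than constructed as an element of the admissible set.

```python
import math

def perturb_domain(points, rho, eps):
    """Map each point x to x + eps * rho(x), as in Definition 5."""
    moved = []
    for x, y in points:
        rx, ry = rho((x, y))
        moved.append((x + eps * rx, y + eps * ry))
    return moved

def radial_field(p):
    """Hypothetical variation: unit vector pointing away from the origin."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

# Sample points on the unit circle (a stand-in for an obstacle boundary).
circle = [(math.cos(2 * math.pi * k / 16), math.sin(2 * math.pi * k / 16))
          for k in range(16)]
moved = perturb_domain(circle, radial_field, eps=0.05)
radii = [math.hypot(x, y) for x, y in moved]
```

With this particular field every sample point moves outward by ε, so the sampled circle of radius 1 becomes a circle of radius 1 + ε.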

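Problem 6 (Optimum domain variation) Let $(u, p)$ and $(u_{\epsilon\rho}, p_{\epsilon\rho})$ be solutions to Problem 2 for $\Omega$ and $\Omega_{\epsilon\rho} \in \mathcal{W}^{s,\infty}(r, M)$, respectively, with a small fixed constant $\epsilon > 0$ and $\rho \in \mathcal{U}^{s,\infty}$. For $J^0$ and $J^1$, as given in Definition 3, find $\rho$ such that

$$
\min_{\rho \in \mathcal{U}^{s,\infty}} \big\{ J^0(\Omega_{\epsilon\rho}, u_{\epsilon\rho}, p_{\epsilon\rho}) \;\big|\; J^1(\Omega_{\epsilon\rho}) \le 0 \big\}.
$$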

4. Shape derivatives

Let us define the shape derivatives as the Gateaux derivatives with respect to domain variation. To solve Problem 6 using a gradient-based method, we need to evaluate the shape derivatives of the cost functionals.

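Let us introduce the Lagrangian $\mathcal{L}^0$ for $J^0$ using the Lagrange multipliers $(v^0 - u, q^0) \in V \times Q$ for Problem 2 as

$$
\mathcal{L}^0(\Omega, u, p, v^0, q^0)
= \int_0^T \Big\{ \int_\Omega \big[ -\rho\,\dot{u}\cdot v^0 - \rho\,(u\cdot\nabla)u\cdot v^0 - \mu\nabla(u - u_0)\cdot\nabla v^0 + p\,\nabla\cdot v^0 + f\cdot(v^0 + \chi_0 u) - \mu\nabla u_0\cdot\nabla v^0 + q^0\,\nabla\cdot u \big]\,dx
$$

$$
\qquad + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} p_0\cdot(v^0 + \chi_0 u)\,d\gamma
+ \int_{\Gamma_0\cup\partial S} \big[ (u - u_0)\cdot(\mu\nabla_\nu v^0) + \mu\nabla_\nu u\cdot(v^0 - \chi_0 u_0) \big]\,d\gamma \Big\}\,dt
+ \frac{1}{2}\int_\Omega |u(x, T)|^2\,dx
$$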

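where $\chi_0 = \chi_{(t_0, T)}$, and $\chi_{(\,\cdot\,)}$ denotes the characteristic function. If $\Omega$ is a local minimum point of Problem 6 and $(u, p)$ is the solution to Problem 2, then the Gateaux derivatives of $\mathcal{L}^0$ with respect to $(u, p) \in U \times Q$ are

$$
\mathcal{L}^{0\prime}(\Omega, u, p, v^0, q^0)(u, p)
= \int_0^T \Big\{ \int_\Omega \big[ \rho\,u\cdot\dot{v}^0 - \rho\,(u\cdot\nabla)u\cdot v^0 - \mu\nabla u\cdot\nabla v^0 + p\,\nabla\cdot v^0 + \chi_0 f\cdot u + q^0\,\nabla\cdot u \big]\,dx
$$

$$
\qquad + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} \chi_0 p_0\cdot u\,d\gamma
+ \int_{\Gamma_0\cup\partial S} \big[ u\cdot(\mu\nabla_\nu v^0) + \mu\nabla_\nu u\cdot(v^0 - \chi_0 u_0) \big]\,d\gamma \Big\}\,dt = 0.
$$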

Here, let us define an adjoint problem for J0 as below.

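Problem 7 (Adjoint problem for $J^0$) Let $(u, p)$ be the solution to Problem 2 for $\Omega \in \mathcal{W}^{s,\infty}(r, M)$. Find adjoint velocity and pressure $(v^0 - u, q^0) \in V \times Q$ such that

$$
\int_0^T\!\!\int_\Omega \big[ -\rho\,\dot{v}^0\cdot u + \rho\,(u\cdot\nabla)v^0\cdot u + \mu\nabla(v^0 - \chi_0 u_0)\cdot\nabla u - q^0\,\nabla\cdot u \big]\,dx\,dt
= \int_0^T \Big[ \int_\Omega \big( \chi_0 f\cdot u - \mu\chi_0\nabla u_0\cdot\nabla u \big)\,dx + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} \chi_0 p_0\cdot u\,d\gamma \Big]\,dt,
$$

$$
\int_0^T\!\!\int_{\Gamma_0\cup\partial S} (v^0 - \chi_0 u_0)\cdot(\mu\nabla_\nu u)\,d\gamma\,dt = 0,
$$

$$
\int_0^T\!\!\int_{\Gamma_0\cup\partial S} \mu\nabla_\nu v^0\cdot u\,d\gamma\,dt = 0,
$$

$$
\int_0^T\!\!\int_\Omega p\,\nabla\cdot v^0\,dx\,dt = 0,
$$

for all $(u, p) \in U \times Q$.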

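Note that this is a linear problem with respect to $(v^0, q^0)$.

On the other hand, if $(u, p)$ and $(v^0, q^0)$ are the solutions of Problems 2 and 7, by Lemmas 3 and 4 in [13], the shape derivatives of $\mathcal{L}^0$ with respect to $\rho \in \mathcal{U}^{s,\infty}$ are obtained as

$$
\mathcal{L}^{0\prime}(\Omega, u, p, v^0, q^0)(\rho) = \int_{\partial\Omega} G^0(u, p, v^0, q^0)\,\nu\cdot\rho\,d\gamma
$$

$$
= \int_{\Gamma_0\cup\partial S} \big[ -G^0_a(u - u_0, v^0 - u_0) + G^0_b(u, v^0) + G^0_c(u)(u, v^0) + G^0_f(u + v^0) + G^0_T(u) \big]\,\nu\cdot\rho\,d\gamma
$$

$$
\quad + \int_{\partial\Omega\setminus(\Gamma_0\cup\partial S)} \big[ G^0_a(u, v^0) + G^0_b(u, v^0) + G^0_c(u)(u, v^0) + G^0_f(u + v^0) + G^0_T(u) + G^0_{p_0}(u + v^0) \big]\,\nu\cdot\rho\,d\gamma, \qquad (1)
$$

by using $\nabla(u - u_0)\cdot\nabla u = \nabla_\nu(u - u_0)\cdot\nabla_\nu u$, where

$$
G^0_a(u, v) = -\int_0^T \mu\nabla u\cdot\nabla v\,dt, \qquad
G^0_b(u, v) = -\int_0^T \rho\,\dot{u}\cdot v\,dt,
$$

$$
G^0_c(u)(v, w) = -\int_0^T (u\cdot\nabla)v\cdot w\,dt,
$$

$$
G^0_f(u) = \int_0^T f\cdot u\,dt, \qquad
G^0_T(u) = \frac{1}{2}|u(x, T)|^2,
$$

$$
G^0_{p_0}(u) = \int_0^T \big[ \nabla_\nu(p_0\cdot u) + \kappa\,p_0\cdot u \big]\,dt.
$$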

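From the stationary conditions of $\mathcal{L}^0$, we have the following primary result.

Theorem 8 (Shape derivative of energy loss) Let $\Omega \in \mathcal{W}^{2,\infty}(r, M)$, in which the subboundary of $\partial\Omega \setminus (\Gamma_0 \cup \partial S)$ such that $p \neq 0$ is of the $\mathcal{W}^{3,\infty}(r, M)$ class. Suppose that $(u, p)$ is the solution of Problem 2 and that $(v^0, q^0)$ is the solution of Problem 7. Then, the shape derivative of $J^0$ with respect to $\rho \in \mathcal{U}^{s,\infty}$ is given as

$$
J^{0\prime}(\Omega, u, p)(\rho) = \int_{\partial\Omega} G^0(u, p, v^0, q^0)\,\nu\cdot\rho\,d\gamma,
$$

where $G^0$ is defined in (1). Furthermore, the shape gradient $G^0\nu$ belongs to $W^{1,\infty}(D; \mathbb{R}^d)$.

Corollary 9 (In case of $D = \Sigma$) If only $S$ is variable in Theorem 8, then

$$
J^{0\prime}(\Omega, u, p)(\rho) = \int_{\partial S} G^{00}(u, v^0)\,\nu\cdot\rho\,d\gamma,
$$

where $G^{00}\nu \in W^{1,\infty}(D; \mathbb{R}^d)$ is given as

$$
G^{00}(u, v) = \int_0^T \mu\nabla_\nu u\cdot\nabla_\nu v\,dt.
$$

For $J^1$, we have $J^{1\prime}(\Omega)(\rho) = \int_{\partial\Omega} G^1\,\nu\cdot\rho\,d\gamma$, where $G^1 = 1$.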

5. Numerical scheme

Based on the above results, we use the following scheme. Suppose that $(\,\cdot\,)_h$ denotes the Galerkin approximation.

( i ) Solve Problems 2 and 7 using the finite element method, as shown below, and calculate $G^0_h$.

( ii ) Compute the domain variations $\rho^0_G$ and $\rho^1_G$ decreasing $J^0$ and $J^1$, respectively, from $G^0_h$ and $G^1_h = 1$, respectively, using the traction method implemented by the finite element method.

(iii) Solve Problem 6 using the algorithm based on the SQP method [14].

Fig. 2. Example setting: (a) outline (labels: u0, Γ0, S, Ω = Σ \ S), (b) finite element mesh.

Fig. 3. Initial and optimum shapes: (a) initial (circle), (b) optimum for Re = 100, (c) optimum for Re = 250.

For simplicity, let d = 2 and f = 0. The P2 + bubble element and the P1 element are employed for u and p in Problem 2, respectively. In addition, we use a semi-implicit time-advancement scheme with the Adams-Bashforth method for the convection term and the Crank-Nicolson scheme for the other terms.

For Problem 7, we use the same elements as for Problem 2, and the Crank-Nicolson scheme for all terms. We make use of the P1 element for the domain variation in the traction method.
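The semi-implicit time advancement described above can be illustrated on a scalar model equation du/dt = N(u) + λu, where the nonlinear term N stands in for convection and λu for the remaining linear terms. This is only a sketch under stated assumptions: the paper does not state the order of the Adams-Bashforth method, so the second-order variant is assumed here, and the model equation, the function names, and the forward-Euler start-up step are hypothetical choices for this example rather than parts of the authors' finite element program.

```python
import math

def ab2_cn_step(u, n_curr, n_prev, lam, dt):
    """One step for du/dt = N(u) + lam*u.

    N is advanced explicitly with second-order Adams-Bashforth (assumed order),
    lam*u implicitly with Crank-Nicolson.
    """
    explicit = dt * (1.5 * n_curr - 0.5 * n_prev)
    return (u * (1.0 + 0.5 * dt * lam) + explicit) / (1.0 - 0.5 * dt * lam)

def integrate(u0, nonlinear, lam, dt, steps):
    """Advance `steps` time steps; the first step reduces to forward Euler on N."""
    u = u0
    n_prev = nonlinear(u0)
    for _ in range(steps):
        n_curr = nonlinear(u)
        u, n_prev = ab2_cn_step(u, n_curr, n_prev, lam, dt), n_curr
    return u

# Model problem du/dt = -u**2 - u, u(0) = 1, with exact solution 1/(2*exp(t) - 1).
u_end = integrate(1.0, lambda u: -u * u, -1.0, dt=0.01, steps=100)
exact = 1.0 / (2.0 * math.e - 1.0)
```

Treating only the nonlinear term explicitly keeps each step linear in the unknown, which is the usual motivation for this kind of splitting.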


We developed a program based on the above scheme and obtained numerical results for the obstacle problem. Fig. 2 shows an outline of the example setting and a finite element mesh. We assumed that the boundary conditions are given as follows:

$$
u_0 = \begin{pmatrix} u_{01} \\ 0 \end{pmatrix}\theta(t), \qquad
u_{01} = \frac{\mu Re}{\rho d_0}, \qquad
\theta(t) = \begin{cases} ct & \big(t \in (0, 1/c)\big) \\ 1 & \big(t \in (1/c, T)\big) \end{cases}
$$

on the left-hand side of $\partial\Sigma$, where $c = 1/(100\Delta t)$; $p_{01} = 0$ and $u_{02} = 0$ on the upper and lower sides of $\partial\Sigma$; and $p_0 = 0$ on the right-hand side of $\partial\Sigma$. Here $d_0$ denotes the diameter of the initial shape of the obstacle in Fig. 3 (a), and $\Delta t$ means the time step size. We analyzed the cases for Reynolds numbers of Re = 100 and 250. For Re = 100, $T = \Delta t N = 0.05 \times 20{,}000 = 1{,}000$ sec, and $t_0 = \Delta t \times 6{,}000 = 300$ sec. For Re = 250, $T = \Delta t N = 0.02 \times 20{,}000 = 400$ sec, and $t_0 = \Delta t \times 6{,}000 = 120$ sec. The assumption of Corollary 9 is satisfied. The traction method of Neumann type was used for the domain variation.
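The inflow ramp and the time parameters quoted above can be checked with a few lines of arithmetic. The function names below are hypothetical; the numbers (Δt, 20,000 steps, 6,000 steps, and the factor 100 in c) are those stated in the text.

```python
def ramp(t, c):
    """Inflow ramp theta(t): linear growth c*t up to t = 1/c, then constant 1."""
    return c * t if t < 1.0 / c else 1.0

def time_parameters(dt, n_steps=20000, n_start=6000, n_ramp=100):
    """Return (T, t0, c) from T = dt*N, t0 = dt*6000 and c = 1/(100*dt)."""
    return dt * n_steps, dt * n_start, 1.0 / (n_ramp * dt)

T_100, t0_100, c_100 = time_parameters(0.05)   # Re = 100
T_250, t0_250, c_250 = time_parameters(0.02)   # Re = 250
```

Under these values the ramp ends after 100 time steps, that is, at about 5 sec for Re = 100 and about 2 sec for Re = 250, well before the start t0 of the interval over which the energy loss is integrated.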

Fig. 4 shows that the energy loss decreases monotonically while satisfying the domain measure constraint, and Fig. 5 shows that the size of the Karman vortex is smaller for the optimum shapes than for the initial shapes.

Acknowledgments

The present study was supported by JSPS KAKENHI (20540113).

Fig. 4. Iteration histories with respect to shape variation: (a) Re = 100, (b) Re = 250. [Plots: ratios to initial values of the energy loss J0 and the domain measure J1 versus iteration number (0 to 12).]

Fig. 5. Velocities u (upper) and pressures p (lower) for the initial (circle) and optimum shapes: Re = 100, t = ∆t × 20,000 = 1,000 sec; Re = 250, t = ∆t × 20,000 = 400 sec.

References

[1] O. Pironneau, On optimum profiles in Stokes flow, J. Fluid Mech., 59 (1973) 117–128.

[2] O. Pironneau, On optimum design in fluid mechanics, J. Fluid Mech., 64 (1974) 97–110.

[3] O. Pironneau, Optimal Shape Design for Elliptic Systems, Springer-Verlag, New York, 1984.

[4] J. A. Bello, E. Fernandez-Cara, J. Lemoine and J. Simon, The differentiability of the drag with respect to the variations of a Lipschitz domain in a Navier-Stokes flow, SIAM J. Control Optim., 35 (1997) 626–640.

[5] B. Mohammadi and O. Pironneau, Applied Shape Optimization for Fluids, Oxford Univ. Press, New York, 2001.

[6] J. Haslinger, J. Malek and J. Stebel, Shape optimization in problems governed by generalised Navier-Stokes equations: existence analysis, Control Cybern., 34 (2005) 283–303.

[7] S. Kaizu, The Gateau derivative of cost functions in the optimal shape problems and the existence of the shape derivatives of solutions of the Stokes problems, JSIAM Letters, 1 (2009) 17–20.

[8] D. Chenais, On the existence of a solution in a domain identification problem, J. Math. Anal. Appl., 52 (1975) 189–219.

[9] H. Azegami, Solution to domain optimization problems (in Japanese), Trans. JSME, Ser. A, 60 (1994) 1479–1486.

[10] H. Azegami and K. Takeuchi, A smoothing method for shape optimization: traction method using the Robin condition, Int. J. Comput. Methods, 3 (2006) 21–33.

[11] E. Katamine, H. Azegami, T. Tsubata and S. Itoh, Solution to shape optimization problems of viscous flow fields, Int. J. Comput. Fluid Dynam., 19 (2005) 45–51.

[12] E. Katamine, Y. Nagatomo and H. Azegami, Shape optimization of 3D viscous flow fields, Inverse Problems in Science and Engineering, 17 (2009) 105–114.

[13] G. Allaire, F. Jouve and A.-M. Toader, Structural optimization using sensitivity analysis and a level-set method, J. Comput. Phys., 194 (2004) 363–393.

[14] H. Azegami, Solution to boundary shape optimization problems, in: C. A. Brebbia and W. P. de Wilde eds, High Performance Structures and Materials II, pp. 589–598, WIT Press, Southampton, UK, 2004.



A block sparse approximate inverse with cutoff preconditioner for semi-sparse linear systems derived from Molecular Orbital calculations

Ikuro Yamazaki1, Masayuki Okada2, Hiroto Tadano1, Tetsuya Sakurai1 and Keita Teranishi3

1 Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8578, Japan

2 Hitachi, Ltd., Software Division, Kaneichi Bldg. 549-8, Shinano-cho, Totsuka-ku, Yokohama 244-0801, Japan

3 Cray, Inc., 380 Jackson St. Suite 210, St Paul, MN 55101, USA

E-mail yamazaki mma.cs.tsukuba.ac.jp


Abstract

We present an approach to preconditioning for large, relatively dense linear systems and verify the validity of our method. We restrict the target of our method to Molecular Orbital (MO) calculations. Sparse Approximate Inverse (SAI) is typically less effective at accelerating the convergence and requires a huge computational cost in its construction when a large number of nonzero entries are kept in the approximate inverse matrix. We explain a construction of Block SAI and a cutoff strategy to reduce the number of nonzero elements, and investigate the efficiency of a cutoff strategy and Block SAI.

Keywords linear system, preconditioning, sparse approximate inverse


1. Introduction

Density Functional Theory (DFT) is a popular method to obtain the potential energy of materials at atomistic scales, and typically involves the solution of large-scale eigenvalue problems. In particular, for the area of protein folding, we have developed a task-parallel scheme for the eigenvalue problems [1] that achieves a substantial parallel speedup. Yet, our scheme requires each CPU (or a group of CPUs) to solve large linear systems

$$
Ax = b,
$$

where $A \in \mathbb{C}^{n\times n}$ contains a relatively large number of nonzero elements (semi-sparse). We have observed that these linear systems are typically derived from Molecular Orbital (MO) calculations.

For the solution of such linear systems on multi-core CPUs, Krylov subspace methods preconditioned with sparse approximate inverse (SAI) based on Frobenius norm minimization appear very attractive because of the good parallel efficiency in both preconditioner construction and application. However, the convergence of the iterative solvers with SAI tends to be slower than that with conventional preconditioning methods such as incomplete factorizations. This is because the SAI preconditioner often falls into local minimization with respect to individual columns. In addition to this drawback, the arithmetic cost of constructing the SAI preconditioner grows cubically with the number of nonzero entries per row, making it less feasible than other preconditioning alternatives.

In this paper, we attempt to overcome these performance bottlenecks of SAI using a blocked version of Frobenius norm minimization in order to mitigate the side effect of the minimization process applied to individual columns of the approximate inverse. We also apply different drop-threshold schemes to achieve a reduction of the arithmetic costs of preconditioner construction and application at the cost of a small increase in the iteration counts.

This paper is organized as follows. In Section 2, the SAI preconditioner and its block variant are described. We discuss our preconditioner for the semi-sparse linear systems in Section 3. In Section 4, we describe how to profile for a cutoff parameter. In Section 5, we investigate the performance of our preconditioner through numerical experiments with two matrices obtained from the computation of molecular orbitals, followed by the concluding remarks in Section 6.

2. SAI and its block variant

Our approach to constructing a preconditioning matrix is based on Frobenius norm minimization [2, 3]:

$$
\min_M \|AM - I\|_F^2, \qquad (1)
$$

where $I$ is the identity matrix. The Frobenius norm can be minimized in parallel:

$$
\|AM - I\|_F^2 = \sum_{k=1}^{n} \|Am_k - e_k\|_2^2, \qquad (2)
$$

where $m_k$ and $e_k$ are the $k$-th columns of $M$ and $I$, respectively. Thus, the preconditioning matrix $M = [m_1, m_2, \cdots, m_n] \approx A^{-1}$ is constructed by solving $n$ independent least squares problems:

$$
\min_{m_k} \|Am_k - e_k\|_2^2, \qquad k = 1, 2, \ldots, n. \qquad (3)
$$

JSIAM Letters Vol. 2 (2010) pp.41–44 Ikuro Yamazaki et al.
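The column-wise problems (3) can be sketched on a small dense example. The code below is a minimal illustration, not the authors' implementation: it restricts each column to the sparsity pattern of A and, for brevity, solves each small least squares problem through the normal equations with Gaussian elimination, whereas the paper refers to QR decompositions. The 4 × 4 test matrix and all function names are hypothetical.

```python
def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def sai_column(a, k, pattern):
    """Minimize ||A m_k - e_k||_2 over columns m_k supported on `pattern`."""
    n = len(a)
    # Normal equations (A_J^T A_J) m_J = A_J^T e_k, where A_J keeps columns J.
    gram = [[sum(a[r][i] * a[r][j] for r in range(n)) for j in pattern]
            for i in pattern]
    rhs = [a[k][i] for i in pattern]  # A_J^T e_k picks row k of A
    m_j = solve(gram, rhs)
    col = [0.0] * n
    for idx, j in enumerate(pattern):
        col[j] = m_j[idx]
    return col

def sai(a):
    """Assemble M column by column, using spy(A) as the sparsity pattern."""
    n = len(a)
    cols = [sai_column(a, k, [i for i in range(n) if a[i][k] != 0.0])
            for k in range(n)]
    # cols[k] is column k of M; transpose into row-major storage.
    return [[cols[k][i] for k in range(n)] for i in range(n)]

def residual_sq(a, m):
    """Squared Frobenius norm of A*M - I."""
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(a[i][p] * m[p][j] for p in range(n)) - (1.0 if i == j else 0.0)
            total += entry * entry
    return total

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
M = sai(A)
```

Because each column is computed independently, the loop in `sai` could be distributed over processors without communication, which is the parallelism referred to in (2).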

The Block SAI preconditioner was proposed by Barnard and Grote [4, 5] in order to improve the accuracy of the preconditioning matrix. In the Block SAI preconditioner, the objective in problem (1) is split into column blocks that can be processed in parallel:

$$
\|AM - I\|_F^2 = \sum_{k=1}^{L} \|AM_k - E_k\|_F^2, \qquad (4)
$$

where $l$ is a block size, $L = \lceil n/l \rceil$ and $E_k$ is a sub-matrix of the identity matrix $I$ such that $I = [E_1, E_2, \cdots, E_L]$. Thus, the preconditioning matrix $M = [M_1, M_2, \cdots, M_L]$ is constructed by solving $L$ independent least squares problems:

$$
\min_{M_k} \|AM_k - E_k\|_F^2, \qquad k = 1, 2, \ldots, L. \qquad (5)
$$

The initial sparsity pattern $M_0$ of the preconditioning matrix is decided by the following:

$$
\mathrm{spy}(A) = \mathrm{spy}(M_0), \qquad (6)
$$

where “spy” denotes the sparsity pattern of a matrix.
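The block partition behind (4) and (5) can be sketched as follows. The helper name `column_blocks` is hypothetical, and contiguous column blocks are an assumption of this sketch; the sizes n = 1,980 and n = 6,005 with l = 30 are the ones used in the numerical experiments of this paper.

```python
import math

def column_blocks(n, l):
    """Split column indices 0..n-1 into L = ceil(n / l) contiguous blocks."""
    num_blocks = math.ceil(n / l)
    return [range(k * l, min((k + 1) * l, n)) for k in range(num_blocks)]

blocks_ex1 = column_blocks(1980, 30)   # n of Example 1
blocks_ex2 = column_blocks(6005, 30)   # n of Example 2
```

When l does not divide n, as for n = 6,005, the last block is simply shorter than the others.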

3. Block SAI with cutoff (BSAIC)

Molecular orbital calculation in the biochemistry application [6] requires solving the following generalized eigenvalue problem:

$$
F\nu = \lambda S\nu,
$$

where $F \in \mathbb{R}^{n\times n}$ is symmetric and $S \in \mathbb{R}^{n\times n}$ is symmetric positive definite. In molecular orbital calculation, eigenpairs around the Highest Occupied Molecular Orbital (HOMO) and the Lowest Unoccupied Molecular Orbital (LUMO) are important for analyzing chemical reactions. Thus, interior eigenvalue problems need to be solved. For instance, the inverse of the shifted matrix $(F - \sigma S)^{-1}$ is required in some methods to find eigenvalues around $\sigma$. In the Sakurai-Sugiura (SS) method [1], the solution of the system of linear equations $Ax = b$, where the coefficient matrix $A$ is given by

$$
A = \omega S - F,
$$

is required. Since $F$ is semi-sparse, $A$ is also semi-sparse.

Firstly, we describe why the sparsity pattern of $A$ is used as the initial sparsity pattern $M_0$. The nonzero pattern of $A^{-1}$ changes depending on $\omega$. When the sparsity pattern of $A$ is similar to that of $A^{-1}$, $Ax = b$ is relatively easy to solve. Meanwhile, when the sparsity pattern of $A$ is not similar to that of $A^{-1}$, it is difficult to solve $Ax = b$. In practice, the sparsity pattern of $A^{-1}$ with important $\omega$ is relatively similar to that of $A$, and thus the sparsity pattern of $A$ can be used as the initial sparsity pattern $M_0$.

Secondly, we describe how to reduce the computational cost of SAI. The coefficient matrix $A$ contains a relatively large number of nonzero elements, and thus the computational cost of SAI is huge if SAI is applied to these matrices. For this reason, we propose a cutoff that is applied to the coefficient matrix $A$ to reduce the number of nonzero elements. In our cutoff, off-diagonal nonzero elements of $A$ are dropped if they are small compared to a cutoff parameter $\theta$ as shown below:

$$
A_c = [\tilde{a}_{ij}], \qquad
\tilde{a}_{ij} = \begin{cases} a_{ij}, & (|a_{ij}| > \theta \text{ or } i = j), \\ 0, & (\text{otherwise}), \end{cases} \qquad (7)
$$

where $\theta$ is a nonnegative real value. As a result, the computational cost of SAI is reduced because $A_c$ has fewer nonzero elements than $A$. However, although a larger value of $\theta$ reduces the computational cost of SAI even further, it leads to a less effective preconditioning matrix and hence to a larger number of iterations.
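The cutoff (7) amounts to a simple entry-wise filter. The following sketch applies it to a small dense matrix stored as nested lists; the function name and the sample matrix are hypothetical, and a practical code would operate on a sparse storage format instead.

```python
def apply_cutoff(a, theta):
    """Return A_c: keep a_ij if |a_ij| > theta or i == j, otherwise set it to 0."""
    return [[v if (abs(v) > theta or i == j) else 0.0
             for j, v in enumerate(row)]
            for i, row in enumerate(a)]

def count_nonzeros(a):
    """Number of nonzero entries of a dense matrix."""
    return sum(1 for row in a for v in row if v != 0.0)

A = [[1e-5, 0.5, 1e-3],
     [1e-3, 2.0, 0.2],
     [0.0, 1e-6, 3.0]]
A_c = apply_cutoff(A, theta=1e-2)
```

Note that the diagonal entry 1e-5 survives even though it is below θ, in line with the condition i = j in (7).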

After applying the cutoff strategy, least squares problems with the approximate matrix $A_c$:

$$
\min_{M_k} \|A_c M_k - E_k\|_F^2, \qquad k = 1, 2, \ldots, L \qquad (8)
$$

are solved. The matrix $M = [M_1, M_2, \cdots, M_L]$ is employed as the preconditioning matrix. We call this preconditioner the Block SAI with Cutoff (BSAIC) preconditioner.

We describe the performance improvement of the approximate inverse obtained by a block version of Frobenius norm minimization. Firstly, the performance degradation of SAI caused by applying our cutoff can be reduced owing to extra fill-ins introduced by the blocked version. These fill-ins make the preconditioner more robust and allow a larger value of $\theta$ for the cutoff, making it sparser than the original SAI.

Let the size of the matrix of the $i$-th QR decomposition of SAI be $m_i \times n_i$ ($m_i \ge n_i$). Its computational cost is $O(m_i n_i^2)$. Let the number of nonzero elements of $A$ be $\alpha n^2$ ($0 < \alpha \le 1$), where $n$ is the dimension of $A$. In many cases, $m_i$ and $n_i$ are proportional to $\alpha$. Therefore, a decrease of $\alpha$ provides a drastic decrease of the computational cost of the QR decomposition. Indeed, when the matrices derived from the computation of the molecular orbitals are used, $m_i$ and $n_i$ are proportional to $\alpha$, and the cutoff is effective for the QR decomposition. Applying the cutoff to $A$, the number of iterations increases slowly in a certain range of $\theta$.
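To make the scaling explicit, here is a worked step under the proportionality stated above; the constants $c_m$ and $c_n$ are hypothetical proportionality factors introduced only for illustration. If $m_i = c_m\alpha$ and $n_i = c_n\alpha$, then

$$
O(m_i n_i^2) = O(c_m c_n^2\,\alpha^3),
$$

so, for example, a cutoff that halves $\alpha$ would be expected to reduce the cost of each QR decomposition by roughly a factor of $2^3 = 8$.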

Secondly, Block SAI widens the scope of minimization, whereas the original version minimizes over each column of $M$ separately. Computing the minimum associated with a block of columns reduces the threat of local minimization. In other words, a large $l$ makes $M$ closer to a global minimizer in the sense of the linear space of $A$. However, a large $l$ often increases the cost of the least squares problem associated with each block, as the row dimension of the matrix for each least squares problem is determined by the number of nonzero rows in the block. This performance drawback can be mitigated with a large cutoff value $\theta$, and our experiments in Section 5 indicate that a large $l$ slows down the increase of the iteration count with respect to $\theta$.

In conclusion, both the cutoff parameter $\theta$ and the block size $l$ should preferably be as large as possible. However, suitable values depend on the problem, and thus we take matrices obtained from the computation of molecular orbitals as examples.



4. Profiling for cutoff parameter tuning

In the SS method, linear systems Ax = b for various ω around HOMO-LUMO need to be solved. The difficulty of solving Ax = b depends on ω. The cost of constructing M is highly dependent on the number of nonzero elements of A. Therefore, we need to set an appropriate cutoff parameter depending on ω. How to profile for a cutoff parameter is shown below:

1) Profiling stage
   a. Check the values of matrix elements.
   b. Check the constructing time of M with a large cutoff value.
   c. Estimate the constructing times of M with several cutoff values from the number of nonzero elements of all matrices of QR decomposition.
   d. Set the range of the cutoff parameter µ with µmin ≤ µ ≤ µmax.

2) Trial stage
   a. Set µ = µmax.
   b. Construct M with the cutoff parameter µ.
   c. Perform one cycle of the preconditioned GMRES(k).
   d. If ‖b − Axk‖/‖b‖ ≤ δ or µ ≤ µmin then
        exit this trial stage.
      Else
        set µ = βµ (0 < β < 1).
      End if
   e. Go to 2)-b.

3) Perform the preconditioned GMRES(k) using M constructed in 2).
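The trial stage 2) above can be sketched as a short loop. This is only an illustration of the control flow: `build_preconditioner` and `gmres_cycle` are hypothetical callables standing in for the construction of M and for one cycle of preconditioned GMRES(k), and the stand-in lambdas in the usage lines do not model any real solver.

```python
def tune_cutoff(build_preconditioner, gmres_cycle, mu_max, mu_min, beta, delta):
    """Trial stage 2): shrink mu until one GMRES(k) cycle reaches delta.

    build_preconditioner(mu) -> M                  (hypothetical, step 2b)
    gmres_cycle(M) -> ||b - A x_k|| / ||b||        (hypothetical, step 2c)
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must satisfy 0 < beta < 1")
    mu = mu_max                                    # step 2a
    while True:
        m = build_preconditioner(mu)               # step 2b
        rel_res = gmres_cycle(m)                   # step 2c
        if rel_res <= delta or mu <= mu_min:       # step 2d: exit the trial stage
            return mu, m
        mu = beta * mu                             # step 2d (else), then back to 2b

# Stand-ins for illustration only: the "preconditioner" is just mu itself,
# and the residual after one cycle is pretended to be 5 * mu.
mu, m = tune_cutoff(lambda mu: mu, lambda m: 5.0 * m,
                    mu_max=1e-1, mu_min=1e-5, beta=0.1, delta=1e-3)
```

Starting from the largest cutoff means that the cheapest preconditioners are tried first, so an expensive construction is attempted only when the cheaper ones fail to reach the threshold δ.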

5. Numerical experiments

In this section, the BSAIC preconditioner is compared with SAI and Block SAI by numerical experiments. All experiments are carried out by MATLAB 7.4 on a MacBook (CPU: Intel Core 2 Duo 2.0 GHz, Memory: 2.0 Gbytes, OS: Mac OS 10.5.6). The test problems are solved by the preconditioned GMRES(30) method [7]. The stopping criterion for the relative residual is $10^{-10}$. The initial guess $x_0$ is set to 0 and all elements of $b$ are set to 1.

5.1 Example 1

In Example 1, the matrices F and S are derived from the computation of the molecular orbitals of a model DNA. The coefficient matrix A is given by ωS − F, where ω is a real parameter.

The size of A is 1,980 and the number of nonzero elements is 728,080 (18.57%). In this example, the parameter ω for the coefficient matrix A and the block size l are set to −0.16 and 30, respectively.

When $A_c^{-1}$ with small θ is used as a preconditioning matrix, the number of iterations is very small (e.g., the number of iterations is 5 with $\theta = 10^{-3}$). The eigenvalue corresponding to HOMO is −0.16538. In molecular orbital calculations, eigenpairs around HOMO-LUMO are desired. Therefore, the systems of linear equations Ax = b for several parameters ω which are close to HOMO need to be solved.

The results of Example 1 are reported in Table 1. The number of iterations of SAI is 9 and that of Block SAI with l = 30 is 7. Block SAI takes fewer iterations and less preconditioning time than SAI. SAI with Cutoff takes less preconditioning time than SAI and Block SAI. BSAIC attains the shortest total time among the preconditioners (1.96 sec with $\theta = 10^{-2}$).

Fig. 1. The sparsity pattern with ω = −0.24: (a) spy(A), (b) spy(A−1).

Fig. 2. The sparsity pattern with ω = −0.5: (a) spy(A), (b) spy(A−1).

5.2 Example 2

In Example 2, the matrices F and S are derived from the computation of the molecular orbitals of Lysozyme. In this example, we also use the real parameter ω for the coefficient matrix A = ωS − F. The matrices F and S are real symmetric and real symmetric positive definite, respectively. The size of A is 6,005 and the number of nonzero elements is 3,275,925 (9.08%). The parameter ω for the coefficient matrix A and the block size l are set to −0.24 and 30, respectively.
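The quoted fill ratios follow directly from the matrix sizes and nonzero counts reported for the two examples. The short check below uses only those reported numbers; the function name is hypothetical.

```python
def density_percent(n, nnz):
    """Percentage of nonzero entries in an n-by-n matrix."""
    return 100.0 * nnz / (n * n)

example1 = density_percent(1980, 728080)    # model DNA (Example 1)
example2 = density_percent(6005, 3275925)   # Lysozyme (Example 2)
print(f"{example1:.2f}% {example2:.2f}%")   # prints "18.57% 9.08%"
```

Both matrices are far denser than typical sparse matrices, which is what the term semi-sparse refers to in this paper.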

Fig. 1 shows the sparsity patterns of A and A−1 with ω = −0.24. Fig. 2 shows the sparsity patterns of A and A−1 with ω = −0.5. The eigenvalue which corresponds to HOMO is −0.25086.

The results of Example 2 are reported in Table 2. The number of iterations of SAI is 6 and that of Block SAI with l = 30 is 5. BSAIC attains the shortest total time among the preconditioners (4.93 sec with $\theta = 10^{-2}$).

Fig. 3 shows the actual preconditioning time and the estimated time with respect to several cutoff values using l = 30 for Example 2. One iteration takes 0.2 sec. When $\theta = 10^{-5}$ is used, the preconditioning time is too large in comparison with the time for one iteration. Thus, we set the range of the cutoff parameter from $10^{-1}$ to $10^{-4}$.

The results of our profiling are presented in Table 3. The threshold δ of the relative residual is set to $10^{-3}$. Total time denotes the time including profiling, cutoff, preconditioning and iteration. An appropriate cutoff parameter depending on ω is found. When $\theta = 10^{-5}$ is used, the preconditioning time is 140.92 sec without profiling. Thus, when a small value of θ is used without profiling, a large computational cost is required.

Fig. 4 shows the relative residual of our profiling with ω = −0.45 for Example 2. Ax = b is solved with variable θ.



Table 1. Results for Example 1. Times are wall clock times in seconds.

Preconditioner                Iterations   Cutoff   Preconditioning   Iteration   Total
SAI                           9            -        474.94            0.24        475.18
Block SAI                     7            -        56.85             0.24        57.09
SAI with Cutoff, θ = 10^-2    123          0.12     1.15              2.17        3.44
SAI with Cutoff, θ = 10^-4    21           0.16     16.82             0.50        17.48
BSAIC, θ = 10^-2              44           0.12     1.07              0.77        1.96
BSAIC, θ = 10^-4              14           0.16     5.46              0.29        5.91

Table 2. Results for Example 2. Times are wall clock times in seconds.

Preconditioner                Iterations   Cutoff   Preconditioning   Iteration   Total
SAI                           6            -        21197.78          0.80        21198.58
Block SAI                     5            -        1760.92           0.64        1761.56
SAI with Cutoff, θ = 10^-2    59           0.44     4.00              2.64        7.08
SAI with Cutoff, θ = 10^-4    13           0.58     75.62             0.45        76.65
BSAIC, θ = 10^-2              17           0.44     3.91              0.58        4.93
BSAIC, θ = 10^-4              9            0.58     36.54             0.43        37.55

Fig. 3. The actual preconditioning time and the estimated time with respect to several cutoff values using l = 30 for Example 2. [Plot: computational time [sec] (log scale, 10^-1 to 10^3) versus cutoff value θ (10^-5 to 10^-1); curves: actual precond. time, estimated precond. time.]

6. Conclusions

Our method, the BSAIC algorithm, reduces the computational cost of generating the approximate inverse M, and overcomes the performance bottlenecks of SAI by using the blocked version of Frobenius norm minimization and the cutoff strategy for semi-sparse matrices. Although our empirical study is confined to problems from electronic calculations, we demonstrate that BSAIC performs substantially better than SAI preconditioning, and that the application of cutoff parameters further increases the performance advantage of BSAIC, making it a robust and efficient preconditioner. The results of the empirical study also indicate the possibility of predicting the cost of constructing M and tuning θ with a small computational overhead.

In future work, we will try to find a better strategy for selecting an appropriate cutoff parameter and apply the method to large-scale problems.

Acknowledgments

This research was supported in part by a Grant-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology, Japan, Grant numbers: 21246018 and 21105502.

Table 3. Results of our profiling for Example 2.

ω        Appropriate θ   Total time [sec]   Total number of iterations
−0.24    10^-2           13.36              46
−0.3     10^-2           14.32              59
−0.35    10^-2           17.16              130
−0.4     10^-2           14.54              69
−0.45    10^-3           28.82              178
−0.5     10^-4           83.53              793

Fig. 4. The relative residual of our profiling with ω = −0.45 for Example 2. [Plot: relative residual (log scale, 10^-11 to 10^-1) versus number of iterations (0 to 200); curves: θ = 1e-1, θ = 1e-2, θ = 1e-3, and the threshold δ.]

References

[1] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, J. Comput. Appl. Math., 159 (2003) 119–128.

[2] M. W. Benson and P. O. Frederickson, Iterative solution of large sparse linear systems arising in certain multi dimensional approximation problems, Utilitas Math., 22 (1982) 127–140.

[3] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., 19 (1998) 995–1023.

[4] S. T. Barnard and M. J. Grote, A block version of the SPAI preconditioner, in Proc. of the 9th SIAM conf. on Parallel Process. for Sci. Comput, San Antonio, TX, 1999.

[5] SPAI, http://www.computational.unibas.ch/software/spai/.

[6] Y. Inadomi, T. Nakano, K. Kitaura and U. Nagashima, Definition of molecular orbitals in fragment molecular orbital method, Chem. Phys. Letters, 364 (2002) 139–143.

[7] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986) 856–869.



Finite element computation for scattering problems of micro-hologram using DtN map

Yosuke Mizuyama1, Takamasa Shinde2, Masahisa Tabata3 and Daisuke Tagami3

1 Panasonic Boston Laboratory, 2 Wells Avenue, Newton, MA 02459, USA

2 Fujitsu Advanced Technologies Limited, 4-1-1, Kamikodanaka, Nakahara-ku, Kawasaki 221-8588, Japan

3 Faculty of Mathematics, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan

E-mail mizuyamay@us.panasonic.com, shinde.takamasa@jp.fujitsu.com, {tabata,tagami}@math.kyushu-u.ac.jp

Received September 30, 2009, Accepted January 9, 2010

Abstract

Computational results are presented on micro-hologram diffraction for optical data storage using a finite element method. Retrieval of object light from a micro-hologram is formulated as an optical scattering problem in an infinite region. To overcome the difficulty of dealing with the infinite region, a Dirichlet-to-Neumann (DtN) map is employed on an artificial boundary. By virtue of the DtN map, reflection from the artificial boundary is effectively suppressed and a non-reflecting boundary is obtained. Retrieval of the object light is computed for two different models.

Keywords optical scattering, DtN map, finite element method, micro-hologram


1. Introduction

Holographic data storage has been studied as a next-generation method for optical data storage with terabyte capacity. A method using a micro-hologram is one such technology [1, 2], where a micro-hologram is generated as a set of interference fringes when two counter-propagating focused laser beams intersect at the focus. The data carried by one of the two beams, called the object light, are reconstructed as diffraction from the hologram when the other beam, called the reference light, illuminates the hologram. This process is called retrieval of object light; see [3, p. 308] for details. One of the main interests of the study is to estimate the diffraction efficiency in the retrieval process, which determines the signal-to-noise ratio of the data storage system.

For free-space propagation of light, where neither free charge nor current exists, the electric field and the magnetic field are decoupled. Assuming that the fields are time harmonic, this reduces the Maxwell equations to a set of vector-valued Helmholtz equations for the electric and magnetic fields. Furthermore, in the retrieval process a laser beam is in general polarized, so it suffices to analyze one component of the electric field of light instead of all components of the Helmholtz equations [4].

As a result, the retrieval process can be described as an optical scattering problem, which is stated by the scalar Helmholtz equation in an infinite region. To avoid the computational difficulty of an infinite region, several techniques have been developed to transform the original problem into one in a bounded domain. In the field of optics, there are studies using the Boundary Element Method (BEM) [5], a hybrid finite element method with BEM coupling [6], the Perfectly Matched Layer (PML) [7] and the Transparent Boundary Condition (TBC) [4].

There is another method, called the Dirichlet-to-Neumann (DtN) map [8, 9], that has been used mainly in scattering problems in acoustics. To the best of our knowledge, the DtN map has not yet been used for optical scattering problems, possibly because the large wave number of light makes the computation more difficult. In this paper, we apply a DtN map to our optical scattering problem and simulate retrieval of object light from a micro-hologram.

2. Formulation

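Let $\Omega_B$ be a 2-dimensional transmissive scatterer with a smooth boundary $\Gamma$ and an outward unit normal $n$; see Fig. 1. We assume a time-harmonic field. Let $u$ be the complex amplitude of a scalar component of the electric field of scattered light. The scattering problem is formulated by the following Helmholtz equations in $\mathbb{R}^2$ according to [4, 10]: for a given domain $\Omega_B$, wave numbers $k_1$ and $k_2$ in media 1 and 2, and an incident light $u^{inc}$, find $u : \mathbb{R}^2 \to \mathbb{C}$ such that

\[
\begin{aligned}
-\Delta u - k_2^2 u &= (\Delta + k_2^2)\, u^{inc} && \text{in } \Omega_B, \\
-\Delta u - k_1^2 u &= 0 && \text{in } \Omega_B^c, \\
[\, u \,] &= 0 && \text{on } \Gamma, \\
\left[ \frac{\partial u}{\partial n} \right] &= 0 && \text{on } \Gamma, \\
\lim_{r \to +\infty} \sqrt{r} \left( \frac{\partial u}{\partial r} - i k_1 u \right) &= 0,
\end{aligned}
\tag{1}
\]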


JSIAM Letters Vol. 2 (2010) pp.45–48 Yosuke Mizuyama et al.

Fig. 1. A scatterer $\Omega_B$ and an artificial boundary $\Gamma_a$.

where $i$ denotes the imaginary unit, $[\,\cdot\,]$ represents the jump across $\Gamma$, and $r := |x|$ with the orthogonal coordinate system $x = (x_1, x_2)$ in $\mathbb{R}^2$.

The scatterer, incident light and scattered light in the formulation correspond to the micro-hologram, reference light and object light, respectively, in the retrieval process.

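Let $\Omega_a$ be a circle with radius $a$ ($> 0$), and let $\Gamma_a$ be the boundary of $\Omega_a$; see Fig. 1. Suppose that the circle $\Omega_a$ strictly includes $\Omega_B$. By introducing the DtN map [8], problem (1) becomes equivalent to the following equations in $\Omega_a$: find $u : \Omega_a \to \mathbb{C}$ such that

\[
\begin{aligned}
-\Delta u - k_2^2 u &= (\Delta + k_2^2)\, u^{inc} && \text{in } \Omega_B, \\
-\Delta u - k_1^2 u &= 0 && \text{in } \Omega_a \setminus \Omega_B, \\
[\, u \,] &= 0 && \text{on } \Gamma, \\
\left[ \frac{\partial u}{\partial n} \right] &= 0 && \text{on } \Gamma, \\
\frac{\partial u}{\partial r} &= -Su && \text{on } \Gamma_a.
\end{aligned}
\tag{2}
\]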

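Here $S$ is the Steklov-Poincaré operator defined by

\[
Su := -k_1 \sum_{n=-\infty}^{\infty} \frac{H_n^{(1)\prime}(k_1 a)}{H_n^{(1)}(k_1 a)}\, u_n(a)\, \varphi_n(\theta),
\]

where $(r, \theta)$ is the polar coordinate system in $\mathbb{R}^2$, $H_n^{(1)}$ is the Hankel function of the first kind of order $n$, $\varphi_n(\theta)$ are the spherical harmonics defined by

\[
\varphi_n(\theta) := \frac{1}{\sqrt{2\pi}}\, e^{in\theta},
\]

and $u_n$ is a Fourier coefficient defined by

\[
u_n(a) := \int_0^{2\pi} u(a, \theta)\, \overline{\varphi_n(\theta)}\, d\theta.
\]

The Hankel function and its derivative are defined as follows:

\[
H_n^{(1)}(x) := J_n(x) + i Y_n(x), \qquad
H_n^{(1)\prime}(x) := \frac{1}{2} \left( H_{n-1}^{(1)}(x) - H_{n+1}^{(1)}(x) \right),
\]

where $J_n(x)$ and $Y_n(x)$ are the Bessel functions of the first and second kind of order $n$, respectively.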

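Let $L^2(\Omega_a)$ be the space of complex-valued square-integrable functions defined in $\Omega_a$, and let $\|\cdot\|_{0,\Omega_a}$ be its norm. For $m \in \mathbb{N}$, let $H^m(\Omega_a)$ be the space of functions in $L^2(\Omega_a)$ whose derivatives up to the $m$th order belong to $L^2(\Omega_a)$, and let $\|\cdot\|_{m,\Omega_a}$ be its norm. Set $V := H^1(\Omega_a)$. Moreover, bilinear forms $a$ and $s$ are defined by

\[
a(u, v) := \int_{\Omega_a} \left( \nabla u \cdot \nabla \overline{v} - k^2 u\, \overline{v} \right) dx, \quad \forall u, v \in V,
\]

\[
s(u, v) := \int_{\Gamma_a} (Su)\, \overline{v}\, ds, \quad \forall u, v \in V,
\]

and a linear functional $f$ is defined by

\[
\langle f, v \rangle := \int_{\Omega_a} f\, \overline{v}\, dx, \quad \forall v \in V.
\]

Here, $k$ is a piecewise constant function defined by

\[
k(x) :=
\begin{cases}
k_1 & \text{in } \Omega_a \setminus \Omega_B, \\
k_2 & \text{in } \Omega_B,
\end{cases}
\]

and $f$ is a scattering potential defined by

\[
f(x) :=
\begin{cases}
0 & \text{in } \Omega_a \setminus \Omega_B, \\
(\Delta + k_2^2)\, u^{inc} & \text{in } \Omega_B.
\end{cases}
\]

Note that simple calculations turn the bilinear form $s$ into

\[
s(u, v) = -k_1 a \sum_{n=-\infty}^{+\infty} \frac{H_n^{(1)\prime}(k_1 a)}{H_n^{(1)}(k_1 a)}\, u_n\, \overline{v_n}.
\]

Now, (2) can be written in a weak form as follows: find $u \in V$ such that

\[
a(u, v) + s(u, v) = \langle f, v \rangle, \quad \forall v \in V. \tag{3}
\]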

3. Finite element approximation

Let $\{\mathcal{T}_h\}$ be a uniformly regular family of triangulations of $\Omega_a$, where $h$ stands for the maximum diameter of the triangles in $\mathcal{T}_h$. We set $\Omega_{ah} := \operatorname{int}(\cup \{T;\ T \in \mathcal{T}_h\})$. Let $V_h \subset V$ be the P1 finite element space. Moreover, the bilinear forms $a$ and $s$, and the linear functional $f$, are approximated by bilinear forms $a_h$ and $s_h^N$, and a linear functional $f_h$ defined by, for $u_h, v_h \in V_h$,

\[
a_h(u_h, v_h) := \int_{\Omega_{ah}} \left( \nabla u_h \cdot \nabla \overline{v_h} - k^2 u_h\, \overline{v_h} \right) dx,
\]

\[
s_h^N(u_h, v_h) := -k_1 a \sum_{n=-N}^{N} \frac{H_n^{(1)\prime}(k_1 a)}{H_n^{(1)}(k_1 a)}\, u_{hn}\, \overline{v_{hn}},
\]

\[
\langle f_h, v_h \rangle := \int_{\Omega_{ah}} (\Pi_h f)\, \overline{v_h}\, dx,
\]

where $N$ is a truncation number and $\Pi_h f$ denotes the P1 interpolant of $f$.

Then, a finite element problem corresponding to (3) is obtained as follows: find $u_h \in V_h$ such that

\[
a_h(u_h, v_h) + s_h^N(u_h, v_h) = \langle f_h, v_h \rangle, \quad \forall v_h \in V_h. \tag{4}
\]
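The truncated boundary form can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the Fourier coefficients are taken against the complex conjugate of $e^{in\theta}/\sqrt{2\pi}$ and approximated by the trapezoidal rule on equispaced boundary samples, and the function names `fourier_coefficients`, `truncated_boundary_form` and the argument `hankel_ratio` are hypothetical. The Hankel ratio $H_n^{(1)\prime}(k_1 a)/H_n^{(1)}(k_1 a)$ is passed in as a callable, since the Python standard library has no Bessel functions; in practice it might be supplied by a library such as SciPy (`scipy.special.hankel1` and `scipy.special.h1vp`).

```python
import cmath
import math

def fourier_coefficients(samples, N):
    """Coefficients u_n, |n| <= N, of boundary values sampled at theta_j = 2*pi*j/M.

    ASSUMPTION: u_n is the integral of u * conj(phi_n) with phi_n = exp(i*n*theta)/sqrt(2*pi),
    approximated here by the trapezoidal rule.
    """
    M = len(samples)
    weight = 2.0 * math.pi / M / math.sqrt(2.0 * math.pi)
    return {
        n: weight * sum(s * cmath.exp(-1j * n * 2.0 * math.pi * j / M)
                        for j, s in enumerate(samples))
        for n in range(-N, N + 1)
    }

def truncated_boundary_form(u_samples, v_samples, k1, a, N, hankel_ratio):
    """Evaluate -k1 * a * sum_{|n| <= N} ratio(n) * u_n * conj(v_n)."""
    u_n = fourier_coefficients(u_samples, N)
    v_n = fourier_coefficients(v_samples, N)
    return -k1 * a * sum(hankel_ratio(n) * u_n[n] * v_n[n].conjugate()
                         for n in range(-N, N + 1))

# Usage with a PLACEHOLDER ratio equal to 1 (not a Hankel function evaluation).
M = 64
thetas = [2.0 * math.pi * j / M for j in range(M)]
u_samples = [cmath.exp(2j * t) for t in thetas]   # a single Fourier mode, n = 2
coeffs = fourier_coefficients(u_samples, N=3)
value = truncated_boundary_form(u_samples, u_samples, k1=1.0, a=1.0, N=3,
                                hankel_ratio=lambda n: 1.0)
```

With the single mode $n = 2$ and the placeholder ratio, only the $n = 2$ term contributes, so the form reduces to $-k_1 a\,|u_2|^2 = -2\pi$.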

Remark 1 Let $D$ be a reflective scatterer with a smooth boundary $\Gamma$. Instead of the problem in the whole 2-dimensional Euclidean space $\mathbb{R}^2$, we consider the Helmholtz equation in the exterior region $\mathbb{R}^2 \setminus D$. Suppose that the circle $\Omega_a$ strictly includes $D$. Then, we can obtain the equivalent formulation as follows:

\[
\begin{aligned}
-\Delta u - k_1^2 u &= f && \text{in } \Omega_a \setminus D, \\
u &= g && \text{on } \Gamma, \\
\frac{\partial u}{\partial r} &= -Su && \text{on } \Gamma_a.
\end{aligned}
\tag{5}
\]

Under appropriate assumptions, there exists a convergence result for a finite element scheme corresponding to problem (5); see [9].


Let $u_1$ and $u_2$ be the complex amplitudes of a reference light and an object light, respectively. An interference pattern is calculated as the intensity field of the sum of $u_1$ and $u_2$. The domain of the corresponding scatterer, $\Omega_B$, is found as the region where the interference intensity exceeds a given value. We assume a model in which the refractive index of this region changes to $n_2$ from the surrounding index $n_1$ of the holographic material. We thus prepare a scatterer $\Omega_B$ prior to the simulation of a scattering problem. The simulation then computes the scattering distribution and intensity in order to analyze the retrieval process.

In the scattering problem, we approximate the retrieval reference light by an incident plane wave, i.e., $u^{inc} \sim e^{ik_1 x}$, whereas $u_1$ and $u_2$ are Gaussian beams. Since $\Delta e^{ik_1 x} = -k_1^2 e^{ik_1 x}$, the scattering potential $f$ then simplifies to $(-k_1^2 + k_2^2)\, e^{ik_1 x}$ in $\Omega_B$. The complex amplitude of the scattered field $u$ is approximated by the conventional conforming P1 elements.

Considering that micro-holograms are literally of the size of microns, which is the same order as the wavelength of light, we have nondimensionalized the equations with respect to the wavelength $\lambda$. Throughout these examples, the refractive indices are $n_1 = 1.5$ and $n_2 = 1.51$. The wave numbers then become

\[
k_1 := 2\pi n_1 \approx 9.425, \qquad k_2 := 2\pi n_2 \approx 9.488.
\]
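As a quick arithmetic check of these values, the following sketch recomputes the nondimensional wave numbers and the factor $k_2^2 - k_1^2$ that multiplies the plane wave in the scattering potential. The function name `wave_number` is hypothetical and used only for illustration.

```python
import math

def wave_number(refractive_index):
    """Nondimensional wave number k = 2*pi*n (lengths measured in units of the wavelength)."""
    return 2.0 * math.pi * refractive_index

k1 = wave_number(1.5)
k2 = wave_number(1.51)
contrast = k2**2 - k1**2   # factor in the simplified scattering potential

print(f"k1 = {k1:.3f}, k2 = {k2:.3f}, k2^2 - k1^2 = {contrast:.3f}")
```

The small value of the contrast relative to $k_1^2$ reflects the weak index change of 0.01 assumed in the model.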

The Hankel functions appearing in the Steklov-Poincaré operator are calculated by using the built-in Bessel functions provided by the compiler. The resultant linear systems were solved by the Conjugate Residual (CR) method. The computations were done on a Core 2 Duo 3 GHz CPU with 8 GB of memory.
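For readers unfamiliar with the CR method, a minimal sketch follows. It is a textbook conjugate residual iteration for a small real symmetric system stored as nested lists, written only for illustration; the systems in this paper are complex-valued and far larger, and the solver actually used by the authors (variant, preconditioning, stopping rule) is not specified here. All names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_residual(A, b, tol=1e-12, max_iter=100):
    """Conjugate residual iteration for a real symmetric matrix A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for x = 0
    p = list(r)                 # search direction
    Ar = matvec(A, r)
    Ap = list(Ar)
    rAr = dot(r, Ar)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(A, r)
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        rAr = rAr_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]   # A p updated without a new product
    return x

# Small illustrative system; its exact solution is (1/11, 7/11).
solution = conjugate_residual([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each iteration needs only one matrix-vector product, because $Ap$ is updated by the same recurrence as $p$.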

4.1 Model A

A scatterer $\Omega_B$ is given by

\[
\Omega_B = \{ x \in \mathbb{R}^2;\ |u_1 + u_2|^2 \ge 0.5 \},
\]

which is a result of interference of two Gaussian beams that intersect at 90 degrees:

\[
u_1(x_1, x_2) = \frac{1}{2} \sqrt{\frac{x_R}{q(x_1)}}\, \exp\left( \frac{-i k_1 x_2^2}{2 q(x_1)} \right) \exp(i k_1 x_1),
\]

\[
u_2(x_1, x_2) = \frac{1}{2} \sqrt{\frac{x_R}{q(x_2)}}\, \exp\left( \frac{-i k_1 x_1^2}{2 q(x_2)} \right) \exp(-i k_1 x_2),
\]

where $x_R \approx 4.676$ is a given nondimensionalized Rayleigh range. The size of the micro-hologram created by these beams corresponds to about 0.7 µm.
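The thresholding that defines $\Omega_B$ can be sketched as a point-membership test. The beam parameter $q$ is not defined in the text; the sketch below assumes the common complex beam parameter $q(s) = s + i x_R$, which makes the Gaussian envelope decay away from the beam axis. This choice, and the names `q`, `gaussian_beam` and `in_scatterer`, are assumptions made only for illustration.

```python
import cmath
import math

K1 = 2.0 * math.pi * 1.5   # wave number k1 of the surrounding medium
X_R = 4.676                # nondimensionalized Rayleigh range of Model A

def q(s, x_r=X_R):
    """ASSUMPTION: complex beam parameter q(s) = s + i*x_R (not stated in the text)."""
    return complex(s, x_r)

def gaussian_beam(axial, transverse, direction, k=K1, x_r=X_R):
    """Beam of the form used in Model A; direction=+1 gives exp(+i*k*axial), -1 gives exp(-i*k*axial)."""
    qa = q(axial, x_r)
    amplitude = 0.5 * cmath.sqrt(x_r / qa)
    envelope = cmath.exp(-1j * k * transverse**2 / (2.0 * qa))
    carrier = cmath.exp(direction * 1j * k * axial)
    return amplitude * envelope * carrier

def in_scatterer(x1, x2, threshold=0.5):
    """True if |u1 + u2|^2 >= threshold at the point (x1, x2)."""
    u1 = gaussian_beam(x1, x2, +1)   # reference light, axis along x1
    u2 = gaussian_beam(x2, x1, -1)   # object light, axis along x2, travelling towards -x2
    return abs(u1 + u2) ** 2 >= threshold

print(in_scatterer(0.0, 0.0), in_scatterer(1.0 / 6.0, 1.0 / 6.0), in_scatterer(0.0, 50.0))
```

Under this assumption the two beams are in phase at the focus, so the origin lies in $\Omega_B$, while at $(1/6, 1/6)$ the relative phase $k_1(x_1 + x_2)$ equals $\pi$ and the beams cancel. The fringes therefore follow lines $x_1 + x_2 = \text{const}$, in line with a hologram oriented at 45 degrees.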

In this example, $u_1$ and $u_2$ represent the reference light propagating along the $x_1$ direction and the object light propagating along the $-x_2$ direction, respectively, which together form a micro-hologram, the scatterer in our optical scattering problem. Since we used Gaussian beams, the field is strong only in the vicinity of the Rayleigh range. This is why the scatterer consists of only three micro-ellipses around the focus. Because the two beams intersect at 90 degrees, the micro-hologram is oriented at 45 degrees, as Fig. 2 depicts. Fig. 2 also shows the triangulation, in which the number of triangles is 105,578 and the number of nodal points is 53,046. The truncation number $N$ of the DtN map is 115. The CPU time is about 1 hour. Fig. 3 shows the absolute value of the scattered light. The retrieved light propagating along the $-x_2$ axis, which is the direction of the object light, can be clearly seen with relatively stronger intensity, as can the transmitted light along the $x_1$ direction, in which the reference light is incident. It is also interesting to note that a scattering pattern of the kind that appears in regular scattering from a cylinder or a sphere can be seen with much weaker intensity. It is worthwhile pointing out that no reflection from the artificial boundary can be observed, which indicates that the DtN map works very effectively as a transparent boundary.

Fig. 2. Model A and its triangulation.

Fig. 3. The absolute value of scattered waves in Model A.



Fig. 4. Model B and its triangulation.

Fig. 5. The absolute value of scattered waves in Model B.

4.2 Model B

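In the next model, the scatterer $\Omega_B$ is represented by

\[
\Omega_B = \{ x \in \mathbb{R}^2;\ |u_1 + u_2|^2 \ge 0.5 \},
\]

which is created by two counter-propagating Gaussian beams at 180 degrees along the $x_1$ axis:

\[
u_1(x_1, x_2) = \frac{1}{2} \sqrt{\frac{x_R}{q(x_1)}}\, \exp\left( \frac{-i k_1 x_2^2}{2 q(x_1)} \right) \exp(i k_1 x_1),
\]

\[
u_2(x_1, x_2) = \frac{1}{2} \sqrt{\frac{x_R}{q(x_1)}}\, \exp\left( \frac{-i k_1 x_2^2}{2 q(x_1)} \right) \exp(-i k_1 x_1),
\]

where $x_R \approx 1.618$. The micro-hologram size corresponds to about 2.2 µm in this case.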

As shown in Fig. 4, the scatterer $\Omega_B$ consists of seventeen micro-ellipses. The number of triangles is 1,169,012, and the number of nodal points is 585,019. The truncation number $N$ of the DtN map is 191. The CPU time is about 3 hours.

Fig. 5 shows the absolute value of the scattered field. Retrieval of the object light was successfully simulated with relatively stronger intensity on the left side of the micro-hologram, whereas the reference light is transmitted toward the right side. It is very clear that there is no reflection from the artificial boundary.
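The mesh sizes quoted for the two models can be cross-checked with Euler's formula. For a triangulation of a simply connected planar domain with $V$ nodes, $T$ triangles and $B$ boundary nodes, $T = 2V - B - 2$. The sketch below assumes that the mesh of $\Omega_a$ is of this type and, for Model B, that the larger of the two reported counts is the number of triangles; the function name `boundary_nodes` is hypothetical.

```python
def boundary_nodes(num_nodes, num_triangles):
    """Boundary node count implied by Euler's formula T = 2V - B - 2
    for a triangulation of a simply connected planar domain."""
    return 2 * num_nodes - num_triangles - 2

model_a = boundary_nodes(num_nodes=53_046, num_triangles=105_578)
model_b = boundary_nodes(num_nodes=585_019, num_triangles=1_169_012)
print(model_a, model_b)   # prints: 512 1024
```

Under these assumptions the implied numbers of nodes on the artificial boundary are 512 and 1024 for Models A and B, respectively.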

5. Conclusion

A finite element method with a DtN map was successfully applied to an optical scattering problem. In the computational results, no reflection from the artificial boundary was observed, which indicates that the DtN map effectively reduces an infinite-domain problem to a bounded-domain problem even in the case of optical scattering.

Retrieval of the object light from a micro-hologram was qualitatively simulated as scattering of an incident reference light in two different configurations. It was confirmed that this method can be effectively used for analyses of holographic data storage based on the micro-hologram.

Acknowledgments

The third author was supported by the Japan Society for the Promotion of Science under Grant-in-Aid for Scientific Research (S), No. 20224001, and by the Ministry of Education, Culture, Sports, Science and Technology of Japan under the Global COE Program, Mathematics for Industry. The fourth author was supported by the Japan Society for the Promotion of Science under Grant-in-Aid for Young Scientists (B), No. 18740056. The authors would like to thank Panasonic Communications Co., Ltd. for the support of this study.

References

[1] H. J. Eichler, P. Kuemmel, S. Orlic and A. Wappelt, High-density disk storage by multiplexed microholograms, IEEE J. Sel. Top. Quantum Electron., 4 (1998), 840–848.

[2] N. Kinoshita, H. Shiino, N. Ishii, N. Shimidzu and K. Kamijo, Integrated simulation technique for volume holographic memory using finite-difference time-domain method, Jpn. J. Appl. Phys., 44 (2005), 3503–3507.

[3] J. W. Goodman, Introduction to Fourier Optics, 3rd ed., Roberts & Company, USA, 2005.

[4] J. L. Volakis, A. Chatterjee and L. C. Kempel, Finite element method for electromagnetics, IEEE, New York, 1998.

[5] D. W. Prather, M. S. Mirotznik and J. N. Mait, Boundary integral methods applied to the analysis of diffractive optical elements, J. Opt. Soc. Am. A, 14 (1997), 34–43.

[6] M. S. Mirotznik, D. W. Prather and J. N. Mait, A hybrid finite element-boundary element method for the analysis of diffractive elements, J. Modern Optics, 43 (1996), 1309–1321.

[7] J. P. Berenger, A perfectly matched layer for the absorption of electromagnetic waves, J. Comput. Phys., 114 (1994), 185–200.

[8] J. B. Keller and D. Givoli, Exact non-reflecting boundary conditions, J. Comput. Phys., 82 (1989), 172–192.

[9] D. Koyama, Error estimates of the DtN finite element method for the exterior Helmholtz problem, J. Comput. Appl. Math., 200 (2007), 21–31.

[10] M. Born and E. Wolf, Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light, 7th ed., Cambridge Univ. Press, UK, 1999.



Discontinuous Galerkin FEM of hybrid type

Issei Oikawa1 and Fumio Kikuchi1

1 Graduate School of Mathematical Sciences, The University of Tokyo, Tokyo 153-8914, Japan

E-mail {oikawa, kikuchi}@ms.u-tokyo.ac.jp

Received September 25, 2009, Accepted March 28, 2010

Abstract

Recently, the discontinuous Galerkin FEMs (DGFEM) have been widely studied. They use discontinuous approximate functions, where the discontinuity is dealt with by the Lagrange multiplier and/or interior penalty techniques. Such methods have the merit that various types of approximate functions can be used besides the usual continuous piecewise polynomials, although the band-widths of the arising matrices are often much larger than the conventional ones. We here propose a hybrid displacement type DGFEM for the 2D Poisson equation with some mathematical and numerical results. In particular, we can use element matrices and vectors similar to those in the classical FEM.

Keywords Discontinuous Galerkin FEM, hybrid method, stabilization, error analysis


1. Introduction

Considerable attention has been drawn to the discontinuous Galerkin FEMs (DGFEM) [1–3], whose root is reported to be in neutron transport problems. They use discontinuous approximate functions, where the discontinuity is dealt with by the Lagrange multiplier and/or interior penalty methods. Such methods have the merit that various approximate functions besides the usual piecewise polynomials can be used, and are expected to be robust to variation of element geometry. However, the band-widths of the arising matrices can be much larger than those of the conventional FEM.

Actually, another origin can be traced to solid mechanics: the well-known non-conforming and hybrid FEMs use discontinuous approximate field functions. Typical examples are Pian’s hybrid stress method [4] and Tong’s hybrid displacement one [5, 6]. One of the authors also developed a variant of the hybrid displacement method and applied it to plate problems [7, 8]. Such an approach enables the use of conventional element matrices and vectors, although it suffered from numerical instability and was not fully successful [9].

Stimulated by the rapid development of DGFEM, we propose a DGFEM of hybrid displacement type by stabilizing our old approach. We show the idea with an outline of theoretical analysis for the 2D Poisson equation as a model problem, and then give some concrete finite element models with a few numerical results and observations. An application of our approach to linear elasticity is given in [10], and a closely related approach can be found in [11]. Details of the theoretical analysis and modifications of the present approach will be reported in due course.

2. Hybrid displacement formulation

2.1 Model problem

Let us consider the 2D Poisson equation over a bounded convex polygonal domain $\Omega$ with the homogeneous Dirichlet condition on the boundary $\partial\Omega$:

\[
-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \tag{1}
\]

where $\Delta$ is the Laplacian, and $u$ and $f$ are respectively an unknown and a given real-valued function defined in $\Omega$. The most popular weak formulation of (1) is to use $H_0^1(\Omega)$ and to find $u \in H_0^1(\Omega)$ s.t., for a given $f \in L_2(\Omega)$,

\[
(\nabla u, \nabla v)_\Omega = (f, v)_\Omega; \quad \forall v \in H_0^1(\Omega), \tag{2}
\]

where $\nabla$ denotes the gradient, and $(\cdot, \cdot)_\Omega$ denotes the inner products of both $L_2(\Omega)$ and $L_2(\Omega)^2$, with the associated norms designated by $\|\cdot\|_\Omega$. Since $\Omega$ is convex, $u \in H^2(\Omega) \cap H_0^1(\Omega)$. For the definitions of $L_2(\Omega)$, $H_0^1(\Omega)$, $H^2(\Omega)$ and various Hilbertian Sobolev spaces, see [2, 12].

2.2 Definitions and notations

We first construct a family of triangulations $\{\mathcal{T}^h\}_{h>0}$ of $\Omega$ by polygonal finite elements: each $K \in \mathcal{T}^h$ is an $m$-polygonal domain (Fig. 1), where $m$ is an integer $\ge 3$ and can differ with $K$. Thus the boundary $\partial K$ of $K \in \mathcal{T}^h$ is composed of $m$ edges. We assume that $m$ is bounded from above independently of $\{\mathcal{T}^h\}_{h>0}$, $K$ is not “too thin”, and $\partial K$ does not intersect itself. The diameter and measure of $K$ are denoted by $h_K$ and $|K|$, respectively, while the length of an edge $e \subset \partial K$ is denoted by $|e|$. Furthermore, $h := \max_{K \in \mathcal{T}^h} h_K$. The $L_2(K)$ and $L_2(K)^2$ inner products and norms for $K$ are written as $(\cdot, \cdot)_K$ and $\|\cdot\|_K$. We also define the following forms for $u, v \in L_2(\partial K)$:

\[
\langle u, v \rangle_{\partial K} = \int_{\partial K} u\, v\, ds, \qquad |v|_{\partial K} = \langle v, v \rangle_{\partial K}^{1/2},
\]

where $ds$ is the infinitesimal line element on $\partial K$. The forms $\langle \cdot, \cdot \rangle_e$ and $|\cdot|_e$ for each edge $e \subset \partial K$ are given similarly.

Over $\mathcal{T}^h$, we consider the spaces ($k = 0, 1, 2, \ldots$):

\[
H^k(\mathcal{T}^h) = \{ v \in L_2(\Omega);\ v|_K \in H^k(K)\ (\forall K \in \mathcal{T}^h) \}.
\]

For $v \in H^1(\mathcal{T}^h)$ and $K \in \mathcal{T}^h$, its trace to $\partial K$ is well defined as an element of $L_2(\partial K)$ and is denoted by $v|_{\partial K}$ or simply $v$, which can be double-valued on edges shared by two elements [1, 2]. For $v \in H^2(\mathcal{T}^h)$, we can also define its normal derivative $\partial v/\partial n$ as an element of $L_2(\partial K)$.

JSIAM Letters Vol. 2 (2010) pp.49–52 Issei Oikawa and Fumio Kikuchi

Fig. 1. m-polygonal element K; non-convex case. (Vertices $(x_1, y_1), \ldots, (x_m, y_m)$; e: edge; K: element.)

On the union $\Gamma^h$ of edges in $\mathcal{T}^h$, we consider a kind of flux $\hat{u} \in L_2(\Gamma^h)$, which is single-valued on each edge shared by two elements, unlike various double-valued fluxes in some DGFEMs [1, 2]. To deal with the boundary condition in (1), define a subspace of $L_2(\Gamma^h)$ by

\[
L_2^0(\Gamma^h) = \{ \hat{v} \in L_2(\Gamma^h);\ \hat{v}|_{\partial\Omega} = 0 \}.
\]

2.3 Hybrid displacement-type DGFEM

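Define a bilinear form $B_h^{\pm}(\cdot, \cdot\,; \cdot, \cdot)$ by

\[
\begin{aligned}
B_h^{\pm}(u, \hat{u}; v, \hat{v}) = \sum_{K \in \mathcal{T}^h} \Big[ & (\nabla u, \nabla v)_K + \Big\langle \frac{\partial u}{\partial n}, \hat{v} - v \Big\rangle_{\partial K} \pm \Big\langle \hat{u} - u, \frac{\partial v}{\partial n} \Big\rangle_{\partial K} \\
& + \sum_{e \subset \partial K} \frac{\eta_{K,e}}{h_{K,e}} \langle \hat{u} - u, \hat{v} - v \rangle_e \Big];
\end{aligned}
\tag{3}
\]

$\forall \{u, \hat{u}\}, \{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2(\Gamma^h)$, where $\eta_{K,e} > 0$ is the stabilization or interior penalty parameter for $e \subset \partial K$, $h_{K,e}$ is an edge length parameter such as $|e|$ and $h_K$, and the suffixes $+$ and $-$ of $\pm$ denote the symmetric and asymmetric forms, respectively. Our old symmetric formulation [7, 8] lacked the penalty term and suffered from numerical instability [9].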

In our DGFEM, we prepare a finite element space $V^h$ of the form

\[
V^h = U^h \times \hat{U}^h,
\]

where $U^h$ and $\hat{U}^h$ are finite-dimensional subspaces of $H^2(\mathcal{T}^h)$ and $L_2^0(\Gamma^h)$, respectively. We often use $\hat{U}^h$ such that $\hat{U}^h \subset L_2^0(\Gamma^h) \cap C(\Gamma^h)$ to reduce the number of unknowns associated with $\hat{U}^h$.

Then our finite element approximation is: given $f \in L_2(\Omega)$, find $\{u_h, \hat{u}_h\} \in V^h$ s.t.

\[
B_h^{\pm}(u_h, \hat{u}_h; v_h, \hat{v}_h) = (f, v_h)_\Omega; \quad \forall \{v_h, \hat{v}_h\} \in V^h. \tag{4}
\]

Fundamental properties of the above formulation, such as the existence and uniqueness of the approximate solutions, error estimates, etc., will be discussed later.

2.4 Linear simultaneous equations

From (4), we have linear simultaneous equations exactly as in the classical FEM. Although we can deal with them as a whole, the interior element unknowns associated with $U^h$ can usually be eliminated a priori elementwise (i.e., by the so-called static condensation) to obtain matrices and vectors similar to the element stiffness matrices and load vectors of the conventional FEM, cf. [4, 9, 10]. Thus we can first construct linear simultaneous equations for the element boundary unknowns, to be solved by appropriate FEM codes. The interior unknowns are then obtained by post-processing. On the other hand, in the usual DGFEMs, where the element boundary flux $\hat{u}_h$ is not used, the interior element function $u_h$ can be highly coupled with those of neighboring elements, so that the linear simultaneous equations are often denser than those of our hybrid DGFEM.
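Static condensation can be sketched on a single element as a Schur complement. Partition the element system into interior (i) and boundary (b) unknowns; eliminating the interior ones gives the condensed matrix $K_{bb} - K_{bi} K_{ii}^{-1} K_{ib}$ and load $f_b - K_{bi} K_{ii}^{-1} f_i$. The sketch below uses plain nested lists and a tiny made-up $3 \times 3$ system; the numbers and all function names are hypothetical and do not come from the paper.

```python
def solve(A, B):
    """Solve A X = B (B given as rows, one or more columns) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def condense(K_ii, K_ib, K_bi, K_bb, f_i, f_b):
    """Eliminate the interior unknowns; vectors are stored as n x 1 nested lists."""
    K_hat = subtract(K_bb, matmul(K_bi, solve(K_ii, K_ib)))
    f_hat = subtract(f_b, matmul(K_bi, solve(K_ii, f_i)))
    return K_hat, f_hat

def recover_interior(K_ii, K_ib, f_i, u_b):
    """Post-processing step: interior unknowns from the boundary solution."""
    return solve(K_ii, subtract(f_i, matmul(K_ib, u_b)))

# Hypothetical element system with one interior unknown and two boundary unknowns.
K_ii, K_ib = [[4.0]], [[1.0, 1.0]]
K_bi, K_bb = [[1.0], [1.0]], [[3.0, 0.0], [0.0, 2.0]]
f_i, f_b = [[1.0]], [[2.0], [3.0]]

K_hat, f_hat = condense(K_ii, K_ib, K_bi, K_bb, f_i, f_b)
u_b = solve(K_hat, f_hat)                          # boundary unknowns
u_i = recover_interior(K_ii, K_ib, f_i, u_b)       # interior unknowns
```

In an actual code the condensed element matrices and vectors would be assembled over all elements before solving for the boundary unknowns; here a single element stands in for that global step.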

3. Abstract error analysis

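To analyze (4) referring to [1, 2], we should prepare some conditions for $B_h = B_h^{\pm}$ and $V^h$. To this end, we need some semi-norms for $\{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2(\Gamma^h)$:

\[
|v|_{1,h}^2 = \sum_{K \in \mathcal{T}^h} |v|_{1,K}^2, \qquad
|\hat{v}|_*^2 = \sum_{K \in \mathcal{T}^h} \sum_{e \subset \partial K} \frac{1}{h_{K,e}} |\hat{v}|_e^2,
\]

\[
\|\{v, \hat{v}\}\|_h^2 = |v|_{1,h}^2 + \big|\hat{v} - v|_{\Gamma^h}\big|_*^2 + \sum_{K \in \mathcal{T}^h} h_K^2 |v|_{2,K}^2, \tag{5}
\]

where $|\cdot|_{k,K}$ ($k = 1, 2$) are the usual semi-norms of $H^k(K)$ [2, 12]. Clearly, these strongly depend on the triangulations. Then let us present the following three conditions.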

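[Consistency] The exact solution $u \in H_0^1(\Omega) \cap H^2(\Omega)$ of (2) and its trace $\hat{u} \in L_2^0(\Gamma^h)$ to $\Gamma^h$ satisfy

\[
B_h(u, \hat{u}; v, \hat{v}) = (f, v)_\Omega; \quad \forall \{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2^0(\Gamma^h).
\]

[Boundedness] There exists a positive constant $C_b$ s.t.

\[
|B_h(u, \hat{u}; v, \hat{v})| \le C_b \|\{u, \hat{u}\}\|_h \|\{v, \hat{v}\}\|_h; \tag{6}
\]

$\forall h > 0$ and $\forall \{u, \hat{u}\}, \{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2^0(\Gamma^h)$.

[Coerciveness] There exists a positive constant $C_c$ s.t.

\[
|B_h(v_h, \hat{v}_h; v_h, \hat{v}_h)| \ge C_c \|\{v_h, \hat{v}_h\}\|_h^2;
\]

$\forall h > 0$ and $\forall \{v_h, \hat{v}_h\} \in V^h$.

Under the above conditions, we can derive the following theorem, essentially following the approach in [1, 2].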

Theorem 1 The unique existence and uniform boundedness of the approximate solution $\{u_h, \hat{u}_h\}$ follow from the boundedness and coerciveness above. Moreover, utilizing the consistency condition as well, we have an error estimate in the semi-norm $\|\cdot\|_h$ ($\hat{u}$ = trace of $u$):

\[
\|\{u - u_h, \hat{u} - \hat{u}_h\}\|_h \le \left( 1 + \frac{C_b}{C_c} \right) \inf_{\{v_h, \hat{v}_h\} \in V^h} \|\{u - v_h, \hat{u} - \hat{v}_h\}\|_h. \tag{7}
\]
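The error estimate is of the classical Céa type. As a hedged sketch of the standard argument (not necessarily the authors' own proof), write $w = \{u, \hat{u}\}$ for the exact pair, $w_h = \{u_h, \hat{u}_h\}$ for the approximate pair and $z_h$ for an arbitrary element of $V^h$, with hats denoting edge functions; this shorthand is introduced here only for illustration.

```latex
% Galerkin orthogonality: subtract (4) from the consistency identity, using
% V^h \subset H^2(\mathcal{T}^h) \times L_2^0(\Gamma^h):
%   B_h(w - w_h; z_h) = 0 \quad \forall z_h \in V^h .
\begin{align*}
C_c \,\| z_h - w_h \|_h^2
  &\le | B_h(z_h - w_h;\, z_h - w_h) |            && \text{(coerciveness)} \\
  &=   | B_h(z_h - w;\, z_h - w_h) |              && \text{(Galerkin orthogonality)} \\
  &\le C_b \,\| z_h - w \|_h \,\| z_h - w_h \|_h  && \text{(boundedness)},
\end{align*}
% hence \| z_h - w_h \|_h \le (C_b / C_c) \, \| w - z_h \|_h, and by the triangle inequality
\begin{equation*}
\| w - w_h \|_h \le \| w - z_h \|_h + \| z_h - w_h \|_h
                \le \Big( 1 + \frac{C_b}{C_c} \Big) \| w - z_h \|_h .
\end{equation*}
% Taking the infimum over z_h \in V^h gives the estimate of Theorem 1.
```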

Unfortunately, the above estimate does not give any information on the $L_2$ error $\|u - u_h\|_\Omega$, at least explicitly, so we introduce one more condition:

[Adjoint consistency] The solution $\psi \in H_0^1(\Omega)$ of (2) for $g \in L_2(\Omega)$, instead of $f$, and its trace $\hat{\psi}$ satisfy

\[
B_h(v, \hat{v}; \psi, \hat{\psi}) = (v, g)_\Omega; \quad \forall \{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2^0(\Gamma^h). \tag{8}
\]

For the symmetric formulation based on $B_h^+$, the present condition reduces to the consistency one, but it must be considered independently in the asymmetric case.

Now we can use Nitsche’s trick [1, 2, 12] to obtain the following result for the $L_2$ error estimation.



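Theorem 2 Under the adjoint consistency with $\psi$ and $\hat{\psi}$ in (8) as well as the other three conditions, we have:

\[
\|u - u_h\|_\Omega \le C_b \|\{u - u_h, \hat{u} - \hat{u}_h\}\|_h \times \sup_{g \in L_2(\Omega) \setminus \{0\}} \inf_{\{v_h, \hat{v}_h\} \in V^h} \frac{\|\{\psi - v_h, \hat{\psi} - \hat{v}_h\}\|_h}{\|g\|_\Omega}. \tag{9}
\]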

4. Polygonal Pk-Pk finite elements

As the simplest DGFEM, let us consider the $P_k$ ($k \in \mathbb{N}$) approximations for both $U^h$ and $\hat{U}^h$ among various possible choices. Thus $v \in U^h$ is a single polynomial in each $K$ and is a discontinuous piecewise polynomial over $\mathcal{T}^h$. On the other hand, $\hat{v} \in \hat{U}^h$ is a one-dimensional polynomial on each edge $e \subset \Gamma^h$, but we have two possibilities for $\hat{U}^h$: a continuous space $\hat{U}^h \subset C(\Gamma^h) \cap L_2^0(\Gamma^h)$ and a discontinuous one, i.e., just $\hat{U}^h \subset L_2^0(\Gamma^h)$. If desired, we can use vertices on $\Gamma^h$ as nodes, where continuity is imposed for the continuous $\hat{U}^h$. We can sometimes consider nodes for $U^h$, which are used only for auxiliary purposes in computations, unlike in the conventional FEM. In principle, interior functions in $U^h$ are independent of edge functions in $\hat{U}^h$, and their restrictions to $K$ are independent of their restrictions to other elements.

For the triangular element with $k = 1$ and $\hat{U}^h \subset C(\Gamma^h)$, we can prove that the statically condensed element matrix and vector coincide with those of the classical P1 triangle, though the interior FE solution does not necessarily coincide with the classical P1 solution. We can also consider arbitrary $m$-polygonal elements ($m \ge 3$), but larger $m$ may yield poorer results for small fixed $k$.

5. Preliminary considerations on error analysis

To give concrete error estimates for the finite element schemes in Sec. 4, we must establish the first three conditions in Sec. 3, and the adjoint consistency if possible, as well as the estimation of the right-hand side of (7). Since the available space is insufficient to describe such processes in detail, we give only preliminary considerations and brief comments below. The theoretical analysis and the obtained results are essentially the same for the continuous and discontinuous $\hat{U}^h$’s.

5.1 Comments on the 4 conditions

As in [1, 2], the consistency condition is easy to prove for the present hybrid DGFEM by using the Green formula and noting that $\hat{v}$ in (3) is single-valued on $\Gamma^h$.
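The argument can be spelled out in a few lines. The following is a sketch under the assumptions $u \in H^2(\Omega) \cap H_0^1(\Omega)$, $\hat{u} = u|_{\Gamma^h}$ and $\{v, \hat{v}\} \in H^2(\mathcal{T}^h) \times L_2^0(\Gamma^h)$, where hats denote the single-valued edge functions.

```latex
\begin{align*}
B_h^{\pm}(u, \hat{u}; v, \hat{v})
  &= \sum_{K \in \mathcal{T}^h} \Big[ (\nabla u, \nabla v)_K
     + \Big\langle \frac{\partial u}{\partial n}, \hat{v} - v \Big\rangle_{\partial K} \Big]
     && (\hat{u} - u = 0 \text{ on each } \partial K) \\
  &= \sum_{K \in \mathcal{T}^h} \Big[ (-\Delta u, v)_K
     + \Big\langle \frac{\partial u}{\partial n}, \hat{v} \Big\rangle_{\partial K} \Big]
     && (\text{Green formula on each } K) \\
  &= (f, v)_\Omega
     + \sum_{K \in \mathcal{T}^h} \Big\langle \frac{\partial u}{\partial n}, \hat{v} \Big\rangle_{\partial K}
   = (f, v)_\Omega .
\end{align*}
% The last sum vanishes: on an edge shared by two elements the outward normals are
% opposite while \nabla u and \hat{v} are single-valued, so the two contributions cancel;
% on edges lying on \partial\Omega, \hat{v} = 0.
```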

To establish the boundedness condition for the present concrete schemes, we must assume the boundedness of the stabilization parameter $\eta_{K,e}$: there exists a positive constant $\overline{\eta}$ s.t.

\[
\eta_{K,e} \le \overline{\eta}; \quad \forall h > 0,\ \forall K \in \mathcal{T}^h,\ \forall e \subset \partial K.
\]

We also use some trace theorems associated with each element $K \in \mathcal{T}^h$ [1, 2], so we need appropriate regularity conditions on the family of triangulations $\{\mathcal{T}^h\}_{h>0}$. In the cases of triangulations by triangles and quadrilaterals, we can adopt the regularity conditions stated in [2, 12], but we must perform a deeper analysis in the other cases, i.e., $m$-polygonal elements with $m \ge 5$. It appears, however, that the convexity assumption on the element shape may be omitted for the present DGFEM [10]. In any case, we must continue our study on this issue, and we restrict our analysis to the established cases of triangular and quadrilateral elements, if necessary. As for $h_{K,e}$, the choice $h_{K,e} = |e|$ is acceptable under appropriate regularity conditions, but some other choices may be possible. In general, the obtained constant $C_b$ in (6) depends on $\{\mathcal{T}^h\}_{h>0}$ and $\overline{\eta}$, but is independent of $h > 0$.

Unlike the preceding two conditions, the coerciveness is entirely inside the finite element space $V^h$. We need the regularity conditions on the triangulations and the specification of $h_{K,e}$, but also require the lower boundedness of $\eta_{K,e}$: there exists a positive constant $\underline{\eta}$ s.t.

\[
\eta_{K,e} \ge \underline{\eta}; \quad \forall h > 0,\ \forall K \in \mathcal{T}^h,\ \forall e \subset \partial K.
\]

Just as in [1, 2], the existence of such a constant $\underline{\eta}$ is assured, but its concrete value is generally difficult to evaluate. In the asymmetric formulation, however, any positive value is available as $\underline{\eta}$, at least theoretically.

As was already mentioned, the adjoint consistency is trivial for the symmetric formulation, but has not been shown yet for the asymmetric one. In fact, it does not hold for some asymmetric DGFEM schemes [1].

5.2 Error estimates

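Under appropriate regularity conditions on $\{\mathcal{T}^h\}_{h>0}$, we can expect the following estimate for sufficiently smooth $v$ [1, 2, 12]: there exist positive constants $C_{k,s}$ s.t., $\forall h > 0$, $\forall K \in \mathcal{T}^h$, $\forall v \in H^{k+1}(K)$, $k = 1, 2, \ldots$ and $s = 1, 2$,

\[
\inf_{v_h \in U^h} |v - v_h|_{s,K} \le C_{k,s}\, h_K^{k+1-s}\, |v|_{k+1,K}. \tag{10}
\]

Similarly we can expect: there exists a positive constant $C_0$ s.t., $\forall h > 0$, $\forall K \in \mathcal{T}^h$, $\forall v \in H^k(K)$ with $k = 1, 2, \ldots$,

\[
\inf_{\{v_h, \hat{v}_h\} \in V^h} \max_{e \subset \partial K} \left( |v - v_h|_e + |v - \hat{v}_h|_e \right) \le C_0\, h_K^{k - \frac{1}{2}}\, |v|_{k,K}, \tag{11}
\]

and $\forall v \in H^{k+1}(K)$ with $k = 1, 2, \ldots$,

\[
\inf_{v_h \in U^h} \max_{e \subset \partial K} \left| \frac{\partial v}{\partial n} - \frac{\partial v_h}{\partial n} \right|_e \le C_0\, h_K^{k - \frac{1}{2}}\, |v|_{k+1,K}. \tag{12}
\]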

Now by noting (5), we can estimate the right-hand sides of (7) and (9) concretely as follows.

Theorem 3 Under the first three conditions in Sec. 3 and estimates (10), (11) and (12), we have, for a smooth solution $u \in H^{k+1}(\Omega) \cap H_0^1(\Omega)$ ($k = 1, 2, \ldots$),

\[
\|\{u - u_h, \hat{u} - \hat{u}_h\}\|_h \le C_1 h^k \|u\|_{k+1,\Omega},
\]

where $C_1$ is a positive constant independent of $u$ and $h$ (but may be a function of various other constants), and $\|\cdot\|_{k+1,\Omega}$ is the norm of $H^{k+1}(\Omega)$. Furthermore, if the adjoint consistency also holds, we have, with a positive constant $C_2$ similar to $C_1$,

\[
\|u - u_h\|_\Omega \le C_2 h^{k+1} \|u\|_{k+1,\Omega}.
\]



Table 1. Observed orders of errors.

Formulations    k for V^h    Observed orders*)
                             |u − u_h|_{1,h}    ‖u − u_h‖_Ω
Symmetric       1, 2         O(h^k)             O(h^{k+1})
Asymmetric      1, 2         O(h^k)             O(h^2)

*) The integers k in the observed orders above are only approximate values for the actual slopes. From Figs. 2 and 3, we can see that the slopes for larger N are actually close to integral values.

Fig. 2. ‖u − u_h‖_Ω vs. N for P1-P1 rectangles. (Log-log plot of the L2-error against N = 4, 8, 16, 32 for the symmetric and asymmetric P1−P1 rectangular elements.)

Fig. 3. ‖u − u_h‖_Ω vs. N for P2-P2 rectangles. (Log-log plot of the L2-error against N = 4, 8, 16, 32 for the symmetric and asymmetric P2−P2 rectangular elements.)

6. Numerical results

We show some numerical results for a very special case of the model problem: $\Omega = ]0, 1[^2$ (unit square) and

\[
f(x, y) = 2\pi^2 \sin(\pi x) \sin(\pi y).
\]

Then we find that $u(x, y) = \sin(\pi x) \sin(\pi y)$. We consider two cases for the polynomial degrees, $k = 1, 2$, and both the symmetric and asymmetric formulations. The shapes of finite elements are restricted to triangles and rectangles, and the triangulations are all uniform: $N \times N$ ($N \in \mathbb{N} \setminus \{2\}$) square and Friedrichs-Keller ones for rectangular and triangular elements, respectively. As for the interior penalty terms, we take $h_{K,e} = |e|$ or $h_{K,e} = |K|/h_K$, and $\eta_{K,e} = \eta_0 > 0$. We calculated the finite element solutions for various values of $N$ and $\eta_0$.
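That this pair of $f$ and $u$ satisfies the model problem can be checked numerically with a small sketch. It compares a five-point finite-difference approximation of $-\Delta u$ with $f$ at one interior point and evaluates $u$ on the boundary; the function names and the sample point are arbitrary choices made for illustration.

```python
import math

def u_exact(x, y):
    return math.sin(math.pi * x) * math.sin(math.pi * y)

def f_rhs(x, y):
    return 2.0 * math.pi**2 * math.sin(math.pi * x) * math.sin(math.pi * y)

def minus_laplacian_fd(func, x, y, step=1e-3):
    """Five-point central-difference approximation of -Laplacian(func) at (x, y)."""
    second_x = func(x + step, y) - 2.0 * func(x, y) + func(x - step, y)
    second_y = func(x, y + step) - 2.0 * func(x, y) + func(x, y - step)
    return -(second_x + second_y) / step**2

# Difference between the discrete -Laplacian of u and f at an interior point.
residual = abs(minus_laplacian_fd(u_exact, 0.3, 0.7) - f_rhs(0.3, 0.7))

# Boundary values of u on two sides of the unit square.
boundary_values = [u_exact(0.0, 0.5), u_exact(1.0, 0.5)]
```

The residual is small but nonzero because of the $O(\text{step}^2)$ truncation error of the difference formula.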

Table 1 summarizes the numerically observed error behaviors with $h = 1/N$. It is to be noted that the theory essentially predicts the orders of errors correctly, but the observed $L_2$ errors for the asymmetric formulation with $k = 1$ appear to be one order higher than the theoretical one. Similar results are also reported in the literature, e.g., [1, 2], but recent numerical experiments for some DGFEMs in [13] suggest that such a phenomenon is probably attributable to the uniformity of the meshes.
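Observed orders of this kind are usually read off as slopes between consecutive refinements on a log-log plot. A minimal sketch of that computation follows; the error values are synthetic (generated to behave like $0.5\,h^2$) and are not taken from the paper, and the function name `observed_orders` is hypothetical.

```python
import math

def observed_orders(mesh_sizes, errors):
    """Slopes log(e_i / e_{i+1}) / log(h_i / h_{i+1}) between consecutive refinements."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        for i in range(len(errors) - 1)
    ]

# SYNTHETIC data (not from the paper): errors behaving like 0.5 * h^2 with h = 1/N.
subdivisions = [4, 8, 16, 32]
mesh_sizes = [1.0 / n for n in subdivisions]
synthetic_errors = [0.5 * h**2 for h in mesh_sizes]
orders = observed_orders(mesh_sizes, synthetic_errors)
```

For measured errors the slopes typically approach the asymptotic order only as $N$ grows, which matches the remark that the slopes for larger $N$ are close to integral values.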

Figs. 2 and 3 illustrate the observed errors in $\|\cdot\|_\Omega$ versus $N$ for the P1-P1 and P2-P2 rectangular elements, where the penalty terms $\eta_{K,e}/h_{K,e}$ are $8N$ and $N$ for the symmetric and asymmetric formulations, respectively. We cannot discuss here the desirable values of the penalty terms numerically, but a few results were reported in [10].


We have presented a hybrid-type DGFEM and shown some numerical results. The essential points of error analysis were also shown, but we must make clear the regularity conditions of triangulations to discuss the dependence of various error constants on the element geometries. We also wish to analyze the adjoint consistency in the case of the asymmetric formulation. Application to more practical problems is a subject of future studies, and we will also formulate and analyze a slightly different formulation based on the “lifting operator”, which is already used in some other DGFEM’s [1, 2].

Acknowledgments

The authors would like to thank Prof. B. Cockburn for fruitful discussions. This work was supported by JSPS, Grant-in-Aid for Scientific Research (C) 19540115.

References

[1] D. N. Arnold, F. Brezzi, B. Cockburn and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002) 1749–1779.

[2] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, 3rd ed., Springer, Berlin, 2008.

[3] B. Q. Li, Discontinuous Finite Elements in Fluid Dynamics and Heat Transfer, Springer, Berlin, 2006.

[4] T. H. H. Pian and C.-C. Wu, Hybrid and Incompatible Finite Element Methods, Chapman & Hall, Boca Raton, 2005.

[5] Y. C. Fung and P. Tong, Classical and Computational Mechanics, World Scientific, Singapore, 2001.

[6] P. Tong, New displacement hybrid finite element models for solid continua, Int. J. Numer. Meth. Eng., 2 (1970) 73–83.

[7] F. Kikuchi and Y. Ando, A new variational functional for the finite-element method and its application to plate and shell problems, Nucl. Eng. Des., 21 (1972) 95–113.

[8] F. Kikuchi and Y. Ando, Some finite element solutions for plate bending problems by simplified hybrid displacement method, Nucl. Eng. Des., 23 (1972) 155–178.

[9] H. A. Mang and R. H. Gallagher, A critical assessment of the simplified hybrid displacement method, Int. J. Numer. Meth. Eng., 11 (1977) 145–167.

[10] F. Kikuchi, K. Ishii and I. Oikawa, Discontinuous Galerkin FEM of hybrid displacement type – development of polygonal elements –, Theor. Appl. Mech. Jpn, 57 (2009) 395–404.

[11] R. Mihara and N. Takeuchi, The three dimension models development by using linear displacement fields in HPM, presented at APCOM’07 in conj. with EPMESC XI, Dec. 3–6, 2007, Kyoto, Japan.

[12] P. G. Ciarlet, The Finite Element Method for Elliptic Problems, 2nd ed., SIAM, Philadelphia, 2002.

[13] J. Guzman and B. Riviere, Sub-optimal convergence of non-symmetric discontinuous Galerkin methods for odd polynomial approximations, J. Sci. Comput., 40 (2009) 273–280.



A circular and radial slit mapping

of unbounded multiply connected domains

Kaname Amano1 and Dai Okano1

1 Department of Electrical and Electronic Engineering and Computer Science, Graduate School of Science and Engineering, Ehime University, Bunkyo-cho, Matsuyama 790-8577, Japan

E-mail amano cs.ehime-u.ac.jp

Received January 5, 2010, Accepted March 28, 2010

Abstract

We propose a numerical method for conformally mapping unbounded multiply connected domains onto a canonical domain with a mixture of circular and radial slits. It expresses an analytic function by a linear combination of complex logarithmic functions based on the charge simulation method, and gives a simple form of approximate mapping function with high accuracy. A numerical example shows the effectiveness of our method.

Keywords numerical conformal mapping, charge simulation method, canonical slit domain


1. Introduction

Conformal mappings are familiar in science and engineering. However, exact mapping functions are not known except for a limited class of domains. Therefore, numerical conformal mappings have been studied for decades [1–5]. In particular, conformal mappings of multiply connected domains attract a renewed interest.

It is known that two domains can be conformally mapped onto each other if, and only if, they agree in connectivity n, and moreover in 3(n − 2) (n ≥ 3) conformal invariants called moduli. Hence, canonical domains that specify geometric characters without fixing moduli are introduced. They often have slits, and the following are well known [6]: the parallel slit domain, the circular slit domain, the radial slit domain, the circle with concentric circular slits, and the circular ring with concentric circular slits. Koebe [7] gave 39 examples of canonical slit domains, which contained domains with a mixture of circular and radial slits called the circular and radial slit domain. The circular domain, all of whose boundaries are circles without slits, is another important canonical domain. DeLillo et al. [8] recently found an explicit formula for mapping an unbounded canonical circular domain onto the circular and radial slit domain that consists of infinite products using reflections, and proposed a numerical method using least squares.

We here propose a numerical method for conformally mapping unbounded multiply connected domains exterior to closed Jordan curves onto the unbounded circular and radial slit domain. It is an extension of the method onto the circular slit domain or onto the radial slit domain [9], which expresses an analytic function by a linear combination of complex logarithmic functions based on the charge simulation method. The proposed method gives a simple form of approximate mapping function with high accuracy.

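[Figure: the domain D exterior to C1, . . . , Cl, . . . , Cn in the z plane with charge points ζlj and collocation points zlk, and its image E under w = f(z) with slits S1 (radius r1), . . . , Sn (argument θn).]

Fig. 1. Conformal mapping of an unbounded multiply connected domain onto a circular and radial slit domain, together with charge points and collocation points used in the charge simulation method.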

2. Problem

Let D be an unbounded domain exterior to closed Jordan curves $C_1, \ldots, C_n$ ($n = n_1 + n_2$) in the z plane. Consider the conformal mapping of D onto a circular and radial slit domain E, which is the entire w plane with circular slits $S_1, \ldots, S_{n_1}$ of radii $r_1, \ldots, r_{n_1}$ concentric to the origin and radial slits $S_{n_1+1}, \ldots, S_n$ of arguments $\theta_{n_1+1}, \ldots, \theta_n$ pointing at the origin, as shown in Fig. 1. If $n_1 = n$ ($n_2 = 0$) then E is the circular slit domain, and if $n_2 = n$ ($n_1 = 0$) the radial slit domain. We suppose that $C_1, \ldots, C_n$ are respectively mapped onto $S_1, \ldots, S_n$, and both planes include the point at infinity.

We start from the following [6, 7]:

Theorem 1 For a given domain D, there exists a unique analytic function w = f(z) such that it (i) conformally maps D onto E, (ii) satisfies f(0) = 0, f(∞) = ∞ and (iii) has the Laurent expansion near z = ∞ of the form

$$f(z) = z + c_0 + \frac{c_1}{z} + \frac{c_2}{z^2} + \cdots. \tag{1}$$

We aim to construct an approximate function of f(z), together with the constants $r_1, \ldots, r_{n_1}$, $\theta_{n_1+1}, \ldots, \theta_n$ as well.

JSIAM Letters Vol. 2 (2010) pp.53–56 Kaname Amano et al.

We express the mapping function as

f(z) = z exp a(z), (2)

where a(z) is an analytic function in D. This form implies f(0) = 0, f(∞) = ∞, and the following requirements should be satisfied.

(i) Normalization condition: From (1),

$$\lim_{z \to \infty} \frac{f(z)}{z} = 1, \quad \text{i.e.,} \quad a(\infty) = 0. \tag{3}$$

(ii) Boundary condition: f(z) maps $C_1, \ldots, C_{n_1}$ onto the circular slits $S_1, \ldots, S_{n_1}$ of the radii $r_1, \ldots, r_{n_1}$ and $C_{n_1+1}, \ldots, C_n$ onto the radial slits $S_{n_1+1}, \ldots, S_n$ of the arguments $\theta_{n_1+1}, \ldots, \theta_n$, so that

$$|f(z)| = r_m, \quad \text{i.e.,} \quad \log|z| + \operatorname{Re} a(z) = \log r_m, \quad z \in C_m,\ m = 1, \ldots, n_1, \tag{4}$$

$$\arg f(z) = \theta_m, \quad \text{i.e.,} \quad \arg z + \operatorname{Im} a(z) = \theta_m, \quad z \in C_m,\ m = n_1 + 1, \ldots, n. \tag{5}$$

The problem is now to find a(z) satisfying (3), (4) and (5) together with $r_1, \ldots, r_{n_1}$, $\theta_{n_1+1}, \ldots, \theta_n$.

3. Numerical method

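We apply the charge simulation method to the unknown function by

$$\begin{aligned} a(z) \simeq A(z) &= Q_0 + \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}) \\ &= Q_0 + \sum_{l=1}^{n_1} \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}) + i \sum_{l=n_1+1}^{n} \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}), \end{aligned} \tag{6}$$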

where $Q_0$ is an unknown complex constant and $Q_{lj}$ are unknown real constants called the charges. The singular points $\zeta_{lj}$ called the charge points are placed inside $C_l$, i.e., outside D.

We impose the following requirements on the approximate function.

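(i) Single-valuedness condition: Eq. (6) is single-valued if and only if

$$\int_{C_m} dA(z) = \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \int_{C_m} d\log(z - \zeta_{lj}) = \begin{cases} 2\pi i \sum_{j=1}^{N_m} Q_{mj} = 0 & \text{for } 1 \le m \le n_1, \\ -2\pi \sum_{j=1}^{N_m} Q_{mj} = 0 & \text{for } n_1 + 1 \le m \le n, \end{cases}$$

because

$$\int_{C_m} d\log(z - \zeta_{lj}) = \begin{cases} 2\pi i & \text{for } l = m, \\ 0 & \text{for } l \ne m. \end{cases}$$

So that

$$\sum_{j=1}^{N_l} Q_{lj} = 0, \quad l = 1, \ldots, n. \tag{7}$$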
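The value of the contour integral used above can be checked numerically. The following minimal sketch sums principal-value logarithms of ratios of consecutive boundary points, which accumulates the change of log(z − ζ) along a circle; the function name `contour_integral_dlog`, the circle and the two test points are hypothetical choices for illustration.

```python
import cmath
import math


def contour_integral_dlog(center, radius, zeta, steps=2000):
    """Approximate the integral of d log(z - zeta) once around a circle."""
    total = 0j
    prev = center + radius
    for k in range(1, steps + 1):
        cur = center + radius * cmath.exp(2j * math.pi * k / steps)
        # Each small step changes the argument by far less than pi,
        # so the principal value of the ratio gives the true increment.
        total += cmath.log((cur - zeta) / (prev - zeta))
        prev = cur
    return total


inside = contour_integral_dlog(2.0 + 0.0j, 1.0, 2.3 + 0.2j)    # zeta inside the circle
outside = contour_integral_dlog(2.0 + 0.0j, 1.0, -1.0 + 1.0j)  # zeta outside the circle
print(inside, outside)
```

The first value is close to 2πi and the second is close to 0, in agreement with the two cases l = m and l ≠ m.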

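(ii) Normalization condition: From (3), we require

$$A(\infty) = Q_0 + \lim_{z \to \infty} \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}) = 0.$$

So that $Q_0 = 0$ under (7) because

$$\log(z - \zeta_{lj}) = \log z + \log\Bigl( 1 - \frac{\zeta_{lj}}{z} \Bigr),$$

and

$$A(z) = \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}). \tag{8}$$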

(iii) Collocation condition: We require (8) to satisfy the boundary conditions (4) and (5) collocationally, i.e.,

$$\sum_{l=1}^{n_1} \sum_{j=1}^{N_l} Q_{lj} \log|z_{mk} - \zeta_{lj}| - \sum_{l=n_1+1}^{n} \sum_{j=1}^{N_l} Q_{lj} \arg(z_{mk} - \zeta_{lj}) - \log R_m = -\log|z_{mk}|, \quad z_{mk} \in C_m,\ k = 1, \ldots, N_m,\ m = 1, \ldots, n_1, \tag{9}$$

$$\sum_{l=1}^{n_1} \sum_{j=1}^{N_l} Q_{lj} \arg(z_{mk} - \zeta_{lj}) + \sum_{l=n_1+1}^{n} \sum_{j=1}^{N_l} Q_{lj} \log|z_{mk} - \zeta_{lj}| - \Theta_m = -\arg z_{mk}, \quad z_{mk} \in C_m,\ k = 1, \ldots, N_m,\ m = n_1 + 1, \ldots, n, \tag{10}$$

where $z_{mk}$ called the collocation points are placed on $C_m$, and $R_m$ and $\Theta_m$ are approximations to $r_m$ and $\theta_m$.

Eqs. (7), (9) and (10) make up a set of linear equations for the unknown constants $Q_{lj}$ and $\log R_1, \ldots, \log R_{n_1}$, $\Theta_{n_1+1}, \ldots, \Theta_n$. Once they are determined, we obtain A(z) by (8), and an approximate mapping function F(z) ≃ f(z) by substituting it for a(z) in (2).

We use in computation the principal value of the logarithmic function, i.e., the branch of log z such that −π < arg z ≤ π. Consequently, $\log(z - \zeta_{lj})$ in (8) has the discontinuity of 2πi on the half line $\{\zeta_{lj} - t \mid t > 0\}$, which causes discontinuities of A(z) in D. Therefore, we change the expression (8) into a form that is mathematically equivalent to (8) and is continuous in D when the principal value is used. We call an approximate mapping function using such an expression of A(z) a continuous scheme. Here we assume that, only for simplicity, each $C_l$ is starlike with respect to its inside point $\zeta_{l0}$, and using (7) rewrite (8) into

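$$\begin{aligned} A(z) &= \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}) - \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{l0}) \\ &= \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log\frac{z - \zeta_{lj}}{z - \zeta_{l0}}. \end{aligned} \tag{11}$$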

The term $\log((z - \zeta_{lj})/(z - \zeta_{l0}))$ has the discontinuity on the line segment $[\zeta_{lj}, \zeta_{l0}]$ inside $C_l$, and (11) is continuous in D when the principal value is used. We have the following:
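The effect of this rewriting on the branch cut can be seen in a few lines. The sketch below evaluates the jump of the principal-value logarithm across the real axis for both expressions; the points `zeta` and `zeta0`, the offset `eps` and the helper names are hypothetical values chosen only for illustration.

```python
import cmath

zeta = 0.5 + 0.0j    # hypothetical charge point
zeta0 = 0.0 + 0.0j   # hypothetical reference point inside the same curve
eps = 1e-9           # small offset above and below the real axis


def plain(z):
    """Principal value of log(z - zeta), as used in (8)."""
    return cmath.log(z - zeta)


def ratio(z):
    """Principal value of log((z - zeta)/(z - zeta0)), as used in (11)."""
    return cmath.log((z - zeta) / (z - zeta0))


def jump(func, x):
    """Difference of func just above and just below the real axis at x."""
    return func(complex(x, eps)) - func(complex(x, -eps))


far_plain = jump(plain, -2.0)   # on the half line {zeta - t | t > 0}
far_ratio = jump(ratio, -2.0)   # same point, away from the segment [zeta0, zeta]
on_segment = jump(ratio, 0.25)  # on the segment [zeta0, zeta]
print(far_plain, far_ratio, on_segment)
```

The plain logarithm jumps by about 2πi far away from the charge point, whereas the ratio form only jumps on the short segment between the two points, which lies inside the boundary curve.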

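Scheme 2 When each boundary curve $C_l$ is starlike with respect to its inside point $\zeta_{l0}$, a continuous scheme of the approximate mapping function is given by

$$F(z) = z \exp A(z), \tag{12}$$

$$A(z) = \Bigl( \sum_{l=1}^{n_1} + i \sum_{l=n_1+1}^{n} \Bigr) \sum_{j=1}^{N_l} Q_{lj} \log\frac{z - \zeta_{lj}}{z - \zeta_{l0}}, \tag{13}$$

where the unknown constants $Q_{lj}$, together with $\log R_1, \ldots, \log R_{n_1}$, $\Theta_{n_1+1}, \ldots, \Theta_n$, are determined by solving the linear equations

$$\sum_{j=1}^{N_l} Q_{lj} = 0, \quad l = 1, \ldots, n, \tag{14}$$

$$\sum_{l=1}^{n_1} \sum_{j=1}^{N_l} Q_{lj} \log\left| \frac{z_{mk} - \zeta_{lj}}{z_{mk} - \zeta_{l0}} \right| - \sum_{l=n_1+1}^{n} \sum_{j=1}^{N_l} Q_{lj} \arg\frac{z_{mk} - \zeta_{lj}}{z_{mk} - \zeta_{l0}} - \log R_m = -\log|z_{mk}|, \quad z_{mk} \in C_m,\ k = 1, \ldots, N_m,\ m = 1, \ldots, n_1, \tag{15}$$

$$\sum_{l=1}^{n_1} \sum_{j=1}^{N_l} Q_{lj} \arg\frac{z_{mk} - \zeta_{lj}}{z_{mk} - \zeta_{l0}} + \sum_{l=n_1+1}^{n} \sum_{j=1}^{N_l} Q_{lj} \log\left| \frac{z_{mk} - \zeta_{lj}}{z_{mk} - \zeta_{l0}} \right| - \Theta_m = -\arg z_{mk}, \quad z_{mk} \in C_m,\ k = 1, \ldots, N_m,\ m = n_1 + 1, \ldots, n. \tag{16}$$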
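Scheme 2 can be sketched in a few dozen lines for circular boundaries. The following is only an illustration under stated assumptions, not the authors' Fortran/IMSL implementation: it uses the same number N of charges on every circle, places the points as in the three-disk example of Section 4, and solves the linear system with a plain Gaussian elimination. The names `solve_linear` and `slit_map` are hypothetical.

```python
import cmath
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def slit_map(centers, radii, n1, N, q):
    """Return (consts, F); consts[m] is R_m for m < n1 and Theta_m otherwise."""
    n = len(centers)
    unit = [cmath.exp(2j * math.pi * j / N) for j in range(N)]
    z = [[c + rho * u for u in unit] for c, rho in zip(centers, radii)]
    zeta = [[c + q * rho * u for u in unit] for c, rho in zip(centers, radii)]
    size = n * N + n
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for l in range(n):                      # Eq. (14): charges in C_l sum to 0
        for j in range(N):
            a[l][l * N + j] = 1.0
    row = n
    for m in range(n):
        for k in range(N):
            zmk = z[m][k]
            for l in range(n):
                for j in range(N):
                    w = cmath.log((zmk - zeta[l][j]) / (zmk - centers[l]))
                    if m < n1:              # Eq. (15): real part of A
                        a[row][l * N + j] = w.real if l < n1 else -w.imag
                    else:                   # Eq. (16): imaginary part of A
                        a[row][l * N + j] = w.imag if l < n1 else w.real
            a[row][n * N + m] = -1.0        # unknown log R_m or Theta_m
            b[row] = -math.log(abs(zmk)) if m < n1 else -cmath.phase(zmk)
            row += 1
    x = solve_linear(a, b)
    consts = [math.exp(x[n * N + m]) if m < n1 else x[n * N + m]
              for m in range(n)]

    def F(p):                               # Eqs. (12) and (13)
        total = 0j
        for l in range(n):
            s = sum(x[l * N + j] * cmath.log((p - zeta[l][j]) / (p - centers[l]))
                    for j in range(N))
            total += s if l < n1 else 1j * s
        return p * cmath.exp(total)

    return consts, F


# Three-disk example of Section 4 with n1 = 1, q = 0.8 and N = 32.
centers = [2 * cmath.exp(2j * math.pi * l / 3) for l in range(3)]
consts, F = slit_map(centers, [1.0, 0.5, 1.5], n1=1, N=32, q=0.8)
print(consts)
```

The printed constants approximate R1, Θ2 and Θ3 and can be compared with the N = 32 rows of Table 1. Evaluating |F(z)| at points of C1 that are not collocation points gives a rough check of the boundary condition (4).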

It should be noted that the solvability of the linear equations to be solved in the charge simulation method applied to general multiply connected domains is mathematically an open problem, together with the convergence of the solution. It should also be noted that if we apply the charge simulation method to the unknown function by

$$a(z) \simeq A(z) = Q_0 + \sum_{l=1}^{n} \sum_{j=1}^{N_l} Q_{lj} \log(z - \zeta_{lj}), \tag{17}$$

we have another scheme that appears simpler than Scheme 2. However, it may cause ill-conditioning of the linear equations to be solved under some conditions. We do not go into the details of this problem here. Scheme 2 is experimentally free from this difficulty.

4. An example

Computations were carried out on a dual Intel Xeon 3.06 GHz processor workstation with the Intel Fortran compiler in double precision. The IMSL library was used for solving linear equations.

The problem domain D is the exterior of three disks,

$$C_l : |z - \zeta_{l0}| = \rho_l, \quad \rho_1 = 1,\ \rho_2 = 0.5,\ \rho_3 = 1.5, \quad \zeta_{l0} = 2 \exp\frac{2(l - 1)\pi i}{3}, \quad l = 1, 2, 3.$$

Collocation points and charge points are placed by

$$z_{lj} = \zeta_{l0} + \rho_l \exp\frac{2(j - 1)\pi i}{N}, \quad \zeta_{lj} = \zeta_{l0} + q\rho_l \exp\frac{2(j - 1)\pi i}{N}, \quad j = 1, 2, \ldots, N,\ l = 1, 2, 3,$$

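where 0 < q < 1 is a parameter for charge placement. Errors are estimated by

$$\epsilon_{F_l} = \max_{z \in C_l} \bigl| |F(z)| - R_l \bigr|, \quad \epsilon_{R_l} = |R_l - R_l(2N)|, \quad 1 \le l \le n_1, \tag{18}$$

$$\epsilon_{F_l} = \max_{z \in C_l} |\arg F(z) - \Theta_l|, \quad \epsilon_{\Theta_l} = |\Theta_l - \Theta_l(2N)|, \quad n_1 + 1 \le l \le n, \tag{19}$$

where $R_l(2N)$ and $\Theta_l(2N)$ are the results for 2N simulation charges. In practice, $\epsilon_{F_l}$ is evaluated at 8N points uniformly placed on $C_l$.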

Fig. 2 illustrates by square meshes the numerical conformal mapping of D onto the circular and radial slit domain (n1 = 1, n2 = 2), where C1 is mapped onto the circular slit, and C2 and C3 onto the radial slits. Small dots inside the boundary circles show the charge points. Fig. 2 illustrates also contour lines of log |F(z)| and arg F(z). It is difficult to give them a physical meaning in fluid flow problems or in electrostatic problems. However, in an idealized steady heat flow problem, they mean isotherms (lines of constant temperature) and heat flow lines around a cylinder C1 of thermal conductivity λ = ∞ and two cylinders C2 and C3 of λ = 0 induced by a point heat source.

Table 1 shows numerical results of the conformal mapping, where κ is the L1 condition number of the coefficient matrix of linear equations to be solved, and the values of R1, Θ2 and Θ3 are shown until a nonzero digit appears in the right-hand side of (18) and (19). We see that errors decay exponentially with respect to N, and high accuracy is achieved.

[Figure: four panels, each with both axes running from −4 to 4.]

Fig. 2. Numerical conformal mapping of D onto the circular and radial slit domain, and contour lines of log |F(z)| and arg F(z).

Table 1. Numerical results of the conformal mapping (q = 0.8).

N     l   εF1,2,3    εR1,Θ2,3   R1, Θ2,3             κ
16    1   2.3E−02    7.4E−04    1.4683               6.2E+01
      2   3.1E−03    1.4E−03    2.393
      3   5.9E−02    4.7E−04    −2.2259
32    1   3.9E−04    2.9E−05    1.46757              7.4E+02
      2   4.4E−05    3.1E−05    2.39447
      3   3.8E−03    1.7E−05    −2.22633
64    1   1.6E−07    9.7E−09    1.467540618          5.3E+04
      2   1.7E−08    1.1E−08    2.39449861
      3   2.5E−05    5.6E−09    −2.226352156
128   1   4.7E−14    4.2E−14    1.46754060820969     1.3E+08
      2   4.0E−13    9.3E−15    2.394498619975067
      3   1.3E−09    2.0E−14    −2.22635216142479

Fig. 3 illustrates the conformal mapping of D onto the circular slit domain (n1 = 3) and contour lines of log |F(z)|, which physically mean streamlines of a vortex flow around three cylindrical objects. Fig. 4 illustrates the conformal mapping of D onto the radial slit domain (n2 = 3) and contour lines of arg F(z), which physically mean streamlines of a point source flow around three cylindrical objects.


We here illustrated the case of circular domain as a typical example. However, the charge simulation method is generally suitable for domains of curved boundaries [9], and gives a simple form of approximate mapping functions with high accuracy. As we mentioned before, the solvability and the convergence are important problems to be studied mathematically.

Koebe [7] discussed also such bounded canonical slit domains as a circular disk and a circular ring with a mixture of circular and radial slits. In application to physics, conformal mappings of bounded domains onto these canonical slit domains, particularly onto the circular ring with radial slits, are important, because steady heat flow problems are usually solved as mixed boundary value problems in finite domains. They should be examined in the near future.

[Figure: two panels, each with both axes running from −4 to 4.]

Fig. 3. Numerical conformal mapping of D onto the circular slit domain, and contour lines of log |F(z)|.

[Figure: two panels, each with both axes running from −4 to 4.]

Fig. 4. Numerical conformal mapping of D onto the radial slit domain, and contour lines of arg F(z).

Acknowledgments

The authors wish to thank Prof. M. Sugihara (The University of Tokyo) and Prof. H. Ogata (The University of Electro-Communications) for their helpful discussion.

References

[1] D. Gaier, Konstruktive Methoden der konformen Abbildung (in German), Springer, Berlin, 1964.

[2] P. Henrici, Applied and Computational Complex Analysis, Vol. 3, John Wiley & Sons, New York, 1986.

[3] L. N. Trefethen (ed.), Numerical Conformal Mapping, North-Holland, Amsterdam, 1986; J. Comput. Appl. Math., Vol. 14, Nos. 1 & 2, 1986.

[4] P. K. Kythe, Computational Conformal Mapping, Birkhäuser, Boston, 1998.

[5] T. A. Driscoll and L. N. Trefethen, Schwarz-Christoffel Mapping, Cambridge Univ. Press, Cambridge, 2002.

[6] Z. Nehari, Conformal Mapping, McGraw-Hill, New York, 1952; Dover, New York, 1975.

[7] P. Koebe, Abhandlungen zur Theorie der konformen Abbildung. IV. Abbildung mehrfach zusammenhängender schlichter Bereiche auf Schlitzbereiche (in German), Acta Math., 41 (1916), 305–344.

[8] T. K. DeLillo, T. A. Driscoll, A. R. Elcrat and J. A. Pfaltzgraff, Radial and circular slit maps of unbounded multiply connected circle domains, in: Proc. of R. Soc. A Math. Phys. Eng. Sci., Vol. 464, pp. 1719–1737, 2008.

[9] K. Amano, A charge simulation method for numerical conformal mapping onto circular and radial slit domains, SIAM J. Sci. Comput., 19 (1998), 1169–1187.



A comparative study of principal component analysis

on term structure of interest rates

Nien-Lin Liu1

1 Department of Mathematical Sciences, Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan

E-mail gr012087 ed.ritsumei.ac.jp

Received January 10, 2010, Accepted March 31, 2010

Abstract

In this paper, principal component analysis (PCA) is applied to three different parametrizations of interest rates: zero rates, yield curve, and forward rates. This comparative study is complementary to Akahori, Aoki, and Nagata [1], where they claimed that, under the no-arbitrage principle, the yield curve cannot be a random walk. Conversely, the forward curve could be a random walk. In our result of PCA, however, we observed that the number of factors of the forward rates is much greater than that of the general beliefs. Our empirical results on the number of factors for the zero rates and the yield curve align with the general beliefs. This is a puzzle.

Keywords forward rates, principal component analysis, term structure of interest rates,zero rates

Research Activity Group Mathematical Finance

1. Introduction

With regard to the studies of the term structure of interest rates, principal component analysis (PCA) is a relevant method. Fase [2] used PCA to account for the variance among interest rates. Litterman and Scheinkman [3] used PCA to study the volatility of U.S. government bonds. In their empirical analyses, they found that there are three principal factors in the yield curve: the level, the steepness and the curvature. Buhler and Zimmermann [4], and Hiraki, Shiraishi and Takezawa [5] also indicated similar conclusions. Other than these, there have been plenty of studies applying PCA to interest rates. Here we have cited only a few.

The previous analyses were performed, whether implicitly or not, on the basis of a random walk hypothesis (RWH) on the yield curve. The hypothesis is, however, fragile. There could be many alternative random walk hypotheses depending on the parametrizations since the yield curve is a stochastic process in an infinite dimensional space. For example, RWH’s on yt(x) and rt(T) := y(t, T − t) cannot be compatible (to make a distinction, we call the former the yield curve and the latter the zero rate). Also, there could be RWH on forward rates ∂T r, etc. Then one is naturally led to ask: which parametrization is consistent with RWH?

The no-arbitrage principle might be one good criterion. It imposes a restriction on the drift. In Akahori, Aoki, and Nagata [1] (AAN model, hereafter), it is shown mathematically that the restriction is not consistent with RWH on the yield curve, while the forward rate model they proposed is consistent with the no-arbitrage condition.

The object of the present paper is to perform PCA on the real market data (the daily data of Japanese bonds and American bonds from 2007/6/20 to 2008/3/31 and from 2007/5/15 to 2008/3/31, respectively) in three different ways: to the increments of (i) spot rates, (ii) yield curves, and (iii) forward rates on the basis of the AAN model. As a result, we obtain the following striking observation:

• The number of factors is two or three in the cases of (i) and (ii), while it is very large when the PCA is applied on the basis of the AAN model in (iii).

Since we do not adopt any hypothesis testing style, we will not make any rigorous remark on the observation. However, the implication of the result could be the following: we need to construct a no-arbitrage model other than the random walk model to explain the result.

In the rest of the paper, after giving the setting in Section 2, we present the results of our PCA in Section 3. Some concluding remarks will be made in Section 5.

2. Setting

2.1 Notations and definitions

We first explain the zero rates $r_t(T)$, the yield curve $y_t(x)$, and the forward rates $F_t(T_1, T_2)$ in terms of the zero-coupon bond price $P(t, T)$. The zero rate $r_t(T)$, by which we mean the spot interest rate during $[t, T]$, is given by

$$r_t(T) = -\frac{\log P(t, T)}{T - t}, \quad t \le T,$$

and the yield curve $y_t(x)$ is given by

$$y_t(x) = r_t(t + x), \quad x \ge 0.$$

Here x represents the time to maturity. The forward rates are determined in terms of the zero rates as follows:

$$F_t(T_1, T_2) := \frac{r_t(T_2)(T_2 - t) - r_t(T_1)(T_1 - t)}{T_2 - T_1}. \tag{1}$$


JSIAM Letters Vol. 2 (2010) pp.57–60 Nien-Lin Liu

Table 1. Japanese zero coupon bond rates.

r(t, T)∗100%                       Dates (t)
Maturities (T)   2007/6/20   2007/6/21   2007/6/22   · · ·   2008/3/28   2008/3/31
2009/6/20        1.041       1.048       1.039       · · ·   0.552       0.553
2009/12/20       1.127       1.137       1.127       · · ·   0.567       0.567
2010/6/20        1.205       1.217       1.205       · · ·   0.582       0.581
...
2028/12/20       2.38        2.411       2.391       · · ·   2.255       2.28

Table 2. American zero coupon bond rates.

r(t, T)∗100%                       Dates (t)
Maturities (T)   2007/5/15   2007/5/16   2007/5/17   · · ·   2008/3/28   2008/3/31
2009/5/15        4.663       4.663       4.743       · · ·   1.533       1.477
2009/11/15       4.63        4.628       4.673       · · ·   1.596       1.535
2010/5/15        4.608       4.608       4.658       · · ·   1.732       1.673
...
2028/11/15       5.004       4.996       5.018       · · ·   4.651       4.603

Table 3. The proportion of contributions of principal components.

factor                                   1           2           3           4           5           6           7           8           9
Japanese   Eigenvalue                    0.0310911   0.0035454   0.0005058   9.728E−5    9.377E−5    3.471E−5    3.125E−5    2.633E−5    2.032E−5
           Cumulative of contribution    0.8744      0.9741      0.9883      0.9911      0.9937      0.9947      0.9955      0.9963      0.9969
American   Eigenvalue                    0.1908361   0.0134829   0.0011317   0.0010874   0.0006348   0.0005454   0.0004731   0.0004488   0.0004122
           Cumulative of contribution    0.8961      0.9595      0.9648      0.9699      0.9729      0.9754      0.9776      0.9798      0.9817


factor                  1           2           3           4           5           6           7           8           9
Japanese   Eigenvalue   0.109758    0.0075136   0.0003958   0.000213    4.583E−5    4.298E−5    1.816E−5    1.328E−5    1.163E−5
American   Eigenvalue   0.2205559   0.0357081   0.0026116   0.0009109   0.0004769   0.000449    0.0003541   0.0002225   0.0001816


Table 5. Japanese forward rates.

forward rates   F(T1, T2)   F(T2, T3)   F(T3, T4)   · · ·   F(T39, T40)
2007/6/20       1.4705301   1.5967143   1.6872459   · · ·   2.9668525
2007/6/21       1.4920273   1.6183187   1.8175902   · · ·   3.0396885
2007/6/22       1.4775574   1.5958571   1.6795137   · · ·   3.0196066
...
2008/3/28       0.6038033   0.6340879   0.7399945   · · ·   3.3048033
2008/3/31       0.6011202   0.6293846   0.7276557   · · ·   3.2890164

Table 6. American forward rates.

forward rates   F(T1, T2)   F(T2, T3)   F(T3, T4)   · · ·   F(T39, T40)
2007/5/15       4.4988967   4.4967845   4.0793043   · · ·   5.3375217
2007/5/16       4.4891413   4.5070055   4.1353261   · · ·   4.4540978
2007/5/17       4.395663    4.582337    4.1926413   · · ·   3.8926576
...
2008/3/28       1.7374076   2.1805746   1.7215435   · · ·   7.0886793
2008/3/31       1.6642391   2.125884    1.3967663   · · ·   5.3220217

Here the forward rate Ft(T1, T2) represents the interest rate of the period [T1, T2] pre-agreed at time t.
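Formula (1) can be evaluated directly from two tabulated zero rates. The sketch below assumes that time differences are measured in actual calendar days, which the text does not state (the unit of time cancels in the ratio); the function name `forward_rate` is hypothetical. The inputs are the first two Japanese zero rates on 2007/6/20 from Table 1.

```python
from datetime import date


def forward_rate(t, T1, T2, r1, r2):
    """Formula (1): forward rate for [T1, T2] seen at t, from zero rates r1, r2."""
    d1 = (T1 - t).days   # T1 - t in days (assumed day count)
    d2 = (T2 - t).days   # T2 - t in days
    return (r2 * d2 - r1 * d1) / (d2 - d1)


t = date(2007, 6, 20)
F12 = forward_rate(t, date(2009, 6, 20), date(2009, 12, 20), 1.041, 1.127)
F23 = forward_rate(t, date(2009, 12, 20), date(2010, 6, 20), 1.127, 1.205)
print(F12, F23)
```

Under this day-count assumption the two values agree with the entries F(T1, T2) and F(T2, T3) for 2007/6/20 in Table 5 to the printed precision.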

2.2 The data

We investigate the daily data of Japanese bonds and American bonds from 2007/6/20 to 2008/3/31 and from 2007/5/15 to 2008/3/31, respectively, which were obtained from Bloomberg. The zero rates $r_t(T)$ of Japanese bonds and American bonds are shown in Tables 1 and 2. The maturities of Japanese bonds are $T_1$ = 2009/6/20, $T_2$ = 2009/12/20, $T_3$ = 2010/6/20, . . . , $T_{40}$ = 2028/12/20, and the maturities of American bonds are $T_1$ = 2009/5/15, $T_2$ = 2009/11/15, $T_3$ = 2010/5/15, . . . , $T_{40}$ = 2028/11/15.

Since we use daily data, we set ∆t = 1/365 year, and PCA’s are applied to $\Delta y_t(x_j) := y_{t+\Delta t}(x_j) - y_t(x_j)$ for j = 1, 2, . . . (precise description of the $x_j$’s will be given later), $\Delta r_t(T_i) := r_{t+\Delta t}(T_i) - r_t(T_i)$ for i = 1, 2, . . . , and $\Delta F_t(T_i, T_{i+1}) := F_{t+\Delta t}(T_i, T_{i+1}) - F_t(T_i, T_{i+1})$ for i = 1, 2, . . . , respectively. Here t runs through “2007/5/15 to 2008/3/31” and “2007/6/20 to 2008/3/31”, but only the order is important.

3. The results of PCA’s

3.1 Zero rates

The results of PCA applied to the increments of the zero rates in Tables 1 and 2 are given in Table 3. The results in Table 3 show that both Japanese and American cases need just two factors to reach the 95% level.
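The quantities reported in Table 3 can be obtained along the following lines. This is a minimal sketch, assuming that PCA is performed on the sample covariance matrix of the increments (the text does not say whether the covariance or the correlation matrix is used); the eigenvalues are computed with a cyclic Jacobi iteration so that no external library is needed. All function names and the small data set are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns of rows (one row per date)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def jacobi_eigenvalues(a, sweeps=30):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def cumulative_contribution(eigenvalues):
    """Cumulative proportion of the total variance."""
    total, run, out = sum(eigenvalues), 0.0, []
    for ev in eigenvalues:
        run += ev
        out.append(run / total)
    return out


def factors_needed(eigenvalues, level=0.95):
    """Smallest number of leading factors whose cumulative share reaches level."""
    for k, c in enumerate(cumulative_contribution(eigenvalues), start=1):
        if c >= level:
            return k
    return len(eigenvalues)


# Hypothetical daily increments of three rates (not data from this paper).
increments = [[0.010, 0.011, 0.012], [-0.020, -0.019, -0.021],
              [0.005, 0.006, 0.004], [0.015, 0.013, 0.016],
              [-0.010, -0.012, -0.009]]
eig = jacobi_eigenvalues(covariance_matrix(increments))
print(cumulative_contribution(eig), factors_needed(eig))
```

Applied to the 40 columns of increments of the zero rates, the same steps would give eigenvalues and cumulative contributions of the kind listed in Table 3.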

3.2 Yield curves

To apply PCA to the yield curves $y_t(x_j) = r_t(t + x_j)$ for j = 1, 2, . . . , we need to interpolate the above data since we do not have the data of $r_t(t + x_j)$ for $t + x_j \notin \{T_i\}$. The formula for our interpolation is the following: for $(t, x_j)$ with $t + x_j \notin \{T_i\}$, one can choose some i such that $T_i < t + x_j < T_{i+1}$. We then set

$$y_t(x_j) = r_t(T_i) + \frac{t + x_j - T_i}{T_{i+1} - T_i}\,[r_t(T_{i+1}) - r_t(T_i)].$$
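The interpolation can be sketched as follows, again assuming that time is measured in actual calendar days (an assumption; the text does not specify the day count). The function name `interpolate_zero_rate` and the target date are hypothetical, while the two zero rates are the Japanese values for 2007/6/20 from Table 1.

```python
from datetime import date


def interpolate_zero_rate(target, maturities, rates):
    """Linear interpolation of r_t at the date target between adjacent maturities."""
    pairs = zip(maturities, maturities[1:], rates, rates[1:])
    for Ti, Tnext, ri, rnext in pairs:
        if Ti <= target <= Tnext:
            w = (target - Ti).days / (Tnext - Ti).days
            return ri + w * (rnext - ri)
    raise ValueError("target date outside the maturity grid")


maturities = [date(2009, 6, 20), date(2009, 12, 20)]
rates = [1.041, 1.127]
# Hypothetical date t + x_j lying strictly between T1 and T2.
y = interpolate_zero_rate(date(2009, 9, 20), maturities, rates)
print(y)
```

At a grid date the function returns the tabulated rate itself, so it also covers the case where t + x_j coincides with some T_i.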

The results of PCA applied to the increments of theyield curves are presented in Table 4.

3.3 Forward rates

First, we calculate the forward rates for the period (Ti, Ti+1) at each time t using formula (1). The results of the computation are shown in Tables 5 and 6.

Second, the increments of the forward rates during (t, t + ∆t] of each period [Ti, Ti+1] are calculated and the results are shown in Tables 7 and 8. The results of PCA applied to the data in Tables 7 and 8 are shown in Table 9. The numbers of factors needed to reach the 95% level for the Japanese and the American case are 27 and 24, respectively. This is in sharp contrast with the common belief.



Table 7. The increments of Japanese forward rates.

increments of forward rates   F(T1, T2)    F(T2, T3)    F(T3, T4)    · · ·   F(T39, T40)
2007/6/20                     0.0214973    0.0216044    0.1303443    · · ·   0.0728361
2007/6/21                     −0.01447     −0.022462    −0.138077    · · ·   −0.020082
2007/6/22                     −0.011377    −0.00928     −0.018038    · · ·   −0.007246
...
2008/3/28                     0.002377     −0.017995    0.0122022    · · ·   0.099623
2008/3/31                     −0.002683    −0.004703    −0.012339    · · ·   −0.015787

Table 8. The increments of American forward rates.

increments of forward rates   F(T1, T2)    F(T2, T3)    F(T3, T4)    · · ·   F(T39, T40)
2007/5/15                     −0.009755    0.010221     0.0560217    · · ·   −0.883424
2007/5/16                     −0.093478    0.0753315    0.0573152    · · ·   −0.56144
2007/5/17                     0.1650326    0.0239669    0.0532446    · · ·   0.9729728
...
2008/3/28                     −0.055092    −0.121917    0.1809946    · · ·   1.1367228
2008/3/31                     −0.073168    −0.054691    −0.324777    · · ·   −1.766658


factor                                   1           2           3           · · ·   23          24          25          26          27
Japanese   Eigenvalue                    0.0428618   0.0258535   0.0203427   · · ·   0.0015299   0.0014931   0.0013146   0.0012842   0.0011953
           Cumulative of contribution    0.2508      0.4021      0.5211      · · ·   0.9217      0.9304      0.9381      0.9456      0.9526
American   Eigenvalue                    1.0707736   0.9374085   0.6408346   · · ·   0.0495484   0.0489997   0.0431291   0.0392494   0.0386665
           Cumulative of contribution    0.1585      0.2973      0.3922      · · ·   0.9485      0.9557      0.9621      0.9679      0.9736

Table 10. Unit root tests for yield curves.

Dickey-Fuller unit root tests

              Japanese                      American
Maturities    no intercept   intercept      no intercept   intercept
2y            −0.6589        −1.86314       −0.74283       0.44849
3y            −0.72458       −1.92723       −0.68893       0.476249
4y            −0.72467       −2.13511       −0.62806       −0.05127
5y            −0.74028       −2.15931       −0.45198       0.160343
6y            −0.74918       −1.85784       −0.41994       −0.02141
7y            −0.72506       −1.6363        −0.3596        0.009132
8y            −0.68047       −2.58465       −0.27949       −1.44834
9y            −0.54991       −3.31275       −0.21374       −1.51463
10y           −0.41498       −2.60629       −0.20753       −1.28217
11y           −0.40898       −8.42181       −0.15945       −2.93651
12y           −0.40173       −13.4186       −0.14312       −7.73586
13y           −0.35195       −14.4515∗      −0.11753       −5.58766
14y           −0.3145        −19.491∗       −0.10599       −6.82658
15y           −0.22039       −5.51756       −0.12102       −3.31217
16y           −0.28959       −37.3714∗∗     −0.10384       −13.3047
17y           −0.23697       −41.9922∗∗     −0.08858       −9.25892
18y           −0.20788       −54.7674∗∗     −0.08608       −9.69535
19y           −0.18644       −68.3224∗∗     −0.08147       −10.3979
20y           −0.19222       −89.3159∗∗     −0.09881       −16.5516∗∗

4. Unit root tests

In modern time-series econometrics, the first question of interest is whether the time series under consideration is stationary or not. A Dickey-Fuller test, as constructed by Dickey and Fuller [6] (DF statistics, hereafter), is used to test the null hypothesis of a simple unit root. There are several different versions of the Dickey-Fuller test. The models we select are as follows:

(i) no intercept case:

$$Y_t = \rho Y_{t-1} + e_t$$

(ii) intercept case:

$$Y_t = \mu + \rho Y_{t-1} + e_t$$

for t = 1, 2, . . . , where $Y_0 = 0$, ρ is a real number, and $e_t$ is a sequence of independent normal random variables with mean zero and variance $\sigma^2$.

The null hypothesis is that ρ = 1. If |ρ| < 1, the time series $Y_t$ converges to a stationary time series as t → ∞. If |ρ| = 1, the time series is not stationary and is usually transformed by differencing. The time series with ρ = 1 is sometimes called a random walk.
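The two regressions above can be fitted by ordinary least squares in a few lines. The sketch below returns the estimate of ρ together with the normalized-bias statistic n(ρ̂ − 1); which form of the DF statistic is tabulated in this paper is not stated, so this choice is an assumption, as are the function names and the noise-free toy series used to exercise the code.

```python
def df_no_intercept(y):
    """OLS fit of Y_t = rho*Y_{t-1} + e_t; returns (rho, n*(rho - 1))."""
    x, z = y[:-1], y[1:]
    rho = sum(a * b for a, b in zip(x, z)) / sum(a * a for a in x)
    return rho, len(z) * (rho - 1.0)


def df_intercept(y):
    """OLS fit of Y_t = mu + rho*Y_{t-1} + e_t; returns (mu, rho, n*(rho - 1))."""
    x, z = y[:-1], y[1:]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    rho = sxz / sxx
    return mz - rho * mx, rho, n * (rho - 1.0)


# Noise-free toy series with rho = 0.5 (illustration only, not market data).
y1 = [1.0]
for _ in range(20):
    y1.append(0.5 * y1[-1])
y2 = [0.0]
for _ in range(20):
    y2.append(1.0 + 0.5 * y2[-1])

rho1, stat1 = df_no_intercept(y1)
mu2, rho2, stat2 = df_intercept(y2)
print(rho1, stat1, mu2, rho2, stat2)
```

For a series with a unit root the estimate of ρ is close to 1 and the statistic is close to 0, while strongly negative values speak against the null hypothesis; the critical values themselves have to be taken from the Dickey-Fuller tables.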

Table 10 shows DF statistics for each variable of the yield curves. Note that the symbols ∗ and ∗∗ denote rejections of the hypothesis of a unit root at the 0.05 and 0.01 significance levels, respectively.

Tables 11 and 12 show DF statistics for spot rates. According to the DF statistics, the null hypothesis cannot be rejected for the yield curves and the spot rates models. But it was shown in [1] that under the no-arbitrage principle, the yield curve cannot be a random walk.

Next, Tables 13 and 14 show DF statistics for forward rates. The null hypothesis also cannot be rejected for the forward rates model in the no-intercept (zero mean) case. This is consistent with what we expect.

5. Conclusion

The empirical results we obtained in this paper show that the number of factors of the forward rates is much greater than that of the general beliefs (two or three factors). Our empirical results on the number of factors for the zero rates and the yield curve align with the general beliefs.

The result implies that one cannot stick to both RWH and the no-arbitrage principle. In modern finance, however, the latter is indispensable. Then, we should search for an alternative model other than the random walk model to explain the observed reduction of the factors in PCA’s.

References

[1] J. Akahori, H. Aoki and Y. Nagata, Generalizations of Ho-Lee’s binomial interest rate model I: from one- to multi-factor, Asia-Pacific Financial Markets, 13 (2006), 151–179.

[2] M. M. G. Fase, A principal components analysis of market interest rate in the Netherlands, 1962–1970, European Economic Review, 4 (1973), 107–134.

[3] R. Litterman and J. Scheinkman, Common factors affecting bond returns, J. Fixed Income, 1 (1991), 54–61.

[4] A. Buhler and H. Zimmermann, A statistical analysis of the term structure of interest rates in Switzerland and Germany, J. Fixed Income, 6 (1996), 55–67.

[5] T. Hiraki, N. Shiraishi and N. Takezawa, Cointegration, com-



Table 11. Unit root tests for Japanese spot rates.


Japanese

Maturities T1 T2 T3 T4 T5 T6 T7 T8 T9 T10

no intercept −0.72039 −0.7703 −0.79837 −0.82523 −0.82157 −0.81081 −0.80725 −0.78842 −0.78557 −0.78305
intercept −1.89375 −1.78964 −1.68189 −1.81843 −1.82641 −1.89039 −2.04766 −1.96641 −1.82708 −1.68507






no intercept −0.19068 −0.17159 −0.15399 −0.13525 −0.11995 −0.10479 −0.09479 −0.08551 −0.07629 −0.07112

intercept −6.38742 −7.32645 −8.21177 −9.40071 −10.7515 −11.9364 −13.2237 −15.2162∗ −16.3058∗ −17.7318∗

Table 12. Unit root tests for American spot rates.


American


no intercept −0.77723 −0.74962 −0.69631 −0.69107 −0.64198 −0.62738 −0.5695 −0.52368 −0.49261 −0.46952intercept 0.513797 0.545814 0.641543 0.729569 0.444239 0.383126 0.199336 0.256097 0.272318 0.197896


no intercept −0.41455 −0.38525 −0.33981 −0.32201 −0.28262 −0.27002 −0.23341 −0.22482 −0.2035 −0.19648intercept 0.262638 0.354254 0.231874 0.10017 −0.21956 −0.42073 −0.90886 −0.95535 −1.15292 −1.24077





Table 13. Unit root tests for Japanese forward rates.


Japanese

Maturities F (T1, T2) F (T2, T3) F (T3, T4) F (T4, T5) F (T5, T6) F (T6, T7) F (T7, T8) F (T8, T9) F (T9, T10) F (T10, T11)



no intercept −0.72556 −0.59346 −0.35604 −0.23414 0.015362 0.09634 0.174599 0.004128 −0.12425 −0.15028intercept −2.55673 −3.94653 −2.99751 −23.6014∗∗ −33.686∗∗ −23.4227∗∗ −6.13967 −22.5949∗∗ −41.2303∗∗ −31.8423∗∗


no intercept −0.16864 −0.09409 0.039413 0.092443 0.118178 −0.00783 −0.02721 0.023392 0.113108 0.179216intercept −38.0905∗∗ −48.3496∗∗ −71.5954∗∗ −35.4532∗∗ −24.3048∗∗ −49.887∗∗ −33.8321∗∗ −37.0593∗∗ −30.527∗∗ −25.3252∗∗

Maturities F (T31, T32) F (T32, T33) F (T33, T34) F (T34, T35) F (T35, T36) F (T36, T37) F (T37, T38) F (T38, T39) F (T39, T40)

no intercept 0.242504 0.23077 0.267938 0.215193 0.242535 0.132529 0.106403 0.12901 0.022555nnintercept −5.57133 −4.662 −7.85187 −14.8512∗ −10.8391 −7.84777 −25.1263∗∗ −14.1574∗∗ −27.0239∗∗

Table 14. Unit root tests for American forward rates.


American


no intercept −0.68414 −0.56461 −0.68345 −0.38688 −0.72623 −0.37334 −0.38331 −0.3508 −0.39315 −0.10362

intercept 0.454597 0.047973 0.287712 −18.0873∗ −10.2475 −4.03377 −1.64815 −16.1725∗ −8.67673 −28.9328∗∗


no intercept −0.18101 0.023764 −0.16378 0.208544 −0.19323 0.209096 −0.29296 0.00976 −0.24565 −0.13813

intercept −26.6848∗∗ −26.6008∗∗ −30.2564∗∗ −20.6569∗∗ −29.4079∗∗ −14.4095∗ −25.1627∗∗ −72.1779∗∗ −47.7359∗∗ −20.9981∗∗


no intercept 0.011545 −0.06827 −0.37185 −0.49381 0.004055 −0.13014 −0.27221 −0.61188 −0.1716 −0.60276

intercept −21.2924∗∗ −56.8994∗∗ −128.755∗∗ −44.7746∗∗ −46.6104∗∗ −58.7658∗∗ −78.198∗∗ −56.2056∗∗ −97.9731∗∗ −136.005∗∗

Maturities F (T31, T32) F (T32, T33) F (T33, T34) F (T34, T35) F (T35, T36) F (T36, T37) F (T37, T38) F (T38, T39) F (T39, T40)

no intercept −0.64008 −0.53992 −0.25344 −0.44919 −0.35301 −0.67241 −0.61755 −1.14499 −1.04986

intercept −193.367∗∗ −63.7546∗∗ −35.6144∗∗ −57.5706∗∗ −130.449∗∗ −40.1184∗∗ −70.8578∗∗ −81.9102∗∗ −68.0185∗∗

mon factors, and the term structure of yen offshore interestrates, J. Fixed Income, 6 (1996), 69–75.

[6] D. A. Dickey and W. A. Fuller, Distribution of the estimationfor autoregressive time series with a unit root, J. Am. Stat.

Assoc., 74 (1979), 427–431.



Excluded volume effect in queueing theory

Daichi Yanagisawa1,2, Akiyasu Tomoeda3,4, Rui Jiang5 and Katsuhiro Nishinari4,6,1

1 Department of Aeronautics and Astronautics, School of Engineering, The University of Tokyo, Tokyo 113-8656, Japan

2 Research Fellow of the Japan Society for the Promotion of Science, Tokyo 102-8471, Japan

3 Meiji Institute for Advanced Study of Mathematical Sciences, Meiji University, Kanagawa 214-8571, Japan

4 Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo 153-8904, Japan

5 School of Engineering Science, University of Science and Technology of China, Hefei 230026, China

6 PRESTO, Japan Science and Technology Agency, Tokyo 102-0075, Japan

E-mail tt087068 mail.ecc.u-tokyo.ac.jp

Received December 6, 2009, Accepted March 1, 2010

Abstract

We have introduced the excluded volume effect, which is an important factor in modeling a realistic pedestrian queue, into queueing theory. The probability distributions of the pedestrian number and the pedestrian waiting time in a queue have been calculated exactly. Due to the time needed to close up the queue, the mean number of pedestrians increases as the pedestrian arrival probability (λ) and leaving probability (µ) increase, even if the ratio between them (i.e., ρ = λ/µ) remains constant. Furthermore, at a given ρ, the mean waiting time does not increase monotonically with the service time (which is inverse to µ); instead, it can reach a minimum.

Keywords queueing theory, asymmetric simple exclusion process, pedestrian dynamics

Research Activity Group Applied Integrable Systems

1. Introduction

Queueing theory [1] is one of the best-known and most important theories today, since it is applied to many real-world systems such as traffic systems [2], production systems [3], networks [4], and so on. The mathematical formulation for the mean waiting time, which is usually calculated by Little’s theorem [1], is widely used due to its simplicity. In queueing theory, the state of a queue is represented by the number of jobs, which may be vehicles, pedestrians, or packets in networks. When there are some jobs in the queue, one job is always receiving service, and when it leaves the queue, the service for the next job starts immediately. This assumption is suitable for a queue of packets, since a computer starts the operation for the next packet instantly. However, it is not realistic for a queue of vehicles or pedestrians, since there is a delay in moving to the service window due to the excluded volume effect, which is not included in queueing theory.

The excluded volume effect has been studied in detail by analyzing the asymmetric simple exclusion process (ASEP) [5]. Many traffic models and pedestrian dynamics models have been developed by extending ASEP [6, 7]. They are very successful since the excluded volume effect adequately represents the real movement of vehicles and pedestrians. Therefore, in this paper we introduce the excluded volume effect into queueing theory for the first time, in order to develop a practical theory for a pedestrian queue. The probability distributions and the means of both the pedestrian number and the pedestrian waiting time in a queue are calculated exactly and compared with those obtained from normal queueing models.

2. Models and mathematical analysis

2.1 Outline of the three queueing models

For comparison, in addition to our queue with the excluded volume effect (E-Queue), we briefly review the normal queue (N-Queue) with continuous time (N-Queue (C)), which is the most famous queueing model, known as M/M/1 [1], as well as that with discrete time (N-Queue (P)). In E-Queue and N-Queue (P), parallel update is adopted because it is realistic for one-dimensional pedestrian dynamics [8]. Fig. 1 is a schematic view of N-Queue (P) and E-Queue. At each time step a pedestrian arrives at the queue with probability λ and the pedestrian at the service window (pedestrian A) leaves the queue with probability µ. In N-Queue (P) (Fig. 1 (a)), pedestrian B moves to the service window as soon as pedestrian A leaves the queue. In contrast, in E-Queue (Fig. 1 (b)) pedestrian B cannot move to the service window at time step t+1 due to the excluded volume effect; he/she moves there at time step t + 2.

2.2 Master equations for N-Queue (P)

Since the mathematical analysis of N-Queue (C) is described in detail in many books [1], we start from N-Queue (P). The master equations in the stationary state are described as follows:

\[ P(0) = (1-\lambda)P(0) + (1-\lambda)\mu P(1), \tag{1} \]

\[ P(1) = \lambda P(0) + (1-\lambda)\mu P(2) + [\lambda\mu + (1-\lambda)(1-\mu)]P(1), \tag{2} \]

\[ P(n) = \lambda(1-\mu)P(n-1) + (1-\lambda)\mu P(n+1) + [\lambda\mu + (1-\lambda)(1-\mu)]P(n) \quad (n \ge 2), \tag{3} \]

where $P(n)$ represents the probability that there are n ∈ [0,∞) pedestrians in the queue. Note that the stationary state exists only when $\lambda < \lambda_{cr}$ is satisfied. $\lambda_{cr}$ is a critical value of λ, and when $\lambda \ge \lambda_{cr}$, the queue length tends to infinity. By solving these three-term recurrence equations, we obtain $P(n)$ and $P_W(t)$ (t ∈ [0,∞)), the probability distribution of the waiting time, i.e., the time between a pedestrian's arrival at the queue and departure from it. The mean number of pedestrians in the queue, $N = \sum_{n=0}^{\infty} nP(n)$, and the mean waiting time, $W = \sum_{t=0}^{\infty} tP_W(t)$, are calculated as shown in Table 1.

JSIAM Letters Vol. 2 (2010) pp.61–64 Daichi Yanagisawa et al.

Fig. 1. Schematic views of time variation of queueing states. (a) N-Queue (P). (b) E-Queue. The cell at the right end in the queue is the service window. λ ∈ [0, 1] and µ ∈ [0, 1] represent the arrival probability and the service probability, respectively.

Fig. 2. Schematic views of the stationary states of E-Queue in the case n = 2. (a) Group A. The service window is occupied by a pedestrian. (b) Group B. The service window is vacant.

Fig. 3. State transition diagram of E-Queue.
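As a sanity check on Eqs. (1)-(3), the following minimal Python sketch substitutes the closed-form N-Queue (P) solution listed in Table 1 back into the master equations and compares the truncated sum for N with the tabulated mean. The function names, the truncation length `n_max` and the parameter values are illustrative assumptions, not part of the original paper.

```python
def nqueue_p_distribution(lam, mu, n_max):
    """Return [P(0), ..., P(n_max)] for N-Queue (P) from the Table 1 closed forms."""
    if not 0.0 < lam < mu < 1.0:
        raise ValueError("need 0 < lam < mu < 1 (stationary state requires lam < lam_cr = mu)")
    rho = lam / mu
    ratio = (1.0 - mu) / (1.0 - lam) * rho           # common ratio of P(n)
    probs = [1.0 - rho]                               # P(0)
    probs += [(1.0 - rho) / (1.0 - mu) * ratio**n for n in range(1, n_max + 1)]
    return probs


def master_equation_residuals(probs, lam, mu):
    """Left side minus right side of Eqs. (1)-(3) for n = 0, ..., len(probs) - 2."""
    stay = lam * mu + (1.0 - lam) * (1.0 - mu)        # coefficient of P(n) on the right side
    res = [probs[0] - ((1.0 - lam) * probs[0] + (1.0 - lam) * mu * probs[1])]
    res.append(probs[1] - (lam * probs[0] + (1.0 - lam) * mu * probs[2] + stay * probs[1]))
    for n in range(2, len(probs) - 1):
        rhs = (lam * (1.0 - mu) * probs[n - 1]
               + (1.0 - lam) * mu * probs[n + 1]
               + stay * probs[n])
        res.append(probs[n] - rhs)
    return res


lam, mu = 0.1, 0.2                                    # illustrative values only
probs = nqueue_p_distribution(lam, mu, n_max=200)
worst = max(abs(x) for x in master_equation_residuals(probs, lam, mu))
mean_number = sum(n * p for n, p in enumerate(probs))
print(worst, sum(probs), mean_number)
```

The residuals should vanish up to rounding, and the truncated mean should agree with the Table 1 expression (1 − λ)ρ/(1 − ρ) as long as `n_max` is large enough for the geometric tail to be negligible.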

2.3 Exact solution for E-Queue

In E-Queue, the state is not determined by the pedestrian number n alone. Fortunately, due to the deterministic movement of pedestrians in the queue (i.e., pedestrians move one cell in one time step if the cell in front of them is vacant), two consecutive vacant cells never appear in the stationary state. As a result, there are $2^n$ states when there are n pedestrians in the queue, since we only need to consider whether or not there is a vacant cell in front of each pedestrian. Schematic views of the stationary states in the case n = 2 are depicted in Fig. 2.

Our target in this paper is to obtain the probability distributions of the pedestrian number and the pedestrian waiting time, so the $2^n$ states do not need to be distinguished completely. The important point is whether the service window is occupied or not. Thus, the $2^n$ states are divided into two groups, A and B. The service window is occupied in group A and vacant in group B. For instance, in the case n = 2, two states belong to group A and the other two states belong to group B, as shown in Fig. 2.

We denote the sum of the probabilities of the stationary states in group A by $P_A(n)$ and that in group B by $P_B(n)$ when there are n pedestrians in the queue. Thus, $P(n) = P_A(n) + P_B(n)$. Note that we have $P_A(0) = 0$ and $P_B(0) = P(0)$. The state transition diagram of E-Queue is depicted in Fig. 3 and the master equations in the stationary state are described as follows:

\[ P_A(1) = (1-\lambda)(1-\mu)P_A(1) + \lambda P_B(0) + (1-\lambda)P_B(1), \tag{4} \]

\[ P_A(n) = \lambda(1-\mu)P_A(n-1) + (1-\lambda)(1-\mu)P_A(n) + \lambda P_B(n-1) + (1-\lambda)P_B(n) \quad (n \ge 2), \tag{5} \]

\[ P_B(0) = (1-\lambda)P_B(0) + (1-\lambda)\mu P_A(1), \tag{6} \]

\[ P_B(n) = \lambda\mu P_A(n) + (1-\lambda)\mu P_A(n+1) \quad (n \ge 1). \tag{7} \]

These equations could also be obtained by reducing the master equations where all $2^n$ states are distinguished [9]. Solving the equations with the normalization condition $\sum_{n=0}^{\infty} P(n) = 1$, we obtain the solutions in Table 1 in the case $\lambda < \lambda_{cr}$. The probability distribution of the waiting time $P_W(t)$ is also calculated as

\[ P_W(t) = f(t,1)P(0) + \sum_{n=1}^{Q(t,2)} \left[ f(t-n,n)\,\mu P_A(n) \right] + \sum_{n=1}^{Q(t-1,2)} f(t-n,n+1)\left[ (1-\mu)P_A(n) + P_B(n) \right], \tag{8} \]

where $Q(a,b)$ returns the quotient of $a/b$, and

\[ f(t,n) = \binom{t-1}{n-1} \mu^{n} (1-\mu)^{t-n} \tag{9} \]

is the negative binomial distribution. The means, i.e., N and W, are also obtained and described in Table 1.
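The E-Queue solution can be checked in the same spirit. The sketch below, written under the assumption that the expressions for $P_A(n)$, $P_B(n)$ and $r$ are read correctly from Table 1, substitutes them into Eqs. (4)-(7) and compares the truncated mean with the tabulated N. Function names, `n_max` and the parameter values are hypothetical choices for illustration.

```python
def equeue_distribution(lam, mu, n_max):
    """Return (PA, PB) lists for n = 0..n_max using the Table 1 expressions for E-Queue."""
    if not (0.0 < mu <= 1.0 and 0.0 < lam < mu / (1.0 + mu)):
        raise ValueError("need 0 < lam < lam_cr = mu / (1 + mu)")
    rho = lam / mu
    p0 = 1.0 - rho / (1.0 - lam)                              # P(0)
    r = (1.0 - mu + lam * mu) / (1.0 - lam) ** 2 * rho        # common ratio r
    pa = [0.0] + [r ** (n - 1) * lam / ((1.0 - lam) * mu) * p0
                  for n in range(1, n_max + 1)]
    pb = [p0] + [r ** (n - 1) * lam ** 2 / ((1.0 - lam) ** 2 * mu) * p0
                 for n in range(1, n_max + 1)]
    return pa, pb


def equeue_residuals(pa, pb, lam, mu):
    """Left side minus right side of Eqs. (4)-(7), up to n = len(pa) - 2."""
    res = [pa[1] - ((1 - lam) * (1 - mu) * pa[1] + lam * pb[0] + (1 - lam) * pb[1])]   # Eq. (4)
    res.append(pb[0] - ((1 - lam) * pb[0] + (1 - lam) * mu * pa[1]))                   # Eq. (6)
    for n in range(1, len(pa) - 1):
        if n >= 2:                                                                      # Eq. (5)
            rhs = (lam * (1 - mu) * pa[n - 1] + (1 - lam) * (1 - mu) * pa[n]
                   + lam * pb[n - 1] + (1 - lam) * pb[n])
            res.append(pa[n] - rhs)
        res.append(pb[n] - (lam * mu * pa[n] + (1 - lam) * mu * pa[n + 1]))            # Eq. (7)
    return res


lam, mu = 0.1, 0.2                                            # illustrative values only
pa, pb = equeue_distribution(lam, mu, n_max=200)
worst = max(abs(x) for x in equeue_residuals(pa, pb, lam, mu))
total = sum(pa) + sum(pb)
mean_number = sum(n * (a + b) for n, (a, b) in enumerate(zip(pa, pb)))
print(worst, total, mean_number)
```

For these values the mean comes out at ρ/(1 − ρ/(1 − λ)) = 1.125, compared with 0.9 for N-Queue (P) at the same λ and µ, which illustrates the extra crowding caused by the closing-up step.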

3. N-Queue v.s. E-Queue

In this section, we compare physical quantities of the three queueing models. Note that λ and µ are probabilities in N-Queue (P) and E-Queue, while they are rates in N-Queue (C). Besides, t is discrete in the former models, whereas it is continuous in the latter one. However, if we regard one time step in the former two models as unit time in N-Queue (C), we can compare the physical quantities of the models on the same time scale. The physical quantities are described by three parameters, λ, µ, and ρ, as in Table 1. However, since ρ = λ/µ, there are only two independent variables; thus, we use ρ and µ as the independent ones in the following. Then, λ becomes a function of ρ and µ described as λ(ρ, µ) = ρµ.

Table 1. Mathematical formulations of physical quantities of three queueing models. The parameter ρ = λ/µ represents the ratio between the mean service time (1/µ) and the mean arrival time (1/λ). The expressions of $P(n)$ and $P_W(t)$ in the table are valid for N-Queue (C) when n ≥ 0 and t ≥ 0, respectively, whereas they are valid in the two parallel-update queues only when n ≥ 1 and t ≥ 1. $P_W(0) = 0$ for N-Queue (P) and E-Queue. Note that Little’s theorem N = λW is satisfied in all three models.

Type | N-Queue (C) | N-Queue (P) | E-Queue
$\lambda_{cr}$ | $\mu$ | $\mu$ | $\dfrac{\mu}{1+\mu}$
$P(0)$ | $1-\rho$ | $1-\rho$ | $1-\dfrac{\rho}{1-\lambda}$
$P(n)$ | $(1-\rho)\rho^{n}$ | $\dfrac{1-\rho}{1-\mu}\left(\dfrac{1-\mu}{1-\lambda}\rho\right)^{n}$ | $\left(1-\dfrac{\rho}{1-\lambda}\right)\dfrac{r^{n}}{1-\mu+\lambda\mu}$
$N$ | $\dfrac{\rho}{1-\rho}$ | $(1-\lambda)\dfrac{\rho}{1-\rho}$ | $\dfrac{\rho}{1-\dfrac{\rho}{1-\lambda}}$
$P_W(t)$ | $\mu(1-\rho)\exp(-\mu(1-\rho)t)$ | $\dfrac{\mu(1-\rho)}{1-\lambda}\left(\dfrac{1-\mu}{1-\lambda}\right)^{t-1}$ | $\left(\mu-\dfrac{\lambda}{1-\lambda}\right)\left(\dfrac{1}{1-\lambda}-\mu\right)^{t-1}$
$W$ | $\dfrac{1}{\lambda}\,\dfrac{\rho}{1-\rho}$ | $\dfrac{1-\lambda}{\lambda}\,\dfrac{\rho}{1-\rho}$ | $\dfrac{\rho}{\lambda\left(1-\dfrac{\rho}{1-\lambda}\right)}$

Other expressions for E-Queue:
$P_A(n) = r^{n-1}\dfrac{\lambda}{(1-\lambda)\mu}P(0)$, $\quad P_B(n) = r^{n-1}\dfrac{\lambda^{2}}{(1-\lambda)^{2}\mu}P(0)$, $\quad r = \dfrac{1-\mu+\lambda\mu}{(1-\lambda)^{2}}\rho$.

3.1 Critical value λcr

In N-Queues, the critical value is $\lambda_{cr} = \mu$. In E-Queue, $\lambda_{cr} = \mu/(1+\mu)$. Since µ/(1 + µ) ≤ µ, the region where the stationary state exists in ρ-µ space is smaller in E-Queue than in N-Queues. Due to the time needed to close up vacant cells, which equals one time step, the length of E-Queue diverges more easily than that of N-Queues.

3.2 Mean pedestrian number in a queue

This subsection focuses on the mean pedestrian number N in a queue, which is denoted as $N_1$, $N_2$ and $N_3$ in N-Queue (C), N-Queue (P), and E-Queue, respectively. As can be seen from Table 1 and Fig. 4 (a),

(i) At given µ, $N_1$, $N_2$ and $N_3$ increase monotonically with the increase of ρ, which is defined as ρ = λ/µ.

(ii) At given ρ, $N_1 = N_2 = N_3$ when µ → 0 (which implies λ → 0). With the increase of µ, $N_1$ remains unchanged, $N_2$ decreases and $N_3$ increases.

In N-Queue (C), in an infinitesimal time interval ∆t, the arrival probability and the leaving probability are λ∆t and µ∆t, which tend to zero. Thus, changing λ and µ implies a rescaling of the time interval, which does not change the probability distribution of the pedestrian number. Therefore, $N_1$ is independent of µ. In N-Queue (P), $P(0)$ is independent of µ; however, since the common ratio in $P(n)$, namely $\frac{1-\mu}{1-\rho\mu}\rho$, decreases with the increase of µ, $P(n)$ becomes narrower and higher as in Fig. 4 (b). Consequently, $N_2$ decreases as µ increases. In E-Queue, with the increase of µ, $P(0)$ dramatically decreases as in Fig. 4 (b) and the common ratio r increases when ρ > 1/(2 + µ), so that $P(n)$ becomes wider and lower; thus, $N_3$ increases. We would also like to explain this phenomenon intuitively. A pedestrian takes “the time of closing up” plus “the service time”, denoted as an “extended service time”, to go through the service window. When µ increases, the extended service time does not decrease sufficiently since the time of closing up remains constant. At given ρ, the increase of µ implies the increase of λ; hence, $N_3$ increases as µ increases.

Fig. 4. (a) Mean number of pedestrians N against the ratio between mean service and arrival time ρ = λ/µ (curves: N-Queue (C), N-Queue (P) (µ = 0.2), E-Queue (µ = 0.2), E-Queue (µ = 0.4)). (b) Probability distributions P(n) of pedestrian number n in a queue at ρ = 0.7 (curves: N-Queue (C), E-Queue (µ = 0.2), E-Queue (µ = 0.4)).

Fig. 5. Mean waiting time W against the mean service time 1/µ at ρ = 0.8 (curves: N-Queue (C), N-Queue (P), E-Queue).

3.3 Mean waiting time in a queue

Fig. 5 shows the variation of W against 1/µ, which is the mean service time, in the case where ρ is constant. With the increase of 1/µ, W increases linearly in N-Queue (C) and quasi-linearly in N-Queue (P), which coincides with our intuition. W also increases quasi-linearly in E-Queue when 1/µ is large; however, when 1/µ is small, it surprisingly achieves a minimum $W_{min} = \rho/(1-\sqrt{\rho})^{2}$ at $1/\mu_{min} = \rho/(1-\sqrt{\rho})$, increases as 1/µ further decreases, and diverges at $1/\mu_{cr} = \rho/(1-\rho)$. Since Little’s theorem N = λW [1] is satisfied in all three models, W = (1/µ)(N/ρ). At a given ρ, with the increase of 1/µ, N does not change in N-Queue (C) and increases in N-Queue (P); thus W increases in both cases. In contrast, N decreases with the increase of 1/µ in E-Queue; hence, a minimum can be reached.
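The location of this minimum can be confirmed numerically. The following Python sketch evaluates the three W expressions of Table 1 as functions of ρ and µ and scans 1/µ on a grid for ρ = 0.8; the function names, the grid and the scan range are assumptions made for illustration only.

```python
import math


def w_nqueue_c(rho, mu):
    """Mean waiting time of N-Queue (C): (1/lam) * rho / (1 - rho) with lam = rho * mu."""
    return 1.0 / (mu * (1.0 - rho))


def w_nqueue_p(rho, mu):
    """Mean waiting time of N-Queue (P): ((1 - lam)/lam) * rho / (1 - rho)."""
    lam = rho * mu
    return (1.0 - lam) / lam * rho / (1.0 - rho)


def w_equeue(rho, mu):
    """Mean waiting time of E-Queue: rho / (lam * (1 - rho/(1 - lam))); inf if not stationary."""
    lam = rho * mu
    denom = 1.0 - rho / (1.0 - lam)
    if denom <= 0.0:
        return math.inf
    return rho / (lam * denom)


def scan_minimum(rho, x_lo, x_hi, steps=20000):
    """Grid search for the service time x = 1/mu that minimises the E-Queue waiting time."""
    best_x, best_w = None, math.inf
    for k in range(1, steps + 1):
        x = x_lo + (x_hi - x_lo) * k / steps
        w = w_equeue(rho, 1.0 / x)
        if w < best_w:
            best_x, best_w = x, w
    return best_x, best_w


rho = 0.8
x_star = rho / (1.0 - math.sqrt(rho))            # predicted 1/mu_min
w_star = rho / (1.0 - math.sqrt(rho)) ** 2       # predicted W_min
best_x, best_w = scan_minimum(rho, rho / (1.0 - rho), 30.0)
print(x_star, w_star, best_x, best_w)
```

The scan starts just above the critical service time ρ/(1 − ρ), where the E-Queue waiting time diverges, and its minimiser should agree with the closed-form location up to the grid resolution.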

As we have seen in Fig. 5, E-Queue becomes similar to N-Queues when the service time is large and differs from them when the service time is small. Thus, from the perspective of application, it is useful to know quantitatively when we should consider the excluded volume effect. In Fig. 6, the ρ-µ plane is divided by the curve R = 1.1, where R is the ratio between W in E-Queue and in N-Queue (C), described as $R(\rho,\mu) = W_{\text{E-Queue}}/W_{\text{N-Queue (C)}}$. In the lower-left region of Fig. 6, R < 1.1 and the difference in W is not critically large, so that N-Queue (C) may be used for simple calculation when both ρ and µ are small. In contrast, R > 1.1 and the difference is crucial in the upper-right region; therefore, the excluded volume effect should be considered when both ρ and µ are large. Note that Fig. 6 is one example of the dividing curve, and other curves can be drawn by setting R to a different value. Thus, this diagram is helpful for knowing the error quantitatively and for deciding whether to use N-Queue or E-Queue when designing a queueing system for pedestrians.
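The dividing curve can be reproduced from the W expressions in Table 1. In the Python sketch below, `waiting_time_ratio` evaluates R(ρ, µ) directly, while `mu_on_curve` uses a closed form obtained here by solving R = level for µ (a short algebraic step that is not stated in the paper); both function names and the sample values are illustrative assumptions.

```python
def waiting_time_ratio(rho, mu):
    """R(rho, mu) = W_E-Queue / W_N-Queue (C), using the Table 1 expressions."""
    lam = rho * mu
    if not (0.0 < rho < 1.0 and 0.0 < mu <= 1.0 and lam < mu / (1.0 + mu)):
        raise ValueError("parameters outside the stationary region of E-Queue")
    w_e = rho / (lam * (1.0 - rho / (1.0 - lam)))
    w_c = 1.0 / (mu * (1.0 - rho))
    return w_e / w_c


def mu_on_curve(rho, level=1.1):
    """Value of mu with R(rho, mu) = level, or None if the curve leaves mu in (0, 1].

    Derived here from R = (1 - rho)(1 - rho*mu) / (1 - rho - rho*mu).
    """
    mu = (level - 1.0) * (1.0 - rho) / (rho * (level - 1.0 + rho))
    return mu if 0.0 < mu <= 1.0 else None


for rho in (0.3, 0.5, 0.7, 0.9):
    print(rho, mu_on_curve(rho))
```

Below the returned µ the ratio stays under the chosen level, so N-Queue (C) is an acceptable approximation there; for small ρ the function returns None because the curve lies above µ = 1.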

4. Conclusion

In this paper, we have introduced the excluded volume effect into the queueing model and obtained the probability distributions and the means of the number of pedestrians and the waiting time exactly. These physical quantities of the new model are compared with those of previous models, which do not include the excluded volume effect. When the service time is large enough, all models become similar; however, when the service time becomes small, the time of closing up the queue dominates and the waiting time surprisingly increases in the queueing model with the excluded volume effect. We also obtain a diagram for examining when the excluded volume effect becomes prominent.

Fig. 6. Curve of R = 1.1 on the ρ-µ plane (horizontal axis: ρ = λ/µ; vertical axis: µ; R < 1.1 in the lower-left region and R > 1.1 in the upper-right region).

To construct a more realistic model, the movement in the queue should be stochastic, since pedestrians do not close up the queue homogeneously. Furthermore, the length of a queue should also be calculated exactly, and the validity of the model should be verified by real experiments or observations in the near future.

Acknowledgments

We would like to mention that this work is financially supported by JSPS and JST.

References

[1] G. Bolch, S. Greiner, H. de Meer and K. S. Trivedi, Queueing Networks and Markov Chains, Wiley-Interscience, U.S.A., 1998.

[2] D. Helbing, R. Jiang and M. Treiber, Analytical investigation of oscillations in intersecting flows of pedestrian and vehicle traffic, Phys. Rev. E, 72 (2005), 046130.

[3] D. Helbing, M. Treiber and A. Kesting, Understanding interarrival and interdeparture time statistics from interactions in queueing systems, Physica A, 363 (2006), 62–72.

[4] S. Kasahara, Internet traffic modeling -towards queueing theory for the internet design- (in Japanese), IEICE Tech. Rep., NS2001-217 (2002), 25–30.

[5] R. A. Blythe and M. R. Evans, Nonequilibrium steady states of matrix product form: a solver’s guide, J. Phys. A: Math. Theor., 40 (2007), R333.

[6] D. Chowdhury, L. Santen and A. Schadschneider, Statistical physics of vehicular traffic and some related systems, Phys. Rep., 329 (2000), 199.

[7] D. Helbing, Traffic and related self-driven many-particle systems, Rev. Mod. Phys., 73 (2001), 1067–1141.

[8] C. Rogsch, A. Schadschneider, A. Seyfried and W. Klingsch, How to select the “Right One” - update schemes for pedestrian movement in simulation and reality, in: Proc. of Traffic and Granular Flow ’09, to be published.

[9] D. Yanagisawa et al., in preparation.



Modeling of contagious downgrades

and its application to multi-downgrade protection

Hidetoshi Nakagawa1

1 Graduate School of International Corporate Strategy, Hitotsubashi University, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8439, Japan

E-mail hnakagawa ics.hit-u.ac.jp

Received March 17, 2010, Accepted April 10, 2010

Abstract

In this paper, we use a multivariate affine jump process to model the downgrade intensities for several categories of business sector in credit portfolios. Since the multivariate affine jump structure enables us to consider self-exciting effects as well as mutually exciting effects, the model can explain the downgrade clusters observed in the Japanese market. Also, we propose a new credit derivative named multi-downgrade protection (MDP) as an application of our model and discuss its fair pricing.

Keywords downgrade risk, mutually exciting intensity model, downgrade protection


1. Introduction

In this paper, we present a new model of downgrade risk, which has received less attention than default risk in the credit risk literature, by using the top-down approach framework introduced by [1]. More specifically, we apply a multivariate affine jump process (see [2]) or a generalized mutually exciting Hawkes model (see [3]) to specify the downgrade intensities for several categories of business sector in credit portfolios.

Fig. 1 shows the trajectory of monthly numbers of category-by-category downgrades announced by Rating and Investment Information, Inc. (R&I) from April 1998 to September 2009. At first glance, we can see that there are more downgrades or downgrade clusters from May 1998 to August 1999, in the first half of 2002, and after the second half of 2008 than in other months.

One interpretation of these downgrade clusters is that downgrade risk is likely to be contagious, in the sense that one downgrade in one category may influence not only the downgrade intensity of the same category but also those of the other categories. Therefore, it seems natural to select a multivariate affine jump process to model the downgrade intensities, since the multivariate affine jump structure enables us to consider self-exciting effects as well as mutually exciting effects. Dynamic rating transition is usually modeled by a rating transition intensity matrix, but it seems difficult to use the rating transition matrix to consider dynamic contagion of downgrade risk. Though non-Markov frameworks are often used for modeling dependence such as contagion, some strong assumptions are necessary to obtain the rating transition probability matrix via the rating transition intensity matrix, as is pointed out in Chap. 6 of [4] and in Chap. 8 of [5].

As an application of our downgrade risk model, we propose a new credit derivative named multi-downgrade protection (MDP) that can be an efficient risk hedging tool for large corporate bond portfolios. We also discuss the pricing of MDP under some simple assumptions.

Fig. 1. Trajectory of monthly numbers of category-by-category downgrades (Financial, Group A, Group B) announced by R&I from April 1998 to September 2009 (horizontal axis: year; vertical axis: number per month). Group A consists of the industry sectors of Communications, Consumer-Cyclical, Industrial and Technology, which seem more influenced by business fluctuation, while Group B consists of Basic Materials, Consumer-Non-cyclical, Energy and Utilities, which seem less influenced by business fluctuation. In all, 1,011 downgrades are observed: 263 in the Fin. category, 562 in Gr.A and 186 in Gr.B.

Simply put, MDP is a contract under which the protection seller pays the buyer an amount according to a pre-agreed rule, repeatedly, whenever a particular type of downgrade (for example, a downgrade from the investment grade to the speculative grade in the Gr.A category) happens in the underlying portfolio during the predetermined period. From a practical viewpoint, MDP seems more useful for managing the downgrade risk of a portfolio that consists of a wide variety and a large number of corporate bonds, because in total risk management it seems more important to consider how many downgrades will happen than which bond will be downgraded.

JSIAM Letters Vol. 2 (2010) pp.65–68 Hidetoshi Nakagawa

Indeed, we finally reach the conclusion that the (conditional) expectation of the future downgrade count is essential for evaluating MDP under some assumptions. Also, thanks to the general theory of multivariate affine jump-diffusion processes studied in [6] and [2], we can easily compute the expectation of the future downgrade count based on our downgrade intensities. We show some numerical illustrations of computing the expected future downgrade count that is related to the MDP pricing.

2. Modeling of downgrade intensities

We model contagious downgrade risk with a multivariate affine jump process, which is a slight generalization of the self-exciting intensity studied in [2].

Let $(\Omega, \mathcal{F}, P)$ be a complete probability space and $(\mathcal{F}_t)$ be a filtration to which all processes appearing in this paper are adapted.

For some $m \in \mathbb{N}$, let $0\,(= \tau^i_0) < \tau^i_1 < \tau^i_2 < \cdots$ $(i = 1, \cdots, m)$ be $(\mathcal{F}_t)$-adapted point processes, that is, increasing sequences of $(\mathcal{F}_t)$-stopping times. $\tau^i_k$ is regarded as the time when the k-th event of type i happens. Also, $N^1_t, \cdots, N^m_t$ are counting processes associated with the point processes $\{\tau^1_k\}_{k\in\mathbb{N}}, \cdots, \{\tau^m_k\}_{k\in\mathbb{N}}$, respectively. Suppose that $[N^i, N^j]_t = 0$ almost surely for any i, j $(i \neq j)$.

Next, $L^1_t, \cdots, L^m_t$ are $(\mathcal{F}_t)$-adapted pure jump processes whose jump times coincide with those of $N^1_t, \cdots, N^m_t$. More specifically, for each i, $L^i_t$ can be characterized by independently and identically distributed random variables $\eta^i_1, \eta^i_2, \cdots$, that is,

\[ L^i_t := \sum_{k=1}^{N^i_t} \eta^i_k. \]

Here we suppose that for any $k \in \mathbb{N}$ and $i = 1, \cdots, m$, $\eta^i_k$ is $\mathcal{F}_{\tau^i_k}$-measurable.

Then we need to specify the intensity process $X^i_t$ associated with $N^i_t$ or equivalently $L^i_t$, namely, an $(\mathcal{F}_t)$-progressively measurable non-negative process such that the process $M^i_t$ defined by

\[ M^i_t := N^i_t - \int_0^t X^i_s\,ds \]

is an $(\mathcal{F}_t)$-martingale.

In this study, we aim to model the intensities so that for any i, $X^i_t$ can be influenced not only by occurrences of the type i event itself (namely the “self-exciting” effect), but also by the events other than type i (namely the “mutually exciting” effect).

Now, we specify the mutually exciting downgrade intensity model as follows. Let m = 3. Hereafter, we regard the super-indices 1, 2 and 3 as the downgrade of the Fin., Gr.A and Gr.B categories, respectively. We view $L^i_t$ as the process of the cumulative number of type i events up to time t. This means that the jump size $\eta^i_k$ is equal to the number of type i events which coincidentally happen at time $\tau^i_k$.

We also assume that $\vec{X}_t = {}^t(X^1_t, X^2_t, X^3_t)$ satisfies the following affine-jump type equation.

\[ \begin{pmatrix} dX^1_t \\ dX^2_t \\ dX^3_t \end{pmatrix} = \begin{pmatrix} \kappa_1(c_1 - X^1_t) \\ \kappa_2(c_2 - X^2_t) \\ \kappa_3(c_3 - X^3_t) \end{pmatrix} dt + \sum_{i=1}^{3} \begin{pmatrix} \xi_{1,i} \\ \xi_{2,i} \\ \xi_{3,i} \end{pmatrix} dL^i_t, \tag{1} \]

where $\kappa_j$, $c_j$, $\{\xi_{j,i}\}_{i=1,2,3}$ $(j = 1, 2, 3)$ and the initial value $X^j_0$ are all non-negative constant parameters. This specification can be regarded as a slight generalization of the mutually exciting Hawkes model [3].

Note that $X^j_t$ $(j = 1, 2, 3)$ can be represented as

\[ X^j_t = c_j + e^{-\kappa_j t}(X^j_0 - c_j) + \int_0^t e^{-\kappa_j (t-s)} \sum_{i=1}^{3} \xi_{j,i}\,dL^i_s. \tag{2} \]

Moreover, we remark that $X^j_t \ge \min\{c_j, X^j_0\} \ge 0$ for any $t \ge 0$ provided that only positive jumps are allowed for every $L^i_t$, that is, $P(\eta^i_k \ge 0) = 1$ for any $k \in \mathbb{N}$. This immediately follows from (2) and the assumption that all the parameters are non-negative.
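Representation (2) is straightforward to evaluate once a jump history is given, because the integral against $dL^i_s$ reduces to a sum over past jump times. The following Python sketch does this with 0-based category indices (0 = Fin., 1 = Gr.A, 2 = Gr.B); the parameter values are the estimates reported later in Table 1, whereas the function name and the jump events used in the example are hypothetical.

```python
import math

# Estimates from Table 1 of this paper (0-based: 0 = Fin., 1 = Gr.A, 2 = Gr.B).
X0 = [19.11, 42.09, 24.47]
KAPPA = [4.08, 3.26, 4.34]
C = [3.18, 3.17, 1.01]
XI = [[1.51, 0.00, 0.00],      # XI[j][i] = xi_{j+1, i+1}: impact of a type-i jump on X^j
      [1.17, 1.00, 0.82],
      [0.38, 0.44, 1.22]]


def intensity(j, t, jumps, x0=X0, kappa=KAPPA, c=C, xi=XI):
    """Evaluate X^j_t from Eq. (2); jumps is a list of (time s, type i, size eta)."""
    value = c[j] + math.exp(-kappa[j] * t) * (x0[j] - c[j])
    for s, i, eta in jumps:
        if s <= t:                                   # only jumps up to time t contribute
            value += math.exp(-kappa[j] * (t - s)) * xi[j][i] * eta
    return value


# Hypothetical history: one Fin. downgrade at t = 0.5 and two Gr.A downgrades at t = 0.8.
history = [(0.5, 0, 1), (0.8, 1, 2)]
for t in (0.4, 0.5, 0.8, 1.5):
    print(t, [round(intensity(j, t, history), 3) for j in range(3)])
```

In this example the Fin. downgrade raises the Gr.A intensity by $\xi_{2,1} = 1.17$ at the jump time, after which the excess decays at rate $\kappa_2$; the Gr.A downgrades leave the Fin. intensity unchanged because the estimate of $\xi_{1,2}$ is zero.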

3. An application: valuation of multi-downgrade protection

Here we define a new derivative named multi-downgrade protection (MDP) as an over-the-counter contract under which the protection seller has to pay the buyer some amount, according to a predetermined rule, every time a particular type of downgrade occurs in the underlying portfolio, regardless of which individual name is downgraded. While [7] and [8] discuss a single downgrade protection, no multi-downgrade case has been considered.

We give the mathematical description of MDP below.

Suppose that Q is a fixed risk-neutral probability measure. Denote by $r_t$ the instantaneous default-free interest rate and by $\Lambda(t,s) = \exp\left(-\int_t^s r_u\,du\right)$ the discount factor from time s (≥ t) to time t. Then the price $Z(t,T)$ at time t of the default-free zero-coupon bond with maturity T is specified by $E^Q[\Lambda(t,T)\,|\,\mathcal{F}_t]$.

Let $N^{\cdot}_t$ and $L^{\cdot}_t$ be, respectively, the counting process of the times and the cumulative number of a particular type of downgrade up to time t.

Denote by $C^T_t$ an $(\mathcal{F}_t)$-predictable continuous process that stands for the protection payoff at time t when one target downgrade happens before the contract maturity T.

Then the risk-neutral value $V^{\cdot,T}_t$ at time t of a protection leg of MDP with expiration T (≥ t) and payoff process $C^T_t$ is specified by the following expression.

\[ V^{\cdot,T}_t = E^Q\left[ \int_t^T \Lambda(t,s)\,C^T_s\,dL^{\cdot}_s \,\middle|\, \mathcal{F}_t \right]. \]


Using the integration-by-parts formula, we have

\[ V^{\cdot,T}_t = E^Q\left[ \Lambda(t,T)\,C^T_T\,L^{\cdot}_T \,\middle|\, \mathcal{F}_t \right] - C^T_t L^{\cdot}_t + E^Q\left[ \int_t^T L^{\cdot}_s\,\Lambda(t,s)\left( r_s C^T_s\,ds - dC^T_s \right) \,\middle|\, \mathcal{F}_t \right]. \]

For further calculation, we assume the following.

Assumption 1

( i ) The default-free interest rate $r_t$ and the particular downgrade count $L^{\cdot}_t$ are independent under Q.

(ii) Specify the protection payoff $C^T_t$ by $Z(t,T)\varphi(t,T)$, where $\varphi(t,T)$ is an $(\mathcal{F}_t)$-adapted process defined by

\[ \varphi(t,T) := \int_t^T E^Q[h_u\,|\,\mathcal{F}_t]\,du, \]

and the process $h_t$ follows under Q

\[ dh_t = \alpha(\beta - h_t)\,dt + \sigma_h\,dW^h_t, \quad h_0 > 0, \]

where α, β and $\sigma_h$ are positive constants and $W^h_t$ is a $(Q, (\mathcal{F}_t))$-standard Brownian motion that is independent of $r_t$, $L^{\cdot}_t$ and $Z(t,T)$.

We remark that $C^T_t = Z(t,T)\varphi(t,T)$ can be viewed as an approximate difference between the prices of a corporate zero-coupon bond before and after a downgrade.

It is easy to see that for s ≥ t

\[ E^Q[h_s\,|\,\mathcal{F}_t] = (h_t - \beta)e^{-\alpha(s-t)} + \beta. \tag{3} \]

Hence

\[ \varphi(t,T) = \frac{h_t - \beta}{\alpha}\left( 1 - e^{-\alpha(T-t)} \right) + \beta(T-t). \]

Finally, we obtain

\[ dC^T_t = \varphi(t,T)\,dZ(t,T) + Z(t,T)\,d\varphi(t,T) = \left( r_t C^T_t - Z(t,T)h_t \right)dt + (\text{martingale term}). \]

Thanks to Assumption 1 and the trivial consequences that $C^T_T \equiv 0$ and that the drift of $C^T_s$ is $\mu_C(s,T) = r_s C^T_s - Z(s,T)h_s$, we eventually obtain

\[ V^{\cdot,T}_t = -C^T_t L^{\cdot}_t + Z(t,T) \int_t^T E^Q[L^{\cdot}_s\,|\,\mathcal{F}_t]\,E^Q[h_s\,|\,\mathcal{F}_t]\,ds. \]

The conditional expectation $E^Q[h_s\,|\,\mathcal{F}_t]$ is given by (3), so the remaining issue is how to compute $E^Q[L^{\cdot}_s\,|\,\mathcal{F}_t]$.

4. Numerical example

In this section, we focus on the numerical computation of the expected downgrade count $E[L^2_t]$ of the Gr.A category under the physical measure P. Although the pricing measure Q must be used for MDP valuation, we calculate the expected downgrade count under the original probability P because the parameters in Table 1 are estimated from the historical downgrade records in the Japanese market. Refer to [9] for parameter estimation based on the historical data.

In addition, set $\eta_1 = 1.98$, $\eta_2 = 1.6$, $\eta_3 = 1.27$.

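Table 1. The maximum likelihood estimates of the parameters of the downgrade intensities (1).

$X^1_0$ | $\kappa_1$ | $c_1$ | $\xi_{1,1}$ | $\xi_{1,2}$ | $\xi_{1,3}$
19.11 | 4.08 | 3.18 | 1.51 | 0.00 | 0.00

$X^2_0$ | $\kappa_2$ | $c_2$ | $\xi_{2,1}$ | $\xi_{2,2}$ | $\xi_{2,3}$
42.09 | 3.26 | 3.17 | 1.17 | 1.00 | 0.82

$X^3_0$ | $\kappa_3$ | $c_3$ | $\xi_{3,1}$ | $\xi_{3,2}$ | $\xi_{3,3}$
24.47 | 4.34 | 1.01 | 0.38 | 0.44 | 1.22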

Fig. 2. $E[L^2_t]$ for different values of $\kappa_1$ ($\kappa_1$ = 2.45, 2.86, 3.26, 3.67, 4.08, 4.9; horizontal axis: t from 0 to 5; vertical axis: $E[L^2_t]$).

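A version of Corollary A.3 of [2] implies that

$$E[L^2_t] = A(0,t) + B(0,t)\cdot{}^t(X^1_0, X^2_0, X^3_0, 0, 0, 0), \tag{4}$$

where B(0, t) and A(0, t) are obtained as below.

The deterministic function B(0, t) is specified by the product of the exponential mapping of the 6 × 6 matrix

$$H = \begin{pmatrix}
\eta_1\xi_{1,1} - \kappa_1 & \eta_1\xi_{2,1} & \eta_1\xi_{3,1} & 1 & 0 & 0 \\
\eta_2\xi_{1,2} & \eta_2\xi_{2,2} - \kappa_2 & \eta_2\xi_{3,2} & 0 & 1 & 0 \\
\eta_3\xi_{1,3} & \eta_3\xi_{2,3} & \eta_3\xi_{3,3} - \kappa_3 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

(the components in the last three rows are all zero) and $\vec e_5 := {}^t(0, 0, 0, 0, 1, 0)$.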

This exponential mapping exp(tH) can be calculated numerically by using the Runge-Kutta method. On the other hand, we can represent A(0, t) as follows.

$$A(0,t) = \int_0^t (\kappa_1 c_1, \kappa_2 c_2, \kappa_3 c_3, 0, 0, 0)\cdot\bigl(\exp(uH)\,\vec e_5\bigr)\,du.$$

We compute A(0, t) by a simple numerical integration to obtain $E[L^2_t]$ via (4).

Finally, we display some numerical results of a comparative analysis with respect to several model parameters: the speeds of mean reversion κ1 and κ2 as well as the mutually exciting components ξ2,1 and ξ1,2.

First, we vary the reversion speed κ1 of the downgrade intensity for the Fin. category over 0.6, 0.7, 0.8, 0.9, 1 and 1.2 times the estimate 4.08 and show the curves of $E[L^2_t]$ for t ∈ [0, 5] (see Fig. 2). Since some mutually exciting effect from Fin. to Gr.A is recognized due to the positive estimate ξ2,1 = 1.17, we expect that the value of κ1 affects the time evolution of $E[L^2_t]$. We observe that the smaller the value of κ1 is, the larger the expectation of the target downgrade count. A small κ1 means that the downgrade intensity of Fin. remains relatively high even as time passes, so we consider that downgrades are likely to happen in Fin. and that they are contagious for Gr.A because of the positive mutually exciting effect ξ2,1.

Also, the shape of the curve of $E[L^2_t]$ turns from concave to convex as κ1 decreases. This seems to imply that the level of κ1 determines whether the downgrade intensity is asymptotically stationary or not.

Second, we vary the speed κ2 at which the downgrade intensity of Gr.A reverts to its long-term level over 0.6, 0.7, 0.8, 0.9, 1 and 1.2 times the estimate 3.26 and show the curves of $E[L^2_t]$ (see Fig. 3). As in the case of κ1, we see that the smaller the value of κ2 is, the larger $E[L^2_t]$ is, and that the shape of the curve of $E[L^2_t]$ turns from concave to convex as κ2 decreases.

Third, we change the value of the mutually exciting component ξ2,1 from 0 to 6 in steps of 1.2 (see Fig. 4). As expected, the larger ξ2,1 becomes, the larger $E[L^2_t]$ is.

Finally, we change the value of the inverse mutually exciting component ξ1,2 (whose estimate is zero) from 0 to 1 in steps of 0.2 (see Fig. 5). As also expected, the larger ξ1,2 becomes, the more sharply $E[L^2_t]$ increases. Somewhat surprisingly, the curve of $E[L^2_t]$ becomes convex even for relatively small values of ξ1,2. In any case, a larger ξ1,2 means that each downgrade in Gr.A causes a larger jump of the downgrade intensity of the Fin. category, so downgrades are more likely to occur in Fin., and they are in turn contagious for Gr.A due to the positive mutually exciting effect ξ2,1.

On the whole, the expected downgrade count is likely to be quite sensitive to the model parameters. This implies the importance of parameter estimation for the valuation of MDP.

Acknowledgments

This research was supported by Grant-in-Aid for Scientific Research (A) No. 20241038 from Japan Society for the Promotion of Science (JSPS). The author also thanks the anonymous reviewer for useful comments.

References

[1] K. Giesecke and L. R. Goldberg, A top-down approach to multi-name credit, Working paper, Stanford Univ., 2005.

[2] E. Errais, K. Giesecke and L. R. Goldberg, Pricing credit from the top down with affine point processes, Working paper, Stanford Univ., 2006.

[3] A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, Biometrika, 58 (1971), 83–90.

[4] D. Lando, Credit Risk Modeling: Theory and Applications, Princeton Univ., 2004.

[5] P. J. Schonbucher, Credit Derivatives Pricing Models: Models, Pricing and Implementation, Wiley, UK, 2003.

[6] D. Duffie, J. Pan and K. Singleton, Transform analysis and asset pricing for affine jump-diffusions, Econometrica, 68 (2000), 1343–1376.

[7] K. Aonuma, An evaluation model for downgrade protection, Jpn J. Indust. Appl. Math., 18 (2001), 627–646.

[8] T. Shimizu, Downgrade protection valuation model (in Japanese), Master thesis of Dep. of Industrial and Management Systems Engineering, Grad. School of Creative Science and Engineering, Waseda Univ., 2003.

[9] H. Nakagawa, Analyses of records of credit rating transition with mutually exciting rating-change intensity model (in Japanese), submitted.

Fig. 3. $E[L^2_t]$ for different values of κ2 (curves for κ2 = 1.96, 2.28, 2.61, 2.93, 3.26 and 3.91; horizontal axis t from 0 to 5).

Fig. 4. $E[L^2_t]$ for different values of ξ2,1 (curves for ξ2,1 = 0, 1.2, 2.4, 3.6, 4.8 and 6; horizontal axis t from 0 to 5).

Fig. 5. $E[L^2_t]$ for different values of ξ1,2 (curves for ξ1,2 = 0, 0.2, 0.4, 0.6, 0.8 and 1; horizontal axis t from 0 to 5).



Differential qd algorithm for totally nonnegative Hessenberg matrices: introduction of origin shifts and relationship with the discrete hungry Lotka-Volterra system

Yusaku Yamamoto1 and Takeshi Fukaya2

1 Department of Computer Science and Systems Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan

2 Department of Computational Science and Engineering, Nagoya University, Furo-cho,Chikusa-ku, Nagoya 464-8603, Japan

E-mail yamamoto cs.kobe-u.ac.jp

Received February 27, 2010, Accepted April 21, 2010

Abstract

We propose an approach for introducing the origin shift into the multiple dqd algorithm for computing the eigenvalues of a totally nonnegative matrix. Numerical experiments show that the shift speeds up the convergence while retaining the accuracy of the computed eigenvalue.

Keywords eigenvalue, totally nonnegative, Hessenberg matrix, qd algorithm, origin shift


1. Introduction

Let A be an m × m matrix. A is called totally nonnegative (TN) if all of its minors are nonnegative. TN matrices have many applications in areas such as combinatorics and statistics [1]. Recently, there has been growing interest in numerical algorithms for TN matrices, and several algorithms for eigenvalue computation, singular value decomposition and linear equation solving with TN coefficient matrices have been developed [1–3].

In [2], we proposed an algorithm for computing the eigenvalues of a totally nonnegative band matrix. The algorithm, called the multiple dqd (differential quotient-difference) algorithm, is a natural extension of the dqd algorithm for computing the eigenvalues of a symmetric positive-definite tridiagonal matrix [4]. It has two features: first, it exploits the fact that TN matrices can be represented as a product of positive bidiagonal factors and works directly on these bidiagonal factors. Second, it preserves the total nonnegativity throughout the iterations. These features enable us to show that the algorithm can compute all the eigenvalues of a TN matrix to high relative accuracy [2]. Unfortunately, due to the structure of the algorithm, it seemed difficult to introduce the origin shift into the multiple dqd algorithm to accelerate the convergence. For this reason, the order of convergence of this algorithm remained only linear.

In this paper, we consider the case where A is a TN Hessenberg matrix. For this type of matrix, we show that we can introduce the origin shift while retaining the above two features. Our preliminary numerical experiment shows that the resulting algorithm exhibits faster convergence and can compute the smallest eigenvalue to the same accuracy as the shiftless algorithm, although a theoretical proof of high relative accuracy has yet to be established. We also point out the close relationship between our shifted multiple dqd algorithm and the discrete hungry Lotka-Volterra system [5].

2. The multiple dqd algorithm

In this section, we define our target problem and review the unshifted multiple dqd algorithm.

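Let L and $R_i$ (i = 1, . . . , M) be m × m lower and upper bidiagonal matrices defined by

$$L = \begin{pmatrix} q_1 & & & & \\ 1 & q_2 & & & \\ & 1 & q_3 & & \\ & & \ddots & \ddots & \\ & & & 1 & q_m \end{pmatrix}
\quad\text{and}\quad
R_i = \begin{pmatrix} 1 & e_{i1} & & & \\ & 1 & e_{i2} & & \\ & & 1 & \ddots & \\ & & & \ddots & e_{i,m-1} \\ & & & & 1 \end{pmatrix}, \tag{1}$$

respectively, where $q_k$ (1 ≤ k ≤ m) and $e_{ik}$ (1 ≤ i ≤ M, 1 ≤ k ≤ m − 1) are some positive numbers. We consider the problem of computing the eigenvalues of a matrix defined as the product of these bidiagonal factors:

$$A = LR_1R_2\cdots R_M. \tag{2}$$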

A is a Hessenberg matrix with upper bandwidth M. Furthermore, A is totally nonnegative because it is a product of positive bidiagonal factors [1]. Then, from the general theory of TN matrices, we know that all of the


JSIAM Letters Vol. 2 (2010) pp.69–72 Yusaku Yamamoto et al.

eigenvalues of A are simple, real and positive.

The multiple dqd algorithm for A is a variant of the LR algorithm that performs the LR step solely on the bidiagonal factors. Let M = 3 and consider applying an LR step to A. Noting that (2) already gives the LR decomposition of A, we can compute the next iterate $\hat A$ by forming the product $R_1R_2R_3L$ and computing its LR decomposition. In the multiple dqd algorithm, we do this by repeating the LR decomposition of a product of upper and lower bidiagonal factors as follows:

$$\begin{aligned}
\hat A &= R_1R_2R_3L \\
&= R_1R_2L^{(1)}\hat R_3 \\
&= R_1L^{(2)}\hat R_2\hat R_3 \\
&= L^{(3)}\hat R_1\hat R_2\hat R_3 \equiv \hat L\hat R_1\hat R_2\hat R_3.
\end{aligned} \tag{3}$$

Here, the products such as $R_1R_2R_3L$ are not computed explicitly and the LR transformations such as $R_3L = L^{(1)}\hat R_3$ are done with the dqd algorithm [4]. This algorithm has the following two features:

(i) It works on the bidiagonal factors directly and never forms their products explicitly.

(ii) Positivity of the bidiagonal factors is preserved due to the characteristics of the dqd algorithm.

By combining these features with the mixed error analysis of the dqd algorithm [4] and the relative perturbation theory on the eigenvalues of a TN matrix [1], one can show that the multiple dqd algorithm can compute all the eigenvalues of a TN matrix to high relative accuracy [2].

3. Introduction of the origin shift

3.1 The shifted LR algorithm for a general matrix

As a preparation for introducing the origin shift into the multiple dqd algorithm, we first explain the shifted LR algorithm. Let s be some properly chosen shift and denote the m × m identity matrix by I. Then, one step of the shifted LR algorithm, which produces the next iterate $\hat A$, can be written as follows:

$$A - sI = L^{(0)}R^{(0)}, \tag{4}$$

$$\hat A = R^{(0)}L^{(0)} + sI \quad \bigl(= (L^{(0)})^{-1}AL^{(0)}\bigr). \tag{5}$$

Here, we define the LU decomposition in (4) so that the diagonals of $R^{(0)}$ are 1's, in accordance with (1).

Now, assume that A is given in the factored form A = LR and that we want $\hat A$ also in the factored form $\hat A = \hat L\hat R$. Then, the above formulae can be rewritten as

$$LR - sI = L^{(0)}R^{(0)}, \tag{6}$$

$$\hat L\hat R = (L^{(0)})^{-1}LRL^{(0)}. \tag{7}$$

This shows that the computation of $\hat L$ and $\hat R$ from L and R can be done in the following two steps.

• Obtain the lower triangular factor $L^{(0)}$ from the LR decomposition of LR − sI.

• Apply the similarity transformation by $L^{(0)}$ to LR.

Based on this observation, we introduce the origin shift into the multiple dqd algorithm in the next subsection.

3.2 Introduction of the origin shift into the multiple dqd algorithm

Assume that A is given as in (2). In considering the shifted version of the multiple dqd algorithm, it would be natural to require that the next iterate $\hat A$ is also given as a product of bidiagonal factors:

$$\hat A = \hat L\hat R_1\hat R_2\cdots\hat R_M, \tag{8}$$

where $\hat L$ is a lower bidiagonal matrix whose lower subdiagonal elements are all 1's and $\hat R_i$ is a unit upper bidiagonal matrix. We denote the diagonal elements of $\hat L$ by $\hat q_k$ and the upper subdiagonal elements of $\hat R_i$ by $\hat e_{ik}$.

We want to design the shifted algorithm in such a way that the conditions (i) and (ii) explained in Section 2 are satisfied. This is because then we will be able to perform relative error analysis of the algorithm as in the unshifted case. In the present subsection, we concentrate on constructing the shifted algorithm so that the condition (i) is satisfied. Positivity of the variables will be discussed in the next subsection. Also, in this subsection, we assume that breakdown of the algorithm, such as division by zero, does not occur. A sufficient condition for this will be given in the next subsection.

Inserting (2) and (8) into (6) and (7), we have

$$LR_1\cdots R_M - sI = L^{(0)}R^{(0)}, \tag{9}$$

$$\hat L\hat R_1\cdots\hat R_M = (L^{(0)})^{-1}LR_1\cdots R_ML^{(0)}. \tag{10}$$

In (9), $L^{(0)}$ is a lower bidiagonal matrix whose subdiagonal elements are all 1's. Now, suppose that we know $L^{(0)}$. Then we can compute $\hat L, \hat R_1, \dots, \hat R_M$ by applying the LR transformations repeatedly to the right-hand side of (10) as follows:

$$R_ML^{(0)} = L^{(1)}\hat R_M, \tag{11}$$

$$R_{M-1}L^{(1)} = L^{(2)}\hat R_{M-1}, \tag{12}$$

$$\vdots$$

$$R_1L^{(M-1)} = L^{(M)}\hat R_1, \tag{13}$$

$$LL^{(M)} = L^{(0)}\hat L. \tag{14}$$

Here, the last equation is obtained by rewriting the equation $\hat L = (L^{(0)})^{-1}LL^{(M)}$. Eqs. (11) through (14) show that once we know $L^{(0)}$, all the subsequent computations can be done by working only on the bidiagonal factors. The remaining problem is how to compute $L^{(0)}$. From (9), it seems that one needs to form $LR_1\cdots R_M - sI$ explicitly and compute its LR decomposition. But this approach would violate the condition (i).

Fortunately, we can compute $L^{(0)}$ simultaneously with $\hat L, \hat R_1, \dots, \hat R_M$ by rearranging the computations. To see this, first compare the (1, 1) elements of both sides of (9) to obtain

$$q^{(0)}_1 = q_1 - s. \tag{15}$$

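Next, by writing down (11) through (13) element by element, we have the following set of equalities:

$$\hat e_{M,k-1} = e_{M,k-1}\frac{q^{(0)}_k}{q^{(1)}_{k-1}} \quad (2 \le k \le m), \tag{16}$$

$$q^{(1)}_k = e_{M,k} + q^{(0)}_k - \hat e_{M,k-1} \quad (1 \le k \le m), \tag{17}$$

$$\hat e_{M-1,k-1} = e_{M-1,k-1}\frac{q^{(1)}_k}{q^{(2)}_{k-1}} \quad (2 \le k \le m), \tag{18}$$

$$q^{(2)}_k = e_{M-1,k} + q^{(1)}_k - \hat e_{M-1,k-1} \quad (1 \le k \le m), \tag{19}$$

$$\vdots$$

$$\hat e_{1,k-1} = e_{1,k-1}\frac{q^{(M-1)}_k}{q^{(M)}_{k-1}} \quad (2 \le k \le m), \tag{20}$$

$$q^{(M)}_k = e_{1,k} + q^{(M-1)}_k - \hat e_{1,k-1} \quad (1 \le k \le m), \tag{21}$$

where we adopted the convention that

$$\hat e_{M,0} = \hat e_{M-1,0} = \cdots = \hat e_{1,0} = 0, \tag{22}$$

$$e_{M,m} = e_{M-1,m} = \cdots = e_{1,m} = 0. \tag{23}$$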

Finally, we write down (14) element-wise to obtain

$$\hat q_k = q_k\frac{q^{(M)}_k}{q^{(0)}_k} \quad (1 \le k \le m), \tag{24}$$

$$q^{(0)}_{k+1} = q_{k+1} + q^{(M)}_k - \hat q_k \quad (1 \le k \le m-1). \tag{25}$$

Eqs. (15) through (25) determine $L^{(0)}, \hat L, \hat R_1, \dots, \hat R_M$ completely. Indeed, starting from $q^{(0)}_1$ given by (15) and using (17), (19), (21), (24) and (25), we can compute $q^{(1)}_1, q^{(2)}_1, \dots, q^{(M)}_1, \hat q_1$ and $q^{(0)}_2$ in this order (note that we need not compute $\hat e_{i,k-1}$ when k = 1). Once $q^{(0)}_2$ has been computed, we can proceed to the second round; we compute $\hat e_{M,1}, q^{(1)}_2, \hat e_{M-1,1}, q^{(2)}_2, \dots, \hat e_{1,1}, q^{(M)}_2, \hat q_2$ and $q^{(0)}_3$ in this order using (16) through (25). By continuing this process, all the elements of $\hat L, \hat R_1, \dots, \hat R_M$, along with the elements of $L^{(0)}$, can be computed. We call this process the shifted multiple dqd algorithm. Obviously, this algorithm obviates the need to form the product $LR_1R_2\cdots R_M$ explicitly and works only on the bidiagonal factors. Thus the condition (i) is satisfied.
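The round-by-round ordering described above can be sketched in a few lines of Python. This is only an illustrative transcription of the recurrences (15) through (25), not the authors' implementation: the function names, the 0-based list layout (`q[k]` for $q_{k+1}$, `e[i][k]` for $e_{i+1,k+1}$) and the dense helper `assemble`, which forms the product explicitly purely for checking, are assumptions made for this example.

```python
def shifted_mdqd_step(q, e, s):
    """One step of the shifted recurrences (15)-(25).

    q: list of m diagonal entries of L; e: list of M lists, e[i] holding the
    m-1 superdiagonal entries of R_{i+1}; s: origin shift.
    Returns the updated factors (q_hat, e_hat) in the same layout.
    """
    m, M = len(q), len(e)
    q_hat = [0.0] * m
    e_hat = [[0.0] * (m - 1) for _ in range(M)]
    prev = None                      # q^(0..M) of the previous index k-1
    q0 = q[0] - s                    # Eq. (15)
    for k in range(m):
        cur = [q0] + [0.0] * M       # cur[j] holds q^(j) at index k
        for j in range(1, M + 1):
            i = M - j                # 0-based index of R_{M-j+1}
            if k >= 1:               # Eqs. (16), (18), (20)
                e_hat[i][k - 1] = e[i][k - 1] * cur[j - 1] / prev[j]
                sub = e_hat[i][k - 1]
            else:
                sub = 0.0            # convention (22)
            e_ik = e[i][k] if k < m - 1 else 0.0   # convention (23)
            cur[j] = e_ik + cur[j - 1] - sub       # Eqs. (17), (19), (21)
        q_hat[k] = q[k] * cur[M] / cur[0]          # Eq. (24)
        if k < m - 1:
            q0 = q[k + 1] + cur[M] - q_hat[k]      # Eq. (25)
        prev = cur
    return q_hat, e_hat


def assemble(q, e):
    """Dense A = L R_1 ... R_M, formed explicitly only to check the step."""
    m = len(q)
    A = [[0.0] * m for _ in range(m)]
    for k in range(m):
        A[k][k] = q[k]
        if k > 0:
            A[k][k - 1] = 1.0
    for e_i in e:
        R = [[1.0 if r == c else 0.0 for c in range(m)] for r in range(m)]
        for k in range(m - 1):
            R[k][k + 1] = e_i[k]
        A = [[sum(A[r][t] * R[t][c] for t in range(m)) for c in range(m)]
             for r in range(m)]
    return A
```

Since the step is a similarity transformation, the trace and determinant of the assembled matrix should be unchanged up to rounding, which gives a simple way to test the sketch on small inputs.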

3.3 The condition for positivity of the variables

In the previous subsection, we derived the equations (15) through (25) of the shifted multiple dqd algorithm without asking whether the variables are positive or not. We also assumed that breakdown of the algorithm does not occur. In this subsection, we show that all the variables remain positive and breakdown does not occur if the shift satisfies a certain condition. More specifically, we have the following theorem.

Theorem 1 Let A be a TN matrix given by (2) and denote its smallest eigenvalue by $\lambda_m$. If $s < \lambda_m$, then breakdown does not occur in one step of the shifted multiple dqd algorithm. Moreover, all the elements of $L^{(i)}$ (0 ≤ i ≤ M), $\hat R_i$ (1 ≤ i ≤ M) and $\hat L$ are positive.

Proof We first show the positivity of $L^{(0)}$. Let $C \equiv A - sI$ and denote the k × k leading principal submatrices of A and C by $A^{1:k}_{1:k}$ and $C^{1:k}_{1:k}$, respectively. Since A is a TN matrix, its eigenvalues are all real and positive. Moreover, from the interlacing inequality concerning the eigenvalues of a leading principal submatrix of a TN matrix [6], the smallest eigenvalue of $A^{1:k}_{1:k}$ is larger than or equal to the smallest eigenvalue of A. Hence all the eigenvalues of $C^{1:k}_{1:k} = A^{1:k}_{1:k} - sI_k$ are positive and accordingly $\bigl|C^{1:k}_{1:k}\bigr| > 0$ (1 ≤ k ≤ m). In this case, the left-hand side of (9) is LR-decomposable and the diagonal elements of the lower triangular factor are given as

$$q^{(0)}_k = \frac{\bigl|C^{1:k}_{1:k}\bigr|}{\bigl|C^{1:k-1}_{1:k-1}\bigr|} > 0, \tag{26}$$

where $\bigl|C^{1:0}_{1:0}\bigr| \equiv 1$. This shows the positivity of $L^{(0)}$.

Next, we show the positivity of $L^{(i)}$ and $\hat R_i$ (1 ≤ i ≤ M). First, observe that $L^{(1)}$ and $\hat R_M$ are computed from $L^{(0)}$ and $R_M$ by the LR transformation (11). It is well known that the LR transformation can be done with the differential qd algorithm, which uses only addition, multiplication and division. Hence if the input bidiagonal factors are positive, there is no breakdown and the output bidiagonal factors are also positive. Since both $L^{(0)}$ and $R_M$ are positive in (11), we know that both $L^{(1)}$ and $\hat R_M$ are positive. By repeating this reasoning, we know that $L^{(2)}, \dots, L^{(M)}$ and $\hat R_{M-1}, \dots, \hat R_1$ are positive.

Finally, since L, $L^{(0)}$ and $L^{(M)}$ are positive, we know from (24) that $\hat L$ is also positive. Also, the positivity of $L^{(0)}, L^{(1)}, \dots, L^{(M)}$ guarantees the absence of breakdown, because the denominators on the right-hand sides of (16), (18), (20) and (24) are positive.

(QED)

From Theorem 1, we know that the shifted multiple dqd algorithm also satisfies the condition (ii) if $s < \lambda_m$.

4. Relationship with the discrete hungry Lotka-Volterra system

The shifted multiple dqd algorithm given in the previous section has a close relationship with the discrete hungry Lotka-Volterra system. To see this, we rename the variables in (15) through (25) as follows:

$$q_k \to U^{(n)}_{(k-1)(M+1)+1}, \tag{27}$$

$$e_{M-i+1,k} \to U^{(n)}_{(k-1)(M+1)+i+1}, \tag{28}$$

$$\hat q_k \to U^{(n+1)}_{(k-1)(M+1)+1}, \tag{29}$$

$$\hat e_{M-i+1,k} \to U^{(n+1)}_{(k-1)(M+1)+i+1}, \tag{30}$$

$$q^{(0)}_k \to V^{(n)}_{(k-1)(M+1)+1}, \tag{31}$$

$$q^{(i)}_k \to V^{(n)}_{(k-1)(M+1)+i+1}, \tag{32}$$

$$s \to -\frac{1}{\delta^{(n)}}. \tag{33}$$

Then (16) through (21), (24) and (25) can be rewritten as follows:

$$U^{(n+1)}_{(k-2)(M+1)+2} = \frac{U^{(n)}_{(k-2)(M+1)+2}\,V^{(n)}_{(k-1)(M+1)+1}}{V^{(n)}_{(k-2)(M+1)+2}}, \tag{34}$$

$$V^{(n)}_{(k-1)(M+1)+2} = U^{(n)}_{(k-1)(M+1)+2} + V^{(n)}_{(k-1)(M+1)+1} - U^{(n+1)}_{(k-2)(M+1)+2}, \tag{35}$$

$$U^{(n+1)}_{(k-2)(M+1)+3} = \frac{U^{(n)}_{(k-2)(M+1)+3}\,V^{(n)}_{(k-1)(M+1)+2}}{V^{(n)}_{(k-2)(M+1)+3}}, \tag{36}$$

$$V^{(n)}_{(k-1)(M+1)+3} = U^{(n)}_{(k-1)(M+1)+3} + V^{(n)}_{(k-1)(M+1)+2} - U^{(n+1)}_{(k-2)(M+1)+3}, \tag{37}$$

$$\vdots$$

$$U^{(n+1)}_{(k-1)(M+1)} = \frac{U^{(n)}_{(k-1)(M+1)}\,V^{(n)}_{(k-1)(M+1)+M}}{V^{(n)}_{(k-1)(M+1)}}, \tag{38}$$

$$V^{(n)}_{k(M+1)} = U^{(n)}_{k(M+1)} + V^{(n)}_{(k-1)(M+1)+M} - U^{(n+1)}_{(k-1)(M+1)}, \tag{39}$$

$$U^{(n+1)}_{(k-1)(M+1)+1} = \frac{U^{(n)}_{(k-1)(M+1)+1}\,V^{(n)}_{k(M+1)}}{V^{(n)}_{(k-1)(M+1)+1}}, \tag{40}$$

$$V^{(n)}_{k(M+1)+1} = U^{(n)}_{k(M+1)+1} + V^{(n)}_{k(M+1)} - U^{(n+1)}_{(k-1)(M+1)+1}. \tag{41}$$

Here, the ranges of k in (34) through (41) are the same as those in (16) through (21), (24) and (25). Corresponding to (15), (22) and (23), we make the following definitions:

$$V^{(n)}_1 = U^{(n)}_1 + \frac{1}{\delta^{(n)}}, \tag{42}$$

$$U^{(n+1)}_{-M+1} = \cdots = U^{(n+1)}_0 = 0, \tag{43}$$

$$U^{(n)}_{(m-1)(M+1)+2} = \cdots = U^{(n)}_{m(M+1)} = 0. \tag{44}$$

Eqs. (34) through (41) can be written succinctly as

$$V^{(n)}_k = U^{(n)}_k + V^{(n)}_{k-1} - \frac{U^{(n)}_{k-M-1}\,V^{(n)}_{k-1}}{V^{(n)}_{k-M-1}}, \tag{45}$$

$$U^{(n+1)}_k = \frac{U^{(n)}_k\,V^{(n)}_{k+M}}{V^{(n)}_k}, \tag{46}$$

where 1 ≤ k ≤ m(M + 1) in (45) and 1 ≤ k ≤ (m − 1)(M + 1) + 1 in (46). Moreover, (42) and (43) can be transformed into the following conditions:

$$V^{(n)}_{-M} = V^{(n)}_{-M+1} = \cdots = V^{(n)}_0 = \frac{1}{\delta^{(n)}}, \tag{47}$$

$$U^{(n)}_{-M} = U^{(n)}_{-M+1} = \cdots = U^{(n)}_0 = 0. \tag{48}$$

Eqs. (45) through (48) are nothing but the discrete hungry Lotka-Volterra (dhLV) system [5], if we replace $V^{(n)}_k$ with the rescaled variable $\delta^{(n)}V^{(n)}_k$. However, in the context of the dhLV system, the parameter $\delta^{(n)}$ has the meaning of a step size and must be positive. On the other hand, in the multiple dqd algorithm, we want to set $\delta^{(n)}$ negative (that is, set s positive) to accelerate the convergence. We showed that positivity of the bidiagonal factors is retained in this case if s is smaller than the smallest eigenvalue of A.


To confirm our analysis in Section 3, we performed a preliminary numerical experiment. We set m = 10 and M = 3 and generated $e_{ik}$ and $q_k$ using random numbers in (0, 1]. For this matrix, the smallest eigenvalue is $\lambda_m = 1.24375120694785 \times 10^{-4}$ [2]. We applied the multiple dqd algorithm with and without the origin shift to this matrix. Ideally, the shift s should be updated at each iteration. However, we fixed s to one of $0.9\lambda_m$, $0.99\lambda_m$, or $0.999\lambda_m$ here, because we have not devised an efficient shifting strategy yet. So the convergence of the shifted algorithm should still be linear, though it should be faster than that of the unshifted algorithm.

Fig. 1. Convergence of $\sum_{i=1}^{3} e^{(0)}_{i,m-1}$ (logarithmic vertical axis from 1 down to $10^{-30}$; horizontal axis: iterations 1 to 6; curves: unshifted, $s = 0.9\lambda_m$, $s = 0.99\lambda_m$, $s = 0.999\lambda_m$).

In Fig. 1, we plotted the sum of $e_{i,m-1}$ (1 ≤ i ≤ 3) as a function of the iteration number. As can be seen, this quantity decays faster as s approaches $\lambda_m$, showing the effect of the shift in accelerating the convergence.

In the shifted algorithm, $e_{ik}$ and $q_k$ remained positive throughout the computation. This confirms the prediction of Theorem 1. Also, the computed smallest eigenvalue agreed with the correct value to 15 decimal digits, indicating relative accuracy of the shifted algorithm.

6. Conclusion

In this paper, we showed that the origin shift can be incorporated into the multiple dqd algorithm when the target matrix is a TN Hessenberg matrix. Proving the relative accuracy of the algorithm theoretically remains a subject of our future research.

Acknowledgments

We are grateful to the anonymous referee, whose comments helped us to improve the quality of this paper. We also would like to thank Prof. Masashi Iwasaki, Prof. Satoshi Tsujimoto, Prof. Yoshimasa Nakamura, Ms. Akiko Fukuda and Mr. Kensuke Aishima for valuable discussion. This work is partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-aid for Scientific Research.

References

[1] P. Koev, Accurate eigenvalues and SVDs of totally nonnegative matrices, SIAM J. Matrix Anal. Appl., 27 (2005), 1–23.

[2] Y. Yamamoto and T. Fukaya, Differential qd algorithm for totally nonnegative band matrices: convergence properties and error analysis, JSIAM Letters, 1 (2009), 56–59.

[3] M. Gasca and J. M. Pena, Total positivity and Neville elimination, Lin. Alg. Appl., 165 (1992), 25–44.

[4] K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), 191–229.

[5] A. Fukuda, E. Ishiwata, M. Iwasaki and Y. Nakamura, The discrete hungry Lotka-Volterra system and a new algorithm for computing matrix eigenvalues, Inverse Problems, 25 (2009), 015007.

[6] C. Li and R. Mathias, Interlacing inequalities for totally nonnegative matrices, Lin. Alg. Appl., 341 (2002), 35–44.



Solutions of Sakaki-Kakei equations of type 3, 5 and 6

Koichi Kondo1

1 Graduate School of Engineering, Doshisha University, 1-3 Tatara-Miyakodani, Kyotanabe,Kyoto 610-0394, Japan


Received March 31, 2010, Accepted April 30, 2010

Abstract

The purpose of this paper is to obtain general solutions of the Sakaki-Kakei equations of type 3, 5 and 6. We first obtain the general solution of the two-dimensional discrete dynamical system associated with the arithmetic and harmonic mean through a conjugacy of the iteration map. We next show that the arithmetic and harmonic mean system is semiconjugate to the Sakaki-Kakei equations of type 3, 5 and 6 under some conditions. From those results, we obtain their general solutions. We finally clarify the behavior of the solutions.

Keywords Sakaki-Kakei equation, arithmetic and harmonic mean, conjugacy of map


1. Introduction

In [1], a two-dimensional discrete dynamical system associated with the arithmetic and harmonic mean (AHM) is considered. It is shown that some particular solutions of AHM are given by hyperbolic and trigonometric functions, and that AHM is a solvable chaotic system under some conditions on the initial values. In [2], higher-order discrete systems of AHM are presented. The general solutions of the systems are obtained in terms of tridiagonal determinants, and the Lyapunov exponents of the systems are obtained through the determinant solutions. In [3], Sakaki and Kakei focused on the fact that the conserved quantity of AHM can be obtained from an identity of the hypergeometric function. They then derived twelve types of two-dimensional discrete systems from other identities of the hypergeometric function. The derived systems have conserved quantities in terms of the hypergeometric function; however, their solutions are not discussed. The purpose of this paper is to obtain general solutions of the Sakaki-Kakei equations of type 3, 5 and 6. Here, the types of Sakaki-Kakei equations are numbered in order of appearance in their paper. For simplicity, the Sakaki-Kakei equations of type 3, 5 and 6 are named SK3, SK5 and SK6, respectively.

This paper is organized as follows. In Section 2, we first derive a conjugacy of the iteration map of AHM from its one-dimensional reduction, and obtain the general solution of AHM through the conjugacy. In Section 3, we show that AHM is semiconjugate to SK3, SK5 and SK6 under some conditions. From the solution of AHM, we obtain general solutions of SK3, SK5 and SK6. In Section 4, we clarify the behavior of their solutions. In Section 5, some conclusions are given.

2. Conjugacy of AHM

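In [1], the equation of AHM is given by

$$a_{n+1} = \frac{a_n + b_n}{2}, \quad b_{n+1} = \frac{2a_nb_n}{a_n + b_n} \tag{1}$$

for n = 0, 1, 2, . . . and $a_0, b_0 \in \mathbb{R}$.

In [3], the conserved quantity of (1) is derived from an identity of the hypergeometric function,

$${}_2F_1(\alpha, \beta, \gamma; x) = 1 + \sum_{n=1}^{\infty}\frac{(\alpha)_n(\beta)_n}{(\gamma)_n\,n!}x^n, \quad |x| < 1, \tag{2}$$

where $(\alpha)_n = \prod_{j=0}^{n-1}(\alpha + j)$ for n = 1, 2, 3, . . . . Let

$$I_n = \frac{1}{a_n}\,{}_2F_1\!\left(\frac{1}{2}, 1, 1; 1 - \frac{b_n}{a_n}\right). \tag{3}$$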

Then, it follows that $I_n = I_{n+1}$ for n = 0, 1, 2, . . . . Hence, $I_n$ is the conserved quantity of (1). Moreover, (3) can be rewritten by virtue of the integral expression

$${}_2F_1\!\left(\frac{1}{2}, 1, 1; x\right) = \frac{\Gamma(1)}{\Gamma(1/2)^2}\int_0^{\infty}\frac{dt}{(t + 1 - x)\sqrt{t}}. \tag{4}$$

Since the integral in (4) can be evaluated in closed form, it holds that

$${}_2F_1\!\left(\frac{1}{2}, 1, 1; x\right) = \frac{1}{\sqrt{1 - x}}. \tag{5}$$

From (3) and (5), it follows that $I_n = 1/\sqrt{a_nb_n}$.

In [1], AHM is reduced to a one-dimensional system by using the conserved quantity $\tilde I_n = 1/(I_n)^2 = a_nb_n$. Let $c = a_0b_0$. Then, $\tilde I_n = \tilde I_0$ yields $a_nb_n = c$ for n = 0, 1, 2, . . . . By eliminating $b_n$ in (1) with $b_n = c/a_n$, AHM is reduced to

$$a_{n+1} = \frac{1}{2}\left(a_n + \frac{c}{a_n}\right), \quad c \in \mathbb{R}\setminus\{0\}, \quad n = 0, 1, \dots. \tag{6}$$

Let us denote the iteration function of (6) as

$$\Phi(c; x) = \frac{1}{2}\left(x + \frac{c}{x}\right), \quad c \in \mathbb{R}\setminus\{0\}. \tag{7}$$

Then, (6) is expressed as $a_{n+1} = \Phi(a_n)$. Note here that Φ is the Newton iteration function $\Phi(x) = x - f(x)/f'(x)$ for $f(x) = x^2 - c$ (cf. [2]).

If maps Ψ : X → X, ψ : X → Y, σ : Y → Y for sets X, Y satisfy $\Psi = \psi^{-1} \circ \sigma \circ \psi$, and ψ is homeomorphic,


JSIAM Letters Vol. 2 (2010) pp.73–76 Koichi Kondo

namely, a one-to-one, onto, and continuous function with continuous inverse, then Ψ is dynamically equivalent to σ. We say that Ψ : X → X is conjugate to σ : Y → Y, and ψ is a conjugacy of Ψ (cf. [4, pp. 108–109]).

According to [1] and [4, p. 172], the map Φ(c; x) for c < 0 is conjugate to the Bernoulli shift, which is well known as a chaotic dynamical system. Thus, it turns out that Φ is a chaotic system if c < 0.

In this paper, we derive another conjugacy of Φ in order to obtain the general solution for any $c \in \mathbb{R}\setminus\{0\}$. Let us introduce a function φ defined by

$$\varphi(c; x) = \frac{x - \sqrt{c}}{x + \sqrt{c}}, \quad c \in \mathbb{R}\setminus\{0\}. \tag{8}$$

Note here that the square root in (8) is not a single-valued function if its argument is negative. For simplicity of discussion, all square roots in this paper are treated as single-valued functions such that $\sqrt{c} = i\sqrt{-c}$ if c < 0. Here, i is the imaginary unit. From (8), we have

$$\varphi^{-1}(c; x) = \sqrt{c}\,\frac{1 + x}{1 - x}. \tag{9}$$

Let $Q(x) = x^2$. From (7), (8) and (9), it formally holds that

$$\Phi = \varphi^{-1} \circ Q \circ \varphi. \tag{10}$$

Suppose that c > 0. Let $\mathbb{R}^* = \mathbb{R} \cup \{\infty\}$. From (8), (9), the map $\varphi : \mathbb{R}^* \to \mathbb{R}^*$ satisfying $\varphi(\infty) = 1$, $\varphi(-\sqrt{c}) = \infty$ is obviously homeomorphic. It holds that $Q : \mathbb{R}^* \to \mathbb{R}^*$. Hence, the map $\Phi : \mathbb{R}^* \to \mathbb{R}^*$ is conjugate to $Q : \mathbb{R}^* \to \mathbb{R}^*$.

Suppose that c < 0. Let us denote the unit circle in $\mathbb{C}$ as $S = \{z \in \mathbb{C} \mid |z| = 1\}$. From (8) and $\sqrt{c} = i\sqrt{-c}$, it follows that for $x \in \mathbb{R}^*$, $\varphi(x) = e^{i\theta} \in S$ with $\theta = -2\tan^{-1}(\sqrt{-c}/x)$. The map $\varphi : \mathbb{R}^* \to S$ satisfying $\varphi(\infty) = 1$ is a continuous bijection. From (9), it follows that $\varphi^{-1}(e^{i\theta}) = -\sqrt{-c}\,\sin(\theta)/(1 - \cos(\theta))$. The map $\varphi^{-1}$ is continuous. Hence, the map $\varphi : \mathbb{R}^* \to S$ is homeomorphic. It holds that Q : S → S. Thus, the map $\Phi : \mathbb{R}^* \to \mathbb{R}^*$ is conjugate to Q : S → S.

Let us denote the imaginary axis in $\mathbb{C}$ as $T = \{iy \in \mathbb{C} \mid y \in \mathbb{R}\} \cup \{\infty\}$. In order to obtain solutions of SK3, SK5 and SK6, we show a conjugacy of the map Φ : T → T. Similarly to the above discussion, it follows that φ : T → S is homeomorphic if c > 0, and that $\varphi : T \to \mathbb{R}^*$ is homeomorphic if c < 0. Recall that Q : S → S and $Q : \mathbb{R}^* \to \mathbb{R}^*$. Hence, the map Φ : T → T is conjugate to Q : S → S if c > 0, or to $Q : \mathbb{R}^* \to \mathbb{R}^*$ if c < 0. Thus, we have the following theorem.

Theorem 1 The map $\Phi : \mathbb{R}^* \to \mathbb{R}^*$ is conjugate to $Q : \mathbb{R}^* \to \mathbb{R}^*$ if c > 0, or to Q : S → S if c < 0. The map Φ : T → T is conjugate to Q : S → S if c > 0, or to $Q : \mathbb{R}^* \to \mathbb{R}^*$ if c < 0.

In the case where c = −1, this fact was first proved by Cayley in 1879. In the case where c = ±1, it is shown in [4, pp. 274–275].

It follows from $a_{n+1} = \Phi(a_n)$ and (10) that $\varphi(a_{n+1}) = Q(\varphi(a_n))$. Let $z_n = \varphi(a_n)$. Recall that $Q(x) = x^2$. From Theorem 1, it turns out that the system $a_{n+1} = \Phi(a_n)$ is equivalent to $z_{n+1} = (z_n)^2$ if $a_0 \in \mathbb{R}^*$ or $a_0 \in T$. From $z_0 = \varphi(a_0)$, $a_n = \varphi^{-1}(z_n)$, and the solution $z_n = (z_0)^{2^n}$ of $z_{n+1} = (z_n)^2$, we have the following theorem.

Theorem 2 Suppose that $c \in \mathbb{R}\setminus\{0\}$. If $a_0 \in \mathbb{R}^*$ or $a_0 \in T$, then the general solution of $a_{n+1} = \Phi(a_n)$ is

$$a_n = (\varphi^{-1} \circ x^{2^n} \circ \varphi)(a_0), \quad n = 0, 1, 2, \dots. \tag{11}$$

Recall that $b_n = c/a_n$, $c = a_0b_0$. From (8), (9) and Theorem 2, we have the following theorem.

Theorem 3 Let $c = a_0b_0$, $\lambda_1 = a_0 + \sqrt{c}$, and $\lambda_2 = a_0 - \sqrt{c}$. The general solution of AHM (1) is

$$a_n = \sqrt{c}\,\frac{\lambda_1^{2^n} + \lambda_2^{2^n}}{\lambda_1^{2^n} - \lambda_2^{2^n}}, \quad b_n = \sqrt{c}\,\frac{\lambda_1^{2^n} - \lambda_2^{2^n}}{\lambda_1^{2^n} + \lambda_2^{2^n}} \tag{12}$$

for n = 0, 1, 2, . . . . Here, $a_0$, $b_0$ are both real numbers, or both pure imaginary numbers, which satisfy $a_0b_0 \ne 0$ and $a_0 + b_0 \ne 0$. If $a_0 = 0$, $b_0 \ne 0$, AHM has the singular solution $a_n = b_0/2^n$, $b_n = 0$ for n = 1, 2, 3, . . . . If $a_0 \ne 0$, $b_0 = 0$, AHM has the singular solution $a_n = a_0/2^n$, $b_n = 0$ for n = 1, 2, 3, . . . . If $a_0 + b_0 = 0$, AHM does not have a solution.
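As a numerical illustration, the closed form (12) can be compared with direct iteration of (1). The sketch below is only a sanity check under stated assumptions: the function names are invented for this example, the initial values are real, and Python's `cmath.sqrt` is used because its principal branch returns $i\sqrt{-c}$ for a negative real float c, which matches the convention adopted for (8).

```python
import cmath

def ahm_iterate(a0, b0, n):
    """Apply the AHM map (1) n times to real initial values."""
    a, b = a0, b0
    for _ in range(n):
        a, b = (a + b) / 2, 2 * a * b / (a + b)
    return a, b

def ahm_closed_form(a0, b0, n):
    """Evaluate (12); the result is complex with (near-)zero imaginary part."""
    c = a0 * b0
    root_c = cmath.sqrt(c)           # principal branch: i*sqrt(-c) when c < 0
    lam1, lam2 = a0 + root_c, a0 - root_c
    p, q = lam1 ** (2 ** n), lam2 ** (2 ** n)
    a_n = root_c * (p + q) / (p - q)
    b_n = root_c * (p - q) / (p + q)
    return a_n, b_n
```

For instance, with $a_0 = 3$, $b_0 = 1$ (c > 0) the iterates settle towards $\sqrt{3}$, while $a_0 = 2$, $b_0 = -1$ (c < 0) gives the non-convergent case; in both cases the two functions should agree for small n up to rounding error.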

The solution (12) can also be obtained through the solution in [2], which is expressed by a tridiagonal determinant. The tridiagonal determinant satisfies a linear difference equation of the second order. Solving the equation and rewriting its solution, we can obtain (12).

3. Solutions of SK3, SK5 and SK6

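In [3], the equation of SK3 is given by

$$a_{n+1} = \frac{(a_n + b_n)^2}{a_n - b_n}, \quad b_{n+1} = \frac{4a_nb_n}{a_n - b_n}, \tag{13}$$

which has the conserved quantity

$$I^{(3)}_n = \frac{1}{\sqrt{a_n}}\,{}_2F_1\!\left(\frac{1}{2}, \frac{3}{4}, \frac{3}{4}; \frac{b_n}{a_n}\right). \tag{14}$$

Note here that the expression for $I^{(3)}_n$ given in [3] contains an erratum. The equation of SK5 is given by

$$a_{n+1} = \frac{(2a_n - b_n)^2}{4a_n}, \quad b_{n+1} = \frac{b_n^2}{4a_n}, \tag{15}$$

which has the conserved quantity

$$I^{(5)}_n = \frac{1}{\sqrt{a_n}}\,{}_2F_1\!\left(\frac{1}{2}, 1, 1; \frac{b_n}{a_n}\right). \tag{16}$$

The equation of SK6 is given by

$$a_{n+1} = \frac{4a_n(a_n - b_n)^2}{(2a_n - b_n)^2}, \quad b_{n+1} = \frac{-b_n^2(a_n - b_n)}{(2a_n - b_n)^2}, \tag{17}$$

which has the same conserved quantity $I^{(6)}_n$ as (16).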

Along the same lines as for AHM, we first rewrite the conserved quantities of SK3, SK5 and SK6. It follows from (2) that

$${}_2F_1\!\left(\frac{1}{2}, \frac{3}{4}, \frac{3}{4}; x\right) = 1 + \sum_{n=1}^{\infty}\frac{\left(\frac{1}{2}\right)_n}{n!}x^n = \frac{1}{\sqrt{1 - x}}. \tag{18}$$

Substituting (18) into (14) and substituting (5) into (16), we have the following theorem.

Theorem 4 All of the conserved quantities of SK3, SK5 and SK6 are $I^{(3)}_n = I^{(5)}_n = I^{(6)}_n = 1/\sqrt{a_n - b_n}$.

From Theorem 4, we next reduce SK3, SK5 and SK6 to one-dimensional discrete systems by using $\tilde I_n = (I^{(3)}_n)^{-2} = (I^{(5)}_n)^{-2} = (I^{(6)}_n)^{-2} = a_n - b_n$. Let $c = a_0 - b_0$. Then, $\tilde I_n = \tilde I_0$ yields $a_n - b_n = c$ for n = 0, 1, 2, . . . . By eliminating $b_n$ in (13), (15) and (17) with $b_n = a_n - c$, we obtain the one-dimensional systems of SK3, SK5 and SK6, respectively, as in the following theorem.

Theorem 5 If $c = a_0 - b_0 \ne 0$, SK3 (13) is reduced to

$$a_{n+1} = \frac{1}{c}(2a_n - c)^2, \quad n = 0, 1, 2, \dots. \tag{19}$$

If $c = a_0 - b_0 \ne 0$, SK5 (15) is reduced to

$$a_{n+1} = \frac{(a_n + c)^2}{4a_n}, \quad n = 0, 1, 2, \dots. \tag{20}$$

If $c = a_0 - b_0 \ne 0$, SK6 (17) is reduced to

$$a_{n+1} = \frac{4c^2a_n}{(a_n + c)^2}, \quad n = 0, 1, 2, \dots. \tag{21}$$
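The reductions in Theorem 5 can be checked with exact rational arithmetic. The sketch below is an illustration only: the function names and the initial values used in the check are arbitrary choices for this example. It iterates (13), (15) and (17) and confirms both that $a_n - b_n$ stays equal to c and that the a-component follows (19), (20) and (21).

```python
from fractions import Fraction

def sk3_step(a, b):
    """Two-dimensional map (13)."""
    return (a + b) ** 2 / (a - b), 4 * a * b / (a - b)

def sk5_step(a, b):
    """Two-dimensional map (15)."""
    return (2 * a - b) ** 2 / (4 * a), b ** 2 / (4 * a)

def sk6_step(a, b):
    """Two-dimensional map (17)."""
    d = (2 * a - b) ** 2
    return 4 * a * (a - b) ** 2 / d, -(b ** 2) * (a - b) / d

def reduced3(c, x):
    """One-dimensional map (19)."""
    return (2 * x - c) ** 2 / c

def reduced5(c, x):
    """One-dimensional map (20)."""
    return (x + c) ** 2 / (4 * x)

def reduced6(c, x):
    """One-dimensional map (21)."""
    return 4 * c ** 2 * x / (x + c) ** 2

def check_reduction(step, reduced, a0, b0, n_steps=4):
    """Return True if a_n - b_n = c and a_{n+1} = reduced(c, a_n) hold exactly."""
    a, b = Fraction(a0), Fraction(b0)
    c = a - b
    for _ in range(n_steps):
        a_next, b_next = step(a, b)
        if a_next - b_next != c or a_next != reduced(c, a):
            return False
        a, b = a_next, b_next
    return True
```

Calling `check_reduction(sk5_step, reduced5, Fraction(5, 2), Fraction(1, 3))`, for example, exercises the SK5 case with c = 13/6.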

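Let us denote the iteration functions $\Phi_3(c; x)$, $\Phi_5(c; x)$ and $\Phi_6(c; x)$ of (19), (20) and (21) as

$$\Phi_3 = \frac{(2x - c)^2}{c}, \quad \Phi_5 = \frac{(x + c)^2}{4x}, \quad \Phi_6 = \frac{4c^2x}{(x + c)^2} \tag{22}$$

for $c \in \mathbb{R}\setminus\{0\}$, respectively. Then, (19), (20) and (21) are expressed as $a_{n+1} = \Phi_3(a_n)$, $a_{n+1} = \Phi_5(a_n)$ and $a_{n+1} = \Phi_6(a_n)$, respectively.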

If maps $\Psi : X \to X$, $\psi : X \to Y$, $\sigma : Y \to Y$ for sets $X$, $Y$ satisfy $\psi \circ \Psi = \sigma \circ \psi$, and $\psi$ is continuous, onto, and at most $m$-to-one, then we say that $\Psi : X \to X$ is semiconjugate to $\sigma : Y \to Y$, and that $\psi$ is a semiconjugacy of $\Psi$ (cf. [4, p. 125]).

Let us define the functions $\eta_3$, $\eta_5$ and $\eta_6$ by

\[ \eta_3(c;x) = \frac{c x^2}{x^2 - 1}, \qquad \eta_5(x) = x^2, \qquad \eta_6(x) = \frac{1}{x^2}. \tag{23} \]

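From (7), (22) and (23), it formally holds that, writing $\tilde{c}$ for the parameter of the AHM map $\Phi$ and $c$ for the parameter of $\Phi_3$, $\Phi_5$, $\Phi_6$ and $\eta_3$,

\[ \eta_3 \circ \Phi = \Phi_3 \circ \eta_3 \quad \text{if } \tilde{c} = 1, \tag{24} \]

\[ \eta_5 \circ \Phi = \Phi_5 \circ \eta_5 \quad \text{if } \tilde{c} = c, \tag{25} \]

\[ \eta_6 \circ \Phi = \Phi_6 \circ \eta_6 \quad \text{if } \tilde{c} = \frac{1}{c}. \tag{26} \]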
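The relations (24), (25) and (26) can be verified at sample points with exact rationals. This is a minimal sketch under one explicit assumption: the AHM iteration map (7) is not reproduced in this section, so the code assumes it has the form $\Phi(\tilde{c}; x) = (x + \tilde{c}/x)/2$. All function names are illustrative.

```python
from fractions import Fraction

def phi(ct, x):
    """AHM map; ASSUMED form of (7): Phi(ct; x) = (x + ct/x) / 2."""
    return (x + ct / x) / 2

def phi3(c, x):
    return (2 * x - c) ** 2 / c            # (22)

def phi5(c, x):
    return (x + c) ** 2 / (4 * x)          # (22)

def phi6(c, x):
    return 4 * c ** 2 * x / (x + c) ** 2   # (22)

def eta3(c, x):
    return c * x ** 2 / (x ** 2 - 1)       # (23)

def eta5(x):
    return x ** 2                          # (23)

def eta6(x):
    return 1 / x ** 2                      # (23)

def holds(c, x):
    """Check (24)-(26) at the point x for the SK parameter c."""
    c, x = Fraction(c), Fraction(x)
    r24 = eta3(c, phi(1, x)) == phi3(c, eta3(c, x))      # AHM parameter 1
    r25 = eta5(phi(c, x)) == phi5(c, eta5(x))            # AHM parameter c
    r26 = eta6(phi(1 / c, x)) == phi6(c, eta6(x))        # AHM parameter 1/c
    return r24 and r25 and r26

if __name__ == "__main__":
    print(holds(3, 2), holds(-2, Fraction(1, 3)))
```

The sample points must avoid the poles of the maps (for example $x = 0$ and $x = \pm 1$); the check says nothing about the domains and ranges, which are treated separately in the text.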

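Let us write $\mathbb{R}^+ = \{x \in \mathbb{R} \mid x \ge 0\} \cup \{\infty\}$, $\mathbb{R}^- = \{x \in \mathbb{R} \mid x \le 0\} \cup \{\infty\}$, $U_1 = \{x \in \mathbb{R} \mid |x| \ge 1\} \cup \{\infty\}$, $D_1 = \{x \in \mathbb{R} \mid 1 \le x < +\infty\} \cup \{\infty\}$, and $D_2 = \{x \in \mathbb{R} \mid 0 \le x \le 1\}$. Let us define $cD = \{cx \in \mathbb{R}^* \mid x \in D\}$ for $c \in \mathbb{R}\setminus\{0\}$ and a set $D$. From (23), it follows that $\eta_3 : U_1 \to cD_1$, $\eta_3 : T \to cD_2$, $\eta_5 : \mathbb{R}^* \to \mathbb{R}^+$, $\eta_5 : T \to \mathbb{R}^-$, $\eta_6 : \mathbb{R}^* \to \mathbb{R}^+$, and $\eta_6 : T \to \mathbb{R}^-$ are continuous, onto, and at most two-to-one maps. From (7) and (22), it holds that $\Phi : \mathbb{R}^* \to \mathbb{R}^*$, $\Phi : T \to T$, $\Phi : U_1 \to U_1$ if the parameter $\tilde{c}$ of $\Phi$ equals 1, $\Phi_3 : cD_1 \to cD_1$, $\Phi_3 : cD_2 \to cD_2$, $\Phi_5 : \mathbb{R}^+ \to \mathbb{R}^+$, $\Phi_5 : \mathbb{R}^- \to \mathbb{R}^-$, $\Phi_6 : \mathbb{R}^+ \to \mathbb{R}^+$, and $\Phi_6 : \mathbb{R}^- \to \mathbb{R}^-$.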

Thus, we have the following theorems.

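Theorem 6 If the parameter $\tilde{c}$ of $\Phi$ satisfies $\tilde{c} = 1$ and $c \in \mathbb{R}\setminus\{0\}$, the maps $\Phi : U_1 \to U_1$ and $\Phi : T \to T$ are semiconjugate to $\Phi_3 : cD_1 \to cD_1$ and $\Phi_3 : cD_2 \to cD_2$, respectively.

Theorem 7 If $\tilde{c} = c \in \mathbb{R}\setminus\{0\}$, the maps $\Phi : \mathbb{R}^* \to \mathbb{R}^*$ and $\Phi : T \to T$ are semiconjugate to $\Phi_5 : \mathbb{R}^+ \to \mathbb{R}^+$ and $\Phi_5 : \mathbb{R}^- \to \mathbb{R}^-$, respectively.

Theorem 8 If $\tilde{c} = 1/c \in \mathbb{R}\setminus\{0\}$, the maps $\Phi : \mathbb{R}^* \to \mathbb{R}^*$ and $\Phi : T \to T$ are semiconjugate to $\Phi_6 : \mathbb{R}^+ \to \mathbb{R}^+$ and $\Phi_6 : \mathbb{R}^- \to \mathbb{R}^-$, respectively.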

Let $\tilde{a}_n$ be a solution of AHM for $\tilde{a}_0 \in \mathbb{R}^*$ or $\tilde{a}_0 \in T$. Namely, $\tilde{a}_n$ satisfies $\tilde{a}_{n+1} = \Phi(\tilde{a}_n)$ for $n = 0, 1, 2, \dots$. Applying the map $\eta_j$ for $j = 3, 5, 6$, we derive $\eta_j(\tilde{a}_{n+1}) = \eta_j(\Phi(\tilde{a}_n))$. Suppose that $\Phi$ satisfies one of the conditions of Theorems 6, 7 and 8. From (24), (25) and (26), it follows that

\[ \eta_j(\tilde{a}_{n+1}) = \Phi_j(\eta_j(\tilde{a}_n)). \tag{27} \]

Let $a_n = \eta_j(\tilde{a}_n)$. Then, we have $a_{n+1} = \Phi_j(a_n)$. Namely, $a_n = \eta_3(\tilde{a}_n)$, $a_n = \eta_5(\tilde{a}_n)$ and $a_n = \eta_6(\tilde{a}_n)$ for $n = 0, 1, 2, \dots$ are solutions of SK3, SK5 and SK6, respectively. The solution $\tilde{a}_n$ of AHM is given by (11), where $a_n$, $a_0$ are replaced with $\tilde{a}_n$, $\tilde{a}_0$, respectively, and the parameter $c$ in $\varphi$ takes the value $\tilde{c} = 1$ for SK3, $\tilde{c} = c$ for SK5, and $\tilde{c} = 1/c$ for SK6. Recall that $b_n = a_n - c$ and $c = a_0 - b_0$. Hence the solutions $b_n$ for SK3, SK5 and SK6 are all given by $b_n = a_n - a_0 + b_0$ for $n = 0, 1, 2, \dots$.

The initial value $\tilde{a}_0$ of AHM is determined from the initial value $a_0$ of the SK equation so that $\eta_j(\tilde{a}_0) = a_0$. Although $\tilde{a}_0$ can take two real or pure imaginary values, $a_n = \eta_j(\tilde{a}_n)$ generated from either choice of $\tilde{a}_0$ is a solution of $a_{n+1} = \Phi_j(a_n)$. Since the systems (13), (15) and (17) obviously generate unique solutions, both choices give the same solution. For simplicity, we choose $\tilde{a}_0 = \sqrt{a_0/(a_0 - c)}$ for SK3, $\tilde{a}_0 = \sqrt{a_0}$ for SK5, and $\tilde{a}_0 = 1/\sqrt{a_0}$ for SK6.

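From Theorems 7 and 8, there exist solutions $a_n = \eta_5(\tilde{a}_n)$, $a_n = \eta_6(\tilde{a}_n)$ for $a_0 \in \mathbb{R}^* = \mathbb{R}^+ \cup \mathbb{R}^-$, where $\tilde{a}_n$ denotes the corresponding solution of AHM. From Theorem 6, there exists a solution $a_n = \eta_3(\tilde{a}_n)$ for $a_0 \in cD_1 \cup cD_2$. Moreover, we can obtain a solution of SK3 for $a_0 \in c\mathbb{R}^-$ as follows. Let $U_2 = \{x \in \mathbb{R} \mid |x| \le 1\}$. From (7), (22) and (23), it holds that $\eta_3 : U_2 \to c\mathbb{R}^-$, $\Phi_3 : c\mathbb{R}^- \to cD_1$, and $\Phi : U_2 \to U_1$ if the parameter of $\Phi$ equals 1. Note here that the map $\Phi : U_2 \to U_1$ is not semiconjugate to $\Phi_3 : c\mathbb{R}^- \to cD_1$. If $\tilde{a}_0 \in U_2$ and $\eta_3(\tilde{a}_0) = a_0 \in c\mathbb{R}^-$, then it follows that $\Phi(\tilde{a}_0) = \tilde{a}_1 \in U_1$, $\eta_3(\tilde{a}_1) \in cD_1$, and $\Phi_3(a_0) = \Phi_3(\eta_3(\tilde{a}_0)) \in cD_1$. Hence (27) holds for $n = 0$, and we have $a_1 = \eta_3(\tilde{a}_1)$. Since $a_1 \in cD_1$, (27) holds for $n = 1, 2, 3, \dots$ by Theorem 6. Thus there exists a solution $a_n = \eta_3(\tilde{a}_n)$ for $a_0 \in c\mathbb{R}^-$. The domain of the initial value of SK3 is $\mathbb{R}^* = cD_1 \cup cD_2 \cup c\mathbb{R}^-$.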

Thus, we have the following theorems.

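Theorem 9 Let $c = a_0 - b_0$, $\lambda_3 = (\sqrt{a_0} + \sqrt{b_0})/\sqrt{c}$, and $\lambda_4 = (\sqrt{a_0} - \sqrt{b_0})/\sqrt{c}$. The general solution of SK3 (13) is

\[ a_n = \frac{c}{4}\left(\lambda_3^{2^n} + \lambda_4^{2^n}\right)^2, \qquad b_n = \frac{c}{4}\left(\lambda_3^{2^n} - \lambda_4^{2^n}\right)^2 \tag{28} \]

for $n = 0, 1, 2, \dots$ and real initial values $a_0$, $b_0$ such that $a_0 \neq b_0$. If $a_0 = b_0$, SK3 does not have a solution.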

Theorem 10 Let $c = a_0 - b_0$, $\lambda_5 = \sqrt{a_0} + \sqrt{c}$, and $\lambda_6 = \sqrt{a_0} - \sqrt{c}$. The general solution of SK5 (15) is

\[ a_n = c\left(\frac{\lambda_5^{2^n} + \lambda_6^{2^n}}{\lambda_5^{2^n} - \lambda_6^{2^n}}\right)^2, \qquad b_n = a_n - a_0 + b_0 \tag{29} \]

for $n = 0, 1, 2, \dots$ and real initial values $a_0$, $b_0$ such that $a_0 \neq 0$, $a_0 \neq b_0$ and $2a_0 \neq b_0$. If $a_0 = b_0$, SK5 has the singular solution $a_n = a_0/4^n$, $b_n = b_0/4^n$ for $n = 0, 1, 2, \dots$. If $a_0 = 0$ or $2a_0 - b_0 = 0$, SK5 does not have a solution.

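Theorem 11 Let $c = a_0 - b_0$, $\lambda_7 = \sqrt{c} + \sqrt{a_0}$, and $\lambda_8 = \sqrt{c} - \sqrt{a_0}$. The general solution of SK6 (17) is

\[ a_n = c\left(\frac{\lambda_7^{2^n} - \lambda_8^{2^n}}{\lambda_7^{2^n} + \lambda_8^{2^n}}\right)^2, \qquad b_n = a_n - a_0 + b_0, \tag{30} \]

for $n = 0, 1, 2, \dots$ and real initial values $a_0$, $b_0$ such that $a_0 \neq b_0$, $2a_0 \neq b_0$. If $a_0 - b_0 = 0$ or $2a_0 - b_0 = 0$, SK6 does not have a solution.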
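The closed forms (28), (29) and (30) can be compared numerically with direct iteration of (13), (15) and (17). The sketch below is illustrative only: the names `STEP`, `closed_form` and `max_error` are hypothetical. It uses complex square roots so that negative $b_0$ or $c$ are handled, and it checks only the first few steps, because $\lambda^{2^n}$ grows or rotates very quickly.

```python
import cmath

# One step of SK3 (13), SK5 (15) and SK6 (17).
STEP = {
    3: lambda a, b: ((a + b) ** 2 / (a - b), 4 * a * b / (a - b)),
    5: lambda a, b: ((2 * a - b) ** 2 / (4 * a), b ** 2 / (4 * a)),
    6: lambda a, b: (4 * a * (a - b) ** 2 / (2 * a - b) ** 2,
                     -b ** 2 * (a - b) / (2 * a - b) ** 2),
}

def closed_form(kind, a0, b0, n):
    """Evaluate (28), (29) or (30); b_n is obtained from b_n = a_n - c."""
    c = a0 - b0
    ra, rb, rc = cmath.sqrt(a0), cmath.sqrt(b0), cmath.sqrt(c)
    e = 2 ** n
    if kind == 3:
        u, v = ((ra + rb) / rc) ** e, ((ra - rb) / rc) ** e
        a = c / 4 * (u + v) ** 2
    elif kind == 5:
        u, v = (ra + rc) ** e, (ra - rc) ** e
        a = c * ((u + v) / (u - v)) ** 2
    else:
        u, v = (rc + ra) ** e, (rc - ra) ** e
        a = c * ((u - v) / (u + v)) ** 2
    return a.real, a.real - c

def max_error(kind, a0, b0, steps):
    """Largest scaled deviation between iteration and closed form for n = 0..steps."""
    a, b = float(a0), float(b0)
    worst = 0.0
    for n in range(steps + 1):
        fa, fb = closed_form(kind, a0, b0, n)
        worst = max(worst,
                    abs(fa - a) / max(1.0, abs(a)),
                    abs(fb - b) / max(1.0, abs(b)))
        a, b = STEP[kind](a, b)
    return worst

if __name__ == "__main__":
    for kind in (3, 5, 6):
        print(kind, max_error(kind, 3, 1, 3), max_error(kind, 1, -2, 3))
```

In the chaotic parameter ranges rounding errors double at each step, so agreement in floating point should only be expected for small $n$.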

4. Behaviors of solutions

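In this section, we clarify the behaviors of the general solutions (28), (29) and (30). Let us introduce the functions $\varphi_3$, $\varphi_5$, $\varphi_6$ and their inverses by

\[ \varphi_3(x) = \frac{\sqrt{x} - \sqrt{x - c}}{\sqrt{c}}, \qquad \varphi_3^{-1}(x) = \frac{c}{4}\left(x + \frac{1}{x}\right)^2, \tag{31} \]

\[ \varphi_5(x) = \frac{\sqrt{x} - \sqrt{c}}{\sqrt{x} + \sqrt{c}}, \qquad \varphi_5^{-1}(x) = c\left(\frac{1 + x}{1 - x}\right)^2, \tag{32} \]

\[ \varphi_6(x) = \frac{\sqrt{c} - \sqrt{x}}{\sqrt{c} + \sqrt{x}}, \qquad \varphi_6^{-1}(x) = c\left(\frac{1 - x}{1 + x}\right)^2. \tag{33} \]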

Thus, we have the following theorem.

Theorem 12 The solutions $a_n$ of (28), (29) and (30) are expressed as

\[ a_n = \left(\varphi_j^{-1} \circ x^{2^n} \circ \varphi_j\right)(a_0), \quad n = 0, 1, 2, \dots \tag{34} \]

for $j = 3, 5, 6$, respectively.

It may seem that $\varphi_3$, $\varphi_5$ and $\varphi_6$ are conjugacies of $\Phi_3$, $\Phi_5$ and $\Phi_6$, respectively, because (34) has the same form as (11) in Theorem 2. However, the maps $\varphi_j : \mathbb{R}^* \to S$ for $c < 0$, $j = 3, 5, 6$, are not homeomorphisms, so they cannot be conjugacies of $\Phi_j$.

Let $z_0 = \varphi_j(a_0) \in \mathbb{C}$ and $z_n = (z_0)^{2^n} \in \mathbb{C}$. From (34), it holds that $a_n = \varphi_j^{-1}(z_n)$. Let us write $z_n$ in polar form, $z_n = r_n e^{i\theta_n}$, $0 \le r_n$, $0 \le \theta_n < 2\pi$ for $n = 0, 1, 2, \dots$. Then, we have $z_n = (r_0)^{2^n} e^{i 2^n \theta_0}$. The behavior of the solution $a_n = \varphi_j^{-1}(z_n)$ depends on $r_0$ and $\theta_0$. From (31), (32) and (33), it turns out that there are the following six cases. (i) If $0 < r_0 < 1$, $\theta_0 = 0$, then $z_n$ converges monotonically to 0. (ii) If $0 < r_0 < 1$, $\theta_0 \neq 0$, then $z_n$ converges to 0 with oscillation. (iii) If $r_0 > 1$, $\theta_0 = 0$, then $z_n$ diverges monotonically. (iv) If $r_0 > 1$, $\theta_0 \neq 0$, then $z_n$ diverges with oscillation. (v) If $r_0 = 1$, $\theta_0 \neq 0$, then $|z_n| = 1$ and the map $\theta_n/(2\pi) \mapsto \theta_{n+1}/(2\pi)$ is conjugate to the Bernoulli shift (cf. [4, p. 125]). Hence, $z_n$ is chaotic. (vi) If $z_0 = 1$ or $z_0 = 0$, then $z_n$ is a fixed point. Thus, we have the following theorems.

Theorem 13 The solution (28) of SK3 (13) behaves as follows. If $a_0(a_0 - b_0) > 0$, then $a_n$ and $b_n$ diverge monotonically. If $a_0(a_0 - b_0) < 0$, then $a_n$ and $b_n$ diverge with oscillation. If $a_0 b_0 < 0$, then $a_n$ and $b_n$ are chaotic. If $a_0 \neq 0$, $b_0 = 0$, then $a_n = a_0$, $b_n = 0$ is a fixed point.

Theorem 14 The solution (29) of SK5 (15) behaves as follows. If $a_0(a_0 - b_0) > 0$, then $a_n$ and $b_n$ converge monotonically to $a_0 - b_0$ and 0, respectively. If $a_0 b_0 < 0$, then $a_n$ and $b_n$ converge with oscillation to $a_0 - b_0$ and 0, respectively. If $a_0(a_0 - b_0) < 0$, then $a_n$ and $b_n$ are chaotic. If $a_0 \neq 0$, $b_0 = 0$, then $a_n = a_0$, $b_n = 0$ is a fixed point.

Theorem 15 The solution (30) of SK6 (17) behaves as follows. If $a_0 b_0 < 0$, then $a_n$ and $b_n$ converge monotonically to $a_0 - b_0$ and 0, respectively. If $a_0(a_0 - b_0) > 0$, then $a_n$ and $b_n$ converge with oscillation to $a_0 - b_0$ and 0, respectively. If $a_0(a_0 - b_0) < 0$, then $a_n$ and $b_n$ are chaotic. If $a_0 = 0$, $b_0 \neq 0$, then $a_n = 0$, $b_n = b_0$ is a fixed point. If $a_0 \neq 0$, $b_0 = 0$, then $a_n = a_0$, $b_n = 0$ is a fixed point.

For special initial values $a_0$, $b_0$, the solutions are expressed by hyperbolic and trigonometric functions. Thus, we have the following theorems.

Theorem 16 Let $c = a_0 - b_0$. If $a_0 > b_0 > 0$, then the solution of SK3 (13) is given by $a_n = c\cosh^2(2^n\mu)$, $b_n = c\sinh^2(2^n\mu)$, where $\mu = \tanh^{-1}\sqrt{b_0/a_0}$. If $a_0 > 0$, $b_0 < 0$, then the solution of SK3 is given by $a_n = c\cos^2(2^n\mu)$, $b_n = -c\sin^2(2^n\mu)$, where $\mu = \tan^{-1}\sqrt{-b_0/a_0}$.

Theorem 17 Let $c = a_0 - b_0$. If $a_0 > b_0 > 0$, then the solution of SK5 (15) is given by $a_n = c\coth^2(2^n\mu)$, $b_n = c(\coth^2(2^n\mu) - 1)$, where $\mu = \tanh^{-1}\sqrt{c/a_0}$. If $b_0 > a_0 > 0$, then the solution of SK5 is given by $a_n = -c\cot^2(2^n\mu)$, $b_n = -c(\cot^2(2^n\mu) + 1)$, where $\mu = \tan^{-1}\sqrt{-c/a_0}$.

Theorem 18 Let $c = a_0 - b_0$. If $a_0 > 0$, $b_0 < 0$, then the solution of SK6 (17) is given by $a_n = c\tanh^2(2^n\mu)$, $b_n = c(\tanh^2(2^n\mu) - 1)$, where $\mu = \tanh^{-1}\sqrt{a_0/c}$. If $b_0 > a_0 > 0$, then the solution of SK6 is given by $a_n = -c\tan^2(2^n\mu)$, $b_n = -c(\tan^2(2^n\mu) + 1)$, where $\mu = \tan^{-1}\sqrt{-a_0/c}$.

Theorems 16, 17 and 18 can be proved by using the double angle formulae of hyperbolic and trigonometric functions, similarly to the proof for the particular solutions of AHM in [1].
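As a brief illustration of the double angle argument, here is the induction step for the first case of Theorem 16. It is a sketch based on the reduced system (19), with $\theta_n = 2^n\mu$ introduced only for this illustration; it is not the authors' full proof.

```latex
% Initial value: \tanh^2\mu = b_0/a_0 gives
\cosh^2\mu = \frac{1}{1-\tanh^2\mu} = \frac{a_0}{a_0-b_0} = \frac{a_0}{c},
\qquad\text{so}\qquad a_0 = c\cosh^2\theta_0 .

% Induction step with (19) and \cosh 2\theta = 2\cosh^2\theta - 1:
a_{n+1} = \frac{(2a_n-c)^2}{c}
        = c\,\bigl(2\cosh^2\theta_n-1\bigr)^2
        = c\cosh^2(2\theta_n)
        = c\cosh^2\theta_{n+1},
\qquad
b_{n+1} = a_{n+1}-c = c\sinh^2\theta_{n+1}.
```

The remaining cases follow the same pattern with the corresponding double angle formula for $\cos$, $\coth$, $\cot$, $\tanh$ or $\tan$ applied to (19), (20) or (21).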

5. Conclusion

In this paper, we first obtain the general solution of AHM for real and pure imaginary initial values through a conjugacy of the iteration map. We next show that AHM is semiconjugate to SK3, SK5 and SK6 under some conditions. We obtain the general solutions of SK3, SK5 and SK6 from the general solution of AHM. We finally show the behaviors of their solutions. Moreover, we obtain particular solutions in terms of hyperbolic and trigonometric functions for special initial values. Further problems are to obtain solutions of the other types of Sakaki-Kakei equations, and to derive an identity of the hypergeometric function associated with the higher order systems of AHM.

References

[1] Y. Nakamura, Algorithms associated with arithmetic, geometric and harmonic means and integrable systems, J. Comput. Appl. Math., 131 (2001), 161–174.

[2] K. Kondo and Y. Nakamura, Determinantal solutions of solvable chaotic systems, J. Comput. Appl. Math., 145 (2002), 361–372.

[3] T. Sakaki and S. Kakei, Difference equations with an invariant expressed in terms of the hypergeometric function (in Japanese), Trans. JSIAM, 17 (2007), 455–462.

[4] R. L. Devaney, A first course in chaotic dynamical systems: theory and experiment, Addison-Wesley, Reading, Massachusetts, 1992.



A strategy of reducing the inner iteration counts for the variable preconditioned GCR(m) method

Kensuke Aihara1, Emiko Ishiwata2 and Kuniyoshi Abe3

1 Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan

2 Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan

3 Faculty of Economics and Information, Gifu Shotoku University, 1-38 Nakauzura, Gifu-shi, Gifu 500-8288, Japan

E-mail j1409601 ed.kagu.tus.ac.jp

Received February 14, 2010, Accepted May 6, 2010

Abstract

Numerical experiments have shown that a variable preconditioned GCR(m) method using the SOR method is efficient for solving a sparse linear system. However, there are cases in which the residual norm of the variable preconditioned GCR method stagnates. Then the inner iteration counts increase, and more computation time is required. Therefore, we propose a strategy to reduce the inner iteration counts when the residual norm stagnates, by using a certain parameter related to the convergence behavior. Numerical experiments show that our strategy is indeed effective.

Keywords linear systems, GCR method, variable preconditioning, inner iteration counts


1. Introduction

We treat the Krylov subspace (KS) methods for solving a large sparse linear system

Ax = b, (1)

where $A$ is a nonsingular and nonsymmetric $n \times n$ matrix, and the right-hand side vector $b$ is an $n$-vector.

It is known that a preconditioning strategy enhances the convergence of KS methods. In conventional preconditioned KS methods, a preconditioner $K$ is constructed such that $K$ approximates $A$ ($K \approx A$) and $K^{-1}v$ can be computed easily, where $v$ is an iteration vector in KS. On the other hand, variable preconditioning, in which a different preconditioner can be applied at each iteration, has been proposed. The preconditioning is performed by roughly solving $Az = v$ in order to obtain an approximation to $A^{-1}v$ instead of computing $K^{-1}v$. As alternatives to the generalized minimal residual (GMRES) method with variable preconditioning by a KS method, the FGMRES [1] and the GMRESR [2] methods have been developed. A variable preconditioned generalized conjugate residual (VPGCR) method using the successive over-relaxation (SOR) method has recently been proposed in [3,4]. It has been reported that SOR is more efficient than the KS methods as a solver applied to the system $Az = v$. However, there are cases in which the residual norm of VPGCR stagnates regardless of the choice of the solver for $Az = v$. Then the total number of iterations for solving $Az = v$ increases, and more computation time is required than in the cases where stagnation does not occur.

For the class of bi-conjugate gradient stabilized (BiCGstab) methods, the accuracy of the BiCG coefficients has been identified as the factor behind the loss of convergence speed [5]. However, for the GCR method, a relation between a recurrence coefficient and the convergence behavior has not previously been examined. Therefore, in this paper, we focus on the recurrence coefficient which is essential to update the residual vectors $r_k$ in GCR, and describe its relationship to the convergence behavior. Then we propose a strategy to reduce the number of iterations for solving $Az = v$ when the residual norm stagnates. Numerical experiments demonstrate that our strategy is more efficient than the original VPGCR using SOR when the residual norm stagnates.

2. GCR(m) method with a variable preconditioning

By applying a preconditioner $K$ from the right to the linear system (1), we have the right preconditioned linear system

\[ (AK^{-1})(Kx) = b. \tag{2} \]

In general, the preconditioned GCR(m) algorithm can be derived from the right preconditioned linear system (2), and $K^{-1}r$ is computed as the preconditioning. Here $m$ is the restart cycle. The variable preconditioning proposed in [3, 4] is performed by roughly solving a linear system

\[ Az = r \tag{3} \]

by some iterative method to obtain an approximation to $A^{-1}r$ instead of computing $K^{-1}r$. The variable preconditioned GCR(m) algorithm is described as follows:


JSIAM Letters Vol. 2 (2010) pp.77–80 Kensuke Aihara et al.

Variable Preconditioned GCR(m) algorithm:

1. Let $x_0$ be an initial guess
2. repeat
3. set $r_0 = b - Ax_0$
4. roughly solve $Az_0 = r_0$ using some iterative method to get $z_0$ and $p_0 = z_0$
5. set $q_0 = Ap_0$
6. for $k = 0, 1, \dots, m - 1$
7. $\rho_k = (r_k, q_k)$
8. $\sigma_k = (q_k, q_k)$
9. $\alpha_k = \rho_k/\sigma_k$
10. $x_{k+1} = x_k + \alpha_k p_k$
11. $r_{k+1} = r_k - \alpha_k q_k$
12. if $\|r_{k+1}\|_2/\|r_0\|_2 \le \varepsilon_{TOL}$ then stop
13. roughly solve $Az_{k+1} = r_{k+1}$ using some iterative method to get $z_{k+1}$
14. $\tau_i = (Az_{k+1}, q_i)$, $(i \le k)$
15. $\beta_{k,i} = -\tau_i/\sigma_i$, $(i \le k)$
16. $p_{k+1} = z_{k+1} + \sum_{i=0}^{k} \beta_{k,i} p_i$
17. $q_{k+1} = Az_{k+1} + \sum_{i=0}^{k} \beta_{k,i} q_i$
18. end for
19. $x_0 = x_m$
20. end repeat

The iterative loops for solving the linear systems (1) and (3) are referred to as the outer-loop and the inner-loop, respectively. The outer-loop is stopped when the relative residual norm becomes at most $\varepsilon_{TOL}$.

SOR is effective as the solver applied to the inner-loop, as shown by numerical experiments [3, 4]. Therefore, in this paper, we apply SOR to the inner-loop, and adopt the stopping criteria mentioned in [3, 4]:

1. $\|z^{(l)}_{k+1} - z^{(l-1)}_{k+1}\|_\infty / \|z^{(l)}_{k+1}\|_\infty \le \delta$

2. (The maximum inner iteration counts $l$) $= N_{max}$.

Here $z^{(l)}_k$ denotes the $l$-th approximation for computing $Az_k = r_k$ at the $k$-th iteration of the outer-loop. The inner-loop is stopped when either condition 1 or 2 is satisfied.
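The inner-loop with its two stopping criteria can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sor_inner`, the dense list-of-lists storage and the default values are assumptions made here for readability (a practical code would use a sparse format), and the iteration starts from the zero vector.

```python
def sor_inner(A, r, omega=1.2, delta=1e-2, n_max=50):
    """Roughly solve A z = r by SOR, starting from z = 0.

    Stops when ||z_new - z_old||_inf / ||z_new||_inf <= delta (criterion 1)
    or after n_max sweeps (criterion 2). Returns (z, number of sweeps).
    """
    n = len(r)
    z = [0.0] * n
    for sweep in range(1, n_max + 1):
        diff = 0.0
        for i in range(n):
            # Sum over off-diagonal entries, using already updated components.
            s = sum(A[i][j] * z[j] for j in range(n) if j != i)
            z_new = (1.0 - omega) * z[i] + omega * (r[i] - s) / A[i][i]
            diff = max(diff, abs(z_new - z[i]))
            z[i] = z_new
        norm = max(abs(v) for v in z)
        if norm == 0.0 or diff <= delta * norm:   # criterion 1
            return z, sweep
    return z, n_max                               # criterion 2

if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    z, sweeps = sor_inner(A, [1.0, 2.0, 3.0])
    print(z, sweeps)
```

With a loose tolerance such as $\delta = 10^{-2}$ the returned $z$ is only a rough approximation to $A^{-1}r$, which is all that the variable preconditioning requires.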

At each step, the main costs of the algorithm are a matrix-vector product, solving $Az_{k+1} = r_{k+1}$ (i.e. the inner-loop), computing $p_{k+1}$ as a linear combination of $z_{k+1}$ and all previous $p_i$'s, and similarly computing $q_{k+1}$.

3. A strategy of reducing the inner iteration counts

There are cases in which the residual norm of VPGCR stagnates regardless of the choice of the inner solver. When the residual norm stagnates, the total number of iterations required for the inner-loop increases, and the computational cost becomes higher. Note that most of the computation time for VPGCR is spent in the inner-loop if the number of iterations of the inner-loop is much larger than that of the outer-loop.

In this section, we introduce a certain parameter related to the convergence behavior. Then we propose a strategy for more efficient execution of the variable preconditioning when the residual norm stagnates. The basic idea is to reduce the number of inner iterations by partly omitting the inner-loop when the residual norm stagnates.

3.1 The relation between a recurrence coefficient and convergence behavior

For the class of BiCGstab methods, it has been shown that the accuracy of the BiCG coefficients influences the convergence speed [5]. The effect of rounding errors that arise from the BiCG coefficients has been analyzed, and a strategy for a more stable determination of the coefficients has also been proposed in [5]. However, to the best of our knowledge, the relation between a recurrence coefficient and the convergence behavior has not been studied for methods based on the minimal residual approach like GCR. Therefore we define a new parameter $\tilde{\rho}_k$ to examine the convergence behavior of GCR as follows:

\[ \tilde{\rho}_k \equiv \frac{\rho_k}{\|r_k\|_2 \|q_k\|_2}. \tag{4} \]

The value of (4) is the cosine of the angle between the residual vector $r_k$ and the vector $q_k$ used in the calculation of the recurrence coefficient $\alpha_k$. If the residual vector $r_k$ is nearly orthogonal to the vector $q_k$ (i.e. if the normalized parameter $\tilde{\rho}_k$ of (4) satisfies $|\tilde{\rho}_k| \approx 0$) at each iteration, the recurrence coefficient $\alpha_k \approx 0$. The residual vector of GCR is expressed by

\[ r_{k+1} = R_{k+1}(A) r_0, \]

where $R_{k+1}(\lambda)$ is the residual polynomial of degree $k + 1$, with $R_{k+1}(0) = 1$ (cf. [6]). The leading coefficient of $R_{k+1}(\lambda)$ is given by $\prod_{i=0}^{k} \alpha_i$. In computations with finite precision arithmetic, the degree of the polynomial $R_{k+1}(\lambda)$ will therefore effectively remain unchanged if $|\tilde{\rho}_k| \approx 0$. This leads to the stagnation of the residual norm.

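We show the theoretical relation between the normalized parameter $\tilde{\rho}_k = \rho_k/(\|r_k\|_2\|q_k\|_2)$ and the residual vector. The recurrence coefficient $\alpha_k$ can be written as

\[ |\alpha_k| = \frac{|\rho_k|}{\sigma_k} = \frac{1}{(q_k, q_k)}\, |\tilde{\rho}_k|\, \|r_k\|_2 \|q_k\|_2 = |\tilde{\rho}_k| \frac{\|r_k\|_2}{\|q_k\|_2}. \]

Since the residual vector $r_{k+1}$ is updated by

\[ r_{k+1} = r_k - \alpha_k q_k, \]

the difference between $r_{k+1}$ and $r_k$ is expressed by

\[ \|r_{k+1} - r_k\|_2 = |\alpha_k| \|q_k\|_2 = |\tilde{\rho}_k| \|r_k\|_2. \]

As a result, the relation between $\tilde{\rho}_k$ and the residual vector is expressed as

\[ |\tilde{\rho}_k| = \frac{\|r_{k+1} - r_k\|_2}{\|r_k\|_2}. \]

$|\tilde{\rho}_k|$ is the absolute value of the cosine of the angle between the residual vector $r_k$ and the vector $q_k$, and also indicates the relative rate of the residual norm reduction. Therefore $|\tilde{\rho}_k|$ is suitable for examining the convergence behavior of GCR.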
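The identity above can be checked on a single update step. The sketch below uses a hypothetical helper `gcr_step_quantities`, not taken from the paper, which computes the normalized parameter of (4), the relative change $\|r_{k+1} - r_k\|_2/\|r_k\|_2$, and the reduction factor $\|r_{k+1}\|_2/\|r_k\|_2$. Because $r_{k+1}$ is orthogonal to $q_k$ by the choice of $\alpha_k$, the reduction factor equals $\sqrt{1 - \tilde{\rho}_k^2}$, which is why a value near 0 means stagnation and a value near 1 means a large reduction.

```python
import math

def gcr_step_quantities(r, q):
    """One GCR residual update r_next = r - alpha*q with alpha = (r,q)/(q,q).

    Returns (rho_tilde, ||r_next - r|| / ||r||, ||r_next|| / ||r||).
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    rho = sum(x * y for x, y in zip(r, q))      # (r_k, q_k)
    sigma = sum(y * y for y in q)               # (q_k, q_k)
    alpha = rho / sigma
    r_next = [x - alpha * y for x, y in zip(r, q)]
    rho_tilde = rho / (norm(r) * math.sqrt(sigma))
    step_ratio = norm([a - b for a, b in zip(r_next, r)]) / norm(r)
    reduction = norm(r_next) / norm(r)
    return rho_tilde, step_ratio, reduction

if __name__ == "__main__":
    rho_tilde, step_ratio, reduction = gcr_step_quantities([3.0, 4.0], [1.0, 0.0])
    print(rho_tilde, step_ratio, reduction)
    print(math.sqrt(1.0 - rho_tilde ** 2))      # same value as `reduction`
```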

3.2 Reducing the inner iteration counts

As stated in the preceding section, in finite precision computation the residual norm stagnates if the normalized parameter $\tilde{\rho}_k = \rho_k/(\|r_k\|_2\|q_k\|_2)$ satisfies $|\tilde{\rho}_k| \approx 0$, which implies that the residual vectors cannot be updated. The approximation $z_{k+1}$ obtained in the inner-loop is then also almost unchanged from $z_k$. In this case, since the current vector $p_{k+1}$ is linearly dependent on the basis vector $p_k$, the $(k+1)$-th Krylov subspace is equal to the $k$-th Krylov subspace (in finite precision). Computing (3) in the inner-loop is then useless as a preconditioning. Therefore we propose to omit the inner-loop when the following condition is satisfied:

\[ |\tilde{\rho}_k| < \eta, \tag{5} \]

where $0 < \eta < 1$ is a threshold value. We incorporate the condition (5) into the VPGCR algorithm by rewriting lines 9 and 13 as follows:

9. $\alpha_k = \rho_k/\sigma_k$, $\tilde{\rho}_k = \rho_k/(\|r_k\|_2 \sqrt{\sigma_k})$
13. if $|\tilde{\rho}_k| \ge \eta$ then
      roughly solve $Az_{k+1} = r_{k+1}$ using some iterative method to get $z_{k+1}$
    else $z_{k+1} = r_{k+1}$.

Here, $\tilde{\rho}_k$ can be computed by reusing $\rho_k$, $\sigma_k$ and $\|r_k\|_2$, at the cost of only one square root and two scalar operations.

Table 1. Characteristics of coefficient matrices.

Matrix        Dimension    nnz entries    Distribution
rdb1250       1250         7300           4.0E−00 ∼ 9.1E+01
airfoil 2d    14214        259688         2.8E−07 ∼ 1.8E+05

nnz: the number of nonzero entries.

Fig. 1. Convergence history for rdb1250 (horizontal axis: number of iterations; vertical axis: relative residual 2-norm; curves: GCR(50), VPGCR(50)[SOR], $10^{-2}$-VPGCR(50)[SOR]).

Fig. 2. History of $|\tilde{\rho}_k| = |\rho_k|/(\|r_k\|_2\|q_k\|_2)$ for rdb1250.
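The algorithm with the modified lines 9 and 13 can be sketched in pure Python as follows. This is an illustrative reading of the pseudocode, not the authors' Java code: the names `eta_vpgcr`, `sor`, `dot` and `matvec`, the dense list-of-lists storage, the `max_restarts` cap, and the choice to measure convergence against the initial residual norm with the zero initial guess are all assumptions made here.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def sor(A, r, omega, delta, n_max):
    """Inner-loop: roughly solve A z = r by SOR from z = 0; returns (z, sweeps)."""
    n = len(r)
    z = [0.0] * n
    for sweep in range(1, n_max + 1):
        diff = 0.0
        for i in range(n):
            s = sum(A[i][j] * z[j] for j in range(n) if j != i)
            new = (1.0 - omega) * z[i] + omega * (r[i] - s) / A[i][i]
            diff = max(diff, abs(new - z[i]))
            z[i] = new
        if diff <= delta * max(abs(v) for v in z):
            break
    return z, sweep

def eta_vpgcr(A, b, m=50, eta=1e-3, tol=1e-12, omega=1.2,
              delta=1e-2, n_max=50, max_restarts=100):
    """eta-VPGCR(m)[SOR] sketch; returns (x, total inner SOR sweeps). Assumes b != 0."""
    x = [0.0] * len(b)
    r0_norm = math.sqrt(dot(b, b))           # initial residual norm for x = 0
    inner_total = 0
    for _ in range(max_restarts):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        z, used = sor(A, r, omega, delta, n_max)
        inner_total += used
        P, Q, S = [z], [matvec(A, z)], []
        for k in range(m):
            p, q = P[k], Q[k]
            rho, sigma = dot(r, q), dot(q, q)
            S.append(sigma)
            alpha = rho / sigma
            # Modified line 9: normalized parameter of (4).
            rho_tilde = rho / (math.sqrt(dot(r, r)) * math.sqrt(sigma))
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            r = [ri - alpha * qi for ri, qi in zip(r, q)]
            if math.sqrt(dot(r, r)) <= tol * r0_norm:
                return x, inner_total
            # Modified line 13: skip the inner-loop when |rho_tilde| < eta.
            if abs(rho_tilde) >= eta:
                z, used = sor(A, r, omega, delta, n_max)
                inner_total += used
            else:
                z = list(r)
            Az = matvec(A, z)
            betas = [-dot(Az, Q[i]) / S[i] for i in range(k + 1)]
            P.append([zj + sum(bt * P[i][j] for i, bt in enumerate(betas))
                      for j, zj in enumerate(z)])
            Q.append([aj + sum(bt * Q[i][j] for i, bt in enumerate(betas))
                      for j, aj in enumerate(Az)])
    return x, inner_total
```

Setting `eta` close to 0 recovers the original VPGCR(m)[SOR] behavior, since the inner-loop is then almost never skipped; the returned sweep count makes it possible to compare the inner iteration counts for different thresholds.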


4. Numerical experiments

In this section we present some numerical experiments on model problems with nonsymmetric matrices. GCR(m) without preconditioning, VPGCR(m) using SOR, and the alternative implementation with the threshold value $\eta$ proposed in the preceding section are abbreviated as GCR(m), VPGCR(m)[SOR] and $\eta$-VPGCR(m)[SOR], respectively.

Fig. 3. Convergence history for airfoil 2d (vertical axis: relative residual 2-norm; curves include GCR(50) and VPGCR(50)[SOR]).

Fig. 4. History of $|\tilde{\rho}_k| = |\rho_k|/(\|r_k\|_2\|q_k\|_2)$ for airfoil 2d.

4.1 Computational condition

Numerical calculations were carried out in double-precision floating-point arithmetic on a PC (Intel(R) Core(TM)2 Duo T8100 2.10GHz CPU) with a java1.6.001 compiler. We take matrices from Tim Davis's Sparse Matrix Collection [7] for which the residual norm of VPGCR(m)[SOR] stagnates. Table 1 lists the dimension, the number of nonzero entries and the range of the absolute values of the nonzero entries of the matrices.

The iterations of the inner-loop and the outer-loop were started with the zero vector, and the right-hand side vector $b$ was given by substituting the vector $x^* = (1, \dots, 1)^T$ into the equation $b = Ax^*$. The stopping criterion of the outer-loop was $\|r_k\|_2/\|r_0\|_2 \le \varepsilon_{TOL} = 10^{-12}$. Moreover, the parameters $\delta$ and $N_{max}$ in the stopping criteria of the inner-loop were set to $10^{-2}$ and 50, respectively. The relaxation parameter $\omega$ was set to the optimal value, searched in increments of 0.1, for VPGCR(m)[SOR]. The $\eta$-VPGCR(m)[SOR] was carried out with the threshold values $\eta = 10^{-2}$, $10^{-3}$ and $10^{-5}$. The restart cycle $m$ was set to 50.

The convergence histories for matrices rdb1250 and airfoil 2d are displayed in Figs. 1 and 3, respectively. The histories of the normalized parameter $|\tilde{\rho}_k|$ for matrices rdb1250 and airfoil 2d are displayed in Figs. 2 and 4, respectively. All figures show the number of iterations on the horizontal axis. The vertical axis shows the relative residual 2-norm ($\|r_k\|_2/\|r_0\|_2$) in Figs. 1 and 3, and $|\tilde{\rho}_k| = |\rho_k|/(\|r_k\|_2\|q_k\|_2)$ in Figs. 2 and 4.

Tables 2 and 3 show the number of iterations of the outer-loop required for successful convergence (with the total number of iterations of the inner-loop in parentheses), the computation time (with the value of the relaxation parameter $\omega$ used in SOR in parentheses) and the explicitly computed relative residual norm $\|b - Ax_k\|_2/\|b - Ax_0\|_2$, which are abbreviated as "Iterations", "Time[sec]" and "Residual", respectively.

Table 2. Numerical results for rdb1250.

Solver[Preconditioning]       Iterations     Time[sec]      Residual
GCR(50)                       369            0.29           9.4E−13
VPGCR(50)[SOR]                59 (2950)      0.39 (1.2)     5.5E−13
$10^{-2}$-VPGCR(50)[SOR]      62 (1400)      0.24 (1.2)     5.8E−13
$10^{-3}$-VPGCR(50)[SOR]      59 (1300)      0.23 (1.2)     5.5E−13
$10^{-5}$-VPGCR(50)[SOR]      59 (1450)      0.24 (1.2)     5.5E−13

Table 3. Numerical results for airfoil 2d.

Solver[Preconditioning]       Iterations     Time[sec]      Residual
GCR(50)                       †              –              1.0E−00
VPGCR(50)[SOR]                106 (5300)     12.66 (1.8)    7.0E−11
$10^{-2}$-VPGCR(50)[SOR]      61 (1084)      3.28 (1.8)     7.5E−11
$10^{-3}$-VPGCR(50)[SOR]      106 (1545)     4.75 (1.8)     5.0E−11
$10^{-5}$-VPGCR(50)[SOR]      106 (2200)     6.11 (1.8)     5.1E−11

† : No convergence

4.2 Numerical results for matrix rdb1250

From Figs. 1, 2 and Table 2, we can observe the following. The residual norm of GCR(50) decreases slowly. The residual norm of VPGCR(50)[SOR] decreases, but after the 15th step it stagnates until the iteration is restarted. Notice that the normalized parameter satisfies $|\tilde{\rho}_k| \approx 0$ while the residual norm stagnates, and in contrast $|\tilde{\rho}_k| \approx 1$ while the residual norm is reduced. The results indicate that $|\tilde{\rho}_k|$ reflects the convergence behavior. Note that the value of $|\tilde{\rho}_k|$ was modified by the restart.

The number of iterations of the outer-loop required for successful convergence of $\eta$-VPGCR(50)[SOR] is about the same as that for VPGCR(50)[SOR]. In the case of $\eta = 10^{-2}$, the convergence history of the residual norm after the restart is slightly different from that of VPGCR(50)[SOR]. The number of iterations of the inner-loop for $\eta$-VPGCR(50)[SOR] is about 50% of that for VPGCR(50)[SOR]. The convergence behavior of $\eta$-VPGCR(50)[SOR] is similar to that of VPGCR(50)[SOR], even though the inner-loop is omitted for $|\tilde{\rho}_k| < \eta$. The computation time for each $\eta$-VPGCR(50)[SOR] is at most 62% of that for VPGCR(50)[SOR].

Note that the number of iterations of the outer-loop required for successful convergence of VPGCR(50)[SOR] is less than that of GCR(50). However, the stagnation of the residual norm causes an unnecessary increase in the number of iterations of the inner-loop, so that VPGCR(50)[SOR] requires more computation time than GCR(50). $\eta$-VPGCR(50)[SOR] reduces the number of iterations of the inner-loop sufficiently, and is therefore superior to both GCR(50) and VPGCR(50)[SOR] in terms of computation time.

4.3 Numerical results for matrix airfoil 2d

From Figs. 3, 4 and Table 3, we can observe the following. The residual norm of GCR(50) stagnates even after 20000 iterations. The residual norm of VPGCR(50)[SOR] decreases, but stagnates twice. The restart modifies the normalized parameter $|\tilde{\rho}_k|$, which implies that the stagnation is cured.

The convergence histories of the residual norm of $\eta$-VPGCR(50)[SOR] with $\eta = 10^{-3}$ and $10^{-5}$ are similar to that of VPGCR(50)[SOR]. For each of these methods, the number of iterations of the outer-loop required for successful convergence is the same. The number of iterations of the inner-loop for $\eta$-VPGCR(50)[SOR] is much smaller than that for VPGCR(50)[SOR]. Consequently the computation times for $\eta$-VPGCR(50)[SOR] with $\eta = 10^{-3}$ and $10^{-5}$ are at most 38% and 49% of that for VPGCR(50)[SOR], respectively. Thus $\eta$-VPGCR(50)[SOR] is still better. On the other hand, the convergence history of the residual norm of $\eta$-VPGCR(50)[SOR] with $\eta = 10^{-2}$ is different from that of VPGCR(50)[SOR]. The residual norm stagnates until about the 40th step, but after that it decreases smoothly. These results indicate that the convergence behavior of VPGCR is not affected by omitting the inner-loop if $|\tilde{\rho}_k|$ is sufficiently small.


5. Conclusion

We have defined a normalized parameter $\tilde{\rho}_k = \rho_k/(\|r_k\|_2\|q_k\|_2)$ which relates the recurrence coefficient to the convergence behavior of GCR. Then, by using this parameter, we have proposed a strategy to reduce the number of iterations of the inner-loop when the residual norm stagnates. Numerical experiments show that the number of iterations of the inner-loop and the computation time are reduced when the residual norm stagnates. A practical choice of the threshold value $\eta$ is about $10^{-3}$. As future work, we will apply the strategy of reducing the number of iterations of the inner-loop to other variable preconditioned KS methods, and further investigate the theoretical relation between $\tilde{\rho}_k$ and the restart.

Acknowledgments

The authors thank the reviewer for the careful reading and helpful suggestions. The authors would also like to thank Prof. Gerard L.G. Sleijpen for helpful advice.

References

[1] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM J. Sci. Comput., 14 (1993), 461–469.

[2] H. A. van der Vorst and C. Vuik, GMRESR: A family of nested GMRES methods, Numer. Lin. Alg. Appl., 1 (1994), 369–386.

[3] K. Abe, S.-L. Zhang, H. Hasegawa and R. Himeno, A SOR-base Variable Preconditioned GCR Method (in Japanese), Trans. JSIAM, 11 (2001), 157–170.

[4] K. Abe and S.-L. Zhang, A variable preconditioning using the SOR method for GCR-like methods, Int. J. Numer. Anal. Model., 2 (2005), 147–161.

[5] G. L. G. Sleijpen and H. A. van der Vorst, Maintaining convergence properties of BiCGstab methods in finite precision arithmetic, Numer. Algorithms, 10 (1995), 203–223.

[6] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS, Boston, 1996.

[7] T. Davis, Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices/.



On a knapsack based cryptosystem using real quadratic and cubic fields

Keiichiro Nishimoto1 and Ken Nakamula1

1 Mathematics and Information Sciences, Graduate School of Science and Engineering, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, Tokyo 192-0397, Japan

E-mail nishimoto-keiichiro ed.tmu.ac.jp

Received December 21, 2009, Accepted March 20, 2010

Abstract

In [1], a knapsack based cryptosystem using number fields is proposed as a scheme of quantum public key cryptosystems. We studied the key generation of this scheme in the case of imaginary quadratic fields in [2]. In this paper, we study the cases of real quadratic fields and cubic fields. We first give some propositions for practical key generation. We then estimate various densities of the generated knapsack problems for these cases and for the imaginary quadratic case. We further generate explicit public keys and knapsack problems for several special cases and test their resistance against low-density attacks.


1. Introduction

Shor proved in [3] that the integer factoring problem and the discrete logarithm problem (DLP) can be solved in polynomial time by using a quantum Turing machine (QTM). Therefore, public-key cryptosystems based on these problems will not be secure once a QTM is realized. The concept of a quantum public-key cryptosystem (QPKC), together with a concrete scheme (OTU2000) in [1], gives the first answer to this problem. OTU2000, a knapsack based cryptosystem, seems to be secure even against QTM adversaries. QTMs are needed only to solve the DLP in number fields when generating public keys from private keys. We are interested in generating keys without QTMs and in estimating the security of the scheme. We gave practical key generation algorithms for imaginary quadratic fields in [2].

The purpose of this paper is to give some propositions and to report the results for OTU2000 over real quadratic and cubic fields. As a consequence, we can efficiently generate explicit public keys such that low-density attacks almost always fail at the stage of solving the shortest vector problem.

In Section 2, we generalize the practical key generation algorithm for imaginary quadratic fields in [2] to arbitrary fields. In Section 3, we give important propositions to implement OTU2000 over real quadratic fields and cubic fields. In Section 4, we show experimental results on various densities of the generated knapsack problem and study the resistance against low-density attacks. In Section 5, we discuss conclusions and future problems.

2. Key generation of OTU2000

First, we generalize the key generation algorithm in [2]. Let $K$ be a number field defined by a monic irreducible polynomial $f \in \mathbb{Z}[x]$ of degree $r$, let $O_K$ be the ring of integers of $K$, and let $\omega_1 := 1, \omega_2, \dots, \omega_r$ form an integral basis of $K$. We also define a subset $A_t$ of $O_K$ by

\[ A_t := \left\{ z_1\omega_1 + \cdots + z_r\omega_r \;\middle|\; z_i \in \mathbb{Z},\ -\frac{t}{2} \le z_i \le \frac{t}{2} \right\}. \tag{1} \]

Algorithm 1 Given $(n, k, f)$, this algorithm outputs a private key $(f, g, e, p, S)$ and a public key $(n, k, b)$.

1. Choose $\ell \in \mathbb{Z}$ suitably, and let $P$ be the set of prime elements of $K$ in $A_{2\ell}$.

2. Randomly take a subset $S = \{S_1, \dots, S_n\}$ of $n$ non-associate elements of $P$.

3. Choose a rational odd prime number $p$ so that $pO_K$ is a prime ideal of $O_K$ satisfying the following condition (2), and randomly choose $g \in O_K$ such that $\langle g \pmod{pO_K} \rangle = (O_K/pO_K)^\times$.

\[ \prod_{j=1}^{k} S_{i_j} \in A_p \quad \text{for all } \{S_{i_1}, \dots, S_{i_k}\} \subset S. \tag{2} \]

4. Randomly choose $e \in \mathbb{Z}$ with $0 \le e \le p^r - 2$. For each $i$ from 1 to $n$, compute $a_i$ such that $g^{a_i} \equiv S_i \pmod{pO_K}$ and compute $b_i \equiv a_i + e \pmod{p^r - 1}$. Set $b := (b_1, \dots, b_n)$. Then output the private key $(f, g, e, p, S)$ and the public key $(n, k, b)$.

Remark 2 In practice, we implemented the algorithm and ran our experiments under the following settings. We set $\ell$ to be the smallest value such that $A_{2\ell}$ has at least $n$ non-associate prime elements of $K$. If $q$ is the smallest $p$ satisfying (2), we take $p$ in step 3 randomly between $q$ and $2q$. Then $S$ is almost determined by $\omega_i$ and $n$, but this does not cause a problem since $K$ and $p$ are still hidden.

3. Real quadratic fields and cubic fields

We first give important propositions to implement OTU2000 over real quadratic fields and cubic fields. Unlike the imaginary quadratic case, there are two

JSIAM Letters Vol. 2 (2010) pp.81–84 Keiichiro Nishimoto et al.

difficulties in Algorithm 1 when we take S and p. Oneis how to judge associate in P . It is easy to judge as-sociate in imaginary quadratic fields since the group ofunits is finite and simple. In order to judge associate inother fields, there are so many α, β ∈ A2ℓ with the samenorm, and it is necessary to compute and see whetherα/β ∈ OK for each such pair, since the group of units isinfinite. Hence, we change the step 1 as follows:

1. Let $P = \emptyset$. Repeat the following, increasing $\ell$, until $P$ has at least $n$ elements. For each $\pi \in A_{2\ell}$, if $\pi$ is a prime element with norm coprime to any element of $P$, then replace $P$ by $P \cup \{\pi\}$. For each $\pi \in P$, if $\pi'$ is a non-associate conjugate of $\pi$ which belongs to $A_{2\ell}$, then replace $P$ by $P \cup \{\pi'\}$.

Then $P$ does not have associate elements.

The second difficulty is how to verify condition (2). There is a simple sufficient condition for (2) in imaginary quadratic fields using norms [1,2]. With it, to choose $p$, we may compute norms $n$ times instead of computing multiplications $\binom{n}{k}$ times. There is, however, no such sufficient condition for real quadratic fields and cubic fields. Therefore, we propose several sufficient conditions in the following.

3.1 The case of real quadratic fields

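Proposition 3 Let $K = \mathbb{Q}(\sqrt{\theta})$ be a real quadratic field and $O_K = \mathbb{Z}[\omega]$ be the ring of integers of $K$, where $\theta$ is a square-free positive integer, $\omega = (1 + \sqrt{\theta})/2$ if $\theta \equiv 1 \pmod 4$ and $\omega = \sqrt{\theta}$ otherwise. For $z_{ij} \in \mathbb{Z}$ and $z_i \in \mathbb{Z}_{>0}$, write $\prod_{j=1}^{k}(z_{1j} + z_{2j}\omega) = X_1 + X_2\omega$ and $(z_1 + z_2\omega)^k = X_1' + X_2'\omega$ with $X_i, X_i' \in \mathbb{Z}$. Assume $|z_{ij}| \le z_i$ $(i = 1, 2,\ j = 1, \ldots, k)$. Then $|X_i| \le X_i'$ $(i = 1, 2)$.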
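Proposition 3 can be checked numerically on random inputs. The sketch below is only an illustration, not a proof: it represents $a + b\omega$ as a pair `(a, b)`, uses the relation $\omega^2 = \theta$ (or $\omega^2 = \omega + (\theta - 1)/4$ when $\theta \equiv 1 \pmod 4$), and compares random products with the power $(z_1 + z_2\omega)^k$. All function names are hypothetical.

```python
import random

def make_mul(theta):
    """Multiplication of pairs (a, b) = a + b*omega in Z[omega], K = Q(sqrt(theta))."""
    if theta % 4 == 1:
        # omega = (1 + sqrt(theta)) / 2 satisfies omega^2 = omega + (theta - 1) / 4
        t, n = 1, (theta - 1) // 4
    else:
        # omega = sqrt(theta) satisfies omega^2 = theta
        t, n = 0, theta

    def mul(x, y):
        a, b = x
        c, d = y
        # (a + b w)(c + d w) = ac + (ad + bc) w + bd w^2, with w^2 = t*w + n
        return (a * c + b * d * n, a * d + b * c + b * d * t)

    return mul

def product(elems, mul):
    acc = (1, 0)
    for e in elems:
        acc = mul(acc, e)
    return acc

def check_prop3(theta, z1, z2, k, trials=200, seed=0):
    """Compare random products with |z_1j| <= z1, |z_2j| <= z2 against (z1 + z2 w)^k."""
    rng = random.Random(seed)
    mul = make_mul(theta)
    bound = product([(z1, z2)] * k, mul)
    for _ in range(trials):
        elems = [(rng.randint(-z1, z1), rng.randint(-z2, z2)) for _ in range(k)]
        X = product(elems, mul)
        if abs(X[0]) > bound[0] or abs(X[1]) > bound[1]:
            return False
    return True
```

The bound holds because both structure constants of $\omega^2$ in the basis $1, \omega$ are non-negative, so replacing every coefficient by its upper bound can only increase each coordinate of the product.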

Proposition 3 means that the product of $k$ integers in $A_{2\ell}$ always belongs to $A_p$ if the condition

$$(\ell + \ell\omega)^k \in A_p \tag{3}$$

holds. Namely, (3) is a sufficient condition for (2). Therefore, we can choose $p$ by one power computation. The size of $p$, however, generally grows as $\theta$ grows. So we propose the following proposition to make the size of $p$ as small as possible.

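Proposition 4 Let $K$, $O_K$ and $\omega$ be as above. Write $(c + \omega/c)^k = X_{1,c} + X_{2,c}\,\omega$, where $c$ is a positive integer. Put $c_m := \lfloor \sqrt{\omega} + 0.5 \rfloor$. Then the minimum $X_{i,c}$ is $X_{i,c_m}$ if $\omega = \sqrt{\theta}$ and is $X_{i,c_m}$ or $X_{i,c_m \pm 1}$ otherwise $(i = 1, 2)$.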

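By Proposition 4, if we let $P$ be a subset of the set $\{z_1 + z_2\omega \mid z_i \in \mathbb{Z},\ |z_1| \le \ell c,\ |z_2| \le \lfloor \ell/c \rfloor\}$ instead of the set $A_{2\ell}$ in step 1, then the condition

$$\left( \ell c + \left\lfloor \frac{\ell}{c} \right\rfloor \omega \right)^k \in A_p \quad (c = \lfloor \sqrt{\omega} + 0.5 \rfloor) \tag{4}$$

is a sufficient condition for (2). A similar refinement is possible for imaginary quadratic fields, too.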
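To see how conditions (3) and (4) translate into a lower bound on $p$, the following sketch computes the coefficients of the relevant $k$-th power and returns twice the larger one, since membership in $A_p$ means that each coefficient is at most $p/2$ in absolute value. The function names are hypothetical, and the sample parameters $(\theta, \ell, k) = (74, 6, 3)$ are chosen only for illustration.

```python
import math

def power_coeffs(z1, z2, k, theta):
    """Coefficients (X1, X2) of (z1 + z2*omega)^k in Z[omega], K = Q(sqrt(theta))."""
    if theta % 4 == 1:
        t, n = 1, (theta - 1) // 4      # omega^2 = omega + (theta - 1)/4
    else:
        t, n = 0, theta                 # omega^2 = theta
    X1, X2 = 1, 0
    for _ in range(k):
        # (X1 + X2 w)(z1 + z2 w) with w^2 = t*w + n
        X1, X2 = X1 * z1 + X2 * z2 * n, X1 * z2 + X2 * z1 + X2 * z2 * t
    return X1, X2

def min_p_bound(z1, z2, k, theta):
    """Smallest q such that (z1 + z2*omega)^k lies in A_p for every p >= q."""
    return 2 * max(power_coeffs(z1, z2, k, theta))

def compare_conditions(theta, ell, k):
    """Return (c, bound from condition (3), bound from condition (4))."""
    omega = (1 + math.sqrt(theta)) / 2 if theta % 4 == 1 else math.sqrt(theta)
    c = math.floor(math.sqrt(omega) + 0.5)
    cond3 = min_p_bound(ell, ell, k, theta)
    cond4 = min_p_bound(ell * c, ell // c, k, theta)
    return c, cond3, cond4

print(compare_conditions(74, 6, 3))
```

For this sample input the rescaled box of condition (4) admits a smaller $p$ than the square box of condition (3), in line with the motivation given for Proposition 4.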

3.2 The case of cubic fields

We can generalize Proposition 3 to cubic fields as follows:

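Proposition 5 Let $K = \mathbb{Q}(\omega)$ be a cubic field such that the ring of integers of $K$ is $O_K = \langle 1, \omega, \omega^2 \rangle$, where $\omega$ is a root of $f(x) = x^3 + a_2x^2 + a_1x + a_0$ with $a_0, a_1, a_2 \le 0$ irreducible over $\mathbb{Q}$. For $z_{ij} \in \mathbb{Z}$ and $z_i \in \mathbb{Z}_{>0}$, write $\prod_{j=1}^{k}(z_{1j} + z_{2j}\omega + z_{3j}\omega^2) = X_1 + X_2\omega + X_3\omega^2$ and $(z_1 + z_2\omega + z_3\omega^2)^k = X_1' + X_2'\omega + X_3'\omega^2$ with $X_i, X_i' \in \mathbb{Z}$. Assume $|z_{ij}| \le z_i$ $(i = 1, 2, 3,\ j = 1, \ldots, k)$. Then $|X_i| \le X_i'$ $(i = 1, 2, 3)$.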

Proposition 5 is not so practical since $O_K$ must have a power basis and $f$ must satisfy an additional condition. Therefore we propose a proposition applicable to any cubic field.

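Proposition 6 Let $K$ be a cubic field such that the ring of integers of $K$ is $O_K = \langle 1, \omega_2, \omega_3 \rangle$. For $z_{ij} \in \mathbb{Z}$ and $s, t, u \in \mathbb{Z}_{\ge 0}$, write $\prod_{j=1}^{k}(z_{1j} + z_{2j}\omega_2 + z_{3j}\omega_3) = X_1 + X_2\omega_2 + X_3\omega_3$, $\omega_2^t\omega_3^u = z_1^{(tu)} + z_2^{(tu)}\omega_2 + z_3^{(tu)}\omega_3$ and

$$\sum_{s+t+u=k} \left( |z_1^{(tu)}| + |z_2^{(tu)}|\,\omega_2 + |z_3^{(tu)}|\,\omega_3 \right) \binom{k}{s}\binom{k-s}{t} = M_1 + M_2\omega_2 + M_3\omega_3,$$

where $X_i, z_i^{(tu)}, M_i \in \mathbb{Z}$. If $M = \max |z_{ij}|$ and $C = \max |M_i|$, then $|X_i| \le M^k C$.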

By Proposition 6, the coefficients of the product of $k$ integers in $A_{2\ell}$ are bounded by $C\ell^k$. Therefore, the condition

$$\ell^k C \le 2p \tag{5}$$

is a sufficient condition for (2). In order to evaluate $C$, we may compute $\omega_2^t\omega_3^u$ for about $k^2/2$ pairs $(t, u)$. Then we can choose $p$ by condition (5) for arbitrary cubic fields. In this way, we can generate keys efficiently. Whether it is effective or not, this kind of argument is possible for general number fields.
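The constant $C$ of Proposition 6 can be evaluated as in the following sketch. It assumes, for simplicity only, that the integral basis is a power basis $\omega_2 = \omega$, $\omega_3 = \omega^2$ with $\omega$ a root of $x^3 + a_2x^2 + a_1x + a_0$; Proposition 6 itself does not need this assumption. The function names are hypothetical, and the sample field is $f = [1, 0, -1, -1]$ from the experiments.

```python
import random
from math import comb

def mul(x, y, a):
    """Multiply [c0, c1, c2] = c0 + c1*w + c2*w^2 modulo f, with a = (a2, a1, a0)."""
    a2, a1, a0 = a
    prod = [0] * 5
    for i in range(3):
        for j in range(3):
            prod[i + j] += x[i] * y[j]
    # Reduce w^4 and then w^3 using w^3 = -a2*w^2 - a1*w - a0.
    for d in (4, 3):
        c = prod[d]
        prod[d] = 0
        prod[d - 1] -= c * a2
        prod[d - 2] -= c * a1
        prod[d - 3] -= c * a0
    return prod[:3]

def power(x, e, a):
    acc = [1, 0, 0]
    for _ in range(e):
        acc = mul(acc, x, a)
    return acc

def constant_C(k, a):
    """C = max_i M_i, summing |coefficients of w2^t w3^u| with multinomial weights."""
    w2, w3 = [0, 1, 0], [0, 0, 1]   # power basis (assumption of this sketch)
    M = [0, 0, 0]
    for s in range(k + 1):
        for t in range(k - s + 1):
            u = k - s - t
            z = mul(power(w2, t, a), power(w3, u, a), a)
            weight = comb(k, s) * comb(k - s, t)
            for i in range(3):
                M[i] += abs(z[i]) * weight
    return max(M)

def check_bound(k, ell, a, trials=200, seed=0):
    """Check |X_i| <= ell^k * C on random products of k elements with |z_ij| <= ell."""
    rng = random.Random(seed)
    limit = ell ** k * constant_C(k, a)
    for _ in range(trials):
        acc = [1, 0, 0]
        for _ in range(k):
            acc = mul(acc, [rng.randint(-ell, ell) for _ in range(3)], a)
        if max(abs(c) for c in acc) > limit:
            return False
    return True
```

The weight $\binom{k}{s}\binom{k-s}{t}$ counts how many of the $3^k$ terms of the expanded product contribute the monomial $\omega_2^t\omega_3^u$, which is why the bound follows from the triangle inequality.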


We now give the results of our experiments using MAGMA [4]. Our environment is as follows:

CPU: AMD Opteron 246 x2, 2GHz (Dual); Software: MAGMA V2.15-14.

4.1 Bit size of p and various densities

In computational experiments, we generated 10 private keys for each given input parameter ($n$, $k$, $\theta$ or $f$) with $k = \lfloor \sqrt{n} \rfloor$ and compared the averages of the bit size of $p$ and of the various densities $d$, $\kappa$ [5] and $D$ [6]. Because $\max_i b_i \approx p^r$, we regard $n/\log_2 p^r = d$, $k\log_2 n/\log p^r = \kappa$ and $dH(k/n) = D$, where $H(x) := -x\log_2 x - (1 - x)\log_2(1 - x)$ is the binary entropy function. In Fig. 1, $\theta$ means the quadratic field $K = \mathbb{Q}(\sqrt{\theta})$, $f$ means the cubic field $K = \mathbb{Q}[x]/(f[1]x^3 + f[2]x^2 + f[3]x + f[4])$ and $h$ means the class number of the given field.
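The three densities can be computed directly from $n$, $k$ and the bit size of $p^r$. The sketch below is a plain transcription of the formulas above; it assumes that the logarithm in $\kappa$ is also taken to base 2, and the function names are hypothetical.

```python
import math

def entropy(x):
    """Binary entropy H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x in (0, 1):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def densities(n, k, log2_pr):
    """Return (d, kappa, D) for dimension n, weight k and log2(p^r) = log2_pr."""
    d = n / log2_pr
    kappa = k * math.log2(n) / log2_pr   # base-2 logarithm assumed here
    D = d * entropy(k / n)
    return d, kappa, D

# Example: n = 256, k = floor(sqrt(n)) = 16, and p^r of 128 bits.
print(densities(256, 16, 128))
```

Since $H(k/n)$ is small when $k \ll n$, $D$ stays well below $d$ for the choice $k = \lfloor \sqrt{n} \rfloor$.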

In Fig. 1, we see that, for imaginary quadratic fields, the bit size of $p$ grows as the discriminant and the class number of the field grow. For real quadratic fields and cubic fields, we can guess that there are many fields with small $p$ because there are many fields with class number 1. We could not, however, make $p$ small when the discriminant is large.

In Fig. 2, we see that each density for real quadratic fields is almost the same as the corresponding density for imaginary quadratic fields. On the other hand, it is slightly lower for cubic fields. Furthermore, we can make the density $d$ sufficiently high with an appropriate choice of the parameters. However, we cannot make the densities $\kappa$ and $D$ sufficiently high because $k \ll n$. Therefore, it seems that there is an efficient reduction from the generated knapsack problem to the lattice shortest vector problem [5,6]. In the next subsection, we discuss this in detail.

[Figure 1: two panels plotting the bit size of p against the dimension n (100 to 500). First panel, quadratic fields: θ = −1 (h = 1), θ = −14 (h = 4), θ = −74 (h = 10), θ = −163 (h = 1), θ = 2 (h = 1), θ = 14 (h = 1), θ = 74 (h = 2), θ = 161 (h = 1). Second panel, cubic fields: f = [1, 0, −1, −1], f = [1, 0, −3, −1], f = [1, 0, −4, −1], each with h = 1.]

Fig. 1. Comparison of the bit size of p.

4.2 Resistance against low-density attack

A knapsack problem generated by OTU2000 is as follows: Given a set $\{b_1, b_2, \ldots, b_n\}$ of public keys and a cipher text $c = \sum_{i=1}^{n} m_i b_i$ $(m_i \in \{0, 1\})$, find the $m_i$'s.

The lattice problem to which a knapsack problem is reduced is as follows: Given a lattice $L$ spanned by $v_1, v_2, \ldots, v_{n+1}$, find the shortest vector of $L$, where $\lambda$ is an integer larger than $\sqrt{n}$ and

$$\begin{aligned} v_1 &= (1, 0, \ldots, 0, \lambda b_1), \\ v_2 &= (0, 1, \ldots, 0, \lambda b_2), \\ &\;\;\vdots \\ v_n &= (0, 0, \ldots, 1, \lambda b_n), \\ v_{n+1} &= (0, 0, \ldots, 0, \lambda c). \end{aligned} \tag{6}$$
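The basis (6) is easy to write down explicitly. The following sketch builds it for a toy knapsack and checks that the integer combination $\sum_i m_i v_i - v_{n+1}$ equals $(m_1, \ldots, m_n, 0)$, so that this short vector indeed lies in $L$. The numbers are made up for illustration, and no lattice reduction is performed here.

```python
import math

def knapsack_lattice(b, c, lam):
    """Rows v_1, ..., v_{n+1} of the basis (6) for public key b and cipher text c."""
    n = len(b)
    rows = []
    for i in range(n):
        row = [0] * (n + 1)
        row[i] = 1
        row[n] = lam * b[i]
        rows.append(row)
    rows.append([0] * n + [lam * c])
    return rows

def combine(rows, coeffs):
    """Integer linear combination sum_i coeffs[i] * rows[i]."""
    return [sum(co * r[j] for co, r in zip(coeffs, rows))
            for j in range(len(rows[0]))]

b = [5, 9, 14, 27]                 # toy public key (illustrative values)
m = [1, 0, 1, 0]                   # plain text bits
c = sum(mi * bi for mi, bi in zip(m, b))
lam = math.isqrt(len(b)) + 1       # an integer larger than sqrt(n)
basis = knapsack_lattice(b, c, lam)
target = combine(basis, m + [-1])  # sum m_i v_i - v_{n+1}
```

The last coordinate cancels exactly because $c = \sum_i m_i b_i$; the factor $\lambda$ penalizes every lattice vector whose last coordinate does not cancel.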

The idea of the low-density attack is that if we can find the shortest vector of the lattice $L$, it may be the solution of the knapsack problem, i.e. $(m_1, \ldots, m_n, 0)$ is the shortest vector of $L$ with high probability. Here we use the type of lattice proposed by Lagarias and Odlyzko [7] because $k \ll n$.

In computational experiments, we generated 5 public keys for each given input parameter ($n$, $k$, $\theta$ or $f$) and chose 200 plain texts randomly for each key. After that, we computed cipher texts and created the corresponding lattices as above. Furthermore, we applied the LLL reduction algorithm [8] to these lattices. Then OTU2000 is broken if the vector $v = (m_1, \ldots, m_n, 0)$ belongs to the LLL-reduced basis. We take $\lambda = n$ and set the LLL parameters $\delta$ to 0.999 and $\eta$ to 0.501. Below we show the breaking rate of the attack with 1000 lattices.

[Figure 2: three panels plotting the density d, the pseudo-density κ and the density D against the dimension n (100 to 500), each for θ = −1, θ = −74, θ = 2, θ = 74, f = [1, 0, −1, −1] and f = [1, 0, −4, −1].]

Fig. 2. Comparison of the densities d, κ, D.

From Fig. 3, we see that the breaking rate decreases as the densities grow when the size $n$ is less than about 150. We actually generated public keys for sizes $n$ up to 200. For these reduced lattice problems, our experiments showed that LLL reduction cannot find the shortest vector if the size $n$ exceeds 150.

[Figure 3: breaking rate (axis range 0 to 40) against the dimension n (80 to 150) for θ = −1, θ = −14, θ = −163, θ = 2, θ = 14, θ = 161, f = [1, 0, −1, −1], f = [1, 0, −3, −1] and f = [1, 0, −4, −1].]

Fig. 3. Comparison of the breaking rate.

5. Conclusions and considerations

We proposed some propositions to implement OTU2000 over real quadratic fields and cubic fields. Furthermore, we showed experimental results on various densities and estimated the resistance against the low-density attack. As a result, we saw that the densities grew high when the class number and the discriminant were small for real quadratic fields. A similar fact is also observed at least for the cubic fields used in our experiments. These densities are slightly lower than those for imaginary quadratic fields with small class number. Nevertheless, it is worthwhile to use real quadratic and cubic fields with small class number, because, for imaginary quadratic fields, the densities are low when the class number is large, and there are only finitely many such fields when the class number is bounded.

In addition, the breaking rate of the low-density attack decreases as the densities grow when the size $n$ is less than about 150. On the other hand, our experiment shows that the LLL algorithm cannot find the shortest vector if the size $n$ exceeds 150, regardless of the densities. Hence, it seems that we should take a security parameter $n$ larger than 150 to generate keys which resist the low-density attack with LLL reduction.

The lower estimate (5) of $p$ is not so sharp. Improving the estimate is an important future problem. We estimated the resistance against the low-density attack using only LLL reduction. Therefore, estimating the resistance using BKZ reduction and other reductions is another important future problem. It will also be interesting to study the case of number fields of higher degrees.

References

[1] T. Okamoto, K. Tanaka and S. Uchiyama, Quantum public-key cryptosystems, in: Proc. of CRYPTO 2000, B. Mihir ed., Lect. Notes Comput. Sci., Vol. 1880, pp. 147–165, Springer-Verlag, Berlin, 2000.

[2] K. Nishimoto and K. Nakamula, On key generation of OTU2000 and related problems, Trans. JSIAM, 18 (2008), 185–197.

[3] P. W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in: Proc. of the 35th Annual Symposium on Foundations of Computer Science, pp. 124–134, 1994.

[4] MAGMA Group, MAGMA, http://magma.maths.usyd.edu.au/magma/MagmaInfo.html.

[5] P. Q. Nguyen and J. Stern, Adapting density attacks to low-weight knapsacks, in: Proc. of ASIACRYPT 2005, R. Bimal ed., Lect. Notes Comput. Sci., Vol. 3788, pp. 41–58, Springer-Verlag, Berlin, 2005.

[6] N. Kunihiro, New definition of density on knapsack cryptosystem, in: Proc. of AFRICACRYPT 2008, V. Serge ed., Lect. Notes Comput. Sci., Vol. 5023, pp. 156–173, Springer-Verlag, Berlin, 2008.

[7] J. C. Lagarias and A. M. Odlyzko, Solving low-density subset sum problems, J. ACM, 32 (1985), 229–246.

[8] A. K. Lenstra, H. W. Lenstra Jr., and L. Lovasz, Factoring polynomials with rational coefficients, Math. Ann., 261 (1982), 515–534.


Cryptanalysis of the birational permutation signature scheme over a non-commutative ring

Naoki Ogura1 and Uchiyama Shigenori1

1 Department of Mathematics and Information Sciences, Tokyo Metropolitan University, Tokyo 192-0397, Japan

E-mail ogura-naoki ed.tmu.ac.jp

Received March 31, 2010, Accepted May 5, 2010

Abstract

In 2008, Hashimoto and Sakurai proposed a new efficient signature scheme, which is a non-commutative version of Shamir’s birational permutation signature scheme. Shamir’s scheme is a generalization of the Ong-Schnorr-Shamir scheme and was broken by Coppersmith et al. using its linearity and commutativity. The HS (Hashimoto-Sakurai) scheme is expected to be secure against the attack because of its non-commutative structure. In this paper, we propose an attack against the HS scheme, which is practical under the condition that its step size and the number of steps are small. We discuss its efficiency by using some experimental results.

Keywords non-commutative ring, birational permutation, Rainbow, Gröbner basis, MQPKC


1. Introduction

In 1984, the OSS signature scheme was proposed by Ong et al. [1]. Also, in 1994, Shamir [2] proposed the so-called birational permutation signature scheme as a generalization of the OSS scheme. (Indeed, Tsujii et al. [3] had already found a similar scheme in 1986.) The security of the birational permutation signature scheme is based on the hardness of the problem of finding a solution of simultaneous multivariate quadratic equations (MQ system) over an integer residue ring; we call this problem the “MQ problem”. The problem of deciding whether an MQ system over a finite field has a solution or not belongs to the set of NP-complete problems, and quantum polynomial-time algorithms for solving the MQ problem are still unknown. On the other hand, in 1997, Satoh and Araki [4] proposed a quaternion version of the OSS scheme. Unfortunately, practical attacks against these schemes were proposed [5–7]. Then, in 2008, Hashimoto and Sakurai [8] proposed a non-commutative version of Shamir’s scheme. They expected that its non-commutativity makes it difficult to apply these attacks. Also, they argued that the HS scheme is comparable to Shamir’s scheme in efficiency.

In this paper, we propose an attack against the HS scheme, which is efficient under the condition that its step size and the number of steps are small. Note that this condition would be preferable for increasing efficiency and reducing the key size. We first reduce the HS scheme to some commutative scheme. Then we apply a Patarin-like [9] attack against the commutative birational permutation signature scheme. Also, we discuss the efficiency of our attack with some experimental results. Moreover, we suggest some specific parameters for the HS scheme based on our cryptanalysis.

This paper is organized as follows. In Section 2, we explain that the HS scheme can be considered as a scheme over an integer residue ring, that is, a commutative ring. In Section 3, we describe an attack against the HS scheme (or some Rainbow-type scheme). In Section 4, we show experimental results against the HS scheme. In Section 5, we suggest some possible parameters for the HS scheme based on our cryptanalysis. In Section 6, we conclude this paper.

2. Reduction to commutative case

In this section, we briefly introduce the HS scheme and explain how to reduce the HS scheme to a commutative scheme.

Let $N$ be a large prime or the product of two large primes and define $\mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$. Let $R$ be a non-commutative subring of a matrix ring over a residue class ring, modulo $N$, of the ring of integers of some algebraic number field. We construct $R$ as a free $\mathbb{Z}_N$-module. Also, $R$ has the property that $a^t \in R$ for $a \in R$, where $a^t$ is the transpose of $a$. The public key of the HS scheme is the map $P = B \circ G \circ A$, where $A$ and $B$ are secret bijective affine transformations. The map $G = (G_2, \ldots, G_l) : R^l \to R^{l-1}$ is defined as follows.

$$G_i(X_1, \ldots, X_l) := \sum_{j \le i-1} X_j^t V_{1ij}^t X_i + \sum_{j \le i-1} X_i^t V_{2ij} X_j + \sum_{j_1, j_2 \le i-1} X_{j_1}^t W_{i j_1 j_2} X_{j_2},$$

where $V_{1ij}, V_{2ij}, W_{i j_1 j_2} \in R$. Refer to [8] for more information about the HS scheme.

Hashimoto and Sakurai studied the security of some class of the HS scheme, which is a non-commutative version of the OSS scheme. They showed that some type of the HS scheme is resistant to Coppersmith’s attack [7] under the condition that factoring of $N$ is infeasible. Moreover, Hashimoto and Sakurai mentioned that some non-commutative OSS scheme (which is included in the scheme above) is resistant to Coppersmith’s (first) attack under the condition that factoring of $N$ is infeasible. Though not all instances of the HS scheme necessarily depend on the infeasibility of factorization, we take $N$ large in the expectation of increasing its security.

Now we explain a way of reduction and define a commutative scheme obtained from the HS scheme. This reduction was partially discussed in [8]. First, we express elements of $R$ by using a $\mathbb{Z}_N$-basis $\{\alpha_i\}_{i=1}^{r}$. Namely, we set

$$X_i = \sum_{k_1=1}^{r} x_{i k_1}\alpha_{k_1} \quad (i = 1, \ldots, l),$$

$$V_{1ij} = \sum_{k_2=1}^{r} v_{1ijk_2}\alpha_{k_2} \quad (i = 2, \ldots, l,\ j = 1, \ldots, l-1),$$

$$V_{2ij} = \sum_{k_3=1}^{r} v_{2ijk_3}\alpha_{k_3} \quad (i = 2, \ldots, l,\ j = 1, \ldots, l-1),$$

$$W_{i j_1 j_2} = \sum_{k_4=1}^{r} w_{i j_1 j_2 k_4}\alpha_{k_4} \quad (i = 2, \ldots, l,\ j_1, j_2 = 1, \ldots, i-1),$$

where $x_{ik}, v_{1ijk}, v_{2ijk}, w_{i j_1 j_2 k} \in \mathbb{Z}_N$.

Then, each term of the map $G_i$ can be written as a linear combination of $\{\alpha_{k_1}^t\alpha_{k_2}^t\alpha_{k_3},\ \alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}\}$. For example,

$$X_j^t V_{1ij}^t X_i = \sum_{k_1, k_2, k_3 \le r} x_{j k_1} v_{1ijk_2} x_{i k_3}\, \alpha_{k_1}^t\alpha_{k_2}^t\alpha_{k_3}.$$

Since $\alpha_{k_1}^t\alpha_{k_2}^t\alpha_{k_3},\ \alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3} \in R$, these elements can also be expressed as linear combinations of $\{\alpha_i\}_{i=1}^{r}$. So the map $G_i$ can be written as follows.

$$\sum_{k'=1}^{r} \Biggl[ \sum_{j \le i-1} \sum_{k_1, k_2 \le r} \bigl( v'_{i j k_1 k_2 k'}\, x_{i k_1} x_{j k_2} \bigr) + \sum_{j_2 \le j_1 \le i-1} \sum_{k_1, k_2 \le r} \bigl( w'_{i j_1 j_2 k_1 k_2 k'}\, x_{j_1 k_1} x_{j_2 k_2} \bigr) \Biggr] \alpha_{k'},$$

where $\exists\, v'_{i j k_1 k_2 k'},\ w'_{i j_1 j_2 k_1 k_2 k'} \in R$. Hashimoto and Sakurai [8] mentioned that the representation of $\alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}$ as a linear combination of $\{\alpha_i\}$ is involved in the security of the HS scheme. However, the security of the HS scheme is related to the form not of $\alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}$ but of $v'_{i j k_1 k_2 k'}$. So, even if $\alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}$ has some simple form, the HS scheme is considered to be secure when $V_{1ij}, V_{2ij}$ are selected randomly. (Of course, if we consider special types such as the OSS scheme, the form of $\alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}$ is closely related to the security of the scheme.)
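The reduction from $R$ to scalars in $\mathbb{Z}_N$ can be illustrated with a toy ring. The sketch below assumes, purely for illustration, that $R$ is the full ring of $2 \times 2$ matrices over $\mathbb{Z}_N$ with $N = 101$ and the basis of matrix units; the rings of [8] are different. It precomputes the coordinates of $\alpha_{k_1}^t\alpha_{k_2}\alpha_{k_3}$ and then evaluates a term of the form $X^t V Y$ using scalar arithmetic only. All names are hypothetical.

```python
import random

N = 101  # toy modulus (assumption); the scheme takes N large

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % N for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def from_coords(c):
    """Element of R with coordinates c in the basis E11, E12, E21, E22."""
    return [[c[0] % N, c[1] % N], [c[2] % N, c[3] % N]]

def to_coords(A):
    return [A[0][0], A[0][1], A[1][0], A[1][1]]

BASIS = [from_coords([int(i == k) for i in range(4)]) for k in range(4)]

# T[k1][k2][k3] = coordinates of alpha_k1^t * alpha_k2 * alpha_k3 in the basis.
T = [[[to_coords(matmul(matmul(transpose(a1), a2), a3)) for a3 in BASIS]
      for a2 in BASIS] for a1 in BASIS]

def xt_v_y_commutative(x, v, y):
    """Coordinates of X^t V Y computed from scalars in Z_N and the table T only."""
    out = [0, 0, 0, 0]
    for k1 in range(4):
        for k2 in range(4):
            for k3 in range(4):
                coef = x[k1] * v[k2] * y[k3]
                for kp in range(4):
                    out[kp] = (out[kp] + coef * T[k1][k2][k3][kp]) % N
    return out

def xt_v_y_direct(x, v, y):
    """The same quantity computed with matrix arithmetic in R."""
    X, V, Y = from_coords(x), from_coords(v), from_coords(y)
    return to_coords(matmul(matmul(transpose(X), V), Y))
```

Both functions agree by multilinearity, which is exactly why each $G_i$ becomes a tuple of $r$ quadratic polynomials over $\mathbb{Z}_N$ in the coordinates $x_{ik}$.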

We showed that the HS scheme can be reduced to some commutative scheme. Based on this observation, we define a Rainbow-type [10] signature scheme as follows. Let $K$ be a finite field or an integer residue class ring and let $N$ be the order of $K$. We select two integers $r, l$ such that $K^{lr}$ is large enough to satisfy security requirements and set $n := lr$. We define a function $\nu : \{r+1, \ldots, n\} \to \{r, 2r, \ldots, lr\}$ by $\nu(i) < i \le \nu(i) + r$.

[Secret-key]

i) Generate a bijective affine transformation $A : K^n \to K^n$.

ii) Generate an affine transformation $B : K^{n-r} \to K^{n-r}$.

iii) For each $i$ from $r+1$ to $n$, generate a $\nu(i) \times r$ matrix $V_i = (v_{i j_1 j_2})_{j_1 = 1, \ldots, \nu(i),\ j_2 = 1, \ldots, r}$ over $K$.

iv) For each $i$ from $r+1$ to $n$, generate a $\nu(i)$-dimensional lower triangular matrix $W_i = (w_{i, j_1, j_2})_{1 \le j_1, j_2 \le \nu(i)}$ over $K$.

[Public-key]

Construct a map $P = B \circ G \circ A$, where $G = (g_{(r+1)}, \ldots, g_n) : K^n \to K^{n-r}$ is the map below.

$$g_i(x_1, \ldots, x_n) := \sum_{j_1 \le \nu(i) < j_2 \le \nu(i)+r} v_{i j_1 j_2} x_{j_1} x_{j_2} + \sum_{j_2 \le j_1 \le \nu(i)} w_{i j_1 j_2} x_{j_1} x_{j_2}.$$

[Signing]

i) By applying a hash function to a message, generate $m \in K^{n-r}$.

ii) Compute $m' := B^{-1}(m) = (y_{(r+1)}, \ldots, y_n)$.

iii) Select $x_1, \ldots, x_r \in K$ randomly.

iv) Compute $\sigma' := G^{-1}(m') = (x_1, \ldots, x_n)$ by solving the following inductive linear equations. For each $k$ from 1 to $l - 1$,

$$\begin{aligned} y_{kr+1} - \sum_{j_2 \le j_1 \le kr} w_{(kr+1) j_1 j_2} x_{j_1} x_{j_2} &= \sum_{j_2 = kr+1}^{kr+r} \sum_{j_1 = 1}^{kr} v_{(kr+1) j_1 j_2} x_{j_1} x_{j_2}, \\ &\;\;\vdots \\ y_{kr+r} - \sum_{j_2 \le j_1 \le kr} w_{(kr+r) j_1 j_2} x_{j_1} x_{j_2} &= \sum_{j_2 = kr+1}^{kr+r} \sum_{j_1 = 1}^{kr} v_{(kr+r) j_1 j_2} x_{j_1} x_{j_2}. \end{aligned} \tag{1}$$

v) Let a signature be $\sigma := A^{-1}(\sigma')$.

[Verification]

i) By applying a hash function to a message, generate $m \in K^{n-r}$.

ii) Verify that $m$ coincides with the element generated by applying $P$ to the signature.
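Step iv) of the signing procedure, the layer-by-layer solution of the linear equations (1), can be sketched as follows. This is a minimal illustration under simplifying assumptions: $K = \mathrm{GF}(65537)$, the affine maps $A$ and $B$ are omitted (so only the central map $G$ is inverted), and all function names are hypothetical. It needs Python 3.8 or later for modular inverses via `pow`.

```python
import random

Q = 65537  # a prime, so K = GF(Q) and every nonzero pivot is invertible

def nu(i, r):
    """The multiple of r with nu(i) < i <= nu(i) + r (i is 1-based)."""
    return ((i - 1) // r) * r

def random_central_map(r, l, rng):
    """Random V_i (nu(i) x r) and lower triangular W_i for i = r+1, ..., n."""
    n = r * l
    V, W = {}, {}
    for i in range(r + 1, n + 1):
        m = nu(i, r)
        V[i] = [[rng.randrange(Q) for _ in range(r)] for _ in range(m)]
        W[i] = [[rng.randrange(Q) if j2 <= j1 else 0 for j2 in range(m)]
                for j1 in range(m)]
    return V, W

def g(i, x, V, W, r):
    """Evaluate g_i at x (a 0-based list holding x_1, ..., x_n)."""
    m = nu(i, r)
    total = 0
    for j1 in range(m):
        for j2 in range(r):
            total += V[i][j1][j2] * x[j1] * x[m + j2]
        for j2 in range(j1 + 1):
            total += W[i][j1][j2] * x[j1] * x[j2]
    return total % Q

def solve_mod(A, b):
    """Solve A z = b over GF(Q) by Gauss-Jordan elimination; None if singular."""
    r = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(r):
        piv = next((i for i in range(col, r) if M[i][col] % Q), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, Q)
        M[col] = [v * inv % Q for v in M[col]]
        for i in range(r):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(vi - f * vc) % Q for vi, vc in zip(M[i], M[col])]
    return [M[i][r] for i in range(r)]

def invert_central_map(y, head, V, W, r, l):
    """Given y = (y_{r+1}, ..., y_n) and x_1..x_r = head, solve (1) layer by layer."""
    x = list(head)
    for k in range(1, l):
        m = k * r
        A, rhs = [], []
        for i in range(m + 1, m + r + 1):
            A.append([sum(V[i][j1][j2] * x[j1] for j1 in range(m)) % Q
                      for j2 in range(r)])
            quad = sum(W[i][j1][j2] * x[j1] * x[j2]
                       for j1 in range(m) for j2 in range(j1 + 1))
            rhs.append((y[i - r - 1] - quad) % Q)
        z = solve_mod(A, rhs)
        if z is None:
            return None  # unlucky choice of x_1..x_r; the signer would retry
        x.extend(z)
    return x
```

In each layer the already known variables turn the quadratic equations into an $r \times r$ linear system, so a signer only needs to retry with fresh $x_1, \ldots, x_r$ when one of these systems happens to be singular.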

In what follows, $N$ denotes the order of $K$. We call the scheme above the commutative HS scheme or the r-Rainbow scheme. Note that Rainbow [10], which was proposed by Ding et al. in 2005, uses a similar inductive construction. However, from our perspective, $N$ is large and $r, l$ are small. So the scheme above differs from the original Rainbow scheme with respect to the setting of parameters.

Here, we consider the performance of the commutative HS scheme. The dominant parts of the signing are the computation of the affine transformations, the summations $\sum_{j_2 \le j_1 \le kr}$ and the solution of the linear equations (1). The total computational complexity is $O(n^3 \lg^2 N)$. The same holds for the complexity of the verification. Moreover, the size of the secret key and the public key is $O(n^3 \lg N)$. So, when the parameter $n = lr$ is small, we have the advantage of improved efficiency and reduced key size.


Table 1. Algorithm A: our attack against the r-Rainbow scheme.

Input: a public function P = (P_1, ..., P_n), parameters n, r, l, a message m
Output: a valid signature for m

while true do
    {poly_i}_{i=1}^{n} ← the polynomial representations of P_i − m_i
    for k from 1 to r do
        poly_(n+k) ← a random linear polynomial a_0 + a_1 x_1 + ··· + a_n x_n (a_i ∈ K)
    end for
    I ← the ideal generated by poly_1, ..., poly_(n+r)
    {f_1, ..., f_t} ← a Gröbner basis of I
    V ← the variety of I (which is generated by f_i)
    if V ≠ ∅ then
        return σ ∈ V (select randomly)
    end if
end while

3. Attack against the HS scheme

In this section, we describe our attack in detail. We remind the reader of the condition that $N$ is large and $n = rl$ is small.

3.1 Our attack

In this subsection, we explain our algorithm. Table 1 shows our algorithm for breaking the r-Rainbow scheme. The essence of our attack is that, if $x_1, \ldots, x_r \in K$ are fixed, then the map $P$ can be considered as an almost bijective map. Note that this idea was used in [11] for attacks against variants of HFE [12]. We can expect that almost all random polynomials are a good choice, that is, $V$ is not the empty set, because the solution space of $\{\mathrm{poly}_i\}_{i=1}^{n}$ contains at least an $r$-dimensional linear space. So we can expect that the Gröbner basis algorithm works very well.

We use the software Magma [13] for our implementation, and the default algorithm in Magma for computing a Gröbner basis is the F4 algorithm proposed by Faugère [14]. If a lexicographical Gröbner basis of an ideal $I$ is determined, computing the variety $V(I)$ is not so difficult.

3.2 Analysis of our algorithm

Our algorithm uses a Gröbner basis algorithm, so it would be difficult to investigate its complexity directly. Hence, in order to analyze the complexity of our algorithm, we employ Patarin’s attack [9] as an approximation of our algorithm.

Let $S^{(k)}(x)$ be the matrix below corresponding to the equations (1).

$$\begin{pmatrix} \sum_{j_1=1}^{kr} v_{(kr+1) j_1 (kr+1)}\, x_{j_1} & \cdots & \sum_{j_1=1}^{kr} v_{(kr+r) j_1 (kr+1)}\, x_{j_1} \\ \vdots & \ddots & \vdots \\ \sum_{j_1=1}^{kr} v_{(kr+1) j_1 (kr+r)}\, x_{j_1} & \cdots & \sum_{j_1=1}^{kr} v_{(kr+r) j_1 (kr+r)}\, x_{j_1} \end{pmatrix}.$$

Also, we define $\Delta^{(k)}_{ij}(x)$ to be the $(i, j)$-cofactor of $S^{(k)}(x)$. Then, we have the following relation by Cramer’s formula.

$$\begin{aligned} x_{(kr+1)} &= \sum_{j_3=1}^{r} \Bigl( y_{(kr+j_3)} - \sum_{j_2 \le j_1 \le kr} w_{(kr+j_3) j_1 j_2} x_{j_1} x_{j_2} \Bigr) \times \frac{\Delta^{(k)}_{1 j_3}(x)}{|S^{(k)}(x)|}, \\ &\;\;\vdots \\ x_{(kr+r)} &= \sum_{j_3=1}^{r} \Bigl( y_{(kr+j_3)} - \sum_{j_2 \le j_1 \le kr} w_{(kr+j_3) j_1 j_2} x_{j_1} x_{j_2} \Bigr) \times \frac{\Delta^{(k)}_{r j_3}(x)}{|S^{(k)}(x)|}, \end{aligned} \tag{2}$$

where $|S^{(k)}(x)|$ is the determinant of $S^{(k)}(x)$. Note that $|S^{(k)}(x)|$ and $\Delta^{(k)}_{ij}(x)$ are polynomials in $x = (x_1, \ldots, x_n)$ whose degrees are $r$ and $r - 1$, respectively. Here, we assume that $y_1, \ldots, y_r$ is a linear combination of $x_1, \ldots, x_r$. For $i$ from $r+1$ to $n$, we can express $x_i$ by using $x_1, \ldots, x_r$ and $y_1, \ldots, y_{(n-r)}$ as follows.

$$x_i = \frac{h^{(i)}(y_1, \ldots, y_{(n)})}{f^{(\nu(i))}(y_1, \ldots, y_{(n)})},$$

where $h^{(i)}$ is some polynomial whose degree (with respect to $y_1, \ldots, y_n$) is $(r+1)^{(\nu(i)/r)} - 1$ and $f^{(\nu(i))}$ is some polynomial whose degree (with respect to $y_1, \ldots, y_{n-r}$) is $(r+1)^{(\nu(i)/r)}$ such that $f^{\nu(i)} \mid f^{\nu(i+1)}$. We can verify the relation by using (2) recursively. So we can apply Patarin’s attack, that is, find the relation between $m$ and $\sigma$ by substituting $y = B^{-1}(m)$, $x = A(\sigma)$. The computational complexity of deducing such relations is $O(n^{3(r+1)} \lg^2 N)$. In our situation, $l, r$ (and $n = lr$) are very small, so our algorithm works against the HS scheme. Note that various experiments show that the Gröbner basis algorithm works faster than Patarin’s attack.

Table 2. Experimental results against the r-Rainbow scheme.

r         2     2     2     2     3     3     4
l         3     4     5     6     3     4     3
N         140   140   140   140   140   140   140
time[s]   0.02  0.08  1.1   169   0.08  2.1   11

Table 3. Experimental results against the r-Rainbow scheme for r = 2, l = 4.

lg N      100   110   120   130   140   150
time[s]   0.24  0.25  0.26  0.27  0.28  0.29


In this section, we give some experiments against the r-Rainbow scheme. Tables 2 and 3 are experimental results of our attack. We used a computer with a 2GHz CPU (AMD Opteron 246), 4GB memory, and a 160GB hard disk. For our implementation, we employed Magma V2.15-3. We showed that the complexity of our attack is $O(n^{3(r+1)} \lg^2 N)$. So our attack would be practical if $n^r$ is polynomial in $r, l, \lg N$, for example, if $r$ can be regarded as a constant. In fact, Tables 2 and 3 suggest that our attack is practical if $r, l$ satisfy $(rl)^{(r+1)} \le 2^{18}$. However, if $r$ or $l$ is large, our attack would be impractical. For example, for the parameters $r = 4$, $l = 11$, we have $n^{3(r+1)} \approx 2^{82}$, so our attack would not be efficient against the r-Rainbow scheme in this case.

Table 4. Specific parameters for the r-Rainbow scheme with N = 65537 (≃ 2^16).

r                  5      6      7      7      8      9
l                  5      3      2      4      3      2
security[bit]      80     80     80     112    112    112
sig[bit]           400    288    224    448    384    288
sk[kB]             8.44   3.40   1.57   11.48  7.39   3.06
pk[kB]             13.71  4.45   1.64   17.84  10.16  3.34
keygen[ms]         0.75   0.33   0.19   0.95   0.62   0.28
signing[ms]        7.92   3.16   1.43   10.73  6.89   2.78
verification[ms]   7.88   3.08   1.36   11.08  7.14   2.81

5. Selection of parameters

In this section, we remark on the security of the HS scheme and suggest specific parameters for the HS scheme. We analyzed the complexity of our attack against the HS scheme in Section 3. This shows that our attack is efficient under the condition that $(rl)^{(r+1)} \le 2^{18}$, that is, the parameters $r, l$ are small. In contrast, many attacks against the Rainbow-type scheme have been proposed. These attacks are collectively called the “rank attack”. For more information, see [15]. The complexity of the rank attack is $O(N^r n^4 \lg^2 N)$. This shows that the rank attack is efficient under the condition that the parameters $N, r$ are small.

We propose specific parameters for the HS scheme as shown in Table 4. The entries “sig”, “pk” and “sk” mean the sizes of the signature, the public key and the secret key, respectively. The security of these parameters is based on the above cryptanalysis. For example, for the parameters $r = 5$, $l = 5$, $N = 65537$, we have $n^{3(r+1)} \simeq 2^{83.6}$ and $N^r n^4 \simeq 2^{98.6}$. So we consider that the HS scheme corresponding to these parameters achieves almost 80-bit security, that is, a security level similar to that of 1024-bit RSA. The bit sizes of the signature, the public key and the secret key are taken as the maximum possible sizes based on the commutative HS scheme. The entries “keygen”, “signing” and “verification” are experimental results of an implementation on Magma. Although this discussion is very rough, it might serve as a guideline for setting parameters for the HS scheme or the Rainbow-type scheme.
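The exponents quoted above can be reproduced with a few lines of arithmetic. The sketch below evaluates $\log_2 n^{3(r+1)}$ for our attack and $\log_2 N^r n^4$ for the rank attack; as in the text, the common factor $\lg^2 N$ is ignored. The function names are hypothetical.

```python
import math

def attack_cost_bits(r, l):
    """log2 of n^(3(r+1)) with n = r*l (cost estimate of the attack of Section 3)."""
    n = r * l
    return 3 * (r + 1) * math.log2(n)

def rank_attack_cost_bits(r, l, N):
    """log2 of N^r * n^4 with n = r*l (cost estimate of the rank attack)."""
    n = r * l
    return r * math.log2(N) + 4 * math.log2(n)

for r, l in [(5, 5), (6, 3), (7, 2), (7, 4), (8, 3), (9, 2)]:
    print(r, l,
          round(attack_cost_bits(r, l), 1),
          round(rank_attack_cost_bits(r, l, 65537), 1))
```

The smaller of the two exponents is what limits the security level of a parameter set.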

Note that we discussed not the original HS scheme but only the r-Rainbow scheme (the commutative HS scheme). So further research is needed on constructing good non-commutative rings for the HS scheme. For example, the parameter $N = 65537$ is a small prime, so we cannot use the non-commutative OSS scheme. On the other hand, efficiency, especially the key size, should also be taken into account. Finding how to generate keys for the original HS scheme is challenging future work.

6. Conclusion

We proposed an attack against the Hashimoto-Sakurai scheme. Our proposed attack is a polynomial-time algorithm with respect to its input sizes $r, l, \lg N$ under the condition that $n^r = (rl)^r$ is a polynomial in $n$ and $\lg N$. Also, we discussed the efficiency of the attack and showed by experiments that it is practical if $(rl)^{(r+1)} \le 2^{18}$. In our attack, we first reduce the HS scheme to some commutative scheme. Then, we select $r$ linear equations randomly, and solve the public-key relation together with these added equations by using a Gröbner basis algorithm. Note that not all instances of the HS scheme are broken; namely, our algorithm would not work efficiently if $(rl)^{(r+1)}$ is large. This implies that the scheme would be secure in the case that $N, l$ are not small and $r$ is large, for example, the case that $(rl)^{(r+1)} > 2^{27}$ and $N^r (rl)^4 \ge 2^{80}$. Investigating the security of the HS scheme with specific parameters is our future work.

References

[1] H. Ong, C. P. Schnorr and A. Shamir, An efficient signature scheme based on quadratic equations, in: Proc. of 16th ACM Symp. Theory Comp., pp. 208–216, 1984.

[2] A. Shamir, Efficient signature schemes based on birational permutations, in: Proc. of Crypto ’93, 1994; Lect. Notes in Comput. Sci., Vol. 773, pp. 1–12, Springer-Verlag, 1993.

[3] S. Tsujii, K. Kurosawa, T. Itoh, A. Fujioka and T. Matsumoto, A public-key cryptosystem based on the difficulty of solving a system of non-linear equations (in Japanese), IEICE Trans. Inf. & Syst., J69-D (1986), 1963–1970.

[4] T. Satoh and K. Araki, On construction of signature scheme over a certain non-commutative ring, IEICE Trans. Fundamentals, E80-A (1997), 40–45.

[5] J. M. Pollard and C. P. Schnorr, An efficient solution of the congruence $x^2 + ky^2 \equiv m \pmod n$, IEEE Trans. Inform. Theory, 33 (1987), 702–709.

[6] D. Coppersmith, J. Stern and S. Vaudenay, The security of the birational permutation signature scheme, J. Cryptology, 10 (1997), 207–221.

[7] D. Coppersmith, Weakness in quaternion signatures, in: Proc. of Crypto ’99, Lect. Notes in Comput. Sci., Vol. 1666, pp. 305–314, Springer-Verlag, 1999.

[8] Y. Hashimoto and K. Sakurai, On construction of signature schemes based on birational permutations over noncommutative rings, presented at the 1st Int. Conf. on Symbolic Computation and Cryptography (SCC2008) held in Beijing, April 2008; Cryptology ePrint, http://eprint.iacr.org/2008/340+.

[9] J. Patarin, Cryptanalysis of the Matsumoto and Imai public key scheme of Eurocrypt’88, in: Proc. of CRYPTO ’95, Lect. Notes in Comput. Sci., Vol. 963, pp. 248–261, Springer-Verlag, 1995.

[10] J. Ding and D. Schmidt, Rainbow, a new multivariable polynomial signature scheme, Lect. Notes in Comput. Sci., Vol. 3531, pp. 164–175, Springer-Verlag, 2005.

[11] N. T. Courtois, M. Daum and P. Felke, On the security of HFE, HFEv- and Quartz, in: Proc. of PKC2003, Lect. Notes in Comput. Sci., Vol. 2567, pp. 337–350, Springer-Verlag, 2003.

[12] J. Patarin, Hidden Field Equations (HFE) and Isomorphisms of Polynomials (IP): two new families of asymmetric algorithms, in: Proc. of EUROCRYPT ’96, Lect. Notes in Comput. Sci., Vol. 1070, pp. 33–48, Springer-Verlag, 1996.

[13] Magma, http://magma.maths.usyd.edu.au/magma/.

[14] J-C Faugère, A new efficient algorithm for computing Gröbner bases (F4), J. Pure Appl. Algebra, 139 (1999), 61–68.

[15] J. Ding, B-Y Yang, C-H O. Chen, M-S Chen and C-M Cheng, New differential-algebraic attacks and reparametrization of Rainbow, Lect. Notes in Comput. Sci., Vol. 5037, pp. 242–257, Springer-Verlag, 2008.

JSIAM Letters Vol.2 (2010) p.89 ©2010 Japan Society for Industrial and Applied Mathematics

Erratum to “Cryptanalysis of the birational permutation

signature scheme over a non-commutative ring”

[JSIAM Letters, 2 (2010), 85–88]

Naoki Ogura1 and Shigenori Uchiyama1

1 Department of Mathematics and Information Sciences, Tokyo Metropolitan University, Tokyo 192-0397, Japan

Received September 13, 2010, Accepted September 14, 2010

The name of the second author was wrongly printed as “Uchiyama Shigenori”, but it should be read as “Shigenori Uchiyama” as shown here.


Proposal and efficient implementation of multiple division divide-and-conquer algorithm for SVD

Yutaka Kuwajima1, Youichiro Shimizu1 and Takaomi Shigehara1

1 Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan

E-mail kuwa mail.saitama-u.ac.jp


Abstract

We propose a divide-and-conquer algorithm with multiple division for singular value decomposition (SVD). The algorithm turns out to be efficient for reducing the execution time in the case that the deflation occurrence rate of the input matrix is low, which is exactly the case that the standard divide-and-conquer algorithm (DC2-SVD) with division number two requires $O(n^3)$ arithmetic operations. Here $n$ is the size of the input matrix. The comparison with DC2-SVD as well as another up-to-date algorithm, I-SVD, is made through numerical experiment.

Keywords singular value decomposition, divide-and-conquer method, multiple division


1. Introduction

Consider the singular value decomposition (SVD) of an upper-bidiagonal matrix $B \in \mathbb{R}^{n \times (n+1)}$ with diagonals $a_j$ and subdiagonals $b_j$ $(j = 1, \ldots, n)$. The divide-and-conquer algorithm with division number $k = 2$ (DC2-SVD) [1] is a well-established numerical algorithm for this purpose. Although DC2-SVD keeps the numerical accuracy at a high level, the numerical cost of DC2-SVD is not necessarily low, and it requires $O(n^3)$ arithmetic operations in the general case where the deflation occurrence rate is low. To overcome this, we propose, in this paper, the multiple division divide-and-conquer algorithm (DCK-SVD) for SVD. DCK-SVD is an application of the multiple division divide-and-conquer algorithm for a real symmetric tridiagonal eigenproblem [2,3], and it can reduce the operation count to $3k/[2(k^2 - 1)]$ of that of DC2-SVD, where $k$ $(\ll n)$ is the division number in DCK-SVD. In numerical experiment, DCK-SVD is compared with DC2-SVD in LAPACK [4] as well as I-SVD [5], which is an up-to-date algorithm for SVD of upper-bidiagonal matrices, introduced from the broad viewpoint of applying integrable systems to numerical computation.
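As a quick arithmetic illustration of the ratio $3k/[2(k^2 - 1)]$ quoted above, the following sketch evaluates it exactly for a few division numbers. The function name is hypothetical, and the figures describe only the stated operation-count ratio, not measured run times.

```python
from fractions import Fraction

def operation_ratio(k):
    """Operation count of DCK-SVD relative to DC2-SVD: 3k / (2(k^2 - 1)), k >= 2."""
    if k < 2:
        raise ValueError("the division number k must be at least 2")
    return Fraction(3 * k, 2 * (k * k - 1))

for k in (2, 3, 4, 8):
    print(k, operation_ratio(k), float(operation_ratio(k)))
```

The ratio equals 1 for $k = 2$, as it must, and decreases roughly like $3/(2k)$ as $k$ grows.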

2. Framework of DCK-SVD

Denote the set of $l$-dimensional real vectors and the set of $l_1 \times l_2$ real matrices by $\mathbb{R}^l$ and $\mathbb{R}^{l_1\times l_2}$, respectively. For simplicity, we assume that the column number $n + 1$ of $B$ is a multiple of the division number $k$ ($n + 1 = mk$, $m \in \mathbb{N}$). To save space, we write down some equations for $k = 3$; the generalization to general $k$ is obvious. Let $I_l$, $e_j^{(l)}$ and $0_l$ be the $l$-dimensional unit matrix, its $j$-th column vector and the $l$-dimensional zero vector, respectively. Without loss of generality, we assume in the following that $B$ is irreducible ($b_j \neq 0$; $j = 1, \dots, n$). We also assume that the singular values (SVLs) of $B$ are labeled in ascending order, $0 < \sigma_1 \le \sigma_2 \le \cdots \le \sigma_n$. We first divide $B$ into $k$ blocks in the obvious way as

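\[
B \equiv \begin{pmatrix} B_1 & & \\ \alpha_1^T & \beta_1^T & \\ & B_2 & \\ & \alpha_2^T & \beta_2^T \\ & & B_3 \end{pmatrix} \in \mathbb{R}^{n\times(n+1)}, \tag{1}
\]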

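where $\alpha_i = a_{mi} e_m^{(m)}$, $\beta_i = b_{mi} e_1^{(m)}$ ($i = 1, \dots, k-1$) and $B_i \in \mathbb{R}^{(m-1)\times m}$ ($i = 1, \dots, k$). Suppose that we have the SVD $B_i = U_i \Sigma_i V_i^T$ of $B_i$, where $\Sigma_i = (D_i, 0_{m-1}) \in \mathbb{R}^{(m-1)\times m}$ with a diagonal matrix $D_i \in \mathbb{R}^{(m-1)\times(m-1)}$, and $U_i \in \mathbb{R}^{(m-1)\times(m-1)}$, $V_i \in \mathbb{R}^{m\times m}$ are orthogonal matrices. Then we have $B = U' M V'^T$, where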

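\[
M \equiv \begin{pmatrix} \Sigma_1 & & \\ \alpha_1'^T & \beta_1'^T & \\ & \Sigma_2 & \\ & \alpha_2'^T & \beta_2'^T \\ & & \Sigma_3 \end{pmatrix} \in \mathbb{R}^{n\times(n+1)} \tag{2}
\]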

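with $\alpha'_i = V_i^T \alpha_i$, $\beta'_i = V_{i+1}^T \beta_i$ ($i = 1, \dots, k-1$), and

\[
U' \equiv \bigoplus_{i=1}^{k-1} [U_i \oplus (1)] \oplus U_k \in \mathbb{R}^{n\times n}, \qquad
V' \equiv \bigoplus_{i=1}^{k} V_i \in \mathbb{R}^{(n+1)\times(n+1)} \tag{3}
\]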

are orthogonal matrices. So, if we find the SVD $M = \tilde{U} \Sigma \tilde{V}^T$ of $M$, we obtain the SVD $B = U \Sigma V^T$ of $B$, where $U \equiv U' \tilde{U}$ and $V \equiv V' \tilde{V}$ are orthogonal. The method for computing the SVD of $M$ is discussed separately later in this section.

Since $B_i$ ($i = 1, \dots, k$) has a form similar to $B$, the SVD of $B_i$ can in principle be obtained by a recursive use of DCK-SVD. In the present implementation, however, we use the DC2-SVD routine DBDSDC in LAPACK for this purpose.
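The block reduction $B = U' M V'^T$ of (1)–(3) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses `numpy.linalg.svd` for the sub-blocks instead of DBDSDC, and the helper names `block_diag` and `block_reduction` are hypothetical.

```python
import numpy as np


def block_diag(*blocks):
    """Direct sum of matrices (plain NumPy, no SciPy dependency)."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out


def block_reduction(a, b, k):
    """Build B, U', M, V' of (1)-(3) for an n x (n+1) upper-bidiagonal B, n + 1 = m k."""
    n = len(a)
    m = (n + 1) // k
    idx = np.arange(n)
    B = np.zeros((n, n + 1))
    B[idx, idx] = a            # diagonal a_j
    B[idx, idx + 1] = b        # superdiagonal b_j
    M = np.zeros_like(B)
    U_blocks, V_blocks = [], []
    for i in range(k):
        rows = slice(i * m, i * m + m - 1)
        cols = slice(i * m, (i + 1) * m)
        Ui, si, Vti = np.linalg.svd(B[rows, cols], full_matrices=True)
        M[rows, i * m:i * m + m - 1] = np.diag(si)   # Sigma_i = (D_i, 0)
        U_blocks.append(Ui)
        V_blocks.append(Vti.T)
        if i < k - 1:
            U_blocks.append(np.ones((1, 1)))          # the "(1)" summand in U'
    for i in range(k - 1):
        r = (i + 1) * m - 1                           # 0-based index of coupling row
        M[r, i * m:(i + 1) * m] = a[r] * V_blocks[i][-1, :]            # alpha'_i
        M[r, (i + 1) * m:(i + 2) * m] = b[r] * V_blocks[i + 1][0, :]   # beta'_i
    return B, block_diag(*U_blocks), M, block_diag(*V_blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k, m = 3, 4
    n = k * m - 1
    B, Up, M, Vp = block_reduction(rng.uniform(-2, 2, n), rng.uniform(-1, 1, n), k)
    print(np.allclose(Up @ M @ Vp.T, B))
```

Since $U'$ and $V'$ are orthogonal, $M$ and $B$ share their singular values, which is what makes the subsequent SVD of $M$ sufficient.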

JSIAM Letters Vol. 2 (2010) pp.91–94 Yutaka Kuwajima et al.

The problem of finding the SVD of $M$ in (2) is equivalent to the positive semidefinite real symmetric eigenproblem of

\[
M^T M = D^2 + S S^T \in \mathbb{R}^{(n+1)\times(n+1)}, \tag{4}
\]

where $D \equiv \bigoplus_{i=1}^{k} [D_i \oplus (0)]$ is diagonal and $S \in \mathbb{R}^{(n+1)\times(k-1)}$ is determined from (2) in the obvious way. The matrix in (4) has the form of a diagonal matrix plus a low-rank perturbation, and its spectral decomposition can be obtained within $O(n^2)$ arithmetic operations by the method in [3].

2.1 Singular values of M

If the perturbation is of rank one ($k = 2$), then the eigenproblem for (4) is easily solved with high numerical accuracy [6]. By repeating this procedure $k - 1$ times in the general case, we first obtain the eigenvalues $\lambda_j \ge 0$ of $M^T M$, leading to the SVLs $\sigma_j = \sqrt{\lambda_j}$ of $M$ ($j = 1, \dots, n$).
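For the rank-one case, the eigenvalues of a diagonal matrix plus $z z^T$ are the roots of the secular equation $1 + \sum_i z_i^2/(d_i - \lambda) = 0$, one in each interval between consecutive diagonal entries. The following sketch finds them by plain bisection. It is an illustration only, not the numerically stable scheme of [6]; it assumes distinct, ascending diagonal entries and nonzero $z_i$, and the name `rank_one_eigenvalues` is hypothetical.

```python
import numpy as np


def rank_one_eigenvalues(d2, z, iters=100):
    """Eigenvalues of diag(d2) + z z^T by bisection on the secular equation.

    Assumes d2 is strictly ascending and every z_i is nonzero (no deflation).
    """
    d2 = np.asarray(d2, dtype=float)
    z = np.asarray(z, dtype=float)
    # The i-th root lies in (d2[i], d2[i+1]); the last one in (d2[-1], d2[-1] + |z|^2].
    upper = np.append(d2[1:], d2[-1] + z @ z)

    def secular(lam):
        return 1.0 + np.sum(z**2 / (d2 - lam))

    roots = []
    for lo, hi in zip(d2, upper):
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if secular(mid) > 0.0:   # secular() is increasing on each interval
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return np.array(roots)


if __name__ == "__main__":
    d2 = np.array([0.5, 1.0, 2.0, 3.5, 5.0])
    z = np.array([0.3, 0.7, 0.4, 0.6, 0.5])
    print(rank_one_eigenvalues(d2, z))
    print(np.linalg.eigvalsh(np.diag(d2) + np.outer(z, z)))
```

Repeating such a rank-one update $k - 1$ times corresponds to the procedure described above for general $k$.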

2.2 Singular vectors of M

To obtain the eigenvectors of $M^T M$, define

\[
\tilde{F}(\lambda) \equiv \begin{pmatrix} F(\lambda) & S^T W \\ W^T S & W^T(\lambda I_{n+1} - D^2) W \end{pmatrix}, \quad \lambda \in \mathbb{R} \tag{5}
\]

with $F(\lambda) \equiv I_{k-1} - S^T (I_{n+1} - W W^T)(\lambda I_{n+1} - D^2)^{-1} S$, where $W \equiv (e^{(n+1)}_{i(1)}, \dots, e^{(n+1)}_{i(s)})$ with $s$ distinct positive integers $i(1), \dots, i(s)$ less than or equal to $n + 1$. If $\lambda$ is an eigenvalue of $M^T M$ ($\lambda = \lambda_j$), the matrix $\tilde{F}(\lambda_j)$ becomes singular, and the eigenvector corresponding to the eigenvalue $\lambda_j$ of $M^T M$ (the right singular vector (SVC) corresponding to the SVL $\sigma_j$ of $M$) is given by

\[
v_j = [(I_{n+1} - W W^T)(\lambda_j I_{n+1} - D^2)^{-1} S, \; -W] \, x_j \tag{6}
\]

with $x_j \in \operatorname{Ker} \tilde{F}(\lambda_j)$ ($x_j \neq 0_{k+s-1}$). The computation of $x_j$ is carried out by the inverse power method. For the choice of $W$ in $\tilde{F}(\lambda_j)$ that keeps the numerical accuracy, see [3]. The left SVC corresponding to the SVL $\sigma_j$ is given by

\[
u_j = \sigma_j^{-1} M v_j = \sigma_j^{-1} [\tilde{D} v_j + (e^{(n)}_m, \dots, e^{(n)}_{(k-1)m}, O) \, x_j] \tag{7}
\]

with $\tilde{D} \equiv \bigoplus_{i=1}^{k-1} [D_i \oplus (0)] \oplus D_k$, where the second equality follows from (5) and (6).
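The kernel construction (5)–(6) can be illustrated in its simplest setting, with $W$ empty ($s = 0$), where the matrix in (5) reduces to $F(\lambda)$ and (6) reads $v_j = (\lambda_j I - D^2)^{-1} S x_j$. The sketch below is an assumption-laden illustration: the eigenvalues come from `numpy.linalg.eigvalsh`, the kernel vector is taken from an SVD rather than the inverse power method, and `eigvec_from_kernel` is a hypothetical name.

```python
import numpy as np


def eigvec_from_kernel(d2, S, lam):
    """Eigenvector of diag(d2) + S S^T for eigenvalue lam via (6) with W empty.

    Assumes lam does not coincide with any entry of d2.
    """
    G = S / (lam - d2)[:, None]              # (lam I - D^2)^{-1} S
    F = np.eye(S.shape[1]) - S.T @ G         # F(lam), singular at an eigenvalue
    x = np.linalg.svd(F)[2][-1]              # approximate kernel vector of F(lam)
    v = G @ x
    return v / np.linalg.norm(v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d2 = np.linspace(0.0, 7.0, 8)            # diagonal of D^2 (illustrative values)
    S = rng.standard_normal((8, 2))          # rank-2 perturbation, i.e. k = 3
    A = np.diag(d2) + S @ S.T
    for lam in np.linalg.eigvalsh(A):
        v = eigvec_from_kernel(d2, S, lam)
        print(f"{lam:8.4f}  residual = {np.linalg.norm(A @ v - lam * v):.2e}")
```

When $\lambda_j$ is very close to some diagonal entry of $D^2$, the division in `G` loses accuracy; moving such indices into $W$ is the role of the general form (5).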

3. Technical notes on implementation

3.1 Removal of SVLs σ ≃ 0 of B

Since $B$ is irreducible, $B$ does not have a zero SVL. However, it might have an arbitrarily tiny SVL in general. Such tiny SVLs are largely removed by the following procedure, where $c_1$ is a small cut-off parameter.

1) By applying successive Givens transformations from the right, $B$ is reduced to an upper-bidiagonal form with the zero vector in the last column. This corresponds to the RQ decomposition of $B$; $B = R Q_r$ with an upper-bidiagonal matrix $R \in \mathbb{R}^{n\times(n+1)}$ and an orthogonal matrix $Q_r \in \mathbb{R}^{(n+1)\times(n+1)}$.

2) Define $R' \in \mathbb{R}^{n\times(n-1)}$ by removing the first and the last columns from $R$.

3) By applying successive Givens transformations from the left, $R'$ is reduced to a lower-bidiagonal form with the zero vector in the last row. This corresponds to the QL decomposition of $R'$; $R' = Q_l L'$ with an orthogonal matrix $Q_l \in \mathbb{R}^{n\times n}$ and a lower-bidiagonal matrix $L' \in \mathbb{R}^{n\times(n-1)}$.

4) Define

\[
\tilde{B} \equiv Q_l^T B Q_r^T = \begin{pmatrix} B' & 0_{n-1} \\ b_{n,1} e_1^{(n)T} & 0 \end{pmatrix} \in \mathbb{R}^{n\times(n+1)},
\]

where $B' \in \mathbb{R}^{(n-1)\times n}$ is upper-bidiagonal. In general, fill-in might occur only in the bottom-left element of $\tilde{B}$ by this procedure.

5) If $|b_{n,1}| < c_1 \|B\|_F$, set $b_{n,1} = 0$. Here $\|\cdot\|_F$ is the Frobenius norm.

Repeating 1)–5) for the output $B'$ successively, we can further remove tiny SVLs from $B$. Note that, although the smallness of the fill-in indicates the existence of a small SVL of $B$, the converse is not true. As a result, not all tiny SVLs can be removed by the above procedure in general.

3.2 Deflation

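By using suitable permutation matrices $P_u \in \mathbb{R}^{n\times n}$ and $P_v \in \mathbb{R}^{(n+1)\times(n+1)}$, $M$ in (2) is reduced to

\[
M_1 = P_u^T M P_v = \begin{pmatrix} D' & O \\ S_1^T & S_2^T \end{pmatrix} \in \mathbb{R}^{n\times(n+1)}, \tag{8}
\]

where $S_1 \equiv (s^{(1)}_{i,j}) \in \mathbb{R}^{n_1\times(k-1)}$, $S_2 \in \mathbb{R}^{k\times(k-1)}$ and $D' \equiv \mathrm{diag}(d'_1, \dots, d'_{n_1}) \in \mathbb{R}^{n_1\times n_1}$ ($0 < d'_1 \le \cdots \le d'_{n_1}$) with $n_1 = n - k + 1$. The SVD of $M_1$ is essentially the same as that of $M$, so we consider $M_1$ in the following.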

The removal of the trivial solutions in the SVD of $M_1$ in (8) (equivalently, in the eigenproblem of $M_1^T M_1 \in \mathbb{R}^{(n+1)\times(n+1)}$) is called deflation. First of all, $M_1^T M_1$ has the eigenvalue zero, which is removed as follows: by using the QR decomposition $S_2 = Q_2 R_2$ with an orthogonal matrix $Q_2 \in \mathbb{R}^{k\times k}$ and an upper-triangular matrix $R_2 = (R_2'^T, 0_{k-1})^T \in \mathbb{R}^{k\times(k-1)}$, we have

\[
M_1 (I_{n_1} \oplus Q_2) = \begin{pmatrix} D' & O & 0_{n_1} \\ S_1^T & R_2'^T & 0_{k-1} \end{pmatrix} \equiv (M_2, 0_n).
\]

Hence the (right) SVC corresponding to the zero SVL of $M_1$ is $(0_{n_1}^T, q_{2;k}^T)^T$ with the last column $q_{2;k}$ of $Q_2$, and we are left with $M_2$. Deflation for $M_2$ occurs when there exists $l$ ($1 \le l \le n_1$) such that $s^{(1)}_{l,1} = \cdots = s^{(1)}_{l,k-1} = 0$. In this case, $M_2$ has an SVL $d'_l$ whose left and right SVCs are both $e^{(n)}_l$. If this condition is satisfied $n - r$ times, then by using suitable permutation matrices $Q_u, Q_v \in \mathbb{R}^{n\times n}$, $M_2$ is transformed to

\[
Q_u^T M_2 Q_v = \begin{pmatrix} D'_1 & O \\ S_1'^T & R_2'^T \end{pmatrix} \oplus D'_2 \equiv M_3 \oplus D'_2,
\]

where $D'_1 \in \mathbb{R}^{n_2\times n_2}$ and $D'_2 \in \mathbb{R}^{(n-r)\times(n-r)}$ are positive definite diagonal matrices with $n_2 = r - k + 1$. Thus, after the deflation, we are left with the $r \times r$ nontrivial part $M_3$. We call $\delta \equiv 1 - r/n$ the deflation occurrence rate (DOR) in the following.

Note that the SVD of $M_1$ also has a trivial solution in the case $d'_j = d'_{j+1} = \cdots = d'_{j+k-1}$ for some $j$ ($j = 1, \dots, n_1 - k + 1$). However, since this condition is rarely satisfied, we do not take this type of deflation into account in the present implementation.

3.3 Left SVCs corresponding to σ ≃ 0

A direct use of (7) for computing the left SVCs for tiny SVLs causes a loss of numerical accuracy. In general, the left SVC for an SVL $\sigma$ of $M_3$ is equivalent to the eigenvector associated with the eigenvalue $\sigma^2$ of

\[
M_3 M_3^T = \begin{pmatrix} D_1'^2 & D'_1 S'_1 \\ S_1'^T D'_1 & S_1'^T S'_1 + R_2'^T R'_2 \end{pmatrix} \in \mathbb{R}^{r\times r}.
\]

Hence the left SVC of a tiny SVL $\sigma$ can be found from $\operatorname{Ker}(\sigma^2 I_r - M_3 M_3^T)$, which can be computed by a method analogous to that in Subsec. 2.2 for computing the eigenvectors of a real diagonal matrix plus a low-rank perturbation.

3.4 SVCs corresponding to multiple SVLs

If $l$ neighboring SVLs $\sigma_j, \sigma_{j+1}, \dots, \sigma_{j+l-1}$ for some $j$ ($l = 2$ in most cases) are close to each other, the computation of the kernels of the matrices $\tilde{F}(\sigma_j^2), \tilde{F}(\sigma_{j+1}^2), \dots, \tilde{F}(\sigma_{j+l-1}^2)$ of (5) causes a serious loss of numerical accuracy. As a result, the corresponding SVCs become nearly parallel to each other. In this case, using the average $\sigma'_j$ of the neighboring SVLs, these kernels are numerically approximated by the eigenvectors corresponding to the $l$ smallest eigenvalues of $\tilde{F}(\sigma_j'^2)$, which are obtained by applying simultaneous inverse iteration with orthogonalization to $\tilde{F}(\sigma_j'^2)$.

3.5 Reorthogonalization

In the final stage of the computation of SVCs, we apply a reorthogonalization process if necessary. When we compute the inner products among SVCs, we use the technique in [3], which makes it possible to complete the computation of the inner products for all pairs of SVCs within $O(n^2)$ arithmetic operations. According to the resulting inner products, we divide the SVCs into several groups, following the criteria:

• The number of groups is maximal under the condition that the following two conditions are satisfied.

• Both for left and right, the SVCs in each group are orthogonal to those in other groups.

• For any pair of left and right SVCs in each group, at least one of them is not orthogonal to some SVC in the same group.

After grouping, we find the SVD of $M_3$ restricted to the subspaces spanned by the SVCs belonging to each group. More precisely, let $u_1, \dots, u_l$ and $v_1, \dots, v_l$ be the left and right SVCs belonging to one of the groups. Together with orthonormal bases $w_1, \dots, w_l$ of $\mathrm{span}(u_1, \dots, u_l)$ and $z_1, \dots, z_l$ of $\mathrm{span}(v_1, \dots, v_l)$, define

\[
L \equiv (w_1, \dots, w_l)^T M_3 (z_1, \dots, z_l),
\]

and compute the SVD

\[
L = (p_1, \dots, p_l) \, \Sigma' \, (q_1, \dots, q_l)^T
\]

of $L$. Then

\[
u_j = (w_1, \dots, w_l) \, p_j, \qquad v_j = (z_1, \dots, z_l) \, q_j \qquad (j = 1, \dots, l)
\]

give the reorthogonalized SVCs with high numerical precision.
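The restricted SVD used for one group can be sketched as follows. This is an illustration under assumptions: the orthonormal bases are obtained with `numpy.linalg.qr` (the text does not say how they are computed), the group is assumed to span an exact singular subspace, and `reorthogonalize_group` is a hypothetical name.

```python
import numpy as np


def reorthogonalize_group(M3, U_grp, V_grp):
    """Restricted SVD for one group of nearly parallel singular vectors.

    U_grp, V_grp: columns are the (non-orthogonal) left/right SVCs of the group.
    Returns reorthogonalized left SVCs, singular values and right SVCs.
    """
    W, _ = np.linalg.qr(U_grp)        # orthonormal basis of span(u_1, ..., u_l)
    Z, _ = np.linalg.qr(V_grp)        # orthonormal basis of span(v_1, ..., v_l)
    L = W.T @ M3 @ Z                  # l x l restriction of M3
    P, sigma, Qt = np.linalg.svd(L)
    return W @ P, sigma, Z @ Qt.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M3 = rng.standard_normal((6, 6))
    U, s, Vt = np.linalg.svd(M3)
    mix = np.array([[1.0, 0.95], [0.0, 0.3]])   # makes the two vectors nearly parallel
    U_new, sig, V_new = reorthogonalize_group(M3, U[:, 2:4] @ mix, Vt.T[:, 2:4] @ mix)
    print(np.allclose(U_new.T @ U_new, np.eye(2)), np.allclose(sig, s[2:4]))
```

Because only an $l \times l$ problem is solved per group, the extra cost stays small when the groups are small.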

4. DCK-SVD algorithm

From Secs. 2 and 3, we obtain the DCK-SVD algorithm. In the main algorithm (proc dcksvd) below, proc rksvd is a subroutine to compute the SVD $M = \tilde{U} \Sigma \tilde{V}^T$ of $M$.

proc dcksvd(k, B, Σ, U, V)
    Remove SVLs σ ≃ 0 from B by using the cut-off parameter c1. (Subsec. 3.1)
    Divide B as in (1).
    for i = 1 to k
        call dc2svd(Bi, Σi, Ui, Vi), where dc2svd is the DBDSDC routine in LAPACK to compute the SVD Bi = Ui Σi Vi^T.
    end for
    call rksvd(k, M, Σ, Ũ, Ṽ).
    Define U′ and V′ by (3).
    Compute U ≡ U′Ũ and V ≡ V′Ṽ.
    return Σ, U, V

proc rksvd(k, M, Σ, U, V)
    Perform deflation. (Subsec. 3.2)
    Compute σj (j = 1, . . . , n). (Subsec. 2.1)
    Compute uj, vj (j = 1, . . . , n). (Subsec. 2.2)
    (If σj/σn < c2, then compute uj following Subsec. 3.3.)
    for j = 1 to n − 1
        l = 1
        while j + l ≤ n and |(uj, uj+l)| > c3 do
            l = l + 1
        end while
        if l ≥ 2 then
            Compute the left SVCs corresponding to σj, . . . , σj+l−1. (Subsec. 3.4)
        end if
    end for
    for j = 1 to n − 1
        l = 1
        while j + l ≤ n and |(vj, vj+l)| > c3 do
            l = l + 1
        end while
        if l ≥ 2 then
            Compute the right SVCs corresponding to σj, . . . , σj+l−1. (Subsec. 3.4)
        end if
    end for
    Perform reorthogonalization, if necessary. (Subsec. 3.5)
    return Σ, U, V
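The detection loop in proc rksvd, which looks for runs of nearly parallel SVCs, can be sketched in Python as below. It mirrors the pseudocode literally with 0-based indices; the function name `nearly_parallel_groups` and the return format are assumptions made for illustration.

```python
import numpy as np


def nearly_parallel_groups(vectors, c3=0.9):
    """Return (j, l) pairs, 0-based, for which columns j, ..., j+l-1 need recomputation.

    vectors: array whose columns are unit-norm SVCs ordered by singular value.
    """
    n = vectors.shape[1]
    groups = []
    for j in range(n - 1):
        l = 1
        # grow the run while the inner product with column j exceeds the threshold c3
        while j + l < n and abs(vectors[:, j] @ vectors[:, j + l]) > c3:
            l += 1
        if l >= 2:
            groups.append((j, l))
    return groups


if __name__ == "__main__":
    V = np.eye(4)
    V[:, 2] = (V[:, 1] + 0.1 * V[:, 2]) / np.sqrt(1.01)   # column 2 nearly parallel to column 1
    print(nearly_parallel_groups(V))
```

As in the pseudocode, the outer loop does not skip past a detected run, so overlapping runs may be reported for longer clusters.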

5. Numerical experiment

The numerical environment is as follows: CPU: Intel Core i7-960 3.2GHz, memory: 12GB, OS: Ubuntu Linux 9.10 (kernel 2.6.31), LAPACK: version 3.2.1, BLAS: ATLAS version 3.6.0-22 Ubuntu2, and I-SVD: version 0.4.5. We set the cut-off parameters to $c_1 = 8 \times 10^{-16}$, $c_2 = 10^{-6}$ and $c_3 = 0.9$ in DCK-SVD. We use the DBDSDC routine in LAPACK for DC2-SVD. For I-SVD, we use GotoBLAS2, and the DBDSLV routine is run on a single thread.



Test matrices are of the following three types:

• L-DOR: Upper-bidiagonal matrix $B = (0_n, B_L) \in \mathbb{R}^{n\times(n+1)}$ with $B_L \in \mathbb{R}^{n\times n}$ such that $T - \lambda_{\min}(T) I_n = B_L B_L^T$, where $T$ is the real symmetric tridiagonal matrix obtained by tridiagonalizing a real symmetric dense matrix whose elements are normal random numbers of mean 0 and variance 1, and $\lambda_{\min}(T)$ is the minimum eigenvalue of $T$. L-DOR is based on a physical model which shows so-called quantum chaos [7].

• H-DOR1: $a_j$ ($j = 1, \dots, n$) are uniform random numbers in the interval $(-2, 2]$, while $b_j$ ($j = 1, \dots, n$) are uniform random numbers in the interval $(-1, 1]$.

• H-DOR2: $a_j$ and $b_j$ ($j = 1, \dots, n$) are uniform random numbers in the interval $(-1, 1]$.

The matrix size is $n = 5000$. For the execution time as well as the numerical errors, we take an average over 16 examples for each type. The DOR is low ($\delta = 0$) for L-DOR, while the other two have high DORs ($\delta = 0.92$ for H-DOR1 and $\delta = 0.95$ for H-DOR2, on average).

Figs. 1 and 2 show the dependence of the execution time of DCK-SVD on the division number $k$ for L-DOR and H-DOR1, respectively. For comparison, the execution times of DC2-SVD and I-SVD are also shown. For L-DOR, the DOR is low and, as a result, a large division number is preferable in DCK-SVD. With the optimal division number $k = 32$, DCK-SVD (11.4 sec) is faster than DC2-SVD (32.5 sec) and I-SVD (15.8 sec). For H-DOR1, the DOR is high and, as a result, a small division number is preferable in DCK-SVD. The optimal division number is $k = 8$, with which the execution time of DCK-SVD (2.35 sec) is comparable to that of DC2-SVD (2.74 sec). The execution time of I-SVD (13.8 sec) does not depend strongly on the DOR. H-DOR2 also has a high DOR, and the dependence of the execution time of DCK-SVD on the division number $k$ is the same as in Fig. 2. The execution time of DCK-SVD is 4.30 sec with the optimal division number $k = 4$, while the execution times of DC2-SVD and I-SVD are 2.86 sec and 14.6 sec, respectively. Table 1 shows the orthogonality errors

\[
\epsilon_{OU} = \max_{1\le i\le n} \| U^T u_i - e_i^{(n)} \|_2, \qquad
\epsilon_{OV} = \max_{1\le i\le n+1} \| V^T v_i - e_i^{(n+1)} \|_2
\]

for the left and right SVCs as well as the residual

\[
\epsilon_R = \max_{1\le i\le n} \frac{\| B^T u_i - \sigma_i v_i \|_2}{\| B \|_2},
\]

when L-DOR, H-DOR1 and H-DOR2 are solved by DCK-SVD, DC2-SVD and I-SVD. For DCK-SVD, we use the optimal division number for each type. Table 1 shows that DCK-SVD has almost the same numerical accuracy as I-SVD.
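The three error measures can be evaluated for any computed SVD along the following lines. The sketch assumes the residual is taken as $B^T u_i - \sigma_i v_i$, which is the dimensionally consistent reading for $B \in \mathbb{R}^{n\times(n+1)}$, and it uses `numpy.linalg.svd` on a small H-DOR1-style matrix purely as a stand-in solver; `svd_errors` is a hypothetical name.

```python
import numpy as np


def svd_errors(B, U, sigma, V):
    """Orthogonality errors of U and V and the relative residual of an SVD of B.

    B: n x (n+1); U: n x n; sigma: length n; V: (n+1) x (n+1).
    """
    n = B.shape[0]
    # column i of U^T U - I is U^T u_i - e_i
    eps_ou = np.max(np.linalg.norm(U.T @ U - np.eye(n), axis=0))
    eps_ov = np.max(np.linalg.norm(V.T @ V - np.eye(n + 1), axis=0))
    resid = B.T @ U - V[:, :n] * sigma            # column i: B^T u_i - sigma_i v_i
    eps_r = np.max(np.linalg.norm(resid, axis=0)) / np.linalg.norm(B, 2)
    return eps_ou, eps_ov, eps_r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    idx = np.arange(n)
    B = np.zeros((n, n + 1))
    B[idx, idx] = rng.uniform(-2, 2, n)
    B[idx, idx + 1] = rng.uniform(-1, 1, n)
    U, s, Vt = np.linalg.svd(B, full_matrices=True)
    print(svd_errors(B, U, s, Vt.T))
```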

For H-DOR2 with matrix sizes $n = 3000$ and $4000$, we observed several cases in which tiny SVLs induce orthogonality errors beyond $O(10^{-11})$. In all such cases, however, the errors are suppressed below $O(10^{-11})$ by adjusting the value of $c_1$ in the range $c_1 \in [8 \times 10^{-16}, 5 \times 10^{-11}]$ according to the input matrix.

Fig. 1. Dependence of execution time for L-DOR on the division number k in DCK-SVD. (Axes: time [sec] versus division number k; curves: DCK-SVD, DC2-SVD, I-SVD.)

Fig. 2. Dependence of execution time for H-DOR1 on the division number k in DCK-SVD. (Axes: time [sec] versus division number k; curves: DCK-SVD, DC2-SVD, I-SVD.)

Table 1. Numerical errors.

Matrix    Method     ε_OU        ε_OV        ε_R
L-DOR     DCK-SVD    1.17E−12    1.17E−12    1.27E−13
L-DOR     DC2-SVD    1.13E−14    1.55E−14    5.92E−15
L-DOR     I-SVD      2.54E−12    2.54E−12    1.26E−14
H-DOR1    DCK-SVD    8.60E−13    4.17E−13    1.03E−13
H-DOR1    DC2-SVD    4.55E−15    4.00E−15    9.14E−15
H-DOR1    I-SVD      8.25E−13    8.25E−13    8.47E−15
H-DOR2    DCK-SVD    2.74E−13    2.23E−13    3.10E−14
H-DOR2    DC2-SVD    4.07E−15    4.14E−15    9.00E−15
H-DOR2    I-SVD      1.68E−12    1.60E−12    9.71E−13

Acknowledgments

We are grateful to the anonymous referee for helpful comments, which served to improve the quality of this paper. This work was partially supported by Grant-in-Aid for Scientific Research (C) No.19560058.

References

[1] M. Gu, J. Demmel and I. Dhillon, Efficient computation of the singular value decomposition with applications to least squares problems, Tech. Rep., UT-CS-94-257, Univ. of Tennessee, 1994.

[2] Y. Kuwajima and T. Shigehara, An extension of divide-and-conquer for real symmetric tridiagonal eigenproblem (in Japanese), Trans. JSIAM, 15 (2005) 89–115.

[3] Y. Kuwajima and T. Shigehara, An improvement of multiple division divide-and-conquer for real symmetric tridiagonal eigenproblem (in Japanese), Trans. JSIAM, 16 (2006) 453–480.

[4] E. Anderson et al., LAPACK Users’ Guide, Third Edition, SIAM, Philadelphia, 1999.

[5] H. Toyokawa, K. Kimura, M. Takata and Y. Nakamura, On parallelism of the I-SVD algorithm with a multi-core processor, JSIAM Letters, 1 (2009) 48–51.

[6] M. Gu and S. C. Eisenstat, A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem, SIAM J. Matrix Anal. Appl., 16 (1995) 172–191.

[7] M. L. Mehta, Random Matrices, Third Edition, Pure and Applied Mathematics, 142, Elsevier/Academic Press, 2004.



Box-ball systems related to the nonautonomous ultradiscrete Toda equation on the finite lattice

Kazuki Maeda1 and Satoshi Tsujimoto1

1 Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan

E-mail kmaeda amp.i.kyoto-u.ac.jp, tujimoto i.kyoto-u.ac.jp

Received April 10, 2010, Accepted May 25, 2010

Abstract

Ultradiscrete analogues of the nonautonomous discrete finite Toda lattice and its modified system are explicitly given. It is shown that these two systems can be seen as the box-ball system with size limits and the one with speed limits, respectively. Particular solutions and the Lax forms of these ultradiscrete systems on the finite lattice are also discussed.

Keywords ultradiscrete Toda lattice, shifted qd algorithm, box-ball system with a carrier


1. Introduction

Since the discovery of ultradiscretization [1], much progress has been made in the understanding of ultradiscrete integrable systems. Ultradiscretization is a procedure for constructing an ultradiscrete system from a discrete system. The procedure replaces the operators $(+, \times, \div)$ with $(\min, +, -)$ by using the formula

\[
\lim_{\epsilon\to+0} \epsilon \log(e^{-A/\epsilon} + e^{-B/\epsilon}) = -\min(A, B).
\]

Applying ultradiscretization to discrete integrable systems, we obtain ultradiscrete integrable systems, which inherit properties of the original systems such as the existence of soliton solutions and of many conserved quantities.
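The limiting formula can be observed numerically. The sketch below evaluates the left-hand side for decreasing $\epsilon$; using `numpy.logaddexp` to avoid underflow is an implementation choice for this illustration, and the name `ultradiscrete_sum` is hypothetical.

```python
import numpy as np


def ultradiscrete_sum(A, B, eps):
    """Evaluate eps * log(exp(-A/eps) + exp(-B/eps)) without underflow."""
    return eps * np.logaddexp(-A / eps, -B / eps)


if __name__ == "__main__":
    A, B = 2.0, 5.0
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(f"eps = {eps:6.3f}: {ultradiscrete_sum(A, B, eps):+.6f}  (limit {-min(A, B):+.1f})")
```

When $A = B$ the value differs from $-\min(A, B)$ by $\epsilon \log 2$, which still vanishes in the limit.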

The box-ball system (BBS) is one of the ultradiscrete integrable systems. The BBS is composed of an infinite array of boxes, a finite number of balls in the boxes, and a carrier of balls. Each box can contain only one ball and the carrier can hold infinitely many balls. The evolution rule from time $t$ to time $t + 1$ is defined as follows. The carrier moves from left to right and passes each box. When the carrier passes a box containing a ball, the carrier picks up the ball; when the carrier passes an empty box while holding balls, the carrier puts one ball into the box.

An evolution equation of the BBS is given by

\[
U_n^{(t+1)} = \min\Bigl( 1 - U_n^{(t)}, \; \sum_{k=-\infty}^{n-1} \bigl( U_k^{(t)} - U_k^{(t+1)} \bigr) \Bigr), \tag{1}
\]

where the variable $U_n^{(t)} \in \{0, 1\}$ denotes the number of balls in the $n$th box at time $t$. Equation (1) is derived from the discrete KdV equation via ultradiscretization [2].
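Equation (1) can be run directly on a finite window of boxes, since the sum in (1) is exactly the number of balls the carrier holds on arriving at box $n$. The following is a minimal sketch; the function name `bbs_step` and the requirement that the window end with enough empty boxes are assumptions of this illustration.

```python
def bbs_step(boxes):
    """One time step of the box-ball system via equation (1).

    boxes: list of 0/1 values; the list must end with enough empty boxes
    for the carrier to unload completely.
    """
    carrier = 0                      # running sum in (1): balls currently carried
    new_boxes = []
    for u in boxes:
        v = min(1 - u, carrier)      # right-hand side of (1)
        carrier += u - v             # pick up (u = 1) or drop (v = 1) a ball
        new_boxes.append(v)
    if carrier != 0:
        raise ValueError("window too short: the carrier still holds balls")
    return new_boxes


if __name__ == "__main__":
    state = [1, 1, 0, 1, 0, 0, 0, 0, 0]
    for t in range(3):
        print(t, "".join("1" if u else "." for u in state))
        state = bbs_step(state + [0, 0, 0])
```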

Nagai et al. [3] discovered that the ultradiscrete Toda (u-Toda) lattice

\[
Q_n^{(t+1)} = \min\Bigl( E_{n+1}^{(t)}, \; \sum_{k=0}^{n} Q_k^{(t)} - \sum_{k=0}^{n-1} Q_k^{(t+1)} \Bigr), \tag{2a}
\]
\[
E_n^{(t+1)} = E_n^{(t)} + Q_n^{(t)} - Q_{n-1}^{(t+1)}, \tag{2b}
\]

with the non-periodic finite lattice condition $E_0^{(t)} = E_N^{(t)} = +\infty$ gives another evolution equation of the BBS. The variables $Q_n^{(t)}$, $E_n^{(t)}$ and the constant $N$ denote the size of the $n$th soliton (the number of balls in the $n$th block) at time $t$, the distance between the $(n-1)$th soliton and the $n$th soliton at time $t$, and the number of solitons, respectively. Fig. 1 illustrates the correspondence between the BBS and the u-Toda lattice.

Fig. 1. Example of the BBS and the u-Toda lattice. ‘1’ denotes a ball in a box and ‘.’ denotes an empty box. (Diagram of the BBS states for $t = 0, \dots, 6$ together with the corresponding values of $Q_0^{(t)}, E_1^{(t)}, Q_1^{(t)}, E_2^{(t)}, Q_2^{(t)}$.)
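A direct implementation of (2a)–(2b) with the finite lattice condition is sketched below. The function name `utoda_step`, the list layout (`E[n-1]` stores $E_n$) and the initial data are illustrative assumptions.

```python
import math


def utoda_step(Q, E):
    """One step of the u-Toda lattice (2) with E_0 = E_N = +infinity.

    Q: [Q_0, ..., Q_{N-1}];  E: [E_1, ..., E_{N-1}].
    """
    N = len(Q)
    E_ext = list(E) + [math.inf]          # E_ext[n] = E_{n+1}, with E_N = +inf
    Q_new, E_new = [], []
    carried = 0                           # sum_{k<=n} Q_k^(t) - sum_{k<n} Q_k^(t+1)
    for n in range(N):
        carried += Q[n]
        if n >= 1:
            E_new.append(E[n - 1] + Q[n] - Q_new[n - 1])   # (2b)
        q = min(E_ext[n], carried)                         # (2a)
        Q_new.append(q)
        carried -= q
    return Q_new, E_new


if __name__ == "__main__":
    Q, E = [5, 4, 2], [5, 3]              # illustrative initial data: largest soliton on the left
    for t in range(7):
        print(t, Q, E)
        Q, E = utoda_step(Q, E)
```

In this run the sizes end up sorted in ascending order and the gaps keep growing, which is the sorting behaviour associated with the u-Toda lattice in [3].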

Recently, the nonautonomous version of the discrete Toda lattice (nd-Toda lattice) has been well studied [4,5]. In this letter, we derive an ultradiscrete analogue of the nd-Toda lattice and study its relation to the BBS. Furthermore, from the BBS with a carrier, we construct an extended system of the nd-Toda lattice. These correspondences can be understood from the viewpoint of the Lax form, or of the shifted qd (LR) algorithm in numerical methods.

2. Ultradiscretization of the nd-Toda lattice and BBS with size limits

JSIAM Letters Vol. 2 (2010) pp.95–98 Kazuki Maeda et al.

The nd-Toda lattice is given by

\[
q_n^{(t+1)} + e_n^{(t+1)} = q_n^{(t)} + e_{n+1}^{(t)} + \mu_{t+1} - \mu_t, \tag{3a}
\]
\[
q_{n-1}^{(t+1)} e_n^{(t+1)} = q_n^{(t)} e_n^{(t)}, \tag{3b}
\]

where the independent variables $n$ and $t$ are integers, and $\mu_t$ is a nonautonomous parameter depending on time $t$. Let us rewrite the nd-Toda lattice (3) in the form of an evolution equation:

\[
q_n^{(t+1)} = e_{n+1}^{(t)} + q_n^{(t)} - e_n^{(t+1)} + \mu_{t+1} - \mu_t, \tag{4a}
\]
\[
e_n^{(t+1)} = e_n^{(t)} \frac{q_n^{(t)}}{q_{n-1}^{(t+1)}}. \tag{4b}
\]

The system (4) is known as Rutishauser’s shifted qd (qds) algorithm [6], which computes matrix eigenvalues or singular values. Since equation (4a) includes a subtraction, rounding errors arise in the course of the shifted qd algorithm and the accuracy of the numerical results is degraded. The subtraction is also an obstacle in the ultradiscretization procedure.

To overcome this obstacle, let us introduce an auxiliary variable

\[
d_n^{(t+1)} := q_n^{(t)} - e_n^{(t+1)} + \mu'_{t+1} = q_n^{(t+1)} - e_{n+1}^{(t)},
\]

where $\mu'_{t+1} := \mu_{t+1} - \mu_t$. Then we obtain a subtraction-free evolution equation of the nd-Toda lattice:

\[
q_n^{(t+1)} = e_{n+1}^{(t)} + d_n^{(t+1)}, \qquad e_n^{(t+1)} = e_n^{(t)} \frac{q_n^{(t)}}{q_{n-1}^{(t+1)}}, \tag{5a}
\]
\[
d_n^{(t+1)} = d_{n-1}^{(t+1)} \frac{q_n^{(t)}}{q_{n-1}^{(t+1)}} + \mu'_{t+1}. \tag{5b}
\]

The equations (5) appear in the differential qds (dqds) algorithm [7], which is known to be more stable than the qds algorithm. We can ultradiscretize the nd-Toda lattice of the form (5) if the variables $q_n^{(t)}$, $e_n^{(t)}$, $d_n^{(t)}$ and the parameter $\mu'_t$ are non-negative for all $n$ and $t$. Introduce variables $Q_n^{(t)}$, $E_n^{(t)}$, $D_n^{(t)}$ and a parameter $M'_t$ by $q_n^{(t)} = e^{-Q_n^{(t)}/\epsilon}$, $e_n^{(t)} = e^{-E_n^{(t)}/\epsilon}$, $d_n^{(t)} = e^{-D_n^{(t)}/\epsilon}$ and $\mu'_t = e^{-M'_t/\epsilon}$. Taking the limit $\epsilon \to +0$, we have

\[
Q_n^{(t+1)} = \min(E_{n+1}^{(t)}, D_n^{(t+1)}), \tag{6a}
\]
\[
E_n^{(t+1)} = E_n^{(t)} + Q_n^{(t)} - Q_{n-1}^{(t+1)}, \tag{6b}
\]
\[
D_n^{(t+1)} = \min(D_{n-1}^{(t+1)} + Q_n^{(t)} - Q_{n-1}^{(t+1)}, M'_{t+1}). \tag{6c}
\]

We call the system (6) the nonautonomous ultradiscrete Toda (nu-Toda) lattice. We remark that if $M'_t = +\infty$ for all $t$, the nu-Toda lattice (6) reduces to the u-Toda lattice (2).

Let us consider the finite lattice condition $E_0^{(t)} = E_N^{(t)} = +\infty$, and suppose that $Q_n^{(t)}$, $E_n^{(t)}$, $D_n^{(t)}$ and $M'_t$ are positive integers for all $n$ and $t$. Then the nu-Toda lattice (6) is connected to the BBS, where the variables denote the following quantities:

• $Q_n^{(t)}$: the size of the $n$th soliton at time $t$.

• $E_n^{(t)}$: the distance between the $(n-1)$th soliton and the $n$th soliton at time $t$.

• $D_n^{(t)}$: the number of balls which the carrier holds just after the carrier passes the $n$th soliton of time $t - 1$.

Fig. 2 shows an example of this connection. We can observe that several balls vanish from time 1 to time 2 and from time 3 to time 4, and that the size of the solitons is restricted to $M'_t$ at each time $t$. Indeed, equation (6c) gives a new evolution rule from time $t$ to time $t + 1$: if the number of balls which the carrier holds exceeds the capacity $M'_{t+1}$, the excess balls are removed from the system. We call this system the BBS with size limits.

Fig. 2. Example of the BBS with size limits and the nu-Toda lattice. Initial values are the same as in Fig. 1 and the parameter $M'_t$ is chosen as $M'_t = +\infty$ for $0 \le t \le 1$, $M'_t = 4$ for $2 \le t \le 3$ and $M'_t = 3$ for $4 \le t$.

Fig. 3. Example of the BBS with speed limits and the modified nu-Toda lattice. The carrier capacity is chosen to be the same as in Fig. 2. If the excess balls (shown in boldface) are removed, the states are equal to those in the case of size limits (Fig. 2).

Fig. 4. Illustration of the construction of the modified nu-Toda lattice. This figure shows the evolution from time 4 to time 5 in Fig. 3. (Panels: $(\bar{Q}_n^{(t)}, \bar{E}_n^{(t)})$, $(Q_n^{(t+1)}, E_n^{(t+1)})$ and $(\bar{Q}_n^{(t+1)}, \bar{E}_n^{(t+1)})$.)
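The size-limit rule (6) can be sketched as a small function. The boundary convention $D_{-1}^{(t+1)} = Q_{-1}^{(t+1)} = 0$ (so that $D_0^{(t+1)} = \min(Q_0^{(t)}, M'_{t+1})$) is an assumption of this sketch, as are the name `nu_toda_step`, the list layout and the capacity schedule used in the example.

```python
import math


def nu_toda_step(Q, E, M):
    """One step of the nu-Toda lattice (6) with E_0 = E_N = +infinity.

    Q: [Q_0, ..., Q_{N-1}];  E: [E_1, ..., E_{N-1}];  M: capacity M'_{t+1}.
    """
    N = len(Q)
    E_ext = list(E) + [math.inf]          # E_ext[n] = E_{n+1}
    Q_new, E_new = [], []
    D_prev, Q_prev = 0, 0                 # assumed boundary values D_{-1}, Q_{-1}
    for n in range(N):
        D = min(D_prev + Q[n] - Q_prev, M)                 # (6c): carrier load, capped at M
        if n >= 1:
            E_new.append(E[n - 1] + Q[n] - Q_new[n - 1])   # (6b)
        q = min(E_ext[n], D)                               # (6a)
        Q_new.append(q)
        D_prev, Q_prev = D, q
    return Q_new, E_new


if __name__ == "__main__":
    Q, E = [5, 4, 2], [5, 3]
    for t, M in enumerate([math.inf, 4, 4, 3], start=1):
        Q, E = nu_toda_step(Q, E, M)
        print(t, Q, E, "balls:", sum(Q))
```

With an infinite capacity the step coincides with the u-Toda lattice (2); with a finite capacity the total number of balls can decrease, which is the vanishing of balls described above.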

3. The modified nu-Toda lattice and the BBS with speed limits

In this section, we discuss the well-known BBS with a carrier [8], which we call the BBS with speed limits in this letter. Fig. 3 shows the evolution of the BBS with speed limits. We find that the evolution rule is composed of a “size limit” process and a “recovery” process. The former is studied in Section 2, and the latter is the process which recovers the balls removed by the size limit process, as illustrated in Fig. 4. These two processes are expressed by the ultradiscrete system

\[
Q_n^{(t+1)} = \min(\bar{E}_{n+1}^{(t)}, D_n^{(t+1)}), \tag{7a}
\]
\[
E_n^{(t+1)} = \bar{E}_n^{(t)} + \bar{Q}_n^{(t)} - Q_{n-1}^{(t+1)}, \tag{7b}
\]
\[
D_n^{(t+1)} = \min(D_{n-1}^{(t+1)} + \bar{Q}_n^{(t)} - Q_{n-1}^{(t+1)}, M_{t+1}), \tag{7c}
\]
\[
\bar{Q}_n^{(t+1)} = Q_n^{(t+1)} + \bar{D}_n^{(t+1)}, \tag{7d}
\]
\[
\bar{E}_n^{(t+1)} = E_n^{(t+1)} - \bar{D}_n^{(t+1)}, \tag{7e}
\]
\[
\bar{D}_n^{(t+1)} = \max(0, D_{n-1}^{(t+1)} + \bar{Q}_n^{(t)} - Q_{n-1}^{(t+1)} - M_{t+1}). \tag{7f}
\]



The equations (7a)–(7c), which coincide with the nu-Toda lattice (6), and (7f) describe the size limit process $(\bar{Q}_n^{(t)}, \bar{E}_n^{(t)}) \mapsto (Q_n^{(t+1)}, E_n^{(t+1)}, \bar{D}_n^{(t+1)})$, where the auxiliary quantity $\bar{D}_n^{(t+1)}$ is defined as the number of balls removed by the size limit process. Then the equations (7d) and (7e) describe the recovery process $(Q_n^{(t+1)}, E_n^{(t+1)}, \bar{D}_n^{(t+1)}) \mapsto (\bar{Q}_n^{(t+1)}, \bar{E}_n^{(t+1)})$. Thus the mapping $(\bar{Q}_n^{(t)}, \bar{E}_n^{(t)}) \mapsto (\bar{Q}_n^{(t+1)}, \bar{E}_n^{(t+1)})$ with the finite lattice condition gives the evolution of the BBS with speed limits. We call the system (7) the modified nu-Toda lattice.
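The two-stage map (7) can be sketched as follows, with the size limit process (7a)–(7c), (7f) followed by the recovery process (7d)–(7e). The boundary convention $D_{-1}^{(t+1)} = Q_{-1}^{(t+1)} = 0$, the name `modified_nu_toda_step`, the list layout and the capacity schedule are assumptions of this illustration.

```python
import math


def modified_nu_toda_step(Qb, Eb, M):
    """One step (Qbar, Ebar)^(t) -> (Qbar, Ebar)^(t+1) of the modified nu-Toda lattice (7).

    Qb: [Qbar_0, ..., Qbar_{N-1}];  Eb: [Ebar_1, ..., Ebar_{N-1}];  M: capacity M_{t+1}.
    """
    N = len(Qb)
    Eb_ext = list(Eb) + [math.inf]        # Ebar_N = +infinity
    Q, E, Db = [], [], []
    D_prev, Q_prev = 0, 0                 # assumed boundary values D_{-1}, Q_{-1}
    for n in range(N):                    # size limit process
        load = D_prev + Qb[n] - Q_prev
        D = min(load, M)                                   # (7c)
        Db.append(max(0, load - M))                        # (7f): balls set aside
        if n >= 1:
            E.append(Eb[n - 1] + Qb[n] - Q[n - 1])         # (7b)
        q = min(Eb_ext[n], D)                              # (7a)
        Q.append(q)
        D_prev, Q_prev = D, q
    Qb_new = [Q[n] + Db[n] for n in range(N)]              # (7d): recovery
    Eb_new = [E[n - 1] - Db[n] for n in range(1, N)]       # (7e)
    return Qb_new, Eb_new


if __name__ == "__main__":
    Qb, Eb = [5, 4, 2], [5, 3]
    for t, M in enumerate([math.inf, 4, 4, 3, 3], start=1):
        Qb, Eb = modified_nu_toda_step(Qb, Eb, M)
        print(t, Qb, Eb, "balls:", sum(Qb))
```

In contrast to the size-limit rule alone, the number of balls is conserved here; the capacity only limits how far a soliton advances in one step.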

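It is easily shown that the modified nu-Toda lattice (7) is derived from the discrete system

\[
q_n^{(t+1)} = \bar{e}_{n+1}^{(t)} + d_n^{(t+1)}, \qquad e_n^{(t+1)} = \bar{e}_n^{(t)} \frac{\bar{q}_n^{(t)}}{q_{n-1}^{(t+1)}}, \tag{8a}
\]
\[
d_n^{(t+1)} = d_{n-1}^{(t+1)} \frac{\bar{q}_n^{(t)}}{q_{n-1}^{(t+1)}} + \mu_{t+1}, \tag{8b}
\]
\[
\bar{q}_n^{(t+1)} = q_n^{(t+1)} \bar{d}_n^{(t+1)}, \qquad \bar{e}_n^{(t+1)} = \frac{e_n^{(t+1)}}{\bar{d}_n^{(t+1)}}, \tag{8c}
\]
\[
\bar{d}_n^{(t+1)} = \frac{d_{n-1}^{(t+1)} \bar{q}_n^{(t)}}{d_n^{(t+1)} q_{n-1}^{(t+1)}}, \tag{8d}
\]

through ultradiscretization, that is, the variable transformations $q_n^{(t)} = e^{-Q_n^{(t)}/\epsilon}$, $e_n^{(t)} = e^{-E_n^{(t)}/\epsilon}$, $d_n^{(t)} = e^{-D_n^{(t)}/\epsilon}$, $\bar{q}_n^{(t)} = e^{-\bar{Q}_n^{(t)}/\epsilon}$, $\bar{e}_n^{(t)} = e^{-\bar{E}_n^{(t)}/\epsilon}$, $\bar{d}_n^{(t)} = e^{-\bar{D}_n^{(t)}/\epsilon}$, $\mu_t = e^{-M_t/\epsilon}$, and the limit $\epsilon \to +0$. Eliminating $d_n^{(t)}$, we rewrite the equations (8a) and (8b) as follows: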

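\[
q_n^{(t+1)} + e_n^{(t+1)} = \bar{q}_n^{(t)} + \bar{e}_{n+1}^{(t)} + \mu_{t+1},
\]
\[
q_{n-1}^{(t+1)} e_n^{(t+1)} = \bar{q}_n^{(t)} \bar{e}_n^{(t)}.
\]

Furthermore, we obtain the equations

\[
\bar{q}_n^{(t+1)} \bar{e}_n^{(t+1)} = q_n^{(t+1)} e_n^{(t+1)}
\]

and

\begin{align*}
\bar{q}_n^{(t+1)} + \bar{e}_{n+1}^{(t+1)}
&= q_n^{(t+1)} \Bigl( 1 - \frac{\mu_{t+1}}{d_n^{(t+1)}} \Bigr) + e_{n+1}^{(t+1)} \Bigl( 1 + \frac{\mu_{t+1} q_n^{(t+1)}}{d_n^{(t+1)} \bar{q}_{n+1}^{(t)}} \Bigr) \\
&= q_n^{(t+1)} + e_{n+1}^{(t+1)} - \mu_{t+1} \frac{q_n^{(t+1)} - \bar{e}_{n+1}^{(t)}}{d_n^{(t+1)}} \\
&= q_n^{(t+1)} + e_{n+1}^{(t+1)} - \mu_{t+1}.
\end{align*}

Hence the system (8) is equivalent to the system

\[
q_n^{(t+1)} + e_n^{(t+1)} = \bar{q}_n^{(t)} + \bar{e}_{n+1}^{(t)} + \mu_{t+1}, \tag{9a}
\]
\[
q_{n-1}^{(t+1)} e_n^{(t+1)} = \bar{q}_n^{(t)} \bar{e}_n^{(t)}, \tag{9b}
\]
\[
\bar{q}_n^{(t+1)} + \bar{e}_{n+1}^{(t+1)} = q_n^{(t+1)} + e_{n+1}^{(t+1)} - \mu_{t+1}, \tag{9c}
\]
\[
\bar{q}_n^{(t+1)} \bar{e}_n^{(t+1)} = q_n^{(t+1)} e_n^{(t+1)}. \tag{9d}
\]

We call the system (9) the modified nd-Toda lattice. In the following sections, the integrability of this system will be clarified by giving particular solutions and the Lax form.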

4. Particular solutions

In order to derive particular solutions, we introduce bilinear equations related to (5) and (8). The following theorem is proved by using the Plücker relation.

Theorem 1 Let us consider the bilinear equations

\[
\tau_{n+1}^{k,t-1} \tau_{n-1}^{k+1,t+1} - \tau_n^{k,t-1} \tau_n^{k+1,t+1} + \tau_n^{k,t} \tau_n^{k+1,t} = 0, \tag{10a}
\]
\[
\tau_{n+1}^{k,t} \tau_n^{k+1,t} - \tau_n^{k,t} \tau_{n+1}^{k+1,t} - \mu_{t-k} \tau_{n+1}^{k,t-1} \tau_n^{k+1,t+1} = 0, \tag{10b}
\]

on the semi-infinite lattice: $\tau_{-1}^{k,t} = 0$ and $\tau_0^{k,t} = 1$. A solution to this system is given by the Hankel determinant

\[
\tau_n^{k,t} = \bigl| \xi_{k+i+j}^{(t-k)} \bigr|_{0\le i,j\le n-1}, \tag{11}
\]

where $\xi_n^{(t)}$ is an arbitrary function satisfying the dispersion relation

\[
\xi_n^{(t+1)} = \xi_{n+1}^{(t)} + \mu_{t+1} \xi_n^{(t)}. \tag{12}
\]

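Let us consider the following variable transformations:

\[
q_n^{(t)} = \frac{\tau_n^{1,t} \tau_{n+1}^{1,t+1}}{\tau_{n+1}^{1,t} \tau_n^{1,t+1}}, \qquad
\bar{q}_n^{(t)} = \frac{\tau_n^{0,t} \tau_{n+1}^{1,t+1}}{\tau_{n+1}^{0,t} \tau_n^{1,t+1}}, \tag{13a}
\]
\[
e_n^{(t)} = \frac{\tau_{n+1}^{1,t} \tau_{n-1}^{1,t+1}}{\tau_n^{1,t} \tau_n^{1,t+1}}, \qquad
\bar{e}_n^{(t)} = \frac{\tau_{n+1}^{0,t} \tau_{n-1}^{1,t+1}}{\tau_n^{0,t} \tau_n^{1,t+1}}, \tag{13b}
\]
\[
d_n^{(t)} = \frac{\tau_{n+1}^{0,t} \tau_n^{1,t}}{\tau_{n+1}^{0,t-1} \tau_n^{1,t+1}}, \qquad
\bar{d}_n^{(t)} = \frac{\tau_n^{0,t} \tau_{n+1}^{1,t}}{\tau_{n+1}^{0,t} \tau_n^{1,t}}. \tag{13c}
\]

Then (5) and (8) can be transformed to the bilinear equations

\[
\tau_{n+1}^{0,t-1} \tau_{n-1}^{1,t+1} - \tau_n^{0,t-1} \tau_n^{1,t+1} + \tau_n^{0,t} \tau_n^{1,t} = 0,
\]
\[
\tau_{n+1}^{0,t} \tau_n^{1,t} - \tau_n^{0,t} \tau_{n+1}^{1,t} - \mu_t \tau_{n+1}^{0,t-1} \tau_n^{1,t+1} = 0,
\]

which coincide with the bilinear equations (10) in the case $k = 0$. Thus the Hankel determinant (11) gives solutions to (5) and (8) with the semi-infinite lattice condition.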

To obtain solutions on the finite lattice, let us choose the element $\xi_n^{(t)}$ as

\[
\xi_n^{(t)} = \sum_{i=0}^{N-1} w_i p_i^n \prod_{j=0}^{t-1} (p_i + \mu_{j+1}), \tag{14}
\]

where $p_i$ and $w_i$ are some constants satisfying $p_0 > p_1 > \cdots > p_{N-1}$. Then $\xi_n^{(t)}$ satisfies the dispersion relation (12) and the finite lattice condition $\tau_{-1}^{k,t} = \tau_{N+1}^{k,t} = 0$ holds. Substituting (14) into the Hankel determinant (11), $\tau_n^{k,t}$ is calculated as follows:

\[
\tau_n^{k,t} = \sum_{0\le r_0<r_1<\cdots<r_{n-1}\le N-1} \Bigl[ \prod_{0\le i<j\le n-1} (p_{r_i} - p_{r_j})^2 \Bigr] \times \prod_{i=0}^{n-1} \Bigl[ w_{r_i} p_{r_i}^k \prod_{j=0}^{t-k-1} (p_{r_i} + \mu_{j+1}) \Bigr], \tag{15}
\]

which has the “ultradiscretizable form”. Hence we can obtain solutions to (6) and (7) with the finite lattice condition.
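The agreement between the Hankel determinant (11) with the choice (14) and the expanded form (15) can be checked numerically for small sizes. The sketch below does this with NumPy; the function names, the convention that `mu[j]` stores $\mu_j$ (with `mu[0]` unused) and the sample constants are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def xi(n, t, p, w, mu):
    """xi_n^{(t)} of (14); mu[j] holds mu_j and mu[0] is unused."""
    factors = np.array([np.prod([pi + mu[j + 1] for j in range(t)]) for pi in p])
    return float(np.sum(w * p**n * factors))


def tau_hankel(k, t, n, p, w, mu):
    """tau_n^{k,t} as the Hankel determinant (11), with tau_0 = 1."""
    if n == 0:
        return 1.0
    H = np.array([[xi(k + i + j, t - k, p, w, mu) for j in range(n)] for i in range(n)])
    return float(np.linalg.det(H))


def tau_expanded(k, t, n, p, w, mu):
    """tau_n^{k,t} from the expansion (15)."""
    total = 0.0
    for r in combinations(range(len(p)), n):
        vand = np.prod([(p[r[i]] - p[r[j]]) ** 2 for i in range(n) for j in range(i + 1, n)])
        weight = np.prod([w[ri] * p[ri] ** k * np.prod([p[ri] + mu[j + 1] for j in range(t - k)])
                          for ri in r])
        total += vand * weight
    return float(total)


if __name__ == "__main__":
    p = np.array([1.5, 1.0, 0.5])        # p_0 > p_1 > p_2 (illustrative values)
    w = np.array([1.0, 0.5, 2.0])
    mu = [0.0, 0.3, 0.1, 0.4, 0.2]
    for n in (1, 2, 3):
        print(n, tau_hankel(1, 3, n, p, w, mu), tau_expanded(1, 3, n, p, w, mu))
```

Every term in (15) is a product of positive factors when $w_i$, $p_i$ and $p_i + \mu_j$ are positive, which is what makes the expression amenable to the limit $\epsilon \to +0$.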

Theorem 2 A solution to the modified nu-Toda lattice (7) with the finite lattice condition $E_0^{(t)} = E_N^{(t)} = \bar{E}_0^{(t)} = \bar{E}_N^{(t)} = +\infty$ is given by

\[
Q_n^{(t)} = T_n^{1,t} - T_{n+1}^{1,t} + T_{n+1}^{1,t+1} - T_n^{1,t+1}, \tag{16a}
\]
\[
E_n^{(t)} = T_{n+1}^{1,t} - T_n^{1,t} + T_{n-1}^{1,t+1} - T_n^{1,t+1}, \tag{16b}
\]
\[
D_n^{(t)} = T_{n+1}^{0,t} - T_{n+1}^{0,t-1} + T_n^{1,t} - T_n^{1,t+1}, \tag{16c}
\]

and

\[
\bar{Q}_n^{(t)} = T_n^{0,t} - T_{n+1}^{0,t} + T_{n+1}^{1,t+1} - T_n^{1,t+1}, \tag{16d}
\]
\[
\bar{E}_n^{(t)} = T_{n+1}^{0,t} - T_n^{0,t} + T_{n-1}^{1,t+1} - T_n^{1,t+1}, \tag{16e}
\]
\[
\bar{D}_n^{(t)} = T_n^{0,t} - T_{n+1}^{0,t} + T_{n+1}^{1,t} - T_n^{1,t}, \tag{16f}
\]

where

\[
T_n^{k,t} = \min_{0\le r_0<r_1<\cdots<r_{n-1}\le N-1} \sum_{i=0}^{n-1} \Bigl\{ W_{r_i} + [2(n-1-i) + k] P_{r_i} + \sum_{j=0}^{t-k-1} \min(P_{r_i}, M_{j+1}) \Bigr\}, \tag{17}
\]

$P_i$ and $W_i$, $i = 0, 1, \dots, N-1$, are some constants satisfying $P_0 \le P_1 \le \cdots \le P_{N-1}$, which determine the size of the solitons and the phase of the solitons, respectively.

Furthermore, a solution to the nu-Toda lattice (6) is also given by (16a)–(16c) and (17) with $M_t = \min_{k=1,\dots,t}(M'_k)$.

For example, the solutions shown in Figs. 2 and 3 are given by setting $P_0 = 2$, $P_1 = 4$, $P_2 = 5$, $W_0 = 9$, $W_1 = 2$, and $W_2 = 0$.
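The solution formulas (16d), (16e) and (17) can be evaluated by brute force over the index subsets. The sketch below does so for the constants quoted above; the function names, the convention that `M[j]` stores $M_j$ (with `M[0]` unused), and the capacity schedule $M_1 = +\infty$, $M_2 = M_3 = 4$, $M_t = 3$ for $t \ge 4$ are assumptions of this illustration.

```python
import math
from itertools import combinations


def T(k, t, n, P, W, M):
    """T_n^{k,t} of (17); M[j] holds M_j and M[0] is unused."""
    best = math.inf                       # minimum over an empty family is +infinity
    for r in combinations(range(len(P)), n):
        total = 0
        for i, ri in enumerate(r):
            total += W[ri] + (2 * (n - 1 - i) + k) * P[ri]
            total += sum(min(P[ri], M[j + 1]) for j in range(t - k))
        best = min(best, total)
    return best


def Q_bar(t, n, P, W, M):
    """Qbar_n^{(t)} from (16d)."""
    return T(0, t, n, P, W, M) - T(0, t, n + 1, P, W, M) \
        + T(1, t + 1, n + 1, P, W, M) - T(1, t + 1, n, P, W, M)


def E_bar(t, n, P, W, M):
    """Ebar_n^{(t)} from (16e), for 1 <= n <= N-1."""
    return T(0, t, n + 1, P, W, M) - T(0, t, n, P, W, M) \
        + T(1, t + 1, n - 1, P, W, M) - T(1, t + 1, n, P, W, M)


if __name__ == "__main__":
    P, W = [2, 4, 5], [9, 2, 0]
    M = [None, math.inf, 4, 4, 3, 3, 3, 3, 3]
    for t in range(4):
        print(t, [Q_bar(t, n, P, W, M) for n in range(3)],
              [E_bar(t, n, P, W, M) for n in (1, 2)])
```

The cost grows combinatorially with $N$, so this direct evaluation is only meant for checking small examples against the recurrence (7).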

5. Lax forms and asymptotic behaviours

Finally, we give the Lax forms of (3) and (9) with the finite lattice condition. Let us introduce the bidiagonal matrices

\[
L^{(t)} = \sum_{i=1}^{N-1} e_i^{(t)} U_{i,i-1} + \sum_{i=0}^{N-1} U_{i,i}, \qquad
R^{(t)} = \sum_{i=0}^{N-1} q_i^{(t)} U_{i,i} + \sum_{i=0}^{N-2} U_{i,i+1},
\]

where $U_{i,j} = (\delta_{i,k}\delta_{j,l})_{k,l=0}^{N-1}$ is a matrix unit. Let $\bar{L}^{(t)}$ and $\bar{R}^{(t)}$ denote the bidiagonal matrices obtained from $L^{(t)}$ and $R^{(t)}$ by replacing their elements with the overlined ones. Then the modified nd-Toda lattice (9) can be written in the matrix form

\[
L^{(t+1)} R^{(t+1)} = \bar{R}^{(t)} \bar{L}^{(t)} + \mu_{t+1} I, \tag{18a}
\]
\[
\bar{R}^{(t+1)} \bar{L}^{(t+1)} = R^{(t+1)} L^{(t+1)} - \mu_{t+1} I, \tag{18b}
\]

where $I$ is the identity matrix of order $N$. In the BBS with size/speed limits, the matrix equations (18a) and (18b) correspond to the size limit process and the recovery process, respectively. If the eigenvalues of the tridiagonal matrix $\bar{L}^{(0)} \bar{R}^{(0)}$ are $\lambda_0, \lambda_1, \dots, \lambda_{N-1}$, one can prove that the eigenvalues of $L^{(t)} R^{(t)}$ and $\bar{L}^{(t)} \bar{R}^{(t)}$ are $\lambda_n + \mu_t$ and $\lambda_n$, $n = 0, 1, \dots, N-1$, respectively. We remark that, eliminating $\bar{L}^{(t)}$ and $\bar{R}^{(t)}$, we obtain

\[
L^{(t+1)} R^{(t+1)} = R^{(t)} L^{(t)} + \mu'_{t+1} I,
\]

which is the Lax form of the nd-Toda lattice (3).

Hereafter, we consider the asymptotic behaviour by using the solutions given in Section 4. If the parameter $\mu_t$ is chosen as $\mu_t > -p_{N-1}$ for all $t$, then from (13) and (15) the dependent variables of the discrete systems (3) and (9) have the following asymptotic behaviour for sufficiently large $t$:

\[
q_n^{(t)} \simeq p_n + \mu_t, \qquad \bar{q}_n^{(t)} \simeq p_n, \qquad e_n^{(t)} \simeq 0, \qquad \bar{e}_n^{(t)} \simeq 0,
\]

where the convergence speed of $e_n^{(t)}$ and $\bar{e}_n^{(t)}$ depends on $(p_n + \mu_t)/(p_{n-1} + \mu_t)$ [9]. These results give a method to compute the eigenvalues of a tridiagonal matrix. The parameter $\mu_t$ is called the shift parameter and is used to accelerate convergence in the dqds algorithm.
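The matrix relations (18a) and (18b) can be verified numerically for one step of the discrete system (8). In the sketch below the finite lattice condition is taken as $\bar{e}_0 = \bar{e}_N = 0$, and (8d) is evaluated in the equivalent form $\bar{d}_n = (d_n - \mu_{t+1})/d_n$ that follows from (8b); these conventions, the function names and the sample data are assumptions of this illustration.

```python
import numpy as np


def lower(e):
    """L: unit diagonal with subdiagonal e_1, ..., e_{N-1}."""
    return np.eye(len(e) + 1) + np.diag(e, -1)


def upper(q):
    """R: diagonal q_0, ..., q_{N-1} with unit superdiagonal."""
    return np.diag(q) + np.diag(np.ones(len(q) - 1), 1)


def modified_nd_toda_step(qb, eb, mu):
    """One step of (8): (qbar, ebar)^(t) -> (q, e, qbar, ebar)^(t+1) with shift mu = mu_{t+1}."""
    N = len(qb)
    q, e, db = np.empty(N), np.empty(N - 1), np.empty(N)
    d = qb[0] + mu                                   # d_0, using ebar_0 = 0
    for n in range(N):
        if n >= 1:
            e[n - 1] = eb[n - 1] * qb[n] / q[n - 1]  # (8a), second equation
            d = d * qb[n] / q[n - 1] + mu            # (8b)
        q[n] = (eb[n] if n < N - 1 else 0.0) + d     # (8a), first equation, ebar_N = 0
        db[n] = (d - mu) / d                         # (8d) rewritten with (8b)
    return q, e, q * db, e / db[1:]                  # (8c)


if __name__ == "__main__":
    qb, eb, mu = np.array([4.0, 3.0, 2.0]), np.array([1.0, 0.5]), 0.3
    q, e, qb1, eb1 = modified_nd_toda_step(qb, eb, mu)
    I = np.eye(3)
    print(np.allclose(lower(e) @ upper(q), upper(qb) @ lower(eb) + mu * I))     # (18a)
    print(np.allclose(upper(qb1) @ lower(eb1), upper(q) @ lower(e) - mu * I))   # (18b)
```

Since $L R$ and $R L$ are similar, (18a) shifts the spectrum by $\mu_{t+1}$ and (18b) shifts it back, so the eigenvalues of $\bar{L}^{(t)} \bar{R}^{(t)}$ stay fixed from step to step.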

In the BBS with size/speed limits, these eigenvalues correspond to the sizes of solitons. Suppose that $P_0 \le \cdots \le P_{m-1} < M_t \le P_m \le \cdots \le P_{N-1}$ holds except for a finite number of $t$. Then, from (16) and (17), the dependent variables of (6) and (7) have the following asymptotic behaviour for sufficiently large $t$:

\[
Q_n^{(t)} = \min(P_n, M_t) = P_n, \qquad \bar{Q}_n^{(t)} = P_n,
\]
\[
E_{n+1}^{(t)} \to +\infty \ \text{and} \ \bar{E}_{n+1}^{(t)} \to +\infty \quad \text{if } P_n < P_{n+1},
\]

for $n = 0, 1, \dots, m-1$, and

\[
Q_n^{(t)} = M_t, \qquad E_{n+1}^{(t+1)} = E_{n+1}^{(t)}, \qquad \bar{E}_{n+1}^{(t+1)} = \bar{E}_{n+1}^{(t)},
\]

for $n \ge m$, where $E_n^{(t)}$ and $\bar{E}_n^{(t)}$ increase by $\min(P_n, M_t) - \min(P_{n-1}, M_t) \ge 0$ in one step.

References

[1] T. Tokihiro, D. Takahashi, J. Matsukidaira and J. Satsuma, From soliton equations to integrable cellular automata through a limiting procedure, Phys. Rev. Lett., 76 (1996), 3247–3250.

[2] S. Tsujimoto and R. Hirota, Ultradiscrete KdV equation, J. Phys. Soc. Jpn., 67 (1998), 1809–1810.

[3] A. Nagai, D. Takahashi and T. Tokihiro, Soliton cellular automaton, Toda molecule equation and sorting algorithm, Phys. Lett. A, 255 (1999), 265–271.

[4] V. Spiridonov and A. Zhedanov, Discrete Darboux transformations, the discrete-time Toda lattice, and the Askey-Wilson polynomials, Methods Appl. Anal., 2 (1995), 369–398.

[5] S. Tsujimoto, Determinant solutions of the nonautonomous discrete Toda equation associated with the deautonomized discrete KP hierarchy, J. Syst. Sci. Complex., 23 (2010), 153–176.

[6] H. Rutishauser, Über eine kubisch konvergente variante der LR-transformation, Z. Angew. Math. Mech., 40 (1960), 49–54.

[7] K. V. Fernando and B. N. Parlett, Accurate singular values and differential qd algorithms, Numer. Math., 67 (1994), 191–229.

[8] D. Takahashi and J. Matsukidaira, Box and ball system with a carrier and ultradiscrete modified KdV equation, J. Phys. A: Math. Gen., 30 (1997), L733–739.

[9] P. Henrici, Applied and computational complex analysis, vol. 1, John Wiley & Sons, 1974.



Hybridized discontinuous Galerkin method

with lifting operator

Issei Oikawa1

1 Graduate School of Mathematical Sciences, The University of Tokyo, Tokyo 153-8914, Japan

E-mail oikawa ms.u-tokyo.ac.jp


Abstract

In this paper, we propose a new hybridized discontinuous Galerkin method for the Poisson equation with homogeneous Dirichlet boundary condition. Our method has the advantage that its stability is better than that of the previous hybridized method. We derive $L^2$ and $H^1$ error estimates of optimal order. Some numerical results are presented to verify our analysis.

Keywords discontinuous Galerkin method, hybridized method, error analysis


1. Introduction

The discontinuous Galerkin finite-element methods (DGFEMs) have been one of the active research fields of numerical analysis in the last decade. They allow us to use approximate functions that are discontinuous across the element boundaries, and they are robust to variation of element geometry. That is, we can utilize many kinds of polynomials as approximate functions on elements and many kinds of polyhedral domains as elements simultaneously. Consequently, DGFEM fits adaptive computations, so that mathematical analysis as well as actual applications have been developed for various problems. For more details, we refer to [1–3]. However, the size and band-widths of the resulting matrices can be much larger than those of the conventional FEM, which is a disadvantage from the viewpoint of computational cost. To surmount this obstacle, a new class of DGFEM, called hybridized DGFEMs, was recently proposed and analyzed by B. Cockburn and his colleagues; for example, see [4]. In this approach, we introduce a new unknown function $\hat{U}_h$ on inter-element edges and characterize it as the weak solution of a target PDE. We then obtain the discrete system for $\hat{U}_h$, and the size of the system becomes smaller.

On the other hand, it should be kept in mind that DGFEM has another origin. Some classes of nonconforming and hybrid FEMs, which are called hybrid displacement methods, use discontinuous functions as approximate field functions; see for example [5, 6]. In [7] and [8], F. Kikuchi and Y. Ando developed a variant of the hybrid displacement method and applied it to plate problems. Their approach enables one to use conventional element matrices and vectors. It, however, suffered from numerical instability and was not fully successful. Recently, the author and his colleagues proposed a new DGFEM that is based on the hybrid displacement approach by stabilizing their old method, and applied it to linear elasticity problems in [9]. A key point of our method is to introduce penalty terms in order to ensure the stability. We then carried out theoretical analysis by using the 2D Poisson equation as a model problem, and gave some concrete finite element models with numerical results and observations in [10]. However, an issue still remains. The stability is guaranteed only when the penalty parameters are taken from a certain interval; we know only that such an interval exists and have no concrete information about it.

The purpose of this paper is to propose a new hybridized DGFEM that is stable for arbitrary penalty parameters. Our strategy is to introduce the lifting operator and define the penalty term in terms of the lifting operator. In order to state our idea as clearly as possible, we consider the Poisson equation with the homogeneous Dirichlet condition:

$$-\Delta u = f \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \tag{1}$$

where $\Omega$ is a convex polygonal domain and $f \in L^2(\Omega)$.

This paper is composed of six sections. In Section 2, we introduce the triangulation and finite element spaces, and then describe the lifting operator. Section 3 is devoted to the formulation of our proposed hybridized DGFEM, and mathematical analysis including error estimates is given in Section 4. In Section 5, we report some results of numerical computations and confirm our theoretical results. Finally, we conclude this paper in Section 6.

2. Preliminaries

2.1 Notation

Let $\Omega \subset \mathbb{R}^n$, for an integer $n \ge 2$, be a convex polygonal domain. We introduce a triangulation $\mathcal{T}_h = \{K\}$ of $\Omega$ in the sense of [10], where $h = \max_{K \in \mathcal{T}_h} h_K$ and $h_K$ stands for the diameter of $K$. That is, each $K \in \mathcal{T}_h$ is an $m$-polygonal domain, where $m$ is an integer and can differ with $K$. We assume that $m$ is bounded from above independently of a family of triangulations $\{\mathcal{T}_h\}_h$, and that $\partial K$ does not intersect with itself. Let $\mathcal{E}_h = \{e \subset \partial K : K \in \mathcal{T}_h\}$ be the set of all edges of elements, and let $\Gamma_h = \bigcup_{K \in \mathcal{T}_h} \partial K$. We define the so-called broken Sobolev


JSIAM Letters Vol. 2 (2010) pp.99–102 Issei Oikawa

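space for $k \ge 0$,

$$H^k(\mathcal{T}_h) = \{ v \in L^2(\Omega) : v|_K \in H^k(K),\ \forall K \in \mathcal{T}_h \}.$$

Let $L^2_0(\Gamma_h) = \{ v \in L^2(\Gamma_h) : v|_{\partial\Omega} = 0 \}$. We introduce the inner products

$$(u, v)_K = \int_K uv\,dx \quad \text{for } K \in \mathcal{T}_h, \qquad \langle u, v\rangle_e = \int_e uv\,ds \quad \text{for } e \in \mathcal{E}_h.$$

The usual $m$-th order Sobolev seminorm and norm on $K$ are denoted by $|u|_{m,K}$ and $\|u\|_{m,K}$, respectively. We use finite element spaces

$$U_h \subset H^2(\mathcal{T}_h), \qquad \hat{U}_h \subset L^2_0(\Gamma_h).$$

In addition, we set $V_h = U_h \times \hat{U}_h$ and $V(h) = H^2(\mathcal{T}_h) \times L^2_0(\Gamma_h)$.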

2.2 Lifting operators

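We state the definition of the lifting operator, which plays a crucial role in our formulation and analysis. To this end, we fix $K \in \mathcal{T}_h$ and $e \subset \partial K$ for the time being, and set

$$U_h(K) = \{ w_h|_K : w_h \in U_h \}, \qquad \hat{U}_h(e) = \{ \hat{w}_h|_e : \hat{w}_h \in \hat{U}_h \}.$$

Then, for any $v \in L^2(e)$, there exists a unique $\boldsymbol{u}_h \in U_h(K)^n$ such that

$$(\boldsymbol{u}_h, \boldsymbol{w}_h)_K = \langle v, \boldsymbol{w}_h \cdot n_K \rangle_e, \quad \forall \boldsymbol{w}_h \in U_h(K)^n, \tag{2}$$

where $n_K$ is the unit outward normal vector to $\partial K$. The lifting operator $L_{e,K} : L^2(e) \to U_h(K)^n$ is defined as $L_{e,K}(v) = \boldsymbol{u}_h$. Thus,

$$(L_{e,K}(v), \boldsymbol{w}_h)_K = \langle v, \boldsymbol{w}_h \cdot n_K \rangle_e, \quad \forall \boldsymbol{w}_h \in U_h(K)^n. \tag{3}$$

Furthermore, we define $L_{\partial K} = \sum_{e \subset \partial K} L_{e,K}$.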
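As a concrete instance of (2), consider the simplest case in which $U_h(K)$ consists of constants, an assumption made only for this sketch. Then (2) gives $L_{e,K}(v) = n_K \int_e v\,ds / |K|$. The function name `lift_p0`, the midpoint quadrature and the square element below are illustrative choices, not part of the method as stated.

```python
def lift_p0(v, edge_start, edge_end, normal, area, n_quad=64):
    """Lifting of an edge function v into the constant vector fields on K.

    For U_h(K)^n = constants, (2) reduces to
        L_{e,K}(v) = n_K * (integral of v over e) / |K|.
    The edge integral is approximated by the composite midpoint rule.
    """
    (x0, y0), (x1, y1) = edge_start, edge_end
    length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    ds = length / n_quad
    integral = 0.0
    for j in range(n_quad):
        s = (j + 0.5) / n_quad
        integral += v(x0 + s * (x1 - x0), y0 + s * (y1 - y0)) * ds
    return (normal[0] * integral / area, normal[1] * integral / area)


# Right edge of the square K = (0, h) x (0, h), outward normal (1, 0).
h = 0.5
area = h * h
lifted = lift_p0(lambda x, y: 1.0, (h, 0.0), (h, h), (1.0, 0.0), area)

# L2 norms: a constant field on K has norm |L| * sqrt(|K|); v = 1 on e has norm sqrt(h).
norm_lifted = (lifted[0] ** 2 + lifted[1] ** 2) ** 0.5 * area ** 0.5
norm_v = h ** 0.5
```

For this element the two norms satisfy `norm_lifted == h ** -0.5 * norm_v` up to rounding, so the lifting of an edge function of unit size grows like $h^{-1/2}$ as the element shrinks.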

3. New hybridized DG scheme

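This section is devoted to the presentation of our proposed hybridized DGFEM. Before doing so, we convert the Poisson problem (1) into a suitable weak form (7). A key idea is to introduce unknown functions on inter-element edges. First, multiplying both sides of (1) by a test function $v \in U_h$ and integrating over each $K \in \mathcal{T}_h$, we have by integration by parts

$$\sum_{K \in \mathcal{T}_h} \big( (\nabla u, \nabla v)_K - \langle n_K \cdot \nabla u, v \rangle_{\partial K} \big) = (f, v). \tag{4}$$

From the continuity of the flux, we have

$$\sum_{K \in \mathcal{T}_h} \langle n_K \cdot \nabla u, \hat{v} \rangle_{\partial K} = 0, \quad \forall \hat{v} \in L^2_0(\Gamma_h). \tag{5}$$

This, together with (4), implies

$$\sum_{K \in \mathcal{T}_h} \big( (\nabla u, \nabla v)_K - \langle n_K \cdot \nabla u, v - \hat{v} \rangle_{\partial K} \big) = (f, v). \tag{6}$$

Here we set, for $\boldsymbol{u} = (u, \hat{u})$ and $\boldsymbol{v} = (v, \hat{v}) \in V(h)$,

$$a_h(\boldsymbol{u}, \boldsymbol{v}) = \sum_{K \in \mathcal{T}_h} (\nabla u, \nabla v)_K, \qquad b_h(\boldsymbol{u}, \boldsymbol{v}) = -\sum_{K \in \mathcal{T}_h} \langle n_K \cdot \nabla u, v - \hat{v} \rangle_{\partial K}.$$

Then, (6) is rewritten as

$$a_h(\boldsymbol{u}, \boldsymbol{v}) + b_h(\boldsymbol{u}, \boldsymbol{v}) = (f, v). \tag{7}$$

Now we can state our hybridized DGFEM: find $\boldsymbol{u}_h \in V_h$ such that

$$B_h^L(\boldsymbol{u}_h, \boldsymbol{v}_h) := a_h(\boldsymbol{u}_h, \boldsymbol{v}_h) + b_h(\boldsymbol{u}_h, \boldsymbol{v}_h) + b_h(\boldsymbol{v}_h, \boldsymbol{u}_h) + j_h(\boldsymbol{u}_h, \boldsymbol{v}_h) = (f, v_h), \quad \forall \boldsymbol{v}_h = (v_h, \hat{v}_h) \in V_h. \tag{8}$$

Here, the third term $b_h(\boldsymbol{v}_h, \boldsymbol{u}_h)$ of $B_h^L$ is added in order to symmetrize the scheme, and the penalty term $j_h(\boldsymbol{u}_h, \boldsymbol{v}_h)$ is defined by

$$j_h(\boldsymbol{u}, \boldsymbol{v}) = \sum_{K \in \mathcal{T}_h} \big( L_{\partial K}(u - \hat{u}), L_{\partial K}(v - \hat{v}) \big)_K + \sum_{K \in \mathcal{T}_h} \sum_{e \subset \partial K} \int_e \eta_e h_e^{-1} (u - \hat{u})(v - \hat{v})\,ds,$$

with the penalty parameters $\eta_e > 0$, where $h_e$ is the diameter of $e$.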

4. Error estimates

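In this section, we give a mathematical analysis of our hybridized DGFEM. To this end, we introduce

$$|||\boldsymbol{v}|||^2 = \sum_{K \in \mathcal{T}_h} \Big( \|\nabla v - L_{\partial K}(v - \hat{v})\|_{0,K}^2 + \sum_{e \subset \partial K} \frac{\eta_e}{h_e} \|v - \hat{v}\|_{0,e}^2 \Big),$$

$$|||\boldsymbol{v}|||_h^2 = \sum_{K \in \mathcal{T}_h} \Big( |v|_{1,K}^2 + \sum_{e \subset \partial K} \frac{\eta_e}{h_e} \|v - \hat{v}\|_{0,e}^2 \Big),$$

where $\eta_e$ is a positive parameter for each $e \in \mathcal{E}_h$.

Theorem 1 The bilinear form $B_h^L$ satisfies the following three properties.

(Consistency) Let $u \in H^2(\Omega) \cap H^1_0(\Omega)$ be the exact solution. For $\boldsymbol{u} = (u, u|_{\Gamma_h})$, we have

$$B_h^L(\boldsymbol{u}, \boldsymbol{v}) = (f, v), \quad \forall \boldsymbol{v} \in V(h).$$

(Boundedness)

$$|B_h^L(\boldsymbol{v}, \boldsymbol{w})| \le |||\boldsymbol{v}|||\,|||\boldsymbol{w}|||, \quad \forall \boldsymbol{v}, \boldsymbol{w} \in V(h).$$

(Coercivity)

$$B_h^L(\boldsymbol{v}_h, \boldsymbol{v}_h) \ge |||\boldsymbol{v}_h|||^2, \quad \forall \boldsymbol{v}_h \in V_h.$$

Furthermore, the scheme (8) admits a unique solution $\boldsymbol{u}_h \in V_h$ for any $f \in L^2(\Omega)$ and $\{\eta_e\}_e$.

Proof The consistency is trivial since $u - u|_{\Gamma_h} = 0$ on $\Gamma_h$. The coercivity is a direct consequence of the expression

$$b_h(\boldsymbol{v}, \boldsymbol{w}) = -\sum_{K} (\nabla v, L_{\partial K}(w - \hat{w}))_K.$$

Combining this with the Schwarz inequality, we immediately deduce the boundedness. Finally, the coercivity implies the uniqueness of the solution of (8) and, hence, the system of linear equations (8) admits a unique solution.

(QED)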
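The coercivity step can be written out as follows. This is a sketch under the assumption, not stated explicitly above, that $\nabla v_h|_K \in U_h(K)^n$ for every $v_h \in U_h$, so that $\boldsymbol{w}_h = \nabla v_h$ is an admissible test function in (3).

```latex
\begin{align*}
b_h(\boldsymbol{v}_h, \boldsymbol{v}_h)
  &= -\sum_{K \in \mathcal{T}_h} \langle n_K \cdot \nabla v_h,\, v_h - \hat{v}_h \rangle_{\partial K}
   = -\sum_{K \in \mathcal{T}_h} \big( \nabla v_h,\, L_{\partial K}(v_h - \hat{v}_h) \big)_K, \\
B_h^L(\boldsymbol{v}_h, \boldsymbol{v}_h)
  &= a_h(\boldsymbol{v}_h, \boldsymbol{v}_h) + 2\, b_h(\boldsymbol{v}_h, \boldsymbol{v}_h) + j_h(\boldsymbol{v}_h, \boldsymbol{v}_h) \\
  &= \sum_{K \in \mathcal{T}_h} \Big( \|\nabla v_h\|_{0,K}^2
     - 2 \big( \nabla v_h,\, L_{\partial K}(v_h - \hat{v}_h) \big)_K
     + \|L_{\partial K}(v_h - \hat{v}_h)\|_{0,K}^2
     + \sum_{e \subset \partial K} \frac{\eta_e}{h_e} \|v_h - \hat{v}_h\|_{0,e}^2 \Big) \\
  &= \sum_{K \in \mathcal{T}_h} \Big( \|\nabla v_h - L_{\partial K}(v_h - \hat{v}_h)\|_{0,K}^2
     + \sum_{e \subset \partial K} \frac{\eta_e}{h_e} \|v_h - \hat{v}_h\|_{0,e}^2 \Big)
   = |||\boldsymbol{v}_h|||^2 .
\end{align*}
```

Under this assumption the coercivity holds with equality and with constant one, whatever positive values the parameters $\eta_e$ take.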

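As a result of these three properties, we obtain the following a priori error estimate in terms of $|||\cdot|||$.

Theorem 2 Let $\boldsymbol{u} = (u, u|_{\Gamma_h}) \in V(h)$ with the exact solution $u \in H^2(\Omega) \cap H^1_0(\Omega)$ of the Poisson problem (1). Suppose that $\{\mathcal{T}_h\}_h$ satisfies

$$\tau \le \frac{h_e}{h_K}, \quad \forall K \in \mathcal{T}_h,\ \forall e \subset \partial K \tag{9}$$

with some positive constant $\tau$. Let $\boldsymbol{u}_h = (u_h, \hat{u}_h) \in V_h$ be the solution of our HDG scheme (8) for arbitrary $\{\eta_e\}_e$, $\eta_e > 0$. Then, we have the error estimate

$$|||\boldsymbol{u} - \boldsymbol{u}_h||| \le 2 \inf_{\boldsymbol{v}_h \in V_h} |||\boldsymbol{u} - \boldsymbol{v}_h|||. \tag{10}$$

Proof Let $\boldsymbol{v}_h \in V_h$ be arbitrary. By Theorem 1, we have

$$|||\boldsymbol{u}_h - \boldsymbol{v}_h|||^2 \le B_h^L(\boldsymbol{u}_h - \boldsymbol{v}_h, \boldsymbol{u}_h - \boldsymbol{v}_h) \quad \text{(Coercivity)}$$

$$= B_h^L(\boldsymbol{u} - \boldsymbol{v}_h, \boldsymbol{u}_h - \boldsymbol{v}_h) \quad \text{(Consistency)}$$

$$\le |||\boldsymbol{u} - \boldsymbol{v}_h|||\,|||\boldsymbol{u}_h - \boldsymbol{v}_h|||, \quad \text{(Boundedness)}$$

which implies that

$$|||\boldsymbol{u}_h - \boldsymbol{v}_h||| \le |||\boldsymbol{u} - \boldsymbol{v}_h|||, \quad \forall \boldsymbol{v}_h \in V_h. \tag{11}$$

Using the triangle inequality, we have

$$|||\boldsymbol{u} - \boldsymbol{u}_h||| \le |||\boldsymbol{u} - \boldsymbol{v}_h||| + |||\boldsymbol{u}_h - \boldsymbol{v}_h||| \le 2\,|||\boldsymbol{u} - \boldsymbol{v}_h|||.$$

From the above, it follows that

$$|||\boldsymbol{u} - \boldsymbol{u}_h||| \le 2 \inf_{\boldsymbol{v}_h \in V_h} |||\boldsymbol{u} - \boldsymbol{v}_h|||, \tag{12}$$

which implies that the error of the approximate solution is optimal in the norm $|||\cdot|||$.

(QED)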

As is stated in [10], we assume the following approximation properties: for $v \in H^{k+1}(K)$ there exist positive constants $C^e_{k,s}$ and $C^f_{k,s}$ such that

$$\inf_{v_h \in U_h} |v - v_h|_{s,K} \le C^e_{k,s}\, h_K^{k+1-s}\, |v|_{k+1,K}, \tag{13}$$

$$\inf_{\hat{v}_h \in \hat{U}_h} |v - \hat{v}_h|_{s,e} \le C^f_{k,s}\, h_K^{k+1/2-s}\, |v|_{k+1,K}. \tag{14}$$

Then the error estimates in Theorem 2 are actually of optimal order.

Theorem 3 Under the assumptions in Theorem 2 and the approximation properties (13) and (14), we have, if $u \in H^{k+1}(\Omega) \cap H^1_0(\Omega)$,

$$|||\boldsymbol{u} - \boldsymbol{u}_h||| \le C h^k |u|_{k+1,\Omega}, \tag{15}$$

$$\|u - u_h\|_{0,\Omega} \le C h^{k+1} |u|_{k+1,\Omega}. \tag{16}$$

In order to prove Theorem 3, we need the following auxiliary result.

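Proposition 4 Let $K \in \mathcal{T}_h$ and $e \subset \partial K$. Then we have

$$\|L_{e,K}(v)\|_{0,K} \le C_1 h_e^{-1/2} \|v\|_{0,e}, \quad \forall v \in L^2(e). \tag{17}$$

Proof In (3), taking $\boldsymbol{w}_h = L_{e,K}(v)$ yields

$$\|L_{e,K}(v)\|_{0,K}^2 = (L_{e,K}(v), L_{e,K}(v))_K = \langle v, L_{e,K}(v) \cdot n_K \rangle_e \le \|v\|_{0,e}\, \|L_{e,K}(v)\|_{0,e}. \tag{18}$$

By the trace theorem, there exists $C_1$ such that

$$\|L_{e,K}(v)\|_{0,e} \le C_1 h_e^{-1/2} \|L_{e,K}(v)\|_{0,K}. \tag{19}$$

Here $C_1$ depends on $U_h(K)$ and $\hat{U}_h(e)$. Combining (18) with (19), we obtain (17).

(QED)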

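Proof of Theorem 3 As a consequence of Proposition 4, it can be proved that there exists a constant $C_2$ such that

$$|||\boldsymbol{v}||| \le C_2 |||\boldsymbol{v}|||_h, \quad \forall \boldsymbol{v} \in V(h). \tag{20}$$

From (13) and (14), we have

$$\inf_{\boldsymbol{v}_h \in V_h} |||\boldsymbol{u} - \boldsymbol{v}_h|||_h \le C h^k |u|_{k+1,\Omega}. \tag{21}$$

Combining this with (20), we obtain (15). Next, we prove (16). Here we define $\psi \in H^2(\Omega) \cap H^1_0(\Omega)$ as the solution of the adjoint problem

$$-\Delta \psi = u - u_h \ \text{in } \Omega, \qquad \psi = 0 \ \text{on } \partial\Omega. \tag{22}$$

Let $\boldsymbol{\psi} = (\psi, \psi|_{\Gamma_h})$. Then, since $B_h^L$ is symmetric, we have

$$B_h^L(\boldsymbol{v}, \boldsymbol{\psi}) = (u - u_h, v), \quad \forall \boldsymbol{v} = (v, \hat{v}) \in V(h). \tag{23}$$

In particular, taking $\boldsymbol{v} = \boldsymbol{u} - \boldsymbol{u}_h$, we have for any $\boldsymbol{\psi}_h \in V_h$,

$$\|u - u_h\|_{0,\Omega}^2 \le B_h^L(\boldsymbol{u} - \boldsymbol{u}_h, \boldsymbol{\psi}) = B_h^L(\boldsymbol{u} - \boldsymbol{u}_h, \boldsymbol{\psi} - \boldsymbol{\psi}_h) \le |||\boldsymbol{u} - \boldsymbol{u}_h|||\,|||\boldsymbol{\psi} - \boldsymbol{\psi}_h||| \le C_2 |||\boldsymbol{u} - \boldsymbol{u}_h|||\,|||\boldsymbol{\psi} - \boldsymbol{\psi}_h|||_h.$$

From (13) and (14), it follows that

$$|||\boldsymbol{\psi} - \boldsymbol{\psi}_h|||_h \le C h |\psi|_{2,\Omega}. \tag{24}$$

By the regularity of the adjoint problem, we have

$$|\psi|_{2,\Omega} \le C \|u - u_h\|_{0,\Omega}. \tag{25}$$

Thus we obtain (16).

(QED)

Remark 5 In contrast to our previous results of [10], the error estimates in Theorem 2 are valid for any positive parameters $\eta_e$. This is one of the advantages of our hybridized DGFEM.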


We now present the numerical results of our method for the following Poisson equation:

$$-\Delta u = 2\pi^2 \sin(\pi x)\sin(\pi y) \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega,$$

where $\Omega$ is the unit square. We use uniform rectangular meshes and $P_k$–$P_k$ elements ($k = 1, 2, 3$). We computed the approximate solutions for various mesh sizes $h = 1/N$; see Table 1. We take unity as the penalty parameter for each $e \in \mathcal{E}_h$. We see from Table 1 that the $H^1$ and $L^2$ convergence rates of the approximate solutions are $h^k$ and $h^{k+1}$, respectively. Figs. 1 and 2 show the approximate solutions $u_h$ and $\hat{u}_h$, respectively, in the case $k = 1$ and $N = 8$.

Table 1. L2 and H1 errors.

k   N    L2 error    L2 rate   H1 error    H1 rate
1   4    3.23E−02    1.96      7.15E−01    1.01
1   8    8.29E−03    1.96      3.55E−01    1.00
1   16   2.14E−03    1.99      1.78E−01    1.00
1   32   5.39E−04    -         8.90E−02    -
2   4    4.56E−03    3.18      1.46E−01    2.07
2   8    5.04E−04    3.05      3.47E−02    2.02
2   16   6.08E−05    3.01      8.58E−03    2.00
2   32   7.53E−06    -         2.14E−03    -
3   4    4.48E−04    4.21      2.00E−02    3.12
3   8    2.43E−05    4.07      2.30E−03    3.03
3   16   1.45E−06    4.02      2.81E−04    3.01
3   32   8.94E−08    -         3.49E−05    -
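The rate columns of Table 1 can be recomputed from the error columns. The sketch below assumes, as the layout of the table suggests, that each rate is $\log_2$ of the ratio of the errors on meshes $N$ and $2N$; the function name is illustrative.

```python
import math


def observed_rates(errors):
    """Observed order between consecutive meshes when h is halved: log2(e_N / e_2N)."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


# Errors for k = 1 and N = 4, 8, 16, 32, copied from Table 1.
l2_errors = [3.23e-02, 8.29e-03, 2.14e-03, 5.39e-04]
h1_errors = [7.15e-01, 3.55e-01, 1.78e-01, 8.90e-02]

l2_rates = observed_rates(l2_errors)
h1_rates = observed_rates(h1_errors)
```

The recomputed values agree with the printed rates up to the rounding of the tabulated errors, and they approach $k + 1 = 2$ in $L^2$ and $k = 1$ in $H^1$.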

6. Conclusions

We have presented a new hybridized DGFEM by using the lifting operator and examined the stability for arbitrary penalty parameters. Convergence results of optimal order have been proved and confirmed by numerical experiments. As a model problem, we have considered only the Dirichlet boundary value problem for the Poisson equation. We are interested in application to other problems, for example, the Neumann boundary value problem, convection-diffusion equations, the Stokes system, and time-dependent problems. They are left here as future study.

7. Acknowledgement

I thank Professor Fumio Kikuchi who brought my attention to the present subject and encouraged me through valuable discussions. This work is supported by Grants-in-Aid for Scientific Research, JSPS and by Global COE Program (The Research and Training Center for New Development in Mathematics, The University of Tokyo), MEXT, Japan.

References

[1] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods, 3rd ed., Springer, Berlin, 2008.

[2] D. N. Arnold, F. Brezzi, B. Cockburn and L. D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal., 39 (2002), 1749–1779.

[3] D. N. Arnold, An interior penalty finite element method with discontinuous elements, SIAM J. Numer. Anal., 19 (1982), 742–760.

[4] B. Cockburn, J. Gopalakrishnan and R. Lazarov, Unified hybridization of discontinuous Galerkin, mixed, and continuous Galerkin methods for second order elliptic problems, SIAM J. Numer. Anal., 47 (2009), 1319–1365.

[5] T. H. H. Pian and C.-C. Wu, Hybrid and Incompatible Finite Element Methods, Chapman & Hall, 2005.

[Fig. 1. The approximate solution $u_h$ in the case k = 1 and N = 8.]

[Fig. 2. The approximate solution $\hat{u}_h$ in the case k = 1 and N = 8.]

[6] P. Tong, New displacement hybrid finite element models for solid continua, Int. J. Numer. Meth. Eng., 2 (1970), 73–83.

[7] F. Kikuchi and Y. Ando, A new variational functional for the finite-element method and its application to plate and shell problems, Nucl. Eng. Des., 21 (1972), 95–113.

[8] F. Kikuchi and Y. Ando, Some finite element solutions for plate bending problems by simplified hybrid displacement method, Nucl. Eng. Des., 23 (1972), 155–178.

[9] F. Kikuchi, K. Ishii and I. Oikawa, Discontinuous Galerkin FEM of hybrid displacement type. Development of Polygonal Elements, Theor. Appl. Mech. Jpn, 57 (2009), 395–404.

[10] I. Oikawa and F. Kikuchi, Discontinuous Galerkin FEM of hybrid type, JSIAM Letters, 2 (2010), 49–52.


JSIAM Letters Vol.2 (2010) pp.103–106 ©2010 Japan Society for Industrial and Applied Mathematics

Testing whether the Nikkei225 best bid/ask price path

follows the first order discrete Markov chain

– an approach in terms of the total “ρ-variation” –

Meng Li1 and Kazuo Kishimoto1

1 Graduate School of Systems and Information Engineering, University of Tsukuba, 3-1, Tennodai, Tsukuba-shi, Ibaraki 305-0006, Japan

E-mail ri10 sk.tsukuba.ac.jp


Abstract

This paper empirically shows that, in the days near the last trading day of Nikkei 225 Futures, the best bid/ask prices follow a highly negatively correlated first order Markov process and have no trend up to four ticks, based on the total ρ-variation. This is consistent with the model by Endo et al. and with the empirical results obtained therein by a different approach. It also derives the theoretical asymptotic formula for the total ρ-variation when the process follows a first order Markov random walk, and shows that its fit is satisfactory for ρ ≤ 4.

Keywords length, measurement, Markov random walk


1. Introduction

In the liquid market of the Nikkei 225 Futures in the OSE (Osaka Stock Exchange), the bid-ask spread is almost always just one tick. (See [2].) Arrival of a market buy order triggers the settlement at the ask quote, while that of a sell order triggers the settlement at the bid quote. The transaction price oscillates between the highest bid and the lowest ask quotes, which causes a seemingly strong negative serial correlation in the sequence of consecutive transaction prices. One is interested in the temporal price changes of the best bid/ask quote or their mid-price rather than the transaction price itself.

To analyze the price changes under this situation, Endo et al. [1] proposed a simple double auction model, which predicts that the price changes of the best bid/ask quotes again follow a first order Markov random walk. This prediction is ascertained empirically in the same paper: the null hypothesis of the first order Markov property is not rejected by the Anderson and Goodman test statistics.

In financial markets, one is interested in the possible existence of a trend in the price path. Short time strong negative correlation caused by the intrinsic nature of the double auction does not exclude the possible existence of a trend. The Anderson and Goodman test statistics are not designed to detect the existence of a trend. One wishes to test directly the possible existence of a longer scale trend. This is the purpose of this paper.

The Alexander's filter rule [3] is a classical approach to detect the existence of a trend in the academic world. This paper works on a variant of this concept, the total ρ-variation proposed in Kishimoto and Iri [4], which seems to be more natural as a mathematical quantity. (See also Kishimoto [5].)

This paper takes up 72 paths of the best ask price of tick-by-tick data of Nikkei 225 Futures in 2007 in the OSE, and tests whether we can regard them as paths of first order Markov chains. We also derive the formula for the expectation of the total ρ-variation of a locus of the first order Markov random walk, to check to what extent actual loci of price paths of Nikkei 225 Futures follow this formula.

2. Markov random walk and the total ρ-variation

2.1 Markov random walk

In the model by Endo et al., the tth price change X(t) (t = 1, 2, . . . ) takes either +1 or −1. We associate “+1” (resp. “−1”) with the term “Up” (resp. “Down”), and denote “Up” (resp. “Down”) simply by U (resp. D). Their model explains price changes of securities in terms of the arrival of two types of orders: buy and sell. In conclusion, the tth changes X(t) of the best bid/ask price follow the two-state Markov chain whose transition probability matrix is

$$P = \begin{pmatrix} p_{UU} & p_{UD} \\ p_{DU} & p_{DD} \end{pmatrix}. \tag{1}$$

Here, $p_{UU}$ and $p_{UD}$ (resp. $p_{DU}$ and $p_{DD}$) are the conditional transition probabilities of U and D when the last move was U (resp. D). The security price S(t) at t is given by $S(t) = S(0) + \sum_{\tau=1}^{t} X(\tau)$. We denote the path of S(t) (t ∈ [t0, t1]) by C[t0, t1].
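A path of this two-state chain can be simulated as in the following minimal sketch. The function name, the seed handling and the choice of drawing the first move with probability 1/2 each are assumptions of the illustration, not part of the model by Endo et al.

```python
import random


def simulate_price_path(p_uu, p_dd, n_steps, s0=0, seed=0):
    """Simulate S(0), ..., S(n_steps) for the two-state chain with matrix (1).

    p_uu = Pr[U | last move U] and p_dd = Pr[D | last move D]; the other two
    entries follow as p_ud = 1 - p_uu and p_du = 1 - p_dd.  The first move is
    drawn with probability 1/2 each (an assumption of this sketch), and
    n_steps is assumed to be at least 1.
    """
    rng = random.Random(seed)
    step = rng.choice((+1, -1))
    path = [s0, s0 + step]
    for _ in range(n_steps - 1):
        stay = p_uu if step == +1 else p_dd
        if rng.random() >= stay:
            step = -step  # the direction changes with probability 1 - stay
        path.append(path[-1] + step)
    return path


# A strongly negatively correlated walk: the direction repeats only 20% of the time.
path = simulate_price_path(p_uu=0.2, p_dd=0.2, n_steps=1000)
```

Small values of `p_uu` and `p_dd` make the path zigzag within a narrow band, which is the kind of behaviour the oscillation between bid and ask quotes suggests.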

2.2 Definition of the total ρ-variation

Let $\Delta = \{t_0 < t_1 < \cdots < t_n\}$ be a subset of $\{0, 1, \ldots, T\}$. In our case, where the path is a piecewise linear function, the total ρ-variation $V(\rho; C([0, t]))$ ($\rho > 0$, $t \in [0, T]$) is defined by

$$V(\rho; C([0, t])) = \sup_{\Delta \,:\, (\forall k)\ |S(t_k) - S(t_{k-1})| > \rho} \ \sum_{k=1}^{n} |S(t_k) - S(t_{k-1})|. \tag{2}$$

We notice that V(0; C[0, t]) is equal to the total variation in the ordinary sense. For ρ > 0, the supremum is attained for some ∆∗. We say that a ρ-extremum is attained at S(t∗) if t∗ ∈ ∆∗. S(t∗) is called a ρ-maximum (resp. ρ-minimum) if it is a maximum (resp. minimum) point in the ordinary sense. Let us fix ρ at a constant. We understand that X(t) is positively (resp. negatively) correlated if V(ρ; C([0, t])) is large (resp. small).
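For a path sampled at integer times, the supremum in the definition can be evaluated directly by dynamic programming over the admissible time sets. The sketch below is a plain quadratic-time implementation written for illustration only; the function name and the rule that a single-point set contributes 0 are assumptions.

```python
def total_rho_variation(path, rho):
    """Total rho-variation of a sampled path by dynamic programming.

    best[j] is the largest sum of |S(t_k) - S(t_{k-1})| over chains of sample
    times that end at index j and whose consecutive differences all exceed
    rho in absolute value.  A chain with a single point contributes 0.
    """
    best = [0] * len(path)
    for j in range(len(path)):
        for i in range(j):
            jump = abs(path[j] - path[i])
            if jump > rho:
                best[j] = max(best[j], best[i] + jump)
    return max(best) if best else 0


# A short path with one zigzag followed by a rise to 3 and a fall back to 0.
path = [0, 1, 0, 1, 2, 3, 2, 1, 0]
values = [total_rho_variation(path, rho) for rho in range(4)]
```

For ρ = 0 the result is the ordinary total variation of the path; as ρ grows, the small zigzag stops contributing, and eventually no admissible chain remains.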

3. Theoretical value of E[V (ρ;C([0, t]))]

Let us derive the theoretical value of $\lim_{t\to\infty} E[V(\rho; C[0, t])]$ as a function of the transition matrix P. We consider the case where E[X(t)] = 0, because our preliminary empirical calculations show that this restriction produces little difference in our results during the period under consideration. Thus, we put

$$P = \begin{pmatrix} p_{UU} & p_{UD} \\ p_{DU} & p_{DD} \end{pmatrix} = \begin{pmatrix} \pi & 1-\pi \\ 1-\pi & \pi \end{pmatrix}.$$

3.1 Construction of an auxiliary Markov chain

In our case, where V(ρ; C([0, t])) is a piecewise constant right continuous function, we need V(ρ; C([0, t])) only for ρ = 0, 1, 2, . . . . Without loss of generality, we assume that S(0) = 0 holds.

Suppose that we are at time t. For any ϵ > 0 we can find T such that, for any t > T, there are ρ-extremum points in [0, t] with probability more than 1 − ϵ. Thus, we assume that S(t) takes its ρ-extremum points at $t_0 < t_1 < \cdots < t_{n-1} < t^*$ in [0, t]. We remark that, in an interval [0, t′] ⊃ [0, t], S(t) takes ρ-extremum values at $t_k$ (0 ≤ k ≤ n − 1) and not necessarily at t∗.

Let us introduce a new random variable U(t) ≡ |S(t) − S(t∗)| (t ≥ t∗). A pair of random variables (U(t), X(t)) defines a new Markov chain whose state space is {0, 1, . . . , ρ} × {−1, 1}.

We have the following six cases (Fig. 1):

1) (U(t), X(t)) = (0,+1): This will never happen.

2) (U(t), X(t)) = (0,−1): We have two cases:

a. With probability $p_{DU}$, X(t+1) = +1 takes place, and we have a new state (1,+1).

b. With probability $p_{DD}$, X(t+1) = −1 takes place. S(t) becomes the last ρ-extremum point in [0, t+1], while S(t∗) is no more a ρ-extremum point. We have a new state (0,−1) with side effect V(ρ; S(t+1)) = V(ρ; C([0, t])) + 1.

3) 1 ≤ U(t) ≤ ρ − 1 and X(t) = +1:

a. With probability $p_{UU}$, we have a new state (U(t)+1,+1) with no side effect.

b. With probability $p_{UD}$, we have a new state (U(t)−1,−1) with no side effect.

4) 1 ≤ U(t) ≤ ρ − 1 and X(t) = −1:

a. With probability $p_{DU}$, we have a new state (U(t)+1,+1) with no side effect.

b. With probability $p_{DD}$, we have a new state (U(t)−1,−1) with no side effect.

5) (U(t), X(t)) = (ρ,+1):

a. With probability $p_{UU}$, X(t+1) = +1 takes place. S(t+1) is the last ρ-extremum point in [0, t+1], and we rename S(t∗) as $S(t_{\rho+1})$. We have a new state (0,−1) with side effect V(ρ; S(t+1)) = V(ρ; C([0, t])) + ρ + 1.

b. With probability $p_{UD}$, X(t+1) = −1 takes place, and we have a new state (1,+1) with no side effect.

6) (U(t), X(t)) = (ρ,−1): This will never happen.

[Fig. 1. Four cases 2)–5) out of six cases 1)–6), when t is incremented by 1. Panels: Case 2, Case 3, Case 4, Case 5; horizontal axis t, vertical axis S(t).]

Let us define the state probability vector $w_t = (w_{t,1}, w_{t,2}, \ldots, w_{t,2(\rho+1)})$ at t by

$$w_{t,2k-1} = \Pr[U(t) = k-1,\ X(t) = -1], \qquad w_{t,2k} = \Pr[U(t) = k-1,\ X(t) = +1] \qquad (k = 1, 2, \ldots, \rho+1).$$

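The transition matrix $A = (a_{ij})$ is given by

$$A = \begin{pmatrix}
A_{11} & A_{12} & 0 & \cdots & 0 \\
A_{21} & 0 & A_{23} & 0 & 0 \\
0 & A_{32} & \ddots & \ddots & 0 \\
0 & 0 & \ddots & 0 & A_{\rho,\rho+1} \\
A_{\rho+1,1} & 0 & \cdots & A_{\rho+1,\rho} & 0
\end{pmatrix},$$

$$A_{11} = \begin{pmatrix} p_{DD} & 0 \\ 0 & 0 \end{pmatrix}, \qquad A_{12} = \begin{pmatrix} 0 & p_{DU} \\ 0 & 0 \end{pmatrix},$$

$$A_{23} = A_{\rho,\rho+1} = \begin{pmatrix} 0 & p_{DU} \\ 0 & p_{UU} \end{pmatrix}, \qquad A_{21} = A_{32} = \begin{pmatrix} p_{DD} & 0 \\ p_{UD} & 0 \end{pmatrix},$$

$$A_{\rho+1,1} = \begin{pmatrix} 0 & 0 \\ p_{UD} & 0 \end{pmatrix}, \qquad A_{\rho+1,\rho} = \begin{pmatrix} 0 & 0 \\ 0 & p_{DD} \end{pmatrix}.$$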

3.2 Asymptotic probability

From the standard theory on the finite state Markov chain, A has one and only one characteristic value 1, whose characteristic vector gives the asymptotic probability of $w_t$ at t → ∞ if normalized as

$$\sum_{k=1}^{2(\rho+1)} w_{t,k} = 1.$$

Let “T” denote the transposition of a vector. The asymptotic probability is given by the solution of

$$w^{\mathrm{T}}(I - A) = 0.$$

Thus, we have

$$(w_{t,2\rho+1},\ w_{t,2\rho+2}) = \Big( 0,\ \frac{\pi}{\rho(\rho-1)\pi^2 + (2\rho-1)\pi + 1} \Big)$$

and

$$(w_{t,2n-1},\ w_{t,2n}) = (w_{t,2\rho+1},\ w_{t,2\rho+2}) \begin{pmatrix} \pi & \pi-1 \\ 1-\pi & 2-\pi \end{pmatrix}^{\rho+1-n} \qquad (n = 1, 2, \ldots, \rho+1).$$

The proportional relationships among the asymptotic probabilities are shown in Figs. 2 and 3.

3.3 Formula for E[V (ρ;C([0, t]))]

As t → ∞, we have

$$E[V(\rho; S(t+1)) - V(\rho; C([0, t]))] = \pi (w_{t,2} + \rho\, w_{t,2\rho+1}),$$

where the first term on the right-hand side is due to Case 2 in Fig. 1 and the second term is due to Case 5. The expectation of the total ρ-variation is

$$E[V(\rho; C([0, t]))] = \frac{\pi^2 + \pi\rho}{\rho(\rho-1)\pi^2 + 2(\rho-1)\pi + 1}\, t \tag{3}$$

as t → ∞. A graph of E[V(1; C[0, t])] as a function of π is given in Fig. 4.

[Fig. 2. Scaling relation among the points of $w_{t,2n-1}$.]

[Fig. 3. Scaling relation among the points of $w_{t,2n}$.]

[Fig. 4. E[V(1; C[0, t])] for a Markov random walk. Horizontal axis: π.]

4. Empirical test on Nikkei 225 futures trade

4.1 The data set

We used tick-by-tick data of Nikkei 225 Futures on the OSE in 2007 provided by Nikkei Media Marketing Inc. The contract months of Nikkei 225 Futures are March, June, September and December. The last trading day is the business day preceding the second Friday of each contract month. We deal only with the data of the 10 days before the last trading day for the 4 delivery months in 2007. Due to incompleteness of data, we only calculate from March 1st to 8th for the delivery month of March 2007.

Trading hours are composed of two sessions: 9:00–11:00 (morning session) and 12:30–15:10 (afternoon session). Their opening prices and closing prices are determined by the periodic double auction called “itayose”, while the other transaction prices are determined by continuous auctions. We only work on the transactions by the continuous auctions, and discard the transactions at the opening and closing of the morning and the afternoon sessions. We analyze morning and afternoon sessions separately. Thus we have 72 samples.

We worked on the changes of the best ask price. When the best ask price moves k ticks (k > 1) instantaneously, we regard it as k consecutive transitions in the same direction.

[Fig. 5. Histogram of the observed value of π. Horizontal axis: π; vertical axis: frequency.]

[Fig. 6. Histogram for the ratio of calculated value to real value as ρ = 1, ρ = 2, ρ = 3, ρ = 4. Horizontal axis: ratio; vertical axis: frequency.]

4.2 Basic statistics

For each of the 72 sample paths, we can directly estimate π. We give the histogram of the estimated values of π in Fig. 5, which suggests that X(t) has strong negative serial correlation.

One is also interested in whether the theoretical asymptotic formula (3) predicts the observed total ρ-variations well. In Fig. 6, we give the histogram of the ratios of the observed total ρ-variations (ρ = 1, 2, 3, 4) to their asymptotic predictions. To remove the initial point effect, we used $(\pi^2 + \pi\rho)(T - t^*)/[\rho(\rho-1)\pi^2 + 2(\rho-1)\pi + 1]$ as the theoretical prediction, where t∗ is the location of the first extremum point. We give its average and variance in Table 1.

We see that the best fit is given for ρ = 1, and the asymptotic formula seems to predict the reality for all ρ's, though its variance must be further investigated.

Table 1. The time of finding the first ρ-extremum point. The time is divided by the total time of a path.

        ρ = 1      ρ = 2      ρ = 3      ρ = 4
Mean    1.67E−03   8.79E−03   1.71E−02   3.16E−02
Var     5.92E−06   2.05E−04   1.06E−03   3.91E−03

Table 2. Numbers of rejections at the significance level 5 percent.

                      ρ = 1   ρ = 2   ρ = 3   ρ = 4
ratios of rejection   5/72    2/72    1/72    1/72

4.3 Method for testing and its results

Suppose that the 72 original paths of the best ask price are given. For each path, we randomly generated 999 paths of the same path length based on the estimated π. We calculated the total ρ-variations of these 1000 paths (the original one and the 999 generated ones). If the rank r of V(ρ; C[0, m]) of the original path satisfies 25 < r < 975, we judge that the null hypothesis is accepted for this path at the significance level 5 percent. The null hypothesis that the best ask path follows the first order Markov process seems to be accepted for ρ = 1, 2, 3, 4. For ρ of 5 or more, we do not have enough ρ-extremum points for testing.

We give the numbers of rejections in Table 2. It seems that the null hypothesis is accepted.
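The testing procedure can be sketched as follows. The statistic is passed in as a function so that the sketch stays short; the text uses V(ρ; C[0, m]), whereas the example below plugs in a placeholder statistic. All function names, the estimator of π and the treatment of ties are assumptions of this illustration rather than details given in the text.

```python
import random


def estimate_pi(steps):
    """Share of consecutive moves that repeat the previous direction."""
    repeats = sum(1 for a, b in zip(steps, steps[1:]) if a == b)
    return repeats / (len(steps) - 1)


def simulate_steps(pi, n_steps, rng):
    """Moves of a symmetric first order Markov random walk (p_UU = p_DD = pi)."""
    steps = [rng.choice((+1, -1))]
    while len(steps) < n_steps:
        steps.append(steps[-1] if rng.random() < pi else -steps[-1])
    return steps


def rank_test(steps, statistic, n_surrogates=999, lower=25, upper=975, seed=0):
    """Rank the observed statistic among surrogate paths with the same estimated pi.

    Returns (rank, accepted).  Convention of this sketch: rank is 1 plus the
    number of surrogates whose statistic is strictly below the observed one.
    """
    rng = random.Random(seed)
    pi_hat = estimate_pi(steps)
    observed = statistic(steps)
    surrogates = [
        statistic(simulate_steps(pi_hat, len(steps), rng))
        for _ in range(n_surrogates)
    ]
    rank = 1 + sum(1 for value in surrogates if value < observed)
    return rank, lower < rank < upper


def net_displacement(steps):
    """Placeholder statistic; the text uses the total rho-variation instead."""
    return abs(sum(steps))


observed_steps = simulate_steps(0.2, 500, random.Random(1))
rank, accepted = rank_test(observed_steps, net_displacement)
```

With 999 surrogates the rank lies between 1 and 1000, and the two cut-offs 25 and 975 leave roughly 2.5 percent in each tail.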

5. Conclusion

We tested whether the best ask price path of the Nikkei 225 Futures in the OSE follows the first order Markov chain or not, based on the total ρ-variation for ρ = 1, 2, 3, 4. The null hypothesis is not rejected. We also calculated the asymptotic theoretical expectation of the total ρ-variation. Its fit seems to be satisfactory, though its variance must be further investigated.

Acknowledgments

The second author is supported by the Grant-in-Aid for Scientific Research, MEXT, Japan.

References

[1] M. Endo, S. Zuo and K. Kishimoto, Modeling intra-day stock price changes in terms of a continuous double auction system (in Japanese), Trans. JSIAM, 16 (2006), 305–316.

[2] M. Li, M. Endo, S. Zuo and K. Kishimoto, Order imbalances explain 90% of returns of Nikkei 225 futures, Applied Economics Lett., 17 (2010), 1241–1245.

[3] S. S. Alexander, Price movement in speculative markets: Trends or random walk, Industrial Management Review, 2 (1961), 7–26.

[4] K. Kishimoto and M. Iri, A practical approach to the definition and measurement of “Length”, Jpn J. Indust. Appl. Math., 6 (1989), 179–207.

[5] K. Kishimoto, Analytical expression of the expectation of a scale dependent “Length” of a path of Brownian motion with drift, Prog. Theor. Phys., 82 (1989), 465–470.



Numerical identification of nonhyperbolicity

of the Lorenz system through Lyapunov vectors

Yoshitaka Saiki1 and Miki U. Kobayashi2

1 Department of Mathematics, Hokkaido University, Sapporo 060-0810, Japan

2 Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606-8502, Japan

E-mail saiki math.sci.hokudai.ac.jp


Abstract

Understanding nonhyperbolicity in dynamical systems is important, yet it is usually difficult to see whether a system is hyperbolic or not. In this letter, angles between stable and unstable directions at a point of a chaotic attractor of the Lorenz system are calculated for various sets of parameter values by identifying Lyapunov vectors numerically. We then estimate the parameter value at which the system becomes nonhyperbolic in a one-parameter family.

Keywords Lyapunov vectors, nonhyperbolicity, Lorenz system

Research Activity Group Scientific Computation and Numerical Analysis (Role of Unstable Solutions in Pattern Dynamics)

1. Introduction

1.1 Basic background

A dynamical system is said to be hyperbolic if the stable and unstable manifolds are everywhere transversal to each other; otherwise the system is nonhyperbolic. The theory of hyperbolic dynamics has been developed by Smale [1] and many other researchers. In hyperbolic systems, approaches by symbolic dynamics and cycle expansion theory always work. It is usually difficult to characterize nonhyperbolic dynamics [2, 3], because those approaches are sometimes useless in analyzing a nonhyperbolic system [4, 5]. Therefore, it is important to distinguish between hyperbolic and nonhyperbolic dynamical systems.

However, it is difficult to see whether a given system is hyperbolic or not, because manifold structures are usually very complicated in chaotic systems. There are, however, some studies that address hyperbolicity by both rigorous and non-rigorous approaches. Davis et al. [6] conjectured that the real Hénon family is hyperbolic in some parameter region. Later, Arai [7] proposed a rigorous computational method to prove hyperbolicity of discrete dynamical systems. He applied the method to the real Hénon family and proved the existence of many regions of hyperbolic parameters in the parameter plane of the family. Kuptsov and Kuznetsov [8] studied a coupled Ginzburg-Landau equation from the viewpoint of hyperbolicity and nonhyperbolicity through Lyapunov vectors calculated by the numerical algorithm proposed by Ginelli et al. [9]. In this letter, we investigate the validity of Lyapunov vectors and the occurrence of nonhyperbolicity in the well known Lorenz system.

1.2 Lyapunov vectors

Lyapunov vectors are the vectors invariant under both forward and backward time iterations, and the expansion and contraction rates of the vectors correspond to the Lyapunov spectra [10–12]. They indicate stable and unstable directions of the tangent space at each point of an invariant set. The local (un)stable manifold at each point is spanned by the (un)stable directions.

Fig. 1. Conceptual figure of calculating Lyapunov vectors: (i) identifying the “orthogonal Lyapunov vectors” from the calculation in the positive time direction; (ii) identifying Lyapunov vectors from the calculation in the inverse time direction.

Ginelli and co-workers recently proposed an algorithm to compute Lyapunov vectors and named them covariant Lyapunov vectors (CLVs) [9]. The algorithm enables us to study local manifold structures for various systems, including high-dimensional systems. The Lyapunov vectors are computed in the following way (see Fig. 1). Consider an $N$-dimensional map $x_{n+1} = F(x_n)$. For the forward procedure, we compute a set of orthogonal vectors $v^k_n$ ($k = 1, 2, \dots, N$) at time $n$ by the QR procedure [13]; $v^k_n$ corresponds to the $k$-th column of $Q_n$ at the phase space point $x_n$. To calculate Lyapunov vectors, we use the $v_n$ and $R_n$ stored during the forward procedure. Let $u^j_n$ be a generic vector inside the subspace spanned by $v^k_n$, $k = 1, 2, \dots, j$. We iterate this vector backward in time by inverting the matrix $R_n$: one has


JSIAM Letters Vol. 2 (2010) pp.107–110 Yoshitaka Saiki et al.

$$c^{i,j}_{n-1} = \sum_k [R_n]^{-1}_{i,k}\, c^{k,j}_n,$$

where $[R]_{i,j}$ is a matrix element of $R$ and $c^{i,j}_n = (v^i_n, u^j_n)$ are the expansion coefficients. After iterating $u^j_n$ backward for a long time, the vector eventually gives the most expanding direction within the subspace spanned by $v^k_n$, $k = 1, 2, \dots, j$. In fact, $u^j_n$ gives the $j$-th expanding direction for the forward time iteration, and thus $u^j_n$ is the $j$-th Lyapunov vector at the phase space point $x_n$. Knowledge of the Lyapunov vectors allows hyperbolicity to be tested by determining the angle between the subspace $E^s$ spanned by the contracting CLVs and the subspace $E^u$ spanned by the expanding CLVs. The angle is defined as follows [8]:

$$\angle(E^s, E^u) = \cos^{-1} \max_{\substack{u^s \in E^s,\ u^u \in E^u \\ |u^s| = |u^u| = 1}} |(u^s, u^u)|.$$
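The maximum in this definition equals the largest singular value of $Q_s^T Q_u$, where the columns of $Q_s$ and $Q_u$ are orthonormal bases of $E^s$ and $E^u$, so the angle can be evaluated without an explicit search over unit vectors. The following Python sketch (NumPy assumed; the function name `subspace_angle_deg` and the test vectors are illustrative and do not come from the letter) shows one way this might be done.

```python
import numpy as np

def subspace_angle_deg(basis_s, basis_u):
    """Smallest angle (degrees) between the column spaces of two matrices."""
    # Orthonormalize each basis; CLVs are generally not mutually orthogonal.
    q_s, _ = np.linalg.qr(np.asarray(basis_s, dtype=float))
    q_u, _ = np.linalg.qr(np.asarray(basis_u, dtype=float))
    # Largest singular value of q_s^T q_u = max |(u_s, u_u)| over unit vectors.
    sigma_max = np.linalg.svd(q_s.T @ q_u, compute_uv=False)[0]
    # Guard against rounding slightly above 1 before taking the arccosine.
    return float(np.degrees(np.arccos(min(1.0, sigma_max))))

# Illustrative subspaces of R^3 (hypothetical data).
e_u = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # the x-y plane
e_s = np.array([[0.0], [0.0], [1.0]])                  # the z axis
e_s_tilted = np.array([[1.0], [0.0], [1.0]])           # tilted towards x

angle_orthogonal = subspace_angle_deg(e_s, e_u)
angle_tilted = subspace_angle_deg(e_s_tilted, e_u)
```

An angle near zero at some point of the attractor signals a near-tangency between the stable and unstable directions there.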

To check the validity of the Lyapunov vectors, we first apply them to the Hénon map. The Hénon map is a two-dimensional map on $\mathbb{R}^2$, described by

$$x_{n+1} = a - x_n^2 + b y_n, \qquad y_{n+1} = x_n,$$

where the parameters $a, b \in \mathbb{R}$ are constants. This is a diffeomorphism if $b \neq 0$, and the Jacobian determinant of the system is $-b$. The Hénon map is the only diffeomorphism on $\mathbb{R}^2$ that is described by a polynomial of order 2 and whose inverse is also given by a polynomial.

Fig. 2 shows the distribution of the angle between stable and unstable directions at each point of a chaotic attractor of the Hénon map for two sets of parameter values, calculated from Lyapunov vectors. It shows that both parameter sets give nonhyperbolic structures. It is already known that if the Hénon map is hyperbolic, the system cannot have a chaotic attractor. Our results are thus consistent with this known result and with the result of Arai [7], in which both of these parameter sets lie outside the parameter regions in $\mathbb{R}^2$ for which the system is proved to be hyperbolic.
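The forward and backward stages of the algorithm can be sketched for the Hénon map as follows. This is only a minimal illustration in Python with NumPy, not the authors' code: the function name, the orbit length, the transient length and the initial point (0, 0) are assumptions chosen for brevity.

```python
import numpy as np

def henon_clv_angles(a=1.4, b=0.3, n_steps=5000, n_transient=500):
    """Angles (degrees) between the two CLVs along a Henon orbit, plus exponents."""
    x, y = 0.0, 0.0
    for _ in range(n_transient):                # relax onto the attractor
        x, y = a - x * x + b * y, x
    q = np.eye(2)
    r_list, log_diag = [], np.zeros(2)
    for _ in range(n_steps):                    # forward stage: QR at every step
        jac = np.array([[-2.0 * x, b], [1.0, 0.0]])
        q, r = np.linalg.qr(jac @ q)            # J_n Q_n = Q_{n+1} R_{n+1}
        r_list.append(r)
        log_diag += np.log(np.abs(np.diag(r)))
        x, y = a - x * x + b * y, x
    exponents = log_diag / n_steps              # finite-time Lyapunov exponents
    c = np.eye(2)                               # upper-triangular coefficients
    angles = []
    for n in range(n_steps - 1, -1, -1):        # backward stage: c_n ~ R_{n+1}^{-1} c_{n+1}
        c = np.linalg.solve(r_list[n], c)
        c /= np.linalg.norm(c, axis=0)          # keep each column a unit vector
        # Discard both the forward (Q) and backward (c) transients.
        if n_transient <= n < n_steps - n_transient:
            # In the Q basis the first CLV is (1, 0), so the cosine is |c[0, 1]|.
            angles.append(np.degrees(np.arccos(min(1.0, abs(c[0, 1])))))
    return np.array(angles), exponents, (x, y)

angles, exponents, last_point = henon_clv_angles()
```

For a two-dimensional map the subspaces $E^s$ and $E^u$ are both lines, so the angle reduces to the arccosine of a single expansion coefficient. A histogram of `angles` gives a distribution of the kind plotted in Fig. 2, although this short run is not expected to reproduce the figure in detail.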

2. Lyapunov vectors of the Lorenz system

The Lorenz system

$$\frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = rx - y - xz, \qquad \frac{dz}{dt} = xy - bz$$

is one of the most famous chaotic systems. The system with the classical parameter values (σ = 10, b = 8/3, r = 28) has been studied extensively [14]. It is known that the system with the classical parameter values is (singular) hyperbolic and has a chaotic attractor which includes an infinite number of unstable periodic orbits [2,15,16]. It is also an interesting problem to see how the structure changes as some parameter values are varied. Here, we change only the parameter r from its classical value and investigate the change of manifold structures by calculating Lyapunov vectors.

2.1 Hyperbolic and nonhyperbolic structure

Fig. 2. Distribution of the angle (degree) between stable and unstable directions at each point of a chaotic attractor of the Hénon map (30000 iterations, bin size = 0.1). Axes: angle (0 to 90) versus PDF; curves for a = 1.40, b = 0.30 and a = 1.00, b = 0.52.

Fig. 3. Distribution of the angle (degree) between stable and unstable directions at each point of a chaotic attractor of the Lorenz system (r = 28, 60). Axes: angle (0 to 90) versus PDF.

Fig. 3 shows the distribution of the angle between stable and unstable directions at each point of a chaotic attractor of the Lorenz system. In practice, points of a chaotic attractor are replaced by points on a chaotic orbit with time length T = 30000 calculated by the fourth-order Runge-Kutta method with time step width 0.001. In the case of r = 28 the PDF does not seem to take positive values around zero angle, whereas the PDF for r = 60 seems to take positive values around zero angle. Fig. 4 shows the minimum angle between stable and unstable directions along a segment of a chaotic orbit (time length T) of the Lorenz system (r = 28, 60) for three initial conditions. In the case of the Lorenz system with the classical parameter values (r = 28), the minimum angle seems to converge to some positive value. However, in the case of the Lorenz system with r = 60, the minimum angle seems to decrease toward 0. This suggests that the system is hyperbolic at r = 28 and nonhyperbolic at r = 60. It is known that the Lorenz system with the classical parameter values is (singular) hyperbolic [2,15,16], whereas the system with r = 60 is thought to be nonhyperbolic, and there the cycle expansion theory [17] does not work [4,5]. Sparrow [14] conjectured that the system generates a homoclinic tangency as r increases from 28. The results in Figs. 3 and 4, which are calculated from the Lyapunov vectors, are consistent with these facts.
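For a flow such as the Lorenz system, the forward stage needs the tangent dynamics integrated together with the orbit. The sketch below, in Python with NumPy, advances the state and a 3 × 3 tangent matrix by the classical fourth-order Runge-Kutta scheme with step 0.001 and re-orthonormalizes by QR at every step. The function names, the initial point and the short run length are assumptions for illustration; the letter does not state how often the QR step is applied.

```python
import numpy as np

SIGMA, B = 10.0, 8.0 / 3.0   # classical values of sigma and b

def lorenz_rhs(state, tangent, r):
    """Right-hand side of the Lorenz equations and of the linearized flow."""
    x, y, z = state
    velocity = np.array([SIGMA * (y - x), r * x - y - x * z, x * y - B * z])
    jac = np.array([[-SIGMA, SIGMA, 0.0], [r - z, -1.0, -x], [y, x, -B]])
    return velocity, jac @ tangent

def rk4_step(state, tangent, r, dt):
    """One classical fourth-order Runge-Kutta step for state and tangent matrix."""
    k1s, k1t = lorenz_rhs(state, tangent, r)
    k2s, k2t = lorenz_rhs(state + 0.5 * dt * k1s, tangent + 0.5 * dt * k1t, r)
    k3s, k3t = lorenz_rhs(state + 0.5 * dt * k2s, tangent + 0.5 * dt * k2t, r)
    k4s, k4t = lorenz_rhs(state + dt * k3s, tangent + dt * k3t, r)
    new_state = state + dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
    new_tangent = tangent + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6.0
    return new_state, new_tangent

def forward_stage(r=28.0, dt=0.001, n_steps=2000):
    """Integrate the orbit, storing the R factors needed by the backward stage."""
    state, q = np.array([1.0, 1.0, 20.0]), np.eye(3)   # assumed initial point
    r_factors, log_diag = [], np.zeros(3)
    for _ in range(n_steps):
        state, m = rk4_step(state, q, r, dt)   # propagate the orthonormal frame
        q, r_fac = np.linalg.qr(m)             # re-orthonormalize
        r_factors.append(r_fac)
        log_diag += np.log(np.abs(np.diag(r_fac)))
    return state, r_factors, log_diag / (n_steps * dt)

final_state, r_factors, rates = forward_stage()
```

The stored `r_factors` are what the backward stage would invert. The finite-time rates returned by such a short run are not converged Lyapunov exponents, but their sum should stay close to the trace of the Jacobian, −(σ + 1 + b), which gives a quick sanity check on the integration.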



Fig. 4. Minimum angle (degree) between stable and unstable directions at each point along a segment of a chaotic orbit (time length T) of the Lorenz system from three initial conditions (r = 28 (red), 60 (green)). Axes: T (0.001 to 100000, logarithmic) versus minimum angle (0.1 to 100, logarithmic).

2.2 First tangency

The occurrence of nonhyperbolicity as some parameter values are changed is an important phenomenon of structure change. In particular, the determination of the first tangency point is one of the most important but difficult problems [3,18]. In this subsection we approach the first tangency problem that appears when r is increased from 28 in the Lorenz system by numerically calculating the Lyapunov vectors at points of a chaotic attractor. Note that the term first tangency usually refers to a first bifurcation on the boundary of a uniformly hyperbolic parameter region, which is not the case here.

Fig. 5 shows the minimum angle between stable and unstable directions at points of a chaotic attractor, approximated by a chaotic orbit with time length T = 30000, for the Lorenz system for various r (24.5 < r < 124.5). In the range from r = 40 to 70 the minimum angles do not seem to be small enough, but if we use a longer orbit to represent points of a chaotic attractor, the minimum angles tend to become smaller, as we have seen in Fig. 4 (r = 60). It seems that the structure changes monotonically as r increases in the range 24.5 < r < 124.5, and that once nonhyperbolicity occurs the system remains nonhyperbolic for every r that realizes a chaotic attractor. Fig. 6 is a detailed view of Fig. 5 for 28 < r < 33, calculated from a chaotic orbit with time length $T = 10^6$. As r increases the minimum angle decreases, and the system seems to become nonhyperbolic around r = 32. In fact, Fig. 7 shows that the PDF of the angle between stable and unstable directions of the Lorenz system at r = 32 takes positive values around zero angle, whereas the PDFs at r = 28 and 30 do not. This indicates that the Lorenz system becomes nonhyperbolic between r = 30 and 32. The result is consistent with the estimate obtained from observation of the Poincaré section without calculating manifolds [14].
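The diagnosis used here rests on two quantities derived from the sampled angles: the histogram near zero angle and the minimum angle over the orbit. A possible way to compute both is sketched below in Python with NumPy. The function names, the 1-degree cutoff and the synthetic samples are assumptions for illustration and are not data from the letter; only the bin size 0.1 is taken from the caption of Fig. 2.

```python
import numpy as np

def angle_pdf(angles_deg, bin_size=0.1):
    """Histogram of angles in [0, 90] degrees, normalized to unit area."""
    edges = np.linspace(0.0, 90.0, int(round(90.0 / bin_size)) + 1)
    pdf, _ = np.histogram(angles_deg, bins=edges, density=True)
    return edges, pdf

def near_zero_summary(angles_deg, cutoff_deg=1.0):
    """Minimum angle and the fraction of samples below a small cutoff."""
    angles_deg = np.asarray(angles_deg, dtype=float)
    return float(angles_deg.min()), float(np.mean(angles_deg < cutoff_deg))

# Synthetic stand-ins for CLV angles (hypothetical, for illustration only).
rng = np.random.default_rng(0)
bounded_away = rng.uniform(10.0, 80.0, size=10000)   # mimics a hyperbolic case
reaching_zero = rng.uniform(0.0, 80.0, size=10000)   # mimics a tangency
```

Because a finite orbit can only give an upper bound on the true minimum angle, a small but positive minimum should be read together with its trend as the orbit length grows, as in Fig. 4.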

Fig. 5. Minimum angle (degree) between stable and unstable directions at each point on a chaotic attractor for various r. Axes: r (20 to 120) versus minimum angle (0 to 14).

Fig. 6. Minimum angle (degree) between stable and unstable directions at each point on a chaotic attractor for various r. Axes: r (28 to 33) versus minimum angle (0 to 9).


3.1 Conclusion

In this letter, the validity of the Lyapunov vectors is confirmed for the Hénon map and the Lorenz system. We first confirmed the hyperbolicity and nonhyperbolicity of the systems for well-known parameter values. Hyperbolicity and nonhyperbolicity are identified from the angles between stable and unstable directions at points of a chaotic attractor, which are determined from the numerically calculated Lyapunov vectors.

Next, the ranges of hyperbolic and nonhyperbolic parameter values of the Lorenz system are studied in detail. It is conjectured, from the calculation of Lyapunov vectors, that the first tangency parameter value of r lies between 30 and 32, in agreement with the value estimated from observation of the Poincaré section without calculating manifolds.

3.2 Recent works

Yang et al. [19] obtained Lyapunov vectors of the Kuramoto-Sivashinsky equation numerically and discussed physical modes in relation to the nonhyperbolicity of the system. This seems to be one of the interesting ways to use Lyapunov vectors. Identification of global manifolds is very difficult, but Doedel et al. [20] numerically identified the global stable manifold of the origin of the Lorenz system. This will also give us some interesting features in relation to nonhyperbolicity. In addition, our recent study has found that the generation of nonhyperbolicity in the Lorenz system can be understood well by employing periodic orbits. Results will be reported in our papers in preparation [21,22].

Fig. 7. Distributions of the angle (degree) between stable and unstable directions at each point on a chaotic attractor (r = 28, 30, 32) (left) and its detailed figure (right). Axes: angle versus PDF; the left panel covers angles from 0 to 90 and the right panel angles from 0 to 6.

Acknowledgments

The authors are thankful to the reviewers for their important comments and suggestions. They also thank Prof. M. Yamada, Dr. H. Takahashi, Ms. K. Obuse and Mr. M. Inubushi for fruitful discussions. This work is partially supported by the Grant-in-Aid 194048 and incentive system for young researchers of Academic Center for Computing and Media Studies, Kyoto University, and by the Postdoctoral Fellowship of the Global Centers of Excellence Program “Fostering top leaders in mathematics” at Kyoto University.

References

[1] S. Smale, Differentiable dynamical systems, Bull. Amer. Math. Soc., 73 (1967), 747–817.

[2] C. Bonatti, L. J. Díaz and M. Viana, Dynamics Beyond Uniform Hyperbolicity, Encyclopedia of Mathematical Sciences 102, Springer-Verlag, Berlin, 2005.

[3] J. Palis and F. Takens, Hyperbolicity & sensitive chaotic dynamics at homoclinic bifurcations, Cambridge studies in advanced mathematics 35, Cambridge Univ. Press, Cambridge, 1993.

[4] V. Franceschini, C. Giberti and Z. Zheng, Characterization of the Lorenz attractor by unstable periodic orbits, Nonlinearity, 6 (1993), 251–258.

[5] S. M. Zoldi, Unstable periodic orbit analysis of histograms of chaotic time series, Phys. Rev. Lett., 81 (1998), 3375–3378.

[6] M. J. Davis, R. S. Mackay and A. Sannami, Markov shifts in the Hénon family, Physica D, 52 (1991), 171–178.

[7] Z. Arai, On hyperbolic plateaus of the Hénon maps, Experimental Mathematics, 16 (2007), 181–188.

[8] P. V. Kuptsov and S. P. Kuznetsov, Violation of hyperbolicity in a diffusive medium with local hyperbolic attractor, Phys. Rev. E, 80 (2009), 016205.

[9] F. Ginelli, P. Poggi, A. Turchi, H. Chaté, R. Livi and P. Politi, Characterizing dynamics with covariant Lyapunov vectors, Phys. Rev. Lett., 99 (2007), 130601.

[10] L. Barreira and Y. B. Pesin, Lyapunov Exponents and Smooth Ergodic Theory, University Lecture Series 23, Amer. Math. Soc., 2002.

[11] V. I. Oseledec, A multiplicative ergodic theorem: Lyapunov characteristic numbers for dynamical systems, Trans. Moscow Math. Soc., 19 (1968), 197–231.

[12] D. Ruelle, Ergodic theory of differentiable dynamical systems, Publ. Math. IHES, 50 (1979), 27–58.

[13] I. Shimada and T. Nagashima, A numerical approach to ergodic problem of dissipative dynamical systems, Prog. Theor. Phys., 61 (1979), 1605–1616.

[14] C. Sparrow, The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors, Springer-Verlag, New York, 1982.

[15] W. Tucker, The Lorenz attractor exists, C. R. Acad. Sci. Paris Ser. I Math., 328 (1999), 1197–1202.

[16] W. Tucker, A rigorous ODE solver and Smale’s 14th problem, Found. Comput. Math., 2 (2002), 53–117.

[17] P. Cvitanović, Invariant measurement of strange sets in terms of cycles, Phys. Rev. Lett., 61 (1988), 2729–2732.

[18] Z. Arai and K. Mischaikow, Rigorous computations of homoclinic tangencies, SIAM J. Appl. Dyn. Syst., 5 (2006), 280–292.

[19] H. L. Yang, K. A. Takeuchi, F. Ginelli, H. Chaté and G. Radons, Hyperbolicity and the effective dimension of spatially extended dissipative systems, Phys. Rev. Lett., 102 (2009), 074102.

[20] E. J. Doedel, B. Krauskopf and H. M. Osinga, Global bifurcations of the Lorenz manifold, Nonlinearity, 19 (2006), 2947–2972.

[21] M. U. Kobayashi and Y. Saiki, in preparation.

[22] Y. Saiki and M. U. Kobayashi, in preparation.



Mean breakdown points for compressed sensing by uniformly distributed matrices

Ryuichi Ashino1 and Remi Vaillancourt2

1 Division of Mathematical Sciences, Osaka Kyoiku University, Kashiwara, Osaka 582-8582, Japan

2 Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Ave., Ottawa, Ontario K1N 6N5, Canada

E-mail ashino cc.osaka-kyoiku.ac.jp

Received March 31, 2010, Accepted June 18, 2010

Abstract

It is graphically observed that curves of mean breakdown points obtained by $\ell_1$ optimization for compressed sensing defined by underdetermined systems $y = Aw$ with uniformly distributed random matrices $A \in \mathbb{R}^{d \times m}$ and sparse $w$ almost coincide with the curves obtained by normally distributed random matrices, both with sparse vectors $w^+$ with nonnegative components and $w^\pm$ with components of either sign. Three-dimensional figures illustrate asymptotic phase transition cliffs. These and the standard deviation of the mean breakdown points can be used to define a level of sparseness of $w$ below which a unique solution is expected with high probability.

Keywords phase transition, compressed sensing, $\ell_1$ linear programming, sparse solution of underdetermined system, mean breakdown points

Research Activity Group Wavelet Analysis

1. Introduction

We consider a detection system with a large number of sensors, say 1600, and output of size $d$. The original signal $w$ is processed by means of a $d \times 1600$ matrix $A$, $y = Aw$. At any given time, it is expected that only a small number of those sensors will be activated, say 20. The question is how small to take $d$ so that the signal is compressed by $\ell_1$ linear programming, or how small one can choose the number $d$ of rows of $A$ so that the underdetermined system admits a unique sparse solution by optimization in the $\ell_1$ norm.

Compressing data received via underdetermined linear systems of the form

$$y = Aw, \qquad A \in \mathbb{R}^{d \times m}, \qquad d \ll m, \tag{1}$$

where the random matrix $A$ has full rank, amounts to expressing a sparse signal $w \in \mathbb{R}^m$ in terms of a reduced number of columns of $A$ [1]. The question is how to find the sparsest solution of such an underdetermined problem while minimizing $d$ for speed.

Let the number of nonzero components $w_i$ of a vector $w \in \mathbb{R}^m$ be denoted by the quasi-norm

$$\|w\|_{\ell_0} := \operatorname{card}\{ i : w_i \neq 0 \}. \tag{2}$$

Problems (1) can be solved by convex $\ell_1$ linear programming [2] if $w$ is highly sparse, that is, if the number $q$ of nonzero components of $w$,

$$q = \|w\|_{\ell_0}, \tag{3}$$

is sufficiently small.

For a given matrix $A$ and a sparse vector $w$, let $y = Aw$ be the value of $Aw$. A breakdown point for system (1) is the maximum number of nonzero components in $w$ beyond which the recovery of $w$ by $\ell_1$ minimization breaks down, that is, for given $A$ and $y$ the $\ell_1$ minimization finds a vector different from the given vector $w$ or does not find any solution.

To obtain a mean breakdown point, $N$ random matrices $A_i$ are generated. For each $A_i$, random vectors $w_{ij}$ are generated with an increasing number $q = 1, 2, \dots, q_{ij}, q_{i,j+1}$ of nonzero random elements, randomly indexed, until the error in the solution, with $q_{i,j+1} = j + 1$, is larger than $10^{-5}$. Then the number $q_i = q_{ij}$ is taken to be the breakdown point associated with $A_i$. The mean breakdown point for the $N$ runs is

$$\mu = \frac{1}{N} \sum_{i=1}^{N} q_i.$$

Many useful and interesting results have been established on the existence and uniqueness of the sparsest solution of system (1) by means of $\ell_1$ minimization, provided the vector $w$ is sufficiently sparse, for example in [3], culminating in important theoretical results by Donoho and collaborators [4–8]. Tsaig and Donoho [9] established semi-empirical bounds on breakdown points of the uniformly spherical ensemble, the partial Hadamard ensemble and the partial Fourier ensemble.

Donoho [10, Cor. 1.3] derived from the theory of centrally-symmetric polytopes that, with $m - 2 \ge d > 2$, if the $\ell_1$ minimization correctly finds all sparse solutions of (1) having not more than $k$ nonzero elements, then $k \le \lfloor (d + 1)/3 \rfloor$, where $\lfloor t \rfloor$ denotes the floor or integral part of $t$.
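As a small worked example of the bound from [10, Cor. 1.3], the following Python sketch counts nonzero components as in (2) and evaluates $\lfloor (d + 1)/3 \rfloor$. The function names are illustrative and not from the letter. The bound concerns recovery of all vectors with at most $k$ nonzero elements, so it is a necessary condition only and says nothing about the typical behaviour measured by mean breakdown points.

```python
def l0_count(w, tol=0.0):
    """Number of nonzero components of w, as in the quasi-norm (2)."""
    return sum(1 for w_i in w if abs(w_i) > tol)

def max_guaranteed_sparsity(d):
    """Upper bound floor((d + 1) / 3) on k from Donoho [10, Cor. 1.3]."""
    return (d + 1) // 3

def min_rows_for_sparsity(k):
    """Smallest d with floor((d + 1) / 3) >= k; necessary, not sufficient."""
    return 3 * k - 1

k_sensors = 20                      # active sensors in the opening example
d_needed = min_rows_for_sparsity(k_sensors)
```

For the 20 active sensors of the opening example, this gives $d \ge 59$ as a necessary condition for recovering every such signal, which satisfies $m - 2 \ge d > 2$ for $m = 1600$.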


JSIAM Letters Vol. 2 (2010) pp.111–114 Ryuichi Ashino et al.

Donoho [10, Cor. 1.5] proved the following probabilistic result for underdetermined systems. Let $m$ and $d$ tend to $\infty$ with $d = \lfloor \delta m \rfloor$, where $\delta < 1$ and $k/d < 1$. Let $y = Aw_0$, where $w_0$ contains nonzero components at $k$ sites selected uniformly at random with signs chosen uniformly at random and where $A$ is a uniform random orthoprojector from $\mathbb{R}^m$ to $\mathbb{R}^d$. With overwhelming probability for large $m$, the minimum $\ell_1$-norm solution to $y = Aw$ is also the sparsest solution and is precisely $w_0$.

High breakdown points for low-dimensional compressed sensing problems have been investigated in [11,12], where the matrices $A$ were generated by singular value decomposition and reduced QR decomposition of uniformly distributed random matrices.

Donoho and Tanner [13] observed universal threshold locations across nine probability distributions matching those of simplex polytopes and cross-polytopes. Excellent agreement at 50% success has been found between each of the non-Gaussian matrix ensembles and the asymptotic theory for normally distributed (Gaussian) matrices based on geometric combinatorics. This raises the question of a central limit theorem for matrix ensembles, but they did not consider uniformly distributed matrices.

We shall say that mean breakdown points obtained from two sets of matrices $A$ are equivalent if their means, $\mu$, do not differ by more than their standard deviation, $\sigma$.

The aim of this paper is to observe numerically that the mean breakdown points obtained with random matrices $A \in \mathbb{R}^{d \times m}$ uniformly distributed on $[0, 1]$ are equivalent to those of normally distributed matrices $\mathcal{N}(0, 1)$ of the same dimension if $d \ll m$, whether the signal $w$ has positive components or components of either sign.

The plan of the paper is as follows. Section 2 is devoted to solution by linear programming. Section 3 presents the numerical results. Standard deviations of mean breakdown points are obtained in Section 4.

2. Sparse solutions of underdetermined systems

The basis pursuit of Chen, Donoho and Saunders [1] is used to find a sparse solution of an underdetermined system by means of $\ell_1$ optimization.

Let $A \in \mathbb{R}^{d \times m}$, where $d < m$. We want to solve the underdetermined system

$$Aw = y \tag{4}$$

by means of the $\ell_1$ convex minimization

$$\min \|w\|_{\ell_1}, \quad \text{subject to } Aw = y, \tag{5}$$

under the condition that $w$ is a sparse $m$-vector.

In the case where $w$ has nonnegative components, denoted by $w^+$, the minimization problem (5) is linear and can be solved by the standard linear program

$$\min c^T w, \quad \text{subject to } Aw = y, \ w \ge 0, \tag{6}$$

where $c$ is the vector with all components equal to 1. The generalized vector inequality $x \le y$ means that $x_i \le y_i$ for all $i$.

In the case where $w$ has components of either sign, denoted by $w^\pm$, the minimization problem (5) is nonlinear, but it can be transformed into the standard linear program for $z \in \mathbb{R}^r$, $r$ even,

$$\min c^T z, \quad \text{subject to } Cz = a, \ z \ge 0, \tag{7}$$

by the following substitutions [1]:

$$r \Leftrightarrow 2m, \quad C \Leftrightarrow (A, -A), \quad a \Leftrightarrow y, \quad c \Leftrightarrow (1; 1), \quad z \Leftrightarrow (u; v), \tag{8}$$

where the unknown vectors $u, v \in \mathbb{R}^{m \times 1}$ have nonnegative components. Here $C \in \mathbb{R}^{d \times 2m}$, $c \in \mathbb{R}^{2m \times 1}$ and $z \in \mathbb{R}^{2m \times 1}$. Then the solution to problem (4) is

$$w = u - v.$$
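The two linear programs can be written down almost verbatim with a generic LP solver. The sketch below uses Python with `scipy.optimize.linprog` (HiGHS backend) rather than the Mathematica implementation used for the experiments reported in this letter, so solver behaviour may differ; the function names and the tiny 2 × 3 test matrix are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_nonnegative(a_mat, y):
    """Linear program (6): minimize sum(w) subject to A w = y, w >= 0."""
    m = a_mat.shape[1]
    res = linprog(np.ones(m), A_eq=a_mat, b_eq=y, bounds=(0, None), method="highs")
    return res.x if res.success else None

def l1_min_signed(a_mat, y):
    """Linear program (7) with substitutions (8): w = u - v, u >= 0, v >= 0."""
    m = a_mat.shape[1]
    c_mat = np.hstack([a_mat, -a_mat])             # C = (A, -A)
    res = linprog(np.ones(2 * m), A_eq=c_mat, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        return None
    u, v = res.x[:m], res.x[m:]                    # z = (u; v)
    return u - v

# Hypothetical 2 x 3 example with one-sparse signals.
a_small = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
w_plus = np.array([1.0, 0.0, 0.0])
w_signed = np.array([0.0, -2.0, 0.0])
rec_plus = l1_min_nonnegative(a_small, a_small @ w_plus)
rec_signed = l1_min_signed(a_small, a_small @ w_signed)
```

In this small example every solution of $Aw = y$ other than the sparse one has a strictly larger $\ell_1$ norm, so both programs return the original vector.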


Computations were performed with Mathematica 7.0.1.0, mainly on a Mac Pro (two 2.93 GHz Quad-Core Intel Xeon, 16 GB, Mac OS X Version 10.6.3), and also on several other Macs.

The MATLAB colon notation $[1 : p]$ will denote the set $\{1, 2, \dots, p\}$. Similarly, $[n : p : m] := \{n, n + p, n + 2p, \dots, m\}$ if $m - n$ is an integer multiple of $p$.

Notation 1 A random matrix $A \in \mathbb{R}^{m \times n}$ with entries uniformly distributed in $[a, b]$ will be denoted as $A \sim U(m, n; a, b)$. Similarly, a random matrix $A \in \mathbb{R}^{m \times n}$ with entries normally distributed with mean $\mu$ and standard deviation $\sigma$ will be denoted as $A \sim \mathcal{N}(m, n; \mu, \sigma)$. The same notation applies to random vectors $x \in \mathbb{R}^{m \times 1}$.

Random matrices $A \sim U(d, m; 0, 1)$ and $A \sim \mathcal{N}(d, m; 0, 1)$ were produced by the Mathematica 7 commands RandomReal[UniformDistribution[{0,1}],{d,m}] and RandomReal[NormalDistribution[0,1],{d,m}], respectively. Random vectors $w \sim U(m, 1; 0, 1)$, with 1, 2, ... nonzero components in random position, were produced by RandomReal[UniformDistribution[{0,1}],{m,1}] and RandomPermutation until breakdown occurred. The command LinearProgramming with the option InteriorPoint was used to perform $\ell_1$ minimization.

In the numerical experiments, $y$ was calculated exactly as $y = Aw$ for given $w$. The solution $\hat{w}$ obtained by (6) or (8) is called a success if $\|\hat{w} - w\| < 10^{-5}$.

Comparison is made between the mean breakdown points of $A \sim U(d, m; 0, 1)$ and $A \sim \mathcal{N}(d, m; 0, 1)$ for $w^+$ and $w^\pm$.

Given a matrix $A \in \mathbb{R}^{d \times m}$ and a vector $w \in \mathbb{R}^m$, the dimensionless variables $\delta = d/m$ and $\rho = k/d$ are defined in terms of the physical dimensions $d$ and $m$ and the number of nonzero components $k = \|w\|_{\ell_0}$ of $w$.

The theoretical phase transition curves $\rho(\delta; T)$ for simplexes $T$, which are $(m - 1)$-dimensional analogues of equilateral triangles, and $\rho(\delta; C)$ for cross-polytopes $C$, which are $m$-dimensional analogues of octahedra, are shown in Fig. 1 in the dimensionless variables $(\delta, \rho)$. The upper and lower curves have been obtained by Donoho and Tanner [13] from high-dimensional combinatorics for $w^+$ and $w^\pm$, respectively. These transition curves give the 50% success rates for $A \sim \mathcal{N}(d, m; 0, 1)$.
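The breakdown-point experiment described in this section can be sketched as follows for $w^+$ and the linear program (6). This is a rough Python analogue (NumPy and `scipy.optimize.linprog` assumed) of the procedure, not the authors' Mathematica program: the function names, the cap of $k$ at $d$, the small problem size and the seed are assumptions made to keep the example short.

```python
import numpy as np
from scipy.optimize import linprog

def recovers(a_mat, w, tol=1e-5):
    """True if the LP (6) returns a vector within tol of the nonnegative w."""
    m = a_mat.shape[1]
    res = linprog(np.ones(m), A_eq=a_mat, b_eq=a_mat @ w,
                  bounds=(0, None), method="highs")
    return bool(res.success and np.linalg.norm(res.x - w) < tol)

def breakdown_point(a_mat, rng):
    """Largest k reached before the first failed recovery, with k capped at d."""
    d, m = a_mat.shape
    for k in range(1, d + 1):
        w = np.zeros(m)
        support = rng.permutation(m)[:k]             # k random positions
        w[support] = rng.uniform(0.0, 1.0, size=k)   # entries drawn from U(0, 1)
        if not recovers(a_mat, w):
            return k - 1                             # last successful sparsity
    return d

def mean_breakdown_point(d, m, n_matrices, rng, normal=False):
    """Mean, standard deviation and list of breakdown points over n_matrices."""
    points = []
    for _ in range(n_matrices):
        if normal:
            a_mat = rng.standard_normal((d, m))      # A ~ N(d, m; 0, 1)
        else:
            a_mat = rng.uniform(0.0, 1.0, (d, m))    # A ~ U(d, m; 0, 1)
        points.append(breakdown_point(a_mat, rng))
    return float(np.mean(points)), float(np.std(points)), points

rng = np.random.default_rng(0)
mu_uniform, sigma_uniform, q_uniform = mean_breakdown_point(10, 20, 5, rng)
```

Dividing the mean by $d$ gives a value of $\rho$ that can be compared with the transition curves at $\delta = d/m$, although a problem this small is far from the asymptotic regime.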



Fig. 1. Asymptotic phase transition curves of 50% success rates for $y = Aw$ and $\|w\|_{\ell_0} = k$, where $A \sim \mathcal{N}(d, m; 0, 1)$. Upper curve: $\rho(\delta; T)$ with $w^+$. Lower curve: $\rho(\delta; C)$ with $w^\pm$. Axes: $\delta = d/m$ versus $\rho = k/d$, both from 0 to 1.

A full description of the transition curves, which amazed Donoho and Tanner, would take too much room in this Letter; interested readers may consult [13] and the references therein. Jared Tanner kindly provides the data of the curves on request. It is very pleasant mathematically to derive the transition curves from high-dimensional combinatorics but, of course, they could also be constructed experimentally from extensive numerical computation, as we once did.

The three-dimensional Fig. 2 shows the cliff in the case $A \sim U(d, 200; 0, 1)$ for $w^+$ in the physical variables $(d, k, h)$, where $d = 200\delta$, $k = \rho d$, and $\delta$ and $\rho$ are the dimensionless variables used in Fig. 1. For each $(d, k)$, where $d = 5 : 5 : 195$ and $k = 5 : 1 : 195$, 30 problems $y = Aw$ were solved with 30 different random matrices $A \sim U(d, 200; 0, 1)$ and random vectors $w \sim U(200, 1; 0, 1)$ with increasing $\|w\|_{\ell_0}$ until breakdown occurs. The value of the variable $h$ is the number of successes.

The top surface, ranging from the upper left corner $(d, k, h) = (195, 5, 30)$ almost to the cliff, represents 100% success, that is, 30 successful solutions out of 30 trials. The bottom surface, ranging almost from the cliff to the lower right corner $(d, k, h) = (0, 195, 0)$, represents 0% success, that is, no vector $w$ or an incorrect $w$ was recovered from $y = Aw$. The level curve at height 15 along the cliff represents 50% success, that is, 15 solutions out of 30 trials were obtained with an error less than $10^{-5}$. Similarly, Fig. 3 shows the cliff in the case $A \sim U(d, 200; 0, 1)$ for $w^\pm$. The cliff in Fig. 2 with $w^+$ occurs at a higher number $k$ of nonzero components than the one in Fig. 3 with $w^\pm$.

It was found that the curve of mean breakdown points for problem (5) with $w^+$ obtained by the linear program (6) lies just below the upper curve $\rho(\delta; T)$ of 50% success in Fig. 1 for $A \sim U(d, m; 0, 1)$ and $A \sim \mathcal{N}(d, m; 0, 1)$ in almost all cases. In cases of failure, no solution could be found. However, the curve of mean breakdown points for problem (5) obtained by the linear program (8) with $w^+$ lies just below the upper curve $\rho(\delta; T)$ for $A \sim U(d, m; 0, 1)$ and just below the lower curve $\rho(\delta; C)$ for $A \sim \mathcal{N}(d, m; 0, 1)$, in all cases. With $w^\pm$, (8) solves (5) with $A \sim U(d, m; 0, 1)$ and $A \sim \mathcal{N}(d, m; 0, 1)$ with curves of mean breakdown points lying just below the lower curve $\rho(\delta; C)$.

Fig. 2. Cliff of asymptotic phase transition curve, $\rho(k/200; T)k$, for $y = Aw^+$ and $\|w^+\|_{\ell_0} = k$, where $A \sim U(d, 200; 0, 1)$ and $d = 5 : 5 : 195$. Axes: $k$ and $d$ (horizontal), $h$ (vertical, 0 to 30).

Fig. 3. Cliff of asymptotic phase transition curve, $\rho(k/200; C)k$, for $y = Aw^\pm$ and $\|w^\pm\|_{\ell_0} = k$, where $A \sim U(d, 200; 0, 1)$ and $d = 5 : 5 : 195$. Axes: $k$ and $d$ (horizontal), $h$ (vertical, 0 to 30).

Fig. 4 shows continuous curves of average breakdown points for matrices $A \sim U(d, 800; 0, 1)$ and $A \sim \mathcal{N}(d, 800; 0, 1)$ obtained from nine discrete values of $d$, $d = 80 : 80 : 720$. It is seen that the means of 100 breakdown points obtained with uniformly and normally distributed matrices and $w^+$ coincide. Moreover, since the cliff is sudden and quite abrupt, the curves of mean breakdown points of 100% success are very close to the Donoho-Tanner theoretical estimates $\rho(\delta; T)$ of 50% success for $A \sim \mathcal{N}(d, 800; 0, 1)$. A similar situation holds with $w^\pm$ and $\rho(\delta; C)$.

4. Standard deviation for mean breakdown points

The standard deviations of the means of 100 breakdown points for nine values of $d$, $d = [80 : 80 : 720]$, are plotted in Fig. 5 for $A \sim U(d, 800; 0, 1)$ (solid line) and $A \sim \mathcal{N}(d, 800; 0, 1)$ (long dashed line) with $w^+$, and $A \sim U(d, 800; 0, 1)$ (medium dashed line) and $A \sim \mathcal{N}(d, 800; 0, 1)$ (short dashed line) with $w^\pm$. It is seen that the standard deviation slightly increases as the number $d$ of rows of $A$ increases. For the applications, if one subtracts twice the standard deviation from the Donoho-Tanner theoretical curves of 50% success, one gets a good estimate for the region of no breakdown points. By taking the number of rows of the random matrix $A$ as small as possible, compressed sensing is accelerated.

Fig. 4. The top set of three curves is for $w^+$ and the bottom set is for $w^\pm$. The two almost coinciding lower curves in each set are the average breakdown points for matrices $A \sim U(d, 800; 0, 1)$ and $A \sim \mathcal{N}(d, 800; 0, 1)$ with $d = 80 : 80 : 720$. The third, higher curves are the Donoho-Tanner theoretical estimates $\rho(\delta; T)$ and $\rho(\delta; C)$ for 50% success for $A \sim \mathcal{N}(d, 800; 0, 1)$.

Fig. 5. Standard deviation for matrices $A \in \mathbb{R}^{d \times 800}$ (see text). Axes: $d$ versus $k$.

Although the Donoho-Tanner asymptotic theoretical curves $\rho(\delta; T)$ and $\rho(\delta; C)$ have been derived for matrices $A \sim \mathcal{N}(d, m; 0, 1)$ with $d$ and $m$ tending to infinity, it is surprising that the results obtained in this paper for $A \sim U(d, m; 0, 1)$ with $m = 800$ still hold for $m = 400$ and $m = 200$. Partial results with $A \in \mathbb{R}^{d \times 1600}$ agree with the results presented here.
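The equivalence criterion of the Introduction and the two-standard-deviation rule of this section amount to a few lines of arithmetic. The Python sketch below is one possible reading, not the authors' procedure: it assumes that "their standard deviation" means the larger of the two sample values and that the population standard deviation is used. The function names and the sample breakdown points are made up for illustration.

```python
import statistics

def summarize(points):
    """Mean and population standard deviation of a list of breakdown points."""
    return statistics.fmean(points), statistics.pstdev(points)

def equivalent(points_a, points_b):
    """True if the means differ by no more than the larger standard deviation."""
    mu_a, sigma_a = summarize(points_a)
    mu_b, sigma_b = summarize(points_b)
    return abs(mu_a - mu_b) <= max(sigma_a, sigma_b)

def conservative_sparsity(k_theory, sigma):
    """Theoretical 50%-success level lowered by twice the standard deviation."""
    return k_theory - 2.0 * sigma

# Made-up breakdown points for illustration only (not data from the letter).
uniform_points = [48, 50, 52, 50]
normal_points = [49, 51, 53, 51]
same_ensemble_behaviour = equivalent(uniform_points, normal_points)
```

With these made-up numbers the means are 50 and 51 and both standard deviations are about 1.41, so the two sets would count as equivalent under the stated reading.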

5. Conclusion

It was observed numerically, by means of three-dimensional and two-dimensional figures, that the breakdown points of the solution of the underdetermined system $y = Aw$ with uniformly distributed random matrices $A$ and sparse vectors $w$ obtained by $\ell_1$ optimization roughly coincide with the breakdown points for normally distributed matrices $A$. Thus the Donoho-Tanner asymptotic phase transition curves of 50% success, which were derived for normally distributed matrices from high-dimensional combinatorial geometry, still hold for uniformly distributed random matrices, even for low-dimensional matrices. The Mathematica programs are available upon request from the second author.

Acknowledgments

Thanks are due to the anonymous reviewer whose deep and extensive comments greatly contributed to improving this letter. This work was supported in part by JSPS.KAKENHI (B)21340021, (C)20540168, (C)20540193 of Japan, the Natural Sciences and Engineering Research Council of Canada and the Centre de recherches mathématiques of the Université de Montréal.

References

[1] S. S. Chen, D. L. Donoho and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev., 43 (2001) 129–159; SIAM J. Sci. Comput., 20 (1998) 33–61.

[2] E. Candès and J. Romberg, $\ell_1$-magic: Recovery of sparse signals via convex programming, http://www.acm.caltech.edu/l1magic/.

[3] E. Candès, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory, 52 (2006) 489–509.

[4] D. L. Donoho, For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is the sparsest solution, Comm. Pure Appl. Math., 59 (2006), 797–829.

[5] D. L. Donoho, High-dimensional centrally-symmetric polytopes with neighborliness proportional to dimension, Discrete Comput. Geom., 35 (2006) 617–652.

[6] D. L. Donoho and J. Tanner, Sparse nonnegative solutions of underdetermined linear equations by linear programming, in: Proc. Natl. Acad. Sci. USA, Vol. 102, pp. 9446–9451, 2005.

[7] D. L. Donoho and J. Tanner, Neighborliness of randomly-projected simplices in high dimensions, in: Proc. Natl. Acad. Sci. USA, Vol. 102, pp. 9452–9457, 2005.

[8] D. L. Donoho and J. Tanner, Counting faces of randomly-projected polytopes when the projection radically lowers dimension, J. Amer. Math. Soc., 22 (2009) 1–53.

[9] Y. Tsaig and D. L. Donoho, Breakdown of equivalence between the $\ell_1$-norm solution and the sparsest solution, Signal Processing, 86 (2006) 533–548.

[10] D. L. Donoho, Neighborly polytopes and sparse solution of underdetermined linear equations, Tech. Rep. 2005-04, Department of Statistics, Stanford Univ., 2005.

[11] R. Ashino, T. Nguyen-Ba and R. Vaillancourt, Decoding low-dimensional linear codes by linear programming, Can. Appl. Math. Q., 16 (2008) 241–254.

[12] R. Ashino, T. Nguyen-Ba and R. Vaillancourt, Low-dimensional linear codes with high breakdown points by QR decomposition, Int. J. Pure Appl. Math., 57 (2009) 151–163.

[13] D. L. Donoho and J. Tanner, Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci., 367 (2009) 4273–4293.



A quadrature-based eigensolver with a Krylov subspace method for shifted linear systems for Hermitian eigenproblems in lattice QCD

Hiroshi Ohno1,2, Yoshinobu Kuramashi1,3, Tetsuya Sakurai2 and Hiroto Tadano2,3

1 Graduate School of Pure and Applied Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8571, Japan

2 Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan

3 Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan

E-mail ohno het.ph.tsukuba.ac.jp

Received March 31, 2010, Accepted June 25, 2010

Abstract

We consider a quadrature-based eigensolver to find eigenpairs of Hermitian matrices arising in lattice quantum chromodynamics. To reduce the computational cost for finding eigenpairs of such Hermitian matrices, we propose a new technique for solving shifted linear systems with complex shifts by means of the shifted CG method. Furthermore, by using integration paths along horizontal lines corresponding to the real axis of the complex plane, the number of iterations for the shifted CG method is also reduced. Some numerical experiments illustrate the accuracy and efficiency of the proposed method by comparison with a conventional method.

Keywords Hermitian matrix, eigenvalue problem, quadrature method, shifted linear systems, lattice QCD


1. Introduction

Eigenproblems arise in many scientific applications, and in some cases only a limited set of eigenpairs is needed. For example, to calculate all-to-all propagators in lattice quantum chromodynamics (QCD) [1], it is known that the contribution of some low-lying eigenvalues of a large sparse Hermitian matrix called the “Hermitian fermion matrix” is dominant.

For such eigenproblems, the Implicitly Restarted Lanczos method or, more generally, the Implicitly Restarted Arnoldi method (IRAM) [2] is one of the conventional choices. On the other hand, to find eigenvalues in a given region and the corresponding eigenvectors with contour integrations, the Sakurai-Sugiura (SS) method [3] has been proposed. The SS method translates the problem of finding eigenvalues in a domain surrounded by an integration path into the problem of solving systems of linear equations whose matrices carry shifts corresponding to the quadrature points of the contour integration. Therefore, solving shifted linear systems efficiently plays an important role in the performance of the SS method. In this paper, we improve the SS method by reducing the computational cost of solving the shifted linear systems.

There are several ways to solve such shifted linear systems efficiently. One is to solve each linear system in parallel, since all shifted linear systems arising in the SS method can be solved independently. This high parallelism is one of the important features of the SS method. Another way is to reduce the number of matrix-vector multiplications in a Krylov subspace linear solver. We consider the latter in this paper.

To this end, we propose the following two ideas. First, we adopt a Krylov subspace method for shifted linear systems [4], which requires matrix-vector multiplications only for one seed linear system to solve all shifted linear systems because of the shift invariance of Krylov subspaces. Moreover, we show that the shifted CG method, which requires no more than one matrix-vector multiplication in each iteration, can be applied to such shifted linear systems if the coefficient matrix of the seed linear system is Hermitian, even though the coefficient matrices of the other shifted systems are non-Hermitian. We give the details in Section 2. Next, in Section 3, we consider appropriate configurations of quadrature points that require fewer iterations of the Krylov subspace solver, because the number of iterations depends on the shift parameter corresponding to each quadrature point. Section 4 presents numerical tests that investigate the properties of the proposed method and its efficiency in comparison with PARPACK [5], a software package for solving eigenvalue problems with IRAM in parallel, using a Hermitian fermion matrix of lattice QCD. Section 5 concludes.

2. The SS method with a Krylov subspace method for shifted linear systems

2.1 The SS method

JSIAM Letters Vol. 2 (2010) pp.115–118 Hiroshi Ohno et al.

We consider an eigenvalue problem $Ax = \lambda x$, where $A \in \mathbb{C}^{n \times n}$ is a Hermitian matrix and $(\lambda, x)$ is an eigenpair of $A$. Let $\Gamma$ be a positively oriented closed Jordan curve in the complex plane, and introduce the contour integration

$$ s_k \equiv \frac{1}{2\pi i} \int_{\Gamma} z^k (zI - A)^{-1} v \, dz, \quad k = 0, 1, \ldots, \tag{1} $$

where $I$ is the $n \times n$ identity matrix and $v \in \mathbb{C}^n$ is any nonzero vector. According to the residue theorem, $s_k$ has contributions only from the eigenvalues inside $\Gamma$.

In the moment-based method [3], a moment $\mu_k \equiv v^{\mathrm{H}} s_k$ is defined, and the Hankel matrix $H_m \in \mathbb{C}^{m \times m}$ and the shifted Hankel matrix $H_m^{<} \in \mathbb{C}^{m \times m}$ are

$$ H_m \equiv [\mu_{i+j-2}]_{i,j=1}^{m}, \quad H_m^{<} \equiv [\mu_{i+j-1}]_{i,j=1}^{m}, $$

respectively, where $m$ is the number of eigenvalues inside $\Gamma$. With $S \equiv [s_0, \cdots, s_{m-1}] \in \mathbb{C}^{n \times m}$, the eigenvalues of the pencil $(H_m^{<}, H_m)$ are given by $\lambda_1, \ldots, \lambda_m$, and an eigenvector corresponding to $\lambda_l$ is given by $x_l = S u_l$, where $u_l$ is an eigenvector of $(H_m^{<}, H_m)$.

On the other hand, in a Rayleigh-Ritz type approach [6], by constructing an orthonormal basis $Q \in \mathbb{C}^{n \times m}$ via the orthogonalization of $S$, approximate eigenvalues are given by the Ritz values of the projected matrix pencil $(\tilde{A}, \tilde{B})$, where $\tilde{A} \equiv Q^{\mathrm{H}} A Q \in \mathbb{C}^{m \times m}$ and $\tilde{B} \equiv Q^{\mathrm{H}} Q \in \mathbb{C}^{m \times m}$, and the corresponding eigenvectors are given by $x_l = Q w_l$, where $w_l$ is an eigenvector of $(\tilde{A}, \tilde{B})$. The Rayleigh-Ritz projection method is more accurate than the moment-based method; however, there is a trade-off between accuracy and memory consumption.

To calculate (1) numerically, the $N$-point trapezoidal rule is applied and we approximate $s_k$ by

$$ s_k = \sum_{j=0}^{N-1} w_j \zeta_j^k (z_j I - A)^{-1} v, \tag{2} $$

where $z_j$ and $w_j$ are a quadrature point and a weight, respectively, and $\zeta_j \equiv (z_j - \gamma)/\rho$ is a normalized quadrature point satisfying the condition $-1 \le \mathrm{Re}\, \zeta_j \le 1$ with a shift parameter $\gamma \in \mathbb{C}$ and a scale parameter $\rho > 0$. In the case of an integration on a circle $C$ with center $\gamma$ and radius $\rho$, the quadrature points and weights are defined by

$$ z_j = \gamma + \rho\, e^{\frac{2\pi i}{N}\left(j + \frac{1}{2}\right)}, \quad j = 0, 1, \ldots, N - 1, $$

and

$$ w_j = \frac{z_j - \gamma}{N}, \quad j = 0, 1, \ldots, N - 1, $$

respectively. Let $\eta_l \equiv (\lambda_l - \gamma)/\rho$ and suppose $v = \sum_l \alpha_l x_l$. We can rewrite (2) as $s_k = \sum_l f_k(\eta_l) \alpha_l x_l / \rho$ by means of a filter function defined by

$$ f_k(x) \equiv \sum_{j=0}^{N-1} \frac{w_j \zeta_j^k}{\zeta_j - x}. \tag{3} $$

In the case of a unit circle, it has been shown that $f_k(x) = x^k/(1 + x^N)$ [7], which decays as $O(|x|^{-N+k})$ outside the circle. This means that $s_k$ has a nonnegligible contribution from the eigenvalues outside the circle due to the approximation of the contour integration. Therefore, when we construct $S \equiv [s_0, \cdots, s_{M-1}]$, $M$ should be larger than $m$.

Note that a block version of the SS method is proposed in [7], i.e. $S$ is extended to $\mathbb{C}^{n \times (M \times L)}$ with $L$ different arbitrary nonzero vectors $v_1, \ldots, v_L$. Using this method, we can obtain $L$ degenerate eigenvalues, and it is known that the accuracy is higher than that obtained by just increasing $M$.
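The Rayleigh-Ritz variant described in this subsection can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ss_rayleigh_ritz`, the rank tolerance `rank_tol`, the test matrix and the use of dense solves in place of a Krylov solver are all choices made here, not part of the paper.

```python
import numpy as np

def ss_rayleigh_ritz(A, gamma, rho, N=32, M=8, rank_tol=1e-10, seed=0):
    """Approximate the eigenpairs of Hermitian A inside the circle |z - gamma| < rho.

    Hypothetical helper: dense solves stand in for the Krylov solver of the paper.
    """
    n = A.shape[0]
    v = np.random.default_rng(seed).standard_normal(n)  # arbitrary nonzero vector v

    # Quadrature points and weights on the circle, following the formulas in the text.
    j = np.arange(N)
    zeta = np.exp(2j * np.pi * (j + 0.5) / N)  # normalized points zeta_j
    z = gamma + rho * zeta
    w = (z - gamma) / N

    # Solve (z_j I - A) y_j = v for every quadrature point (rows of Y).
    Y = np.stack([np.linalg.solve(zj * np.eye(n) - A, v) for zj in z])

    # s_k = sum_j w_j zeta_j^k y_j for k = 0..M-1, collected as the columns of S.
    S = np.stack([(w * zeta**k) @ Y for k in range(M)], axis=1)

    # Orthonormalize S by an SVD and drop numerically dependent directions.
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    Q = U[:, sv > rank_tol * sv[0]]

    # With orthonormal Q, Q^H Q = I, so a standard Hermitian eigenproblem remains.
    theta, W = np.linalg.eigh(Q.conj().T @ A @ Q)
    X = Q @ W
    inside = np.abs(theta - gamma) < rho
    return theta[inside], X[:, inside]

# Assumed test problem: three eigenvalues inside the unit circle, the rest far outside.
rng = np.random.default_rng(1)
eigs = np.concatenate(([-0.5, 0.1, 0.6], np.linspace(3.0, 10.0, 27)))
Qm, _ = np.linalg.qr(rng.standard_normal((30, 30)))
A = Qm @ np.diag(eigs) @ Qm.T
A = (A + A.T) / 2  # enforce exact symmetry
theta, X = ss_rayleigh_ritz(A, gamma=0.0, rho=1.0)
print(theta)
```

Here M = 8 exceeds the number of eigenvalues inside the circle (three), in line with the remark that M should be larger than m; the SVD truncation then discards the redundant columns.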

2.2 The SS method with the shifted CG method

To calculate (2), shifted linear systems

$$ (z_j I - A) y_j = v \tag{4} $$

should be solved for each quadrature point $z_j$. In what follows, we propose some ideas to reduce the computational cost of solving (4) by the shifted CG method [4].

It is known that the Krylov subspace $K_k(A, r_0) \equiv \mathrm{span}(r_0, A r_0, \ldots, A^{k-1} r_0)$ is shift invariant, i.e. for any shift $\sigma \in \mathbb{C}$,

$$ K_k(A, r_0) = K_k(A + \sigma I, r_0). \tag{5} $$

In a Krylov subspace linear solver such as the CG method, matrix-vector multiplications are performed to update a residual vector $r_k \in K_{k+1}(A, r_0)$. Because of (5), a residual vector $r_k^{\sigma} \in K_{k+1}(A + \sigma I, r_0)$ corresponding to the matrix $A + \sigma I$ can be given by

$$ r_k^{\sigma} = \xi_k^{\sigma} r_k, $$

with some scalar $\xi_k^{\sigma}$. Namely, once the residual vector $r_k$ of a seed linear system is calculated with matrix-vector multiplications, the residual vectors $r_k^{\sigma}$ of any other shifted linear systems and, moreover, the corresponding solution vectors are obtained without additional matrix-vector multiplications. If the computational cost of the matrix-vector multiplications is dominant, the computational cost of the SS method is thus reduced drastically, by a factor of $N$.

In addition, we show that only one matrix-vector multiplication per iteration is necessary to solve shifted linear systems when the coefficient matrix is a Hermitian matrix with any shift. Consider the BiCG method for solving a system of linear equations with coefficient matrix $\sigma I - A$. The BiCG method requires two matrix-vector multiplications in each iteration to update a residual vector $r_k \in K_{k+1}(\sigma I - A, r_0)$ and its shadow residual vector $r_k^{*} \in K_{k+1}((\sigma I - A)^{\mathrm{H}}, r_0^{*})$. When $A$ is Hermitian, we find that

$$ (\sigma I - A)^{\mathrm{H}} = (\sigma I - A) + (\bar{\sigma} - \sigma) I. $$

Because of the shift invariance (5), if $r_0^{*} = r_0$, then $r_k^{*}$ is calculated from $r_k$ without matrix-vector multiplications, and accordingly the computational cost is halved. Applying this technique to the shifted BiCG method, we can solve many shifted linear systems with only one matrix-vector multiplication in each iteration. If the shift $\sigma$ for the seed system is real, then $(\sigma I - A)^{\mathrm{H}} = \sigma I - A$, i.e. the seed system and its shadow system coincide. In this case, we can apply the CG method to solve the seed system. This means that shifted linear systems with arbitrary complex shifts can be solved with the shifted CG method when the coefficient matrix of the seed system is a Hermitian matrix with some real shift. For this reason, we adopt the shifted CG method in this paper.
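The collinear-residual idea above can be sketched as a small multi-shift CG routine. This is an illustrative version under assumptions made here, not the authors' code: the function name `shifted_cg` is hypothetical, the systems are written as (A + σI)x = b (so a system (z_j I − A)y_j = v corresponds to σ = −z_j and y_j = −x), and the demo uses a Hermitian positive definite seed matrix with shifts in the closed right half-plane so that plain CG on the seed is safe. The scalars `pi_cur` play the role of 1/ξ in the text, with the shifted residual equal to the seed residual divided by them.

```python
import numpy as np

def shifted_cg(A, b, shifts, tol=1e-12, max_iter=1000):
    """Solve A x = b (seed) and (A + s I) x_s = b for all s in `shifts`.

    Only the seed system needs matrix-vector products: one per iteration.
    """
    shifts = np.asarray(shifts, dtype=complex)
    b = np.asarray(b, dtype=complex)
    n, m = b.shape[0], shifts.shape[0]

    # Seed CG state.
    x = np.zeros(n, dtype=complex)
    r = b.copy()
    p = b.copy()
    rr = np.vdot(r, r).real
    b_norm = np.sqrt(rr)

    # State of the shifted systems: solutions, search directions, scaling factors.
    xs = np.zeros((m, n), dtype=complex)
    ps = np.tile(b, (m, 1))
    pi_prev = np.ones(m, dtype=complex)
    pi_cur = np.ones(m, dtype=complex)
    alpha_prev, beta_prev = 1.0, 0.0
    matvecs = 0

    for _ in range(max_iter):
        Ap = A @ p                      # the only matrix-vector product per iteration
        matvecs += 1
        alpha = rr / np.vdot(p, Ap).real

        # Recurrence for the residual scaling factors of the shifted systems.
        pi_next = (1.0 + alpha * shifts) * pi_cur \
            + (alpha * beta_prev / alpha_prev) * (pi_cur - pi_prev)
        alpha_s = alpha * pi_cur / pi_next
        xs += alpha_s[:, None] * ps     # update shifted solutions

        x += alpha * p                  # update seed solution and residual
        r -= alpha * Ap
        rr_new = np.vdot(r, r).real
        beta = rr_new / rr

        beta_s = (pi_cur / pi_next) ** 2 * beta
        ps = r[None, :] / pi_next[:, None] + beta_s[:, None] * ps
        p = r + beta * p

        pi_prev, pi_cur = pi_cur, pi_next
        alpha_prev, beta_prev = alpha, beta
        rr = rr_new
        if np.sqrt(rr) <= tol * b_norm:
            break
    return x, xs, matvecs

# Assumed demo problem: well-conditioned symmetric positive definite matrix.
rng = np.random.default_rng(0)
n = 40
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qm @ np.diag(np.linspace(1.0, 10.0, n)) @ Qm.T
A = (A + A.T) / 2
b = rng.standard_normal(n)
shifts = [0.5 + 0.3j, 1.0j, 2.0]
x, xs, matvecs = shifted_cg(A, b, shifts)
print(matvecs, [np.linalg.norm((A + s * np.eye(n)) @ y - b) for s, y in zip(shifts, xs)])
```

The stopping test looks only at the seed residual; for the shifts chosen in this demo the shifted residuals are no larger in norm, which is why a single criterion suffices here.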



Fig. 1. The distribution of eigenvalues for the Hermitian fermion matrix which we use in Section 4 and the contour of the number of iterations for the shifted CG method for $z_j I - A$ with tolerance for the relative residual $\|r_k\|_2/\|v\|_2 \le 10^{-12}$. (Axes: Re $z_j$ from −0.03 to 0.03, Im $z_j$ from −0.015 to 0.015; color scale: number of iterations, 2500 to 5500; markers: eigenvalues.)

3. Integrations along straight lines

Empirically, the number of iterations for the shifted CG method depends on the distribution of eigenvalues near zero. In terms of the shifted matrix $z_j I - A$, the number of iterations tends to increase as the number of eigenvalues of $A$ close to $z_j$ becomes larger. Fig. 1 shows the distribution of eigenvalues for the Hermitian fermion matrix which we use in Section 4 and the contour of the number of iterations for the shifted CG method applied to the shifted matrix $z_j I - A$ with tolerance for the relative residual $\|r_k\|_2/\|v\|_2 \le 10^{-12}$. Indeed, the number of iterations gets larger as $z_j$ becomes closer to the real axis and as the absolute value of its real part becomes larger, which is consistent with the expectation from the distribution of eigenvalues mentioned above. Thus, to reduce the number of iterations for the shifted CG method, quadrature points should be as far from the real axis as possible.

In order to control the number of iterations for the shifted CG method and the accuracy of eigenpairs, we introduce a new integration path as follows: let $L^{\pm}$ be two horizontal lines such that

$$ L^{\pm} : z = \gamma + \rho(x \pm i\beta), \quad -1 \le x \le 1, $$

where $\gamma$ is real. Then $N/2$ equally-spaced quadrature points $z_0, z_2, \ldots, z_{N-2}$ and $z_1, z_3, \ldots, z_{N-1}$ are located on $L^{+}$ and $L^{-}$, respectively. Using this integration path, the distance between a quadrature point and the real axis depends only on $\beta\rho$, unlike the case of $C$. In this case, the set of weights $w_0, w_1, \ldots, w_{N-1}$ for the integration is defined by solving the following linear equations,

$$ \sum_{j=0}^{N-1} w_j \zeta_j^{k-1} = \begin{cases} 1, & k = 0, \\ 0, & k = 1, \cdots, N - 1. \end{cases} $$

Fig. 2 shows the filter functions $f_0(x)$ defined by (3) for $N = 32$ quadrature points on $L^{\pm}$ with $\beta = 0.2, 0.6, 1.0, 1.4$. For comparison, $f_0(x)$ in the case of $C$ is also shown. In the case of $C$, $f_0(x)$ has a plateau in $[-1, 1]$ and is exponentially suppressed in the other region. In the case of $L^{\pm}$, however, there is no plateau and the slope of the damping parts depends on $\beta$, which controls the size of the gap between $L^{+}$ and $L^{-}$. Therefore it is expected that the accuracy of eigenvalues is the highest right in the middle of $L^{\pm}$ and gets lower toward the edges, so we accept the eigenvalues only in the rectangle formed by the $N'$ quadrature points near the center $\gamma$, where $N' \ge 4$ is an even number.

Fig. 2. Filter functions $f_0(x)$ for $N = 32$ quadrature points on $L^{\pm}$ with $\beta = 0.2, 0.6, 1.0, 1.4$. For comparison, $f_0(x)$ in the case of $C$ is also shown as a solid line. (Axes: $x$ from −2.5 to 2.5; $f_0(x)$ on a logarithmic scale from $10^{-14}$ to $10^{0}$.)

Note that when we introduce more than two integration paths lying next to each other, $L_1^{\pm}, L_2^{\pm}, \ldots$, which have the same $N$, $\rho$ and $\beta$, quadrature points can be reused by just shifting them from one integration path $L_k^{\pm}$ to the next $L_{k+1}^{\pm}$ by $N' - 2$ points. This is an advantage of using the integration path $L^{\pm}$.
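The weights for the line path and the filter function (3) can be explored numerically with a short sketch. Several details are assumptions made here rather than statements from the paper: the helper names, the use of normalized points ζ_j with endpoints x = ±1 included in the equal spacing, and the dense solve of the linear system for the weights. As a consistency check, the same solve applied to points on the unit circle should return ζ_j/N, and the resulting f_0 should match 1/(1 + x^N).

```python
import numpy as np

def circle_nodes(N):
    """Normalized quadrature points on the unit circle, as defined for the path C."""
    return np.exp(2j * np.pi * (np.arange(N) + 0.5) / N)

def line_nodes(N, beta):
    """Normalized points on L+ (even indices) and L- (odd indices).

    Assumption: N/2 equally spaced abscissas on [-1, 1], endpoints included.
    """
    x = np.linspace(-1.0, 1.0, N // 2)
    zeta = np.empty(N, dtype=complex)
    zeta[0::2] = x + 1j * beta
    zeta[1::2] = x - 1j * beta
    return zeta

def moment_matrix(zeta):
    """Matrix with entries zeta_j^(k-1) for k = 0..N-1 (rows) and j = 0..N-1 (columns)."""
    k = np.arange(len(zeta))[:, None]
    return zeta[None, :] ** (k - 1)

def solve_weights(zeta):
    """Solve sum_j w_j zeta_j^(k-1) = delta_{k,0} for the weights w_j."""
    rhs = np.zeros(len(zeta), dtype=complex)
    rhs[0] = 1.0
    return np.linalg.solve(moment_matrix(zeta), rhs)

def filter_f(zeta, w, x, k=0):
    """Filter function f_k(x) = sum_j w_j zeta_j^k / (zeta_j - x)."""
    return np.sum(w * zeta**k / (zeta - x))

# Circle: the solved weights reproduce zeta_j / N.
zc = circle_nodes(16)
wc = solve_weights(zc)

# Line pair with a small N (the moment matrix becomes ill-conditioned as N grows).
zl = line_nodes(8, beta=0.6)
wl = solve_weights(zl)
print(abs(filter_f(zl, wl, 0.0)), abs(filter_f(zl, wl, 2.0)))
```

A small N is used for the line pair because the moment matrix is of Vandermonde type; for larger N a more careful solver may be needed.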


A Hermitian fermion matrix is defined as $A = \gamma_5(I - \kappa D)$, where $\kappa$ is a hopping parameter and $D$ is a complex non-symmetric sparse matrix explained in [8]. $\gamma_5$ is one of the Dirac $\gamma$ matrices. We employ the lattice size $12^3 \times 24$, which corresponds to a 497,664-dimensional matrix with 25,380,864 nonzero components. Our choice of $\kappa = 0.13600$ is rather close to the critical value $\kappa_c = 0.136116(8)$.

Our experiments are carried out on a single node of T2K-TSUKUBA, which has 648 nodes in total providing 95.4 Tflops of computing capability. Each node has 4 sets of a 2.3 GHz Quad-Core AMD Opteron Model 8356 processor and 8 GBytes of DDR2-667 memory.

Let us first compare the efficiency of the SS method between the conventional integration path $C$ and the new integration path $L^{\pm}$ by calculating the 6 low-lying eigenvalues and the corresponding eigenvectors. In both cases, we set $\gamma = 0.0004$, $\rho = 0.0102$, $N = 32$ and $M = 24$. The gap size between $L^{+}$ and $L^{-}$ is varied as $\beta = 0.2, 0.6, 1.0$. We employ the shifted CG method with the stopping criterion for the relative residual $\|r_k\|_2/\|v\|_2 \le 10^{-12}$, choosing a shift $\sigma = 0$ for the seed. Eigenpairs are obtained with the Rayleigh-Ritz projection method.

We show the efficiency and the accuracy of the SS method with $C$ and $L^{\pm}$ in Table 1, where $\mathrm{res}_l \equiv \|A x_l - \lambda_l x_l\|_2$ with $\|x_l\|_2 = 1$ is the residual for the $l$-th smallest eigenvalue $\lambda_l$. In the case of $L^{\pm}$, we obtain lower accuracy of eigenpairs toward the edges of the integration interval, as expected in Section 3. We also find that the accuracy of eigenpairs increases at the cost of the number of matrix-vector multiplications as $\beta$ becomes smaller. This allows us to choose an optimal value of $\beta$ to minimize the computational cost for the required precision of eigenpairs. We observe similar efficiency between the $C$ and $L^{\pm}$ cases. An intriguing finding is that the elapsed time for $L^{\pm}$ with $\beta = 0.2$ is larger than that for $C$ even though the number of matrix-vector multiplications of the former is less than that of the latter. The reason is that the vector operations in the shifted CG method require nonnegligible computational cost compared to the matrix-vector multiplication, which involves only 51 nonzero components in each row in our case.

Table 1. The efficiency and the accuracy of the SS method with $C$ and $L^{\pm}$ for calculating 6 low-lying eigenvalues and corresponding eigenvectors.

path          C          L±         L±         L±
β             -          0.2        0.6        1.0
# matvec      4352       4268       3560       2886
time [sec]    399.1      424.8      355.5      293.2
res1          1.8E−11    5.2E−10    3.1E−08    1.5E−06
res2          2.8E−10    1.8E−12    1.2E−09    3.0E−07
res3          4.0E−10    1.9E−12    1.6E−09    3.3E−07
res4          2.9E−10    3.0E−12    7.1E−10    1.1E−07
res5          5.6E−11    2.5E−12    4.0E−10    6.7E−08
res6          7.8E−12    6.6E−10    1.6E−08    5.8E−07

We also compare the efficiency between PARPACK and the SS method with $C$ and $L^{\pm}$ by calculating 20 low-lying eigenvalues and corresponding eigenvectors. The parameters of the SS method are chosen to satisfy the tolerance $\mathrm{res}_l \le 10^{-9}$, and the stopping criterion for PARPACK is tol $= 10^{-10}$. For the SS method with $C$ we employ 4 circles with $N = 32$, $M = 24$, choosing $\gamma = -0.02485, -0.00964, 0.01181, 0.02575$ and $\rho = 0.00435, 0.00960, 0.00860, 0.00431$, each of which contains 5 eigenvalues. The SS method with $L^{\pm}$ uses 6 pairs of lines with $N = 32$, $M = 24$, $N' = 16$, $\rho = 0.02121$ and $\beta = 0.2$. The other settings for the SS method are the same as in the previous experiment. For PARPACK, the number of Arnoldi vectors is chosen to be four times the number of eigenvalues, i.e. 80, in the regular mode.

Table 2 shows the efficiency and the accuracy of the three methods. $\mathrm{res}_{\max}$ and $\mathrm{res}_{\min}$ are the maximum and minimum values of $\mathrm{res}_l$, respectively. There are two important points. One is that the SS method shows similar or better efficiency and accuracy in comparison with PARPACK thanks to the shifted CG method, which reduces the number of matrix-vector multiplications by a factor of about 100. The other is that the SS method with $L^{\pm}$ requires fewer matrix-vector multiplications and quadrature points than the $C$ case. This is because the $L^{\pm}$ case requires fewer iterations for the shifted CG method and allows us to reuse the quadrature points.

5. Conclusions

We introduce a shifted Krylov subspace method to reduce the computational cost of the SS method. Moreover, we propose a new integration path along straight lines which decreases both the number of iterations for a Krylov subspace solver and the number of quadrature points.

We calculate some low-lying eigenvalues and corresponding eigenvectors of a Hermitian fermion matrix with the SS method and PARPACK. We show that the SS method becomes comparable in efficiency with PARPACK thanks to the shifted CG method, and that our new integration path is more efficient than the conventional one on a circle.

Investigating more efficient integration paths and quadrature rules to reduce the computational cost of the SS method is our future plan.

Table 2. The efficiency and the accuracy of the SS method with $C$ and $L^{\pm}$ and PARPACK for calculating 20 low-lying eigenvalues and corresponding eigenvectors.

                          SS (C)     SS (L±)    PARPACK
# matvec                  7542       6680       7384
Total # quad. points      128        102        -
Total time [sec]          1159.4     1020.7     1277.4
Time for matvec [sec]     421.2      365.5      406.1
resmax                    7.7E−10    2.6E−10    2.8E−12
resmin                    1.4E−12    7.7E−13    3.6E−15

Acknowledgments

Numerical calculations for the present work have been carried out on the T2K-TSUKUBA computer under the “Interdisciplinary Computational Science Program” of Center for Computational Sciences, University of Tsukuba. This work is supported in part by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology (Nos. 18540250, 20105002, 21105502 and 21246018).

References

[1] J. Foley, K. J. Juge, A. O Cais, M. Peardon, S. M. Ryan and J. Skullerud, Practical all-to-all propagators for lattice QCD, Comput. Phys. Comm., 172 (2005), 145–162.

[2] D. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM J. Matrix Anal. Appl., 13 (1992), 357–385.

[3] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, J. Comput. Appl. Math., 159 (2003), 119–128.

[4] B. Jegerlehner, Krylov space solvers for shifted linear systems, arXiv:hep-lat/9612014v1.

[5] K. J. Maschhoff and D. C. Sorensen, PARPACK: An Efficient Portable Large Scale Eigenvalue Package for Distributed Memory Parallel Architectures, http://www.caam.rice.edu/software/ARPACK/.

[6] T. Sakurai and H. Tadano, CIRR: a Rayleigh-Ritz type method with contour integral for generalized eigenvalue problems, Hokkaido Math. J., 36 (2007), 745–757.

[7] T. Ikegami, T. Sakurai and U. Nagashima, A filter diagonalization for generalized eigenvalue problems based on the Sakurai-Sugiura projection method, J. Comput. Appl. Math., 233 (2010), 1927–1936.

[8] T. Sakurai, H. Tadano and Y. Kuramashi, Application of block Krylov subspace algorithms to the Wilson-Dirac equation with multiple right-hand sides in lattice QCD, Comput. Phys. Comm., 181 (2010), 113–117.



Algorithm for computing Jordan basis

Kenji Kudo1, Yoshiaki Kakinuma2, Kazuyuki Hiraoka3, Hiroki Hashiguchi1, Yutaka Kuwajima1 and Takaomi Shigehara1

1 Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama 338-8570, Japan

2 NDD Corporation, 2-46-2, Hon-cho, Nakano-ku, Tokyo 164-0012, Japan

3 General Education, Wakayama National College of Technology, 77 Noshima, Nada-cho, Gobo City, Wakayama 644-0023, Japan

E-mail sigehara mail.saitama-u.ac.jp

Received March 31, 2010, Accepted July 7, 2010

Abstract

We propose a novel algorithm to compute a Jordan basis (JB) for an arbitrarily given square matrix. The algorithm is based on the fact that a JB for a linear transformation f is obtained by extending a JB for the restriction of f to its range R(f). The main ingredient of the algorithm is singular value decomposition, and that ensures backward-stability of the algorithm. To enhance the practical utility, we also introduce an automatic mechanism into the algorithm such that it outputs all possible Jordan structures close to the exact one of the input matrix.

Keywords Jordan canonical form, Jordan basis, recursive algorithm


1. Introduction

Numerical algorithms to reveal the Jordan structure for a given square matrix have been constructed mainly along three lines: unitary deflation [1–4], matrix powering [2, 5], and resolvent analysis [6, 7]. Unitary deflation relies on the fact in the first step that any square matrix can be reduced to a block triangular form by successive applications of unitary transformations. This procedure is enough to determine the Jordan canonical form (JCF) of the input matrix. However, a further process including nonunitary transformations is required to construct a Jordan basis (JB) for the matrix. It should be stressed that, even if such a series of unitary and nonunitary transformations is established, it is inevitable to multiply all the transformation matrices in order to obtain a JB concretely. The purpose of this paper is to propose a novel numerical algorithm for revealing the Jordan structure, including a JB, for an arbitrarily given square matrix. In the proposed algorithm, only unitary deflation is required to obtain a JB as well as the JCF for the input matrix. The main tool of the algorithm is singular value decomposition (SVD), which guarantees the backward stability of the algorithm. The existence of tiny singular values may disturb the numerical stability. Therefore we introduce a mechanism into the algorithm such that, for an input square matrix, all possible solutions are automatically output. In other words, the algorithm outputs all possible Jordan structures close to the exact one of the input matrix, together with the information on numerical error associated with each Jordan structure.

2. Theoretical aspects

Let $V$ be a finite-dimensional linear space over $\mathbb{C}$ and $f$ a linear transformation on $V$. The kernel and the range of $f$ are denoted by $N(f)$ and $R(f)$, respectively. Let $\mu$ be a complex constant. An ordered sequence $(x_1, \ldots, x_l)$ of $l$ vectors ($l \ge 1$) of $V$ with the property

$$ (f - \mu)(x_k) = x_{k-1} \quad (k = 1, \ldots, l), \quad x_0 \equiv 0 \tag{1} $$

is called a Jordan sequence (JS) of length $l$ associated with the eigenvalue $\mu$ of $f$. The set of JSs associated with $\mu$ is denoted by $J_\mu$. A JS in $J_\mu$ ($\mu \ne 0$) is called regular, while a JS in $J_0$ is called singular. A set of JSs such that the vectors in the JSs compose a basis of $V$ is called a Jordan basis (JB) for $f$.

The following theorem gives the theoretical foundation of this paper. Although the assertion is the same as in [8], the proof is considerably simplified. The restriction of $f$ to $R(f)$ defines a linear transformation on $R(f)$, which is denoted by $f'$ in the following.

Theorem 1 Let $\mu_j \ne 0$ ($j = 1, \ldots, n_r$) be nonzero constants. If the JSs

$$ \begin{aligned} & (x^{(r)}_{j;1}, \ldots, x^{(r)}_{j;l^{(r)}_j}) \in J_{\mu_j} \quad (j = 1, \ldots, n_r), \\ & (x^{(s)}_{j;1}, \ldots, x^{(s)}_{j;l^{(s)}_j}) \in J_0 \quad (j = 1, \ldots, n'_s) \end{aligned} $$

are a JB for $f'$, then $f$ has a JB such that

$$ \begin{aligned} & (x^{(r)}_{j;1}, \ldots, x^{(r)}_{j;l^{(r)}_j}) \in J_{\mu_j} \quad (j = 1, \ldots, n_r), \\ & (x^{(s)}_{j;1}, \ldots, x^{(s)}_{j;l^{(s)}_j}, x^{(s)}_{j;l^{(s)}_j+1}) \in J_0 \quad (j = 1, \ldots, n'_s), \\ & (x^{(s)}_{n'_s+j;1}) \in J_0 \quad (j = 1, \ldots, n_s), \end{aligned} \tag{2} $$

where the vectors $x^{(s)}_{n'_s+1;1}, \ldots, x^{(s)}_{n'_s+n_s;1}$ are a basis of a complementary space of $N(f) \cap R(f)$ in $N(f)$.

JSIAM Letters Vol. 2 (2010) pp.119–122 Kenji Kudo et al.

Proof Since $x^{(s)}_{j;l^{(s)}_j} \in R(f)$, there exists $x^{(s)}_{j;l^{(s)}_j+1}$ such that $x^{(s)}_{j;l^{(s)}_j} = f(x^{(s)}_{j;l^{(s)}_j+1})$. Hence there exist the middle JSs associated with $\mu = 0$ in (2). The images of the sequences

$$ \begin{aligned} & (x^{(r)}_{j;1}, \ldots, x^{(r)}_{j;l^{(r)}_j}) \quad (j = 1, \ldots, n_r), \\ & (x^{(s)}_{j;2}, \ldots, x^{(s)}_{j;l^{(s)}_j+1}) \quad (j = 1, \ldots, n'_s) \end{aligned} \tag{3} $$

by $f$ are

$$ \begin{aligned} & (\mu_j x^{(r)}_{j;1},\ \mu_j x^{(r)}_{j;2} + x^{(r)}_{j;1},\ \ldots,\ \mu_j x^{(r)}_{j;l^{(r)}_j} + x^{(r)}_{j;l^{(r)}_j-1}) \quad (j = 1, \ldots, n_r), \\ & (x^{(s)}_{j;1}, \ldots, x^{(s)}_{j;l^{(s)}_j}) \quad (j = 1, \ldots, n'_s), \end{aligned} \tag{4} $$

respectively. Since $\mu_j \ne 0$, the vectors in the sequences in (4) are a basis of $R(f)$ by assumption. Thus, we conclude that the vectors in the sequences in (3) are a basis of a complementary space of $N(f)$ in $V$. The remaining vectors $x^{(s)}_{j;1}$ ($j = 1, \ldots, n'_s$) are a basis of $N(f) \cap R(f)$. Hence, by adding the lower JSs in (2), we obtain a JB for $f$ in (2).

(QED)
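As a small worked illustration of Theorem 1 (an example chosen here, not taken from the paper), consider a transformation on $\mathbb{C}^3$ with one nilpotent Jordan block of size 2 and one of size 1, written in the standard basis $e_1, e_2, e_3$:

```latex
\[
  f = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
  \qquad
  R(f) = \mathrm{span}(e_1),
  \qquad
  N(f) = \mathrm{span}(e_1, e_3).
\]
% The restriction f' of f to R(f) is the zero map on span(e_1),
% so a JB for f' consists of the single singular JS (e_1) of length 1.
\[
  \text{JB for } f' : \ (e_1) \in J_0 .
\]
% Extension step: solve f(x) = e_1, which gives x = e_2, so (e_1) becomes (e_1, e_2).
% Completion step: N(f) \cap R(f) = span(e_1), and span(e_3) is a complementary
% space of it in N(f), so the JS (e_3) of length 1 is added.
\[
  \text{JB for } f : \ (e_1, e_2) \in J_0, \quad (e_3) \in J_0 .
\]
```

In the notation of the theorem this corresponds to $n_r = 0$, $n'_s = 1$ and $n_s = 1$.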

3. Proposal of algorithm

3.1 Framework

The constructive proof in the previous section makes it possible to establish a recursive algorithm for computing a JB for the restriction of $f$ to the generalized eigenspace associated with the eigenvalue zero of $f$. The regular JSs associated with the eigenvalue $\mu \ne 0$ of $f$ correspond to the singular JSs of $f - \mu$, which can be obtained by replacing $f$ by $f - \mu$ in Theorem 1. Thus we are led to an algorithm for computing a JB for the linear transformation $f$. Here we assume that the distinct eigenvalues $\mu_1, \ldots, \mu_m$ of $f$ are separately computed in advance.

JB algorithm

input: linear transformation $f : V \to V$, and all the distinct eigenvalues $\mu_1, \ldots, \mu_m$ of $f$.

For each eigenvalue $\mu_i$ ($i = 1, \ldots, m$), repeat 1)–4).

1) Set $f_i^{(1)} = f - \mu_i$ and $V_i^{(1)} = V$.

2) For $k = 1, \ldots, t_i$, find the restriction $f_i^{(k+1)} : V_i^{(k+1)} \to V_i^{(k+1)}$ of $f_i^{(k)}$ to $V_i^{(k+1)} \equiv R(f_i^{(k)})$, where $t_i$ ($t_i \ge 1$) is the minimum integer such that $f_i^{(t_i+1)}$ is bijective.

3) Set $q_{i;t_i} = \dim V_i^{(t_i)} - \dim V_i^{(t_i+1)}$. Define

$$ S(f_i^{(t_i)}) \equiv \{ (x_j) \mid j = 1, \ldots, q_{i;t_i} \} $$

with a basis $x_1, \ldots, x_{q_{i;t_i}}$ of $N(f_i^{(t_i)})$. Set $p = q_{i;t_i}$.

4) For $k = t_i - 1, \ldots, 1$, repeat a)–c).

   a) For each $s_j = (x_{j;1}, \ldots, x_{j;l_j}) \in S(f_i^{(k+1)})$ ($j = 1, \ldots, p$), solve the linear system

   $$ f_i^{(k)}(x_{j;l_j+1}) = x_{j;l_j} $$

   and set $\mathrm{ext}(s_j) \equiv (x_{j;1}, \ldots, x_{j;l_j}, x_{j;l_j+1})$. Define

   $$ S_1(f_i^{(k)}) \equiv \{ \mathrm{ext}(s_j) \mid s_j \in S(f_i^{(k+1)}),\ j = 1, \ldots, p \}. $$

   b) Set $q_{i;k} = \dim V_i^{(k)} - \dim V_i^{(k+1)} - p$. Define

   $$ S_2(f_i^{(k)}) \equiv \{ (x_j) \mid j = 1, \ldots, q_{i;k} \} $$

   with a basis $x_1, \ldots, x_{q_{i;k}}$ of a complementary space of $N(f_i^{(k)}) \cap R(f_i^{(k)})$ in $N(f_i^{(k)})$.

   c) Define $S(f_i^{(k)}) \equiv S_1(f_i^{(k)}) \cup S_2(f_i^{(k)})$. Set $p = p + q_{i;k}$.

output: $S(f_1^{(1)}) \cup \cdots \cup S(f_m^{(1)})$.

Note that for each $i = 1, \ldots, m$, $V_i^{(k)} = R(f_i^{k-1})$ and hence $f_i^{(k)}$ is just the restriction of $f_i^{(1)} = f - \mu_i$ to $R(f_i^{k-1})$ ($k = 1, \ldots, t_i + 1$), where $R(f_i^{0}) \equiv V$.

The output of the JB algorithm gives a JB for $f$. In particular, for each $i = 1, \ldots, m$, the JSs in $S(f_i^{(1)})$ give a JB for the restriction of $f$ to the generalized eigenspace $G(\mu_i)$ associated with the eigenvalue $\mu_i$ of $f$. The value of $q_{i;k}$ in the algorithm is the number of JSs of length $k$ ($k = 1, \ldots, t_i$) associated with the eigenvalue $\mu_i$, and hence $\dim G(\mu_i) = \sum_{k=1}^{t_i} k\, q_{i;k}$ ($i = 1, \ldots, m$).
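The counting part of the algorithm, i.e. the numbers q_{i;k} of Jordan sequences of each length, can be sketched using the identity V_i^(k) = R((f − μ_i)^(k−1)) noted above. The sketch below is a simplification made here: it takes the dimensions from numerical ranks of matrix powers rather than from the recursive restrictions, the names `jordan_block_counts` and `eps` are hypothetical, and the rank threshold is an assumption of this illustration.

```python
import numpy as np

def jordan_block_counts(F, mu, eps=1e-10):
    """Return {k: q_k}: the number of Jordan sequences of length k for eigenvalue mu."""
    n = F.shape[0]
    G = F - mu * np.eye(n)
    g_norm = np.linalg.norm(G, 2)

    # dims[k-1] = dim V^(k) = rank(G^(k-1)), starting from dim V^(1) = n.
    dims = [n]
    P = np.eye(n)
    k = 0
    while True:
        P = P @ G
        k += 1
        s = np.linalg.svd(P, compute_uv=False)
        r = int(np.sum(s > eps * g_norm**k))  # numerical rank of G^k
        if r == dims[-1]:                     # restriction is bijective: stop
            break
        dims.append(r)

    # Steps 3) and 4)-b): q_k = dim V^(k) - dim V^(k+1) - p, for k = t, ..., 1.
    t = len(dims) - 1
    counts, p = {}, 0
    for k in range(t, 0, -1):
        q = dims[k - 1] - dims[k] - p
        counts[k] = q
        p += q
    return counts

def jordan_cell(l, mu):
    """Jordan cell of size l with eigenvalue mu."""
    return mu * np.eye(l) + np.diag(np.ones(l - 1), 1)

# Assumed test matrix: J_3(0), J_1(0) and J_1(2) under a well-conditioned similarity.
J = np.zeros((5, 5))
J[0:3, 0:3] = jordan_cell(3, 0.0)
J[3:4, 3:4] = jordan_cell(1, 0.0)
J[4:5, 4:5] = jordan_cell(1, 2.0)
rng = np.random.default_rng(0)
P0 = np.eye(5) + 0.2 * rng.uniform(-1.0, 1.0, (5, 5))
F = P0 @ J @ np.linalg.inv(P0)
print(jordan_block_counts(F, 0.0), jordan_block_counts(F, 2.0))
```

For the eigenvalue zero the dimensions are 5, 3, 2, 1, which gives one sequence of length 3, none of length 2 and one of length 1, and the sum of k q_k equals the dimension 4 of the generalized eigenspace.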

3.2 Matrix representation

Denote the set of $l_1 \times l_2$ complex matrices by $\mathbb{C}^{l_1 \times l_2}$ in general. Let $F \in \mathbb{C}^{n \times n}$ be a singular matrix of rank $r = \dim R(F) < n$. Consider the matrix representation $F' \in \mathbb{C}^{r \times r}$ of the restriction of $F$ to $R(F)$. First we decompose $F$ into the form

$$ F = IS. \tag{5} $$

Here $I \in \mathbb{C}^{n \times r}$ is injective, while $S \in \mathbb{C}^{r \times n}$ is surjective. The decomposition in (5) is indeed possible. In the present implementation, we set $I = U$ and $S = DV^{*}$. Here

$$ F = U D V^{*} \tag{6} $$

is the singular value decomposition (SVD) of $F$, where

$$ \begin{aligned} D &= \mathrm{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{C}^{r \times r}, \\ U &= (u_1, \ldots, u_r) \in \mathbb{C}^{n \times r}, \\ V &= (v_1, \ldots, v_r) \in \mathbb{C}^{n \times r} \end{aligned} \tag{7} $$

with the singular values $\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0$ and the corresponding left and right (orthonormal) singular vectors $u_j$ and $v_j$ ($j = 1, \ldots, n$). The column vectors of $I$ compose a basis of $R(F)$ and, with respect to this basis, the matrix representation $F'$ of the restriction of $F$ to $R(F)$ should satisfy $FI = IF'$. Hence $I(F' - SI) = O$. Since $I$ is injective, we conclude

$$ F' = SI \in \mathbb{C}^{r \times r}. $$

This procedure to construct $F'$ from $F$ is essentially similar to the procedure to obtain $B^{(2)}_{11}$ from $B^{(1)}$ by a unitarily-equivalent transformation in [2, Section 10], and it is used in step 2) in the JB algorithm.

Suppose that we have a singular JS $(x'_1, \ldots, x'_l)$ for $F'$. Then, together with $x_k \equiv I x'_k$ ($k = 1, \ldots, l$) as well as a solution $x_{l+1}$ of the linear system

$$ F x_{l+1} = x_l, \tag{8} $$

the sequence $(x_1, \ldots, x_l, x_{l+1})$ is a singular JS for $F$. Since we already have the SVD of $F$ in (6), a solution of (8) is given by using the Moore-Penrose inverse:

$$ x_{l+1} = V D^{-1} U^{*} x_l = V D^{-1} x'_l. $$



This is used in step 4)-a) in the JB algorithm.

To maintain numerical accuracy, we have to remove tiny singular values in the SVD of $F$. This is carried out by introducing a small cut-off parameter $\varepsilon$: if $\sigma_{r'} \ge \sigma_1 \varepsilon > \sigma_{r'+1} \ge \cdots \ge \sigma_r > 0$, then set $\sigma_{r'+1} = \cdots = \sigma_r = 0$ in (7). The parameter $r'$ corresponds to the numerical $\delta$-rank of $F$ with $\delta = \sigma_1 \varepsilon$ [9], and we numerically regard $u_1, \ldots, u_{r'}$ as a basis of $R(F)$ while $v_{r'+1}, \ldots, v_n$ are a basis of $N(F)$.

We may choose different values of the cut-off parameter for each eigenvalue as well as for each recursive step associated with the eigenvalue. In general, however, it is hard to know appropriate magnitudes of the parameters in advance. Therefore we introduce an automatic mechanism into the algorithm which makes it possible to compute JBs in both cases, whether each tiny singular value is discarded or not.

With these remarks, we are led to a possible matrix representation of the JB algorithm. Here $E \in \mathbb{C}^{n \times n}$ is the identity matrix. For an input matrix $F \in \mathbb{C}^{n \times n}$, the following algorithm outputs all the possible JBs of the restriction of $F$ to $G(\mu)$ for an assigned eigenvalue $\mu$ of $F$, by making a thorough investigation of the cut-off parameters within a range $[\varepsilon_{\min}\sigma_1, \varepsilon_{\max}\sigma_1]$. Here $\sigma_1$ is the maximum singular value of the SVD under consideration, and the two parameters $\varepsilon_{\min}$, $\varepsilon_{\max}$ ($\varepsilon_{\min} < \varepsilon_{\max}$) are chosen such that the range substantially covers the tiny singular values.

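JB algorithm (matrix version)

input: square matrix $F \in \mathbb{C}^{n \times n}$, an eigenvalue $\mu$ of $F$, and the parameters $\varepsilon_{\min}$, $\varepsilon_{\max}$.

Set $S(\mu) = \emptyset$.
Set $F^{(1)}_{n_1} = F - \mu E$ with $n_1 = n$.
Call cal_JB_of_GE($F^{(1)}_{n_1}$, $\varepsilon_{\max}$, $\varepsilon_{\min}$, $S(\mu)$).

output: $S(\mu)$.

proc cal_JB_of_GE($F^{(k)}_{n_k}$, $\varepsilon_{\max}$, $\varepsilon_{\min}$, $S(\mu)$)

Compute the singular pairs $(\sigma^{(k)}_j, u^{(k)}_j, v^{(k)}_j)$ ($j = 1, \ldots, n_k$) of $F^{(k)}_{n_k}$, where

$$ \sigma^{(k)}_1 \ge \cdots \ge \sigma^{(k)}_{r^{(k)}_{\max}} \ge \varepsilon_{\max}\sigma^{(k)}_1 > \cdots \ge \sigma^{(k)}_{r^{(k)}_{\min}} \ge \varepsilon_{\min}\sigma^{(k)}_1 > \cdots \ge \sigma^{(k)}_{n_k} \ge 0. $$

Repeat the following procedure for $r'_k = r^{(k)}_{\min}, \ldots, r^{(k)}_{\max}$:

If $r'_k < n_k$, then perform a1)–a3).

a1) Define $F^{(k+1)}_{r'_k} \equiv D^{(k)}_{r'_k} V^{(k)*}_{r'_k} U^{(k)}_{r'_k} \in \mathbb{C}^{r'_k \times r'_k}$ with

$$ \begin{aligned} D^{(k)}_{r'_k} &= \mathrm{diag}(\sigma^{(k)}_1, \ldots, \sigma^{(k)}_{r'_k}) \in \mathbb{C}^{r'_k \times r'_k}, \\ U^{(k)}_{r'_k} &= (u^{(k)}_1, \ldots, u^{(k)}_{r'_k}) \in \mathbb{C}^{n_k \times r'_k}, \\ V^{(k)}_{r'_k} &= (v^{(k)}_1, \ldots, v^{(k)}_{r'_k}) \in \mathbb{C}^{n_k \times r'_k}. \end{aligned} $$

a2) Set $N(F^{(k)}_{r'_k}) = \mathrm{span}(v^{(k)}_{r'_k+1}, \ldots, v^{(k)}_{n_k})$.

a3) Call cal_JB_of_GE($F^{(k+1)}_{n_{k+1}}$, $\varepsilon_{\max}$, $\varepsilon_{\min}$, $S(\mu)$) with $n_{k+1} = r'_k$.

Otherwise, if $r'_k = n_k$, then perform b1)–b4).

b1) Set $t = k - 1$.

b2) Set $q_t = n_t - n_{t+1}$. Define

$$ S(F^{(t)}_{n_t}) \equiv \{ (v^{(t)}_{n_{t+1}+1}), \ldots, (v^{(t)}_{n_t}) \}. $$

Set $p = q_t$.

b3) For $k = t - 1, \ldots, 1$, repeat i)–iii).

   i) For each $s'_j = (x'_{j;1}, \ldots, x'_{j;l_j}) \in S(F^{(k+1)}_{n_{k+1}})$ ($j = 1, \ldots, p$), set $\mathrm{ext}(s'_j) = (x_{j;1}, \ldots, x_{j;l_j}, x_{j;l_j+1})$ with

   $$ \begin{aligned} x_{j;z} &= U^{(k)}_{n_{k+1}} x'_{j;z} \quad (z = 1, \ldots, l_j), \\ x_{j;l_j+1} &= V^{(k)}_{n_{k+1}} \bigl(D^{(k)}_{n_{k+1}}\bigr)^{-1} x'_{j;l_j}. \end{aligned} $$

   Define $S_1(F^{(k)}_{n_k}) \equiv \{ \mathrm{ext}(s'_j) \mid s'_j \in S(F^{(k+1)}_{n_{k+1}}),\ j = 1, \ldots, p \}$.

   ii) Set $q_k = n_k - n_{k+1} - p$. Extend a basis $\{ x_{j;1} \mid (x_{j;1}, \ldots) \in S_1(F^{(k)}_{n_k}),\ j = 1, \ldots, p \}$ of $N(F^{(k)}_{n_k}) \cap R(F^{(k)}_{n_k})$ to a basis of $N(F^{(k)}_{n_k})$ by appending $x_1, \ldots, x_{q_k}$. Define

   $$ S_2(F^{(k)}_{n_k}) \equiv \{ (x_j) \mid j = 1, \ldots, q_k \}. $$

   iii) Define $S(F^{(k)}_{n_k}) \equiv S_1(F^{(k)}_{n_k}) \cup S_2(F^{(k)}_{n_k})$. Set $p = p + q_k$.

b4) $S(\mu) = S(\mu) \cup S(F^{(1)}_{n_1})$.

Table 1. Experimental environment.

CPU        Intel Core(TM)2 Duo 2.66GHz
Memory     1.99GB
OS         Windows XP SP3
Compiler   cygwin gcc version 3.4.4
LAPACK     version 3.0.8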

Note that the multiplicity of the input eigenvalue $\mu$ is not an input of the algorithm. For each $\mu$, (in general more than one) generalized eigenspaces $G(\mu)$ determined from the JBs in the output $S(\mu)$ might have different dimensions. Therefore one should select such combinations of JBs for $G(\mu_i)$ from $S(\mu_i)$ $(i = 1, \ldots, m)$ that $\sum_{i=1}^{m} \dim G(\mu_i) = n$.

4. Numerical experiment

Numerical environment is summarized in Table 1. We use ZGESVD routine in LAPACK for SVD. Let $J_l(\mu)$ be the Jordan cell of size $l$ associated with the eigenvalue $\mu$. Numerical test is performed by using 100 matrices with a form

$$F = PJP^{-1}, \tag{9}$$

where

$$J = \bigoplus_{i=1}^{3} \left( \bigoplus_{j_i=1}^{n_i} J_{l_{j_i}}(\mu_i) \right) \tag{10}$$

is a JCF with the eigenvalues $(\mu_1, \mu_2, \mu_3) = (1, 1+\alpha, 10)$ and $P$ is an invertible matrix with uniform random numbers in the range $[-1, 1]$ for elements. In (10), $n_i \in [1, 3]$ $(i = 1, \ldots, 3)$, $l_{j_i} \in [1, 3]$ $(j_i = 1, \ldots, n_i;\ i = 1, \ldots, 3)$ are random integers. The JCF as well as a JB for $F$ in (9) is obvious by construction, and a comparison with numerical results is easily made.



We first compute the eigenvalues of $F$ in (9) by using ZGEEV routine in LAPACK. This provides $n$ distinct eigenvalues $\mu_j$ $(j = 1, \ldots, n)$ in general, where $n$ is matrix size of $F$. To recover the Jordan structure of $F$, we group neighboring eigenvalues together as follows:

1) Set $\Lambda = \{\mu_1, \ldots, \mu_n\}$. Set $i = 1$.

2) Repeat i)–iii) until $\Lambda$ becomes the empty set.

i) Let $\mu$ be the eigenvalue with the maximal absolute value in $\Lambda$. Define

$$\Lambda_i \equiv \left\{ \mu_j \in \Lambda \;\middle|\; \left| \frac{\mu_j - \mu}{\mu} \right| < 10^{-4} \right\}.$$

ii) Define $\mu'_i$ by the average of the eigenvalues in $\Lambda_i$.

iii) Set $\Lambda = \Lambda - \Lambda_i$. Set $i = i + 1$.
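The grouping steps 1) and 2) above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `group_eigenvalues` and the parameter `rel_tol` are hypothetical, and the rule that the reference eigenvalue always joins its own cluster (so that the loop also terminates when the reference value is zero) is an added assumption.

```python
def group_eigenvalues(eigs, rel_tol=1e-4):
    """Cluster computed eigenvalues; return one averaged value per cluster."""
    remaining = list(eigs)
    averaged = []
    while remaining:
        # Step i): the reference is the eigenvalue of maximal absolute value.
        i_ref = max(range(len(remaining)), key=lambda i: abs(remaining[i]))
        mu = remaining[i_ref]
        # Assumption: mu always belongs to its own cluster, which keeps the
        # loop finite even if mu == 0 (the relative test is undefined there).
        in_cluster = [i == i_ref or abs(x - mu) < rel_tol * abs(mu)
                      for i, x in enumerate(remaining)]
        cluster = [x for x, flag in zip(remaining, in_cluster) if flag]
        # Step ii): the representative is the average of the cluster.
        averaged.append(sum(cluster) / len(cluster))
        # Step iii): remove the cluster from the working set.
        remaining = [x for x, flag in zip(remaining, in_cluster) if not flag]
    return averaged
```

For instance, `group_eigenvalues([1.0, 1.0 + 1e-8, 10.0, 1.0 - 1e-8])` returns two representatives, the first near 10 and the second near 1. Complex eigenvalues, as returned by ZGEEV, can be passed in the same way because only `abs` and arithmetic are used.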

We use the output $\mu'_1, \mu'_2, \ldots$ as the input eigenvalues for the JB algorithm.

Let $J'$ and $P'$ be the JCF and a JB, numerically determined from the output of the JB algorithm. Let $W_{1;2}$ be the direct sum of the generalized eigenspaces associated with the eigenvalues $\mu_1 = 1$ and $\mu_2 = 1 + \alpha$, while let $W_3$ be the generalized eigenspace associated with the eigenvalue $\mu_3 = 10$. Both of $W_{1;2}$ and $W_3$ are determined from $P$ in (9). Numerical counterparts of $W_{1;2}$ and $W_3$ are denoted by $W'_{1;2}$ and $W'_3$ respectively, that are determined from the output of the JB algorithm. We estimate two kinds of numerical errors:

$$E_1 = \frac{\| FP' - P'J' \|_\infty}{\| FP' \|_\infty},$$

$$E_2 = \max\{ \sin\theta_{W_{1;2}, W'_{1;2}},\ \sin\theta_{W_3, W'_3} \},$$

where $\theta_{V_1,V_2}$ is the largest canonical angle between the subspaces $V_1$ and $V_2$ in general [10]. $E_1$ measures whether the numerical JSs indeed satisfy the relation in (1) or not, while $E_2$ measures whether the numerical JB indeed spans the input generalized eigenspaces or not.

Fig. 1 shows the results for a well-conditioned case of $\alpha = 1$. In this case, we set $\varepsilon_{\min} = \varepsilon_{\max} = 10^{-11}$. The bar and broken line graphs show $E_1$ and $E_2$, respectively, for 100 examples. Since $\varepsilon_{\min} = \varepsilon_{\max}$, the JB algorithm outputs a single JB for each example. From Fig. 1, we confirm that, even with the unsophisticated cut-off parameters, the generalized eigenspaces as well as their JBs are computed with high numerical accuracy.

Fig. 2 shows the results for an ill-conditioned case of $\alpha = 10^{-8}$. Here we set $\varepsilon_{\min} = \varepsilon_{\max} = 10^{-5}$. In this case, owing to the lazy selection of the cut-off parameters, the JB algorithm fails to reproduce the input generalized eigenspaces. Indeed, we observe $E_2 > 10^{-2}$ in 43 of the 100 examples. (Note that $E_2 = 10^{-2}$ means that the largest canonical angle is 0.573 degrees.) To remedy this, we set $\varepsilon_{\min} = 10^{-18}$ and $\varepsilon_{\max} = 10^{-6}$ in the JB algorithm. Fig. 3 shows the results for this case. In this case, the JB algorithm outputs a number of JBs for each example in general, and in Fig. 3, we show the result with the JB which minimizes $E_2$ for each. We confirm from Fig. 3 that the overlap between the input and output generalized eigenspaces is drastically improved. Indeed, we attain $E_2 < 10^{-2}$ in 98 of the 100 examples.

In several examples in Fig. 3, we find more than one JB that satisfies $E_1 \le 10^{-12}$ as well as $E_2 \le 10^{-3}$. This indicates that there exist a number of Jordan structures close to the input matrix $F$, and the JB algorithm succeeds in reproducing them.

Fig. 1. $E_1$ (bar graph) and $E_2$ (broken line) for 100 examples in case of $\alpha = 1$. Here we set $\varepsilon_{\min} = \varepsilon_{\max} = 10^{-11}$.

Fig. 2. $E_1$ (bar graph) and $E_2$ (broken line) for 100 examples in case of $\alpha = 10^{-8}$. Here we set $\varepsilon_{\min} = \varepsilon_{\max} = 10^{-5}$.

Fig. 3. $E_1$ (bar graph) and $E_2$ (broken line) for 100 examples in case of $\alpha = 10^{-8}$. Here we set $\varepsilon_{\min} = 10^{-18}$ and $\varepsilon_{\max} = 10^{-6}$.

Acknowledgments

We are grateful to the anonymous referee for helpful comments, which served to improve the quality of this paper. This work was partially supported by Grant-in-Aid for Scientific Research (C) No.19560058.

References

[1] V. N. Kublanovskaya, On a method of solving the complete eigenvalue problem for a degenerate matrix, USSR Comput. Math. Math. Phys., 6 (1966) 1–14.

[2] G. H. Golub and J. H. Wilkinson, Ill-conditioned eigensystems and the computation of the Jordan canonical form, SIAM Rev., 18 (1976) 578–619.

[3] B. Kagstrom and A. Ruhe, An algorithm for numerical computation of the Jordan normal form of a complex matrix, ACM Trans. Math. Software, 6 (1980) 398–419.

[4] D. S. Watkins, The Matrix Eigenvalue Problem, SIAM, Philadelphia, 2007.

[5] G. Ohtake, M. Koga and M. Sampei, A method for numerical computation of Jordan canonical form of matrix (in Japanese), Trans. ISCIE, 15 (2002) 320–326.

[6] T. Suzuki and T. Suzuki, An eigenvalue problem for derogatory matrices, J. Comput. Appl. Math., 199 (2007) 245–250.

[7] T. Suzuki and T. Suzuki, Computing the Jordan canonical form in the finite precision arithmetic, in: Proc. of SCAN 2006, p.39, 2006.

[8] J. I. Hall, Another elementary approach to the Jordan form, Amer. Math. Monthly, 98 (1991) 336–340.

[9] A. Bjorck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.

[10] G. W. Stewart, Matrix Algorithms, Vol. II: Eigensystems, SIAM, Philadelphia, 2001.



On the pass rate of NIST statistical test suite for randomness

Akihiro Yamaguchi1, Takaaki Seo1 and Keisuke Yoshikawa1

1 Department of Information and Systems Engineering, Fukuoka Institute of Technology, Wajiro 3-30-1, Higashi-ku, Fukuoka 811-0295, Japan

E-mail aki fit.ac.jp

Received March 31, 2010, Accepted September 6, 2010

Abstract

In this paper, the pass rate of the NIST SP800-22 statistical test suite for ideally true random sequences is analyzed by simulation of the statistical tests, and derived by theoretical analysis under the assumption that there is no correlation among tests. As examples of chaos-based systems, Vector Stream Cipher (VSC128S) and the encryption system using Arnold's cat map are tested. The test results are compared with the theoretical ones for true random sequences, and the validity of the presented analysis is discussed.

Keywords random number, chaos, randomness, statistical test, NIST SP800-22

Research Activity Group Applied Chaos

1. Introduction

As an application of chaos, random number generators and pseudo random number generators based on chaotic dynamics have been well investigated in various scientific and engineering fields including information security. For correct and safe application, the randomness of generated sequences and its evaluation are very important. In order to evaluate the randomness, several statistical test suites have been proposed. NIST SP800-22 is one of the statistical test suites, and it was used for the evaluation of AES candidates [1, 2]. For chaos-based random and pseudo random number generators, this test suite is also useful to evaluate the randomness of generated sequences.

NIST SP800-22 consists of 15 kinds of statistical tests and provides the criteria to determine whether given sequences are random or not for each statistical test. However, the criteria do not state what the rate of passing all 15 kinds of tests should be for the target generator to be regarded as a perfect random number generator. In this paper, we focus on the test suite NIST SP800-22 and numerically analyze its statistical properties for the idealized perfect random number generator. The results are compared with typical pseudo random number generators including chaos-based systems.

2. Statistical test of randomness

A random bit sequence should be independent and unpredictable. These are also characteristics of chaotic dynamics. Therefore, chaotic dynamics is one of the candidates for the basic mechanism to produce random bit sequences. Since randomness is a probabilistic property, it can be characterized and described in terms of probability.

There are various statistical tests that can be applied to a sequence to compare it with a true random sequence. The Special Publication (SP) 800-22 revision 1 proposed by the National Institute of Standards and Technology (NIST) is a statistical test suite that consists of 15 kinds of statistical tests of randomness [1]. The list of the 15 tests and the test parameters are shown in Tables 1 and 2.

According to [1], the basic testing process common to each test is as follows. The targets of a test are binary sequences of '0' and '1' with length $n$. Here, the number of tested sequences is $m$, and it is called the sample size. In a test of one sequence, a statistic called the P-value is calculated from the tested sequence. The P-value is the probability that a perfect random number generator would have produced a sequence less random than the tested sequence. Whether the tested sequence is random or not is determined by hypothesis testing. The null hypothesis $H_0$ under test is that the sequence being tested is random, and the alternative hypothesis $H_1$ is that the sequence being tested is not random. If P-value $\ge \alpha$, the null hypothesis is accepted (i.e., the target is random), and otherwise rejected (i.e., the target is not random), where $\alpha = 0.01$ is the significance level of the hypothesis test.

For one test, $m$ sequences are tested and $m$ P-values are obtained. Thus, $m$ decisions of randomness are obtained for each test. These are only individual decisions for each sequence. As a holistic interpretation of these results, NIST adopted the following two conditions to determine whether the target sequences are holistically random or not.

JSIAM Letters Vol. 2 (2010) pp.123–126 Akihiro Yamaguchi et al.

Table 1. List of NIST SP800-22 statistical tests.

| No. | Test Name |
| 1 | The Frequency (Monobit) Test |
| 2 | Frequency Test within a Block |
| 3 | The Runs Test |
| 4 | Tests for the Longest-Run-of-Ones in a Block |
| 5 | The Binary Matrix Rank Test |
| 6 | The Discrete Fourier Transform (Spectral) Test |
| 7 | The Non-overlapping Template Matching Test |
| 8 | The Overlapping Template Matching Test |
| 9 | Maurer's "Universal Statistical" Test |
| 10 | The Linear Complexity Test |
| 11 | The Serial Test |
| 12 | The Approximate Entropy Test |
| 13 | The Cumulative Sums (Cusums) Test |
| 14 | The Random Excursions Test |
| 15 | The Random Excursions Variant Test |

Table 2. Parameters used for NIST SP800-22 test suite.

| Test Name | Block Length |
| Frequency Test within a Block | 20,000 |
| The Non-overlapping Template Matching Test | 9 |
| The Overlapping Template Matching Test | 9 |
| Maurer's "Universal Statistical" Test | 7 |
| The Linear Complexity Test | 500 |
| The Serial Test | 10 |
| The Approximate Entropy Test | 10 |

Table 3. Counts of passing each test and of passing all 15 tests over 100 iterations (m = 1,000). The last four columns give the count of passing tests out of 100 repetitions.

| Test No. | Number of sub-tests | VSC128S | CatMap2D | AES | SHA-1 |
| 1 | 1 | 100 | 100 | 100 | 100 |
| 2 | 1 | 100 | 100 | 100 | 100 |
| 3 | 1 | 100 | 100 | 99 | 100 |
| 4 | 1 | 100 | 99 | 100 | 99 |
| 5 | 1 | 100 | 99 | 100 | 100 |
| 6 | 1 | 100 | 95 | 96 | 97 |
| 7 | 148 | 66 | 64 | 54 | 53 |
| 8 | 1 | 98 | 99 | 98 | 98 |
| 9 | 1 | 98 | 97 | 95 | 97 |
| 10 | 1 | 100 | 99 | 99 | 100 |
| 11 | 2 | 99 | 100 | 99 | 100 |
| 12 | 1 | 99 | 100 | 100 | 99 |
| 13 | 2 | 100 | 100 | 100 | 99 |
| 14 | 8 | 96 | 95 | 96 | 91 |
| 15 | 18 | 95 | 95 | 96 | 98 |
| All (1-15) | 188 (total) | 56 | 55 | 43 | 41 |

Condition 1 Let $\xi$ be the proportion of accepted sequences for the test; it is called the pass rate of sequences. If $\xi$ is in the acceptable interval

$$p - 3\sqrt{\frac{p(1-p)}{m}} \le \xi \le p + 3\sqrt{\frac{p(1-p)}{m}}, \tag{1}$$

the sequences are random, otherwise not random, where $p = 1 - \alpha$.

This interval corresponds to three times the standard deviation of $\xi$ for the true random sequences that are produced by the perfect random number generator.
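As a worked instance of the acceptable interval (1), take the values used in this paper, $\alpha = 0.01$ (so $p = 0.99$) and $m = 1{,}000$. The arithmetic below is an added illustration, not part of the original text:

$$3\sqrt{\frac{p(1-p)}{m}} = 3\sqrt{\frac{0.99 \times 0.01}{1000}} \approx 0.00944, \qquad 0.98056 \le \xi \le 0.99944.$$

Since $\xi$ is a multiple of $1/m$, this means that between 981 and 999 of the 1,000 sequences must be accepted. In particular, $\xi = 1$ (all sequences accepted) lies outside the interval.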

Condition 2 The distribution of P-values is examined to ensure uniformity. If the obtained P-values are uniform, the sequences are random, otherwise not random.

Uniformity is determined by the $\chi^2$-test on the obtained P-values. The interval between 0 and 1 is divided into 10 sub-intervals, and the P-values that lie within each $i$-th sub-interval are counted as $F_i$. Then a P-value is calculated as

$$\text{P-value}_T = \mathrm{igamc}\left(\frac{9}{2}, \frac{\chi^2}{2}\right), \tag{2}$$

where igamc is the incomplete gamma function and

$$\chi^2 = \sum_{i=1}^{10} \frac{\left(F_i - \dfrac{m}{10}\right)^2}{\dfrac{m}{10}}. \tag{3}$$

If the condition

$$\text{P-value}_T \ge \alpha_T = 0.0001 \tag{4}$$

is satisfied, then the P-values can be considered to be uniformly distributed.

Results of applying the NIST statistical test suite are shown in Table 3. Here, Vector Stream Cipher (VSC128S) [3] and the encryption system using the two-dimensional cat map (CatMap2D) [4] are tested as examples of chaos-based generators. VSC128S is a stream cipher designed by ChaosWare, and it consists of a combination of 8 pseudo-chaotic one-dimensional maps with 32-bit operations. CatMap2D is also a stream cipher, and it adopts 4 combined two-dimensional Arnold's cat maps instead of the 8 one-dimensional maps in VSC128S. For comparison, SHA-1 and AES are also tested. The former is a hash function, and hash results of plaintexts are tested as a pseudo random sequence. The latter is a typical standard encryption system, and the encrypted texts are tested.

For each system, the NIST statistical test suite is applied 100 times for m = 1,000 binary sequences with length n = 1,000,000, and the number of passing each kind of test and the number of passing all of the 15 tests are counted. These tests are performed by the test suite program version 2.0b provided by NIST with corrections of the assessment condition and the discrete Fourier transform test. In the original program (version 2.0b), the former condition, which corresponds to (1), was described only for the case m = 100. The latter test was corrected according to Kim et al. [5] since the variance of the theoretical distribution was not yet corrected in NIST SP800-22 revision 1.

As a result, the rate of passing all 15 tests is around 55% for VSC128S and CatMap2D, and around 40% for SHA-1 and AES. It should be noted that there are obvious differences between their pass rates for all of the tests.

3. Simulation of NIST SP800-22

NIST SP800-22 is based on hypothesis testing using P-values. Since the P-values are uniformly distributed in the interval between 0 and 1 for ideally true random sequences, the statistical tests can be simulated by the Monte Carlo method, in which uniformly distributed artificial P-values are produced randomly. The procedure of the simulation of the statistical test suite is as follows.

Step 1 An artificial P-value that is uniformly distributed in the interval between 0 and 1 is produced randomly. This P-value corresponds to the P-value calculated for an ideally true random sequence.

Step 2 The condition P-value $\ge \alpha = 0.01$ is examined for the produced P-value.

Step 3 Steps 1 and 2 are iterated $m$ times, and the pass rate $\xi$ and the P-value$_T$ for one test are calculated. Conditions 1 and 2 are examined, and pass or non-pass of one test is determined.

Step 4 Steps 1 to 3 are iterated $K$ times, where $K$ corresponds to the number of statistical tests (including sub-tests). Then, pass or non-pass of all of the $K$ tests is determined.

Some of the statistical tests constituting NIST SP800-22 have several sub-tests. Therefore, the actual number of tests is the total number of sub-tests. For example, the non-overlapping template matching test consists of 148 sub-tests corresponding to different templates when the template length is 9. The number of sub-tests for each test is also shown in Table 3, and the total number $K$ of tests is 188 for the parameters in Table 2.

As results of this simulation, the pass rates $P_{C1}$, $P_{C2}$, $P_{Pass1}$, and $P_{PassK}$ can be obtained, where $P_{C1}$ denotes the pass rate of Condition 1, $P_{C2}$ denotes the pass rate of Condition 2, $P_{Pass1}$ denotes the pass rate of one test, and $P_{PassK}$ denotes the pass rate of all of the $K$ tests. Furthermore, the correlation coefficient between Conditions 1 and 2, denoted by $\rho_{C1,C2}$, can also be estimated.
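Steps 1 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' program: the function names are hypothetical, the standard-library generator `random.Random` stands in for the source of artificial P-values, and igamc in (2) is taken to be the upper regularized incomplete gamma function, evaluated here through the half-integer recurrence $Q(a+1, x) = Q(a, x) + x^a e^{-x}/\Gamma(a+1)$ starting from $Q(1/2, x) = \mathrm{erfc}(\sqrt{x})$.

```python
import math
import random

ALPHA = 0.01      # significance level for a single sequence
ALPHA_T = 0.0001  # significance level of the uniformity check (Condition 2)


def igamc_9_2(x):
    """Upper regularized incomplete gamma function Q(9/2, x)."""
    if x <= 0.0:
        return 1.0
    q = math.erfc(math.sqrt(x))  # Q(1/2, x)
    for j in range(4):           # raise a from 1/2 to 9/2 in four steps
        a = j + 0.5
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
    return q


def simulate_one_test(m, rng):
    """Steps 1-3: return (Condition 1 passed, Condition 2 passed)."""
    p_values = [rng.random() for _ in range(m)]        # Step 1
    xi = sum(pv >= ALPHA for pv in p_values) / m       # Step 2: pass rate
    p = 1.0 - ALPHA
    half_width = 3.0 * math.sqrt(p * (1.0 - p) / m)
    cond1 = (p - half_width) <= xi <= (p + half_width)
    counts = [0] * 10                                  # ten sub-intervals
    for pv in p_values:
        counts[min(int(pv * 10), 9)] += 1
    expected = m / 10.0
    chi2 = sum((f - expected) ** 2 / expected for f in counts)
    cond2 = igamc_9_2(chi2 / 2.0) >= ALPHA_T
    return cond1, cond2


def simulate_suite(m, n_tests, rng):
    """Step 4: True if all n_tests simulated tests satisfy both conditions."""
    return all(all(simulate_one_test(m, rng)) for _ in range(n_tests))
```

Averaging `simulate_suite(1000, 188, rng)` over many repetitions with a seeded `rng = random.Random(...)` gives an estimate of $P_{PassK}$, while averaging the two flags returned by `simulate_one_test` estimates $P_{C1}$ and $P_{C2}$ separately.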

4. Probability of passing all tests

The probability of passing all tests was first analyzed by Okutomi et al. for Condition 1 [6]. Their analysis was, however, insufficient to obtain the probability of passing all tests, since the upper limit of the acceptable interval for Condition 1 (Eq. (1)) was ignored and Condition 2 was not considered. For a more precise analysis, this paper focuses on the probability of passing all tests for both Conditions 1 and 2.

For one test, pass or non-pass is determined by Conditions 1 and 2, as previously mentioned in Section 2. Since the latter is also a hypothesis test on the obtained P-values, the pass rate $P_{C2}$ of Condition 2 is determined by its significance level $\alpha_T$ such that $P_{C2} = 1 - \alpha_T = 0.9999$.

On the other hand, the pass rate $P_{C1}$ of Condition 1 is determined by the distribution of $\xi$, the proportion of accepted sequences. The probability that one true random sequence passes the test is $p = 1 - \alpha$ by the definition of the hypothesis test. Since the number $k$ of true random sequences out of $m$ that pass the test obeys the binomial distribution, its probability is obtained as

$$P(k; m) = {}_m C_k \times p^k \times (1-p)^{m-k}, \tag{5}$$

where $k/m$ corresponds to $\xi$. The average of $\xi$ is

$$\mu_\xi = p, \tag{6}$$

and the standard deviation of $\xi$ is

$$\sigma_\xi = \sqrt{\frac{p(1-p)}{m}}. \tag{7}$$

Therefore, the pass rate of Condition 1 is exactly obtained as

$$P_{C1} = \sum_{k=k_0}^{k_1} P(k; m), \tag{8}$$

where

$$k_0 = \max(\lceil (\mu_\xi - 3\sigma_\xi) \times m \rceil, 0), \tag{9}$$

and

$$k_1 = \min(\lfloor (\mu_\xi + 3\sigma_\xi) \times m \rfloor, m). \tag{10}$$

Table 4. Theoretical values of pass rates obtained by the binomial distribution and the Gaussian distribution (K = 188).

(a) Binomial distribution.

| m | $P_{C1}$ | $P_{Pass1}$ | $P_{PassK} = (P_{Pass1})^K$ |
| 100 | 98.162596% | 98.152780% | 3.003929% |
| 500 | 99.479195% | 99.469247% | 36.770592% |
| 1000 | 99.666846% | 99.656880% | 52.404673% |
| 10000 | 99.689933% | 99.679964% | 54.736967% |

(b) Gaussian distribution.

| m | $P_{C1}$ | $P_{Pass1}$ | $P_{PassK} = (P_{Pass1})^K$ |
| - | 99.730020% | 99.720047% | 59.034450% |

Table 5. Estimated pass rates and correlation coefficient between Conditions 1 and 2 (K = 188).

| m | $P_{C1}$ | $P_{C2}$ | $P_{Pass1}$ | $\rho_{C1,C2}$ | $P_{PassK}$ |
| 100 | 98.17% | 99.992% | 98.16% | −0.0012 | 3.20±0.18% |
| 500 | 99.43% | 99.990% | 99.42% | −0.0008 | 35.81±0.48% |
| 1000 | 99.66% | 99.991% | 99.65% | −0.0006 | 52.08±0.50% |
| 10000 | 99.67% | 99.993% | 99.66% | −0.0005 | 54.78±0.50% |

Here, the range $k_0 \le k \le k_1$ corresponds to the range of acceptance in (1). Since this range corresponds to three times the standard deviation $\sigma_\xi$, the Gaussian approximation of $P_{C1}$ is also obtained by the error function as $\mathrm{erf}(3/\sqrt{2})$. In the analysis given by Okutomi et al. [6], the upper limit $k_1$ was fixed to $m$. Therefore, in the case of m = 1,000, their obtained probability $P_{C1} = 0.996712$ is slightly larger than the correct probability $P_{C1} = 0.99666846$ (Table 4-(a)).

The probability of passing one test, $P_{Pass1}$, is the probability that both Conditions 1 and 2 are simultaneously satisfied. If Conditions 1 and 2 have no correlation, $P_{Pass1}$ is obtained as the product

$$P_{Pass1} = P_{C1} \times P_{C2}. \tag{11}$$

Furthermore, if the results for the $K$ tests in the test suite are not correlated with each other, the pass rate of all of the $K$ tests, $P_{PassK}$, is also obtained as

$$P_{PassK} = \prod_{k=1}^{K} P_{Pass1} = (P_{Pass1})^K. \tag{12}$$
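The theoretical pass rates in (8), (11) and (12) are straightforward to evaluate numerically. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and evaluating the binomial probabilities in log space through `math.lgamma` is an implementation choice made here to avoid overflow for large $m$, not something stated in the paper.

```python
import math


def binomial_pmf(k, m, p):
    """P(k; m) of Eq. (5), computed in log space."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
               + k * math.log(p) + (m - k) * math.log1p(-p))
    return math.exp(log_pmf)


def pass_rate_condition1(m, alpha=0.01):
    """P_C1 of Eq. (8), with the limits k0 and k1 of Eqs. (9) and (10)."""
    p = 1.0 - alpha
    sigma = math.sqrt(p * (1.0 - p) / m)
    k0 = max(math.ceil((p - 3.0 * sigma) * m), 0)
    k1 = min(math.floor((p + 3.0 * sigma) * m), m)
    return sum(binomial_pmf(k, m, p) for k in range(k0, k1 + 1))


def pass_rate_all_tests(m, n_tests=188, alpha=0.01, alpha_t=0.0001):
    """P_PassK of Eq. (12), assuming independence as in Eqs. (11) and (12)."""
    p_pass1 = pass_rate_condition1(m, alpha) * (1.0 - alpha_t)
    return p_pass1 ** n_tests


# Gaussian approximation of P_C1 (three standard deviations).
gaussian_pc1 = math.erf(3.0 / math.sqrt(2.0))
```

Calling `pass_rate_condition1(1000)` and `pass_rate_all_tests(1000)` should reproduce the m = 1000 row of Table 4-(a) to the printed precision, and `gaussian_pc1` corresponds to Table 4-(b).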

5. Numerical results and discussions

Results of numerical simulation are shown in Table 5, where K = 188; $P_{C1}$, $P_{C2}$, $P_{Pass1}$ and $\rho_{C1,C2}$ were estimated by 100,000 simulations of Steps 1 to 3, and $P_{PassK}$ was estimated by 10,000 simulations of Steps 1 to 4. The standard error of $P_{PassK}$ is also shown as an error range of the estimation. In the case that m = 1,000, the estimated pass rate $P_{PassK}$ is around 52%, and this value is close to the pass rate for VSC128S and CatMap2D shown in Table 3. Since the correlation coefficient $\rho_{C1,C2}$ is very small, the pass rate of Condition 1 and the pass rate of Condition 2 are almost independent of each other.

The distributions of $\xi$ were also obtained in the cases of m = 1,000 and m = 10,000 to examine convergence to the Gaussian distribution. The results are shown in Fig. 1, where the red squares represent the distribution of $\xi$, the green solid lines represent the Gaussian distribution, and the blue crosses represent the binomial distribution. For the Gaussian distribution and the binomial distribution, their averages and variances are $\mu_\xi$ and $\sigma_\xi^2$, respectively. These results indicate that the distribution of $\xi$ obeys the binomial distribution, and it approaches the Gaussian distribution asymptotically as the sample size $m$ is increased.

Fig. 1. Distribution of pass rate $\xi$ for 200,000 samples: (a) m = 1,000; (b) m = 10,000. Each panel plots the frequency of $\xi$ (logarithmic scale) against the pass rate $\xi$, together with the Gaussian and binomial distributions.

The pass rates $P_{C1}$, $P_{Pass1}$ and $P_{PassK}$ directly calculated by (8), (11), (12) and the Gaussian approximation are also shown in Table 4. The pass rate $P_{C2}$ is 0.9999 as previously mentioned. The theoretical values given by the binomial distribution agree with the estimated values shown in Table 5, and they converge to the values given by the Gaussian distribution as the sample size $m$ increases.

As shown in Table 4-(a), the pass rate of all of the $K$ tests, $P_{PassK}$, is almost 52% for true random sequences in the case that K = 188 and the sample size m = 1000. Furthermore, its limit as $m$ tends to infinity is almost 59% (Table 4-(b)), as given by the Gaussian approximation, since the binomial distribution converges to the Gaussian distribution for sufficiently large $m$. These results indicate that a sufficient number of repetitions of the test suite is necessary to examine the proportion of passing all of the tests.

In this paper, two kinds of independence are assumed for the analysis of pass rates. One is the independence between Conditions 1 and 2 in one test, and the other is the independence among the tests constituting the test suite. The former is supported by the results for the correlation coefficients shown in Table 5. Although the plausibility of the latter cannot be assessed here, if there are some positive correlations among the actual tests, the pass rate of all of the tests $P_{PassK}$ is expected to take a larger value than (12). Even in such a case, (12) and its values shown in Table 4 might be a useful guideline for deciding whether a generator is a perfect random number generator, since they give the lower bound of $P_{PassK}$.
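As a check on the error ranges reported for $P_{PassK}$ in Table 5, one may assume (this is not stated explicitly in the text) that the standard error is the usual binomial one for an estimated proportion $\hat{P}$ over the 10,000 repetitions of Steps 1 to 4. For the m = 1,000 row this gives

$$\mathrm{SE} = \sqrt{\frac{\hat{P}(1-\hat{P})}{10000}} = \sqrt{\frac{0.5208 \times 0.4792}{10000}} \approx 0.0050,$$

which is consistent with the reported 52.08±0.50%. The same formula with $\hat{P} = 0.0320$ gives about 0.0018, in line with the 3.20±0.18% entry for m = 100.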

6. Conclusions

The pass rate of the NIST SP800-22 statistical test suite for the ideally true random sequences was analyzed under the assumption that there are no correlations among tests. The obtained pass rate was close to the results for actual pseudo random number generators based on chaos. An analysis including the correlation among tests is one of the future works.

Acknowledgments

One of the authors (A. Y.) wishes to express his gratitude to Dr. K. Umeno, Mr. H. Terai and Dr. S. J. Kim for their interesting and fruitful discussions. Authors (T. S. and K. Y.) had joined this research when they were students of Fukuoka Institute of Technology. Their current affiliations are Miyazaki Jyoho Center (T. S.) and Techno Systems Co. (K. Y.).

References

[1] A. Rukhin et al., A statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST Special Publication 800-22 Revision 1, 2008.

[2] J. Soto and L. Bassham, Randomness testing of the advanced encryption standard finalist candidates, NIST IR 6483, http://csrc.nist.gov/publications/nistir/ir6483.pdf, 2000.

[3] K. Umeno, Specification of VSC128S (in Japanese), http://www.chaosware.com/vsc128s.pdf, 2004.

[4] T. Araki and A. Yamaguchi, Statistical analysis of the VSC encryption system using two dimensional cat map (in Japanese), in: Proc. of 55th NCTAM, pp.185–186, 2006.

[5] S. J. Kim, K. Umeno and A. Hasegawa, On the NIST statistical test suite for randomness, IEICE Tech. Rep., 103(499) (2003), 21–27.

[6] H. Okutomi and K. Nakamura, A study on rational judgement method of randomness property using NIST randomness test (NIST SP. 800-22) (in Japanese), IEICE Trans. A, J93-A (2010), 11–22.



Parallel stochastic estimation method of eigenvalue distribution

Yasunori Futamura1, Hiroto Tadano1 and Tetsuya Sakurai1

1 Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8573, Japan

E-mail futamura mma.cs.tsukuba.ac.jp

Received July 29, 2010, Accepted October 22, 2010

Abstract

Some kinds of eigensolver for large sparse matrices require specification of parameters that are based on rough estimates of the desired eigenvalues. In this paper, we propose a stochastic estimation method of eigenvalue distribution using the combination of a stochastic estimator of the matrix trace and contour integrations. The proposed method can be easily parallelized and applied to matrices for which factorization is infeasible. Numerical experiments are executed to show that the method can perform rough estimates at a low computational cost.

Keywords eigenproblem, contour integration, stochastic estimation


1. Introduction

Interior eigenvalue problems arise in many kinds of scientific calculation, and they are the most time-consuming part of these calculations. To solve these eigenvalue problems, the Arnoldi method with the shift-invert technique [1], the Jacobi-Davidson method [1], and the Sakurai-Sugiura method [2,3] are reasonable choices. These methods require the specification of their parameters, such as shift points, the number of basis vectors, or closed curves on the complex plane. One could specify these parameters effectively if one had a rough estimate of the eigenvalue distribution. To estimate this distribution, some methods have been proposed, including the method using Sylvester's law of inertia and the algebraic substructure method [4]. Both methods require a matrix factorization, such as the $LDL^{\mathrm{T}}$ factorization. However, it is not feasible to apply these methods to large sparse matrices or matrices that are only referenced in the form of matrix-vector multiplications. In this paper, we propose a stochastic estimation method of the eigenvalue distribution that is based on a stochastic estimator of the matrix trace. We evaluate the performance of the proposed method by applying it to matrices from practical applications.

This paper is organized as follows. In Section 2, a stochastic estimator of an eigenvalue distribution and its parallelization are described. We show a simple implementation of our method in Section 3. In Section 4, we investigate the performance of our method through numerical experiments with four matrices from Matrix Market [5] and a matrix derived from a real-space density functional calculation. This is followed by the concluding remarks in Section 5.

2. A stochastic estimator of eigenvalue distribution

Let $A, B \in \mathbb{C}^{n \times n}$, $z \in \mathbb{C}$ be such that $(zB - A)$ is a regular matrix pencil. It is known that matrices $A$, $B$ can be decomposed as $A = URV^{\mathrm{H}}$, $B = UTV^{\mathrm{H}}$, where $R$, $T$ are upper triangular matrices whose diagonal elements are $r_{jj}$, $t_{jj}$, respectively, and $U$, $V$ are unitary matrices. Since

$$(zB - A)^{-1}B = V(zT - R)^{-1}TV^{\mathrm{H}},$$

and the matrix trace is similarity-invariant,

$$\mathrm{tr}((zB - A)^{-1}B) = \mathrm{tr}((zT - R)^{-1}T) = \sum_{j=1}^{n} \frac{t_{jj}}{z t_{jj} - r_{jj}} = \sum_{j=1}^{n'} \frac{1}{z - \lambda_j}, \tag{1}$$

where

$$t_{jj} \begin{cases} \ne 0 & (1 \le j \le n'), \\ = 0 & (n' + 1 \le j \le n), \end{cases}$$

and $\lambda_j = r_{jj}/t_{jj}$ $(j = 1, 2, \ldots, n')$ are finite eigenvalues of the matrix pencil $(A, B)$.

When the contour integration

$$\mu = \frac{1}{2\pi \mathrm{i}} \oint_\Gamma \mathrm{tr}((zB - A)^{-1}B)\,dz = \frac{1}{2\pi \mathrm{i}} \oint_\Gamma \sum_{j=1}^{n'} \frac{1}{z - \lambda_j}\,dz \tag{2}$$

is performed, the eigenvalue count $\mu$ in a positively oriented Jordan curve $\Gamma$ is derived by the residue theorem. To discretize (2), an $N$-point quadrature rule is applied

JSIAM Letters Vol. 2 (2010) pp.127–130 Yasunori Futamura et al.

and we approximate $\mu$ by

$$\mu \approx \hat{\mu} = \sum_{k=0}^{N-1} w_k\, \mathrm{tr}((z_k B - A)^{-1}B), \tag{3}$$

where $z_k$ and $w_k$ are a quadrature point and a weight, respectively. In the case of the trapezoidal rule on a circle with a center $\gamma$ and a radius $\rho$, quadrature points and weights are defined by

$$z_k = \gamma + \rho e^{\frac{2\pi \mathrm{i}}{N}(k + \frac{1}{2})}, \quad k = 0, 1, \ldots, N - 1,$$

and

$$w_k = \frac{z_k - \gamma}{N}, \quad k = 0, 1, \ldots, N - 1,$$

respectively, where $\mathrm{i}$ is the imaginary unit. According to [3], when the contour path is a circle, (3) is written as

$$\hat{\mu} = \sum_{j=1}^{n'} \frac{1}{1 + \left(\dfrac{\gamma - \lambda_j}{\rho}\right)^N}, \tag{4}$$

where $|(\gamma - \lambda_1)/\rho| \le |(\gamma - \lambda_2)/\rho| \le \cdots \le |(\gamma - \lambda_{n'})/\rho|$. Let $m'$ be an integer such that $\rho/\{1 + [(\gamma - \lambda_j)/\rho]^N\} = O(\varepsilon)$ for any $j$ with $m' < j \le n'$ for sufficiently small $\varepsilon > 0$. Then (4) can be expressed as

$$\hat{\mu} = \sum_{j=1}^{m'} \frac{1}{1 + \left(\dfrac{\gamma - \lambda_j}{\rho}\right)^N} + O(\varepsilon). \tag{5}$$

Thus, the eigenvalues that exist nearby and outside of $\Gamma$ are attributed to quadrature error.

According to [6,7], an unbiased estimation of the matrix trace is given by

$$\mathrm{tr}((z_k B - A)^{-1}B) \approx \frac{1}{s} \sum_{j=1}^{s} v_j^{\mathrm{T}} (z_k B - A)^{-1} B v_j, \tag{6}$$

where $s$ is the number of sample vectors and $v_j$ are vectors whose entries take 1 or −1 with equal probability. Using (6), one can estimate the quadrature approximation $\hat{\mu}$ of (3) by the stochastic estimate $\tilde{\mu}$ as

$$\hat{\mu} \approx \tilde{\mu} = \frac{1}{s} \sum_{k=0}^{N-1} w_k \sum_{j=1}^{s} \left[ v_j^{\mathrm{T}} (z_k B - A)^{-1} B v_j \right]. \tag{7}$$

Thus, the most time-consuming part of the estimation of the trace of $(z_k B - A)^{-1}B$ is the solution of $s$ independent linear systems

$$(z_k B - A) x_j^k = B v_j, \quad j = 1, 2, \ldots, s, \quad k = 0, 1, \ldots, N - 1. \tag{8}$$

The subscript of $x_j^k$ refers to the sample vector $v_j$ and the superscript refers to the quadrature point $z_k$. If the matrices $A$ and $B$ are large sparse matrices or they are only referenced in the form of matrix-vector multiplications, an iterative method is a reasonable choice to solve these linear systems. Additionally, if $B$ is the identity matrix $I$, the linear systems (8) are written as $(z_k I - A) x_j^k = v_j$. In this case, the shifted Krylov subspace method [8, 9] can be applied to solve simultaneously the linear systems $(z_k I - A) x_j^k = v_j$ for the scalar parameters $z_k$. By using the shifted Krylov subspace method, the total number of matrix-vector multiplications in each iteration is reduced to $1/N$ of that of solving $N$ systems separately by the normal Krylov subspace method. When $A$ is a real symmetric matrix, $(z_k I - A)$ is a complex symmetric (but not Hermitian) matrix. The shifted conjugate orthogonal conjugate gradient (COCG) method [10,11] is a reasonable choice to solve linear systems of complex symmetric matrices. Furthermore, our method does not require the solution vectors $x_j^k$, but only the inner products $v_j^{\mathrm{T}} x_j^k$. In such a case, we can calculate these inner products by scalar recurrences (see [11]). Thus, memory allocation for the solution vectors and the auxiliary vectors of the shifted systems is not required.

Input: $A, B, \alpha, \beta, n_c, N, s$
Output: $\mu_1, \mu_2, \ldots, \mu_{n_c}$
Set $v_j$ whose elements take 1 or −1 with equal probability, for $j = 1, 2, \ldots, s$
$\rho = (\beta - \alpha)/(2 n_c)$
for $\ell = 1, 2, \ldots, n_c$ do
    $\gamma_\ell = \alpha + (2\ell - 1)\rho$
    $z_k^\ell = \gamma_\ell + \rho e^{(2\pi \mathrm{i}/N)(k + 1/2)}$
    Solve $(z_k^\ell B - A) x_j^{\ell k} = B v_j$ for $j = 1, 2, \ldots, s$, $k = 0, 1, \ldots, N - 1$
    $\mu_\ell = [\rho/(sN)] \sum_{k=0}^{N-1} e^{(2\pi \mathrm{i}/N)(k + 1/2)} \sum_{j=1}^{s} v_j^{\mathrm{T}} x_j^{\ell k}$
end for

Fig. 1. Algorithm.

A stochastic estimation method of the eigenvalue distribution is defined straightforwardly by the estimator of the eigenvalue count. Let $\Gamma$ be a given Jordan curve, $D$ the domain enclosed by $\Gamma$, and $\Gamma_\ell$ $(\ell = 1, 2, \ldots, n_c)$ a Jordan curve which encloses the sub-domain $D_\ell$ such that $D = D_1 + D_2 + \cdots + D_{n_c}$. It is easy to see that the estimations of the eigenvalue count in $\Gamma_\ell$ can be executed independently. Beneath this level of independence there is another: that of the solutions of the linear systems (8). Furthermore, the linear solver itself can be parallelized where possible. Thus, our method is efficient on modern massively parallel computing environments.

3. Implementation

In this section, we describe a simple implementation of our method in which A is a Hermitian matrix and B is a non-singular Hermitian matrix. The algorithm of the implementation is shown in Fig. 1. For simplicity, we assume the Jordan curves are circles. This algorithm estimates the eigenvalue distribution in the interval [α, β] on the real axis. The $n_c$ circles are placed so that each circle occupies an equally separated sub-interval. ρ is the radius of all circles and $\gamma_\ell$ is the center of the ℓth circle. $\mu_\ell$ is the estimated eigenvalue count in the ℓth circle. The same number of quadrature points N is set for each circle.
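The algorithm of Fig. 1 can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name `eigenvalue_counts` is hypothetical, a dense `numpy.linalg.solve` stands in for the shifted COCG solver, and the real part of the quadrature sum is taken as the count.

```python
import numpy as np

def eigenvalue_counts(A, B, alpha, beta, n_c, N, s, seed=0):
    """Estimate the eigenvalue count of (A, B) in n_c equal sub-intervals of [alpha, beta]."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    V = rng.choice([-1.0, 1.0], size=(n, s))        # step 3: sample vectors
    BV = B @ V
    rho = (beta - alpha) / (2 * n_c)                 # step 4: common radius of the circles
    phases = np.exp(2j * np.pi * (np.arange(N) + 0.5) / N)
    counts = np.empty(n_c)
    for l in range(1, n_c + 1):                      # step 5
        gamma = alpha + (2 * l - 1) * rho            # step 6: center of the l-th circle
        total = 0.0 + 0.0j
        for phase in phases:
            z = gamma + rho * phase                  # step 7: quadrature point
            X = np.linalg.solve(z * B - A, BV)       # step 8: dense solve (assumption)
            total += phase * np.sum(V * X)           # sum over j of v_j^T x_j, weighted as in step 9
        counts[l - 1] = (rho / (s * N) * total).real
    return counts

# Illustrative data: a diagonal A with known eigenvalues and B = I.
A = np.diag([0.5, 1.5, 1.6, 2.5, 7.0])
B = np.eye(5)
print(eigenvalue_counts(A, B, alpha=0.0, beta=3.0, n_c=3, N=16, s=4))
```

In this diagonal example the trace estimate carries no stochastic error, so the printed counts differ from the true counts (1, 2, 1) only by the quadrature error.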



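Table 1. Matrix properties.

| Matrix pencil | Size | nnz(A) | nnz(B) | Type(A) | Type(B) | Center | Radius | #eig in Γ |
|---|---|---|---|---|---|---|---|---|
| LUND | 147 | 1298 | 1294 | Indefinite | Indefinite | $1.0 \times 10^{4}$ | $1.0 \times 10^{4}$ | 40 |
| BCSST07 | 420 | 4140 | 3836 | Positive definite | Positive semi-definite | 0.23 | 0.17 | 398 |
| PLAT1919 | 1919 | 17159 | - | Indefinite | - | $2.0 \times 10^{7}$ | $2.5 \times 10^{7}$ | 40 |
| BCSST13 | 2003 | 42943 | 11973 | Positive definite | Positive semi-definite | $3.0 \times 10^{3}$ | $2.0 \times 10^{3}$ | 11 |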

4. Numerical examples


In this section, we perform numerical experiments to evaluate the efficiency of our method by using the algorithm shown in Fig. 1. Examples 1 and 2 are carried out using Matlab 7.4, and Example 3 is carried out using PGI Fortran 90. All operations are done in double precision arithmetic.

4.1 Example 1

In Example 1, we investigate how the eigenvalue count changes as the number of quadrature points N increases. We evaluate the effect of the numerical integration (3) on the eigenvalue count without trace estimation. The exact value of the matrix trace is calculated using the relation described in (1). The eigenvalues $\lambda_j$ are obtained by the Matlab function eig. The test problems were taken from Matrix Market; their properties are shown in Table 1. All eigenvalue problems are those of real symmetric matrices. We set $n_c = 1$ for the algorithm. Columns nnz(A) and nnz(B) show the number of non-zero entries of matrices A and B, respectively. Columns Type(A) and Type(B) show the properties of A and B. Columns Center and Radius show the center and radius of the circles, respectively. The column #eig in Γ shows the number of eigenvalues in Γ. The number of eigenvalues is calculated by using the results of eig. The number of quadrature points N is set to be 4, 8, 16, 32, and 64. The results of this example are shown in Table 2. All results converge to the exact values.

4.2 Example 2

In Example 2, we investigate how the eigenvalue count changes as the number of sample vectors s increases. The test matrices used are the same as those in Example 1, s is varied from 10 to 1000, $n_c$ is set to 1, and the linear systems are solved using the Matlab function mldivide. The number of quadrature points is set to N = 16. The elements of the sample vectors are given by the Matlab function rand, and their random seed is set by rand('twister',5489). The results of this example are shown in Table 3. We consider the exact eigenvalue count µ to be that shown for the N = 16 case in Table 2. Increasing s does not much affect the efficiency or accuracy of the eigenvalue count, even though it increases the computational cost. The trace estimation is slow in converging to the exact value because the convergence rate is $O(1/\sqrt{s})$. Similar results on trace estimations are shown in [6].

Table 2. Results of Example 1: eigenvalue count for each number of quadrature points N.

| N | LUND | BCSST07 | PLAT1919 | BCSST13 |
|---|---|---|---|---|
| 4 | 38.024 | 318.03 | 55.559 | 10.917 |
| 8 | 38.268 | 364.80 | 42.350 | 10.926 |
| 16 | 38.880 | 392.98 | 40.606 | 10.988 |
| 32 | 39.373 | 397.89 | 39.945 | 11.000 |
| 64 | 39.749 | 398.00 | 39.540 | 11.000 |
| exact | 40.000 | 398.00 | 40.000 | 11.000 |

Table 3. Results of Example 2: eigenvalue count for each number of sample vectors (#vectors).

| #vectors | LUND | BCSST07 | PLAT1919 | BCSST13 |
|---|---|---|---|---|
| 10 | 44.344 | 391.08 | 40.759 | 12.866 |
| 20 | 43.394 | 392.58 | 40.371 | 11.747 |
| 30 | 43.195 | 391.92 | 40.926 | 10.765 |
| 40 | 39.547 | 393.83 | 39.874 | 10.590 |
| 50 | 40.039 | 393.09 | 41.018 | 10.313 |
| 100 | 37.716 | 392.27 | 40.632 | 11.293 |
| 200 | 39.805 | 393.45 | 40.341 | 11.460 |
| 500 | 41.147 | 392.76 | 40.542 | 11.104 |
| 1000 | 39.874 | 392.53 | 40.731 | 11.229 |
| exact | 38.880 | 392.98 | 40.606 | 10.988 |

4.3 Example 3

In Example 3, the test matrix is derived from real-space density functional calculations [12,13]. It is a standard eigenvalue problem Ax = λx, where A is a real symmetric matrix and is only referenced in the form of matrix-vector multiplications. Thus, applying conventional approaches mentioned in Section 1 is not feasible in this case. In this problem, the $M_B$ smallest eigenvalues are desired, where $M_B$ is the total number of orbitals. The test matrix is derived from the density functional calculation of a 510-atom system of silicon. The matrix size is n = 175,616, and the smallest 1,020 eigenpairs are desired. The linear systems are solved by the shifted COCG method using stopping criterion $10^{-4}$. One hundred circles are placed in the interval [−0.230, 0.243]. The number of quadrature points of each circle is N = 8, and the number of sample vectors is s = 20. The results are shown in Fig. 2. The horizontal axis indicates the index of the circles, and the vertical axis indicates the eigenvalue count for the circle's sub-domain. The exact values are calculated by the conjugate gradient method for eigenvalue problems [13]. Although s is significantly smaller than the matrix size n, our method roughly estimates the eigenvalue count. Using only a few quadrature points and sample vectors, we obtained a rough eigenvalue distribution that can be used in setting parameters for an accurate eigensolver.

The computational cost of the conjugate gradient method for eigenvalue problems is $O(M_B^3)$ (see [13]). We confirmed in preliminary experiments that the number of iterations of the shifted COCG method is proportional to n. The cost of the matrix-vector multiplication is O(n) due to the sparsity of the matrix. Therefore, when s is set much less than n and the scalar recurrences are introduced to the shifted COCG method, the computational cost of our method is $O(n^2)$. Since n, the number of grid points, is set to be proportional to $M_B$, for example $n \approx 200 M_B$, the cost of our method is $O(M_B^2)$ with a large coefficient. When the number of atoms in the target system is large, our method can be employed as a preprocessing step for accurate eigensolvers, due to the lower order of computational cost and the high parallel performance.

Fig. 2. Eigenvalue distribution of a 510-atom system of silicon (horizontal axis: index of the circle; vertical axis: eigenvalue count; legend: Estimation, Exact).

5. Conclusions

We propose a stochastic estimation method of eigenvalue counting within a given closed curve. Our method is feasible for large sparse matrices or matrices that are only referenced in the form of matrix-vector multiplication. The stochastic estimation method for the eigenvalue distribution is defined by separating the given domain into several sub-domains and estimating the eigenvalue count in each sub-domain. Furthermore, because the computations of our method are independent, it is easy to execute on massively parallel computing environments. An acceleration technique is introduced for standard eigenvalue problems by using the shifted Krylov subspace method. We show using numerical examples that our method roughly estimates the eigenvalue distribution using only a few quadrature points and sample vectors. The parameters of eigensolvers can be effectively set by using knowledge of the eigenvalue distribution, and this distribution need not be accurate, but does need to be computed at low cost. Our method is effective in such situations.

Acknowledgments

The authors would like to thank Jun-Ichi Iwata for helpful comments and suggestions on density functional calculations. This research was supported in part by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan, Grant Nos. 21246018 and 21105502.

References

[1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst eds., Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, 2000.

[2] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, J. Comput. Appl. Math., 159 (2003), 119–128.

[3] T. Sakurai, J. Asakura, H. Tadano and T. Ikegami, Error analysis for a matrix pencil of Hankel matrices with perturbed complex moments, JSIAM Letters, 1 (2009), 76–79.

[4] K. Senzaki, H. Tadano, T. Sakurai and Z. Bai, A method for estimating the distribution of eigenvalues using the AS method, in: Proc. of Recent Advances in Numerical Methods for Eigenvalue Problems (RANMEP2008), to appear.

[5] Matrix Market, http://math.nist.gov/MatrixMarket.

[6] Z. Bai, M. Fahey and G. Golub, Some Large Scale Matrix Computation Problems, J. Comput. Appl. Math., 74 (1996), 71–89.

[7] M. F. Hutchinson, A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines, Commun. Stat., Simulation Comput., 19 (1990), 433–450.

[8] B. Jegerlehner, Krylov space solvers for shifted linear systems, arXiv:hep-lat/9612014v1, 1996.

[9] R. Freund, Solution of shifted linear systems by quasi-minimal residual iterations, in: Numerical Linear Algebra, L. Reichel, A. Ruttan and R. S. Varga eds., W. de Gruyter, Berlin, pp. 101–121, 1993.

[10] R. Takayama, T. Hoshi, T. Sogabe, S.-L. Zhang and T. Fujiwara, Linear algebraic calculation of the Green's function for large-scale electronic structure theory, Phys. Rev. B, 73 (2006), 165108.

[11] S. Yamamoto, T. Sogabe, T. Hoshi, S.-L. Zhang and T. Fujiwara, Shifted conjugate-orthogonal-conjugate-gradient method and its application to double orbital extended Hubbard model, J. Phys. Soc. Jpn, 77 (2008), 114713.

[12] J. R. Chelikowsky, N. Troullier, K. Wu and Y. Saad, Higher order finite difference pseudopotential method: An application to diatomic molecules, Phys. Rev. B, 50 (1994), 11355–11364.

[13] J.-I. Iwata, D. Takahashi, A. Oshiyama, T. Boku, K. Shiraishi, S. Okada and K. Yamada, A massively-parallel electronic-structure calculations based on real-space density functional theory, J. Comput. Phys., 229 (2010), 2339–2363.



Error-controlling algorithm for simultaneous

block-diagonalization and its application

to independent component analysis

Takanori Maehara1 and Kazuo Murota1

1 Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan

E-mail maehara misojiro.t.u-tokyo.ac.jp

Received April 14, 2010, Accepted August 28, 2010

Abstract

The finest simultaneous block-diagonalization for a given set of square matrices has been studied independently in the area of independent component analysis (ICA) and semidefinite programming. A new algorithm for this problem, which finds the finest decomposition with a capability of coping with numerical errors, has recently been proposed by the present authors. In this paper we indicate the use of the algorithm for ICA by describing its main features and comparing the method with the other existing methods.

Keywords simultaneous block-diagonalization, independent component analysis, eigenvalue problem


1. Introduction

In this paper, we consider the following problem:

Given a set of n × n real matrices $A_1, \dots, A_N$, find an n × n orthogonal matrix P such that $P^\top A_1 P, \dots, P^\top A_N P$ are in a common block-diagonal form.

Recently, this problem has been studied in the area of signal processing, specifically in independent component analysis (ICA). ICA is an effective method for signal processing proposed independently by Ans, Herault, and Jutten and by Barness, Carlin, and Steinberger in the 1980s.

Let X be a (given) d-dimensional signal. In the multidimensional ICA, proposed by Cardoso [1], we decompose X into mutually independent signals by finding an invertible (constant) matrix W and mutually independent (possibly multidimensional) signals $Y_1, \dots, Y_m$ such that $W^{-1}X = (Y_1, \dots, Y_m)^\top$. There are several methods, to be explained in Section 2, that reduce this task in ICA to simultaneous block-diagonalization of matrices. Standard methods for this reduction are JADE by Cardoso and Souloumiac [2] and SOBI by Belouchrani, Abed-Meraim, Cardoso, and Moulines [3]. For simultaneous block-diagonalization, the Jacobi-like algorithm of Cardoso and Souloumiac [2] is accepted as a standard algorithm in this area; see [4–6] and Section 3.

Methods for simultaneous block-diagonalization are also studied independently in the area of semidefinite programming (SDP), which is a major field of optimization. When the data matrices of an SDP are simultaneously block-diagonalized, the associated SDP can be solved efficiently [7–9]. The methods for simultaneous block-diagonalization proposed in this area focus on algebraic structures, such as group symmetry, and find a decomposition using the theory of algebra. In particular, Murota, Kanno, Kojima, and Kojima [10] and Maehara and Murota [11] recently proposed a numerical algorithm, to be called the MKKKM algorithm in this paper, for simultaneous block-diagonalization. The main idea of the algorithm is to consider the matrix ∗-algebra generated by the given matrices and to use the Artin-Wedderburn type structure theorem for matrix ∗-algebras. The algorithm is the first algorithm that finds the finest decomposition of given matrices from the numerical data of the matrices without knowing the underlying group symmetry. The algorithm is highly sensitive to the numerical errors contained in the given matrices and sometimes fails to find reasonable or plausible decompositions, although this is a correct “theoretical” behavior that is consistent with the Artin-Wedderburn theorem applied to matrices contaminated by numerical errors.

Very recently Maehara and Murota [12] have proposed an algorithm for simultaneous block-diagonalization, to be called the MM algorithm in this paper, which can find the finest decomposition and has a capability of controlling numerical errors. The algorithm can be seen as a dual of the MKKKM algorithm, in the sense explained in Section 4. This paper is intended to be a quick report on the use of the algorithm of [12] in the context of ICA.

2. Independent component analysis via

simultaneous block-diagonalization

We here review two standard methods, JADE [2] and SOBI [3], that reduce the ICA problem to simultaneous block-diagonalization of matrices. Recall that the ICA problem is: given a d-dimensional signal X, find an invertible matrix W and mutually independent signals $Y_1, \dots, Y_m$ such that $W^{-1}X = (Y_1, \dots, Y_m)^\top =: Y$.

JSIAM Letters Vol. 2 (2010) pp.131–134 Takanori Maehara and Kazuo Murota

The JADE method first normalizes the given signal X to one with zero mean and unit-matrix variance via an affine transformation. This process is called whitening, and is performed by the following algorithm. Here ⟨ · ⟩ means the expected value, which is replaced by the sample mean in practice.

Algorithm 1 (Whitening)

Input: Signal X
Output: Whitened signal X
1: $X \leftarrow X - \langle X \rangle$
2: $X \leftarrow \langle X X^\top \rangle^{-1/2} X$
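As a hedged illustration of Algorithm 1, the following sketch whitens a d-by-T array of samples, with the expectation replaced by the sample mean. The function name `whiten` and the use of a symmetric eigendecomposition to form the inverse square root are assumptions of this sketch, and the sample covariance is assumed to be positive definite.

```python
import numpy as np

def whiten(X):
    """Whiten a d-by-T array of samples: zero sample mean, identity sample covariance."""
    X = X - X.mean(axis=1, keepdims=True)            # step 1: X <- X - <X>
    C = (X @ X.T) / X.shape[1]                        # sample estimate of <X X^T>
    w, U = np.linalg.eigh(C)                          # C = U diag(w) U^T, w > 0 assumed
    C_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T  # symmetric inverse square root of C
    return C_inv_sqrt @ X                             # step 2: X <- <X X^T>^{-1/2} X

# Illustrative data: three uniform sources, a fixed mixing matrix, and a constant offset.
rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(3, 5000))
mixing = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
Xw = whiten(mixing @ sources + 2.0)
print(np.round(Xw @ Xw.T / Xw.shape[1], 6))           # close to the identity matrix
```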

After whitening, we can assume that W is an orthogonal matrix. The reason is the following. If some decomposition $Y = W^{-1}X$ with invertible W is obtained, we modify Y and W by multiplying by a block-diagonal matrix so that Y has unit-matrix variance. Then, since $\langle X X^\top \rangle = I$ after whitening, we have

$$
W^{-1} W^{-\top} = \langle W^{-1} X X^\top W^{-\top} \rangle = \langle Y Y^\top \rangle = I.
$$

After whitening, JADE considers the fourth-order cumulant matrices $C_{ij}$ ($i, j = 1, \dots, d$) of X, where each $C_{ij}$ is a d × d matrix whose (k, l) entry is defined as

$$
(C_{ij})_{kl} = \langle X_i X_j X_k X_l \rangle - \langle X_i X_j \rangle \langle X_k X_l \rangle - \langle X_i X_k \rangle \langle X_j X_l \rangle - \langle X_i X_l \rangle \langle X_j X_k \rangle \tag{1}
$$

for $k, l = 1, \dots, d$. The matrices have the property that the (k, l) entry of $C_{ij}$ is zero for $i, j = 1, \dots, d$ if $X_k$ and $X_l$ are contained in different independent components. Accordingly, if these matrices are brought to a common block-diagonal form with m diagonal blocks, it is understood that X is decomposed into m independent components. On the basis of this fact JADE reduces the ICA problem to simultaneous block-diagonalization of the $N = d^2$ fourth-order cumulant matrices $C_{ij}$.

The SOBI method also preprocesses the given signal by whitening. Then the method considers the time-delayed autocorrelation matrices

$$
R(t, \tau) = \langle X(t) X(t + \tau)^\top \rangle \tag{2}
$$

with some t and τ. Here X(t) denotes the delayed signal of X with delay time t. The matrices also have the property that the (k, l) entry of R(t, τ) is zero if $X_k$ and $X_l$ are mutually independent. This property allows the SOBI method to reduce the ICA problem to the simultaneous block-diagonalization of the matrices R(t, τ).

Here we emphasize that both methods need to compute the expectations but, in practice, we can only compute approximate values of the expectations. As a consequence the matrices generated by either method are not exactly decomposable in the algebraic sense. Hence a method of block-diagonalization with some error-controlling mechanism is needed.
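The cumulant matrices of (1) can be estimated from samples as in the following sketch, where every expectation is replaced by a sample mean. This is an illustration only: the function name `cumulant_matrices` and the four-index array layout are assumptions, and the input is assumed to be whitened (zero-mean) already.

```python
import numpy as np

def cumulant_matrices(X):
    """Return C of shape (d, d, d, d) with C[i, j, k, l] = (C_ij)_kl as in (1).

    X is a d-by-T array of zero-mean samples; <.> is replaced by the sample mean.
    """
    d, T = X.shape
    R = (X @ X.T) / T                                        # R[a, b] = <X_a X_b>
    M4 = np.einsum('it,jt,kt,lt->ijkl', X, X, X, X) / T      # <X_i X_j X_k X_l>
    return (M4
            - np.einsum('ij,kl->ijkl', R, R)                 # <X_i X_j><X_k X_l>
            - np.einsum('ik,jl->ijkl', R, R)                 # <X_i X_k><X_j X_l>
            - np.einsum('il,jk->ijkl', R, R))                # <X_i X_l><X_j X_k>

# Illustrative data: two independent uniform signals with unit variance.
rng = np.random.default_rng(0)
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 10000))
C = cumulant_matrices(X - X.mean(axis=1, keepdims=True))
mats = C.reshape(4, 2, 2)      # the N = d^2 matrices C_ij, stacked along the first axis
print(np.round(mats, 2))       # off-diagonal entries are small but not exactly zero
```

The nonzero off-diagonal entries in the output illustrate the sampling error that motivates an error-controlling decomposition.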

3. Jacobi-like algorithm

Cardoso and Souloumiac's algorithm [2] for simultaneous block-diagonalization is an extension of the Jacobi algorithm for eigenvalue decomposition. It applies successive Givens rotations to the given matrices until the sum of squares of the off-diagonal entries, which serves as a diagonality criterion, becomes minimal. More concretely, let R(i, j, θ) be the Givens rotation [13] in the (i, j) plane with angle θ and let $\mathrm{off}(A) = \sum_{i \neq j} a_{ij}^2$ for a matrix $A = (a_{ij})$. The algorithm for $A_1, \dots, A_N$ is described as follows.

Algorithm 2 (Jacobi-like algorithm [2])

Input: n × n matrices $A_1, \dots, A_N$
Output: block-diagonalized matrices $A_1, \dots, A_N$
1: repeat
2:   Find a Givens rotation R = R(i, j, θ) which minimizes $\sum_{k=1}^{N} \mathrm{off}(R^\top A_k R)$
3:   $A_k \leftarrow R^\top A_k R$ ($k = 1, \dots, N$)
4: until convergence

Step 2 is performed as follows. For each (i, j) compute the optimal Givens angle θ that results in the largest decrement of $\sum_k \mathrm{off}(R^\top A_k R)$ and then choose the best (i, j). The closed forms of the Givens angle and the decrement are given in [5].

It is easily seen that if the algorithm converges and $\sum_{k=1}^{N} \mathrm{off}(A_k) = 0$, then $A_1, \dots, A_N$ are decomposed to a simultaneous diagonal form. Conversely, it is proved by Bunse-Gerstner, Byers, and Mehrmann [4] that if the matrices $A_1, \dots, A_N$ can be decomposed into a simultaneous diagonal form, this algorithm locally converges to $\sum_k \mathrm{off}(A_k) = 0$. However, global convergence of the algorithm is still open.

This algorithm, originally proposed for simultaneous diagonalization [2], can also be used for simultaneous block-diagonalization [1]. Recently, Theis [6] afforded a reasonable theoretical basis for the method although the algorithm is not rigorously guaranteed to find the finest decomposition.
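The structure of Algorithm 2 can be sketched as follows. Note the substitution: instead of the closed-form Givens angle of [5], this sketch searches a grid of angles in [−π/4, π/4], which is simpler but only approximately optimal. The names `off`, `givens` and `jacobi_like`, the grid size and the stopping tolerance are all assumptions made for illustration.

```python
import numpy as np

def off(A):
    """Sum of squares of the off-diagonal entries of A."""
    return float(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))

def givens(n, i, j, theta):
    """Givens rotation in the (i, j) plane with angle theta."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i] = c
    R[j, j] = c
    R[i, j] = -s
    R[j, i] = s
    return R

def jacobi_like(mats, max_iter=100, n_angles=181, tol=1e-12):
    """Greedy Jacobi-like iteration; returns the rotated matrices and the accumulated P."""
    mats = [np.array(A, dtype=float) for A in mats]
    n = mats[0].shape[0]
    thetas = np.linspace(-np.pi / 4, np.pi / 4, n_angles)   # grid search, not the closed form
    P = np.eye(n)
    for _ in range(max_iter):
        best_val = sum(off(A) for A in mats)
        best_R = None
        for i in range(n - 1):                               # step 2: best plane and angle
            for j in range(i + 1, n):
                for theta in thetas:
                    R = givens(n, i, j, theta)
                    val = sum(off(R.T @ A @ R) for A in mats)
                    if val < best_val - tol:
                        best_val, best_R = val, R
        if best_R is None:                                   # no rotation improves: stop
            break
        mats = [best_R.T @ A @ best_R for A in mats]         # step 3
        P = P @ best_R
    return mats, P

# Illustrative data: two symmetric matrices that share the eigenvectors in Q.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
mats = [Q @ np.diag(d) @ Q.T for d in ([1.0, 2.0, 3.0], [4.0, -1.0, 0.5])]
rotated, P = jacobi_like(mats)
print(sum(off(A) for A in mats), sum(off(A) for A in rotated))
```

Because a rotation is applied only when it lowers the criterion, the total off-diagonal sum never increases from one iteration to the next.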

4. MM algorithm

The error-controlling algorithm of [12] for simultaneous block-diagonalization is described. A set $\mathcal{T}$ of n × n matrices is said to be a matrix ∗-algebra if $\mathcal{T}$ contains the identity matrix and is closed under sum, product, transpose, and scalar multiplication. Let $\mathcal{T}'$ denote the commutant algebra of a matrix ∗-algebra $\mathcal{T}$, which is the set of all matrices that commute with all elements in $\mathcal{T}$, i.e.,

$$
\mathcal{T}' = \{ X \mid AX - XA = O \ (\forall A \in \mathcal{T}) \}.
$$

The outline of the algorithm in [12] is the following. Consider the matrix ∗-algebra $\mathcal{T}$ generated by the given matrices $A_1, \dots, A_N$. Sample a generic symmetric matrix X from the commutant algebra $\mathcal{T}'$ and output an orthogonal matrix P that diagonalizes X. An error-control parameter ϵ is introduced for robustness against numerical errors. The whole algorithm reads as follows.

Algorithm 3 (MM algorithm [12])

Output: Orthogonal matrix P that block-diagonalizes the input matrices
1: Set error-control parameter ϵ ≥ 0.
2: Sample a symmetric matrix X with $\|A_k X - X A_k\| \le \epsilon$ for $k = 1, \dots, N$.
3: Find an orthogonal matrix P that diagonalizes X, and output P.

The MM algorithm can be seen as a dual version of the MKKKM algorithm in the sense that the MKKKM algorithm computes an orthogonal matrix that diagonalizes a randomly sampled symmetric matrix in $\mathcal{T}$ itself, and not in its commutant $\mathcal{T}'$. Once the orthogonal matrix P is found, the block-diagonal decomposition can be determined from the zero-nonzero pattern of the off-diagonal entries of $P^\top A_k P$ for $k = 1, \dots, N$.

The correctness of the MM algorithm can be stated as follows.

Theorem 4 (Maehara and Murota [12]) If ϵ is set to zero, the orthogonal matrix P found by Algorithm 3 gives the finest block-diagonal decomposition of $A_1, \dots, A_N$. If ϵ is set to nonzero, the algorithm gives an error-controlled block-diagonal decomposition in the sense that the off-block-diagonal entries are of the order of ϵ.

The proof of this theorem relies on some algebraic facts about matrix ∗-algebras and their commutant algebras, and also on the following linear-algebraic fact.

Lemma 5 (Maehara and Murota [12]) Let A be an n × n matrix and X be an n × n symmetric matrix. If $\|AX - XA\| \le \epsilon$, then, for any orthogonal matrix P that diagonalizes X as $P^\top X P = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$, we have

$$
|(P^\top A P)_{ij}| \cdot |\lambda_i - \lambda_j| \le \epsilon.
$$

Step 2 of Algorithm 3 is performed as follows. Since the expression $A_k X - X A_k$ is linear in the entries $x_{ij}$ of X, there exists an $n^2 \times n^2$ matrix $T_k$ such that

$$
\mathrm{vec}(A_k X - X A_k) = T_k \, \mathrm{vec}(X),
$$

where $\mathrm{vec}(X) = (x_{11}, x_{21}, \dots, x_{nn})^\top$ is the vectorization operation. We can show that $T_k = I \otimes A_k - A_k^\top \otimes I$. Let

$$
S = \sum_{k=1}^{N} \left( T_k^\top T_k + T_k T_k^\top \right). \tag{3}
$$
The matrix S has the following property.

Proposition 6 Let S be the matrix in (3). Let u be avector with ∥u∥ = 1 and X be a matrix with vec(X) = u.

(a) If u⊤Su = 0, then AkX − XAk = O and A⊤k X −

XA⊤k = O (k = 1, . . . , N).

(b) If u⊤Su ≤ ϵ2 then ∥AkX −XAk∥ ≤ ϵ and ∥A⊤k X −

XA⊤k ∥ ≤ ϵ (k = 1, . . . , N).

A vector u in Proposition 6-(b) can be found by theeigenvalue decomposition of S. The algorithm for Step2 is given as follows.

Algorithm 7 (Sample a symmetric matrix [12])

Output: A symmetric matrix X for Step 2 in Algorithm 3
1: Construct the matrix S of (3) from $A_1, \dots, A_N$.
2: Find normalized eigenvectors, say, $v_1, \dots, v_r$ of S that correspond to eigenvalues smaller than $\epsilon^2$.
3: Sample real numbers $c_1, \dots, c_r$ randomly subject to $c_1^2 + \cdots + c_r^2 = 1$.
4: Put $u = c_1 v_1 + \cdots + c_r v_r$, let X be the matrix such that vec(X) = u, and output $X = (X^\top + X)/2$.
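Algorithms 3 and 7 can be put together in a short sketch. It is an illustration under assumptions, not the implementation of [12]: the function names `sample_commuting_matrix` and `mm_algorithm` are hypothetical, the Frobenius norm is assumed, Gaussian coefficients normalized to unit length are one possible way of sampling $c_1, \dots, c_r$, and ϵ > 0 is assumed so that rounding errors in the eigenvalues of S are tolerated.

```python
import numpy as np

def sample_commuting_matrix(mats, eps, rng):
    """Algorithm 7: sample a symmetric X with ||A_k X - X A_k|| <= eps for all k."""
    n = mats[0].shape[0]
    I = np.eye(n)
    S = np.zeros((n * n, n * n))
    for A in mats:
        T = np.kron(I, A) - np.kron(A.T, I)      # vec(AX - XA) = T vec(X), column-major vec
        S += T.T @ T + T @ T.T                   # the matrix S of (3)
    w, V = np.linalg.eigh(S)                     # S is symmetric positive semidefinite
    basis = V[:, w < eps ** 2]                   # eigenvectors for eigenvalues below eps^2
    c = rng.standard_normal(basis.shape[1])
    c /= np.linalg.norm(c)                       # c_1^2 + ... + c_r^2 = 1
    X = (basis @ c).reshape((n, n), order='F')   # undo the column-major vec
    return (X + X.T) / 2

def mm_algorithm(mats, eps, seed=0):
    """Algorithm 3: return an orthogonal P and the eigenvalues of the sampled X."""
    X = sample_commuting_matrix(mats, eps, np.random.default_rng(seed))
    lam, P = np.linalg.eigh(X)                   # P diagonalizes X
    return P, lam

# Illustrative data: two matrices with a hidden 2 + 1 block structure.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
D1 = np.array([[1.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 5.0]])
D2 = np.array([[0.0, 1.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, 2.0]])
mats = [Q @ D1 @ Q.T, Q @ D2 @ Q.T]
P, lam = mm_algorithm(mats, eps=1e-5)
print(np.round(P.T @ mats[0] @ P, 6))            # block-diagonal up to the ordering of lam
```

The blocks can be read off by grouping indices whose eigenvalues in `lam` coincide; by Lemma 5, entries of $P^\top A_k P$ that connect distinct eigenvalues are of the order of ϵ.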

When the error-control parameter ϵ in Step 1 is not prescribed, a reasonable value of ϵ can be chosen from the eigenvalues of the matrix S. Suppose that the given matrices $A_1, \dots, A_N$ can be regarded as being perturbed from the nominal values, say $\bar{A}_1, \dots, \bar{A}_N$, and let $\bar{S}$ be the corresponding nominal value of the matrix S. By Proposition 6, we are to identify the null space of $\bar{S}$, which may be approximated by the subspace spanned by the eigenvectors of S that correspond to small eigenvalues. Therefore, as a reasonable choice of ϵ, we can adopt a threshold value that separates nearly zero eigenvalues of S from nonzero eigenvalues.

5. Numerical example


Here we work with a problem of ICA, which is a standard setting in the area of ICA. Let $Y_1$, $Y_2$, and $Y_3$ be 2-dimensional signals of length T = 10000, representing three sequences of (x, y)-coordinates of T = 10000 points that correspond to the characters “R”, “C”, and “H” shown in Fig. 1-(a). Let $Y_4$ and $Y_5$ be 1-dimensional Gaussian noises of length T = 10000 drawn from N(50, 50) independently. It is assumed that $Y_1, \dots, Y_5$ are mutually independent. Let W be a fixed random matrix and put $X = W (Y_1, Y_2, Y_3, Y_4, Y_5)^\top$, which is an 8-dimensional signal of length 10000. The ICA problem is to obtain $Y_1$, $Y_2$, and $Y_3$ from X; we have d = 8, m = 5 in the notation of Section 2.

According to the JADE method, we consider the fourth-order cumulant matrices $A_1, \dots, A_N$, where N = 64 and each $A_k$ is of size 8 × 8. We used Cardoso's implementation for building these matrices. To decompose these matrices we apply the MM algorithm (Algorithm 3). We plot the eigenvalues of the matrix S (see Fig. 2) to set ϵ = 1 as the error-control parameter.

The signals decomposed by the MM algorithm are depicted in Fig. 1-(b). The signals obtained using the Jacobi-like algorithm are shown in Fig. 1-(c) and those using the MKKKM algorithm are in Fig. 1-(d). It is legitimate that the obtained signals are rotated or reflected. The MM algorithm and the Jacobi-like algorithm afford results of comparable quality, much sharper than the outputs of the MKKKM algorithm.

To see the difference between the MM algorithm and the Jacobi-like algorithm we work on a noisy ICA problem which is generated by adding noise to the above example (more specifically, by adding random matrices drawn from N(0, 0.03) to the fourth-order cumulant matrices). The results are shown in Fig. 3. It can be observed that the MM algorithm gives a slightly sharper result than the Jacobi-like algorithm. (We here omit the result of the MKKKM algorithm because it does not produce any reasonable result for this problem.) We remark that if larger noises, e.g., drawn from N(0, 0.08), are added, the MM algorithm does not recover the original image.

In addition, the MM algorithm is equipped with the theoretical performance supports explained in Section 4. The MM algorithm is thus a promising candidate for practical use in ICA.

Acknowledgments

This work is supported by the Global COE “The Research and Training Center for New Development in Mathematics”.



(a) Sources S1, S2, and S3.

(b) MM algorithm.

(c) Jacobi-like algorithm.

(d) MKKKM algorithm.

Fig. 1. The scatter plots of the signals for an ICA problem.

Fig. 2. Sorted eigenvalues of S for the ICA problem (horizontal axis: index i; vertical axis: eigenvalue $\sigma_i$ of S). The broken line is a threshold of ϵ = 1. Note that if we set ϵ = 0.2 to split $\sigma_9$, the MM algorithm also recovers the original images.

(a) MM algorithm.

(b) Jacobi-like algorithm.

Fig. 3. The scatter plots of the signals for a noisy ICA problem.

Fig. 4. Sorted eigenvalues of S for a noisy ICA problem (horizontal axis: index i; vertical axis: eigenvalue $\sigma_i$ of S). The broken line is a threshold of ϵ = 1.

References

[1] J.-F. Cardoso, Multidimensional independent component analysis, in: Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 4, pp. 1941–1944, 1998.

[2] J.-F. Cardoso and A. Souloumiac, Blind beamforming for non Gaussian signals, in: IEE Proc. F, Vol. 140, pp. 362–370, 1993.

[3] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso and E. Moulines, A blind source separation technique using second order statistics, IEEE Trans. Signal Processing, 45 (1997), 434–444.

[4] A. Bunse-Gerstner, R. Byers and V. Mehrmann, Numerical methods for simultaneous diagonalization, SIAM J. Matrix Anal. Appl., 14 (1993), 927–949.

[5] J.-F. Cardoso and A. Souloumiac, Jacobi angles for simultaneous diagonalization, SIAM J. Matrix Anal. Appl., 17 (1996), 161–164.

[6] F. J. Theis, Towards a general independent subspace analysis, in: Proc. of Neural Information Processing Systems, Vol. 19, pp. 1361–1368, 2006.

[7] E. de Klerk, D. V. Pasechnik and A. Schrijver, Reduction of symmetric semidefinite programs using the regular ∗-representation, Mathematical Programming, 109 (2007), 613–624.

[8] E. de Klerk and R. Sotirov, Exploiting group symmetry in semidefinite programming relaxations of the quadratic assignment problem, Mathematical Programming, 122 (2010), 225–246.

[9] K. Gatermann and P. A. Parrilo, Symmetry groups, semidefinite programs, and sums of squares, J. Pure Appl. Algebra, 192 (2004), 95–128.

[10] K. Murota, Y. Kanno, M. Kojima and S. Kojima, A numerical algorithm for block-diagonal decomposition of matrix ∗-algebras with application to semidefinite programming, Jpn J. Indust. Appl. Math., 27 (2010), 125–160.

[11] T. Maehara and K. Murota, A numerical algorithm for block-diagonal decomposition of matrix ∗-algebras with general irreducible components, Jpn J. Indust. Appl. Math., 27 (2010), 263–293.

[12] T. Maehara and K. Murota, Algorithm for error-controlled simultaneous block-diagonalization of matrices, The Univ. of Tokyo Tech. Rep., METR 2009-53, 2009.

[13] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd ed., Johns Hopkins Univ. Press, 1996.


JSIAM Letters Vol.2 (2010)

ISBN : 978-4-9905076-1-9

ISSN : 1883-0617

©2010 The Japan Society for Industrial and Applied Mathematics

Publisher :


4F, Nihon Gakkai Center Building

2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan

tel. +81-3-5684-8649 / fax. +81-3-5684-8663
