Chapter 2shodhganga.inflibnet.ac.in/bitstream/10603/18070/7/07_chapter 2.pdf · original problem is...

Chapter 2

An Efficient Parallel Algorithm for a Class of Two-Point BVP involving a Fourth-Order Differential Equation


We consider the extension of the parallel algorithm given in [1] to the class of two-point boundary value problems set up as a system of first-order equations:

\[ \frac{dy}{dx} = z_1, \; \dots \]

with the boundary conditions

\[ y(0) = A_1, \quad y(1) = A_2. \]

The above system can, however, be written in the compact form

\[ y^{(iv)} = f(x, y), \quad y(0) = A_1, \; y''(0) = A_2, \; y(1) = B_1, \; y''(1) = B_2, \]

with $\partial f/\partial y \le 0$ and $\partial f/\partial y$ continuous on $[0, 1] \times (-\infty, \infty)$.


In the following we devise a parallel algorithm along the lines of [1] for our TPBVP. As in [1], the interval [0, 1] is divided into p different divisions, each division consisting of N or (N + 1) unequal intervals. A new fourth-order finite difference scheme is developed for a general non-uniform mesh and is applied to the above class of TPBVPs on each of the p divisions. This leads to N x N or (N + 1) x (N + 1) systems of linear or non-linear equations, which are solved simultaneously on p processors (p a power of 2). The solution of the original problem is then obtained at n = Np equally spaced abscissas on [0, 1].

2.1 INTRODUCTION

Consider the class of TPBVPs

\[ y^{(iv)} = f(x, y), \quad y(0) = A_1, \; y''(0) = A_2, \; y(1) = B_1, \; y''(1) = B_2, \tag{2.1} \]

where $\partial f/\partial y \le 0$ and $\partial f/\partial y$ is continuous on $[0, 1] \times (-\infty, \infty)$. We also assume that (2.1) has a unique solution on $[0, 1] \times (-\infty, \infty)$ [6].

A number of papers considering the solution of TPBVPs on parallel computers have recently appeared in the literature; see, e.g., [3-5]. Most of these, however, parallelize the matrix computations that arise when the differential equation and its boundary conditions are replaced by an equivalent finite difference scheme. A parallel chopping algorithm is considered in [1] where, on a computer with p processors, the BVP is solved numerically at each stage on p meshes using a code based on COLNEW [7, 8].


In the following, we solve (2.1) concurrently on a computer with p processors, p a power of 2. In particular, we develop a new fourth-order finite difference scheme for a non-uniform mesh and apply it to the above class of TPBVPs on each of the p divisions. This leads to (N - 1) x (N - 1) or N x N systems of linear or non-linear equations, which are solved concurrently on the p processors.

2.2 THE PARALLEL ALGORITHM

To construct p different divisions of the interval [0, 1], each division consisting of N or (N + 1) equally spaced abscissas, we use the chopping algorithm discussed in [1]. For the sake of completeness, we describe it in the following:

1. Divide [0, 1] into N equal parts such that $h = 1/N$; $x_k = kh$, $k = 0, 1, 2, \dots, N$. We solve the (N - 1) x (N - 1) system in the (N - 1) unknowns $y_1, y_2, \dots, y_{N-1}$ arising from the finite difference discretization of (2.1) at the above abscissas on processor 1.

2. Next subdivide each interval $[x_{k-1}, x_k]$ into two equal parts such that $x'_k$ is the mid-point of $[x_{k-1}, x_k]$. Then $x'_1 = x_0 + h/2$. As in stage 1, we apply the finite difference discretization at the abscissas $x'_1, x'_2, \dots, x'_N$ and solve the N x N system in the N unknowns $y'_1, y'_2, \dots, y'_N$ on processor 2.

3. Proceeding as before, we further subdivide the intervals $[x_{k-1}, x'_k]$ and $[x'_k, x_k]$ into two equal parts and denote their mid-points by $x''_{2k-1}$ and $x''_{2k}$, $k = 1, 2, \dots, N$. As in stage 2, we solve an algebraic N x N system in the unknowns $y''_1, y''_3, \dots, y''_{2N-1}$ at the abscissas $x''_1, x''_3, \dots, x''_{2N-1}$ on processor 3. Note that $x''_1 = x_0 + h/4$.

4. On processor 4, we solve the N x N system in the N unknowns $y''_2, y''_4, \dots, y''_{2N}$. Note that $x''_2 = x_0 + 3h/4$.

The above procedure can be generalised as follows. We wish to find the solution of (2.1) on [0, 1] at n given points $x_0, x_1, x_2, \dots, x_N$, which are equally spaced. The p processors then each solve an N x N system concurrently in N unknowns (except processor 1, which solves an (N - 1) x (N - 1) system). Thus,

\[ n = Np. \tag{2.2} \]

Number the p processors in the order $1, 2, 3, \dots, p = 2^k$ (k fixed). For processor j, $2 \le j \le p = 2^k$, we write j uniquely as $j = j(d, m) = 2^{d-1} + m$, $1 \le m \le 2^{d-1}$, $d = 1, 2, \dots, k$. Processor j then solves the N x N system on the mesh $\{x_0, x_{p_0}, x_{p_1}, \dots, x_{p_{N-1}}, x_N\}$, where

\[ p_i = 2^{k-d}(2m - 1) + i\,2^k, \quad i = 0, 1, \dots, N - 1. \tag{2.3} \]

Also, for processor j,

\[ x_{p_0} - x_0 = \frac{(2m - 1)}{2^d}\,h, \qquad x_N - x_{p_{N-1}} = \left\{1 - \frac{(2m - 1)}{2^d}\right\} h. \tag{2.4} \]

Processor 1 solves the (N - 1) x (N - 1) system in the unknowns $y_{d_i}$, where

\[ d_i = 2^k(1 + i), \quad i = 0, 1, 2, \dots, N - 2. \]
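The index bookkeeping of (2.3) can be sketched in a few lines of Python. This is only an illustration under the assumption that the fine grid is indexed $0, 1, \dots, Np$ with $p = 2^k$; the function names `decode_processor`, `fine_indices` and `all_assignments` are hypothetical and do not come from [1].

```python
def decode_processor(j):
    """Split processor number j >= 2 into (d, m) with j = 2**(d-1) + m, 1 <= m <= 2**(d-1)."""
    d = (j - 1).bit_length()
    m = j - 2 ** (d - 1)
    return d, m


def fine_indices(j, k, N):
    """Fine-grid indices (assumed numbering 0..N*2**k) of the unknowns handled by processor j."""
    if j == 1:
        # Processor 1: d_i = 2^k (1 + i), i = 0..N-2
        return [2 ** k * (1 + i) for i in range(N - 1)]
    d, m = decode_processor(j)
    # Equation (2.3): p_i = 2^(k-d) (2m - 1) + i 2^k, i = 0..N-1
    return [2 ** (k - d) * (2 * m - 1) + i * 2 ** k for i in range(N)]


def all_assignments(k, N):
    """Map every processor 1..2**k to its list of fine-grid indices."""
    return {j: fine_indices(j, k, N) for j in range(1, 2 ** k + 1)}


if __name__ == "__main__":
    for j, idx in all_assignments(k=2, N=3).items():
        print(j, idx)
```

With k = 2 and N = 3 the four index lists are disjoint and together cover the interior fine-grid indices 1 to 11, which is the sense in which n = Np in (2.2).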

2.3 THE FINITE DIFFERENCE METHOD

We next construct a fourth-order method for the solution of (2.1) on [0, 1]. All the subintervals in any of the p divisions are equally spaced except the first and the last, whose lengths are $ch$ and $(1 - c)h$ respectively, $c \in (0, 1)$. A fourth-order finite difference scheme for (2.1) is then given by

\[ -2y_0 + \alpha_1 y_1 + \alpha_2 y_2 + \alpha_3 y_3 + \alpha h^2 y''(0) = h^4\left[\beta_0 f_0 + \beta_1 f_1 + \beta_2 f_2 + \beta_3 f_3\right] + t_1. \tag{2.5} \]

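Expanding the left-hand side and the right-hand side in Taylor series about $(x_0, y_0)$, with $f = y^{(4)}$,

\[
\begin{aligned}
&-2y_0 + \alpha_1\left\{ y_0 + ch\,y_0^{(1)} + \frac{c^2h^2}{2!}y_0^{(2)} + \frac{c^3h^3}{3!}y_0^{(3)} + \frac{c^4h^4}{4!}y_0^{(4)} + \dots \right\} \\
&\quad + \alpha_2\left\{ y_0 + (c+1)h\,y_0^{(1)} + \frac{(c+1)^2h^2}{2!}y_0^{(2)} + \frac{(c+1)^3h^3}{3!}y_0^{(3)} + \frac{(c+1)^4h^4}{4!}y_0^{(4)} + \dots \right\} \\
&\quad + \alpha_3\left\{ y_0 + (c+2)h\,y_0^{(1)} + \frac{(c+2)^2h^2}{2!}y_0^{(2)} + \frac{(c+2)^3h^3}{3!}y_0^{(3)} + \frac{(c+2)^4h^4}{4!}y_0^{(4)} + \dots \right\} + \alpha h^2 y_0^{(2)} \\
&= h^4\left[ \beta_0 y_0^{(4)} + \beta_1\left\{ y_0^{(4)} + ch\,y_0^{(5)} + \frac{c^2h^2}{2!}y_0^{(6)} + \frac{c^3h^3}{3!}y_0^{(7)} + \dots \right\} \right. \\
&\qquad + \beta_2\left\{ y_0^{(4)} + (c+1)h\,y_0^{(5)} + \frac{(c+1)^2h^2}{2!}y_0^{(6)} + \frac{(c+1)^3h^3}{3!}y_0^{(7)} + \dots \right\} \\
&\qquad \left. + \beta_3\left\{ y_0^{(4)} + (c+2)h\,y_0^{(5)} + \frac{(c+2)^2h^2}{2!}y_0^{(6)} + \frac{(c+2)^3h^3}{3!}y_0^{(7)} + \dots \right\} \right].
\end{aligned}
\]

Comparing the coefficients of $h^0, h, h^2, h^3, h^4, h^5, h^6, h^7$, we get 8 equations from which the 8 unknowns can be calculated. The corresponding 8 equations are:

\[
\begin{aligned}
-2 + \alpha_1 + \alpha_2 + \alpha_3 &= 0, \\
c\,\alpha_1 + (c+1)\alpha_2 + (c+2)\alpha_3 &= 0, \\
\frac{c^2\alpha_1}{2} + \frac{(c+1)^2\alpha_2}{2} + \frac{(c+2)^2\alpha_3}{2} + \alpha &= 0, \\
\frac{c^3\alpha_1}{6} + \frac{(c+1)^3\alpha_2}{6} + \frac{(c+2)^3\alpha_3}{6} &= 0, \\
\frac{c^4\alpha_1}{4!} + \frac{(c+1)^4\alpha_2}{4!} + \frac{(c+2)^4\alpha_3}{4!} &= \beta_0 + \beta_1 + \beta_2 + \beta_3, \\
\frac{c^5\alpha_1}{5!} + \frac{(c+1)^5\alpha_2}{5!} + \frac{(c+2)^5\alpha_3}{5!} = r_1 &= c\,\beta_1 + (c+1)\beta_2 + (c+2)\beta_3, \\
\frac{c^6\alpha_1}{6!} + \frac{(c+1)^6\alpha_2}{6!} + \frac{(c+2)^6\alpha_3}{6!} = r_2 &= \frac{c^2\beta_1}{2} + \frac{(c+1)^2\beta_2}{2} + \frac{(c+2)^2\beta_3}{2}, \\
\frac{c^7\alpha_1}{7!} + \frac{(c+1)^7\alpha_2}{7!} + \frac{(c+2)^7\alpha_3}{7!} = r_3 &= \frac{c^3\beta_1}{6} + \frac{(c+1)^3\beta_2}{6} + \frac{(c+2)^3\beta_3}{6}.
\end{aligned}
\]

From the above equations we get the following values:

\[ \alpha_1 = \frac{(c+2)(2c+3)}{3}, \quad \alpha_2 = -\frac{4c(c+2)}{3}, \quad \alpha_3 = \frac{c(2c+1)}{3}, \quad \alpha = \frac{c(c+2)}{3}, \]

\[ \beta_1 = \frac{(c+1)(c+2)r_1 - 2r_2(2c+3) + 6r_3}{2c}, \]

\[ \beta_2 = -\frac{c(c+2)r_1 - 4r_2(c+1) + 6r_3}{c+1}, \]

\[ \beta_3 = \frac{c(c+1)r_1 - 2r_2(2c+1) + 6r_3}{2(c+2)}. \]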
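The low-order conditions for (2.5) can be checked in exact rational arithmetic. The sketch below assumes the coefficient values $\alpha_1 = (c+2)(2c+3)/3$, $\alpha_2 = -4c(c+2)/3$, $\alpha_3 = c(2c+1)/3$, $\alpha = c(c+2)/3$ and evaluates the coefficients of $h^0, \dots, h^3$ on the left-hand side; the function names are illustrative only.

```python
from fractions import Fraction as Fr


def alphas(c):
    """Coefficients alpha_1, alpha_2, alpha_3, alpha of (2.5) for first-interval ratio c."""
    a1 = (c + 2) * (2 * c + 3) / 3
    a2 = -4 * c * (c + 2) / 3
    a3 = c * (2 * c + 1) / 3
    a = c * (c + 2) / 3
    return a1, a2, a3, a


def low_order_residuals(c):
    """Coefficients of h^0..h^3 in the Taylor expansion of the left side of (2.5)."""
    a1, a2, a3, a = alphas(c)
    nodes = (c, c + 1, c + 2)          # offsets of x_1, x_2, x_3 from x_0 in units of h
    w = (a1, a2, a3)
    r0 = -2 + sum(w)
    r1 = sum(wi * x for wi, x in zip(w, nodes))
    r2 = sum(wi * x ** 2 for wi, x in zip(w, nodes)) / 2 + a
    r3 = sum(wi * x ** 3 for wi, x in zip(w, nodes)) / 6
    return r0, r1, r2, r3


if __name__ == "__main__":
    print(low_order_residuals(Fr(1, 3)))
    print(alphas(Fr(1)))
```

For c = 1 the coefficients reduce to 5, -4, 1 and 1, the familiar uniform-mesh boundary formula, which is a useful sanity check on the non-uniform values.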


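\[ x_4 = x_2 + 2h, \quad x_3 = x_2 + h, \quad x_1 = x_2 - h, \quad x_0 = x_2 - h - ch = x_2 - h(1 + c). \]

\[
\begin{aligned}
&\left\{ y_2 + 2h\,y_2' + \frac{4h^2}{2!}y_2'' + \frac{8h^3}{3!}y_2^{(3)} + \frac{16h^4}{4!}y_2^{(4)} + \frac{32h^5}{5!}y_2^{(5)} + \dots \right\} \\
&\quad + \alpha_1\left\{ y_2 + h\,y_2' + \frac{h^2}{2!}y_2'' + \frac{h^3}{3!}y_2^{(3)} + \frac{h^4}{4!}y_2^{(4)} + \frac{h^5}{5!}y_2^{(5)} + \dots \right\} + \dots
\end{aligned}
\]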


Comparing the coefficients on both sides, we get the following equations:

\[
\begin{aligned}
2 + \alpha_1 - \alpha_{-1} - (1 + c)\alpha_{-2} &= 0, \\
\frac{8}{6} + \frac{\alpha_1}{6} - \frac{\alpha_{-1}}{6} - \frac{(1 + c)^3\alpha_{-2}}{6} &= 0, \\
\frac{64 + \alpha_1 + \alpha_{-1} + (1 + c)^6\alpha_{-2}}{6!} = R_2 &= \frac{2^2\beta_2}{2!} + \frac{\beta_1}{2!} + \frac{\beta_{-1}}{2!} + \frac{(1 + c)^2\beta_{-2}}{2!}.
\end{aligned}
\]

Solving, we get

\[ \alpha_1 = -\frac{3(c+3)}{(c+2)}, \quad \alpha_0 = \frac{3(c+3)}{(c+1)}, \quad \alpha_{-2} = \frac{6}{c(c+1)(c+2)}, \quad \alpha_{-1} = -\frac{(3+c)}{c}, \]

\[ \beta_1 = \frac{(c+1)R_1 + 2(c+2)R_2 + 6R_3 - 6\beta_2(c+3)}{2(c+2)}, \]

and similarly for the remaining $\beta$ coefficients.

For the discretization of the differential equation, for $k = 3, 4, \dots, N - 3$, we have

\[ \delta^4 y_k = h^4\left[ 2a_0 f_k + \sum_{m=1}^{2} a_m\left(f_{k+m} + f_{k-m}\right) \right] + t_k, \tag{2.7} \]

where

\[ \delta^4 y_k = y_{k+2} - 4y_{k+1} + 6y_k - 4y_{k-1} + y_{k-2}, \qquad f_k = y_k^{(4)}, \qquad f_{k+m} = y_{k+m}^{(4)}. \]

Comparing the coefficients of both sides by Taylor expansion, we get

\[ a_0 = \frac{1}{3}, \quad a_1 = \frac{1}{6}, \quad a_2 = 0, \qquad t_k = -\frac{1}{720}h^8 y_k^{(8)} + O(h^{10}). \]
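The order of the interior scheme (2.7) can be checked by applying it to monomials. The sketch below assumes $a_0 = 1/3$, $a_1 = 1/6$, $a_2 = 0$, takes $x_k = 0$ and $h = 1$, and computes the residual $\delta^4 y_k - h^4[2a_0 f_k + a_1(f_{k+1} + f_{k-1})]$ for $y = x^m$ in exact arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction as Fr
from math import factorial

A0, A1 = Fr(1, 3), Fr(1, 6)  # a_0 and a_1 of (2.7); a_2 = 0


def fourth_derivative(m, x):
    """Fourth derivative of x**m evaluated at x."""
    if m < 4:
        return Fr(0)
    return Fr(factorial(m), factorial(m - 4)) * Fr(x) ** (m - 4)


def residual(m):
    """delta^4 y_k minus the right side of (2.7) without t_k, for y = x**m, x_k = 0, h = 1."""
    y = lambda x: Fr(x) ** m
    delta4 = y(2) - 4 * y(1) + 6 * y(0) - 4 * y(-1) + y(-2)
    rhs = 2 * A0 * fourth_derivative(m, 0) + A1 * (
        fourth_derivative(m, 1) + fourth_derivative(m, -1)
    )
    return delta4 - rhs


if __name__ == "__main__":
    for m in range(9):
        print(m, residual(m))
```

The residual vanishes for every monomial up to degree 7, and for $y = x^8$ it equals $-8!/720$, which corresponds to a leading truncation term of $-h^8 y_k^{(8)}/720$ under these coefficient values.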


For $k = N - 2$, we have

\[ y_N + \mu_1 y_{N-1} + \mu_0 y_{N-2} + \mu_{-1} y_{N-3} + \mu_{-2} y_{N-4} = h^4\left[\nu_2 f_N + \nu_1 f_{N-1} + \nu_0 f_{N-2} + \nu_{-1} f_{N-3} + \nu_{-2} f_{N-4}\right] + t_{N-2}, \tag{2.8} \]

with

\[ x_{N-1} = x_{N-2} + h, \quad x_N = x_{N-2} + h(2 - c), \quad x_{N-4} = x_{N-2} - 2h. \]

Expanding about $x_{N-2}$,

\[
\begin{aligned}
&\left\{ y_{N-2} + h(2 - c)\,y_{N-2}' + h^2\frac{(2 - c)^2}{2!}y_{N-2}'' + h^3\frac{(2 - c)^3}{3!}y_{N-2}^{(3)} + \dots \right\} + \dots \\
&= h^4\left[ \nu_2\left\{ y_{N-2}^{(4)} + h(2 - c)\,y_{N-2}^{(5)} + h^2\frac{(2 - c)^2}{2!}y_{N-2}^{(6)} + h^3\frac{(2 - c)^3}{3!}y_{N-2}^{(7)} + \dots \right\} + \dots \right. \\
&\qquad \left. + \nu_{-1}\left\{ y_{N-2}^{(4)} - h\,y_{N-2}^{(5)} + \frac{h^2}{2!}y_{N-2}^{(6)} - \frac{h^3}{3!}y_{N-2}^{(7)} + \dots \right\} + \dots \right].
\end{aligned}
\]

Comparing the coefficients on both sides and solving the equations, we get

\[ \mu_1 = -\frac{(2 - c)(3 - c)(4 - c)}{6}, \quad \mu_{-1} = -\frac{(2 - c)(1 - c)(4 - c)}{2}, \]

\[ \mu_{-2} = \frac{(1 - c)(2 - c)(3 - c)}{6}, \quad \mu_0 = -1 - (\mu_1 + \mu_{-2} + \mu_{-1}), \]

\[ \nu_1 = \{2R_1 + 6R_2 + 6R_3 - (4 - c)(3 - c)(2 - c)\nu_2\}/6, \]

\[ \nu_{-1} = \{2R_2 + 6R_3 - 2R_1 - (1 - c)(2 - c)(4 - c)\nu_2\}/2, \]

\[ \nu_{-2} = \{R_1 - 6R_3 + (1 - c)(2 - c)(3 - c)\nu_2\}/6, \]

\[ R_j = \frac{(2 - c)^{j+4} + \mu_1 + (-1)^{j+4}\mu_{-1} + (-1)^{j+4}\,2^{j+4}\mu_{-2}}{(j + 4)!}, \quad j = 0, 1, 2, 3. \]

Finally, the discretization of the boundary condition $y''(1) = B_2$ leads to

\[ -2y_N + \hat\mu_1 y_{N-1} + \hat\mu_2 y_{N-2} + \hat\mu_3 y_{N-3} + \hat\mu h^2 y''_N = h^4\left[\hat\nu_0 f_N + \hat\nu_1 f_{N-1} + \hat\nu_2 f_{N-2} + \hat\nu_3 f_{N-3}\right] + t_{N-1}. \tag{2.9} \]

Expanding about $x_{N-2} = x_N - (2 - c)h$,

\[
\begin{aligned}
&\dots + \hat\mu_3\left\{ y_{N-2} - h\,y_{N-2}' + \frac{h^2}{2!}y_{N-2}'' - \frac{h^3}{3!}y_{N-2}^{(3)} + \dots \right\} \\
&\quad + \hat\mu h^2\left\{ y_{N-2}^{(2)} + (2 - c)h\,y_{N-2}^{(3)} + \frac{(2 - c)^2h^2}{2!}y_{N-2}^{(4)} + \dots \right\} \\
&= h^4\left[ \hat\nu_0\left\{ y_{N-2}^{(4)} + h(2 - c)\,y_{N-2}^{(5)} + h^2\frac{(2 - c)^2}{2!}y_{N-2}^{(6)} + h^3\frac{(2 - c)^3}{3!}y_{N-2}^{(7)} + \dots \right\} + \dots \right].
\end{aligned}
\]

Comparing the coefficients of powers of h and solving the equations, we get

\[ \hat\mu_3 = \frac{(1 - c)(3 - 2c)}{3}, \quad \hat\mu = \frac{(c - 1)(c - 3)}{3}, \quad \hat\mu_1 = \frac{(2c - 5)(c - 3)}{3}, \quad \hat\mu_2 = -\frac{4(1 - c)(3 - c)}{3}, \]

\[ \hat\nu_0 = \frac{R_1 - 6R_3}{(c - 1)(2 - c)(3 - c)}. \]
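The coefficient values quoted for (2.8) and (2.9) can be checked in the same way as those of (2.5). The sketch below assumes that (2.8) has unit weight on $y_N$ and that (2.9) has the form $-2y_N + \hat\mu_1 y_{N-1} + \hat\mu_2 y_{N-2} + \hat\mu_3 y_{N-3} + \hat\mu h^2 y''_N$ on its left-hand side; it evaluates the coefficients of $h^0, \dots, h^3$ in exact arithmetic. All function names are illustrative.

```python
from fractions import Fraction as Fr


def moments(weights, offsets, max_power=3):
    """Sum of w * s**q over the nodes, q = 0..max_power (offsets in units of h)."""
    return [sum(w * s ** q for w, s in zip(weights, offsets)) for q in range(max_power + 1)]


def check_2_8(c):
    """Moments of the left side of (2.8) about x_{N-2}; all four should vanish."""
    mu1 = -(2 - c) * (3 - c) * (4 - c) / 6
    mum1 = -(2 - c) * (1 - c) * (4 - c) / 2
    mum2 = (1 - c) * (2 - c) * (3 - c) / 6
    mu0 = -1 - (mu1 + mum1 + mum2)
    weights = (Fr(1), mu1, mu0, mum1, mum2)
    offsets = (2 - c, Fr(1), Fr(0), Fr(-1), Fr(-2))
    return moments(weights, offsets)


def check_2_9(c):
    """Moments of the assumed left side of (2.9) about x_N, including the y'' term."""
    m1 = (2 * c - 5) * (c - 3) / 3
    m2 = -4 * (1 - c) * (3 - c) / 3
    m3 = (1 - c) * (3 - 2 * c) / 3
    m_hat = (c - 1) * (c - 3) / 3
    weights = (Fr(-2), m1, m2, m3)
    offsets = (Fr(0), -(1 - c), -(2 - c), -(3 - c))
    r = moments(weights, offsets)
    r[2] = r[2] / 2 + m_hat  # h^2 coefficient: sum(w s^2)/2! + mu_hat
    return r


if __name__ == "__main__":
    print(check_2_8(Fr(1, 3)), check_2_9(Fr(1, 3)))
```

Both checks return zeros for any rational c, and for c = 0 the weights of (2.8) collapse to the uniform stencil 1, -4, 6, -4, 1.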


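It is convenient to write (2.5) - (2.9) in matrix form for the convergence analysis. Thus, setting $F = (d_{ij})_{i,j=1}^{N-1}$, where

\[ d_{k,k} = 6, \quad d_{k,k\pm1} = -4, \quad d_{k,k\pm2} = 1 \quad (k = 3, 4, \dots, N - 3), \]

\[ d_{N-2,N-3} = \mu_{-1}, \quad d_{N-2,N-4} = \mu_{-2}, \]

\[ d_{N-1,N-1} = \hat\mu_1, \quad d_{N-1,N-2} = \hat\mu_2, \quad d_{N-1,N-3} = \hat\mu_3, \]

and $G(Y) = (g_1, \dots, g_{N-1})^T$, where

\[ g_k = -h^4\left[2a_0 f_k + a_1\left(f_{k+1} + f_{k-1}\right)\right], \quad k = 3, 4, \dots, N - 3. \]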


We may write (2.5) - (2.9) in matrix form as

\[ FY + G(Y) + Q = T. \tag{2.10} \]

Let $\bar{Y}$ be an approximation of Y. Then

\[ F\bar{Y} + G(\bar{Y}) + Q = 0. \tag{2.11} \]

Subtracting (2.11) from (2.10) and setting

\[ Y - \bar{Y} = E = (e_1, e_2, \dots, e_{N-1})^T, \]

we obtain the error equation

\[ (F + h^4 MU)E = T, \tag{2.12} \]

where $M = (m_{ij})_{i,j=1}^{N-1}$ is a penta-diagonal matrix with

\[ m_{k,k} = 2a_0, \quad m_{k,k\pm1} = a_1, \quad m_{k,k\pm2} = 0 \quad (k = 3, 4, \dots, N - 3), \]

\[ m_{N-2,N-1} = \nu_1, \quad m_{N-2,N-2} = \nu_0, \quad m_{N-2,N-3} = \nu_{-1}, \quad m_{N-2,N-4} = \nu_{-2}, \]

\[ m_{N-1,N-1} = \hat\nu_1, \quad m_{N-1,N-2} = \hat\nu_2, \quad m_{N-1,N-3} = \hat\nu_3. \]

2.4 CONVERGENCE OF FINITE DIFFERENCE SCHEME

Lemma:

Proof: The matrix F can be partitioned as

\[ F = \begin{pmatrix} \hat{F} & 0 \\ \ast & \ast \end{pmatrix}, \qquad \text{where} \quad \hat{F} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

a is a 2 x 2 matrix, b is a 2 x (N - 5) matrix, c is the (N - 5) x 2 matrix

\[ c = \begin{pmatrix} 1 & -4 \\ 0 & 1 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}, \]

and d is the penta-diagonal (N - 5) x (N - 5) matrix with

\[ d_{i,i} = 6, \quad d_{i,i\pm1} = -4, \quad d_{i,i\pm2} = 1. \]

Assume that

\[ (\hat{F})^{-1} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}. \]

This results in the following system of equations:

\[ aA + bC = I, \tag{2.13} \]

\[ aB + bD = 0, \tag{2.14} \]

\[ cA + dC = 0, \tag{2.15} \]

\[ cB + dD = I. \tag{2.16} \]

From (2.13) and (2.15),

\[ A = (a - b\,d^{-1}c)^{-1} = \tilde{a} \text{ (say)} = (\tilde{a}_{ij})_{i,j=1}^{2}. \]

From (2.14) and (2.16),

\[ B = -\tilde{a}\,b\,d^{-1}, \qquad D = d^{-1} + d^{-1}c\,\tilde{a}\,b\,d^{-1}. \tag{2.17} \]
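The block formulas (2.13) - (2.17) are the usual Schur-complement expressions for the inverse of a 2 x 2 block matrix. A small exact-arithmetic check is sketched below. The entries chosen for the blocks a and b are hypothetical stand-ins (the actual boundary rows depend on c), and the helpers `matmul`, `add`, `inverse` and `block_inverse` are illustrative names, not part of any library.

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, sign=1):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def inverse(X):
    """Gauss-Jordan inverse in exact rational arithmetic."""
    n = len(X)
    aug = [[Fr(v) for v in row] + [Fr(int(i == j)) for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block_inverse(a, b, c, d):
    """Blocks A, B, C, D of the inverse of [[a, b], [c, d]] following (2.13)-(2.17)."""
    d_inv = inverse(d)
    A = inverse(add(a, matmul(matmul(b, d_inv), c), sign=-1))        # (a - b d^-1 c)^-1
    B = [[-v for v in row] for row in matmul(matmul(A, b), d_inv)]   # -A b d^-1
    C = [[-v for v in row] for row in matmul(matmul(d_inv, c), A)]   # -d^-1 c A, from (2.15)
    D = add(d_inv, matmul(matmul(matmul(matmul(d_inv, c), A), b), d_inv))  # (2.17)
    return A, B, C, D


# Hypothetical 5 x 5 example: a and b are stand-ins, c and d follow the text.
a = [[9, -4], [-4, 6]]
b = [[1, 0, 0], [-4, 1, 0]]
c = [[1, -4], [0, 1], [0, 0]]
d = [[6, -4, 1], [-4, 6, -4], [1, -4, 6]]
```

Comparing the four blocks with a direct inverse of the assembled matrix confirms that (2.17) gives the lower-right block D.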

We next show that $\|D\| \le O(N^4)$ (in the sup norm). From (2.17), let

\[ k = [\,c\,\tilde{a}\,b\,]_{(N-5)\times(N-5)} = (k_{ij})_{i,j=1}^{N-5}, \]

where

\[ k_{21} = l_3\alpha_3 + l_4\alpha_1, \qquad k_{22} = l_4. \]

All terms of the matrix k other than $k_{11}$, $k_{12}$, $k_{21}$ and $k_{22}$ are zero.

Also, since $d^{-1} = (D_{ij})_{i,j=1}^{N-5}$, the matrix $d^{-1}c\,\tilde{a}\,b\,d^{-1}$ is the (N - 5) x (N - 5) matrix whose (i, j)th element is given by

\[ D_{1j}\left(k_{11}D_{i1} + k_{21}D_{i2}\right) + D_{2j}\left(k_{12}D_{i1} + k_{22}D_{i2}\right), \]

where

\[ D_{ij} = \frac{(N - i - 6)(N - i - 5)\,j(j + 1)}{6(N - 5)(N - 6)(N - 4)}\left[\,i(j + 2)(N - 6) - (i + 1)(j - 1)(N - 4)\right] \quad \text{for } i \ge j, \tag{2.18} \]

and

\[ D_{ij} = \frac{(N - j - 6)(N - j - 5)\,i(i + 1)}{6(N - 5)(N - 6)(N - 4)}\left[\,j(i + 2)(N - 6) - (j + 1)(i - 1)(N - 4)\right] \quad \text{for } i \le j \text{ [10]}. \]

Adding all terms of any row i, we get

\[ \left(k_{11}D_{i1} + k_{21}D_{i2}\right)\sum_{j=1}^{N-5} D_{1j} + \left(k_{12}D_{i1} + k_{22}D_{i2}\right)\sum_{j=1}^{N-5} D_{2j}. \]

It can be easily shown that

\[ k_{11}D_{i1} + k_{21}D_{i2} \le O(1), \qquad k_{12}D_{i1} + k_{22}D_{i2} \le O(1). \]

Also

\[ \sum_{j=1}^{N-5} D_{1j} = \frac{(N - 6)(N - 7)}{12} \le O(N^2), \qquad \sum_{j=1}^{N-5} D_{2j} = \frac{N^3 - 21N^2 + 142N - 316}{4(N - 6)} \le O(N^2), \]

so that

\[ \text{sum of all terms of any row } i \le O(N^2). \tag{2.19} \]

From (2.18) and (2.19), together with (2.17),

\[ \|D\| \le \|d^{-1}\| + \|d^{-1}c\,\tilde{a}\,b\,d^{-1}\|, \qquad \|D\| \le O(N^4). \]

Hence $\|\hat{F}^{-1}\| \le O(N^4)$, since A, B, C and D all have sup norm less than or equal to $O(N^4)$.

Now partitioning the matrix F as

\[ F = \begin{pmatrix} p & q \\ r & s \end{pmatrix}, \]

where $p = [\hat{F}]_{(N-3)\times(N-3)}$, $q = [0]_{(N-3)\times 2}$, and r and s are the remaining 2 x (N - 3) and 2 x 2 blocks, it can be shown in a similar manner that $\|F^{-1}\| \le O(N^4)$.


From (2.12), taking the sup norm of both sides, we obtain

\[ \|E\| \le \frac{\|F^{-1}\|\,\|T\|}{1 - h^4\,\|F^{-1}\|\,\|M\|\,\|U\|}, \tag{2.20} \]

provided $U^* = \sup_{1 \le k \le N}|U_k| < c$ for an appropriate positive number c.

It is shown above that $\|F^{-1}\| \le O(N^4)$.

2.5 NUMERICAL ILLUSTRATION

For the numerical illustration, we consider a linear two-point boundary value problem subject to the boundary conditions

\[ y(0) = 1.0, \quad y(1) = 0, \quad y''(0) = -1.0, \quad y''(1) = -6e, \tag{2.21} \]

with exact solution $y(x) = (1 - x^2)\exp(x)$. We solved (2.21) using the classical second-order method and the fourth-order method given by (2.5) - (2.9).

Let

$T_{j,N}$ = time taken to solve the N x N algebraic system on the jth processor using the fourth-order scheme,

$T_{p,N} = \max_j \{T_{j,N}\}$,

$T_{1,n}$ = time taken to solve the (n - 1) x (n - 1) system to obtain $y_1, y_2, \dots, y_{n-1}$ at the (n - 1) abscissas $x_1, x_2, \dots, x_{n-1}$ respectively, using the classical second-order method,

$\|E\|_N^{(j)}$ = maximum absolute error obtained while solving the N x N system on the jth processor using the fourth-order method,

$\|E\|_N^{(p)} = \max_j \{\|E\|_N^{(j)}\}$,

$\|E\|_n^{(1)}$ = maximum absolute error obtained while solving the (n - 1) x (n - 1) system using the classical second-order method.

Then p (p = 8, 16, 32, 64, 128) N x N systems, for N = 8, 16, 32, 64 and 128 respectively, were solved in parallel, each on a single processor, and the CPU time was noted for each. The speed-up was calculated by comparing $T_{1,n}$ (obtained using the classical second-order method as benchmark) and $T_{p,N}$ for values of the errors satisfying $\|E\|_N^{(p)} \le \|E\|_n^{(1)}$, with $\|E\|_N^{(p)}$ as close to $\|E\|_n^{(1)}$ as possible.

The results are presented in Table 1. We note that as the system of algebraic equations becomes large (n = 1024, 4096), the speed-up improves considerably and is very nearly equal to the number of processors used.


TABLE 1

Classical second-order method        | Fourth-order method                  | Speed-up
n      error        T_{1,n}          | N     p      error       T_{p,N}     | S_p = T_{1,n} / T_{p,N}
512    5.08 (-7)    2.0 (-2)         | 8     64     3.68 (-7)   6.0 (-4)    | 33.3
1024   1.33 (-7)    4.0 (-2)         | 64    16     2.85 (-8)   3.2 (-3)    | 12.5
2048   3.24 (-8)    8.0 (-2)         | 32    64     2.86 (-8)   1.6 (-3)    | 50
4096   8.11 (-9)    1.6 (-1)         | 32    128    1.85 (-9)   1.6 (-3)    | 100

Note: a.b (-c) means a.b x 10^(-c).

2.6 THEORETICAL ESTIMATES

We next calculate theoretically the speed-up for the linear two-point boundary value problem

\[ y^{(iv)} = f(x)\,y + g(x), \qquad y(0) = A_1, \; y(1) = B_1, \; y''(0) = A_2, \; y''(1) = B_2, \tag{2.22} \]

as solved by an algorithm using the fourth-order method for non-uniform meshes.

Suppose

$k_1$ = time for evaluating f(x) at any $x \in [0, 1]$,

$k_2$ = time for evaluating g(x) at any $x \in [0, 1]$,

$\gamma$ = time for a single addition/subtraction,

$\mu$ = time for a single multiplication,

$\nu$ = time for a single division.

For our linear TPBVP, the following estimates were observed:

\[ k_1 = 1.58 \times 10^{-6}, \quad k_2 = 1.456 \times 10^{-5}, \quad \gamma = 6.08 \times 10^{-7}, \quad \mu = \nu = 6.95 \times 10^{-7}. \]

Then the time taken for solving (2.22) by the classical second-order method is given by

\[ T_{1,n} = 20\gamma + 26\mu + (n + 1)k_1 + (n + 1)k_2 + (13n - 35)\gamma + (16n - 47)\mu, \]

\[ T_{1,n} = (13n - 15)\gamma + (16n - 21)\mu + (n + 1)(k_1 + k_2), \]

and the time taken by the parallel algorithm is given by

\[ T_{p,N} = 112\gamma + 204\mu + (N + 2)(k_1 + k_2) + (17N - 39)\gamma + (26N - 71)\mu, \]

\[ T_{p,N} = (17N + 73)\gamma + (26N + 133)\mu + (N + 2)(k_1 + k_2). \]

Thus the speed-up is given by

\[ S_p = \frac{T_{1,n}}{T_{p,N}} = \frac{(13n - 15)\gamma + (16n - 21)\mu + (n + 1)(k_1 + k_2)}{(17N + 73)\gamma + (26N + 133)\mu + (N + 2)(k_1 + k_2)}. \]

Thus if n = 512 and N = 8, the speed-up is $S_p = 34.22$. The theoretical estimates thus confirm the results obtained in Section 2.5.
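The theoretical speed-up can be evaluated directly from the operation counts. The short sketch below uses the timing estimates quoted in this section and the simplified expressions for $T_{1,n}$ and $T_{p,N}$, with the constant term $(17N + 73)\gamma$ in the denominator; the function names are illustrative.

```python
K1, K2 = 1.58e-6, 1.456e-5   # times for evaluating f(x) and g(x)
GAMMA = 6.08e-7              # time for one addition/subtraction
MU = 6.95e-7                 # time for one multiplication (division taken equal)


def t_classical(n):
    """T_{1,n}: classical second-order method on the full (n - 1) x (n - 1) system."""
    return (13 * n - 15) * GAMMA + (16 * n - 21) * MU + (n + 1) * (K1 + K2)


def t_parallel(N):
    """T_{p,N}: fourth-order method on one N x N system."""
    return (17 * N + 73) * GAMMA + (26 * N + 133) * MU + (N + 2) * (K1 + K2)


def speed_up(n, N):
    return t_classical(n) / t_parallel(N)


if __name__ == "__main__":
    print(round(speed_up(512, 8), 2))
```

For n = 512 and N = 8 this gives a value of roughly 34.2, in line with the figure quoted above and with the measured 33.3 of Table 1.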

We may note that in case f(x, y) is non-linear, the theoretical estimates for the speed-up would be a function of the numbers of iterations $n_1$ and $n_2$ required for the convergence of the classical second-order and the fourth-order method respectively.


2.7 EXTENSION OF PARALLEL ALGORITHM FOR ELLIPTIC PDE's

We consider the solution of the linear partial differential equation

\[ A\frac{\partial^2 u}{\partial x^2} + C\frac{\partial^2 u}{\partial y^2} + D\frac{\partial u}{\partial x} + E\frac{\partial u}{\partial y} + Fu = F^* \tag{2.24} \]

in the region $\Re$ with boundary $\partial\Re$, where A, C, D, E, F, F* are continuous functions of x and y in $\Re + \partial\Re$. We shall assume that A > 0, C > 0 and F ≤ 0, so that the ellipticity condition and the weak min-max principle are satisfied. The boundary conditions (BC), as is well known, can be one of the following three types:

(i) The Dirichlet BC

We have

\[ u(x, y) = g(x, y), \quad (x, y) \in \partial\Re, \tag{2.25} \]

where g(x, y) is a prescribed function which is defined and continuous on $\partial\Re$. The condition (2.25) is the Dirichlet condition (DC).

(ii) The Neumann BC

We have

\[ \frac{\partial u}{\partial n} = g(x, y), \quad (x, y) \in \partial\Re, \]

where g(x, y) is a prescribed function defined and continuous on $\partial\Re$ and n is the outwardly directed normal. This is the Neumann BC.

(iii) Mixed BC

We have

\[ \frac{\partial u}{\partial n} + \alpha(x, y)\,u = g(x, y), \quad (x, y) \in \partial\Re, \tag{2.26} \]

where $\alpha(x, y) > 0$ and g(x, y) are defined and continuous on $\partial\Re$.

First of all we concentrate our attention on the Poisson equation, namely,

\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = F^*. \tag{2.27} \]

Thus, A = C = 1, D = E = F = 0 and F* ≠ 0 in (2.24), and $\partial\Re$ is the boundary of the rectangular region on which the BCs are specified. Thus

\[ u(0, y) = g(0, y), \quad u(1, y) = g(1, y), \quad 0 \le y \le 1, \]

\[ u(x, 0) = g(x, 0), \quad u(x, 1) = g(x, 1), \quad 0 \le x \le 1. \]

[Figure: the rectangular region, with axes x and y and origin 0]

In order to find the numerical solution of (2.24) with appropriate BCs (in this case, say (2.28)), we superimpose on $\Re$ a rectangular network with mesh lengths h and k in the x and y directions respectively. Thus, the nodal points are given by

\[ x_l = x_0 + lh, \quad l = 0, \pm1, \pm2, \dots, \]

\[ y_m = y_0 + mk, \quad m = 0, \pm1, \pm2, \dots. \]

The aim is to find the solution at the nodal points $(x_l, y_m)$ for small values of the mesh spacings h and k.


In this section, our basic aim is to consider the development of parallel algorithms for the class of BVPs (2.27) - (2.28). Because elliptic PDEs always occur as BVPs, these algorithms could, in principle, be obtained as an extension of the parallel algorithm presented in [1].

In this case, following [1], we divide [0, 1] along the x-axis into p different divisions, each division consisting of N or (N + 1) (N small) unequal intervals. Similarly, we divide [0, 1] along the y-axis into p different divisions. We then have a total of p^2 regions (p a power of 2) on which the solution of (2.24) can be obtained simultaneously, provided p^2 processors are available.

For the solution of (2.24) on the coarser p^2 grids, we would have to develop high-order finite difference schemes for the BVP (2.27) - (2.28). As in [1], a high-order scheme would have to be developed for a general non-uniform grid on the rectangular domain. Once the high-order methods are successfully devised, the scheme is applied to each of the p^2 regions on which the BVP is defined. This finally leads to N x N or (N - 1) x (N - 1) systems of linear or non-linear equations, which are solved concurrently on the processors. As an example, let us consider the case when p = 4, $N_h = 4$, $N_k = 3$. The final region in which the solution has to be obtained is the following:

[Figure: the final region on the unit square, with axes x and y]

The p^2 processors solve the problem (2.24) concurrently on the following 16 grids. This yields the solution at all points depicted in Figure 1, but as the solution is found in parallel, a substantial speed-up is expected, especially since no communication is required amongst the processors.

[Figures: Grid # 0, followed by Grids # 1 to # 4 on the unit square; x-axis mesh length 1/4, y-axis interval lengths 1/3, 1/6 and 1/12 as labelled]

For each different division of h, 4 different grids (pertaining to different k's) can be drawn. So the total number of grids is 4 x 4 = 16. Four grids pertaining to h = 1/4 are shown in the figure. Twelve more grids can be drawn. If all the grids are superimposed, we get the solution of the BVP (2.24) at the nodes of GRID # 0.

Now, we have to develop a high-order finite difference scheme for obtaining the solution simultaneously on the p^2 regions, consisting of coarser non-uniform grids. Before the actual implementation, the stability and convergence of the scheme should be proved, which could be a tricky affair in itself.

As an example, consider the problem (2.27) - (2.28), i.e. the Poisson equation together with the BCs (2.28). Write

\[ \frac{\partial^2 u}{\partial x^2} = \frac{F^*}{2} \quad \text{and} \quad \frac{\partial^2 u}{\partial y^2} = \frac{F^*}{2}. \tag{2.29} \]

Discretization of the above differential equations leads to

\[ u_{k+1,l} - 2u_{k,l} + u_{k-1,l} = \frac{h^2}{12}\,\frac{\left(F^*_{k+1,l} + 10F^*_{k,l} + F^*_{k-1,l}\right)}{2}, \tag{2.30} \]

\[ u_{k,l+1} - 2u_{k,l} + u_{k,l-1} = \frac{k^2}{12}\,\frac{\left(F^*_{k,l+1} + 10F^*_{k,l} + F^*_{k,l-1}\right)}{2}. \tag{2.31} \]

Adding (2.30) and (2.31), with equal mesh lengths in the x and y directions, gives

\[ u_{k+1,l} + u_{k-1,l} - 4u_{k,l} + u_{k,l+1} + u_{k,l-1} = \frac{h^2}{24}\left(F^*_{k+1,l} + F^*_{k-1,l} + 20F^*_{k,l} + F^*_{k,l-1} + F^*_{k,l+1}\right). \tag{2.32} \]

Similarly, relations (2.33) and (2.34) are obtained; adding (2.33) and (2.34) gives (2.35) for

\[ 1 \le k \le N, \quad 1 \le l \le N', \]

where [0, 1] is subdivided as

\[ 0 = x_0 < x_1 < \dots < x_{N+1} = 1 \quad \text{and} \quad 0 = y_0 < y_1 < \dots < y_{N'+1} = 1. \]

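Similarly, relation (2.36) is obtained, together with

\[ u_{k,N+1} - (1 + \mu_N)u_{k,N} + \mu_N u_{k,N-1} = \frac{k^2}{12}\,\frac{\left[(\mu_N^2 + \mu_N - 1)F^*_{k,N+1} + (\mu_N + 1)(\mu_N^2 + 3\mu_N + 1)F^*_{k,N} + \mu_N(1 + \mu_N - \mu_N^2)F^*_{k,N-1}\right]}{2} + t_N. \tag{2.37} \]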
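The weights in (2.37) are those of a three-point Numerov-type formula on a non-uniform mesh. A minimal check is sketched below for the model relation $u'' = f$ (so f plays the role of $F^*/2$), under the assumption that the step to the left of the central node is 1 and the step to the right is $\mu$; the function names are hypothetical.

```python
from fractions import Fraction as Fr


def weights(mu):
    """Weights of f at the right, centre and left nodes, before the factor 1/12."""
    return (mu ** 2 + mu - 1,
            (mu + 1) * (mu ** 2 + 3 * mu + 1),
            mu * (1 + mu - mu ** 2))


def residual(mu, m):
    """Left side minus right side of the three-point relation for u = y**m, u'' = f."""
    u = lambda y: Fr(y) ** m
    f = lambda y: Fr(m * (m - 1)) * Fr(y) ** (m - 2) if m >= 2 else Fr(0)
    wr, wc, wl = weights(mu)
    lhs = u(mu) - (1 + mu) * u(0) + mu * u(-1)
    rhs = (wr * f(mu) + wc * f(0) + wl * f(-1)) / 12
    return lhs - rhs


if __name__ == "__main__":
    for m in range(6):
        print(m, residual(Fr(1, 2), m), residual(Fr(1), m))
```

The relation is exact for polynomials up to degree 4 for any mesh ratio, and up to degree 5 when $\mu = 1$, where the weights reduce to the uniform 1, 10, 1 of (2.30).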

Adding (2.36) and (2.37) gives (2.38).

The equations (2.32), (2.35) and (2.38) can be written in matrix form as

\[ AY + G(Y) + Q = T \tag{2.39} \]

for appropriate matrices A, G(Y) and Q, where

\[ Y = [u_{11}, u_{12}, \dots, u_{1N'}, u_{21}, \dots, u_{2N'}, \dots, u_{NN'}]. \]

Let $\bar{Y}$ be an approximation to Y. Then

\[ A\bar{Y} + G(\bar{Y}) + Q = 0. \tag{2.40} \]

We can show that the matrix A is irreducible and monotone since its off diagonal

elements are non-positive. Also it can be shown that

(2.41)

Finally, there is no basic reason why our algorithm should not work for BVPs and initial value problems with other kinds of PDEs. For example, we could easily consider the parabolic equation

\[ \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} \]

with any of the following conditions:

(i) Initial condition

\[ u(x, 0) = f(x), \quad 0 \le x < \infty \]

Boundary conditions

\[ a_0(0, t)\,u + a_1(0, t)\frac{\partial u}{\partial x} = a_2(0, t), \]

where $a_0(0, t) \ge 0$, $a_1(0, t) \le 0$ and $a_0 - a_1 > 0$, and also a condition at $x = \infty$, $t \ge 0$.

(ii) Initial condition

\[ u(x, 0) = f(x), \quad a \le x \le b \]

Boundary conditions

\[ a_0(a, t)\,u + a_1(a, t)\frac{\partial u}{\partial x} = a_2(a, t), \]

\[ b_0(b, t)\,u + b_1(b, t)\frac{\partial u}{\partial x} = b_2(b, t), \]

where $a_0(a, t) \ge 0$, $a_1(a, t) \le 0$ and $a_0 - a_1 > 0$, and $b_0(b, t) \ge 0$, $b_1(b, t) \le 0$ and $b_0 - b_1 > 0$; then this is an initial boundary value problem.

Thus, we see that our algorithm is extremely versatile in the sense that the same algorithm can be used, in parallel, to obtain the solution of varied BVPs. The only requirement is that we must be able to obtain high-order and very high-order finite difference methods for the relevant BVPs and IVPs. Since no communication takes place between the processors, the speed-up expected is substantial.


REFERENCES

[1] M. Paprzycki and I. Gladwell, A parallel chopping algorithm for ODE boundary value problems, Parallel Comput. 19 (1993) 651-666.

[2] C. P. Katti and S. Goel, A parallel mesh chopping algorithm for a class of two-point boundary value problems, Computer Math. Applic. 35, No. 9 (1998) 121-128.

[3] U. Ascher and S. Y. P. Chan, On parallel methods for boundary value ODEs, University of British Columbia, Department of Computer Science, Technical Report 89-119, 1989.

[4] M. Paprzycki and I. Gladwell, Solving almost block diagonal systems on parallel computers, Parallel Comput. 17 (1991) 133-153.

[5] M. Paprzycki and I. Gladwell, Solving almost block diagonal systems using level 3 BLAS, in Proc. 5th SIAM Conference on Parallel Processing in Scientific Computation (edited by J. Dongarra et al.), SIAM, Philadelphia, PA (1992) 52-62.

[6] P. Henrici, Discrete Variable Methods in Ordinary Differential Equations, John Wiley, New York (1962).

[7] U. Ascher, J. Christiansen and R. D. Russell, Collocation software for boundary value ODEs, ACM Trans. Math. Soft. 7 (1981) 209-229.

[8] A. Bader and U. Ascher, A new basis implementation for a mixed-order boundary value ODE solver, SIAM J. Sci. Stat. Comp. 8 (1987) 483-500.

[9] G. S. Subramianium, Variable mesh difference methods for the solution of two-point singular perturbation BVPs, Ph.D. Thesis, IITD, Indian Institute of Technology, Delhi (1982).

[10] Riaz A. Usmani and Dereck S. Meek, On the application of a five-band matrix in the numerical solution of a boundary value problem, Utilitas Mathematica 14 (1978) 21-29.

[11] L. F. Shampine, Boundary value problems for ordinary differential equations, SIAM Journal of Numerical Analysis 2 (1968) 219-242.

[12] M. M. Chawla and C. P. Katti, A finite difference method for a class of singular two-point boundary value problems, IMA Journal of Numerical Analysis 4 (1984) 457-466.

[13] J. R. Cash and A. Singhal, High order methods for the numerical solution of two-point boundary value problems, BIT 22 (1982) 184-199.