
Parallel Computing 33 (2007) 109–115

www.elsevier.com/locate/parco

Three dimensional MPI parallel implementation of the PML algorithm for truncating finite-difference time-domain grids

Omar Ramadan *

Computer Engineering Department, Eastern Mediterranean University, Gazi Magusa, Mersin 10, Turkey

Available online 10 January 2007

Abstract

A three dimensional parallel implementation of the perfectly matched layer (PML) formulations is presented for truncating finite-difference time-domain (FDTD) grids. The FDTD computational domain is divided into subdomains using a one-dimensional topology, and the interprocessor communication operations between the subdomains are carried out using the message-passing interface (MPI) library. The validity of the proposed algorithm is shown through numerical simulations carried out for a point source radiating in three-dimensional domains of different sizes, performed on a network of PCs interconnected with Ethernet.
© 2006 Published by Elsevier B.V.

Keywords: Finite difference time domain; Parallel computing; Message-passing interface; Perfectly matched layer

1. Introduction

In the last decade, the finite-difference time-domain (FDTD) [1] method has been widely used for solving electromagnetic problems [2]. The main advantage of the FDTD method is that it is a straightforward solution of the six coupled field components of Maxwell's curl equations. The method is based on Yee's algorithm [1] and computes the electric and magnetic field components by discretizing Maxwell's curl equations in both time and space, and then solving the discretized equations in a time-marching sequence by alternately calculating the electric and magnetic fields in the computational domain. Analyzing large problems with the FDTD method requires intensive computing time and memory storage. Hence, parallelizing the FDTD algorithm has become one of the latest challenges in FDTD research. In this case, the FDTD computational domain needs to be spatially decomposed into contiguous non-overlapping subdomains, where each subdomain is processed by one processor. Therefore, to update the field

0167-8191/$ - see front matter © 2006 Published by Elsevier B.V.

doi:10.1016/j.parco.2006.11.003

* Tel.: +90 392 630 1194; fax: +90 392 365 0711. E-mail address: [email protected].


components that exist on the interfaces between the FDTD subdomains, it is necessary to exchange data between neighboring processors. Hence, interprocessor communication is needed. Recently, the message-passing interface (MPI) system [3,4] has been successfully introduced to carry out the interprocessor communications.

Another important issue in FDTD research is the requirement for efficient absorbing boundary conditions (ABCs) to truncate open-region problems. Berenger's perfectly matched layer (PML) has been shown to be one of the most effective FDTD ABCs [5]. Berenger's PML consists of a lossy, artificial layer with a thickness varying from 4 to 12 cells, and it can generally be placed very close to the radiating structures located inside the inner FDTD computational domain. This type of PML is based on splitting each field component into two subcomponents. In [6], a new parallel algorithm based on the split-field Berenger PML formulations was introduced. Recently, an unsplit-field parallel PML algorithm, based on the anisotropic PML [7], was introduced for truncating FDTD domains [8].

In this paper, an alternative unsplit-field parallel PML algorithm, based on the MPI library, is introduced for truncating three-dimensional (3-D) FDTD domains. The proposed algorithm is based on the stretched coordinate PML formulations [9]. The contribution of this paper is twofold. First, unsplit-field PML formulations are introduced, in which only two additional first order differential equations are needed per field component. Second, the proposed parallel algorithm has the same number of interprocessor communication operations as the parallel implementation of the conventional FDTD algorithm, and therefore it can be applied in the PML region at the domain boundaries as well as in the inner FDTD computational domain. In the proposed algorithm, the computational domain is divided into subdomains along one direction using a one-dimensional topology. The validity of the proposed parallel algorithm is shown through numerical simulations carried out for a point source radiating in 3-D domains, performed on a network of PCs interconnected with Ethernet.

The paper is organized as follows. In Sections 2 and 3, the formulations of the FDTD and PML algorithms are presented, respectively. In Section 4, the proposed parallelization technique is described. Section 5 includes the results that validate the proposed parallel algorithm. Finally, a summary and conclusions are given in Section 6.

2. The finite difference time domain method

In a linear, homogeneous, isotropic and lossless medium, the normalized Maxwell's equations can be written in the frequency domain as

c\,\nabla \times \mathbf{H} = j\omega \mathbf{E} \qquad (1)

c\,\nabla \times \mathbf{E} = -j\omega \mathbf{H} \qquad (2)

where c is the speed of light in vacuum. Eqs. (1) and (2) can be decomposed into a system of six first order scalar differential equations in terms of the fields Ex, Ey, Ez, Hx, Hy, and Hz. As an example, the Ez field component can be written as

\frac{\partial E_z}{\partial t} = c\left(\frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y}\right) \qquad (3)

Using the Yee’s FDTD algorithm [1], the six field components are organized in the Yee’s unit cell as shown inFig. 1: the electric field components lies on the center of the edges of the Yee’s cell and computed at the timet = (n + 1)Dt, where Dt is the time step. On the other hand, the magnetic fields are located at the center of eachface of the Yee’s unit cell and computed at the time t = (n + 1/2)Dt. Applying central-differencing discretiza-tion in both space and time to the continuous differential operators of (3), Ez can be computed as

E_z\big|^{n+1}_{i,j,k+1/2} = E_z\big|^{n}_{i,j,k+1/2} + v\left(H_y\big|^{n+1/2}_{i+1/2,j,k+1/2} - H_y\big|^{n+1/2}_{i-1/2,j,k+1/2} - H_x\big|^{n+1/2}_{i,j+1/2,k+1/2} + H_x\big|^{n+1/2}_{i,j-1/2,k+1/2}\right) \qquad (4)

where Δ = Δx = Δy = Δz is the space cell size in the x-, y-, and z-directions, and v = cΔt/Δ. Similar formulations can be obtained for the other field components.
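The update (4) can be sketched with NumPy array slicing. This is an illustrative sketch, not the paper's code: the function name and the array layout below are assumptions.

```python
import numpy as np

# Assumed layout (not from the paper): ez[i, j, k] holds Ez at
# (i, j, k+1/2); hy[i, j, k] holds Hy at (i+1/2, j, k+1/2);
# hx[i, j, k] holds Hx at (i, j+1/2, k+1/2).
def update_ez(ez, hx, hy, v):
    """In-place interior Ez update of Eq. (4): Ez += v*(dHy/dx - dHx/dy)."""
    ez[1:-1, 1:-1, :] += v * (
        (hy[1:-1, 1:-1, :] - hy[:-2, 1:-1, :])    # Hy(i+1/2) - Hy(i-1/2)
        - (hx[1:-1, 1:-1, :] - hx[1:-1, :-2, :])  # Hx(j+1/2) - Hx(j-1/2)
    )
    return ez
```

Here v = cΔt/Δ as in the text; cells on the subdomain boundaries are left to the communication step of Section 4.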


[Figure: Yee's unit cell with cell dimensions ΔX, ΔY, ΔZ, showing the electric field components (Ex, Ey, Ez) on the cell edges and the magnetic field components (Hx, Hy, Hz) on the face centers.]

Fig. 1. Yee's unit cell.


3. PML formulation

Using the stretched coordinate PML formulations [9], the frequency domain modified Maxwell's equations in the PML region at the domain boundaries can be written as

c\,\nabla_s \times \mathbf{H} = j\omega \mathbf{E} \qquad (5)

c\,\nabla_s \times \mathbf{E} = -j\omega \mathbf{H} \qquad (6)

where

\nabla_s = \sum_{g=x,y,z} \hat{a}_g \frac{1}{S_g}\frac{\partial}{\partial g} \qquad (7)

where Sg (g = x,y,z) is the PML stretched coordinate variable in the g-direction and chosen as [9]

\frac{1}{S_g} = \frac{1}{1 + \sigma_g/j\omega\varepsilon_0} = 1 - \frac{\sigma_g/\varepsilon_0}{j\omega + \sigma_g/\varepsilon_0} \qquad (8)

where σg is the conductivity profile in the PML region along the g-direction. To discretize (5) and (6), consider as an example the Ez-field component of (5):

j\omega \tilde{E}_z = c\,\frac{1}{S_x}\frac{\partial \tilde{H}_y}{\partial x} - c\,\frac{1}{S_y}\frac{\partial \tilde{H}_x}{\partial y} \qquad (9)

where Ẽz, H̃x, and H̃y are the Fourier transforms of the corresponding fields. Substituting (8) into (9), the following can be obtained [10]:


j\omega \tilde{E}_z = c\,\frac{\partial \tilde{H}_y}{\partial x} - \tilde{f}_{zx} - c\,\frac{\partial \tilde{H}_x}{\partial y} + \tilde{f}_{zy} \qquad (10)

where the auxiliary variables f̃zx and f̃zy are given by [10]:

\tilde{f}_{zx} = c\,\frac{\sigma_x/\varepsilon_0}{j\omega + \sigma_x/\varepsilon_0}\,\frac{\partial \tilde{H}_y}{\partial x} \qquad (11)

\tilde{f}_{zy} = c\,\frac{\sigma_y/\varepsilon_0}{j\omega + \sigma_y/\varepsilon_0}\,\frac{\partial \tilde{H}_x}{\partial y} \qquad (12)

Transforming (10) into the time domain using the inverse Fourier transform relation jω → ∂/∂t, the following can be obtained:

\frac{\partial E_z}{\partial t} = c\,\frac{\partial H_y}{\partial x} - f_{zx} - c\,\frac{\partial H_x}{\partial y} + f_{zy} \qquad (13)

where fzx and fzy are given by the following first order differential equations obtained by transforming (11) and(12) into the time domain:

\frac{\partial f_{zx}}{\partial t} + \frac{\sigma_x}{\varepsilon_0} f_{zx} = c\,\frac{\sigma_x}{\varepsilon_0}\,\frac{\partial H_y}{\partial x} \qquad (14)

\frac{\partial f_{zy}}{\partial t} + \frac{\sigma_y}{\varepsilon_0} f_{zy} = c\,\frac{\sigma_y}{\varepsilon_0}\,\frac{\partial H_x}{\partial y} \qquad (15)
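As a consistency check, the time-domain auxiliary equation for fzx follows from (11) by clearing the denominator and applying jω → ∂/∂t term by term:

```latex
\left(j\omega + \frac{\sigma_x}{\varepsilon_0}\right)\tilde{f}_{zx}
  = c\,\frac{\sigma_x}{\varepsilon_0}\,\frac{\partial \tilde{H}_y}{\partial x}
\;\xrightarrow{\;j\omega \,\to\, \partial/\partial t\;}\;
\frac{\partial f_{zx}}{\partial t} + \frac{\sigma_x}{\varepsilon_0} f_{zx}
  = c\,\frac{\sigma_x}{\varepsilon_0}\,\frac{\partial H_y}{\partial x}
```

The same manipulation applied to (12) with σy and H̃x yields the fzy equation.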

By using the conventional Yee FDTD algorithm for discretizing the space and time derivatives in (13), the following FDTD expression for calculating the Ez field can be obtained:

E_z\big|^{n+1}_{i,j,k+1/2} = E_z\big|^{n}_{i,j,k+1/2} + v\left(H_y\big|^{n+1/2}_{i+1/2,j,k+1/2} - H_y\big|^{n+1/2}_{i-1/2,j,k+1/2}\right) - v\left(H_x\big|^{n+1/2}_{i,j+1/2,k+1/2} - H_x\big|^{n+1/2}_{i,j-1/2,k+1/2}\right) - f_{zx}\big|^{n+1}_{i,j,k+1/2} + f_{zy}\big|^{n+1}_{i,j,k+1/2} \qquad (16)

where the additional auxiliary variables are obtained by using the standard Yee FDTD algorithm for discretizing (14) and (15) as

f_{zx}\big|^{n+1}_{i,j,k+1/2} = r_{0i}\, f_{zx}\big|^{n}_{i,j,k+1/2} + r_{1i}\, v\left(H_y\big|^{n+1/2}_{i+1/2,j,k+1/2} - H_y\big|^{n+1/2}_{i-1/2,j,k+1/2}\right) \qquad (17)

f_{zy}\big|^{n+1}_{i,j,k+1/2} = r_{0j}\, f_{zy}\big|^{n}_{i,j,k+1/2} + r_{1j}\, v\left(H_x\big|^{n+1/2}_{i,j+1/2,k+1/2} - H_x\big|^{n+1/2}_{i,j-1/2,k+1/2}\right) \qquad (18)

where

r_{0l} = \frac{1 - a_l}{1 + a_l} \quad\text{and}\quad r_{1l} = \frac{2a_l}{1 + a_l}, \qquad \text{for } l = i, j \qquad (19)

with

a_l = \frac{\Delta t\,\sigma_l}{2\varepsilon_0} \qquad (20)

In the same manner, the Hy field component can be calculated as

H_y\big|^{n+1/2}_{i+1/2,j,k+1/2} = H_y\big|^{n-1/2}_{i+1/2,j,k+1/2} + v\left(E_z\big|^{n}_{i+1,j,k+1/2} - E_z\big|^{n}_{i,j,k+1/2}\right) - v\left(E_x\big|^{n}_{i+1/2,j,k+1} - E_x\big|^{n}_{i+1/2,j,k}\right) - h_{yx}\big|^{n+1/2}_{i+1/2,j,k+1/2} + h_{yy}\big|^{n+1/2}_{i+1/2,j,k+1/2} \qquad (21)

where the additional auxiliary variables are updated as

h_{yx}\big|^{n+1/2}_{i+1/2,j,k+1/2} = r_{0,i+1/2}\, h_{yx}\big|^{n-1/2}_{i+1/2,j,k+1/2} + r_{1,i+1/2}\, v\left(E_z\big|^{n}_{i+1,j,k+1/2} - E_z\big|^{n}_{i,j,k+1/2}\right) \qquad (22)

h_{yy}\big|^{n+1/2}_{i+1/2,j,k+1/2} = r_{0,k+1/2}\, h_{yy}\big|^{n-1/2}_{i+1/2,j,k+1/2} + r_{1,k+1/2}\, v\left(E_x\big|^{n}_{i+1/2,j,k+1} - E_x\big|^{n}_{i+1/2,j,k}\right) \qquad (23)

Similar expressions can be obtained for the other field components. It should be pointed out that the additional auxiliary variables introduced in the above formulations are zero outside the PML regions, because the PML conductivity profiles are zero there.
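The coefficients (19)-(20) and an auxiliary update of the form (17) can be sketched as follows. This is a minimal illustration: the function names are ours, and ε0 is the standard vacuum permittivity.

```python
EPS0 = 8.854187817e-12  # vacuum permittivity epsilon_0 (F/m)

def pml_coefficients(sigma, dt):
    """r0 and r1 of Eqs. (19)-(20) for a PML conductivity value sigma.

    a = dt*sigma/(2*eps0). Outside the PML sigma = 0, giving r0 = 1 and
    r1 = 0, so the auxiliary variables remain zero there.
    """
    a = dt * sigma / (2.0 * EPS0)
    r0 = (1.0 - a) / (1.0 + a)
    r1 = 2.0 * a / (1.0 + a)
    return r0, r1

def update_fzx(fzx, dhy, r0, r1, v):
    """Eq. (17): f_zx^{n+1} = r0*f_zx^n + r1*v*dhy, where dhy is the
    centered Hy difference along x at the Ez location."""
    return r0 * fzx + r1 * v * dhy
```

The semi-implicit averaging behind (19) keeps the update stable for arbitrarily large σ within the PML.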


4. Parallelizing the proposed PML algorithm

Using the one-dimensional topology, the computational domain is divided into subdomains along the x-direction, and each subdomain is assigned to one processor, as shown in Fig. 2. To calculate the field components at the subdomain boundaries, data from the neighboring subdomains are needed. In this paper, the MPI system is used to exchange data between processors; a complete description of MPI is provided in [3,4]. To calculate Ez using (16) at the cells located at the left boundary of a subdomain, the values of Hy from the subdomain on its left are needed. Similarly, the subdomain must send the values of Hy at the cells located at its right boundary to the subdomain on its right. To calculate Hy using (21) at the cells located at the right boundary of a subdomain, the values of Ez from the right subdomain are needed. Also, the subdomain should send the values of Ez at the cells located at its left boundary to the left subdomain. Fig. 3 shows the data to be exchanged between neighboring subdomains in order to parallelize the proposed PML formulations. Based on Fig. 3, the parallel implementation of the PML algorithm can be summarized as:

1. MPI initialization.
2. Read the simulation parameters.
3. Divide the computational domain into subdomains.
4. At each time step:
   4.1. Calculate the Ex, Ey and Ez field components and the additional auxiliary variables.
   4.2. Communicate Ey and Ez at the subdomain boundaries.
   4.3. Calculate the Hx, Hy and Hz field components and the additional auxiliary variables.
   4.4. Communicate Hy and Hz at the subdomain boundaries.
5. MPI finalization.
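Step 3 (dividing the domain along x) and the neighbor pattern behind steps 4.2 and 4.4 can be sketched as below. The helper names are ours, not the paper's; in an actual MPI code the boundary exchanges would be point-to-point calls (e.g. MPI_Sendrecv) between these neighbor ranks.

```python
def decompose_x(nx, num_procs, rank):
    """Half-open global cell range [start, stop) along x owned by `rank`
    in a 1-D slab decomposition; remainder cells go to the lowest ranks."""
    base, extra = divmod(nx, num_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def neighbors(rank, num_procs):
    """Left/right neighbor ranks for the exchanges of steps 4.2 and 4.4;
    None at a physical domain boundary (no exchange needed there)."""
    left = rank - 1 if rank > 0 else None
    right = rank + 1 if rank < num_procs - 1 else None
    return left, right
```

For example, the 300-cell x-dimension of Section 5 split over 8 PCs gives ranks 0-3 slabs of 38 cells and ranks 4-7 slabs of 37 cells.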

[Figure: the computational domain partitioned along the x-direction into slabs assigned to processors P0, P1, ..., PN; the y- and z-dimensions are not divided.]

Fig. 2. Computational domain partitioning.

[Figure: at the left boundary cells of a subdomain, Hy and Hz are received and Ey and Ez are sent; at the right boundary cells, Ey and Ez are received and Hy and Hz are sent.]

Fig. 3. Communications at the boundaries of a subdomain for the proposed parallel PML algorithm.


It should be pointed out that the computation of Ex and Hx at the subdomain boundaries can be carried out without interprocessor communication. This is because all the information needed to compute these fields is available within the subdomain: Maxwell's equations for these fields do not include the spatial derivative normal to the subdomain boundaries, i.e., ∂/∂x. Also, it is interesting to note that the proposed parallel PML algorithm involves the same number of interprocessor communication operations as the conventional FDTD algorithm. Hence, the proposed algorithm can be applied in the PML region at the domain boundaries as well as in the inner FDTD domain.

5. Simulation study

To validate the proposed formulations, numerical tests have been carried out for a 3-D radiation problem. A point source excites a 3-D domain at its center. The cell size was chosen as Δ = Δx = Δy = Δz = 15 mm and the time step was taken as Δt = 25 ps. The excitation pulse used in this test was chosen to be similar to the pulse used in [11] and is defined as

E_z^n = \begin{cases} a\left(10 - 15\cos\omega_1 t_n + 6\cos\omega_2 t_n - \cos\omega_3 t_n\right), & t_n \le \tau \\ 0, & t_n > \tau \end{cases} \qquad (24)

where a = 1/32, τ = 10⁻⁹ s, t_n = nΔt, and ω_m = 2πm/τ for m = 1, 2, 3. The tests were carried out for different numbers of processors and different computational domain sizes, and were performed on a network of Pentium IV PCs, each with 256 MB of memory, interconnected with 10 Mbit/s Ethernet. Table 1 shows the simulation time (in seconds) for the proposed parallel PML algorithm as obtained using one, four, and eight PCs for three different domain sizes. As can be seen from Table 1, the parallel implementation of the proposed PML algorithm provides a significant reduction in the simulation time compared with the serial PML implementation.

Table 1
Simulation time (in seconds) of the proposed parallel PML algorithm for different grid sizes

Grid size (cells)   1 PC      4 PCs    8 PCs
300 × 20 × 20       34.509    20.622   19.378
600 × 20 × 20       69.361    28.115   22.937
900 × 20 × 20       112.250   37.2     27.458

[Figure: speedup (0-10) versus number of PCs (0-10), showing the ideal speedup line and the measured curves for the 300 × 20 × 20, 600 × 20 × 20, and 900 × 20 × 20 grids.]

Fig. 4. Speedup for the proposed parallel algorithm.


To measure the performance of the parallel algorithms, the speedup was calculated as

S(N) = T(1)/T(N) \qquad (25)

where T(1) is the time needed to solve the problem using one PC and T(N) is the time needed to solve the same problem using N PCs. Fig. 4 shows the speedup obtained with four and eight PCs for the parallel implementation of the proposed PML formulations. For comparison, the ideal speedup is also shown. From Fig. 4, it can be observed that as the computational domain size increases, the efficiency of the parallel algorithm increases. On the other hand, when the computational domain is partitioned over many processors, especially for small domains, the efficiency of the parallelization reaches a limit. This is because the communication time needed to exchange data between the processors becomes comparable to the time needed to compute the field components.
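Applying (25) to the Table 1 timings confirms the trend. This is a small check script; the numbers are copied from Table 1:

```python
def speedup(t1, tn):
    """Eq. (25): S(N) = T(1)/T(N)."""
    return t1 / tn

# Simulation times in seconds from Table 1, keyed by grid size.
times = {
    (300, 20, 20): {1: 34.509, 4: 20.622, 8: 19.378},
    (600, 20, 20): {1: 69.361, 4: 28.115, 8: 22.937},
    (900, 20, 20): {1: 112.250, 4: 37.2, 8: 27.458},
}

# Speedup on 8 PCs grows with the grid size, but stays well below the
# ideal value of 8, reflecting the communication overhead noted above.
for grid, t in times.items():
    print(grid, round(speedup(t[1], t[8]), 2))
```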

6. Conclusion

In this paper, an efficient parallel PML algorithm, based on the MPI library, has been implemented for truncating 3-D FDTD problems. In this algorithm, the field equations in the PML region are not split, and only two additional first order differential equations are needed per field component. The performance of the proposed parallel algorithm has been studied using a point source radiating in 3-D domains. It has been observed that the proposed parallel PML algorithm requires the same number of interprocessor communications as the conventional parallel FDTD algorithm. Hence, the proposed algorithm can be applied in the inner FDTD domain as well as in the PML boundary layer. Furthermore, it must be pointed out that when the computational domain is partitioned over many processors, especially for small domains, the efficiency of the parallelization reaches a limit because the communication time between the processors becomes comparable to the computation time of the field components.

References

[1] K.S. Yee, Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media, IEEE Transactions on Antennas and Propagation 14 (1966) 302-307.
[2] A. Taflove, Computational Electrodynamics: The Finite-Difference Time-Domain Method, Artech House, Boston, 1995.
[3] W. Gropp, E. Lusk, A. Skjellum, Using MPI: Portable Parallel Programming with the Message-Passing Interface, MIT Press, Cambridge, MA, 1994.
[4] P.S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, San Francisco, CA, 1997.
[5] J.P. Berenger, A perfectly matched layer for the absorption of electromagnetic waves, Journal of Computational Physics 114 (1994) 185-200.
[6] H. Hoteit, R. Sauleau, B. Philippe, P. Coquet, J.P. Daniel, Vector and parallel implementations for the FDTD analysis of millimeter wave planar antennas, International Journal of High Speed Computing 10 (1999) 1-25.
[7] S.D. Gedney, An anisotropic perfectly matched layer absorbing medium for the truncation of FDTD lattices, IEEE Transactions on Antennas and Propagation 44 (1996) 1630-1639.
[8] C. Guiffaut, K. Mahdjoubi, A parallel FDTD algorithm using the MPI library, IEEE Antennas and Propagation Magazine 43 (2001) 94-103.
[9] W.C. Chew, W.H. Weedon, A 3-D perfectly matched medium from modified Maxwell's equations with stretched coordinates, Microwave and Optical Technology Letters 7 (1994) 599-604.
[10] O. Ramadan, Auxiliary differential equation formulation: an efficient implementation of the perfectly matched layer, IEEE Microwave and Wireless Components Letters 13 (2003) 69-71.
[11] P.A. Tirkas, C.A. Balanis, R.A. Renaut, Higher order absorbing boundary conditions for the finite-difference time-domain method, IEEE Transactions on Antennas and Propagation 40 (1992) 1215-1222.