TG08TG08 [email protected]@UHCL.edu 11
Toward Parallel Space Radiation Analysis
Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni,
Ahmed Khan, Sergio J. Larrondo, Susan Strausser,
Travis Gilbert, Victor Shum, Romeo Chua
University of Houston Clear Lake
Runtime Profile of HZETRN Functions (percent of total runtime)
phi (PHI interpolation function): 34.5
  Most time is spent here; 13,220,184 calls are made to this function over a program run.
logf.J: 30.43
Remaining functions: 9.03
expf.J: 8.72
iuni_: 4.36
anu: 4.26
texp: 2.91
prpli_: 1.97
powf.J: 1.35
prpgt_: 0.93
od_: 0.73
cvtas_s_to_a: 0.73
This project continues the space radiation research work performed last year by Dr. Liwen Shih’s students to investigate HZETRN code optimization options.
This semester we will analyze the HZETRN code using standard static analysis and runtime analysis tools. In addition, we will examine code parallelization options for the most-called numerical method in the source code: the PHI function.
What is Space Radiation?
Two major sources: galactic cosmic rays (GCR) and solar energetic particles (SEP).
GCR are ever-present and more energetic, so they are able to penetrate much thicker materials than SEP.
To evaluate the space radiation risk and to design spacecraft and habitats for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.
Space Radiation and the Earth
Earth protected from Space Radiation
Animation Sources: Rice University, Connections Program.
This image shows how the Earth's magnetic field causes electrons to drift one way about the Earth, while protons drift in the opposite direction.
original clips provided courtesy of Professor Patricia Reiff,
Rice University, Connections Program
What about Galactic Cosmic Radiation (GCR)?
A typical high energy particle of radiation found in the space environment is ionized itself and as it passes through material such as human tissue it disrupts the electronic clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.
HZETRN - Space Radiation Nuclear Transport Code
HZETRN: High Charge and Energy Nuclear Transport Code
FORTRAN-77
Written: 1992
Environment: VAX mainframe
Code Metrics:
Files: 3
Lines: 9665
Code Lines: 6803
Comment Lines: 2859
Declarative Statements: 780
Executable Statements: 6563
Ratio Comment/Code: 0.42
The three included source code files are:
1-NUCFRAG.FOR for generating nuclear absorption and reaction cross sections.
2-GEOMAG.FOR for defining the GCR transmission coefficient cutoff effects within the magnetosphere.
3-HZETRN.FOR for propagating the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2 and LH2.
HZETRN Numerical Method
HZETRN Calculates:
Radiation fluence of HZE particles: time-integrated flux of HZE particles per unit area.
Energy absorbed per gram: determined by first measuring the amount of energy left behind by the radiation in question and then the amount and type of material.
Dose equivalent: the amount of any type of radiation absorbed in biological tissue, expressed as a standardized value.
HZETRN Algorithm
HZETRN used for Mars Mission
NASA has a new vision for space exploration in the 21st century encompassing a broad range of human and robotic missions, including missions to the Moon, Mars and beyond. As a result, there is a focus on long duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation in long duration deep space missions is ‘the show stopper.’
Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics and tissues.
Martian Radiation Climate Modeling Using HZETRN Code
Calculations of the skin dose equivalent for astronauts on the surface of Mars near solar minimum.
The variation in the dose with respect to altitude is shown.
Higher altitudes (such as Olympus Mons) offer less shielding.
Mars Radiation Environment (Source Wilson et al: http://marie.jsc.nasa.gov)
HZETRN Model vs. Actual Mars Radiation Climate: HZETRN underestimates!
Graph Source: Alenia Spazio European Space Agency Report 2004
Dose rate measured by the MARIE spacecraft in the transit period from April 2001 to August 2001 compared with HZETRN calculated doses
Code calculations
Spike in May due to SPE
Differences between the observed (red) and predicted (black) doses vary from a factor of 1 to 3
Partly because of code inefficiency, dosage data is underestimated
Project Goal: Speedup of Runtime via Analysis and Modification of HZETRN Code Numerical Algorithm
[Figure: Runtime profile of HZETRN functions. PHI accounts for 34.5% of total runtime (13,220,184 calls) and logf.J for 30.43%.]
The major Space Radiation Code Bottleneck lies inside the function call to the PHI interpolation function
Code Optimization Options
C **************************************************************
C
      FUNCTION PHI(R0,N,R,P,X)
C
C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
C
      DIMENSION R(N),P(N)
C
      SAVE
C
      XT=X
      PHI=P(1)
      INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
      IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
C
      DO 1 I=3,N-1
      IL=I
      IF(XT*INC.LT.R(I)*INC)GO TO 2
    1 CONTINUE
C
      IL=N-1
    2 CONTINUE
      PHI=0.
1. Fix inefficient code
2. Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments
3. Use optimized ALOG function
4. Use Lookup Table instead
5. Investigate Parallelization Of Interpolation Statements
Code Optimization
Improve code structure
Use faster ALOG function (LOG)
Remove extraneous function calls
Steps toward a faster HZETRN
Step | Purpose | Result
1. Review Algorithm | Understand underlying numerical algorithm | HZETRN algorithm is complex and needs further review; overall functions of code are understood
2. Analyze Source Code and Data Files | Understand code structure and function | Review of code and data files reveals that much of the code is inefficient, with redundant elements and archaic structure. Data files contain sparse matrices amenable to performance improvement
3. Portability Study | Attempt to port HZETRN code to various HPC platforms and compilers | Portability study revealed problems with code and additional requirements for optimization
4. Static Analysis | Develop understanding of program structure; document code for optimization and report | We generated a detailed HTML report documenting HZETRN source code functions and structure of subroutine calls
5. Runtime Analysis | Target runtime bottlenecks and determine most called functions/subroutines | Revealed that the PHI interpolation function is the major bottleneck function; the natural logarithm intrinsic function is also a performance issue
6. Serial Optimization of Code | Starting with the PHI function, we removed extraneous function calls and cleaned up ‘messy code’ | Resulted in runtime performance improvement (initially a 10% overall increase)
Parallel Space Radiation Analysis
The goal of the project was to speed up the execution of the HZETRN code using parallel processing.
The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory.
Computing Resources Used
Itanium 2 cluster (Atlantis) - Texas Learning & Computation Center (TLC2) at the University of Houston.
Atlantis is a cluster of 152 dual Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect. Atlantis is running RedHat Linux version 5.1.
The Intel Fortran compiler (version 10.0) and OpenMPI (an open source MPI-2 implementation) are being used.
In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used.
TeraGrid has just started being used.
PHI Routine (Lagrangian Interpolation)
Figure showing HZETRN runtime profile
Most time is spent by function PHI - 3rd order Lagrangian interpolation.
PHI function is heavily called by the propagation and integration routines - typically called 229,380 times at each depth.
Early focus - optimizing PHI routine.
The PHI routine takes the natural log of the input ordinates and abscissas prior to performing the Lagrangian interpolation and returns the exponential of the interpolated ordinate.
(Source: Shih, Larrondo, et al, High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122)
[Figure: Runtime profile of HZETRN functions. PHI 34.5%, logf.J 30.43%, expf.J 8.72% of total runtime.]
Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home PC) speedup, but had a negative impact on the numerical results (see next page) since the functions being interpolated are logarithmic.
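The log/exponential wrapping described above can be sketched as follows. This is only an illustration and not the HZETRN PHI source: the function name lagrange, the four-point (cubic) stencil, and the power-law sample data are all assumptions made for the example. It compares interpolating directly in the data with interpolating in log space and exponentiating the result.

```fortran
program loglog_demo
  implicit none
  integer, parameter :: n = 4
  real :: r(n), p(n), x
  integer :: i

  ! Hypothetical sample data following a power law, p = 3 * r**(-2.5).
  do i = 1, n
     r(i) = 2.0**i
     p(i) = 3.0 * r(i)**(-2.5)
  end do

  x = 5.0
  print *, 'linear-space estimate :', lagrange(r, p, x)
  print *, 'log-space estimate    :', exp(lagrange(log(r), log(p), log(x)))
  print *, 'exact power-law value :', 3.0 * x**(-2.5)

contains

  ! Lagrange interpolation through all points of (xs, ys), evaluated at xq.
  function lagrange(xs, ys, xq) result(yq)
    real, intent(in) :: xs(:), ys(:), xq
    real :: yq, term
    integer :: j, k
    yq = 0.0
    do j = 1, size(xs)
       term = ys(j)
       do k = 1, size(xs)
          if (k /= j) term = term * (xq - xs(k)) / (xs(j) - xs(k))
       end do
       yq = yq + term
    end do
  end function lagrange

end program loglog_demo
```

Because a power law is a straight line in log-log space, the log-space estimate should agree with the exact value to within rounding, while the linear-space estimate generally does not. This is consistent with the observation that dropping the LOG/TEXP calls changes the numerical results.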
PHI Routine - Needs LOG/TEXP
Significant differences when comparing results with and without calls to LOG/TEXP
PHI Routine Optimization
Because the bottleneck PHI routine is called so heavily, the message passing overhead to parallelize it would be prohibitive.
Simple code optimizations of the PHI routine resulted in:
- 11.4% speedup on a home PC running Linux, compiled using the Sun Studio 12 Fortran compiler.
- 3.85% speedup on an Atlantis node using the Intel Fortran compiler.
- The reduced speedup on Atlantis may be because the Intel compiler was already generating more optimized code.
PHI Routine FPGA Prototype
Implementing bottleneck routines (the PHI routine and/or the logarithm/exponential routines) in an FPGA could result in a significant speedup.
A reduced precision floating-point FPGA prototype was developed for an estimated ~325 times faster PHI computation in hardware.
HZETRN Main Program Flow
Basic flow of HZETRN:
- Step 1: Call MATTER to obtain the material properties (density, atomic weight and atomic number of each element) of the shield.
- Step 2: Generate the energy grid.
- Step 3: Dosimetry and propagation in the shield material
    Call DMETRIC to compute dosimetric quantities at the current depth.
    Call PRPGT to propagate the GCRs to the next depth.
    Repeat step 3 until the target material is reached.
- Step 4: Dosimetry and propagation in the target material
    Call DMETRIC to compute dosimetric quantities at the current depth.
    Call PRPGT to propagate the GCRs to the next depth.
    Repeat step 4 until the required depth is reached.
DMETRIC Routine
The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities.
There are 6 main do-loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2 and 39% of DMETRIC's processing time is spent in loop 5.
To check whether the above loop could be done in parallel, the order of the loop was reversed to test for data dependency.
The results were identical, so there was no data dependency between the dosimetric calculations for each isotope.
DMETRIC Routine - Dependent?
To determine if loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical, so the outer loop of loop 5 should be parallelizable.
Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page).
These differences are due to floating point rounding differences during four summations.
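The effect of summation order on rounding can be reproduced in isolation. The sketch below is not taken from DMETRIC; the series being summed and the variable names are assumptions chosen only to make the effect visible in single precision.

```fortran
program sum_order_demo
  implicit none
  integer, parameter :: n = 100000
  real :: fwd, rev
  integer :: i

  ! Accumulate the same terms from first to last ...
  fwd = 0.0
  do i = 1, n
     fwd = fwd + 1.0 / real(i)
  end do

  ! ... and from last to first.
  rev = 0.0
  do i = n, 1, -1
     rev = rev + 1.0 / real(i)
  end do

  print *, 'forward sum :', fwd
  print *, 'reverse sum :', rev
  print *, 'difference  :', fwd - rev
end program sum_order_demo
```

With typical compilers and default settings the two totals differ in their trailing digits even though the terms are identical. Floating point addition is not associative, so a difference of this size on its own does not indicate a data dependency.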
DMETRIC Routine - Not Dependent
Minor differences in results when changing the order of the inner loop of loop 5
Parallel DMETRIC Routine
Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel.
Statements (using MPI's wall-time function, MPI_WTIME) were inserted to measure the amount of time spent in each subroutine.
Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% is spent in subroutine PRPGT, and less than 1% is spent in the remainder of the program.
Even assuming infinite parallelization of DMETRIC, the runtime could be reduced by at most 17%.
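The 17% bound follows from Amdahl's law. The sketch below tabulates the normalized runtime when only the DMETRIC fraction is parallelized. It assumes perfect parallel efficiency and no communication cost, and the process counts are arbitrary illustrative values (59 is used only because there are 59 isotopes).

```fortran
program amdahl_demo
  implicit none
  real, parameter :: f = 0.17      ! fraction of runtime spent in DMETRIC (measured value quoted above)
  integer :: nproc(5), k
  real :: t

  nproc = (/ 1, 2, 4, 16, 59 /)    ! illustrative process counts (assumption)
  do k = 1, size(nproc)
     ! Normalized runtime: the serial part stays, the parallel part is divided up.
     t = (1.0 - f) + f / real(nproc(k))
     print '(a,i3,a,f6.3,a,f6.3)', ' procs=', nproc(k), '  runtime=', t, '  speedup=', 1.0 / t
  end do
  print '(a,f6.3)', ' limit as procs -> infinity: runtime=', 1.0 - f
end program amdahl_demo
```

In the limit the runtime approaches 0.83 of the original, a 17% reduction, which corresponds to a speedup factor of about 1.2.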
PRPGT Routine
PRPGT propagates the GCRs through the shielding and the target.
About 82% of HZETRN processing is spent in PRPGT or routines it calls.
At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages:
- The first stage computes the energy shift due to propagation.
- The second stage computes the attenuation and the secondary particle production due to collisions.
To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken up into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40).
If the loop can be performed in parallel, then the results from these four loops should be the same as the single loop from 1 to 59.
PRPGT Routine - Check Dependency
The following compares the results of breaking up the main loop into four loops (on the left) with the original results.
Significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.
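The reordering test itself is easy to reproduce on a toy loop. The sketch below is not the PRPGT code; the two loop bodies are hypothetical stand-ins, one independent across iterations and one that reads the previous iteration's result. Both are run as a single loop from 1 to 59 and then as the four out-of-order pieces used in the test.

```fortran
program reorder_test
  implicit none
  integer, parameter :: n = 59
  integer :: lo(4), hi(4), j, k
  real :: a(n), ind_ref(n), ind_new(n), dep_ref(0:n), dep_new(0:n)

  ! The four pieces, in the order they are executed.
  lo = (/ 20, 1, 41, 31 /)
  hi = (/ 30, 19, 59, 40 /)
  do j = 1, n
     a(j) = real(j)
  end do

  ! Reference: single loop from 1 to n.
  dep_ref = 0.0
  do j = 1, n
     ind_ref(j) = 2.0 * a(j)              ! independent of other iterations
     dep_ref(j) = dep_ref(j-1) + a(j)     ! uses the previous iteration's result
  end do

  ! Same loop bodies, executed in four out-of-order pieces.
  dep_new = 0.0
  do k = 1, 4
     do j = lo(k), hi(k)
        ind_new(j) = 2.0 * a(j)
        dep_new(j) = dep_new(j-1) + a(j)
     end do
  end do

  print *, 'independent body unchanged:', all(ind_ref == ind_new)
  print *, 'dependent body unchanged  :', all(dep_ref == dep_new)
end program reorder_test
```

The independent body reproduces the reference results, while the dependent body does not, which is the same signature reported for the PRPGT main loop.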
PRPGT Routine - Data Dependent
Reversing the inner 1st and 2nd stage I loops gave results identical to the original, so it is possible to parallelize within the 1st or 2nd stage.
However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage).
The results changed, so the 2nd stage is dependent on the 1st stage.
A barrier would be needed to prevent execution of the 2nd stage until the 1st stage completes.
24% of the HZETRN processing is spent on the 1st stage while less than 2% of the time is spent on the 2nd stage. Therefore, parallel processing of both stages does not appear worthwhile.
Parallel PRPLI Routine
PRPLI is called by PRPGT after the 1st and 2nd stage propagation has been completed for each of the 59 isotopes.
PRPLI performs the propagation of the six light ions (ions with Z < 5).
About 53% of total HZETRN time is spent on light ion propagation.
PRPLI propagates a 45 x 6 fluence matrix named PSI (45 energy points for each of the 6 light ions; fluence is the number of particles intersecting a unit area).
Analysis of the PRPLI routine has shown that there is no data dependency among the energy grid points.
It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points.
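One way such a decomposition might look with MPI is sketched below. It is an assumption-laden outline and not the PRPLI code: the block decomposition, the gather via MPI_Allreduce, and the function placeholder_update (a hypothetical stand-in for the per-point light-ion propagation) are all illustrative choices. It needs an MPI installation; with OpenMPI or MPICH2 it would typically be built with mpif90 and launched with mpirun.

```fortran
program prpli_mpi_sketch
  use mpi
  implicit none
  integer, parameter :: npts = 45, nions = 6
  real :: psi_local(npts, nions), psi(npts, nions)
  integer :: ierr, rank, nprocs, i, k, chunk, first, last
  double precision :: t0, t1

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Block decomposition of the energy grid points across ranks.
  chunk = (npts + nprocs - 1) / nprocs
  first = rank * chunk + 1
  last  = min(npts, first + chunk - 1)

  t0 = MPI_Wtime()
  psi_local = 0.0
  do k = 1, nions
     do i = first, last
        psi_local(i, k) = placeholder_update(i, k)
     end do
  end do

  ! Each rank holds zeros outside its own slice, so a sum assembles the full matrix.
  call MPI_Allreduce(psi_local, psi, npts * nions, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)
  t1 = MPI_Wtime()

  if (rank == 0) print *, 'PSI assembled on', nprocs, 'ranks in', t1 - t0, 's'
  call MPI_Finalize(ierr)

contains

  ! Hypothetical stand-in for the propagation of one energy point of one light ion.
  function placeholder_update(i, k) result(v)
    integer, intent(in) :: i, k
    real :: v
    v = real(i) + 100.0 * real(k)
  end function placeholder_update

end program prpli_mpi_sketch
```

With only 45 x 6 values per step, the reduction cost may outweigh the computation saved, so timing with MPI_Wtime as shown would be needed before concluding that the decomposition pays off.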
General HZETRN Recommendations
Arrays in Fortran are stored in column order, so it is more efficient to access them in column order rather than row order.
HZETRN uses an old Fortran technique of alternate entry points.
The use of alternate entry points is discouraged.
HZETRN uses COMMON blocks for global memory.
Fortran-90 MODULES should be used instead.
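Two of these recommendations can be shown together in a short sketch. The module name fluence_data and the array contents are hypothetical; only the 45 x 6 shape is borrowed from the PSI matrix described earlier.

```fortran
module fluence_data
  implicit none
  integer, parameter :: npts = 45, nions = 6
  real :: psi(npts, nions)          ! shared state held in a module instead of a COMMON block
end module fluence_data

program module_demo
  use fluence_data
  implicit none
  integer :: i, k

  ! Column-order traversal: the first index varies fastest in the inner loop,
  ! so consecutive iterations touch adjacent memory locations.
  do k = 1, nions
     do i = 1, npts
        psi(i, k) = real(i) * real(k)
     end do
  end do

  print *, 'sum of psi =', sum(psi)
end program module_demo
```

Unlike a COMMON block, the module gives every program unit that uses it the same declared names, types and shapes, so mismatched declarations are caught at compile time.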
Conclusions & Future Work
The performance of HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI.
A maximum speedup of 50% is expected with the current HZETRN.
Additional performance improvements could be obtained by implementing the 3rd order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.
References
J.W. Wilson, F.F. Badavi, F.A. Cucinotta, J.L. Shinn, G.D. Badhwar, R. Silberberg, C.H. Tsao, L.W. Townsend, R.K. Tripathi, HZETRN: Description of a Free-Space Ion and Nucleon Transport Shielding Computer Program, NASA Technical Paper 3495, May 1995.
J.W. Wilson, J.L. Shinn, R.C. Singleterry, H. Tai, S.A. Thibeault, L.C. Simmons, Improved Spacecraft Materials for Radiation Shielding, NASA Langley Research Center. spacesciene.spaceref.com/colloquia/mmsm/wilson_pos.pdf
NASA Facts: Understanding Space Radiation, FS-2002-10-080-JSC, October 2002.
P.S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers Inc.: San Francisco, 1997.
S.J. Chapman, Fortran 90/95 for Scientists and Engineers, 2nd edition. McGraw Hill: New York, 2004.
L. Shih, S. Larrondo, K. Katikaneni, A. Khan, T. Gilbert, S. Kodali, A. Kadari, High Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122.
L. Shih, Efficient Space Radiation Computation with Parallel FPGA, Y2006 - ISSO Annual Report, pp. 56-61.
Gilbert, T. and L. Shih. "High-Performance Martian Space Radiation Mapping," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
Kadari, A., S. Kodali, T. Gilbert, and L. Shih. "Space Radiation Analysis with FPGA," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.
F.A. Cucinotta, "Space Radiation Biology," NASA-M. D. Anderson Cancer Center Mini-Retreat, Jan. 25, 2002 <http://advtech.jsc.nasa.gov/presentation_portal.shtm>.
Space Radiation Health Project, May 3, 2005, NASA-JSC, March 7, 2005 <http://srhp.jsc.nasa.gov/>
Acknowledgements
NASA LaRC - Robert C. Singleterry Jr, PhD
NASA JSC/CARR PVA&M - Premkumar B. Saganti, PhD
TeraGrid, TACC
TLC2 - Mark Huang & Erik Engquist
Texas Space Grant Consortium
ISSO