Efficient solution of the n-body problem
Honor Thesis Presentation
Mike Bantegui, Hofstra University
Advisor: Dr. Xiang Fu, Hofstra University
EFFICIENT SOLUTION OF THE N-BODY PROBLEM
OUTLINE
• The N-Body Problem
• Computational challenges
• Results
• Existing work
• IGS Framework
• Extensions
THE N-BODY PROBLEM
• Given the masses, positions and velocities of N bodies at some time t
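• Know pairwise forces using Newton's law of gravitation: $\mathbf{F}_{ij} = G m_i m_j (\mathbf{r}_j - \mathbf{r}_i) / \lvert \mathbf{r}_j - \mathbf{r}_i \rvert^{3}$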
• Determine evolution of the system of bodies
• Used to study Stellar Dynamics
THE N-BODY PROBLEM CONT.
• Must integrate 3N coupled nonlinear second-order ordinary differential equations
• No general analytic solution except for N = 2
• Must resort to numerical methods
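For reference, the coupled system can be written out using Newton's law of gravitation. The symbols $m_j$, $\mathbf{r}_i$ and $G$ are notation assumed here for the masses, positions and gravitational constant:

```latex
\ddot{\mathbf{r}}_i
  = G \sum_{\substack{j = 0 \\ j \neq i}}^{N-1}
    m_j \, \frac{\mathbf{r}_j - \mathbf{r}_i}{\lvert \mathbf{r}_j - \mathbf{r}_i \rvert^{3}},
  \qquad i = 0, \dots, N-1
```

Each body contributes three components, which gives the 3N second-order equations; the sum over all other bodies is what couples them.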
CANDIDATE NUMERICAL SOLUTION
• Set $t = 0$
• While $t < t_{\mathrm{end}}$
  • For each body, evaluate total force
  • Integrate position, velocity of each body over small time step $\Delta t$
  • Set $t = t + \Delta t$
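The loop above can be sketched in a few lines. This is a minimal illustration, not the thesis code: the function name `simulate`, the `accel` callback, and the choice of a forward Euler update are all assumptions made for the example.

```python
def simulate(pos, vel, accel, t_end, dt):
    """Advance flat coordinate lists from t = 0 to t_end with a fixed step dt.

    accel is a hypothetical callback mapping positions to accelerations.
    """
    t = 0.0
    steps = 0
    while t < t_end:
        acc = accel(pos)                               # evaluate total force per unit mass
        pos = [x + v * dt for x, v in zip(pos, vel)]   # forward Euler, uses the old velocity
        vel = [v + a * dt for v, a in zip(vel, acc)]
        steps += 1
        t = steps * dt                                 # same as t += dt, without accumulated round-off
    return pos, vel, steps


# Usage: a unit harmonic oscillator stands in for the gravitational force.
final_pos, final_vel, n_steps = simulate([1.0], [0.0], lambda p: [-x for x in p], 1.0, 0.125)
```

Forward Euler is the simplest possible integrator and drifts in energy; it is used here only to keep the structure of the loop visible.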
FORCE EVALUATION STEP
• For i = 0 … N – 1
  • For j = i + 1 … N – 1
    • Compute pairwise gravity between body i and body j
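The triangular loop above visits each pair once and applies the result to both bodies, using Newton's third law. A minimal sketch, assuming unsoftened gravity, distinct positions, and a hypothetical function name `pairwise_accelerations`:

```python
import math


def pairwise_accelerations(masses, positions, G=1.0):
    """Accelerations from the i < j loop; each pair is evaluated exactly once."""
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * d[k] * inv_r3   # pull of j on i
                acc[j][k] -= G * masses[i] * d[k] * inv_r3   # equal and opposite pull of i on j
    return acc


# Usage: two bodies two length units apart on the x axis.
acc = pairwise_accelerations([1.0, 2.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

Because both `acc[i]` and `acc[j]` are written inside the inner loop, two threads working on different `i` can touch the same entry, which is why this form is not obviously parallelizable.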
EXAMPLE FORCE EVALUATION
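TIME COMPLEXITY
• Total steps = $T / \Delta t$ for a simulated time span $T$
• Force evaluations = $N(N-1)/2 = O(N^2)$ per step
• Integration = $O(N)$ per step
• Overall complexity = $O(N^2 \, T / \Delta t)$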
ISSUES WITH CANDIDATE SOLUTION
• Very slow due to quadratic behavior of force evaluation
• Requires $N(N-1)/2$ force evaluations per step for $N$ bodies
• Constant time steps could miss near-collisions
• Force evaluation not obviously parallelizable
“GEOMETRIC TRICK” FOR PARALLELIZATION
TIME COMPLEXITY OF GEOMETRIC TRICK
• (Triangular portion)
• (Block portion)
ISSUES WITH GEOMETRIC TRICK
• Parallelizes poorly for P > 2 using OpenMP
• Use naïve force evaluation algorithm for P > 2:
  • For i = 0 … N – 1
    • For j = 0 … N – 1
      • If i != j
        • Add force due to body j acting on body i onto the total force on body i
• Twice as much work, in exchange for scalability
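A sketch of the naïve loop, written so that each body's total is computed independently of the others. The slides use OpenMP; `ThreadPoolExecutor` is substituted here only to show that the outer loop has no shared writes, and CPython threads will not actually speed up this CPU-bound work. The function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def acceleration_on(i, masses, positions, G=1.0):
    """Total acceleration on body i; reads shared data, writes only its own result."""
    ax = ay = az = 0.0
    xi, yi, zi = positions[i]
    for j in range(len(masses)):
        if i != j:
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz
            s = G * masses[j] / (r2 * math.sqrt(r2))
            ax += s * dx
            ay += s * dy
            az += s * dz
    return (ax, ay, az)


def naive_accelerations(masses, positions, workers=4):
    """Evaluate every body independently; the outer loop is trivially parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: acceleration_on(i, masses, positions),
                             range(len(masses))))


# Usage: same two-body configuration as a symmetric loop would use.
acc = naive_accelerations([1.0, 2.0], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

Every pair is evaluated twice, once from each side, which is the doubled work the slide mentions.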
NAÏVE FORCE EVALUATION SPEEDUP
[Figure: Naïve Parallelization Scaling. CPU time (milliseconds) and speedup plotted against number of threads (1 to 15); series: CPU Time, Speedup, Theoretical]
HANDLING CLOSE ENCOUNTERS
• Keep track of a minimum collision timescale for each body
• Use it to vary the time step $\Delta t$ when integrating
• Number of steps taken is no longer predictable
• Support higher-order PEC-type (predict-evaluate-correct) integrators (Leapfrog, 4th- and 6th-order Hermite)
• Allows dramatic increase in step size for similar error
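One way to turn a collision timescale into a variable step is sketched below. The specific estimate (the smaller of a crossing time r/v and a free-fall-like time sqrt(r/a) over all partners) and the accuracy parameter `eta` are assumptions for illustration; the slides do not state the formula used.

```python
import math


def collision_timescale(i, masses, pos, vel, G=1.0):
    """Smallest estimated encounter time between body i and any other body."""
    best = math.inf
    for j in range(len(masses)):
        if j == i:
            continue
        dr = [pos[j][k] - pos[i][k] for k in range(3)]
        dv = [vel[j][k] - vel[i][k] for k in range(3)]
        r = math.sqrt(sum(c * c for c in dr))
        v = math.sqrt(sum(c * c for c in dv))
        if v > 0.0:
            best = min(best, r / v)            # time to cross the separation at the current relative speed
        a = G * (masses[i] + masses[j]) / (r * r)
        best = min(best, math.sqrt(r / a))     # free-fall-like time from the mutual attraction
    return best


def next_time_step(masses, pos, vel, eta=0.01):
    """Shared step: a small fraction eta of the shortest timescale in the system."""
    return eta * min(collision_timescale(i, masses, pos, vel)
                     for i in range(len(masses)))


# Usage: a tight pair forces a much smaller step than a wide pair.
rest = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
dt_wide = next_time_step([1.0, 1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], rest)
dt_tight = next_time_step([1.0, 1.0], [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)], rest)
```

Since the step now depends on the evolving configuration, the total number of steps cannot be computed in advance.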
A MORE EFFICIENT WAY OF EVALUATING FORCE
• Treat a clustered system as a point-like body
• Force between a cluster and a distant body is approximated by placing the cluster's total mass at its center of mass
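A small numerical check of this approximation, as a sketch: the helper names and the test configuration (two unit masses ten length units from the target) are chosen for illustration only.

```python
import math


def accel_from_point(target, source_pos, source_mass, G=1.0):
    """Acceleration at target due to a single point mass."""
    d = [s - t for s, t in zip(source_pos, target)]
    r2 = sum(c * c for c in d)
    scale = G * source_mass / (r2 * math.sqrt(r2))
    return [scale * c for c in d]


def center_of_mass(masses, positions):
    """Total mass and mass-weighted mean position of a cluster."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, positions)) / total for k in range(3)]
    return total, com


def exact_accel(target, masses, positions):
    """Sum the contribution of every cluster member individually."""
    acc = [0.0, 0.0, 0.0]
    for m, p in zip(masses, positions):
        acc = [x + y for x, y in zip(acc, accel_from_point(target, p, m))]
    return acc


# Usage: compare the member-by-member sum with the single point-mass evaluation.
cluster_m = [1.0, 1.0]
cluster_p = [(10.0, 0.5, 0.0), (10.0, -0.5, 0.0)]
target = (0.0, 0.0, 0.0)
total, com = center_of_mass(cluster_m, cluster_p)
approx = accel_from_point(target, com, total)
exact = exact_accel(target, cluster_m, cluster_p)
```

For this configuration the two results agree to better than one percent, and the agreement improves as the cluster gets more compact or the target moves farther away.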
EXAMPLE OF CLUSTERING BODIES
HIERARCHICAL FORCE EVALUATION
• Apply the clustering principle recursively
• Consider sub-clusters within a cluster
• Refine force evaluation via sub-clusters instead of main cluster
CLUSTERING ALGORITHM
• Node Cluster(bodies, min_bound, max_bound)
  • If bodies.size == 0
    • return null
  • If bodies.size == 1
    • return node containing body
  • Collect bodies into spatial groups
  • For each group
    • Cluster(group.bodies, group.min_bound, group.max_bound)
  • Compute first order multipole expansion, passing up tree
  • return node containing groups
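The recursion above can be sketched with octants as the spatial groups. This is an illustrative version, not the thesis implementation: the `Node` fields are hypothetical, only total mass and center of mass are passed up the tree, and all positions are assumed distinct (coincident bodies would recurse forever).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mass: float
    com: tuple                      # center of mass of everything below this node
    size: float                     # side length of the node's bounding cube
    body: int = -1                  # body index for a leaf, -1 for an internal node
    children: list = field(default_factory=list)


def cluster(bodies, masses, positions, lo, hi):
    """Build the subtree for the body indices in `bodies` inside the cube [lo, hi]."""
    if not bodies:
        return None
    size = hi[0] - lo[0]
    if len(bodies) == 1:
        b = bodies[0]
        return Node(masses[b], tuple(positions[b]), size, body=b)
    mid = [(l + h) / 2 for l, h in zip(lo, hi)]
    groups = {}                     # octant key -> indices of the bodies inside it
    for b in bodies:
        octant = tuple(positions[b][k] >= mid[k] for k in range(3))
        groups.setdefault(octant, []).append(b)
    children = []
    for octant, members in groups.items():
        c_lo = [mid[k] if octant[k] else lo[k] for k in range(3)]
        c_hi = [hi[k] if octant[k] else mid[k] for k in range(3)]
        children.append(cluster(members, masses, positions, c_lo, c_hi))
    total = sum(c.mass for c in children)          # pass mass and center of mass up the tree
    com = tuple(sum(c.mass * c.com[k] for c in children) / total for k in range(3))
    return Node(total, com, size, children=children)


# Usage: four bodies in the unit cube; two of them share an octant and get split one level down.
masses = [1.0, 2.0, 3.0, 4.0]
positions = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (0.9, 0.1, 0.1), (0.2, 0.2, 0.2)]
root = cluster([0, 1, 2, 3], masses, positions, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Only occupied octants produce children, so empty regions of space cost nothing.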
METHODS OF SPATIAL SUBDIVISION
• Octree: Barnes-Hut algorithm
• K-D tree: Stadel algorithm
• Other choices possible
• Time to build tree is $O(N \log N)$
• Parallelization opportunity available on recursive calls to Cluster
FORCE EVALUATION ALGORITHM
• TreeWalk(body)
  • For each branch in the tree
    • If branch is leaf
      • If branch.body != body
        • Compute force of branch.body on body
    • Else
      • Compute distance between branch and body
      • If body is well separated from cluster
        • Compute force of branch on body
      • Else
        • branch.TreeWalk(body)
FORCE EVALUATION ALGORITHM, CONT.
• Call TreeWalk for each body:
  • For i = 0 … N – 1
    • TreeWalk(body i)
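A compact sketch of the walk, with a small tree builder included so the example runs on its own. Several choices here are assumptions rather than the thesis design: the tree is a binary median split along the longest axis, a node's size is the largest extent of its bodies, and "well separated" is taken to mean `size < theta * distance`.

```python
import math


class Node:
    """Binary spatial tree node holding total mass and center of mass."""

    def __init__(self, idx, masses, pos):
        self.body = idx[0] if len(idx) == 1 else None
        self.mass = sum(masses[i] for i in idx)
        self.com = [sum(masses[i] * pos[i][k] for i in idx) / self.mass for k in range(3)]
        ext = [max(pos[i][k] for i in idx) - min(pos[i][k] for i in idx) for k in range(3)]
        self.size = max(ext)
        self.branches = []
        if len(idx) > 1:
            axis = ext.index(self.size)                     # split along the longest axis
            order = sorted(idx, key=lambda i: pos[i][axis])
            half = len(order) // 2
            self.branches = [Node(order[:half], masses, pos),
                             Node(order[half:], masses, pos)]


def point_accel(target, source, mass, G=1.0):
    d = [s - t for s, t in zip(source, target)]
    r2 = sum(c * c for c in d)
    scale = G * mass / (r2 * math.sqrt(r2))
    return [scale * c for c in d]


def tree_walk(node, body, pos, theta, acc):
    """Accumulate into acc the acceleration on `body` from everything below node."""
    for branch in node.branches:
        if branch.body is not None:                         # leaf
            if branch.body != body:
                a = point_accel(pos[body], pos[branch.body], branch.mass)
                for k in range(3):
                    acc[k] += a[k]
        else:
            d = math.dist(pos[body], branch.com)
            if branch.size < theta * d:                     # well separated: use the cluster as a point
                a = point_accel(pos[body], branch.com, branch.mass)
                for k in range(3):
                    acc[k] += a[k]
            else:                                           # too close: descend into sub-clusters
                tree_walk(branch, body, pos, theta, acc)


def tree_accelerations(masses, pos, theta=0.5):
    root = Node(list(range(len(masses))), masses, pos)
    result = []
    for i in range(len(masses)):                            # call the walk once per body
        acc = [0.0, 0.0, 0.0]
        tree_walk(root, i, pos, theta, acc)
        result.append(acc)
    return result
```

With `theta = 0` no cluster is ever accepted, so the walk degenerates to the exact pairwise sum; larger values accept more clusters and trade accuracy for speed.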
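SEPARATION CRITERIA
• Body is well separated when the acceptance criterion is met: $s / d < \theta$, where $s$ is the cluster size and $d$ is the distance from the body to the cluster
• Accept the approximation when the cluster is compact and the body is far away
• $\theta$ allows tuning for performance vs. accuracy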
• Alternative criteria available
TIME COMPLEXITY OF TREE WALK
• For single body walking a node of size N: $O(\log N)$
• For N bodies: $O(N \log N)$
OVERVIEW OF HIERARCHICAL ALGORITHM
• Build tree in $O(N \log N)$ steps
• Walk tree in $O(N \log N)$ steps
• For N = 65536, brute-force pairwise requires $N(N-1)/2 \approx 2.1 \times 10^{9}$ evaluations
• Hierarchical algorithm can do the same in $O(N \log N)$ evaluations
• Very easily parallelized
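To put numbers on the comparison for N = 65536, the short calculation below counts brute-force pairs exactly and uses N log2 N as a rough stand-in for the tree cost. That stand-in is an assumption: the real count depends on the opening angle and carries a constant factor.

```python
import math


def brute_force_pairs(n):
    """Number of distinct pairs among n bodies."""
    return n * (n - 1) // 2


def tree_estimate(n):
    """Order-of-magnitude estimate for a tree method, ignoring constant factors."""
    return n * math.log2(n)


n = 65536
print(brute_force_pairs(n))   # 2147450880
print(tree_estimate(n))       # 1048576.0
```

Even with a generous constant factor on the tree estimate, the gap is roughly three orders of magnitude.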
PARALLELIZING TREE METHODS
[Figure: Octree Parallelization Scaling. Timing (milliseconds) and speedup plotted against number of threads (1 to 15); series: CPU Time, Speedup, Theoretical]
SCALING TO LARGE N, OCTREE
SCALING TO LARGE N, KD-TREE
SCALING TO LARGE N, BRUTE FORCE
BRUTE FORCE VS TREE METHODS
ENERGY ERRORS
ENERGY ERRORS CONT.
RELATED WORK
• nbody1 – nbody6, Sverre Aarseth
• ACS toolkit, Jun Makino and Piet Hut
• Grav-Sim, Mark Ridler
• Gravit, Gerald Kaszuba et al.
THE PROPOSED FRAMEWORK
• Component-based Interactive Gravitational Simulator (IGS)
• CoreIGS – Core simulation library
• CmdIGS – Command line driven interface
• VisIGS – Interactive visualizer
SOFTWARE ARCHITECTURE
[Class diagram; components: Accelerator, Integrator, System, Model]
• PhasePoint: +Position : Vector, +Velocity : Vector
• WorldPoint: +Time : double, +Acceleration : Vector, +Jerk : Vector
• Graviton: +Mass : double, +Potential : double, +Radius : double
• Body: +CollisionTimescaleSq : double, +NextTime : double, +Start : WorldPoint, +End : WorldPoint
• NBodySystem: -Bodies : Body, -N : int, -Mass : double, -Softening : double, -Dynamics
• Dynamics: +Kinetic : double, +Potential : double, +CenterOfMass : Vector, +CenterOfMomentum : Vector, +AngularMomentum : Vector, +AngularVelocity : Vector
• Snapshot: -Bodies : PhasePoint, -N : int, -Mass : double, -Dynamics : Dynamics
• Model: #Fill() : void
• Model classes: SphericalModel, PlummerModel, DiskModel, CompositeModel
• Integrator: +Predict() : void, +Correct() : void
• Integrator classes: EulerIntegrator, HermiteIntegrator, LeapfrogIntegrator, SymplecticEuler, EulerTrapezoidIntegrator
• Node: +Branches : Graviton, +Quadrupole : Tensor
• Tree: #Root : Node, -Theta : double, -BranchSize : int, -LeafSize : int, #Branch() : Node
• Octree: #Branch() : Node
• KDTree: #Branch() : Node
• Simulator: -System : NBodySystem, -Stepper : Integrator, -Accelerator : Tree, -Time : double, -NextTime : double, -Steps : int
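How the Simulator, Integrator and Accelerator pieces might fit together is sketched below. Only the class names and the Predict/Correct split come from the diagram; the method signatures, the flat coordinate lists, and the use of a plain callable as the accelerator are assumptions made to keep the example short.

```python
from abc import ABC, abstractmethod


class Integrator(ABC):
    """Predict-evaluate-correct interface; the argument lists are hypothetical."""

    @abstractmethod
    def predict(self, pos, vel, acc, dt): ...

    @abstractmethod
    def correct(self, vel, acc_old, acc_new, dt): ...


class LeapfrogIntegrator(Integrator):
    """Leapfrog in velocity-Verlet form, written as a predictor and a corrector."""

    def predict(self, pos, vel, acc, dt):
        return [x + v * dt + 0.5 * a * dt * dt for x, v, a in zip(pos, vel, acc)]

    def correct(self, vel, acc_old, acc_new, dt):
        return [v + 0.5 * (a0 + a1) * dt for v, a0, a1 in zip(vel, acc_old, acc_new)]


class Simulator:
    def __init__(self, pos, vel, accelerator, stepper):
        self.pos, self.vel = pos, vel
        self.accelerator = accelerator        # callable: positions -> accelerations
        self.stepper = stepper
        self.time = 0.0
        self.steps = 0
        self.acc = accelerator(pos)

    def step(self, dt):
        new_pos = self.stepper.predict(self.pos, self.vel, self.acc, dt)      # predict
        new_acc = self.accelerator(new_pos)                                   # evaluate
        self.vel = self.stepper.correct(self.vel, self.acc, new_acc, dt)      # correct
        self.pos, self.acc = new_pos, new_acc
        self.time += dt
        self.steps += 1


# Usage: a unit harmonic oscillator stands in for the tree-based accelerator.
sim = Simulator([1.0], [0.0], lambda p: [-x for x in p], LeapfrogIntegrator())
for _ in range(100):
    sim.step(0.01)
```

Keeping the force evaluation behind a single callable means a brute-force loop, an octree or a k-d tree can be swapped in without touching the integrator.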
PROJECT STATS
• Open source – Available at IGS.codeplex.com
• 3+ years of development
• 5670 lines of code
• 65 .cpp and .h files
• 107 Subversion revisions
• 8 development iterations
• Many failures before success!
EXTENSIONS
• Individual time steps
• Distributed simulations using MPI
• Higher order integrators
• Command interface for the visualizer
• Other spatial subdividers (Hilbert curves, etc.)
• Multipole expansion for acceleration and higher derivatives
• Pairwise interactions between tree nodes ($O(N)$ evaluations!)
CONCLUSIONS
• Tree methods very efficient
• Performance vs. Accuracy tradeoff possible
• Extremely parallelizable
• Accurate, real-time simulations possible on commodity hardware
ACKNOWLEDGEMENTS
• Dr. Xiang Fu for advisement and helpful discussions
• Dr. Gerda Kamberova for helpful comments on the paper
• Lukasz Bator for many discussions on algorithmic efficiency
THANK YOU FOR COMING!
• Any questions?