Solving Irregular Problems Through Parallel Irregular Trees
Fabrizio Baiardi
Paolo Mori
Laura Ricci
Dipartimento di Informatica
Università di Pisa
Istituto di Informatica e Telematica
CNR - Pisa
PDCN 2005
Outline
• Main features of irregular problems
• Hierarchical representation of the domain
• Parallel Irregular Tree library
• Experimental results
• Future work
Irregular Problems
• the domain includes a set of elements, each characterised by
  – its position in the domain
  – other problem-specific properties
• the distribution of the elements is
  – non-homogeneous
  – dynamic and unpredictable
• the evolution of an element
  – depends upon that of other elements (locality)
  – updates the element's properties
• Examples
  – Barnes-Hut
  – Adaptive Multigrid Methods
  – Radiosity methods
Hierarchical Representation
• the domain is recursively partitioned into a set of spaces by applying a problem-dependent condition
• the Hierarchical Tree (Htree) represents the decomposition; each Hnode represents either a space or an element
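As an illustration only, the sketch below builds a quadtree Htree over points in the unit square. The problem-dependent condition is assumed to be "split a space while it holds more than two elements"; the names `element`, `hnode`, `htree_build`, `htree_leaves` and `MAX_PER_LEAF` are hypothetical and not part of PIT. In this sketch every Hnode stands for a space, and leaf Hnodes also store their elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type: a point in the unit square. */
typedef struct { double x, y; } element;

/* A quadtree Hnode for a square space; elements are kept only at leaves. */
typedef struct hnode {
    double x0, y0, size;       /* lower-left corner and side of the space */
    int n;                     /* number of elements inside the space */
    element *elems;            /* leaf only: copies of those elements */
    struct hnode *child[4];    /* all NULL at a leaf */
} hnode;

/* Assumed problem-dependent condition: split a space while it holds more
   than MAX_PER_LEAF elements, up to MAX_DEPTH levels. */
enum { MAX_PER_LEAF = 2, MAX_DEPTH = 16 };

/* Index 0..3 of the sub-square of space (x0, y0, size) containing e. */
static int quadrant(double x0, double y0, double size, element e) {
    double half = size / 2;
    return (e.x >= x0 + half) + 2 * (e.y >= y0 + half);
}

/* Recursively partitions the space and returns its Hnode. */
hnode *htree_build(const element *e, int n, double x0, double y0,
                   double size, int depth) {
    hnode *h = calloc(1, sizeof *h);
    assert(h != NULL);
    h->x0 = x0; h->y0 = y0; h->size = size; h->n = n;
    if (n <= MAX_PER_LEAF || depth == MAX_DEPTH) {   /* no split: leaf */
        h->elems = malloc((size_t)(n > 0 ? n : 1) * sizeof *e);
        assert(h->elems != NULL);
        for (int i = 0; i < n; i++) h->elems[i] = e[i];
        return h;
    }
    element *sub = malloc((size_t)n * sizeof *e);     /* scratch per quadrant */
    assert(sub != NULL);
    double half = size / 2;
    for (int q = 0; q < 4; q++) {
        int m = 0;
        for (int i = 0; i < n; i++)
            if (quadrant(x0, y0, size, e[i]) == q) sub[m++] = e[i];
        h->child[q] = htree_build(sub, m, x0 + (q & 1) * half,
                                  y0 + (q >> 1) * half, half, depth + 1);
    }
    free(sub);
    return h;
}

/* Number of leaf spaces in the Htree rooted at h. */
int htree_leaves(const hnode *h) {
    if (h->child[0] == NULL) return 1;
    int c = 0;
    for (int q = 0; q < 4; q++) c += htree_leaves(h->child[q]);
    return c;
}
```

For the three points (0.1, 0.1), (0.2, 0.2) and (0.9, 0.9), the root is split once and yields four leaf spaces, the lower-left one holding two points.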
Distributed Hierarchical Tree
• the Htree representation is distributed among the p-nodes:
  pt = <{h0, ..., hn-1}, mHt>
  – private Htree (pHt): the subtree assigned to a p-node
  – mapping Htree (mHt): represents the hierarchical relations among the private Htrees
[Figure: a distributed Htree with private Htrees h0, h1, h2, h3]
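A hypothetical sketch of how a replicated mHt can answer "which p-node owns this element?" without communication, assuming a quadtree mapping over the unit square whose leaves are the roots of the private Htrees h0..h3. `mnode`, `mht_owner` and `mht_is_local` are illustrative names, not PIT functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mapping-Htree node over the unit square: internal nodes
   only route a query, each leaf is the root of one private Htree. */
typedef struct mnode {
    double x0, y0, size;       /* space represented by this node */
    int owner;                 /* p-node holding the pHt; -1 if internal */
    struct mnode *child[4];    /* quadrants; unused at a leaf */
} mnode;

/* Which p-node owns the element at (x, y)? Each p-node holds a replica of
   the mHt, so this descent is purely local. */
int mht_owner(const mnode *m, double x, double y) {
    while (m->owner < 0) {
        double half = m->size / 2;
        int q = (x >= m->x0 + half) + 2 * (y >= m->y0 + half);
        m = m->child[q];
        assert(m != NULL);
    }
    return m->owner;
}

/* True when the element at (x, y) belongs to the pHt of p-node me. */
int mht_is_local(const mnode *m, double x, double y, int me) {
    return mht_owner(m, x, y) == me;
}
```

With four leaves owned by p-nodes 0 to 3 (lower-left, lower-right, upper-left, upper-right), a query at (0.7, 0.2) is routed to p-node 1.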
PIT Library
• defines:
  – PITree
  – PIT operations
• key point: both the sequential and the parallel versions of the application are structured in terms of operations on Htrees
• aims:
  – be a simple, complete and effective parallelization tool
  – hide the details of parallel programming from the user
  – preserve most of the sequential code
PIT API
• main operations
  – PITree creation
  – PITree completion
  – PITree update
• alternative APIs
  – standard
  – advanced
• composition of the adopted API
  – standard structure
  – customised for the specific problem
PITree Creation
• it creates the PITree starting from the domain elements
  – one (or more) pHt for each p-node
  – one mHt replicated in each p-node
• it implements a distributed strategy to make the best use of memory
• it requires some user-defined functions to manage the elements of the target problem
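The slides name the user-defined functions `dec_el`, `incl_el` and `rem_el` but do not show their signatures. The sketch below assumes that `incl_el` tests whether an element lies in a space and `dec_el` decides whether a space must be decomposed, and shows how a library routine can drive the decomposition through such callbacks without knowing the element type. `rem_el` is omitted, and every signature here (`el_ops`, `decompose`, `incl_double`, `dec_if_crowded`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double lo, hi; } space;   /* half-open interval [lo, hi) */

/* Hypothetical callback table in the spirit of dec_el and incl_el; the
   real PIT prototypes are not given, so these signatures are assumptions. */
typedef struct {
    int (*incl_el)(const void *elem, const space *s);  /* is elem inside s? */
    int (*dec_el)(const space *s, int n_inside);       /* must s be split? */
} el_ops;

/* Library side: decomposes space s without knowing the element type,
   relying only on the callbacks. Returns the number of leaf spaces. */
int decompose(const void *elems, size_t n, size_t elem_size, space s,
              const el_ops *ops) {
    const char *p = elems;
    int inside = 0;
    for (size_t i = 0; i < n; i++)
        inside += ops->incl_el(p + i * elem_size, &s) ? 1 : 0;
    if (!ops->dec_el(&s, inside)) return 1;           /* leaf space */
    double mid = (s.lo + s.hi) / 2;
    space left = {s.lo, mid}, right = {mid, s.hi};
    return decompose(elems, n, elem_size, left, ops) +
           decompose(elems, n, elem_size, right, ops);
}

/* User side, for a problem whose elements are plain doubles. */
int incl_double(const void *elem, const space *s) {
    double x = *(const double *)elem;
    return x >= s->lo && x < s->hi;
}

int dec_if_crowded(const space *s, int n_inside) {
    return n_inside > 1 && s->hi - s->lo > 1e-9;   /* width guard stops recursion */
}
```

The design point is the split of responsibilities: the generic routine never inspects an element, so the same library code could serve grid cells, particles or polygon segments.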
PITree Completion (I)
• standard API:
  – fault prevention and informed fault prevention
  – a single function implements the strategy
  – it is invoked before each operator
    PITree_completion(pht_root, stencil_0)
    tp_op_0(pht_root)      /* this comes from the sequential code */
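A minimal sketch of what (informed) fault prevention has to compute, under a deliberately simplified model: a 1-D domain of cells block-distributed over p p-nodes, and a stencil given as a radius. A "fault" is taken here to mean an access to a cell held by another p-node; `owner_of` and `remote_cells_needed` are hypothetical helpers, not PIT functions.

```c
#include <assert.h>

/* Simplified model (an assumption, not the PIT data layout): n_cells cells
   in a 1-D domain, block-distributed over p p-nodes; an operator's stencil
   reads every cell within 'radius' of the cell it updates. */
static int block_size(int n_cells, int p) { return (n_cells + p - 1) / p; }

int owner_of(int cell, int n_cells, int p) {
    return cell / block_size(n_cells, p);
}

/* Before the operator runs, p-node 'me' lists the cells its stencil will
   touch but it does not own, so they can be fetched in advance instead of
   being missed during the visit. 'need' must have room for 2 * radius
   entries; returns how many cells were listed. */
int remote_cells_needed(int me, int n_cells, int p, int radius, int *need) {
    int lo = me * block_size(n_cells, p);
    int hi = lo + block_size(n_cells, p);
    if (hi > n_cells) hi = n_cells;                 /* owned cells: [lo, hi) */
    int k = 0;
    for (int c = lo - radius; c < hi + radius; c++) {
        if (c < 0 || c >= n_cells) continue;        /* outside the domain */
        if (c >= lo && c < hi) continue;            /* already local */
        assert(owner_of(c, n_cells, p) != me);
        need[k++] = c;
    }
    return k;
}
```

With 16 cells on 4 p-nodes and a radius-1 stencil, p-node 1 owns cells 4 to 7 and must obtain cells 3 and 8 before the operator runs; p-node 0 only needs cell 4.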
PITree Completion (II)
• advanced API:
  – informed fault prevention only
  – two distinct functions
• PITree_det_neighbors: invoked each time the neighbourhood relations among the elements change
• PITree_exch_neighbors: invoked before each operator
    PITree_det_neighbors(pht_root, stencil_0)
    PITree_exch_neighbors(pht_root, stencil_0)
    tp_op_0(pht_root)      /* this comes from the sequential code */
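The reason for two functions can be sketched as a cache: determining neighbours is needed only when the neighbourhood relations change, while the exchange is needed before every operator. In the hypothetical sketch below, `det_neighbors` recomputes and caches the list and `exch_neighbors` reuses it; a shared array stands in for communication, and none of these names (`neigh_cache`, `det_neighbors`, `exch_neighbors`) are PIT's.

```c
#include <assert.h>

enum { MAX_NEIGH = 64 };

/* Hypothetical cache of the remote elements an operator's stencil needs. */
typedef struct {
    int ids[MAX_NEIGH];
    int n;
    int det_calls, exch_calls;    /* instrumentation only */
} neigh_cache;

/* Stand-in for PITree_det_neighbors: recompute the neighbour list of the
   1-D block [lo, hi) for a stencil of the given radius. Domain bounds are
   ignored for brevity, so callers must keep lo - radius >= 0. */
void det_neighbors(neigh_cache *c, int lo, int hi, int radius) {
    assert(2 * radius <= MAX_NEIGH && lo - radius >= 0);
    c->n = 0;
    for (int i = lo - radius; i < lo; i++) c->ids[c->n++] = i;
    for (int i = hi; i < hi + radius; i++) c->ids[c->n++] = i;
    c->det_calls++;
}

/* Stand-in for PITree_exch_neighbors: reuse the cached list to copy the
   needed values (a shared array stands in for communication here). */
void exch_neighbors(neigh_cache *c, const double *global, double *halo) {
    for (int k = 0; k < c->n; k++) halo[k] = global[c->ids[k]];
    c->exch_calls++;
}
```

Calling `det_neighbors` once and `exch_neighbors` three times computes the list once and reuses it for all three exchanges.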
PITree Update (I)
• advanced API: two distinct functions
  – PITree correction:
    • updates the mapping of the elements that violate the mapping strategy
    • it is invoked after each operator that updates the distribution of the elements
        tp_op_0(pht_root)
        PITree_correction(pht_root)
  – PITree balance:
    • updates the mapping to redistribute the workload among the p-nodes
    • it is invoked after each operator that modifies the workload
        tp_op_0(pht_root)
        PITree_balance(pht_root, Tresh)
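A hedged sketch of threshold-driven balancing. The trigger rule (rebalance when the most loaded p-node exceeds the average load by more than a given percentage) is an assumption about how a parameter like `Tresh` might be used, and the rebalancing step simply cuts the elements, in tree-visit order, into p contiguous parts of roughly equal cost. `needs_balance` and `rebalance` are hypothetical.

```c
#include <assert.h>

/* Returns 1 when the most loaded p-node exceeds the average load by more
   than thresh_pct percent (assumed interpretation of the threshold). */
int needs_balance(const double *load, int p, double thresh_pct) {
    assert(p > 0);
    double sum = 0.0, max = 0.0;
    for (int i = 0; i < p; i++) {
        sum += load[i];
        if (load[i] > max) max = load[i];
    }
    double avg = sum / p;
    return max > avg * (1.0 + thresh_pct / 100.0);
}

/* Splits n weighted elements (in tree-visit order) into p contiguous parts
   of roughly equal cost: part i gets elements [first[i], first[i+1]).
   'first' must have room for p + 1 entries. */
void rebalance(const double *cost, int n, int p, int *first) {
    double total = 0.0;
    for (int j = 0; j < n; j++) total += cost[j];
    double acc = 0.0;
    int part = 0;
    first[0] = 0;
    for (int j = 0; j < n && part < p - 1; j++) {
        acc += cost[j];
        if (acc >= total * (part + 1) / p) first[++part] = j + 1;
    }
    while (part < p - 1) first[++part] = n;   /* leftover parts stay empty */
    first[p] = n;
}
```

A smaller threshold rebalances more often and pays the redistribution cost more frequently; a larger one tolerates more imbalance. That is the trade-off a threshold parameter controls.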
PITree Update (II)
• standard API:
  – a single function, PITree update, implements both PITree correction and balancing
  – PITree update is invoked after each operator
    tp_op_0(pht_root)
    PITree_update(pht_root, Tresh)
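For the correction part of PITree update, a minimal 1-D sketch, assuming p-node i owns the positions in [bound[i], bound[i+1]): after an operator moves elements, the p-node finds those that now fall in another p-node's region and where they must migrate. `owner_by_position` and `correction` are hypothetical names.

```c
#include <assert.h>

/* Simplified 1-D mapping (an assumption): p-node i owns the positions in
   [bound[i], bound[i+1]) for i = 0..p-1. */
int owner_by_position(double x, const double *bound, int p) {
    for (int i = 0; i < p; i++)
        if (x >= bound[i] && x < bound[i + 1]) return i;
    return -1;                                  /* outside the domain */
}

/* Correction on p-node 'me': dest[j] receives the p-node element j must
   live on (me if it stays). Returns how many elements must migrate. */
int correction(int me, const double *pos, int n, const double *bound,
               int p, int *dest) {
    int moved = 0;
    for (int j = 0; j < n; j++) {
        int o = owner_by_position(pos[j], bound, p);
        assert(o >= 0);
        dest[j] = o;
        if (o != me) moved++;
    }
    return moved;
}
```

With bounds {0, 0.5, 1} and p-node 0 holding elements at 0.1, 0.6 and 0.4, only the element at 0.6 must migrate, to p-node 1.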
Parallelization
• Standard:
  – the functions of the sequential version are inserted into the standard structure
  – the development is straightforward
  – a deep knowledge of the target problem is not required
• Customised:
  – the PIT operations are inserted into the sequential code according to the semantics of the target problem
  – a deep knowledge of the target problem is required
  – both the standard and the advanced API can be adopted
  – it achieves better efficiency
Sequential Code
irregular_problem(tElementList *dom) {
    ...
    root = Htree_creation(dom)
    ...
    while (not solution_computed) {
        tp_op_0(root)
        …
        tp_op_n(root)
    }
}
problem operator: consists mainly of a visit of the Htree
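To make "a visit of the Htree" concrete, here is a sketch of one possible problem operator: a post-order visit that sets every internal node to the sum of its children's values, as an upward pass does (for example, accumulating masses in Barnes-Hut). The binary `node` type and `tp_op_upward` are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal Htree node: leaves carry a value, internal nodes
   aggregate their children (binary for brevity). */
typedef struct node {
    double value;
    struct node *left, *right;
} node;

/* Example problem operator: a post-order visit of the Htree that sets
   every internal node to the sum of its children, as in an upward pass. */
double tp_op_upward(node *n) {
    if (n == NULL) return 0.0;
    if (n->left == NULL && n->right == NULL) return n->value;   /* leaf */
    n->value = tp_op_upward(n->left) + tp_op_upward(n->right);
    return n->value;
}
```

With leaves 1, 2 and 4, where the first two share a parent, the visit sets that parent to 3 and the root to 7.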
Standard Structure
irregular_problem(tElementList *dom) {
    ...
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_completion(pht_root, stencil_0)
        tp_op_0(pht_root)
        pht_root = PITree_update(pht_root, T)
        …
        PITree_completion(pht_root, stencil_n)
        tp_op_n(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
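The standard structure is mechanical enough to express as a wrapper around each sequential operator. In the hypothetical sketch below, `completion_stub` and `update_stub` stand in for `PITree_completion` and `PITree_update` and only record a trace, so that the order completion, operator, update can be checked; they do no parallel work.

```c
#include <assert.h>
#include <string.h>

/* Call trace used to check the order of operations (illustration only). */
static char trace[256];

static void log_call(const char *name) {
    assert(strlen(trace) + strlen(name) + 2 <= sizeof trace);
    strcat(trace, name);
    strcat(trace, ";");
}

/* Hypothetical stubs standing in for the PIT calls: they only log. */
static void completion_stub(void *root, int stencil) {
    (void)root; (void)stencil;
    log_call("completion");
}
static void *update_stub(void *root, double T) {
    (void)T;
    log_call("update");
    return root;
}

typedef void (*tp_op)(void *root);

/* One iteration of the standard structure: each sequential operator is
   preceded by a completion with its own stencil and followed by an update. */
void *standard_iteration(void *root, const tp_op *ops, const int *stencils,
                         int n_ops, double T) {
    for (int i = 0; i < n_ops; i++) {
        completion_stub(root, stencils[i]);
        ops[i](root);
        root = update_stub(root, T);
    }
    return root;
}

/* Two toy "sequential" operators. */
void op_a(void *root) { (void)root; log_call("op_a"); }
void op_b(void *root) { (void)root; log_call("op_b"); }
```

Because the wrapper never looks inside the operators, the sequential functions are reused unchanged, which is the point of the standard structure.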
Customised Structure
irregular_problem(tElementList *dom) {
    …
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_det_neighbors(pht_root, stencil_0+..+stencil_i)
        PITree_exch_neighbors(pht_root, stencil_0)
        tp_op_0(pht_root)
        …
        PITree_exch_neighbors(pht_root, stencil_i)
        tp_op_i(pht_root)
        PITree_correction(pht_root)
        PITree_det_neighbors(pht_root, stencil_i+1+..+stencil_n)
        …
        PITree_exch_neighbors(pht_root, stencil_n)
        tp_op_n(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
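In the customised structure a single `PITree_det_neighbors` call receives the union `stencil_0+..+stencil_i`, so one neighbour determination covers a whole group of operators. A hypothetical encoding of 1-D stencils as bitmasks of relative offsets makes that union a bitwise OR; `stencil_of`, `stencil_union` and `neighbour_count` are illustrative, not PIT functions.

```c
#include <assert.h>

/* Assumed encoding (not PIT's): a 1-D stencil is the set of relative
   offsets -RADIUS..+RADIUS it reads, stored as a bitmask where bit k
   stands for offset k - RADIUS. */
enum { RADIUS = 3 };
typedef unsigned stencil;

stencil stencil_of(const int *offsets, int n) {
    stencil s = 0;
    for (int i = 0; i < n; i++) {
        assert(offsets[i] >= -RADIUS && offsets[i] <= RADIUS);
        s |= 1u << (offsets[i] + RADIUS);
    }
    return s;
}

/* stencil_0 + .. + stencil_i: the union covers every access of every
   operator in the group, so one neighbour determination serves them all,
   while each exchange can still use its own, smaller stencil. */
stencil stencil_union(const stencil *s, int n) {
    stencil u = 0;
    for (int i = 0; i < n; i++) u |= s[i];
    return u;
}

/* How many neighbours (offsets other than 0) a stencil reads. */
int neighbour_count(stencil s) {
    int c = 0;
    for (int k = 0; k <= 2 * RADIUS; k++)
        if (k != RADIUS && ((s >> k) & 1u)) c++;
    return c;
}
```

For stencils {-1, 0, 1}, {-2, 0, 2} and {-1, 1}, the union reads offsets -2 to 2, i.e. four neighbours, while the first stencil alone reads two.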
Validation
• Applications
  – Adaptive Multigrid Methods
  – Hierarchical Radiosity
• Parallel architectures
  – PC cluster
    • Intel Pentium II 266 MHz
    • 128 MB
    • 100 Mb/s Fast Ethernet
  – IBM Beowulf (x330)
    • Intel Pentium III 1.133 GHz
    • 1 GB per p-node (2 procs)
    • Myricom LAN (264MB)
Adaptive Multigrid Methods
• fast iterative methods to solve partial differential equations
• discretized, multi-level domain representation through a grid hierarchy
• adaptive problem:
  – the discretization is finer where the equation is irregular
  – new grids are added during the computation
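• Poisson problem:
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad \text{in } [0,1] \times [0,1]$$
$$u(x,y) = \frac{10\,\cos\big(2\pi(x-y)\big)\,\sinh\big(2\pi(x+y+2)\big)}{\sinh(8\pi)}$$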
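Assuming the test function reads u(x,y) = 10 cos(2π(x−y)) sinh(2π(x+y+2)) / sinh(8π), a short check confirms that it satisfies the equation, which would let it serve as an exact reference solution. Write a = x − y, b = x + y + 2 and K = 10 / sinh(8π), so that u = K cos(2πa) sinh(2πb):

$$u_x = 2\pi K\big[-\sin(2\pi a)\sinh(2\pi b) + \cos(2\pi a)\cosh(2\pi b)\big], \qquad u_y = 2\pi K\big[\sin(2\pi a)\sinh(2\pi b) + \cos(2\pi a)\cosh(2\pi b)\big]$$

$$u_{xx} = -8\pi^2 K \sin(2\pi a)\cosh(2\pi b), \qquad u_{yy} = 8\pi^2 K \sin(2\pi a)\cosh(2\pi b)$$

Hence u_xx + u_yy = 0 everywhere; at the corner (1, 1), a = 0 and b = 4, so u(1, 1) = 10.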
Sequential Code
amm(tElementList *initial_grid) {
    root = Htree_creation(initial_grid)
    while (not end) {
        smoothing(root, v, f, all_levels)
        for level from Lmax downto Lg {
            rest(root, level)
            restriction(root, level-1)
            smoothing(root, e, r, level-1)
        }
        for level from Lg+1 to Lmax {
            prolongation(root, level)
            correction(root, e, level)
            smoothing(root, e, r, level)
        }
        correction(root, v, all_levels)
        end = norm(root)
        if (not end)
            Lmax = refinement(root)
    }
}
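A minimal sketch of the `smoothing` and `norm` steps on a single uniform grid, assuming Gauss-Seidel relaxation with the 5-point stencil for u_xx + u_yy = f. This is not the hierarchical, adaptive version used by the application: the fixed size `N`, the `grid` type and `residual_norm` are hypothetical.

```c
#include <assert.h>

/* Hypothetical uniform-grid stand-in for one level of the grid hierarchy:
   (N+2) x (N+2) points on [0,1]^2 including the boundary, h = 1/(N+1). */
enum { N = 15 };
typedef double grid[N + 2][N + 2];

/* One Gauss-Seidel sweep for u_xx + u_yy = f with the 5-point stencil;
   boundary values of u are left untouched (Dirichlet data). */
void smoothing(grid u, grid f) {
    double h2 = 1.0 / ((N + 1) * (N + 1));
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++)
            u[i][j] = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                       - h2 * f[i][j]) / 4.0;
}

/* Max-norm of the residual f - L_h u over the interior points. */
double residual_norm(grid u, grid f) {
    double h2 = 1.0 / ((N + 1) * (N + 1)), max = 0.0;
    for (int i = 1; i <= N; i++)
        for (int j = 1; j <= N; j++) {
            double lap = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                          + u[i][j + 1] - 4.0 * u[i][j]) / h2;
            double r = f[i][j] - lap;
            if (r < 0) r = -r;
            if (r > max) max = r;
        }
    return max;
}
```

Starting from zero in the interior with u = 1 on one side, the initial residual max-norm is 1/h² = 256 for N = 15, and it is lower after 50 sweeps.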
Parallel Code (I)
amm(tElementList *initial_grid) {
    pht_root = PITree_creation(initial_grid, dec_el, incl_el, rem_el)
    while (not end) {
        PITree_det_neighbors(pht_root, stencil_union)
        PITree_exch_neighbors(pht_root, smooth-rest_stencil, all_levels)
        smoothing(pht_root, v, f, all_levels)
        for level from Lmax downto Lg {
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            rest(pht_root, level)
            PITree_exch_neighbors(pht_root, restriction_stencil, level)
            restriction(pht_root, level-1)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level-1)
        }
Parallel Code (II)
        for level from Lg+1 to Lmax {
            PITree_exch_neighbors(pht_root, prolongation_stencil, level)
            prolongation(pht_root, level)
            correction(pht_root, e, level)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level)
        }
        correction(pht_root, v, all_levels)
        PITree_exch_neighbors(pht_root, norm_stencil, level)
        end = norm(pht_root)
        if (not end)
            Lmax = refinement(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
Domain Hierarchical Decomposition After 10 Iterations
[Figure: hierarchical decomposition of the domain after 10 iterations]
Load Balancing
[Figure: completion time (sec), axis from 1200 to 2100, vs. load-balancing threshold (%), for thresholds 2, 5, 10, 20, 30, 50 and 100]
Efficiency
[Figure: efficiency (%), axis from 50 to 100, vs. number of p-nodes (2, 4, 8, 10, 16, 32), for the IBM Beowulf and the PC Beowulf]
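The slides do not state how efficiency is computed. Assuming the usual definition E(p) = T(1) / (p · T(p)), with T(1) the sequential completion time and T(p) the completion time on p p-nodes, a small helper makes the relation to speedup explicit; `efficiency_pct` and `speedup` are hypothetical names.

```c
#include <assert.h>

/* Speedup T(1) / T(p). */
double speedup(double t_seq, double t_par) {
    assert(t_par > 0.0);
    return t_seq / t_par;
}

/* Parallel efficiency in percent under the assumed definition
   E(p) = T(1) / (p * T(p)), i.e. speedup divided by p. */
double efficiency_pct(double t_seq, double t_par, int p) {
    assert(t_par > 0.0 && p > 0);
    return 100.0 * t_seq / (p * t_par);
}
```

Under this definition, 100% means a perfect speedup of p, and any communication or imbalance overhead shows up as a lower percentage.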
Hierarchical Radiosity
• a model of the light exchanges used to compute the illumination of a scene
• representation of the scene
  – discretized and hierarchical
  – adaptive
• locality: interactions among objects at distinct abstraction levels
Sequential Code
hierarchical_rad(segment_list *scene) {
    root = Htree_creation(scene)
    visib_list_det(root)
    while (not end) {
        Gather_H(root)
        for level from L_min to L_max
            Push_H(root, level)
        for level from L_max downto L_min
            Pull_H(root, level)
        end = RefineLink_H(root)
    }
}
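A sketch of the push-pull step in the style of classic hierarchical radiosity, assuming a binary hierarchy of patches in which a parent's area equals the sum of its children's. Push hands each patch what its ancestors gathered; pull sets an inner patch to the area-weighted average of its children. The slides split these into per-level `Push_H` and `Pull_H` loops, whereas the recursion below merges them; `patch` and `push_pull` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical patch of a binary radiosity hierarchy. */
typedef struct patch {
    double area;
    double emission;          /* E: radiosity the patch emits */
    double gathered;          /* radiosity gathered at this level */
    double radiosity;         /* B: set by push_pull */
    struct patch *child[2];   /* both NULL at a leaf, both set otherwise */
} patch;

/* Push: a patch inherits everything its ancestors gathered ('down').
   Pull: an inner patch's radiosity is the area-weighted average of its
   children's radiosity. Returns the radiosity of p. */
double push_pull(patch *p, double down) {
    double total = down + p->gathered;
    if (p->child[0] == NULL) {                 /* leaf */
        p->radiosity = p->emission + total;
        return p->radiosity;
    }
    double weighted = 0.0;
    for (int k = 0; k < 2; k++)
        weighted += push_pull(p->child[k], total) * p->child[k]->area;
    p->radiosity = weighted / p->area;         /* assumes area = sum of children */
    return p->radiosity;
}
```

For a root of area 2 that gathered 1, with two unit-area children (one emitting 2, the other gathering 1), the children end at 3 and 2 and the root at their average, 2.5.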
Parallel Code (I)
hierarchical_rad(segment_list *scene) {
    pht_root = PITree_creation(scene, dec_el, incl_el, rem_el)
    PITree_exch_neighbors(pht_root, vis_stencil, all_levels)
    visib_list_det(pht_root)
    while (not end) {
        PITree_exch_neighbors(pht_root, int_list, all_levels)
        Gather_H(pht_root)
        for level from L_min to L_max {
            PITree_exch_neighbors(pht_root, push_stencil, level)
            Push_H(pht_root, level)
        }
Parallel Code (II)
        for level from L_max downto L_min {
            PITree_exch_neighbors(pht_root, pull_stencil, level)
            Pull_H(pht_root, level)
        }
        end = RefineLink_H(pht_root)
        pht_root = PITree_balance(pht_root, T)
    }
}
Test Scene
• 192 polygons
• 896 segments
Efficiency
[Figure: efficiency (%), axis from 50 to 100, vs. number of p-nodes (2, 4, 8, 10, 16, 32), for the IBM Beowulf and the PC Beowulf]
Future Work
• the definition of the set of problems that cannot be solved by adopting our methodology
• the definition of programming constructs for the considered class of problems