A Parallel Architecture for the Generalized Traveling Salesman Problem
1
A Parallel Architecture for the Generalized Traveling Salesman Problem
Max Scharrenbroich
AMSC 664 Final Presentation 05/12/2009
Advisor: Dr. Bruce L. Golden
R. H. Smith School of Business
2
Presentation Overview
• Background and Review
  – GTSP Review
  – Review of Project Objectives
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
• Status Summary
3
Review of the GTSP
• The Generalized Traveling Salesman Problem (GTSP):
  – Variation of the well-known traveling salesman problem.
  – A set of nodes is partitioned into a number of clusters.
  – Objective: Find a minimum-cost tour visiting exactly one node in each cluster.
  – Example on the following slides…
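A minimal Python sketch of the GTSP objective described above. It assumes Euclidean distances and a small made-up instance; the names `tour_cost`, `is_valid_gtsp_tour`, `coords`, and `cluster_of` are illustrative and do not come from the presentation.

```python
import math


def tour_cost(tour, coords):
    """Length of the closed tour that visits the given node indices in order."""
    return sum(
        math.dist(coords[a], coords[b])
        for a, b in zip(tour, tour[1:] + tour[:1])
    )


def is_valid_gtsp_tour(tour, cluster_of, num_clusters):
    """True if the tour visits exactly one node from every cluster."""
    visited = [cluster_of[node] for node in tour]
    return len(visited) == num_clusters and set(visited) == set(range(num_clusters))


# Hypothetical instance: 6 nodes partitioned into 3 clusters of 2 nodes each.
coords = [(0, 0), (0, 1), (3, 0), (3, 1), (0, 4), (3, 4)]
cluster_of = [0, 0, 1, 1, 2, 2]   # cluster_of[i] is the cluster of node i
tour = [0, 2, 5]                  # one node chosen from each cluster
```

In this toy instance the tour 0 → 2 → 5 → 0 has legs of length 3, 4, and 5. A GTSP solver has to choose both which node represents each cluster and the order in which the clusters are visited.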
4
GTSP Example
• Given a set of locations (nodes).
5
GTSP Example (continued)
• Partition the nodes into clusters.
[Figure: the nodes partitioned into six clusters, labeled 1 to 6.]
6
GTSP Example (continued)
• Find the minimum tour visiting each cluster.
[Figure: a tour through the six clusters, labeled 1 to 6.]
7
Algorithms for the GTSP
• Like the TSP, the GTSP is NP-hard.
• There exist exact algorithms that rely on smart enumeration techniques:
  – Branch-and-cut (B&C) algorithm (M. Fischetti, 1997).
  – Provided exact solutions to reasonably sized GTSP problems (48 ≤ n ≤ 442 and 10 ≤ m ≤ 89).
  – For problems with more than 90 clusters, the run time of the B&C algorithm began approaching one day.
8
Algorithms for the GTSP (continued)
• Heuristic approaches to the GTSP:
  – Generalized Nearest Neighbor Heuristic (C.E. Noon, 1988)
  – Random-key Genetic Algorithm (L. Snyder and M. Daskin, 2006)
  – mrOX Genetic Algorithm (J. Silberholz and B.L. Golden, 2007)*
9
Review of Project Objectives
• Develop a generic software architecture and framework for parallelizing serial heuristics for combinatorial optimization (in a minimally invasive way).
• Extend the framework to host the serial mrOX GA and the GTSP problem class.
• Investigate the performance of the parallel implementation of the mrOX GA on large instances of the GTSP (> 90 clusters).
10
Presentation Overview
• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
• Status Summary
11
Review of mrOX GA
[Figure: mrOX GA flow: Start → Pop 1, Pop 2, …, Pop 7 → Merge → Post-Merge → End]
Initialization Phase (Light-Weight):
• Sequentially generate and evolve 7 isolated populations of 50 individuals each.
• Use only the rOX for crossover.
• Apply one cycle of 2-opt followed by the 1-swap improvement heuristic, only to each new best solution.
• 5% chance of mutation.
• Terminate each population after no new best solution is found for 10 consecutive generations.
Merge:
• Select the best 50 individuals from the union of the isolated populations to continue on.
Post-Merge Phase:
• Use the full mrOX for crossover.
• Apply full cycles of 2-opt followed by 1-swap improvements, until no improvements are found, to:
  1. Child solutions that are better than both parents.
  2. A random 5% of the population (to preserve diversity).
• 5% chance of mutation.
• Terminate after no new best solution is found for 150 consecutive generations.
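The merge step and the stagnation-based termination rule can be sketched in Python as follows. This is an illustrative outline, not the mrOX implementation: `merge_populations`, `run_until_stagnant`, `step`, and `cost` are hypothetical names, and the crossover, mutation, and local-search operators are hidden behind the `step` callback.

```python
import heapq


def merge_populations(populations, cost, keep=50):
    """Return the `keep` lowest-cost individuals from the union of all populations."""
    union = [individual for population in populations for individual in population]
    return heapq.nsmallest(keep, union, key=cost)


def run_until_stagnant(step, best_cost, patience):
    """Run generations until `patience` consecutive ones yield no new best cost.

    `step()` evolves the population by one generation and returns the best
    cost present afterwards. Returns (best cost found, generations run).
    """
    stale = 0
    generations = 0
    while stale < patience:
        current = step()
        generations += 1
        if current < best_cost:
            best_cost, stale = current, 0   # new best: reset the counter
        else:
            stale += 1
    return best_cost, generations
```

With the parameters on this slide, each isolated population would use `patience=10`, the merge would use `keep=50` over 7 populations, and the post-merge phase would use `patience=150`.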
12
Presentation Overview
• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
  – Concurrent Exploration
  – Cellular GA Inspired Parallel Cooperation
• Parallel Architecture
• Final Results
• Status Summary
13
Type 3 Parallelism in the mrOX GA
• Genetic algorithms are amenable to parallelism via concurrent exploration.
• Cooperation between processes (migration) can be implemented to ensure diversity while maintaining intensification.
[Figure: two timelines of processes P1, P2, …, Pn running from start to end, one of them divided into migration periods.]
14
Parallel Cooperation with Mesh Topology
• Inspired by cellular genetic algorithms (cGAs), where individuals in a population only interact with nearest neighbors.
• Processes cooperate over a toroidal mesh topology.
• Ensures diversity while maintaining intensification.
Each process has four neighbors.
Processes periodically exchange the best solutions with neighbors.
High-quality solutions diffuse through the population.
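The four-neighbor structure of a toroidal mesh can be sketched in a few lines of Python. The function name `torus_neighbors` and the (row, column) addressing are assumptions made for illustration.

```python
def torus_neighbors(row, col, rows, cols):
    """Up, down, left, and right neighbors of (row, col) on a rows x cols torus.

    Indices wrap around at the edges, so every process has four neighbor
    slots, including processes on the border of the grid.
    """
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]
```

For example, on a 4x8 grid the corner process (0, 0) has neighbors (3, 0), (1, 0), (0, 7), and (0, 1). On very thin grids some slots coincide: with a single row, the up and down neighbors are the process itself.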
15
Presentation Overview
• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
  – Overview of Parallel Architecture
  – Parallel mrOX GA
• Final Results
• Status Summary
16
Overview of Parallel Architecture
• Processes are arranged as nodes in a grid topology.
• Each node runs iterations of a serial stochastic search algorithm (e.g. mrOX GA).
• A node updates its state after each iteration.
• The parallel architecture uses the MPI Cartesian coordinate communication topology to send state updates from a node to its nearest neighbors.
• All communications are handled off of the main processing thread to prevent blocking.
• All state updates are handled periodically (e.g. 100 updates/sec).
• Updates are only sent to neighbors if the node's state has changed (limits unnecessary I/O).
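The "send only when the state has changed" behavior can be sketched in Python without any MPI dependency. This is an assumption-laden outline, not the presented architecture: the class `StatePublisher` and the `send(neighbor, state)` callback are hypothetical stand-ins for the real transport, and `tick()` is meant to be driven by a separate communication thread at a fixed rate (for instance every 10 ms for 100 updates/sec).

```python
import threading


class StatePublisher:
    """Keeps a node's latest state and forwards it to neighbors only if it changed."""

    def __init__(self, neighbors, send):
        self._neighbors = list(neighbors)
        self._send = send               # hypothetical transport: send(neighbor, state)
        self._lock = threading.Lock()
        self._state = None
        self._dirty = False

    def update(self, state):
        """Called by the search thread after each iteration; performs no I/O."""
        with self._lock:
            if state != self._state:
                self._state = state
                self._dirty = True

    def tick(self):
        """Called periodically by the communication thread.

        Returns the number of messages sent during this tick.
        """
        with self._lock:
            if not self._dirty:
                return 0                # nothing new since the last tick
            state, self._dirty = self._state, False
        for neighbor in self._neighbors:
            self._send(neighbor, state)  # sent outside the lock
        return len(self._neighbors)
```

Keeping `update` free of I/O is what lets the search loop run at full speed, while the periodic `tick` bounds the message rate regardless of how often the state changes.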
17
Parallel mrOX GA
[Figure: mrOX GA flow: Start → Pop 1, …, Pop 7 → Merge → Post-Merge → End]
• The initialization phase is the same as in the serial case.
• There is no cooperation among processes in this phase.
• After each iteration of the mrOX GA, a node updates its state with the best solution found thus far.
• Migrate: After K generations with no improvement, a node incorporates its neighbors' solutions if they are better than the worst in the population.
• An MxN grid of processes is started with mpirun.
• Random seeds are broadcast to each process.
• The parallel mrOX GA terminates when all processes have gone 150 generations without improvement.
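The migration and termination rules above can be sketched in Python. The names `migrate`, `all_stagnant`, and `cost` are hypothetical. Replacing the current worst individual and skipping duplicate solutions are assumptions of this sketch, since the slides do not say how incoming solutions are inserted.

```python
def migrate(population, neighbor_bests, cost):
    """Incorporate neighbor solutions that beat the worst individual.

    `population` is modified in place. Returns the number of accepted solutions.
    """
    accepted = 0
    for candidate in neighbor_bests:
        if candidate in population:
            continue                    # assumption: do not admit duplicates
        worst = max(range(len(population)), key=lambda i: cost(population[i]))
        if cost(candidate) < cost(population[worst]):
            population[worst] = candidate
            accepted += 1
    return accepted


def all_stagnant(generations_without_improvement, limit=150):
    """Global stop test: every process has gone `limit` generations with no new best."""
    return all(count >= limit for count in generations_without_improvement)
```

A node would call `migrate` only after K of its own generations pass without improvement, which is the migration parameter varied in the results that follow.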
18
Presentation Overview
• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
  – Performance of parallel mrOX GA
• Status Summary
19
Migration Parameter in Parallel mrOX GA
• Vary the migration parameter for three problem instances.
[Figure: Solution Quality vs. Migration Parameter: 5x5 Grid, Population Size = 50. Series: 132d657, 157rat783, 207si1032. x-axis: Migration Parameter (0 to 120); y-axis: % From Optimal (0.00 to 0.50).]
[Figure: Solution Time vs. Migration Parameter: 5x5 Grid, Population Size = 50. Series: 132d657, 157rat783, 207si1032. x-axis: Migration Parameter (0 to 120); y-axis: Time (sec) (0.00 to 35.00).]
Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K Cache.
20
Performance of Parallel mrOX GA
• Objective value performance of cooperative vs. non-cooperative parallel mrOX GA on five problem instances.
[Figure: Cooperative vs. Non-Cooperative Parallel Implementation: # Processors = 32, Population Size = 50, Migrate = 10. Series: No Co-op; 4x8 Grid, Migrate = 10. x-axis: Problem Instance (132d657, 157rat783, 207si1032, 217vm1084, 421d2103); y-axis: % From Optimal (0.00 to 1.60).]
21
Performance of Parallel mrOX GA
• Speedup performance of cooperative vs. non-cooperative parallel mrOX GA on five problem instances.
[Figure: Cooperative vs. Non-Cooperative Parallel Implementation: # Processors = 32, Population Size = 50, Migrate = 10. Series: No Co-op; 4x8 Grid, Migrate = 10. x-axis: Problem Instance (132d657, 157rat783, 207si1032, 217vm1084, 421d2103); y-axis: Speedup (0.90 to 2.10).]
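The two metrics plotted in these results, percent from optimal and speedup, can be computed as in the Python sketch below. It assumes the conventional definitions (relative gap to the best known cost, and serial time divided by parallel time); the slides do not spell out either formula or the timing baseline, and the function names are illustrative.

```python
from statistics import mean


def percent_from_optimal(cost, optimal_cost):
    """Relative gap between a tour cost and the optimal cost, in percent."""
    return 100.0 * (cost - optimal_cost) / optimal_cost


def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run finished than the serial run."""
    return serial_seconds / parallel_seconds


def average_gap(costs, optimal_cost):
    """Mean percent-from-optimal over repeated runs of one problem instance."""
    return mean(percent_from_optimal(cost, optimal_cost) for cost in costs)
```

Under these definitions a tour of cost 1010 against an optimum of 1000 is 1% from optimal, and a parallel run that takes 15 seconds where the serial run takes 30 has a speedup of 2.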
22
Performance of Parallel mrOX GA
• Performance of parallel mrOX GA using different grid configurations.
[Figure: Grid Shape vs. Solution Quality: Population Size = 50, Migrate = 10. Series: 207si1032, 217vm1084. x-axis: Grid Shape (1x32, 2x16, 4x8, No Co-op); y-axis: % From Optimal (0.00 to 0.16).]
[Figure: Grid Shape vs. Speedup: Population Size = 50, Migrate = 10. Series: 207si1032, 217vm1084. x-axis: Grid Shape (1x32, 2x16, 4x8, No Co-op); y-axis: Speedup (1.00 to 2.20).]
23
Performance of Parallel mrOX GA
• Tradeoff between speedup and solution quality of parallel mrOX GA by varying grid shape.
[Figure: Speedup vs. Solution Quality: # Processors = 32, Population Size = 50, Migrate = 10. Series: 207si1032, 217vm1084. x-axis: Speedup (1.00 to 2.20); y-axis: % From Optimal (0.00 to 0.16).]
24
Performance of Parallel mrOX GA
• Percentage of runs where optimum was found.
[Figure: % of Runs where Optimum was Found for Different Grid Shapes. Series: no co-op, 1x32, 2x16, 4x8. x-axis: Problem Instance (132d657, 157rat783, 207si1032, 217vm1084, 421d2103); y-axis: % of Runs Optimum was Found (0 to 90).]
Percentages taken from 20 runs.
25
Performance of Parallel mrOX GA
• Number of processors vs. solution quality of parallel mrOX GA (all grids are square except for 32 processor case).
• Single processor case is equivalent to the serial version.
[Figure: # Processors vs. Solution Quality. Series: 132d657, 157rat783, 207si1032. x-axis: # Processors (0 to 35); y-axis: % From Optimal (0.00 to 2.50).]
26
Conclusions
• Cooperation among processes improves both the solution quality and convergence time of the parallel implementation.
• Changing the grid shape affects the diffusion of solutions through the process population:
  – Thinner grids lead to slower convergence and improved solution quality.
  – Square grids give the fastest convergence but at the cost of solution quality.
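One way to see why grid shape changes diffusion is to count how many neighbor-to-neighbor exchanges a solution needs to reach every process on the torus. The Python sketch below does this with a breadth-first search. It is an idealized model that ignores the migration condition and timing, and `exchanges_to_reach_all` is a hypothetical name.

```python
from collections import deque


def exchanges_to_reach_all(rows, cols):
    """Fewest neighbor exchanges for a solution at (0, 0) to reach every process."""
    hops = {(0, 0): 0}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        neighbors = (
            ((r - 1) % rows, c),
            ((r + 1) % rows, c),
            (r, (c - 1) % cols),
            (r, (c + 1) % cols),
        )
        for neighbor in neighbors:
            if neighbor not in hops:
                hops[neighbor] = hops[(r, c)] + 1
                queue.append(neighbor)
    return max(hops.values())


for shape in [(1, 32), (2, 16), (4, 8)]:
    print(shape, exchanges_to_reach_all(*shape))
```

For the three 32-process shapes used in the results, this gives 16 exchanges for 1x32, 9 for 2x16, and 6 for 4x8, which is consistent with thinner grids spreading good solutions more slowly.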
27
Status Summary
• October 16-30: Start design of the parallel architecture.
• November: Finish design and start coding and testing of the parallel architecture.
• December and January: Continue coding parallel architecture and extend the framework for the mrOX GA algorithm and the GTSP problem class.
• February 1-15: Begin test and validation on multi-core PC.
• February 16-March 1: Move testing to Deepthought cluster*.
• March: Perform final testing on full data sets and collect results.
• April-May: Generate parallel architecture API documentation**, write final report.
* Was not pursued because of success on 32-core genome5.
** Future work.
28
References
• T.G. Crainic and M. Toulouse. Parallel Strategies for Meta-Heuristics. Fleet Management and Logistics, 205-251.
• L. Davis. Applying Adaptive Algorithms to Epistatic Domains. Proceedings of the International Joint Conference on Artificial Intelligence, 162-164, 1985.
• M. Fischetti, J.J. Salazar-Gonzalez, P. Toth. A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Operations Research 45 (3): 378-394, 1997.
• G. Laporte, A. Asef-Vaziri, C. Sriskandarajah. Some Applications of the Generalized Traveling Salesman Problem. Journal of the Operational Research Society 47: 1461-1467, 1996.
• C.E. Noon. The generalized traveling salesman problem. Ph.D. Dissertation, University of Michigan, 1988.
• C.E. Noon. A Lagrangian based approach for the asymmetric generalized traveling salesman problem. Operations Research 39 (4): 623-632, 1990.
• J.P. Saksena. Mathematical model of scheduling clients through welfare agencies. CORS Journal 8: 185-200, 1970.
• J. Silberholz and B.L. Golden. The Generalized Traveling Salesman Problem: A New Genetic Algorithm Approach. Operations Research/Computer Science Interfaces Series 37: 165-181, 2007.
• L. Snyder and M. Daskin. A random-key genetic algorithm for the generalized traveling salesman problem. European Journal of Operational Research 17 (1): 38-53, 2006.
29
Acknowledgements
Chris Groer, University of Maryland
William Mennell, University of Maryland
John Silberholz, University of Maryland
Aleksey Zimin, University of Maryland