© 2011 IBM Corporation
Guiding Combinatorial Optimization with UCT
Ashish Sabharwal and Horst Samulowitz
IBM Watson Research Center
(presented by Raghuram Ramanujan)
MCTS Workshop at ICAPS-2011
June 12, 2011
MCTS and Combinatorial Search
Monte Carlo Tree Search (MCTS): widely used in a variety of domains in AI
Upper Confidence bounds on Trees (UCT): a form of MCTS, especially successful in two-agent game tree search, e.g., Go, Kriegspiel, Mancala, General Game Playing
Based on single-agent tree search: one multi-armed bandit at each node of a tree
Goal: find the most “rewarding” root-to-leaf path in the tree
Combinatorial Search
A discrete search space, e.g., $\{0,1\}^N$ or $\{R, G, B\}^N$
A “feasible” subspace of interest: typically defined indirectly by a finite set of constraints
Goal: find a solution – an element of the discrete space that satisfies all constraints
If a utility function / objective function is given: find an optimal solution
E.g., Boolean Satisfiability (SAT), Graph Coloring (COL), Constraint Satisfaction Problems (CSPs), Constraint Optimization, Integer Programming (IP)
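As a concrete instance of the setup above, here is a minimal sketch of graph coloring as combinatorial search: the space is $\{R, G, B\}^N$ and each edge contributes one constraint. The function name `find_coloring` and the small example graph are illustrative assumptions, and the exhaustive enumeration is only meant to make the definitions tangible, not to suggest how real solvers search.

```python
from itertools import product

def find_coloring(num_vertices, edges, colors=("R", "G", "B")):
    """Return the first assignment that satisfies every edge constraint, or None."""
    # Enumerate the whole discrete space colors^N (exponential; illustration only).
    for assignment in product(colors, repeat=num_vertices):
        # Constraint: the two endpoints of each edge must differ in color.
        if all(assignment[u] != assignment[v] for u, v in edges):
            return assignment
    return None

# Hypothetical instance: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
cycle_with_chord = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
solution = find_coloring(4, cycle_with_chord)

# The complete graph on 4 vertices has no feasible 3-coloring.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
no_solution = find_coloring(4, k4)
```

Here `solution` is a feasible element of the space, while `no_solution` is `None` because the feasible subspace is empty.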
Can MCTS/UCT inspired techniques be used to improve the performance of combinatorial search algorithms?
[Figure: graph coloring]
Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
MIP: linear inequality constraints, continuous & discrete variables
Typically with a linear (or quadratic) objective function
NP-hard; highly useful, with several academic and commercial solvers available
MIP search appears much more suitable than, e.g., SAT for applying UCT!
Opportunity for applying UCT
MIP solvers such as IBM ILOG’s CPLEX, Gurobi, etc.:
maintain a “frontier” of open nodes, exploring them with a combination of best-first search, “diving” to the bottom of the tree, etc.
rely on spending substantial effort per node, e.g., computing LP relaxation to obtain a bound on the objective value in the subtree: an estimate of the true value
In contrast, state-of-the-art SAT solvers not easily adapted to UCT:
are based on enhancements to basic depth-first search traversal
rely on processing nodes extremely fast (~ 2000-5000 per second)
Can we improve CPLEX by letting UCT decide search tree exploration order?
Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
Challenges and Differences from the “usual” setup for UCT
Biggest success of UCT so far: two-agent game tree search, rather than single-agent
Random playouts are costly to implement in MIP search
Unlike game tree search, too costly to create a full UCT tree at each node
Exploitation isn’t very meaningful after the true value of a node is revealed: no reason to repeatedly visit that node even if it is optimal
LP relaxation: available for “free”, provides a guaranteed bound on the true value, so averaging backups may not be the best strategy!
Highly optimized commercial MIP solvers such as CPLEX very hard to improve upon!
Implementation: no easy access to CPLEX’s internal data structures; must maintain our own “shadow tree” for exploring UCT strategies – additional overhead
Main Finding:
Guidance near the top of the tree can improve performance across a variety of instances!
How does Search in CPLEX (roughly) work?
[Figure: search tree. Legend: $E_i$ marks a CPLEX open node and the corresponding quality estimate of the underlying sub-tree (e.g., LP objective value).]
CPLEX explores the search tree by alternating between two operations:
I. Node Selection: Select the next open search node to continue search on: CPLEX selects node with the best estimate E
II. Branching: Select the next variable to branch on (assume binary branching)
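[Figure walk-through: the Root-Node ($E_0$) branches on x into nodes $E_1$ and $E_2$; $E_1$ branches on y into $E_3$ and $E_4$; $E_2$ branches on z into $E_5$ and $E_6$; $E_5$ branches on v into $E_7$ and $E_8$. Nodes that have been branched on are CPLEX closed nodes.]
- Node Selection: Initially only one node that can be selected
- Branching: Select variable x
- Node Selection: Select node with estimate $E_1$
- Branching: Select variable y
- Node Selection: Select node with estimate $E_2$
- Branching: Select variable z
- Node Selection: Select node with estimate $E_5$
- Branching: Select variable v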
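The alternation between node selection and branching can be sketched on a toy problem. The example below is an assumption-laden stand-in, not CPLEX's implementation: it uses a 0/1 knapsack instead of a general MIP, the greedy fractional fill as the LP relaxation bound, best-first node selection over a heap-based frontier, and binary branching on items in density order. The names `lp_bound` and `best_first_knapsack` are hypothetical.

```python
import heapq

def lp_bound(items, capacity, value_so_far, start):
    """Upper bound from the fractional (LP) relaxation over items[start:]."""
    bound = value_so_far
    for value, weight in items[start:]:
        if weight <= capacity:
            capacity -= weight
            bound += value
        else:
            bound += value * capacity / weight  # take a fraction of one item
            break
    return bound

def best_first_knapsack(items, capacity):
    """Maximize total value of (value, weight) items within capacity."""
    # Density order makes the greedy fractional fill optimal for the relaxation.
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best_value = 0  # incumbent: taking nothing is always feasible
    # Frontier of open nodes: (-estimate, depth, value so far, remaining capacity).
    frontier = [(-lp_bound(items, capacity, 0, 0), 0, 0, capacity)]
    while frontier:
        # Node selection: pop the open node with the best estimate E.
        neg_estimate, depth, value, remaining = heapq.heappop(frontier)
        if -neg_estimate <= best_value or depth == len(items):
            continue  # bound cannot beat the incumbent, or nothing left to branch on
        # Branching: binary branch on the next item (x = 1, then x = 0).
        item_value, item_weight = items[depth]
        if item_weight <= remaining:
            taken = value + item_value
            best_value = max(best_value, taken)  # partial assignment is feasible
            rest = remaining - item_weight
            bound = lp_bound(items, rest, taken, depth + 1)
            heapq.heappush(frontier, (-bound, depth + 1, taken, rest))
        bound = lp_bound(items, remaining, value, depth + 1)
        heapq.heappush(frontier, (-bound, depth + 1, value, remaining))
    return best_value

# Hypothetical instances given as (value, weight) pairs.
optimum_a = best_first_knapsack([(60, 10), (100, 20), (120, 30)], 50)
optimum_b = best_first_knapsack([(10, 5), (40, 4), (30, 6), (50, 3)], 10)
```

The frontier here plays the role of the open nodes in the figure; the tree itself is never stored, which is the same situation that later motivates a separate shadow tree.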
Guiding Node Selection in CPLEX with UCT
Node Selection with UCT
Idea: expand nodes in the order in which UCT would expand them
Traverse search tree from root to a current leaf node (i.e., “open” node) while at each node selecting the child that has the highest UCT score s.
UCT score s: Combines estimate of the “quality” of a node (the same CPLEX uses) with how often this node has been visited already
Goal: Balance Exploration / Exploitation in CPLEX search
Tree Update Phase
When node selection reaches a leaf node,
compute its quality estimate (e.g., objective value of LP relaxation) and propagate it upwards towards the root
branch on this node using the default variable/value selection of CPLEX
Update rule / backup operator: max of the two children (no averaging!), if maximization problem; min if minimization
Result: estimate at each node N along this leaf-to-root path equals the best value seen in the entire sub-tree under N
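The root-to-leaf descent described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class is a hypothetical shadow-tree node, estimates are assumed to be scaled to [0, 1] for a maximization problem, and the score is a standard UCB1-style formula, which is swapped in here and is not the variant used in the slides. The default constant 0.7 is borrowed from the slides, where it was tuned for that other formula.

```python
import math

class Node:
    """Hypothetical shadow-tree node: estimate, visit count, parent, children."""
    def __init__(self, estimate, parent=None, visits=0):
        self.estimate = estimate
        self.visits = visits
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def uct_score(node, c=0.7):
    """UCB1-style score (assumed): quality estimate plus an exploration bonus."""
    if node.visits == 0:
        return math.inf  # unvisited children are tried first
    parent_visits = max(1, node.parent.visits)
    return node.estimate + c * math.sqrt(math.log(parent_visits) / node.visits)

def select_open_node(root, c=0.7):
    """Descend from the root, always taking the child with the highest score."""
    node = root
    while node.children:
        node = max(node.children, key=lambda child: uct_score(child, c))
    return node  # a current leaf, i.e. an "open" node

# Hypothetical tree: the left child looks better but has been visited far more.
root = Node(estimate=1.0, visits=11)
left = Node(estimate=0.9, parent=root, visits=10)
right = Node(estimate=0.8, parent=root, visits=1)

explored = select_open_node(root)          # exploration bonus favors `right`
greedy = select_open_node(root, c=0.0)     # pure exploitation picks `left`
```

With `c=0.0` the descent reduces to following the best estimate at each level, so the constant is what moves the search away from plain best-first behavior.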
Guiding Search in CPLEX with UCT: Node Selection
Node Selection is now guided by UCT scores (as illustrated below)
UCT score is based on estimate E and the number of visits to a search node
In order to employ UCT, one needs to maintain a shadow tree of CPLEX’s search tree
CPLEX maintains just a frontier of open nodes; the underlying search tree only exists implicitly
[Figure: the same search tree (Root-Node $E_0$, branching on x, y, z, v), with each node annotated by its estimate $E_i$ and its visit count #visits$_i$. Legend: $E_i$ marks a CPLEX open node and the corresponding quality estimate of the underlying sub-tree (e.g., LP objective value); the remaining nodes are CPLEX closed nodes.]
- Node Selection: Initially only one node that can be selected
- Branching: Select variable x
- Node Selection: Select node with highest UCT score based on E and #visits
- Branching: Select variable y
- Node Selection: Select node with highest UCT score based on E and #visits
- …
Guiding Search in CPLEX with UCT: Tree Update Phase
After selecting a node N and branching on a variable, two child nodes N_left and N_right will be created with their corresponding estimates E_left and E_right
When propagating estimates upwards, we only consider the best estimate (i.e., no averaging)
Update using the “backup operator”
[Figure: search tree with Root-Node $E_0$, its children $E_1$ and $E_2$ (branching on x), and the children $E_3$ and $E_4$ of $E_1$ (branching on y). Legend: $E_i$ marks a CPLEX open node and the corresponding quality estimate of the underlying sub-tree (e.g., LP objective value); the remaining nodes are CPLEX closed nodes.]
- $\max(E_1, E_2) = E_1$: propagate to $E_0$
- $\max(E_3, E_4) = E_4$: propagate to $E_1$, as long as new estimates improve the current best estimate at a node on the path to the root.
E.g., only if $E_4$ improves on $E_0$ is the new estimate propagated to the node labeled $E_0$. However, visit counts are updated for each node on the path to the root.
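One reading of this update rule can be sketched in a few lines. The `Node` class, the `backup` function, and the numeric estimates are hypothetical; the sketch only encodes what the slide states: a new estimate overwrites an ancestor's estimate when it improves on it (max for maximization, min for minimization, never an average), while visit counts are incremented on every node along the path.

```python
class Node:
    """Hypothetical shadow-tree node holding an estimate and a visit count."""
    def __init__(self, estimate, parent=None):
        self.estimate = estimate
        self.visits = 0
        self.parent = parent

def backup(leaf, maximize=True):
    """Propagate leaf.estimate toward the root using a max/min backup operator."""
    node = leaf
    while node is not None:
        node.visits += 1  # visit counts are updated regardless of improvement
        if maximize:
            improves = leaf.estimate > node.estimate
        else:
            improves = leaf.estimate < node.estimate
        if improves:
            node.estimate = leaf.estimate  # keep the best value, no averaging
        node = node.parent

# Hypothetical estimates for a maximization problem.
n0 = Node(5.0)               # root
n1 = Node(4.0, parent=n0)
n4 = Node(7.0, parent=n1)
n3 = Node(6.0, parent=n1)

backup(n4)  # 7.0 improves on both ancestors and reaches the root
backup(n3)  # 6.0 improves on nothing; only visit counts change
```

After both calls, `n1` and `n0` hold 7.0 with two visits each, so every node on the path stores the best value seen in its sub-tree.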
UCT Score: “Epsilon Greedy” Variant of UCB1
UCT Score computation: [formula not recovered]
N = tree node under consideration
P = parent of N
[symbol missing] = a constant balancing exploration and exploitation (0.7 in experiments)
[symbol missing] = theoretically a number decreasing inversely proportional to visits(N) (a constant set to 0.01 in experiments)
Fast and accurate enough for our purposes, compared to the standard UCB1 formula
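For comparison, the standard UCB1 rule referred to above is the textbook bandit formula below. It is given here only as background and is not the variant used in the slides. The symbols are assumptions of this note: $\bar{x}_j$ is the average reward observed for arm $j$, $n_j$ the number of times arm $j$ was played, and $n$ the total number of plays.

```latex
\mathrm{UCB1}(j) \;=\; \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}
```

When applied at a tree node, as UCT does, $n$ corresponds to visits(P) and $n_j$ to visits(N), and the exploration term is commonly scaled by a tunable constant.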
Experimental Evaluation
Starting with 1,024 publicly available MIP instances, we removed:
All instances solved by default CPLEX within 10 seconds (too easy)
All instances not solved by default CPLEX within 900 seconds (too hard)
Experimental Evaluation is based on the 170 remaining instances
Spanning a variety of domains
Experimentation not limited to any particular instance family (e.g., TSP instances, set covering, etc.)
Experiments were conducted on:
Intel Xeon CPU E5410, 2.33GHz with 8 cores, and 32GB of memory
Only a single run per machine since multiple CPLEXs on one machine can (and often do!) interfere with each other
OS: Ubuntu
Experimental Evaluation: Solvers
Default CPLEX
Uses various strategies, including a combination of best-first node selection and depth-first “diving” to reach a leaf node from each best node
Highly optimized; very challenging to beat by a large margin across a large variety of problem domains
CPLEX with node selection guided by UCT
Best results when guidance limited to the top 5 levels of the tree; then revert to the default node selection of CPLEX
Other standard exploration schemes
Best-first
Breadth-first
Depth-first
Preliminary Experimental Results
[ timeout: 600 sec ]
Promising performance:
UCT guidance results in the fewest instances timing out (8)
Fastest on 39 instances
Lowest average runtime (albeit only by a few seconds)
Preliminary Experimental Results
Pairwise performance measure (timeout: 600 sec): how often does the row solver outperform the column solver?
E.g., UCT guidance outperforms default CPLEX on 64 instances; 52 times vice versa
Promising performance:
UCT guidance outperforms default CPLEX and other natural alternatives
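The pairwise measure can be computed from per-instance runtimes along the following lines. This is a sketch under assumptions: "outperforms" is taken to mean a strictly lower runtime after capping at the timeout, which the slides do not spell out, and the runtimes below are made-up illustrative values, not the reported results.

```python
def pairwise_wins(runtimes, timeout=600.0):
    """Count, for each (row, column) pair, instances where row beats column.

    runtimes maps a solver name to its per-instance runtimes in seconds,
    with all lists in the same instance order.
    """
    solvers = list(runtimes)
    wins = {row: {col: 0 for col in solvers if col != row} for row in solvers}
    for row in solvers:
        for col in solvers:
            if row == col:
                continue
            for t_row, t_col in zip(runtimes[row], runtimes[col]):
                # Cap at the timeout so two timed-out runs count as a tie.
                if min(t_row, timeout) < min(t_col, timeout):
                    wins[row][col] += 1
    return wins

# Hypothetical runtimes on four instances (600.0 means timed out).
example_runtimes = {
    "uct": [12.0, 300.0, 600.0, 45.0],
    "default": [15.0, 280.0, 600.0, 50.0],
    "depth_first": [600.0, 100.0, 600.0, 40.0],
}
table = pairwise_wins(example_runtimes)
```

Because ties count for neither side, the two entries for a pair of solvers need not sum to the number of instances, which matches the 64 versus 52 split out of 170 quoted above.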
Conclusion
Explored the use of MCTS/UCT in a combinatorial search setting
Specifically, for mixed integer programming (MIP) search, with CPLEX
Typical “random playouts” very costly but LP relaxation objective value serves as a good estimate – a guaranteed one-sided bound!
Max-style update rule performs better here than the usual averaging backups
Guiding combinatorial search with UCT holds promise!
Improving performance of highly optimized MIP solvers across a variety of problem domains is a huge challenge
UCT-inspired guidance for node selection shows promise
Most benefit when UCT used only near the top of the search tree
Further exploration along these lines appears fruitful, e.g.:
using UCT for variable or value selection (rather than node selection)
building a “full” UCT tree at each search tree node before branching