Transcript of "Learning to Branch" (slides, via GitHub Pages)

Page 1

Learning to Branch

Ellen Vitercik

Joint work with Nina Balcan, Travis Dick, and Tuomas Sandholm

Published in ICML 2018

Page 2

Integer Programs (IPs)

maximize c ∙ x
subject to Ax ≤ b
x ∈ {0,1}^n

Page 3

Facility location problems can be formulated as IPs.

Page 4

Clustering problems can be formulated as IPs.

Page 5

Binary classification problems can be formulated as IPs.

Page 6

Integer Programs (IPs)

maximize c ∙ x
subject to Ax = b
x ∈ {0,1}^n

NP-hard

Page 7

Branch and Bound (B&B)

• Most widely-used algorithm for IP-solving (CPLEX, Gurobi)
• Recursively partitions the search space to find an optimal solution
  • Organizes the partition as a tree
• Many parameters
  • CPLEX has a 221-page manual describing 135 parameters
    "You may need to experiment."

Page 8

Why is tuning B&B parameters important?

• Save time
• Solve more problems
• Find better solutions

Page 9

B&B in the real world

A delivery company routes trucks daily, using integer programming to select routes.
Demand changes every day, so the company solves hundreds of similar optimizations.

Using this set of typical problems, can we learn the best parameters?

Page 10

Model

[Diagram: an application-specific distribution supplies sample IPs (A^(1), b^(1), c^(1)), …, (A^(m), b^(m), c^(m)) to the algorithm designer, who outputs B&B parameters.]

How can we use samples to find the best B&B parameters for my domain?

Page 11

Model

This model has been studied in applied communities [Hutter et al. '09].

[Same model diagram as Page 10.]

Page 12

Model

The model has also been studied from a theoretical perspective
[Gupta and Roughgarden '16, Balcan et al. '17].

[Same model diagram as Page 10.]

Page 13

Model

1. Fix a set of B&B parameters to optimize
2. Receive sample problems from an unknown distribution
3. Find parameters with the best performance on the samples

"Best" could mean smallest search tree, for example.

Page 14

Questions to address

How can we find parameters that are best on average over the samples?

Will those parameters have high performance in expectation?

Page 15

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 16

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7
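This single-constraint instance is a 0-1 knapsack, so its LP relaxation can be solved greedily by the standard fractional-knapsack rule (a textbook fact, not stated on the slides): take items in decreasing value/weight order and split the first item that no longer fits. A minimal sketch:

```python
from fractions import Fraction

VALUES  = [40, 60, 10, 10, 3, 20, 60]
WEIGHTS = [40, 50, 30, 10, 10, 40, 30]
CAPACITY = 100

def knapsack_lp(values, weights, capacity):
    """LP relaxation of max v.x s.t. w.x <= capacity, x in [0,1]^n:
    greedily take items in decreasing value/weight order, splitting
    the first item that no longer fits."""
    order = sorted(range(len(values)),
                   key=lambda i: Fraction(values[i], weights[i]),
                   reverse=True)
    x = [Fraction(0)] * len(values)
    total = Fraction(0)
    for i in order:
        take = min(Fraction(weights[i]), capacity)
        x[i] = take / weights[i]
        total += x[i] * values[i]
        capacity -= take
        if capacity == 0:
            break
    return total, x

value, x = knapsack_lp(VALUES, WEIGHTS, Fraction(CAPACITY))
print(value)   # 140
```

The resulting solution x = (1/2, 1, 0, 0, 0, 0, 1) with value 140 is exactly the root node of the B&B tree built on the following slides.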

Page 17

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree, root node: LP relaxation solution x = (1/2, 1, 0, 0, 0, 0, 1) with objective value 140.]

Page 18

B&B

1. Choose leaf of tree

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree, root node: LP relaxation solution x = (1/2, 1, 0, 0, 0, 0, 1) with objective value 140.]

Page 19

B&B

1. Choose leaf of tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree:
root x = (1/2, 1, 0, 0, 0, 0, 1), value 140;
x1 = 0 child: x = (0, 1, 0, 1, 0, 1/4, 1), value 135;
x1 = 1 child: x = (1, 3/5, 0, 0, 0, 0, 1), value 136.]

Page 20

[Same slide as Page 19 (animation frame).]

Page 21

B&B

1. Choose leaf of tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree: the x1 = 1 node, x = (1, 3/5, 0, 0, 0, 0, 1) with value 136, is branched on x2:
x2 = 0 child: x = (1, 0, 0, 1, 0, 1/2, 1), value 120;
x2 = 1 child: x = (1, 1, 0, 0, 0, 0, 1/3), value 120.]

Page 22

[Same slide as Page 21 (animation frame).]

Page 23

B&B

1. Choose leaf of tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree: the x1 = 0 node, x = (0, 1, 0, 1, 0, 1/4, 1) with value 135, is branched on x6:
x6 = 0 child: x = (0, 1, 1/3, 1, 0, 0, 1), value 133.3;
x6 = 1 child: x = (0, 3/5, 0, 0, 0, 1, 1), value 116.]

Page 24

[Same slide as Page 23 (animation frame).]

Page 25

B&B

1. Choose leaf of tree
2. Branch on a variable

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[B&B tree: the x6 = 0 node, x = (0, 1, 1/3, 1, 0, 0, 1) with value 133.3, is branched on x3:
x3 = 0 child: x = (0, 1, 0, 1, 1, 0, 1), value 133;
x3 = 1 child: x = (0, 4/5, 1, 0, 0, 0, 1), value 118.]

Page 26

B&B

1. Choose leaf of tree
2. Branch on a variable
3. Fathom leaf if:
   i.   LP relaxation solution is integral
   ii.  LP relaxation is infeasible
   iii. LP relaxation solution isn't better than the best-known integral solution

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[Same B&B tree as Page 25.]

Page 27

[Same slide as Page 26, with the x3 = 0 leaf, x = (0, 1, 0, 1, 1, 0, 1) with value 133, marked "Integral".]

Page 28

[Same slide as Page 26 (animation frame).]

Page 29

B&B

1. Choose leaf of tree
2. Branch on a variable
3. Fathom leaf if:
   i.   LP relaxation solution is integral
   ii.  LP relaxation is infeasible
   iii. LP relaxation solution isn't better than the best-known integral solution

This talk: How to choose which variable to branch on?
(Assume every other aspect of B&B is fixed.)
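Steps 1-3 above can be sketched end-to-end. The code below is a minimal illustration, not the paper's implementation: it solves the slides' single-constraint instance, using the greedy fractional-knapsack rule for each LP relaxation (valid only because there is one constraint) and always branching on the unique fractional variable.

```python
from fractions import Fraction

VALUES   = [40, 60, 10, 10, 3, 20, 60]
WEIGHTS  = [40, 50, 30, 10, 10, 40, 30]
CAPACITY = 100

def lp_relaxation(fixed):
    """LP relaxation with some variables fixed to 0/1, solved greedily.
    Returns (value, frac_var): frac_var is the one fractional variable,
    or None if the solution is integral; value is None if infeasible."""
    cap = CAPACITY - sum(WEIGHTS[i] for i, v in fixed.items() if v == 1)
    if cap < 0:
        return None, None
    value = Fraction(sum(VALUES[i] for i, v in fixed.items() if v == 1))
    free = sorted((i for i in range(len(VALUES)) if i not in fixed),
                  key=lambda i: Fraction(VALUES[i], WEIGHTS[i]), reverse=True)
    for i in free:
        if WEIGHTS[i] <= cap:
            cap -= WEIGHTS[i]
            value += VALUES[i]
        elif cap > 0:
            value += Fraction(VALUES[i] * cap, WEIGHTS[i])
            return value, i          # x_i is fractional
    return value, None

def branch_and_bound():
    best = 0                         # value of best-known integral solution
    stack = [dict()]                 # each node = its set of fixed variables
    while stack:
        fixed = stack.pop()
        value, frac = lp_relaxation(fixed)
        if value is None:            # fathom ii: LP infeasible
            continue
        if value <= best:            # fathom iii: bound no better than incumbent
            continue
        if frac is None:             # fathom i: LP solution is integral
            best = int(value)
            continue
        stack.append({**fixed, frac: 0})   # branch on the fractional variable
        stack.append({**fixed, frac: 1})
    return best

print(branch_and_bound())   # 133
```

The returned optimum, 133, matches the integral solution (0, 1, 0, 1, 1, 0, 1) found in the tree on the preceding slides; the exploration order differs, but fathoming makes the search exhaustive either way.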

Page 30

Variable selection policies can have a huge effect on tree size.

Page 31

Outline

1. Introduction
2. Branch-and-Bound
   a. Algorithm Overview
   b. Variable Selection Policies
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 32

Variable selection policies (VSPs)

Score-based VSP:
At leaf Q, branch on the variable x_i maximizing score(Q, i).

Many options! Little is known about which to use when.

[Figure: the subtree rooted at x = (1, 3/5, 0, 0, 0, 0, 1) with value 136, branched on x2 into children with values 120 and 120.]

Page 33

Variable selection policies

For an IP instance Q:
• Let c(Q) be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

Example.

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[Figure: root node Q with LP solution x = (1/2, 1, 0, 0, 0, 0, 1) and objective value c(Q) = 140.]

Page 34

Variable selection policies

For an IP instance Q:
• Let c(Q) be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

Example.

max (40, 60, 10, 10, 3, 20, 60) ∙ x
s.t. (40, 50, 30, 10, 10, 40, 30) ∙ x ≤ 100
     x ∈ {0,1}^7

[Figure: root Q with c(Q) = 140; x1 = 0 child with c(Q_1⁻) = 135; x1 = 1 child with c(Q_1⁺) = 136.]

Page 35

Variable selection policies

For an IP instance Q:
• Let c(Q) be the objective value of its LP relaxation
• Let Q_i⁻ be Q with x_i set to 0, and let Q_i⁺ be Q with x_i set to 1

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺)) + (1 − μ) · max(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺))

[Figure: the same example tree as Page 34.]

Page 36

Variable selection policies

The (simplified) product rule [Achterberg, 2009]
Branch on the variable x_i maximizing:
score(Q, i) = (c(Q) − c(Q_i⁻)) ∙ (c(Q) − c(Q_i⁺))

The linear rule (parameterized by μ) [Linderoth & Savelsbergh, 1999]
Branch on the variable x_i maximizing:
score(Q, i) = μ · min(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺)) + (1 − μ) · max(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺))

And many more…
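Both rules can be written down directly. A small sketch, where the bound changes for x1 are read off the example tree on the earlier slides (c(Q) = 140, c(Q_1⁻) = 135, c(Q_1⁺) = 136):

```python
def linear_score(mu, delta_minus, delta_plus):
    """Linear rule: convex combination of the smaller and larger LP-bound
    change caused by branching (delta = c(Q) minus the child's LP value)."""
    return mu * min(delta_minus, delta_plus) + (1 - mu) * max(delta_minus, delta_plus)

def product_score(delta_minus, delta_plus):
    """Simplified product rule."""
    return delta_minus * delta_plus

# Bound changes for x1 at the root of the example IP (from the slides).
d_minus, d_plus = 140 - 135, 140 - 136
print(linear_score(0.5, d_minus, d_plus))   # 4.5
print(product_score(d_minus, d_plus))       # 20
```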

Page 37

Variable selection policies

Our parameterized rule

Given d scoring rules score_1, …, score_d.
Goal: Learn the best convex combination μ_1 score_1 + ⋯ + μ_d score_d.

Branch on the variable x_i maximizing:
score(Q, i) = μ_1 score_1(Q, i) + ⋯ + μ_d score_d(Q, i)
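The parameterized rule is just an argmax over variables of a weighted sum of scores. In this sketch, two concrete rules stand in for score_1, …, score_d, and the per-variable bound-change table is illustrative, not from the slides:

```python
# Illustrative bound changes (c(Q) - c(Q_i-), c(Q) - c(Q_i+)) per variable.
DELTAS = {1: (5, 4), 2: (3, 6), 3: (2, 2)}

def score1(Q, i):   # smaller bound change
    return min(DELTAS[i])

def score2(Q, i):   # larger bound change
    return max(DELTAS[i])

def select_variable(mus, rules, Q, variables):
    """Branch on the variable maximizing sum_j mu_j * score_j(Q, i)."""
    return max(variables,
               key=lambda i: sum(m * r(Q, i) for m, r in zip(mus, rules)))

var = select_variable([0.2, 0.8], [score1, score2], Q=None, variables=[1, 2, 3])
print(var)   # 2
```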

Page 38

Model

[Same model diagram as Page 10.]

How can we use samples to find the best B&B parameters for my domain?

Page 39

Model

[Same model diagram as Page 10, with the output B&B parameters now the weights μ_1, …, μ_d.]

How can we use samples to find the best B&B parameters μ_1, …, μ_d for my domain?

Page 40

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
4. Experiments
5. Conclusion and Future Directions

Page 41

First try: Discretization

1. Discretize the parameter space
2. Receive sample problems from an unknown distribution
3. Find the parameters in the discretization with the best average performance

[Plot: average tree size as a function of μ.]

Page 42

First try: Discretization

This has been prior work's approach [e.g., Achterberg (2009)].

[Plot: average tree size as a function of μ.]

Page 43

Discretization gone wrong

[Plot: average tree size as a function of μ.]

Page 44

Discretization gone wrong

This can actually happen!

[Plot: average tree size as a function of μ.]

Page 45

Discretization gone wrong

Theorem [informal]. For any discretization, there exists a problem instance distribution 𝒟 inducing this behavior.

Proof ideas:
• 𝒟's support consists of infeasible IPs with "easy out" variables
• B&B takes exponential time unless it branches on the "easy out" variables
• B&B only finds the "easy outs" if it uses parameters from a specific range

[Plot: expected tree size as a function of μ.]

Page 46

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
      i. Single-parameter settings
      ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions

Page 47

Simple assumption

There exists a cap κ upper bounding the size of the largest tree we are willing to build.

Common assumption, e.g.:
• Hutter, Hoos, Leyton-Brown, Stützle, JAIR'09
• Kleinberg, Leyton-Brown, Lucier, IJCAI'17

Page 48

Useful lemma

(Single-parameter setting: μ ∈ [0,1].)

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

(The number of intervals is much smaller in our experiments!)

Page 49

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: the three lines
μ ∙ score_1(Q, 1) + (1 − μ) ∙ score_2(Q, 1)
μ ∙ score_1(Q, 2) + (1 − μ) ∙ score_2(Q, 2)
μ ∙ score_1(Q, 3) + (1 − μ) ∙ score_2(Q, 3)
plotted over μ ∈ [0,1]; the upper envelope splits [0,1] into intervals where B&B branches on x2, x3, or x1.]
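Each variable's score is linear in μ, so the branching choice can only change where lines cross on the upper envelope. A small sketch with made-up scores (not the slides' numbers) locates these switch points numerically:

```python
# Illustrative per-variable scores under two scoring rules (made up).
# Variable v's combined score is the line  mu * s1[v] + (1 - mu) * s2[v].
s1 = {"x1": 6.0, "x2": 1.0, "x3": 3.0}
s2 = {"x1": 1.0, "x2": 5.0, "x3": 4.0}

def argmax_var(mu):
    """Variable B&B would branch on at this mu (upper envelope of the lines)."""
    return max(s1, key=lambda v: mu * s1[v] + (1 - mu) * s2[v])

# Scan a fine grid to locate where the winning variable switches.
prev, switches = argmax_var(0.0), []
for k in range(1, 1001):
    mu = k / 1000
    cur = argmax_var(mu)
    if cur != prev:
        switches.append((mu, prev, cur))
        prev = cur

print(switches)   # the branching choice goes x2 -> x3 -> x1 as mu grows
```

With these numbers the envelope has two breakpoints, near μ = 1/3 and μ = 1/2; on each of the three resulting intervals the branching choice (and hence, by the lemma, the whole tree) is fixed.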

Page 50

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: for any μ in the yellow interval, B&B branches on x2 at Q, creating children Q_2⁻ (x2 = 0) and Q_2⁺ (x2 = 1).]

Page 51

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: within the yellow interval, the lines
μ ∙ score_1(Q_2⁻, 1) + (1 − μ) ∙ score_2(Q_2⁻, 1)
μ ∙ score_1(Q_2⁻, 3) + (1 − μ) ∙ score_2(Q_2⁻, 3)
subdivide it into a region where B&B branches on x2 then x3 and a region where it branches on x2 then x1.]

Page 52

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

[Figure: for any μ in the blue-yellow interval, B&B branches on x2 (children x2 = 0, x2 = 1) and then on x3 (children x3 = 0, x3 = 1); in the neighboring region it branches on x2 then x1.]

Page 53

Useful lemma

Lemma: For any two scoring rules and any IP Q, O((# variables)^(κ+2)) intervals partition [0,1] such that: for any interval [a, b], B&B builds the same tree across all μ ∈ [a, b].

Proof idea.
• Continue dividing [0,1] into intervals s.t. in each interval, the variable selection order is fixed
• Can subdivide only a finite number of times
• Proof follows by induction on tree depth

Page 54

Learning algorithm

(Single-parameter setting: μ ∈ [0,1].)

Input: Set of IPs sampled from a distribution 𝒟

For each IP, set μ = 0. While μ < 1:
1. Run B&B using μ ∙ score_1 + (1 − μ) ∙ score_2, resulting in tree 𝒯
2. Find the interval [μ, μ′] such that if B&B is run using the scoring rule μ″ ∙ score_1 + (1 − μ″) ∙ score_2 for any μ″ ∈ [μ, μ′], B&B will build tree 𝒯 (takes a little bookkeeping)
3. Set μ = μ′

Return: Any μ̂ from the interval minimizing average tree size
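The sweep above yields, for each sample IP, a piecewise-constant map from μ to tree size; the final step merges the breakpoints and picks the best interval. A sketch of that step, where the piecewise data is made up, standing in for actual B&B runs:

```python
# Each sample IP is summarized by its piecewise-constant tree-size function
# over mu in [0,1]: a list of (right_endpoint, tree_size) pairs.
ip1 = [(0.3, 40), (0.7, 12), (1.0, 25)]
ip2 = [(0.5, 30), (1.0, 18)]

def size_at(sample, mu):
    for right, size in sample:
        if mu <= right:
            return size
    return sample[-1][1]

def best_mu(samples):
    """Merge all samples' breakpoints; on each merged interval the average
    tree size is constant, so evaluating one interior point suffices."""
    cuts = sorted({right for s in samples for right, _ in s})
    best_avg, best_interval = float("inf"), None
    left = 0.0
    for right in cuts:
        mid = (left + right) / 2          # any point inside the interval
        avg = sum(size_at(s, mid) for s in samples) / len(samples)
        if avg < best_avg:
            best_avg, best_interval = avg, (left, right)
        left = right
    return best_avg, best_interval

avg, interval = best_mu([ip1, ip2])
print(avg, interval)   # 15.0 (0.5, 0.7)
```

Any μ̂ inside the returned interval (0.5, 0.7) attains the minimum average tree size on these two samples.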

Page 55

Learning algorithm guarantees

Let μ̂ be the algorithm's output given Õ((κ³/ε²) ln(# variables)) samples.

W.h.p.,  𝔼_{Q∼𝒟}[tree-size(Q, μ̂)] − min_{μ∈[0,1]} 𝔼_{Q∼𝒟}[tree-size(Q, μ)] < ε

Proof intuition: Bound the algorithm class's intrinsic complexity (IC).
• The lemma bounds the number of "truly different" parameters
• Parameters that are "the same" come from a simple set
Learning theory allows us to translate IC to sample complexity.

Page 56

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
   a. First try: Discretization
   b. Our Approach
      i. Single-parameter settings
      ii. Multi-parameter settings
4. Experiments
5. Conclusion and Future Directions

Page 57

Useful lemma: higher dimensions

Lemma: For any d scoring rules and any IP, a set ℋ of O((# variables)^(κ+2)) hyperplanes partitions [0,1]^d s.t.: for any connected component R of [0,1]^d ∖ ℋ, B&B builds the same tree across all μ ∈ R.

Page 58

Learning-theoretic guarantees

Fix d scoring rules and draw samples Q_1, …, Q_N ∼ 𝒟.

If N = Õ((κ³/ε²) ln(d ∙ # variables)), then w.h.p., for all μ ∈ [0,1]^d,

| (1/N) Σ_{i=1}^{N} tree-size(Q_i, μ) − 𝔼_{Q∼𝒟}[tree-size(Q, μ)] | < ε

Average tree size generalizes to expected tree size.

Page 59

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 60

Experiments: Tuning the linear rule

Let: score_1(Q, i) = min(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺))
     score_2(Q, i) = max(c(Q) − c(Q_i⁻), c(Q) − c(Q_i⁺))

Our parameterized rule
Branch on the variable x_i maximizing:
score(Q, i) = μ ∙ score_1(Q, i) + (1 − μ) ∙ score_2(Q, i)

This is the linear rule [Linderoth & Savelsbergh, 1999].

Page 61

Experiments: Combinatorial auctions

Leyton-Brown, Pearson, and Shoham. Towards a universal test suite for combinatorial auction algorithms. In Proceedings of the Conference on Electronic Commerce (EC), 2000.

"Regions" generator: 400 bids, 200 goods, 100 instances
"Arbitrary" generator: 200 bids, 100 goods, 100 instances

Page 62

Additional experiments

Facility location: 70 facilities, 70 customers, 500 instances
Clustering: 5 clusters, 35 nodes, 500 instances
Agnostically learning linear separators: 50 points in ℝ², 500 instances

Page 63

Outline

1. Introduction
2. Branch-and-Bound
3. Learning algorithms
4. Experiments
5. Conclusion and Future Directions

Page 64

Conclusion

• Study B&B, a widely-used algorithm for combinatorial problems
• Show how to use ML to weight variable selection rules
  • First sample complexity bounds for tree search algorithm configuration
  • Unlike prior work [Khalil et al. '16; Alvarez et al. '17], which is purely empirical
• Empirically show our approach can dramatically shrink tree size
  • We prove this improvement can even be exponential
• Theory applies to other tree search algorithms, e.g., for solving CSPs

Page 65

Future directions

How can we train faster?
• We don't want to build every tree B&B will make for every training instance
• Train on small IPs and then apply the learned policies to large IPs?

What other tree-building applications can we apply our techniques to?
• E.g., building decision trees and taxonomies

How can we attack other learning problems in B&B?
• E.g., node-selection policies

Thank you! Questions?