OPTIMIZATION METHODS IN INTENSITY MODULATED RADIATION THERAPY TREATMENT PLANNING

By

DIONNE M. ALEMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007

© 2007 Dionne M. Aleman

To my ever-patient wife Nancy, and to my father Roberto, who, if not for the

shortcomings of current cancer treatments, might still be with us today

ACKNOWLEDGMENTS

Many thanks to Nancy Huang, Christopher Fox and Bart Lynch for so helpfully and

happily explaining the physics of medical physics to me on a wide range of topics, even

when those topics were not relevant to my own research.

This work was supported in part by the NSF Alliances for Graduate Education and

the Professoriate, the NSF Graduate Research Fellowship and NSF grant DMI-0457394.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Intensity Modulated Radiation Therapy Treatment Planning
  1.2 Dissertation Summary
    1.2.1 Fluence Map Optimization
    1.2.2 Beam Orientation Optimization
    1.2.3 Fractionation
    1.2.4 Modeling the Dose Deposition of a Beam
  1.3 Contribution Summary
    1.3.1 Fluence Map Optimization
    1.3.2 Beam Orientation Optimization
    1.3.3 Fractionation
    1.3.4 Modeling the Dose Deposition of a Beam

2 FLUENCE MAP OPTIMIZATION
  2.1 Introduction
  2.2 Literature Review
  2.3 Model Formulation
  2.4 Spatial Considerations
  2.5 A Primal-Dual Interior Point Algorithm for FMO
    2.5.1 Primal-Dual Interior Point Algorithm
    2.5.2 Hessian Approximations
      2.5.2.1 Single Hessian Approximation
      2.5.2.2 BFGS Hessian Update
    2.5.3 Insignificant Beamlets
    2.5.4 Warm Start
  2.6 Results
    2.6.1 How Small of a Duality Gap is Necessary?
    2.6.2 Computational Results
    2.6.3 Clinical Results
    2.6.4 Spatial Coefficient Results
    2.6.5 Warm Start Results
  2.7 Conclusions

3 BEAM ORIENTATION OPTIMIZATION
  3.1 Introduction
  3.2 Literature Review
  3.3 Model Formulation
  3.4 Mixed-Integer Model Formulation
  3.5 Beam Data Generation
  3.6 A Response Surface Approach to BOO
    3.6.1 Overview of Response Surfaces
    3.6.2 Determining the Next Observation
      3.6.2.1 Maximizing the expected improvement
      3.6.2.2 Obtaining an upper bound on the uncertainty
      3.6.2.3 Branch-and-Bound
    3.6.3 Method of Obtaining the Next Observation
  3.7 Neighborhood Search
    3.7.1 Introduction
    3.7.2 Neighborhood Search Approaches
    3.7.3 A Deterministic Neighborhood Search Method for BOO
      3.7.3.1 Neighborhood Definition
      3.7.3.2 Neighbor Selection
      3.7.3.3 Implementation
    3.7.4 Simulated Annealing
      3.7.4.1 Neighborhood Definition
      3.7.4.2 Neighbor Selection
      3.7.4.3 Implementation
      3.7.4.4 Convergence
    3.7.5 A New Neighborhood Structure
  3.8 Results
    3.8.1 Evaluating Plan Quality
      3.8.1.1 Target coverage
      3.8.1.2 Critical structure sparing
    3.8.2 Response Surface Method Results
      3.8.2.1 Proof of concept
      3.8.2.2 Adding a non-coplanar beam to a coplanar solution
      3.8.2.3 Clinical results
    3.8.3 Neighborhood Search Method Results
      3.8.3.1 Add/Drop algorithm results
      3.8.3.2 Simulated Annealing results
      3.8.3.3 Clinical results
  3.9 Conclusions and Future Directions
    3.9.1 Response Surface Conclusions
    3.9.2 Neighborhood Search Conclusions

4 FRACTIONATION
  4.1 Introduction
  4.2 Model Formulation
  4.3 Results
    4.3.1 Computational Results
    4.3.2 Clinical Results
    4.3.3 Spatial Coefficient Results
  4.4 Conclusions and Future Directions

5 A MONTE CARLO METHOD FOR MODELING DOSE DEPOSITION
  5.1 Introduction
  5.2 Monte Carlo Engine
  5.3 Dose Distribution of a Beamlet
    5.3.1 Depth-Dose Curve
    5.3.2 Lateral Penumbra
  5.4 Methodology to Model a Beamlet
    5.4.1 Modeling the Depth-Dose Curve
    5.4.2 Modeling the Lateral Penumbra
  5.5 Results
  5.6 Conclusions and Future Directions

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Average run times for 5-beam treatment plans
2-2 FMO value obtained using ε = 0.001
2-3 Comparison of duality gaps
2-4 Performance measures of interior point method warm starts
2-5 Performance measures of projected gradient method warm starts
3-1 Sparing criteria varies for each critical structure
3-2 Sizes of test cases
3-3 Minimum FMO value obtained and time required to obtain it
3-4 Target coverage achieved by the treatment plans
3-5 Percentage of plans in which an organ is spared
3-6 Definitions of implementations
4-1 Case sizes and run times using identical algorithm and weighting parameters
4-2 Sparing criteria varies for each critical structure
5-1 Computation times in minutes of Monte Carlo simulations
5-2 Computation times for dose distribution fits
5-3 Variation of fits

LIST OF FIGURES

2-1 Progression of duality gap
2-2 Dose received by targets as a function of the duality gap
2-3 Dose received by saliva glands as a function of the duality gap
2-4 Quality of DVHs for various duality gaps
2-5 The spatial coefficients used for two cases
2-6 Comparison of spatial and non-spatial treatment plans
2-7 Comparison of spatial and non-spatial treatment plans
3-1 A linear accelerator and the available movements
3-2 FMO value as a function of two angles
3-3 Initial regions
3-4 Partitioning a region into subregions
3-5 Accounting for symmetry
3-6 The flip neighborhood
3-7 Selection probabilities in N_h(θ) and N_h^F(θ)
3-8 Proof of concept results
3-9 Comparison of response surface, Add/Drop and equi-spaced targets
3-10 Comparison of response surface, Add/Drop and equi-spaced targets
3-11 Add/Drop and simulated annealing comparison of FMO convergence
3-12 Comparison of Add/Drop and 7-beam equi-spaced plans
3-13 Comparison of simulated annealing and 7-beam equi-spaced plans
4-1 Target DVHs, saliva DVHs and axial slices in Fractions 1 and 2

4-8 DVHs and axial slices in Fractions 1 and 2 using spatial coefficients

5-1 Dose distribution of a single beamlet in various tissues
5-2 Colorwash of the lateral penumbra of a finite sized pencil beam
5-3 Plot of the lateral penumbra of a finite sized pencil beam
5-4 Observed depth-dose curve in water for several histories
5-5 Polynomial fits of several histories
5-6 Variation of polynomial fit as function of degree
5-7 An error function and an error function pair
5-8 Lateral penumbra for several numbers of Monte Carlo histories
5-9 Error function fits of several histories
5-10 Error function pairs summed to approximate a beamlet in water
5-11 Depth-dose curves in muscle tissue
5-12 Lateral penumbra curves in muscle tissue
5-13 Depth-dose curves in lung tissue
5-14 Lateral penumbra curves in lung tissue
5-15 Depth-dose curves in heterogeneous muscle and lung tissue
5-16 Variation of fits as a function of number of histories

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION METHODS IN INTENSITY MODULATED RADIATION THERAPY TREATMENT PLANNING

By

Dionne M. Aleman

December 2007

Chair: H. Edwin Romeijn
Major: Industrial and Systems Engineering

The design of a treatment plan for intensity modulated radiation therapy is a

mathematical programming problem which is not yet satisfactorily solved. Current

techniques include dividing the problem into several subproblems, which are then solved

sequentially. My research addresses several of these subproblems, particularly beam

orientation optimization (BOO), fluence map optimization (FMO) and fractionation.

The integration of the BOO and FMO subproblems is considered, as well as improved

techniques to model the dose deposition of a beamlet.

CHAPTER 1
INTRODUCTION

1.1 Intensity Modulated Radiation Therapy Treatment Planning

Every year, approximately 1.4 million people in the United States alone are newly

diagnosed with cancer (American Cancer Society [1]). More than half of these patients

will receive some form of radiation therapy (Murphy et al. [2], Perez and Brady [3]), and

approximately half of these patients may significantly benefit from conformal radiation

therapy (Steel [4]). During this therapy, beams of radiation pass through a patient,

thereby killing both cancerous and normal cells. Some patients die of their

disease despite sophisticated treatment methods, and many others suffer unpleasant side

effects from the radiation therapy that may severely detract from their

quality of life.

Thus, the radiation treatment must be carefully planned so that a clinically

prescribed dose is delivered to targets containing cancerous cells so that the cancer

will be eradicated. Simultaneously, a small enough dose must be delivered to the nearby

organs and tissues (called critical structures) so that they may survive the treatment. This

is achieved by irradiating the patient using several beams sent at different orientations

spaced around the patient so that the intersection of these beams includes the targets,

which thus receive the highest radiation dose, whereas the critical structures receive

radiation from some, but not all, beams and may thus be spared. Currently, a technique

called intensity modulated radiation therapy (IMRT) is considered to be the most effective

radiation therapy for many forms of cancer.

The problem of designing an IMRT treatment plan for an individual patient is a

large-scale mathematical programming problem that is not yet solved satisfactorily.

Current treatment planning systems decompose the planning problem into several stages,

and the corresponding subproblems are solved sequentially. These subproblems include

determining the number and orientation of the beams of radiation, the radiation dose

distribution of each beam and the decomposition of a single treatment plan into several

smaller fractions.

This work addresses the integration of the beam orientation optimization (BOO) and

fluence map optimization (FMO) subproblems based on a convex formulation of the latter

and associated efficient algorithms for solving it, an approach which has not received much

attention in previous studies. The fractionation problem, the problem of dividing a single

treatment plan into the 35 treatments (fractions) the patient will actually receive, is also

addressed. Finally, the problem of modeling the dose deposition of a beam is considered.

1.2 Dissertation Summary

In IMRT, each beam is modeled as a collection of hundreds of small beamlets, the

fluences of which can be controlled individually. These fluence values are known as a

fluence map, and optimization of these fluences given a fixed set of beams is known as

fluence map optimization. The optimal solution value of the FMO problem quantifies the

quality of the treatment plan, where quality means the ability of the plan to deliver the

prescribed radiation dose to the specified target structures while sparing critical structures

by ensuring that they receive an acceptably low amount of radiation. Thus, the quality of

a set of beams can be measured by the optimal solution of the FMO problem performed

with those beams. Consequently, the problem of selecting the best directions from which to

deliver radiation to the patient (the BOO problem) is based on the treatment plan quality

indicated by the optimal solution value to the corresponding FMO problem.

1.2.1 Fluence Map Optimization

One of the most popular subproblems of the intensity modulated radiation therapy

(IMRT) treatment planning problem is the fluence map optimization (FMO) problem.

In IMRT, each beam of radiation can be discretized into hundreds of smaller beamlets, the

radiation intensities (fluences) of which can be modulated independently of the other

beamlets. For a given set of beams, the beamlet fluences can greatly influence the quality

of the treatment plan, that is, the ability of the treatment to deposit the prescribed amount

of dose to cancerous target structures while simultaneously delivering a small enough dose

to critical structures so that they may continue to function after the treatment. These

fluence values are known as a fluence map, and optimization of these fluences given a fixed

set of beams is known as fluence map optimization.

Because the fluences of the beamlets can drastically affect the quality of the

treatment plan, it is critical to obtain good fluence maps for radiation delivery. As the

FMO problem is one of the most popular subproblems in IMRT optimization, it has been

extensively studied in the literature. Several problem structures and algorithms to solve

various models are presented in many studies.

1.2.2 Beam Orientation Optimization

In a typical head-and-neck treatment plan, radiation beams are delivered from 5-9

nominally-spaced coplanar orientations around the patient. These coplanar orientations

are obtained from rotating the gantry only. Several components of a linear accelerator can

rotate and translate to achieve more orientations than those obtained from rotating the

gantry. The available orientations consist of the orientations obtained from rotation of the

gantry, collimator and couch, as well as the three translation directions of the couch.

Beam orientation optimization (BOO) is the problem of selecting from the available

beam orientations the best set to use in delivering a treatment plan. Given a fixed set

of beams, different fluence maps (radiation intensities of beamlets) yield treatment plans

with different qualities. Therefore, the quality of an optimized fluence map should be

considered when selecting a set of beam orientations to use in a treatment plan. Optimal

fluence maps may be difficult to obtain depending on the FMO model. Thus, it is common

in the literature for scoring approximations and other heuristics to be used to estimate the

quality of a beam solution.

Regardless of the objective function used in the BOO problem, the problem is

fundamentally nonlinear as the physics of dose deposition change with direction. Because

nonlinear programming problems are difficult to solve, most approaches to the BOO

problem rely on global search algorithms to obtain a solution, which may or may not be

optimal.

1.2.3 Fractionation

An important subproblem related to the FMO problem which has not yet received

much attention is the fractionation problem. Rather than deliver an entire treatment plan

in one session, a treatment plan is divided into several sessions, called fractions. This

is done to take advantage of the fact that normal, healthy cells recover faster from the

radiation than cancerous cells. To obtain the treatment plans for the fractions, in practice,

a single FMO treatment plan is developed and then divided into the desired number of

fractions, usually around 35. This division of a treatment plan is a non-trivial task, as

the target voxels, geometric cubes of tissue, must receive 1.8-2.0 Gy of radiation in each

fraction.

With a single IMRT treatment plan, it is practically impossible to devise a constant

dose-per-fraction delivery technique because only a single FMO problem is solved to

obtain the treatment plan, which is then simply divided into a number of daily fractions.

If a single plan is optimized to deliver doses to multiple target-dose levels, then the dose

per fraction delivered to each target must change in the ratio of a given dose level to the

maximum dose level. For example, say PTV1 has a prescription dose of 70 Gy, PTV2 has

a prescription dose of 50 Gy, and the number of fractions is 35. If a single treatment plan

is divided among the 35 fractions, then PTV1 will receive 70/35 = 2.0 Gy in each fraction,

but PTV2 will only receive 50/35 ≈ 1.4 Gy, and thus any cancerous cells in PTV2 may

not be eradicated by the treatment. Similarly, if only 25 fractions are used in order to

ensure that PTV2 receives 2.0 Gy per fraction, then PTV1 receives 70/25 = 2.8 Gy per

fraction, well above the desired dose.
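
The arithmetic in this example can be checked with a short sketch. Only the prescription doses (70 Gy and 50 Gy), the fraction counts (35 and 25) and the 1.8-2.0 Gy window come from the text; the function name and the dictionary layout are illustrative assumptions.

```python
def dose_per_fraction(prescriptions_gy, num_fractions):
    """Split each target's total prescription evenly over the fractions."""
    return {name: total / num_fractions for name, total in prescriptions_gy.items()}


# Total prescription doses in Gy, taken from the example in the text.
prescriptions = {"PTV1": 70.0, "PTV2": 50.0}

for n in (35, 25):
    per_fraction = dose_per_fraction(prescriptions, n)
    for name, dose in per_fraction.items():
        # Desired per-fraction window for target voxels, as stated in the text.
        in_window = 1.8 <= dose <= 2.0
        print(f"{n} fractions, {name}: {dose:.2f} Gy per fraction, "
              f"within 1.8-2.0 Gy: {in_window}")
```

With either fraction count, one of the two targets falls outside the desired window, which is the difficulty the single-plan approach runs into.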

1.2.4 Modeling the Dose Deposition of a Beam

The FMO problem is arguably the most significant subproblem in determining the quality

of the treatment plan. The FMO problem depends heavily on the calculation of dose

received in each voxel of a patient. This dose is typically approximated by assuming a

linear relationship with the radiation intensities of the beamlets delivering the radiation.

Although this approximation is accepted as satisfactory, it is not truly accurate.
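
As a minimal sketch of the linear approximation described above, the dose in each voxel can be written as a weighted sum of beamlet fluences. The matrix name D, the fluence vector x and all numerical values below are hypothetical and chosen only for illustration; they are not data from this work.

```python
def voxel_doses(dose_matrix, fluences):
    """Dose in each voxel as a linear function of the beamlet fluences.

    dose_matrix[j][i] is the dose deposited in voxel j per unit fluence
    of beamlet i (an assumed layout for this sketch).
    """
    return [sum(d_ji * x_i for d_ji, x_i in zip(row, fluences))
            for row in dose_matrix]


# Hypothetical 3-voxel, 2-beamlet example (values are illustrative only).
D = [
    [0.50, 0.25],
    [0.25, 0.50],
    [0.00, 0.125],
]
x = [2.0, 4.0]

doses = voxel_doses(D, x)
print(doses)  # [2.0, 2.5, 0.5]

# Linearity: doubling every fluence doubles every voxel dose.
print(voxel_doses(D, [2.0 * x_i for x_i in x]))  # [4.0, 5.0, 1.0]
```

The appeal of this form for optimization is that the dose is cheap to evaluate and differentiate; the cost is the loss of accuracy that the remainder of this section discusses.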

The dose in a voxel is determined by the paths the photons in the radiation beams

follow through the patient. Some photons may collide with particles inside the patient

and scatter in any direction, while others may collide with particles and be absorbed.

Still other photons may pass entirely through the patient with no collisions. Due to the

unpredictable nature of the radiation beam inside the patient, the dose received in a

voxel can only be accurately obtained through Monte Carlo simulations. A simple linear

relationship is assumed between total dose and beamlet fluences and is commonly accepted

as a satisfactory dose approximation in IMRT optimization. Errors of as much as 30%

have been reported for photon beams near tissue inhomogeneities (Ma et al. [5]).

For IMRT optimization, particularly with the advent of image-guided IMRT (IGIMRT),

or 4D IMRT, the FMO problem must be solved extremely quickly to create real-time

treatment plans. Thus, the speed of the FMO problem is paramount. Lengthy Monte

Carlo simulation can provide an accurate measure of the dose deposited in a voxel,

but this technique is time intensive and impractical for clinical use and particularly for

treatment planning optimization.

1.3 Contribution Summary

1.3.1 Fluence Map Optimization

Nonlinear functions to approximate biological behavior and desired dose distributions

are common in the previously proposed FMO models in the literature, as are mixed-integer

programming models. These models can be difficult and computationally expensive to

solve. To make the FMO problem more tractable, we employ a model with a convex

objective function and linear constraints. This desirable structure allows our model to be

solved quickly and to optimality with the primal-dual interior point algorithm we have

developed specifically for this problem.
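
To illustrate what a convex objective over nonnegative fluences can look like, the sketch below minimizes a sum of one-sided quadratic voxel penalties. Two loud assumptions: the penalty form is a generic choice made for this example and is not necessarily the model formulated in Chapter 2, and the solver is a plain projected gradient method substituted for brevity, not the primal-dual interior point algorithm developed in this work. All names and numbers are hypothetical.

```python
def dose(D, x):
    """Linear dose model: one dose value per voxel (row of D)."""
    return [sum(d * xi for d, xi in zip(row, x)) for row in D]


def objective(D, x, targets):
    """Sum of one-sided quadratic penalties, convex in x.

    targets[j] = (kind, threshold, weight); kind "min" penalizes
    underdosing a voxel, kind "max" penalizes overdosing it.
    """
    total = 0.0
    for d_j, (kind, thr, w) in zip(dose(D, x), targets):
        viol = thr - d_j if kind == "min" else d_j - thr
        total += w * max(0.0, viol) ** 2
    return total


def gradient(D, x, targets):
    """Gradient of the penalty objective with respect to the fluences."""
    grad = [0.0] * len(x)
    for row, d_j, (kind, thr, w) in zip(D, dose(D, x), targets):
        viol = thr - d_j if kind == "min" else d_j - thr
        if viol > 0.0:
            sign = -1.0 if kind == "min" else 1.0
            for i, d in enumerate(row):
                grad[i] += 2.0 * w * viol * sign * d
    return grad


def projected_gradient(D, targets, x0, iterations=200):
    """Gradient steps followed by projection onto the constraint x >= 0."""
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    lipschitz = 2.0 * sum(w * sum(d * d for d in row)
                          for row, (_, _, w) in zip(D, targets))
    step = 1.0 / lipschitz
    x = list(x0)
    for _ in range(iterations):
        g = gradient(D, x, targets)
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, g)]
    return x


# Hypothetical data: two target voxels and one critical-structure voxel.
D = [[0.50, 0.25], [0.25, 0.50], [0.00, 0.125]]
targets = [("min", 2.0, 1.0), ("min", 2.0, 1.0), ("max", 0.2, 1.0)]
x0 = [0.0, 0.0]

x_opt = projected_gradient(D, targets, x0)
print("objective at start:", objective(D, x0, targets))
print("objective at end:  ", objective(D, x_opt, targets))
```

Unlike an interior point method, this simple scheme provides no duality gap, so it gives no certificate of how far the returned fluences are from optimal.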

One of the greatest benefits of an interior point algorithm is that a globally

optimal solution can be found for many problem structures, and in particular, convex

problem structures. As our FMO model is convex, the interior point algorithm can

locate the globally optimal solution to within a specified duality gap. While there are

other algorithms that can theoretically return a globally optimal solution to a convex

problem (and many algorithms that cannot), interior point methods have the advantage of

providing a known duality gap and generally fast computation times. Because the duality

gap is known in each iteration, the user can make knowledgeable trade-offs between

computation time and solution optimality without having to guess how far from the

optimum the final solution may be. This allows for a scientific comparison of different

IMRT delivery techniques as we can solve the different problems to a specific duality gap.

Several alterations to the standard primal-dual interior point method were made

to improve its performance. Beamlets that are likely to have little or no contribution to

the treatment plan are removed a priori and different approximations to the objective

function Hessian are tested to save time in calculating the true Hessian in each iteration.

The use of warm starts to initialize the interior point method is also examined. The

solutions obtained provide quality treatment plans in a clinically feasible amount of time.

The incorporation of spatial information into the FMO model is also considered.

The probability of tumor metastasis increases with proximity to gross tumor mass. By

using the distances of voxels from target structures, the voxels can be weighted according

to their importance in the treatment plan. For example, it should be less important to

spare saliva gland voxels near a target structure than it should be to spare saliva gland

voxels far from a target. The use of spatial coefficients will help the model identify quality

treatment plans that will prevent future metastasis.

1.3.2 Beam Orientation Optimization

For head-and-neck cancers, typical IMRT treatment plans use 5-9 equi-spaced

coplanar beams. Coplanar beams are those beams obtained from the rotation of only

the gantry of the linear accelerator, the machine which delivers radiation beams to the

patient. If all other components of the linear accelerator are fixed, the rotation of the

gantry sweeps out a set of coplanar beams. The couch can rotate and translate in three

dimensions, and the head of the gantry can rotate independently, creating an even larger

set of beams. Beams obtained from the movement of more than one component from the

linear accelerator are known as non-coplanar beams.

Intuitively, one may expect that the number of beams required for a high-quality

treatment plan can be reduced, or the quality of the treatment plan for a given number

of beams can be improved, if the beam orientations are chosen optimally and/or from

a larger set. In particular, we investigate the effect of considering more coplanar or

non-coplanar beams. A treatment plan consisting of fewer beams is preferable because

the number of beams used in a plan directly affects the length of the actual treatment.

If fewer beams are used to treat a patient, then each treatment takes less time and more

patients can be treated in a day, which is beneficial from both a clinical and economic

perspective. Longer treatment times also allow for more errors due to possible patient

motion.

We view the BOO problem in IMRT treatment planning as a global optimization

problem with expensive objective function evaluations, each of which involves solving

an FMO problem. We propose a response surface method that, unlike other approaches,

allows for the generation of problem data only for promising beam orientations on-the-fly

as the algorithm progresses, enabling the consideration of far more candidate orientations

than is currently feasible. Our response surface approach to BOO allows us to develop

high quality plans using just four beams for head-and-neck cases, in contrast to the

current practice of using 5-9 beams. The response surface method also provides for

convergence to the globally optimal solution.

We have developed neighborhood search methods to solve our BOO model. One

method is simulated annealing, a proper global optimization algorithm, and the other

is a local search heuristic designed specifically for the BOO problem. The local search

heuristic, which we call the Add/Drop method, returns a locally optimal solution in a

small amount of time. The simulated annealing algorithm has the ability to escape local

minima, and is theoretically able to return a globally optimal solution given enough time.

For each of these algorithms, we have devised a new neighborhood structure based on

observations of known optimal BOO solutions compared to the simulated annealing and

Add/Drop BOO solutions. This new neighborhood structure provides faster objective

function value convergence in both algorithms.
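
The general shape of a simulated annealing search over beam angles can be sketched as follows. This is a generic textbook version under stated assumptions, not the implementation described in Chapter 3: the objective is a toy stand-in for the optimal FMO value, and the neighborhood (moving one beam by a fixed angular step) and cooling schedule are illustrative choices. The Add/Drop method and the new neighborhood structure are not shown.

```python
import math
import random


def toy_fmo_value(angles):
    """Stand-in for the optimal FMO value of a beam set (lower is better).

    Purely illustrative: it penalizes beams that are close together on the
    circle. A real evaluation would solve an FMO problem for these beams.
    """
    total = 0.0
    for a in range(len(angles)):
        for b in range(a + 1, len(angles)):
            gap = abs(angles[a] - angles[b]) % 360
            gap = min(gap, 360 - gap)  # circular separation in degrees
            total += 1.0 / (1.0 + gap)
    return total


def random_neighbor(angles, step, rng):
    """Move one randomly chosen beam by +/- step degrees (assumed neighborhood)."""
    neighbor = list(angles)
    i = rng.randrange(len(neighbor))
    neighbor[i] = (neighbor[i] + rng.choice((-step, step))) % 360
    return neighbor


def simulated_annealing(start, objective, step=10, temp=1.0, cooling=0.95,
                        iterations=500, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    current, current_val = list(start), objective(start)
    best, best_val = list(current), current_val
    for _ in range(iterations):
        cand = random_neighbor(current, step, rng)
        cand_val = objective(cand)
        delta = cand_val - current_val
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature decreases.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = cand, cand_val
            if current_val < best_val:
                best, best_val = list(current), current_val
        temp *= cooling
    return best, best_val


start = [0, 10, 20, 30]  # four clustered coplanar gantry angles, in degrees
best, best_val = simulated_annealing(start, toy_fmo_value)
print("start:", start, "value:", round(toy_fmo_value(start), 4))
print("best: ", sorted(best), "value:", round(best_val, 4))
```

Accepting some worsening moves is what lets the search leave a local minimum; in the actual BOO setting each objective evaluation is an FMO solve, so the number of iterations dominates the run time.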

1.3.3 Fractionation

In practice, a single FMO treatment plan is developed and then divided into the

number of desired fractions. Dividing a single FMO plan into multiple treatments is a

non-trivial task, owing to the need to maintain a constant dose per fraction to each of

the target structures, which may have different prescription doses. Therefore, any division

of a single FMO plan into multiple fractions can lead to suboptimal treatments. We

propose a new method of formulating the fractionation problem which yields optimal

fluence maps for each cancerous target structure. These fluence maps can then be easily

divided into optimal fractions.

The proposed fractionation model is solved using the same primal-dual interior point

method presented for the FMO problem. The solutions provide high quality fluence maps

for each target, and in a clinically acceptable amount of time.

1.3.4 Modeling the Dose Deposition of a Beam

We propose obtaining a limited number of Monte Carlo histories to obtain a noisy

dose distribution which can then be transformed into a very accurate, smooth dose

distribution suitable for optimization techniques in a reasonable amount of time.

Because the particles in a beamlet scatter in three-dimensional space, multiple

dose distributions must be considered to satisfactorily model the beamlet’s effect on

the patient’s tissue. These distributions arise from the amount of radiation the beamlet

deposits as a function of depth (the depth-dose curve), and from the amount of radiation

radiating outward from the center of the beamlet (the lateral penumbra). The depth-dose

curve is modeled using a high-degree polynomial and the lateral penumbra is modeled as

the sum of error functions. The parameters of the error functions are determined using a

Levenberg-Marquardt quasi-Newton minimization method.

Using these techniques, dose distributions with satisfactory accuracy can be obtained

using at least a factor of 10 fewer Monte Carlo histories than would otherwise be required.

This can greatly decrease the amount of time required to obtain dose data for beamlets in

the FMO problem of IMRT treatment planning without sacrificing accuracy.

CHAPTER 2

FLUENCE MAP OPTIMIZATION

2.1 Introduction

IMRT is differentiated from conformal radiation therapy by the dose distributions

that can be delivered by each beam. Rather than just delivering a uniform field

of radiation, the dose distribution of a beam can be any desired distribution. This ability

allows for greater flexibility and accuracy in targeting the target structures while avoiding

the critical structures.

The dose distribution of a beam is achieved as follows. In IMRT, each beam can

be thought of as consisting of several hundred smaller beamlets, each of which can have

its own radiation intensity (fluence) independent of its neighbors. By modulating the

intensities of these beamlets, any dose distribution can be achieved. Given a fixed set of

beams, the optimization of these intensities is called fluence map optimization.

2.2 Literature Review

Because the FMO problem is one of the most studied problems of IMRT, many

different approaches have been taken to formulate the FMO problem, based on both

“physical” (Bortfeld [6]) and “biological” (Alber and Nusslin [7], Jones and Hoban [8],

Kallman et al. [9], Mavroidis et al. [10], Niemierko et al. [11], Niemierko [12], Wu et al. [13,

14]) objective functions and constraints. Linear programming (LP)-based multi-criteria

optimization (Hamacher and Kufer [15]) and mixed-integer linear programming (MILP)

(Bednarz et al. [16], Ferris et al. [17], Langer et al. [18, 19], Lee et al. [20, 21], Shepard et

al. [22]) models have been proposed for FMO.

Constraints to enforce various measures of treatment quality are also taken into

account in different FMO models. Hamacher and Kufer [15] include the homogeneity

of the dose received by the targets as a constraint in their FMO model. Full-volume

constraints, which require that the dose in every voxel of a structure be within pre-determined

upper and lower bounds, are common for controlling the dose in each structure. Models

employing full-volume constraints are found in Bednarz et al. [16], Hamacher and Kufer

[15], Lee et al. [20, 21], Romeijn et al. [23] and many others. Models containing partial

volume constraints, constraints requiring that dose in only a subset of voxels be within

pre-determined upper and/or lower bounds, are also common. Formulations with partial

volume constraints are found in Lee et al. [20, 21], Romeijn et al. [23, 24] and Shepard et

al. [22].

In addition to varying constraints, there are many competing methods of formulating

the FMO objective function to reflect the quality of the treatment plan. Shepard et al. [22]

describe several different objective formulations. These formulations include minimizing

the sum of doses received at all voxels; minimizing a weighted combination of doses

received at each voxel, where the weights depend on the structure in which the voxel

resides; and minimizing the deviation of the dose in each voxel from the recommended

prescription.

Romeijn et al. [25] showed that most of the treatment plan evaluation criteria

proposed in the medical physics literature are equivalent to convex penalty function

criteria when viewed as a multicriteria optimization problem. For each set of treatment

plan evaluation criteria from a very large class, there exists a class of convex penalty

functions that produces an identical Pareto efficient frontier. Therefore, a convex penalty

function-based approach to evaluating treatment plans is used to investigate the BOO

problem. Although this approach could be used in a multicriteria setting, Romeijn

et al. [23, 26] suggest that it is possible to quantify a trade-off between the different

evaluation criteria that produces high-quality treatment plans for a population of patients,

eliminating the need to solve the FMO problem as a multicriteria optimization problem for

each individual patient.

2.3 Model Formulation

A convex penalty function-based approach to the FMO model as described in

Romeijn et al. [23] is employed to quantify the quality of the treatment plan by appropriately

making the trade-off between delivering the prescribed radiation dose to the target

structures while sparing the critical structures. Using this approach, the FMO problem

can be formulated as a quadratic programming problem with linear constraints as follows.

Denote the set of all potential beam orientations as $B$. The structures (both targets
and critical structures) are irradiated using a predetermined set of beam angles, denoted
$\theta$, where each beam $\theta_h \in B$, $h = 1, \ldots, k$, and $k$ is the number of beams in $\theta$. Each beam
is decomposed into a rectangular grid of beamlets with $m$ rows and $n$ columns, yielding
typically 100-400 beamlets per beam. The position and intensity of all beamlets in a beam
can be represented by a vector of values representing the beamlet intensities, called bixels.
The set of all bixels in beam $\theta_h$ is denoted by $B_{\theta_h}$. The core task in IMRT treatment
planning is finding radiation intensities for all beamlets.

Denote the total number of structures by $S$ and the number of targets by $T$. Each
structure $s$ is discretized into a finite number $v_s$ of volume cubes, known as voxels.
Typically, around 350,000 voxels are required to accurately represent the targets and
surrounding structures of a head-and-neck cancer site.

Because a beamlet must pass through a certain amount of tissue to reach a voxel, the
dose received in a voxel from a beamlet may not be the full delivered intensity. Denote
$D_{ijs}$ as the dose received by voxel $j$ in structure $s$ from beamlet $i$ at unit intensity. The
$D_{ijs}$ values are known as dose deposition coefficients. Let $x_i$ denote the intensity of bixel $i$.
This brings us to the following expression for the dose $z_{js}$ received by voxel $j$ in structure
$s$:

$$z_{js} = \sum_{h=1}^{k} \sum_{i \in B_{\theta_h}} D_{ijs} x_i \qquad j = 1, \ldots, v_s, \; s = 1, \ldots, S$$
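As a minimal sketch of the dose expression above, the double sum over beams and bixels reduces to a single matrix-vector product once the bixels of all beams are stacked into one vector. The array layout `D[j, i]` (voxel rows, bixel columns) and the toy numbers are assumptions made for illustration only, not data from this work.

```python
import numpy as np

# Hypothetical toy data: 3 voxels and 4 bixels (clinical cases are far larger).
# Assumed layout: D[j, i] is the dose to voxel j per unit intensity of bixel i.
D = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.2, 0.7, 0.3, 0.0],
              [0.0, 0.1, 0.6, 0.8]])
x = np.array([10.0, 20.0, 0.0, 5.0])  # nonnegative bixel intensities x_i

# z[j] = sum over all bixels i of D[j, i] * x[i]
z = D @ x
print(z)
```

In practice most dose deposition coefficients are zero, so a sparse matrix format would likely be preferred; the dense array here only keeps the sketch short.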

Although the goal of IMRT treatment planning is to control the dose received by

each structure, imposing hard constraints on the amount of dose received by each

structure is undesirable because such a solution may not exist. In some cases, it may be necessary to

sacrifice organs in order to treat targets, and if that possibility is not allowed in the model,

then a feasible or a satisfactory solution may not exist. Thus, in our model, a penalty is

assigned to each voxel based on the dose it receives for a given set of beamlet intensities.

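Let $F_{js}$ denote a convex penalty function for voxel $j$ in structure $s$ of the following form:

$$F_{js}(z_{js}) = \frac{1}{v_s}\left( \underline{w}_s \left[(T_s - z_{js})^+\right]^{\underline{p}_s} + \overline{w}_s \left[(z_{js} - T_s)^+\right]^{\overline{p}_s} \right),$$

where $T_s$ is the dose threshold value for structure $s$, $\underline{w}_s$ and $\underline{p}_s$ are weighting factors for
underdosing, and $\overline{w}_s$ and $\overline{p}_s$ are weighting factors for overdosing. The expression $(\cdot)^+$
denotes $\max\{0, \cdot\}$. The function is normalized over the number of voxels in the structure
using the coefficient $1/v_s$. By setting $\underline{w}_s, \overline{w}_s \ge 0$ and $\underline{p}_s, \overline{p}_s \ge 1$, convexity is ensured.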
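The penalty can be evaluated for all voxels of a structure at once. The sketch below is an illustration under assumed names (`voxel_penalty`, `w_under`, `w_over`, `p_under`, `p_over`) and made-up parameter values; it is not taken from the implementation used in this work.

```python
import numpy as np

def voxel_penalty(z, T, w_under, w_over, p_under, p_over, n_voxels):
    """Penalty F_js for the doses z of the voxels in one structure (hypothetical helper)."""
    under = np.maximum(T - z, 0.0)   # (T_s - z_js)^+, nonzero only when underdosed
    over = np.maximum(z - T, 0.0)    # (z_js - T_s)^+, nonzero only when overdosed
    return (w_under * under**p_under + w_over * over**p_over) / n_voxels

# Illustrative values only: a 70 Gy threshold with quadratic penalties.
z = np.array([60.0, 70.0, 75.0])
F = voxel_penalty(z, T=70.0, w_under=2.0, w_over=1.0,
                  p_under=2, p_over=2, n_voxels=3)
print(F)
```

A voxel exactly at the threshold incurs no penalty, and with these assumed weights a 10 Gy underdose is penalized more heavily than a 5 Gy overdose.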

A basic formulation of the FMO problem is then:

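$$\begin{aligned}
\text{minimize} \quad & \sum_{s=1}^{S} \sum_{j=1}^{v_s} F_{js}(z_{js}) \\
\text{subject to} \quad & z_{js} = \sum_{h=1}^{k} \sum_{i \in B_{\theta_h}} D_{ijs} x_i && j = 1, \ldots, v_s, \; s = 1, \ldots, S \\
& x_i \ge 0 && i \in B_{\theta_h}, \; h = 1, \ldots, k
\end{aligned}$$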

The FMO problem is the black-box function F(θ) in the BOO model, used to quantify the

quality of beam vector θ. In contrast with the methods presented by all of the previously

cited FMO studies except for Das and Marks [27], Haas et al. [28] and Schreibmann [29],

this measure of beam vector quality is an exact measure of the FMO problem, rather than

using heuristic methods or scoring approaches which cannot accurately optimize the beam

orientations.

2.4 Spatial Considerations

With IMRT optimization, it is possible to generate treatment plans with similar FMO

objective function values but very different levels of clinical treatment quality. Chao et

al. 2003 [30] illustrate this possibility with two treatment plans that have nearly identical

target coverage when plotted on a dose-volume histogram, but while one plan delivers

an acceptable homogeneous dose, the other plan results in significant underdosing of the

target structure.

Chao et al. 2003 [30] show that the probability of microscopic tumor extension

decreases linearly with distance from the gross tumor volume, implying that cold spots

located near the gross tumor volume are far more likely to allow for tumor metastasis after

treatment. Likewise, cold spots located far from the gross tumor volume are unlikely to

result in tumor metastasis.

To reduce the likelihood of obtaining an unsatisfactory plan with a good dose-volume

histogram, spatial coefficients are introduced into the FMO model. For each voxel, we

consider its position relative to the primary target as a measure of how acceptable/unacceptable

overdosing or underdosing may be. Voxels further from the gross tumor volume are

penalized more heavily than voxels closer to the gross tumor because it is less acceptable

for a voxel far away from the actual tumor to receive an overdose, as the cancerous cells

are unlikely to spread very far from the tumor location (Chao et al. [30]). This additional

penalization is called the spatial coefficient, and is denoted cjs for voxel j in structure s.

For voxels inside the target structures, the probability of cancer spread is 1, as cancer

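already exists in those voxels. Let $S'$ denote the set of gross tumor structures. Let $d_{\ell js}$ be
the minimum distance from voxel $j$ in structure $s$ to structure $\ell$. The spatial coefficient $c_{js}$
for voxel $j$ in structure $s$ is

$$c_{js} = \begin{cases}
1 & j = 1, \ldots, v_s, \; s \notin S' \\
\min\left\{ 1, \max\left\{ 0.001, \sum_{\ell=1}^{|S'|} \left[ \exp(-\lambda_\ell d_{\ell js}) + \mu_\ell d_{\ell js} + \beta_\ell \right] \right\} \right\} & j = 1, \ldots, v_s, \; s \in S',
\end{cases}$$

where $\lambda_\ell$, $\mu_\ell$ and $\beta_\ell$ are weighting coefficients. The objective function for the FMO
problem becomes

$$F_{\text{spatial}}(x) = \sum_{s=1}^{S} \sum_{j=1}^{v_s} c_{js} F_{js}(z_{js})$$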
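To illustrate how the clamped sum behaves, the sketch below evaluates the second case of the coefficient for a single voxel. The function name and the distance values are hypothetical, the distance units are not stated in the text, and the parameter values 1.07, -0.32 and 0.77 are the tuned values reported in Section 2.6.4.

```python
import numpy as np

def spatial_coefficient(dist, lam, mu, beta, floor=0.001):
    """Clamped spatial coefficient for one voxel (hypothetical helper).

    dist[l] is the voxel's minimum distance to gross tumor structure l.
    """
    dist = np.asarray(dist, dtype=float)
    total = np.sum(np.exp(-lam * dist) + mu * dist + beta)
    return min(1.0, max(floor, total))   # clamp to the interval [floor, 1]

# Single gross tumor structure, three illustrative distances.
c_at_tumor = spatial_coefficient([0.0], 1.07, -0.32, 0.77)
c_near = spatial_coefficient([1.0], 1.07, -0.32, 0.77)
c_far = spatial_coefficient([5.0], 1.07, -0.32, 0.77)
print(c_at_tumor, c_near, c_far)
```

With these parameters the coefficient saturates at 1 at zero distance and falls to the floor value once the linear term dominates the exponential term.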

2.5 A Primal-Dual Interior Point Algorithm for FMO

To solve the FMO and fractionated FMO models, a primal-dual interior point method

is employed. For a convex problem such as the FMO model presented in the preceding

section, this method yields an optimal solution in a short amount of time.

The primal-dual interior point algorithm moves through the interior of the solution

space along a central path (a path through the interior of the solution space) toward the

optimal solution. The central path is defined by perturbing the KKT conditions described

below. These conditions ensure primal feasibility, dual feasibility and complementary

slackness. If these conditions are satisfied for a convex programming problem with linearly

independent constraints, they yield the optimal solution. Thus, we only need to solve this

system to obtain an optimal solution to our FMO model (which has a convex objective

function and linear, linearly independent constraints). The KKT system can be difficult to

solve, so the conditions are perturbed in order to obtain a solution.

The general idea of the primal-dual interior point algorithm is to start from an initial

feasible solution, use the perturbed KKT conditions to obtain a step direction close to the

central path, and then move the current solution some step length along that direction.

The amount of perturbation in the KKT conditions is gradually decreased so that in each

step, the solution becomes closer to the optimum. The interior point method allows for

the duality gap, the gap between the objective functions of the primal and dual problems,

to be calculated, thus providing a measure of how close the current solution is to the

optimum. For a problem with continuous variables, when the objective functions of the

primal and dual problems are equal (duality gap of zero), the solution is optimal.

A mathematical description of the primal-dual interior point method can be found

in Nocedal and Wright [31]. Further explanation is provided only as needed to define

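variables in the algorithm. In the FMO problem, $G(x) = -Ix$, so the KKT conditions for
the FMO formulation are

$$\sum_{s \in S} \frac{1}{v_s} \sum_{j \in V_s} D_{ij} F'_j\left( \sum_{\ell \in N} D_{\ell j} x_\ell \right) - s_i = 0 \qquad i \in N \tag{2–1}$$

$$s_i x_i = 0 \qquad i \in N \tag{2–2}$$

$$s_i \ge 0 \qquad i \in N \tag{2–3}$$

$$x_i \ge 0 \qquad i \in N, \tag{2–4}$$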

where Equation (2–4) ensures that the solution is feasible, as the only constraints
in the FMO problem are nonnegativity. The complementary slackness constraint (2–2)
forces the solution to the above conditions to be on the boundary of the solution space.
Since a point in the interior of the solution space is desired, the complementary slackness
constraint must be relaxed.

The complementary slackness constraint (2–2) is relaxed by changing each $s_i x_i = 0$ to
$s_i x_i = \mu$, where $\mu > 0$. This, along with requiring that $x > 0$ and $s > 0$ for feasibility,
ensures that a solution to the perturbed KKT conditions is an interior point.

Let $n$ be the size of decision variable vector $x$. A solution is “close enough” to the
central path if the duality measure $\mu$ in iteration $k$ is

$$\mu^k = \frac{(x^k)^\top s^k}{n} \tag{2–5}$$

and $\|X^k S^k e - \mu^k e\| \le \theta \mu^k$, where $e$ is the vector of ones, $X^k$ is a matrix with $x_i^k$ values as diagonals and zeros
elsewhere, and $S^k$ is a matrix with $s_i^k$ values as diagonals and zeros elsewhere.
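The closeness test can be written without forming the diagonal matrices, since the product of the two diagonal matrices applied to the vector of ones is just the elementwise product of $x$ and $s$. The following is a small sketch with hypothetical function names; the default `theta=0.5` matches the value reported in Section 2.6.2, and the Euclidean norm is an assumption because the text does not name the norm.

```python
import numpy as np

def duality_measure(x, s):
    """mu = x^T s / n, as in Equation (2-5)."""
    return float(x @ s) / x.size

def near_central_path(x, s, theta=0.5):
    """Check ||X S e - mu e|| <= theta * mu using the Euclidean norm (assumed)."""
    mu = duality_measure(x, s)
    return bool(np.linalg.norm(x * s - mu) <= theta * mu)

# All products x_i * s_i equal: the point lies exactly on the central path.
on_path = near_central_path(np.array([1.0, 2.0, 4.0]), np.array([2.0, 1.0, 0.5]))

# Very unequal products: the point is far from the central path.
off_path = near_central_path(np.array([1.0, 1.0, 1.0]), np.array([0.1, 1.0, 4.0]))
print(on_path, off_path)
```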

As the algorithm progresses, µ is reduced toward zero until the solution is sufficiently close

to optimality. To reduce µ, in each iteration we set µ ← σµ, where σ ∈ [0, 1] is called the

centering parameter. If the duality gap is very large, σ can be reduced so that µ is reduced

faster.

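In each iteration, the current solution $(x, s)$ is moved in a direction $(\Delta x, \Delta s)$ for some
step length $\alpha$, giving

$$\begin{pmatrix} x^{k+1} \\ s^{k+1} \end{pmatrix} = \begin{pmatrix} x^k \\ s^k \end{pmatrix} + \alpha \begin{pmatrix} \Delta x^k \\ \Delta s^k \end{pmatrix}$$

Let $X^k = \mathrm{diag}(x^k)$, $S^k = \mathrm{diag}(s^k)$, $H(x^k) = \nabla^2 \phi(x^k)$. The directions $\Delta x^k$ and $\Delta s^k$
can be determined by solving the following equations:

$$\left[ (X^k)^{-1} S^k + H(x^k) \right] \Delta x^k = -r_{DF} - (X^k)^{-1} r_{xs} \tag{2–6}$$

$$\Delta s^k = -(X^k)^{-1} \left( r_{xs} + S^k \Delta x^k \right) \tag{2–7}$$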

In order to solve this system, we must obtain $\Delta x^k$ from Equation (2–6) by taking the
inverse of $[(X^k)^{-1} S^k + H]$. Because computing the inverse of such a large dense matrix is
very time consuming, a Cholesky factorization is used to solve this system quickly.
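A minimal sketch of the step computation follows. The residuals are not defined in this section, so the sketch assumes the standard choices: `r_df` is the left-hand side of Equation (2–1) evaluated at the current point (gradient minus `s`) and `r_xs` is the violation of the perturbed complementarity condition. The function name and the small quadratic test problem are hypothetical.

```python
import numpy as np

def step_direction(x, s, grad, H, mu):
    """Solve Equations (2-6) and (2-7) with a Cholesky factorization."""
    r_df = grad - s                  # assumed residual of Eq. (2-1)
    r_xs = x * s - mu                # assumed residual of s_i * x_i = mu
    A = np.diag(s / x) + H           # (X)^{-1} S + H, symmetric positive definite
    rhs = -r_df - r_xs / x
    L = np.linalg.cholesky(A)        # A = L L^T
    dx = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # two triangular systems
    ds = -(r_xs + s * dx) / x        # Eq. (2-7), elementwise
    return dx, ds

# Hypothetical toy problem: phi(x) = 0.5 x^T Q x + c^T x.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([-1.0, -1.0])
x = np.array([1.0, 1.0])
s = np.array([1.0, 1.0])
mu = 0.5
grad = Q @ x + c
dx, ds = step_direction(x, s, grad, Q, mu)
print(dx, ds)
```

The factorization avoids forming an explicit inverse: the matrix is factored once and the step is recovered from two triangular solves. `np.linalg.solve` is used for those solves only to keep the sketch dependency-free; a dedicated triangular solver would be the more efficient choice.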

The primal-dual interior point method requires a feasible $(x, s)$ solution in each step.
Thus, a maximum step length $\alpha_{\max}$ must be imposed on each step direction to ensure that
$x \ge 0$ and $s \ge 0$:

$$\alpha_{\max} = \min\left\{ \min_{i=1,\ldots,n} \{ -x_i / \Delta x_i \}, \; \min_{i=1,\ldots,n} \{ -s_i / \Delta s_i \} \right\}$$

Because the inverse of each $x_i$ is required to determine the step directions, it is
undesirable to have any $x_i = 0$, which would result from using step length $\alpha_{\max}$. Instead,
only a percentage $\eta < 1$ of $\alpha_{\max}$ is used:

$$\alpha = \min\{1, \eta \alpha_{\max}\} \tag{2–8}$$
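The step length rule can be sketched as below. One assumption is made explicit in the code: the ratios are taken only over components that decrease (negative step entries), since a component that increases can never reach zero and its ratio would be negative. The function name is hypothetical.

```python
import numpy as np

def step_length(x, s, dx, ds, eta=0.95):
    """Step length from Equation (2-8), shortened to keep x and s strictly positive."""
    alpha_max = np.inf
    for v, dv in ((x, dx), (s, ds)):
        neg = dv < 0                          # assumption: only decreasing components bind
        if neg.any():
            alpha_max = min(alpha_max, np.min(-v[neg] / dv[neg]))
    return min(1.0, eta * alpha_max)

x = np.array([1.0, 2.0]); dx = np.array([-2.0, 1.0])
s = np.array([1.0, 1.0]); ds = np.array([0.5, -0.5])
alpha = step_length(x, s, dx, ds)             # x[0] would hit zero at step 0.5
alpha_free = step_length(x, s, np.array([1.0, 1.0]), np.array([0.5, 0.5]))
print(alpha, alpha_free)
```

When no component decreases, the full Newton step of length 1 is taken.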

The benefit of this primal-dual method is that in each step, we can calculate the

duality gap (simply $s^\top x$), thus providing a bound on how far the current

solution is from optimality.

2.5.1 Primal-Dual Interior Point Algorithm

The primal-dual interior point algorithm is as follows:

• Initialization

1. Select initial values for ε, σ and η (we use ε = 5, σ = 0.01, and η = 0.95).
2. Set $x^0 = 0.05$ (very close to 0) and calculate $\nabla\phi(x^0)$ and $H(x^0) = \nabla^2\phi(x^0)$.
3. Set $s^0 = \mu (X^0)^{-1}$.
4. Set $\mu^0 = \left( \sum_{i=1}^{n} \nabla\phi(x^0)_i \right) / 100$.
5. Set $k = 0$.

• Algorithm

1. If the duality gap is very large ($(x^{k+1})^\top s^{k+1} > 10^7 \varepsilon$), set σ = 0.01σ.
2. Set $\mu^k = \sigma \mu^k$.
3. Solve for the step direction $(\Delta x^k, \Delta s^k)$ as described in Equations (2–6) and (2–7). Note that this involves calculating the Hessian $H(x^k)$.
4. Solve for the step length α as described in Equation (2–8).
5. Set $x^{k+1} = x^k + \alpha \Delta x$ and $s^{k+1} = s^k + \alpha \Delta s$.
6. If the duality gap $(x^{k+1})^\top s^{k+1} < \varepsilon$, stop. Otherwise, set $\mu^{k+1} = (x^{k+1})^\top s^{k+1} / n$ and $k \leftarrow k + 1$ and repeat.
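The steps above can be sketched end to end on a toy problem. This is an illustration under several stated assumptions, not the implementation used in this work: the objective is a small quadratic rather than an FMO objective, the residuals take the standard forms, the starting point and initial µ are passed in by the caller, σ = 0.1 is used instead of the reported 0.01, and the adaptive reduction of σ in step 1 is omitted.

```python
import numpy as np

def interior_point(grad, hess, x0, mu0, eps=1e-8, sigma=0.1, eta=0.95, max_iter=100):
    """Sketch of the primal-dual loop for: minimize phi(x) subject to x >= 0."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    mu = mu0
    s = mu / x                                   # s0 = mu * (X0)^{-1}
    gap = float(x @ s)
    for _ in range(max_iter):
        mu = sigma * mu                          # shrink the perturbation
        r_df = grad(x) - s                       # assumed residual of Eq. (2-1)
        r_xs = x * s - mu                        # assumed residual of s_i x_i = mu
        L = np.linalg.cholesky(np.diag(s / x) + hess(x))
        dx = np.linalg.solve(L.T, np.linalg.solve(L, -r_df - r_xs / x))  # Eq. (2-6)
        ds = -(r_xs + s * dx) / x                                        # Eq. (2-7)
        # Largest step keeping x and s nonnegative (only decreasing components bind).
        ratios = [-v[dv < 0] / dv[dv < 0] for v, dv in ((x, dx), (s, ds))]
        alpha_max = min([np.inf] + [r.min() for r in ratios if r.size])
        alpha = min(1.0, eta * alpha_max)                                # Eq. (2-8)
        x, s = x + alpha * dx, s + alpha * ds
        gap = float(x @ s)                       # duality gap x^T s
        if gap < eps:
            break
        mu = gap / n
    return x, s, gap

# Toy quadratic: phi(x) = 0.5 * ||x - t||^2, minimized over x >= 0 at max(t, 0).
t = np.array([1.0, -1.0])
x_opt, s_opt, gap = interior_point(lambda x: x - t, lambda x: np.eye(2),
                                   x0=np.ones(2), mu0=1.0)
print(x_opt, s_opt, gap)
```

On this toy problem the second variable is driven to its bound while the first settles at its unconstrained minimizer, and the dual variables converge to the gradient at the solution.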

2.5.2 Hessian Approximations

The most time-consuming step in the primal-dual interior point algorithm is
calculating the Hessian of the objective function in each iteration. For clarity, let $\sum$
denote $\sum_{s \in S} \frac{1}{v_s} \sum_{j \in V_s}$ and $F''_j(x)$ denote $F''_j\left( \sum_{l \in N} D_{lj} x_l \right)$. The Hessian of the FMO
problem is then given by

$$H(x) = \begin{bmatrix}
\sum F''_j(x) D_{1j}^2 & \cdots & \sum F''_j(x) D_{1j} D_{nj} \\
\vdots & \ddots & \vdots \\
\sum F''_j(x) D_{nj} D_{1j} & \cdots & \sum F''_j(x) D_{nj}^2
\end{bmatrix}$$

Note that only the pairwise $D_{ij}$ products differ in each element of the Hessian. By
precomputing these cross products, only $\sum_{s \in S} \frac{1}{v_s} \sum_{j \in V_s} F''_j\left( \sum_{l \in N} D_{lj} x_l \right)$ has to be
recomputed in each iteration. The matrix of the $D_{ij}$ products yields the sparsity (or
density) pattern of the Hessian, which stays constant throughout the algorithm. Because
the Hessian is symmetric, the matrix values only need to be computed for half of the
matrix, further improving efficiency.
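In matrix form, the Hessian above is a weighted sum of outer products of the rows of the dose deposition matrix. The sketch below assumes a dense array `D[j, i]` (voxel rows, bixel columns) and a vector `curv` holding the per-voxel curvature already divided by the structure's voxel count; both names and all numbers are illustrative.

```python
import numpy as np

def hessian(D, curv):
    """H = sum_j curv[j] * outer(D[j], D[j]), i.e. D^T diag(curv) D."""
    return D.T @ (curv[:, None] * D)

D = np.array([[1.0, 2.0, 0.0],     # voxel 0 is reached by bixels 0 and 1
              [0.0, 1.0, 3.0]])    # voxel 1 is reached by bixels 1 and 2
curv = np.array([2.0, 0.5])        # assumed values of F_j''(z_j) / v_s
H = hessian(D, curv)

# The zero pattern depends only on D, so it is the same in every iteration.
pattern = (D.T @ D) != 0
print(H)
print(pattern)
```

Bixels 0 and 2 share no voxel in this toy example, so the corresponding Hessian entry is zero regardless of the curvature values.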

Despite these observations, computing the Hessian is still so expensive that it renders

the algorithm impractical. Methods of approximating the Hessian are implemented to

speed up the algorithm.

2.5.2.1 Single Hessian Approximation

One way of speeding up the algorithm is to compute the Hessian just once during

initialization to obtain H(x0), and then rather than re-compute the Hessian in each

iteration, use H(x0) as an approximation to H(xk). We call this the Single Hessian

approximation. Although the convergence of such an approximation has not yet been

mathematically proven, tests run on several head-and-neck cases for 5-beam and 7-beam

plans show that the Single Hessian does in fact converge to the known optimal solution.

2.5.2.2 BFGS Hessian Update

Another Hessian approximation is the Broyden-Fletcher-Goldfarb-Shanno (BFGS)

Hessian update. The approximation to the Hessian in iteration $k$ is $B_k$, with $B_0 = H(x^0)$.
The update to the approximated Hessian in each iteration is

$$B_{k+1} = B_k + \frac{q_k q_k^\top}{q_k^\top p_k} - \frac{B_k p_k p_k^\top B_k}{p_k^\top B_k p_k},$$

where

$$p_k = x^{k+1} - x^k, \qquad q_k = \nabla\phi(x^{k+1}) - \nabla\phi(x^k)$$

Note that this update ensures that Bk is always symmetric and positive definite,

so the Cholesky factorization can still be applied to obtain the step direction. This

approximation also empirically converges to the known optimal solution for 5- and 7-beam

head-and-neck cases.
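The BFGS update is a rank-two correction and can be sketched in a few lines. The function name and the numbers are illustrative; the check at the end uses the secant condition (the updated matrix maps the step `p` to the gradient change `q`), which is a standard property of this update. Positive definiteness is preserved when `q @ p > 0`, as is the case in this example.

```python
import numpy as np

def bfgs_update(B, p, q):
    """Return B + q q^T / (q^T p) - (B p)(B p)^T / (p^T B p) for symmetric B."""
    Bp = B @ p
    return B + np.outer(q, q) / (q @ p) - np.outer(Bp, Bp) / (p @ Bp)

B0 = np.eye(2)                     # stand-in for the initial Hessian H(x0)
p = np.array([1.0, 0.0])           # step: x_{k+1} - x_k
q = np.array([2.0, 1.0])           # gradient change over that step
B1 = bfgs_update(B0, p, q)
print(B1)
```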

2.5.3 Insignificant Beamlets

Insignificant beamlets are those that bear little contribution to the quality of the

FMO plan. Letting $d$ denote the diagonal elements of the initial Hessian $H(x^0)$, the set of
insignificant beamlets $B_I$ is defined as

$$B_I = \left\{ i : \frac{|d_i|}{\max\{|d|\}} < 0.001 \right\}$$

These beamlets are removed by removing the $i$th row and the $i$th column in $H(x^0)$ for
every $i \in B_I$, and then updating the number of bixels to the number of remaining bixels.
The insignificant beamlets must be re-inserted into the solution $x^k$ in order to calculate
the voxel doses, objective function, gradient and Hessian, but the inversion of the Hessian
is done to the Hessian with the insignificant beamlets removed, providing significant time savings.
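The selection and removal can be sketched with boolean masks. The helper names are hypothetical, and the value used for removed beamlets when the full-length solution is rebuilt is an assumption (zero here), since the text does not state it.

```python
import numpy as np

def remove_insignificant(H0, tol=0.001):
    """Split bixels by the relative size of the diagonal of the initial Hessian."""
    d = np.abs(np.diag(H0))
    mask = d / d.max() < tol                     # True for insignificant beamlets
    keep = np.flatnonzero(~mask)
    return np.flatnonzero(mask), keep, H0[np.ix_(keep, keep)]

def reinsert(x_reduced, keep, n, fill=0.0):
    """Rebuild a full-length solution; `fill` for removed beamlets is an assumption."""
    x_full = np.full(n, fill)
    x_full[keep] = x_reduced
    return x_full

H0 = np.array([[10.0, 0.01, 1.0],
               [0.01, 0.005, 0.0],
               [1.0, 0.0, 4.0]])
insignificant, keep, H_reduced = remove_insignificant(H0)
x_full = reinsert(np.array([3.0, 7.0]), keep, n=3)
print(insignificant, H_reduced, x_full)
```

Only the reduced matrix is factored in each iteration, which is where the time savings described above come from.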

2.5.4 Warm Start

For the sake of theoretical accuracy, a truly optimal solution cannot have the insignificant

beamlets described in Section 2.5.3 removed. Without removing the insignificant beamlets a priori,

the interior point method must be run for an impractical amount of time to obtain a

near-optimal solution, say, ε = 0.001. The interior point method is typically started with

a decision variable vector x equal to almost zero. If the algorithm were to be started at a

point closer to the final solution, denoted xwarm, time savings could be gained, allowing all

beamlets to be considered in the interior point algorithm in a reasonable amount of time.

Such an approach is called a “warm start”.

One difficulty in using a warm start with the interior point method is that a warm
start solution may have some $x_i^{\text{warm}} = 0$, which is not allowed because the inverse of each
$x_i$ must be taken. To correct this problem, any $x_i^{\text{warm}} = 0$ is simply replaced with some
very small value $\gamma$. Because these zero-valued variables are less important to the problem
than nonzero variables, $\gamma$ should be less than the minimum nonzero value of $x^{\text{warm}}$. Let
$\gamma = \min_{i=1,\ldots,n} \{ x_i^{\text{warm}} : x_i^{\text{warm}} > 0 \}$. Then, set $\gamma \leftarrow \min\{0.001, \gamma\}$.

$$x_i^0 = \begin{cases}
x_i^{\text{warm}} & i \notin B_I \\
\gamma & i \in B_I
\end{cases}$$

An additional problem with warm starts in the interior point method is that the KKT

variable vector s is unknown at the warm start point. Depending on the algorithm used

to obtain the warm start, some information about $s^{\text{warm}}$ and $\mu^{\text{warm}}$, $s$ and $\mu$ at the warm
start point, respectively, may not be available. If no information is available about $s$ from
the warm start, then $s^0 = 0$. If an interior point algorithm is used to obtain the warm
start, then $s^{\text{warm}}$ is available. If the warm start did not include the insignificant beamlets,
some corrections must be made to account for the insignificant beamlets which will be
optimized in the final solution. Let $s^0$ be the initial $s$ used in the interior point method
after the warm start has been obtained. Then,

$$s_i^0 = \begin{cases}
s_i^{\text{warm}} & i \notin B_I \\
\mu^{\text{warm}} / \gamma & i \in B_I,
\end{cases}$$

where the value chosen for $s_i^0$ corresponding to insignificant beamlets arises from the
general initialization $s = \mu (X^0)^{-1}$.
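The two case definitions for the warm-start point can be combined in one small routine. This is a sketch with hypothetical names and toy values; it follows the displayed formulas only, so it assumes the warm-start vectors are full length and that at least one warm-start intensity is positive.

```python
import numpy as np

def warm_start_point(x_warm, s_warm, mu_warm, insignificant):
    """Build (x0, s0) from a warm start, patching the insignificant beamlets."""
    gamma = min(0.001, x_warm[x_warm > 0].min())   # small positive replacement value
    x0 = x_warm.astype(float).copy()
    s0 = s_warm.astype(float).copy()
    x0[insignificant] = gamma                      # avoids dividing by zero later
    s0[insignificant] = mu_warm / gamma            # so that x0_i * s0_i = mu_warm
    return x0, s0, gamma

x_warm = np.array([2.0, 0.0, 0.5])
s_warm = np.array([0.1, 0.0, 0.4])
x0, s0, gamma = warm_start_point(x_warm, s_warm, mu_warm=0.2,
                                 insignificant=np.array([1]))
print(x0, s0)
```

For the patched entries the product of the primal and dual values equals the warm-start µ, which is the same relationship the general initialization imposes on every entry.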

2.6 Results

The true Hessian, Single Hessian approximation, and BFGS update implementations

of the primal-dual interior point algorithm are tested on six head-and-neck cases to

obtain coplanar, equi-spaced 5-beam plans. The tests are run on a 2.33GHz Intel Core 2

Duo processor with 2GB of RAM. The method is tested for both leaving in and removing

the insignificant beamlets, as well as the proposed alternative to computing the Hessian.

The optimality of the interior point method solutions is verified by comparison to the

known optimal solutions obtained by Java with CPLEX (ILOG).

An acceptable duality gap must be determined in order to implement the interior

point method. While we consider a duality gap of ε = 0.001 to be acceptably close to

optimal, it may be unnecessary to achieve such a small duality gap to obtain a quality

solution. A duality gap of 0.001 may be sufficiently small to ensure optimal solutions given
objective function values using certain weighting parameters, but depending on the parameters
used in the FMO objective function, the value of the objective function may vary widely.
Because of the potential range of values, a stopping criterion based on a relative duality gap
rather than an absolute duality gap is preferable. Say the objective function value in an
iteration is $f$. Define the relative duality gap in an iteration to be $\varepsilon' = \varepsilon / f$.
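A short worked example of the relative gap, using the Case 1 objective value reported in Table 2-2 (2546.22). The function and variable names are hypothetical.

```python
def relative_duality_gap(gap, f):
    """Relative duality gap: the absolute gap divided by the objective value f."""
    return gap / f

f_case1 = 2546.22                                  # Case 1 FMO value from Table 2-2
rel = relative_duality_gap(0.001, f_case1)         # absolute gap of 0.001
abs_for_tenth_percent = 0.001 * f_case1            # 0.1% written as a fraction is 0.001
print(rel, abs_for_tenth_percent)
```

For an objective value of this size, an absolute gap of 0.001 corresponds to a relative gap of roughly 4e-7, while a relative gap of 0.1% corresponds to an absolute gap of about 2.5. This gives a sense of how much looser the relative criteria tested here are.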

An examination of the relative duality gap necessary is presented in Section 2.6.1.

Computational results are presented in Section 2.6.2 and clinical comparisons are provided

in Section 2.6.3.

2.6.1 How Small of a Duality Gap is Necessary?

Because the run time of the algorithm is dependent on the required duality gap, it

is desirable to only require the algorithm to achieve as small a duality gap as necessary

to ensure a clinically good solution. The duality gap decreases quickly in the first few

iterations, and then subsequently decreases by only a small amount per iteration, as

shown in Figure 2-1A. If these iterations with only marginal improvements are found to

be unnecessary in terms of clinical quality, significant time can be saved by stopping the

algorithm once the duality gap is reasonably small, as opposed to waiting until the duality

gap is very small.

To check the importance of the duality gap, the FMO value and dose delivered to the

targets and the saliva glands were plotted against the duality gap in each iteration using

the true Hessian and without removing insignificant beamlets. For a representative case,

the FMO values per duality gap are shown in Figure 2-1B. It is clear that the duality gap

decreases rapidly in the first few iterations, but subsequent iterations yield increasingly

smaller drops in the duality gap.

Similarly, the amount of dose received by the targets and critical structures does not

change significantly toward the end of the algorithm. Figure 2-2 plots the dose received by

the two targets, PTV1 and PTV2, starting from a duality gap of 0.15%. The prescription

doses are 70 Gy for PTV1 and 50 Gy for PTV2, common dose values used in the cancer

clinic at Shands Hospital at the University of Florida. Neither the dose received by 95% of

the targets nor the size of the hotspots and coldspots changes significantly in this duality

gap range (Figure 2-2A). The hotspots are measured by the percent of the target receiving

110% and 120% of the prescription dose, while the coldspots are measured by the percent

of the target receiving at least 93% of the prescription dose (Figure 2-2B).
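The coverage and hotspot measures described above can be computed directly from the per-voxel doses of a target. The sketch below uses hypothetical helper names and made-up dose values for a ten-voxel target; it assumes equal-volume voxels, so volume fractions are voxel fractions.

```python
import math
import numpy as np

def dose_at_coverage(dose, frac):
    """Largest dose level that at least `frac` of the voxels receive."""
    ordered = np.sort(dose)[::-1]                  # doses from highest to lowest
    k = math.ceil(frac * ordered.size)             # number of voxels that must be covered
    return ordered[k - 1]

def percent_receiving(dose, prescription, factor):
    """Percent of voxels receiving at least factor * prescription."""
    return 100.0 * np.mean(dose >= factor * prescription)

dose = np.array([66.0, 68.0, 69.0, 70.0, 70.0, 71.0, 72.0, 74.0, 78.0, 85.0])
d95 = dose_at_coverage(dose, 0.95)                 # dose received by at least 95% of voxels
v110 = percent_receiving(dose, 70.0, 1.10)         # hotspot measure at 110%
v120 = percent_receiving(dose, 70.0, 1.20)         # hotspot measure at 120%
v93 = percent_receiving(dose, 70.0, 0.93)          # coverage at 93%
print(d95, v110, v120, v93)
```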

Figure 2-3 shows for two representative cases the amount of dose received by the
saliva glands starting from a relative duality gap of 0.15%. Both cases show that the
change in dose received by the saliva glands as the duality gap decreases is not clinically
relevant.

[Figure 2-1: objective function (FMO value, ×10^4) and relative duality gap, each plotted against iterations (0 to 25).]

Figure 2-1. The duality gap drops sharply in early iterations, but very slowly thereafter. The relative duality gap monotonically decreases after several iterations.

From these figures, it appears that a duality gap as large as 0.1% could provide

clinically acceptable plans. Since the algorithm may terminate with a duality gap less than

the one specified as the stopping criterion, a duality gap larger than 0.1% will also be tested

for acceptability.

2.6.2 Computational Results

Table 2-1 shows the average run times for each of the implementations of the

algorithm. Relative duality gaps of 0.15%, 0.10%, 0.05% and 0.01% are compared.

The value of θ used to define the central path is 0.5. As expected, using the Single

Approximation Hessian alternative with the insignificant beamlets removed is the fastest

method, while using the true Hessian is the slowest method, regardless of whether the

insignificant beamlets are removed. Interestingly, for large duality gaps, it is slightly faster

to leave the insignificant beamlets in the model when using the true Hessian. Otherwise, it

is faster to remove the insignificant beamlets.

The final FMO values are displayed for each of the tested methods using a duality
gap of 0.001, which is sufficiently small to ensure optimal solutions given typical objective
function values (Table 2-2). For each case, the final FMO value is nearly identical,
indicating that the Hessian alternatives and the removal of the insignificant beamlets still
provide for convergence to the optimal solution.

[Figure 2-2: A) "Target coverage at 95%": dose (Gy) against relative duality gap (%) for PTV1 and PTV2. B) "Target hotspots and coldspots": percent of target against relative duality gap (%) for PTV1 and PTV2 at 1.10, 1.20 and 0.93 of the prescription dose.]

Figure 2-2. Dose received by targets as a function of the duality gap. A) The amount of dose received by at least 95% of each target is used to assess proper target coverage. B) The percent of each target receiving 110% and 120% of the prescription dose indicates hotspots, while 93% of the prescription dose indicates coldspots.

[Figure 2-3: two panels, each titled "Saliva gland dose at 50%": dose (Gy) against relative duality gap (%) for the left and right parotid glands and the left and right SMB glands.]

Figure 2-3. The amount of dose received by at least 50% of each saliva gland remains relatively constant even for large duality gaps. Two representative cases are shown.

Table 2-1. Average run times for 5-beam treatment plans.

                 Remove insig.   Average run time (s)
Hessian type     beamlets?       ε = 0.001   ε′ = 0.15%   ε′ = 0.10%   ε′ = 0.05%   ε′ = 0.01%
True             no              113.8       55.48        55.48        58.58        71.75
True             yes             105.6       55.25        56.29        59.09        70.56
BFGS             no              43.9        13.59        14.17        14.66        16.67
BFGS             yes             40.9        13.19        13.66        14.30        15.88
Single Approx.   no              18.1        8.83         8.98         9.29         10.13
Single Approx.   yes             16.8        8.69         8.84         9.14         9.90

Table 2-2. FMO value from using ε = 0.001.

                 Remove insig.
Hessian type     beamlets?       Case 1    Case 2    Case 3    Case 4    Case 5    Case 6
True Hessian     no              2546.22   2200.70   2289.95   2566.38   5024.97   2585.40
True Hessian     yes             2546.22   2200.70   2289.95   2566.38   5024.97   2585.40
BFGS update      no              2546.23   2200.70   2289.95   2566.39   5024.97   2585.40
BFGS update      yes             2546.24   2200.70   2289.95   2566.39   5024.97   2585.40
Single Approx.   no              2546.38   2201.11   2290.40   2566.56   5025.06   2585.82
Single Approx.   yes             2546.38   2201.15   2290.44   2566.62   5025.14   2585.82

The percentage increases in the FMO values using an absolute duality gap of 0.001

and relative duality gaps of 0.15%, 0.10%, 0.05% and 0.01% are shown in Table 2-3.

2.6.3 Clinical Results

For each of the duality gaps tested, the DVHs of the solutions obtained using the

Single Approximation Hessian with the insignificant beamlets removed are compared.

Since each of the interior point implementations obtains nearly identical solutions, it

does not matter which implementation is used to produce the DVHs.

As previously stated, the prescription doses used are 70 Gy for PTV1 and 50 Gy for

PTV2, marked by a vertical line in Figure 2-4A. As saliva glands are the most difficult

organs to spare in head-and-neck cases, the only critical structures shown are the saliva

glands (Figure 2-4B). All other glands are spared in every implementation. The sparing
criterion used for saliva glands is that no more than 50% of the saliva gland can
receive more than 30 Gy in order to be spared. This point is marked by a star in Figure
2-4B.

Table 2-3. Percent increase in objective function value from various relative duality gaps as opposed to an absolute duality gap of ε = 0.001.

                 Remove insig.   Avg. increase in obj. fn. (%)
Hessian type     beamlets?       ε′ = 0.15%   ε′ = 0.10%   ε′ = 0.05%   ε′ = 0.01%
True             no              0.58         0.58         0.27         0.05
True             yes             0.58         0.48         0.25         0.06
BFGS             no              0.99         0.54         0.30         0.05
BFGS             yes             0.94         0.57         0.26         0.07
Single Approx.   no              1.26         0.89         0.60         0.19
Single Approx.   yes             1.21         0.87         0.57         0.16

[Figure 2-4: A) "Interior point method: Target DVHs" and B) "Interior point method: Saliva DVHs": volume against dose [Gy] for ε′ = 0.15%, 0.10%, 0.05% and 0.01%.]

Figure 2-4. Quality of DVHs for duality gaps ε′=0.01%, 0.05%, 0.1% and 0.15%. A) The target coverage is nearly identical. B) The saliva gland sparing for the different duality gaps is similar, but the solution for ε′=0.15% sacrifices one saliva gland. The sparing criterion is marked by a star.

Each of the duality gaps achieves good target coverage. While they each provide

similar saliva gland dosage, the plan obtained using ε′ = 0.15% slightly exceeds the

sparing criterion used for saliva glands.

2.6.4 Spatial Coefficient Results

To assess the possible treatment plan improvement afforded by spatial coefficients,

spatial parameters were tuned and then compared to treatment plans obtained without

using spatial information. To demonstrate the spatial coefficients, Figure 2-5 displays the
coefficients used for two cases. In addition to tuning λ, µ and β to values of 1.07, -0.32
and 0.77, respectively, a minimum spatial coefficient of 0.025 was also set for target voxels.
By definition, the maximum value of a spatial coefficient is 1.

Figure 2-5. The spatial coefficients used for two cases.

These spatial parameters generally produce treatment plans of nearly identical

quality to the best plans obtained without using spatial information, though with the

added benefit of preventing misleading dose-volume histograms. In some cases, the spatial

coefficients were able to outperform the non-spatial plans. Figures 2-6 and 2-7 illustrate

two such cases.

In Figure 2-6, the spatial coefficients yield improved target coverage and spare all

saliva glands, as opposed to the non-spatial plan which only spares three of the four saliva

glands. There is less dose outside the desired target in the plan using spatial coefficients.

In Figure 2-7, the spatial coefficients reduce the amount of overdose in the primary

targets. In this patient, both the spatial and non-spatial plans spare all saliva glands.

2.6.5 Warm Start Results

Warm start solutions were obtained using the interior point method and the projected gradient algorithm (Nocedal and Wright [31]). The interior point method warm starts were tested with each Hessian possibility and a large duality gap of 200, both with and without insignificant beamlets removed. The projected gradient algorithm was tested using several stopping criteria and without insignificant beamlets removed. It was observed that the projected gradient algorithm is fast enough that the time required to remove and re-insert the insignificant beamlets as necessary caused the algorithm to slow down. To be theoretically close to optimal, the interior point method used after the warm start has a duality gap of 0.001 and no beamlets removed.

Figure 2-6. Comparison of spatial and non-spatial treatment plans. A) Non-spatial parameters result in slightly low target dosage and fail to spare one saliva gland. B) Spatial parameters allow for improved target coverage and spare all saliva glands. [Panels: target DVHs (PTV1, PTV2) and saliva gland DVHs (left parotid gland, left submandibular gland, right parotid gland, right submandibular gland) for the non-spatial and spatial plans; axes: Dose [Gy] versus Volume [Fractional].]

Figure 2-7. A) Non-spatial parameters result in slightly low target dosage and fail to spare one saliva gland. B) Spatial parameters allow for improved target coverage and spare all saliva glands. [Panels: target DVHs (PTV1, PTV2) and saliva gland DVHs for the non-spatial and spatial plans; axes: Dose [Gy] versus Volume [Fractional].]

To determine how close the warm start solution is to the final solution, the percent improvement in objective function value that the final solution obtains over the warm start is measured. To assess how close to optimality the final solutions using a warm start are, the percentage by which their objective function values exceed the objective function value of a near-optimal solution is measured. Lastly, the decrease in run time relative to obtaining a near-optimal solution is provided. These results for the interior point and projected gradient warm starts are displayed in Tables 2-4 and 2-5, respectively.
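The three performance measures can be written out as follows. This is only a sketch: the text does not state the exact normalization, so dividing by the warm start value and by the near-optimal value, and counting the warm start time as part of the total run time, are assumptions. All function names and input values are hypothetical.

```python
def improvement_over_warm_start(f_warm, f_final):
    """Percent by which the final solution improves on the warm start."""
    return 100.0 * (f_warm - f_final) / f_warm

def increase_over_near_optimal(f_final, f_near_opt):
    """Percent by which the final objective exceeds the near-optimal one."""
    return 100.0 * (f_final - f_near_opt) / f_near_opt

def time_savings(t_near_opt, t_warm, t_final):
    """Seconds saved relative to computing the near-optimal solution directly."""
    return t_near_opt - (t_warm + t_final)

# Hypothetical objective values and run times (seconds).
print(improvement_over_warm_start(125.0, 100.0))   # 20.0
print(increase_over_near_optimal(100.0, 80.0))     # 25.0
print(time_savings(90.0, 10.0, 15.0))              # 65.0
```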

From Table 2-4, it is clear that using an interior point warm start can provide

significant time savings over the near-optimal solution times. There is also a significant

increase in the FMO objective function value. From the amount of increase in the

objective function value, the interior point warm start does not appear to converge to

the optimal solution, and is unlikely to provide acceptable solutions. It is interesting to

note that the improvement from the warm start solution to the final solution is very small.

This indicates that the KKT information obtained from the warm start and used in the final

algorithm was unhelpful in improving the solution.

For the projected gradient algorithm, the algorithm terminates once the decrease from one iteration to the next is less than δ percent. Several δ values are tested. As with the interior point warm starts, the projected gradient warm starts also provided significant time savings, as shown in Table 2-5. The final solutions from the projected gradient warm start methods are nearly identical to the near-optimal solutions. The final interior point method also significantly improves the objective value of the warm start solution. This implies that despite not having KKT information about the warm start, the interior point algorithm is still able to converge to the optimal, or at least near-optimal, solution using the KKT value approximations and adjustments to the warm start vector described in Section 2.5.4.
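A projected gradient warm start with a δ-type stopping rule can be sketched as below. This is not the implementation used in this work: the objective is a generic nonnegative least-squares problem standing in for the FMO objective, the fixed step size 1/L is one standard choice, and the quantity compared against δ is assumed to be the relative decrease in objective value, which the text does not spell out. The function name and test data are hypothetical.

```python
import numpy as np

def projected_gradient(A, b, delta_pct, max_iter=10000):
    """Minimize 0.5*||Ax - b||^2 subject to x >= 0.

    Stops once the objective decreases by less than delta_pct percent
    from one iteration to the next (assumed stopping measure).
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    f = 0.5 * np.sum((A @ x - b) ** 2)
    iterations = 0
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)
        x_new = np.maximum(x - step * grad, 0.0)  # projection onto x >= 0
        f_new = 0.5 * np.sum((A @ x_new - b) ** 2)
        decrease_pct = 100.0 * (f - f_new) / f if f > 0 else 0.0
        x, f = x_new, f_new
        iterations += 1
        if decrease_pct < delta_pct:
            break
    return x, f, iterations

rng = np.random.default_rng(0)
A_demo, b_demo = rng.random((20, 5)), rng.random(20)
x_ws, f_ws, n_ws = projected_gradient(A_demo, b_demo, delta_pct=5.0)
print(f_ws, n_ws)
```

A looser δ stops earlier and hands a rougher starting point to the interior point method, which is the trade-off explored in Table 2-5.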

Table 2-4. Performance measures of interior point method warm starts.

Interior point warm start                  Final interior point algorithm           Improvement      Increase
                  Remove insig.                              Remove insig.          over warm start  in final       Avg. time
Hessian type      beamlets?      ε′        Hessian type      beamlets?      ε′      obj. fn. (%)     obj. fn. (%)   savings (s)
True              no             5         True              no             0.01    0.00             4.46           64.75
True              yes            5         True              no             0.01    0.19             4.48           65.20
True              no             5         BFGS              no             0.01    0.00             4.79           27.94
True              yes            5         BFGS              no             0.01    0.20             4.84           28.47
True              no             5         Single Approx.    no             0.01    0.00             5.06           6.85
True              yes            5         Single Approx.    no             0.01    0.76             4.49           6.93
BFGS              no             5         True              no             0.01    0.00             4.46           64.97
BFGS              yes            5         True              no             0.01    0.19             4.48           65.09
BFGS              no             5         BFGS              no             0.01    0.00             4.79           27.83
BFGS              yes            5         BFGS              no             0.01    0.20             4.84           28.55
BFGS              no             5         Single Approx.    no             0.01    0.00             5.06           6.90
BFGS              yes            5         Single Approx.    no             0.01    0.76             4.49           6.87
Single Approx.    no             5         True              no             0.01    0.00             4.46           64.99
Single Approx.    yes            5         True              no             0.01    0.19             4.48           65.00
Single Approx.    no             5         BFGS              no             0.01    0.00             4.79           27.95
Single Approx.    yes            5         BFGS              no             0.01    0.20             4.84           28.54
Single Approx.    no             5         Single Approx.    no             0.01    0.00             5.06           6.88
Single Approx.    yes            5         Single Approx.    no             0.01    0.76             4.49           6.88

Table 2-5. Performance measures of projected gradient method warm starts.

Projected gradient warm start    Final interior point algorithm           Improvement      Increase
Remove insig.                                      Remove insig.          over warm start  in final       Avg. time
beamlets?      δ                 Hessian type      beamlets?      ε′      obj. fn. (%)     obj. fn. (%)   savings (s)
no             1                 True              no             0.01    19.83            0.00           36.63
no             5                 True              no             0.01    31.78            0.00           10.98
no             10                True              no             0.01    36.43            0.00           19.16
no             100               True              no             0.01    56.59            0.01           39.28
no             500               True              no             0.01    89.46            0.09           56.88
no             1                 BFGS              no             0.01    19.83            0.00           9.27
no             5                 BFGS              no             0.01    31.78            0.00           12.53
no             10                BFGS              no             0.01    36.43            0.00           19.30
no             100               BFGS              no             0.01    56.59            0.03           27.79
no             500               BFGS              no             0.01    89.46            0.13           30.30
no             1                 Single Approx.    no             0.01    19.82            0.00           3.40
no             5                 Single Approx.    no             0.01    31.77            0.01           3.95
no             10                Single Approx.    no             0.01    36.42            0.01           4.28
no             100               Single Approx.    no             0.01    56.56            0.08           9.28
no             500               Single Approx.    no             0.01    89.44            0.27           10.04

2.7 Conclusions

The primal-dual interior point method is an effective algorithm for obtaining fluence

maps that deliver quality treatment plans. The proposed Hessian alternatives appear

to converge to the optimal solution, even when insignificant beamlets are removed. The

removal of the insignificant beamlets provides significant time savings in all instances. The

interior point method may also be run with a duality gap as large as 20 and still achieve

quality treatment plans, thus decreasing the amount of time required to run the algorithm.

Of the implementations tested, the fastest method that still provides quality solutions

without using a warm start is to use the Single Approximation Hessian alternative, remove

insignificant beamlets and employ a relative duality gap of 0.1%.

When the interior point method is started with one of the warm starts discussed, time

savings are again significant. Although the interior point warm starts generally provided

more improvement in computation time than the projected gradient warm starts, the final

solutions using the projected gradient warm starts were much closer to optimality. The

fastest and most effective warm start method is to use the projected gradient algorithm

with δ = 500, followed by the interior point method with ε = 0.1% and the Single

Approximation Hessian. This combination results in a near-optimal solution with an

average total computation time of 8.32 seconds.

CHAPTER 3
BEAM ORIENTATION OPTIMIZATION

3.1 Introduction

In a typical head-and-neck treatment plan, radiation beams are delivered from 5-9

nominally-spaced coplanar orientations around the patient. These coplanar orientations

are obtained from rotating the gantry only. As shown in Figure 3-1, several components

of a linear accelerator can rotate and translate to achieve more orientations than those

obtained from rotating the gantry. The available orientations consist of the orientations

obtained from rotation of the gantry, collimator and couch, as well as the three translation

directions of the couch.

Figure 3-1. A linear accelerator and the available movements; the gantry rotation is highlighted.

BOO is the problem of selecting from the available beam orientations the best set

to use in delivering a treatment plan. Given a fixed set of beams, different fluence maps

(radiation intensities of beamlets) yield treatment plans with different qualities. Thus, the

quality of an optimized fluence map should be considered when selecting a set of beam

orientations to use in a treatment plan.

3.2 Literature Review

Many approaches have been taken to solve the BOO problem. Evolutionary

algorithms (Schreibmann [29]) and variants of evolutionary algorithms, particularly

genetic algorithms (Ezzell [32], Haas et al. [28], Li et al. [33]) have been employed. Li

et al. [34] use a particle swarm optimization method, which is conceptually based on

evolutionary algorithms. Bortfeld and Schlegel [35], Djajaputra et al. [36], Lu et al. [37],

Pugachev and Xing [38], Rowbottom et al. [39] and Stein et al. [40] have all employed

variations of simulated annealing to determine a beam solution. Soderstrom and Brahme

[41] selected coplanar beam orientations using two measures, entropy and the integral

of the low frequency part of the Fourier transform of the optimal beam profiles, both of

which are based on the size and shape of the target structure. Soderstrom and Brahme

[42] also use an iterative technique to determine the optimal number of coplanar beams

required using BOO. Das and Marks [27] use a quasi-Newton method. Rowbottom et al.

[43] use artificial neural network algorithms to select beam orientations. Gokhale et al.

[44] use a measure of each beam’s “path of least resistance” from the patient surface to

the target location to determine the best beam directions. Meedt et al. [45] use a fast

exhaustive search to obtain a non-coplanar solution. The concept of beam’s-eye view

(BEV) has also been commonly used to approach the BOO problem (Chen et al. [46], Cho

et al. [47], Goitein et al. [48], Lu et al. [37], Pugachev and Xing [38, 49, 50]).

Despite the varying techniques to quantify the quality of a beam solution, it is widely

accepted that the optimal solution to the FMO problem presents the most relevant

measure (Bortfeld and Schlegel [35], Djajaputra et al. [36], Holder and Salter [51], Lee

et al. [20, 21], Li et al. [33, 34], Meedt et al. [45], Morrill et al. [52], Oldham et al. [53],

Rowbottom et al. [39, 43, 54], Schreibmann et al. [29], Soderstrom and Brahme [41],

Stein et al. [40], Wang et al. [55, 56], Woudstra and Heijman [57]). Given this accepted

measure of treatment quality, the shortcoming of the previous works is twofold. First,

they predominantly only consider coplanar angles, and not necessarily even the entire

coplanar solution space, while those that do consider non-coplanar beams only consider

a hand-selected subset of the available orientations. Second, the majority of the previous

studies do not select beam solutions using the FMO problem as a model for determining

quality; instead, the beam solutions are chosen based on scoring methods (e.g., BEV, path

of least resistance) or approximations to the FMO. By not optimizing the beam solution

with respect to the exact FMO problem, the BOO methods cannot guarantee convergence

to an optimal solution.

Of the previously cited works, only Das and Marks [27], Gokhale et al. [44], Meedt et

al. [45], Lu et al. [37], Rowbottom et al. [39] and Wang et al. [56] consider non-coplanar

orientations. This is likely due to the computational difficulties associated with the

inclusion of non-coplanar orientations as well as the widespread belief that non-coplanar

orientations do not improve the quality of a treatment plan.

Also, of those works that addressed non-coplanar beams, Das and Marks [27] require

that the beam distances be maximized, essentially requiring that beam solutions must

be equi-distant and thus restricting the size of the solution space; Meedt et al. [45] only

consider 3,500 beams (a minute subset of orientations available by rotation of the couch

and the gantry); and Wang et al. [56] use only nine pre-selected non-coplanar beams.

With the exception of Das and Marks [27], Haas et al. [28] and Schreibmann [29],

the previous studies have based their BOO approaches not on a beam solution’s optimal

solution to the FMO problem, but on locally optimal FMO solutions or on various scoring

techniques. Without basing BOO on the optimal FMO solutions, the resulting beam

solutions have no guarantee of optimality, or even of local optimality.


The goal of radiation therapy treatment planning is to design a treatment plan that

delivers a prescribed level of radiation dose to the targets while simultaneously sparing

critical structures by ensuring that the level of radiation dose received by these structures

is less than a structure-specific radiation dose. These two goals are contradictory if the

targets are located near critical structures. This is especially problematic for certain

cancers, such as tumors in the head-and-neck area, which are often located very close

to, for instance, the spinal cord, brain stem and salivary glands. In order to model the

BOO problem, a quantitative measure that appropriately makes trade-offs between

these contradictory goals must be developed. Let F (θ) be a black-box function that

quantifies the quality of the treatment plan if radiation is delivered from beam vector

θ = (θ1, . . . , θk), where k is the user-specified number of orientations that may be used. F

is formulated in such a way that the optimal plan yields the minimum function value.

For k beam orientations to be optimized in the treatment plan, the vector of decision

variables representing the beam orientations is defined as θ = (θ1, . . . , θk)T . The decision

vector θ is used as input into the black-box function F (θ) to determine the ability of the

beam vector to deliver the prescribed treatment without unduly damaging normal tissue

and critical structures. The BOO problem is then formulated as

min F (θ)

subject to θh ∈ B h = 1, . . . , k,

where B is the set of candidate beams. The candidate set of beams can be selected

according to any user-specified criteria; for example, the beams can be coplanar or

non-coplanar, continuous or discrete, or only represent a subset of the available beams.

It is also possible to fix some beams and only optimize a subset of the total number of

beams to be used. Theoretically, the linear accelerator is able to capture a continuous set

of orientations, but due to machine tolerances, the actual beams delivered may not be

exactly the desired beams. Therefore, it is common to only consider a discretized set of

beam orientations.
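For a small discretized candidate set, the BOO formulation above can be solved by simply evaluating the black box at every beam vector. The sketch below illustrates this with a made-up function `toy_F` standing in for the FMO black box; it assumes that F does not depend on the order of the beams and that no beam is used twice, so that unordered combinations suffice. None of the names or values come from this work.

```python
from itertools import combinations
import math

def toy_F(theta):
    """Hypothetical stand-in for the FMO black box (lower is better)."""
    # Penalize beams that are close together on the circle.
    spread_penalty = sum(
        1.0 / (1.0 + min(abs(a - b), 360 - abs(a - b)))
        for a, b in combinations(theta, 2)
    )
    # Arbitrary angle-dependent cost, made up for illustration.
    angle_cost = sum(1.0 + math.cos(math.radians(t - 30.0)) for t in theta)
    return angle_cost + spread_penalty

def solve_boo_by_enumeration(F, candidates, k):
    """Return the k-beam vector from the candidate set B that minimizes F."""
    return min(combinations(candidates, k), key=F)

B = range(0, 360, 10)                 # coplanar candidates on a 10-degree grid
best = solve_boo_by_enumeration(toy_F, B, k=3)
print(best, toy_F(best))
```

With 36 candidates and k = 3 this requires 7,140 evaluations, which is only feasible because `toy_F` is cheap; with an expensive F the same enumeration is impractical, which motivates the approach taken in this chapter.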

In our BOO model, the black-box function F (θ) is the convex FMO problem

described in Section 2.3, thus ensuring an exact measure of the quality of each beam

vector. Even though the FMO problem is convex for a fixed beam vector, the BOO problem is fundamentally

nonlinear in θ because the physics of dose deposition changes with each beam orientation; that

is, the effect of a beam on the patient can differ drastically from the effect of a

neighboring beam. To illustrate the nonlinearity of the problem, Figure 3-2 shows the

FMO problem as a function of just two coplanar beam angles. From this illustration, it is

evident that the FMO function, particularly in higher, more realistic dimensions, is likely

to also be multi-modal.

Although the FMO problem itself can be solved quickly using the convex model

presented in Section 2.3, in order to perform the FMO, lengthy calculations must be made

in order to determine each candidate beam’s effect on the patient. These calculations,

described in Section 3.5, require ≈ 13 minutes per beam to calculate, and thus make each

evaluation of the FMO problem expensive. Despite the time required for each function

evaluation, the limiting factor in beam orientation optimization is the hard drive space

required to store the beam data for each candidate beam. If the candidate set of beams is

small, this data can be pre-computed and stored, allowing the FMO problem to be solved

quickly in the BOO problem. But, if the candidate set of beams is large—for example,

consisting of non-coplanar orientations—then the data cannot be pre-computed due to

storage requirements.

Because of these difficulties with the BOO problem, previous studies have been largely

unable to consider the entire solution space of available beams. By using the response

surface method, which is specifically designed to model expensive nonlinear black-box functions,

we can iteratively identify promising beam vector solutions and generate beam data for

these solutions on-the-fly, thus circumventing the issue of storage space and allowing for

the consideration of all deliverable beam orientations.

Figure 3-2. FMO value as a function of two angles. [Surface plot; axes: Angle 1, Angle 2 and FMO value.]

3.4 Mixed-Integer Model Formulation

As an alternative to the BOO model given in Section 3.3, if the set of beam orientations B is finite, the BOO and FMO problems can be formulated together and solved simultaneously as a mixed-integer linear or nonlinear program (D’Souza et al. [58], Ehrgott and Johnston [59], Ferris et al. [17], Lee et al. [20, 21], Lim et al. [60], Shepard et al. [22], Wang et al. [61]). The FMO formulation can be combined with BOO in the following model. Let yθ be a binary variable indicating whether or not beam θ ∈ B is used. If beam θ is used in the treatment plan, then all the beamlets in θ, Bθ, are “turned on”; that is, they can have positive fluences up to some pre-determined maximum intensity M. The simultaneous BOO+FMO MIP model is then

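\[
\begin{aligned}
\text{minimize} \quad & F(z) \\
\text{subject to} \quad & z_{js} = \sum_{h=1}^{k} \sum_{i \in B_{\theta_h}} D_{ijs} x_i, && j = 1, \dots, v_s,\ s = 1, \dots, S \\
& x_i \le M y_\theta, && i \in B_\theta,\ \theta \in B \\
& \sum_{\theta \in B} y_\theta \le k \\
& x_i \ge 0, && i \in B_\theta,\ \theta \in B \\
& y_\theta \in \{0, 1\}, && \theta \in B
\end{aligned}
\]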

In order to solve such a problem, all beam data must be pre-computed for every beam

orientation. As described in Section 3.5, beam data requires a tremendous amount of

time and space to compute and store. Because of this requirement, only a small subset of

all possible beam orientations can be considered due to time and space constraints for a

BOO+FMO MIP formulation.

3.5 Beam Data Generation

For each beam orientation that is considered, lengthy calculations must be made to

determine the beam’s effect on the patient’s tissue and organs. This includes determining

in which structure each voxel lies, which voxels are hit by which beamlets and how much

of each beamlet’s intensity is deposited in each voxel through which it passes.

Beamlet dose computation models used in IMRT rely heavily on ray-tracing

algorithms for voxel classification and determination of the radiological path (Fox et

al. [62]). Voxel classification (Siddon [63]) establishes whether voxels are inside or outside

the path of a radiation beam and classifies voxel centers as inside or outside of segmented

targets and critical structures. The radiological path is the effective distance traveled by

a beamlet when the effect of traveling through tissues of different densities is considered.

The exact radiological path of a beamlet through the patient is required to correct for

tissue heterogeneities in determining the dose deposition coefficients (Siddon [64]).
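The radiological path can be illustrated as a density-weighted sum of the geometric segment lengths a ray travels through each voxel. The following sketch assumes the segment lengths have already been obtained from ray-tracing; the function name and the relative density values are hypothetical and chosen only for illustration.

```python
def radiological_path(segment_lengths_cm, relative_densities):
    """Density-weighted path length: sum of (length in voxel) * (relative density)."""
    if len(segment_lengths_cm) != len(relative_densities):
        raise ValueError("one density is required per traversed segment")
    return sum(l * rho for l, rho in zip(segment_lengths_cm, relative_densities))

# A ray crossing 2 cm of water-like tissue, 3 cm of low-density tissue
# and 1 cm of water-like tissue (hypothetical densities relative to water).
lengths = [2.0, 3.0, 1.0]
densities = [1.0, 0.3, 1.0]
print(radiological_path(lengths, densities))   # about 3.9 cm, versus 6 cm geometric
```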

Siddon’s ray-tracing algorithms (Siddon [63, 64]) have been the standard methods

used for ray-tracing in radiotherapy since the 1980s. In Siddon’s polygon and voxel

ray-tracing algorithms for voxel classification (point-in-polygon testing), structures

are represented as 3D polygonal objects, known as Siddon Prisms, and the signs of

cross-products of rays passing through the polygons are used to determine whether a voxel

lies inside or outside a structure. Despite its overwhelming use, Siddon’s algorithm for

polygon ray-tracing becomes very costly due to the number of voxels in a patient. Fox et

al. [62] developed a novel approach to polygon ray-tracing that circumvents the need for

cross-products by translating the polygon structure onto a coordinate system, replacing

the need for a cross-product by the sign of the second coordinate of each voxel in the

coordinate system.
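The role of cross-product signs in point-in-polygon testing can be seen in the following generic two-dimensional sketch. It is neither Siddon's algorithm nor the method of Fox et al. [62]; it is a textbook test that assumes a convex polygon with vertices listed in order, and the function names are hypothetical.

```python
def cross_z(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def inside_convex_polygon(point, vertices):
    """True if point lies inside or on a convex polygon.

    The point is inside when it lies on the same side of every edge,
    that is, when all edge cross products share one sign.
    """
    n = len(vertices)
    signs = [cross_z(vertices[i], vertices[(i + 1) % n], point) for i in range(n)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(inside_convex_polygon((1, 1), square))   # True
print(inside_convex_polygon((3, 1), square))   # False
```

Because a test of this kind has to be repeated for every voxel center, its cost grows with the number of voxels.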

In Siddon’s algorithm for determining radiological paths (Siddon [64]), the radiological

path must be determined for each voxel for every beamlet. This involves computations

for millions of beamlet-voxel combinations. As reported by Jacobs et al. [65] a significant

amount of computational time is required for these repeated calculations. Fox et al. [62]

combine the incremental voxel ray-tracing algorithm presented by Jacobs et al. [65] with a

method of virtual stereographic projection to significantly reduce the computational cost

of obtaining radiological path lengths.

Using their polygon translation and incremental ray-tracing algorithms, Fox et al. [62]

achieve a 100-300 fold improvement in computation time over Siddon’s point-in-polygon

algorithm. Because of the significant reduction in computation time, these methods are

used to generate beam data.

Because these beam data calculations must be performed for each of millions of

beamlet-voxel combinations, beam data generation is a lengthy process, requiring ≈ 13

minutes per beam using the algorithms described by Fox et al. [62]. In a typical FMO

formulation, the beam vector is pre-determined and the beam data for the beam vector

is calculated once and stored a priori. For a 5-beam case, this requires ≈150 MB of space

to store. As with a typical FMO problem, in a simultaneous FMO+BOO mixed-integer

programming (MIP) formulation, beam data for each of the candidate beams in B must

be generated a priori. If candidate beams are considered only for coplanar angles on a 10°

grid, that is, only every 10th angle, beam data would have to be computed for 36 beams,

which requires ≈5 hours to compute and ≈800 MB of space to store. If we also wanted to

consider the possibility of rotating the couch on a 10° grid in addition to the gantry, beam

data would then have to be computed for 36² beams, which would require ≈170 hours and

≈ 60 GB of space for just one plan.

Clearly, the storage space requirements for each beam restrict the number of beams

that can be considered in a simultaneous FMO+BOO MIP formulation. This issue is

typically addressed by simply restricting the number of candidate beams in B. Lee et al.

[20] restrict the set B to only contain 18 pre-selected beam orientations, which can be

coplanar or non-coplanar. If only gantry and couch rotations are allowed on a 10° grid, a

beam set of 18 beams comprises only a small percent of the available beam orientations.

As more ranges of motion are allowed, this percentage falls even further. The inclusion

of all possible beam orientations significantly increases the size of the solution space and

could possibly allow for improved treatment plans, but the beam data for all orientations

cannot be pre-computed. In order to consider these orientations, we use a method that

allows us to generate the beam data on-the-fly only as necessary.
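The size of the candidate set on a 10° grid, and the share of it covered by 18 pre-selected beams, can be worked out with a few lines of arithmetic. The sketch below counts only gantry and couch rotations; the variable names are illustrative.

```python
import math

gantry_angles = len(range(0, 360, 10))       # 36 coplanar orientations
gantry_couch_beams = gantry_angles ** 2      # gantry x couch rotations
share_of_18 = 18 / gantry_couch_beams        # fraction covered by 18 beams
five_beam_coplanar_sets = math.comb(gantry_angles, 5)

print(gantry_angles)                          # 36
print(gantry_couch_beams)                     # 1296
print(round(100 * share_of_18, 2))            # 1.39 (percent)
print(five_beam_coplanar_sets)                # 376992
```

Even the coplanar grid alone admits several hundred thousand distinct 5-beam sets, so the number of beam vectors grows far faster than the number of individual beams for which data must be stored.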

3.6 A Response Surface Approach to BOO

The shortcoming of the previous works on BOO is twofold. First, they predominantly

only consider coplanar angles, and not necessarily even the entire coplanar solution space,

while those that do consider non-coplanar beams only consider a hand-selected subset

of the available orientations. Second, the majority of the previous studies do not select

beam solutions using the FMO problem as a model for determining quality; instead, the

beam solutions are chosen based on scoring methods (e.g., BEV, path of least resistance)

or approximations to the FMO. By not optimizing the beam solution with respect to

the exact FMO problem, the BOO methods cannot guarantee convergence to an optimal

solution.

Of the previously cited works, only Das and Marks [27], Gokhale et al. [44], Meedt et

al. [45], Lu et al. [37], Rowbottom et al. [39] and Wang et al. [56] consider non-coplanar

orientations. Of these works, Das and Marks [27] require that the beam distances be

maximized, essentially requiring that beam solutions must be equi-distant and thus

restricting the size of the solution space; Meedt et al. [45] only consider 3,500 beams (a

minute subset of orientations available by rotation of the couch and the gantry); and

Wang et al. [56] use only nine pre-selected non-coplanar beams.

With the exception of Das and Marks [27], Haas et al. [28] and Schreibmann [29],

the previous studies have based their BOO approaches not on a beam solution’s optimal

solution to the FMO problem, but on locally optimal FMO solutions or on various scoring

techniques. Without basing BOO on the optimal FMO solutions, the resulting beam

solutions have no guarantee of optimality, or even of local optimality.

Because beam data generation is costly, a method that iteratively identifies only

promising beam orientations is required. The response surface (RS) method is such an

algorithm. In contrast to the previous studies, our approach to the BOO problem allows

for the inclusion of all possible beam orientations which are measured according to the

exact FMO problem, thus ensuring convergence to optimality due to the properties of the

response surface method.

The RS method is designed to efficiently model expensive black-box functions. In this

application, the FMO solver is our black box and the set of beams to be used is the input.

As in Aleman et al. [66, 67], we employ the response surface method as detailed in Jones

[68] and Jones et al. [69].

3.6.1 Overview of Response Surfaces

The response surface method identifies promising solutions based on the performance

of previous solutions. The function value and expected improvement over the current

best solution of a certain point is estimated based on the function behavior learned from

previously sampled points and their calculated objective function values. The function

values of points are related by correlation functions that depend on each point’s distance

from the previously sampled points. From the correlation functions, the algorithm predicts

the probability that the best solution will improve at unexplored points in the solution

space. Using this probability, a promising solution is identified. For the BOO problem,

beam data only needs to be generated for these promising solutions, thus saving both

computation time and storage space.

The response surface method models the objective function as a stochastic process of

the form

F (θ) = µ + ε(θ), (3–1)

where µ is a constant representing an average of the function F and ε(θ) is a random error

term associated with the point θ. In the general case, the error terms between two points,

say θ(1) and θ(2), are correlated by

\[
\mathrm{Corr}\left(\varepsilon(\theta^{(1)}), \varepsilon(\theta^{(2)})\right) = \exp\left[-d(\theta^{(1)}, \theta^{(2)})\right], \tag{3--2}
\]

where d(θ(1), θ(2)) is a weighted distance measure between θ(1) and θ(2). Intuitively, if two

points are very close together, the correlation between them will be close to one; similarly,

if two points are very far apart, the correlation between them will approach zero. Jones et

al. [69] propose the following weighted distance measure in general:

\[
d(\theta^{(1)}, \theta^{(2)}) = \sum_{h=1}^{k} c_h \left| \theta_h^{(1)} - \theta_h^{(2)} \right|^{p_h},
\]

where the parameters ch and ph are weighting factors corresponding to the importance

of each variable h and the smoothness of the function F in the direction of variable h,

respectively. If small changes in variable h cause large changes in the function F , then ch

should be large to reflect that two points with relatively small differences in the value of

variable h should be “far” apart due to the large difference in their function values, and

thus have a low correlation. The parameter ch can take on any value, whereas 1 ≤ ph ≤ 2,

with ph = 2 corresponding to objective function smoothness and ph = 1 corresponding to

less objective function smoothness.

In the application to BOO, θ = (θ1, . . . , θk) is the vector of k angles from which

radiation will be delivered. Because no beam is more important than another beam, each

beam orientation h contributes equally to the FMO function, so ch = c and ph = p for

all h = 1, . . . , k. To maintain tractability of the subproblems described in the following

sections, the angles are treated as though they are points on a line rather than points on

a circle and so a Euclidean distance metric is used to determine the distance between two

points. The weighted distance measure for BOO is then

\[
d(\theta^{(1)}, \theta^{(2)}) = c \left\| \theta^{(1)} - \theta^{(2)} \right\|_p^p, \tag{3--3}
\]

where ‖ · ‖p denotes the ℓp-norm. To ensure tractability of the subproblems described in

Section 3.6.2, the value p = 2 is used.
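The distance measure and the resulting correlations can be computed as in the following sketch. The function names and the value c = 0.001 are hypothetical; angles are in degrees and, as in the text, are treated as points on a line.

```python
import numpy as np

def weighted_distance(theta1, theta2, c, p):
    """d = sum_h c_h * |theta1_h - theta2_h| ** p_h (scalar c, p are broadcast)."""
    t1 = np.asarray(theta1, dtype=float)
    t2 = np.asarray(theta2, dtype=float)
    c = np.broadcast_to(c, t1.shape)
    p = np.broadcast_to(p, t1.shape)
    return float(np.sum(c * np.abs(t1 - t2) ** p))

def correlation(theta1, theta2, c, p=2.0):
    """Correlation of the error terms, exp(-d), as in Equation (3-2)."""
    return float(np.exp(-weighted_distance(theta1, theta2, c, p)))

def correlation_matrix(points, c, p=2.0):
    """Matrix of pairwise correlations between sampled beam vectors."""
    n = len(points)
    return np.array([[correlation(points[i], points[j], c, p)
                      for j in range(n)] for i in range(n)])

samples = [(0, 90), (10, 90), (180, 270)]
R = correlation_matrix(samples, c=1e-3)
print(np.round(R, 4))
```

Nearby beam vectors such as (0, 90) and (10, 90) receive a correlation close to one, while distant ones receive a correlation near zero; the choice of c controls how quickly the correlation decays.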

The idea of the RS method is to iteratively evaluate the true function F at certain

beam vectors θ, and then construct the conditional stochastic process given these function

values. This conditional stochastic process is then used to decide where to evaluate the

function F next. Due to the time and space required to generate the beam data necessary

to evaluate the function F , it is desirable to only evaluate points that will either improve

the best solution with a significant probability or significantly increase our knowledge of

the function. The optimization models to determine the next observation are described in

Section 3.6.2.

Let θ(1), . . . , θ(n) be n previously sampled points. Let Rn be the matrix of correlations

between the previously sampled points, yn be the vector of function values F (θ(i)) of the

previously sampled points, and µn and σn be estimators of the average and variance of the

function F , respectively. The response surface algorithm is given by:

• Initialization:

1. Choose values for the parameters c and p.

2. Choose an initial sample size, n, and a set of angles θ(i), i = 1, . . . , n. Evaluatethe function F at each of these points, yielding the values yi, i = 1, . . . , n.

• Iteration:

1. Compute or update the values of Rn, Rn^{-1}, µn, σn, and F n, the minimum observed objective function value.

2. Determine the next point to observe using one of the methods described in Section 3.6.2 and call this point θ(n+1).

3. Find the value yn+1 = F (θ(n+1)), set n← n + 1, and repeat.

3.6.2 Determining the Next Observation

Because the function F is expensive to evaluate, we want to sample as few points

as possible. Thus, in each iteration, an optimization problem is solved that determines

the “best” next point at which to observe the true function F . Some of the optimization

problems that have been proposed in the literature depend on the uncertainty of the

predictor as a function of θ, as well as the expected improvement over the current best

solution (Jones [68], Jones et al. [69]).

Let rn(θ) be the vector of correlations between θ and the n previously sampled points.

The uncertainty is then given by

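\[ s_n^2(\boldsymbol{\theta}) = \sigma_n^2 \left[ 1 - r_n(\boldsymbol{\theta})^\top R_n^{-1} r_n(\boldsymbol{\theta}) + \frac{\left[ 1 - \mathbf{1}^\top R_n^{-1} r_n(\boldsymbol{\theta}) \right]^2}{\mathbf{1}^\top R_n^{-1} \mathbf{1}} \right], \]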

where

\[ \sigma_n^2 = \frac{1}{n} \left( y_n - \mathbf{1}\mu_n \right)^\top R_n^{-1} \left( y_n - \mathbf{1}\mu_n \right) \]

is the estimator of the variance σ2n based on the n observations. The expected improvement,

denoted In(θ), is given by

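\[ I_n(\boldsymbol{\theta}) = s_n(\boldsymbol{\theta}) \left[ z \Phi(z) + \phi(z) \right] \tag{3–4} \]

where

\[ z = \frac{F_n - F_n(\boldsymbol{\theta})}{s_n(\boldsymbol{\theta})} \tag{3–5} \]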

and F n = min{y1, . . . , yn} is the current best solution and Fn(θ) is the estimated function

value of θ given the n previously sampled points. Φ and φ are the c.d.f. and p.d.f. of a

standard normal random variable, respectively.
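The uncertainty and expected improvement formulas above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are ours, the correlation matrix R and correlation vector r are supplied by the caller (the correlation definition of equation (3–2) is not reproduced here), and the treatment of s = 0 as the limiting value max{f_min − f_hat, 0} is our own convention.

```python
import math

import numpy as np


def variance_estimate(R, y, mu):
    """sigma_n^2 = (1/n) (y - 1 mu)^T R^{-1} (y - 1 mu)."""
    y = np.asarray(y, dtype=float)
    resid = y - mu
    return float(resid @ np.linalg.solve(R, resid)) / len(y)


def uncertainty(R, r, sigma2):
    """s_n^2(theta) for the correlation vector r = r_n(theta)."""
    r = np.asarray(r, dtype=float)
    ones = np.ones(len(r))
    Rinv_r = np.linalg.solve(R, r)
    Rinv_1 = np.linalg.solve(R, ones)
    correction = (1.0 - ones @ Rinv_r) ** 2 / (ones @ Rinv_1)
    return float(sigma2 * (1.0 - r @ Rinv_r + correction))


def expected_improvement(f_min, f_hat, s):
    """I_n(theta) = s [z Phi(z) + phi(z)] with z = (f_min - f_hat) / s."""
    if s <= 0.0:
        # Assumed convention: limiting value of the formula as s -> 0.
        return max(f_min - f_hat, 0.0)
    z = (f_min - f_hat) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal c.d.f.
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal p.d.f.
    return s * (z * cdf + pdf)
```

A useful sanity check is that the uncertainty vanishes at a previously sampled point, where r_n(θ) equals the corresponding column of R_n.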

The selection of the next point will be based on selecting the point that maximizes

either the uncertainty or the expected improvement, or a combination of both. Denote the

beam vector to be chosen as the vector θ.

3.6.2.1 Maximizing the expected improvement

Jones [68] and Jones et al. [69] recommend selecting the next point to sample as the

point θ for which the expected improvement over the current best solution value, In(θ), is

largest. This corresponds to solving the following optimization problem:

max In(θ)

subject to θh ∈ B h = 1, . . . , k

Although this is a difficult optimization problem, it can be solved using a branch-and-bound

technique, but in order to do so, an upper bound on In(θ) must be obtained. This can

be done by solving for the expected improvement in equation (3–4) while substituting

an upper bound on the uncertainty and a lower bound on Fn(θ), used in equation (3–5)

to determine the value z. The method of bounding Fn(θ) is taken directly from Jones

[68] and Jones et al. [69] and is not discussed further here. The method of bounding

s2n(θ) is improved from the original formulation in Jones et al. [69] to overcome numerical

instabilities, and is presented in Section 3.6.2.2. The branch-and-bound algorithm used to

maximize In(θ) is described in Section 3.6.2.3.

3.6.2.2 Obtaining an upper bound on the uncertainty

Due to the complexity of the s2n(θ) function, maximizing the uncertainty is a difficult

problem to solve. It can be relaxed into a linearly constrained quadratic programming

problem as follows (Jones et al. [69]). The resulting solution to the relaxed uncertainty

maximization problem is an upper bound on the uncertainty that can be used in

determining an upper bound on In(θ) as described in Section 3.6.2.1.

Let r = {r1, . . . , rn}, where r is a vector of decision variables independent of θ. By

treating both r and θ as decision variables, a quadratic objective function is obtained.

Because r is now a decision variable independent of θ, an equality constraint must be

added to the problem to ensure that r assumes the correct correlation values according

to the correlation definition in equation (3–2). This constraint is nonlinear, but it can be

relaxed by expressing the single equality as two inequalities (≤ and ≥) and then replacing

the nonlinear terms generated by ln(ri) and c‖θ − θ(i)‖22 with linear underestimators

ai + biri and pi,h + qi,hθh, respectively. The different types of linear estimators require

different values for ai, bi, pi,h and qi,h, and are differentiated by a superscript c for the

chord underestimators and a superscript t for the tangent line underestimators in the

model formulation, denoted Problem s2-UB.

Unfortunately, this relaxation provided by Jones et al. [69] can become numerically

unstable if two sampled points are very close together. If such a situation arises, the

bounds of the corresponding correlation value can become so close that due to round-off

error, the lower bound r_i^L can become slightly larger than the upper bound r_i^U, resulting

in infeasibility. To avoid such an instability, instead of bounding r_i using constraints, the

amount by which r_i is outside of its feasible range is penalized by adding penalization

terms w_i^L = min{0, r_i − r_i^L} and w_i^U = min{0, r_i^U − r_i}. This final formulation is given in

Problem s2-UB. This formulation has only two more variables and two more constraints

for each sampled point, so the increased problem size does not significantly increase the

amount of time required to solve the problem.

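PROBLEM s2-UB: Choose r and θ to

\[
\begin{aligned}
\min \quad & -\sigma_n^2 \left[ 1 - r^\top R_n^{-1} r + \frac{\left[ 1 - \mathbf{1}^\top R_n^{-1} r \right]^2}{\mathbf{1}^\top R_n^{-1} \mathbf{1}} \right] + \sum_{i=1}^{n} \left( w_i^L \right)^2 + \sum_{i=1}^{n} \left( w_i^U \right)^2 \\
\text{subject to} \quad & \left( a_i^c + b_i^c r_i \right) + c \sum_{h=1}^{k} \left( p_{i,h}^t + q_{i,h}^t \theta_h \right) \le 0, && i = 1, \ldots, n \\
& \left( a_i^t + b_i^t r_i \right) + c \sum_{h=1}^{k} \left( p_{i,h}^c + q_{i,h}^c \theta_h \right) \le 0, && i = 1, \ldots, n \\
& w_i^L \le 0, && i = 1, \ldots, n \\
& w_i^L \le r_i - r_i^L, && i = 1, \ldots, n \\
& w_i^U \le 0, && i = 1, \ldots, n \\
& w_i^U \le r_i^U - r_i, && i = 1, \ldots, n \\
& l_h \le \theta_h \le u_h, && h = 1, \ldots, k
\end{aligned}
\]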

Using the upper bound on the uncertainty provided by Problem s2-UB, the point

yielding the maximum uncertainty is obtained by using the same branch-and-bound

method described in Section 3.6.2.3, except that s2n(θ) is maximized rather than In(θ).

This gives an alternative way of choosing the next point: by maximizing the

uncertainty rather than the expected improvement.

3.6.2.3 Branch-and-Bound

A branch-and-bound method is used to determine the maximum expected improvement

in each iteration. At some point in the algorithm, n points, θ(1), . . . ,θ(n), have already

been observed. The solution space is divided into regions based on these previously

sampled points, and each region is considered as a separate subproblem.

Each of these subproblems is solved using branch and bound. First, the upper bound

on the uncertainty is determined as described in Section 3.6.2.2 using the subregion’s

lower and upper bounds on θ. Next, the lower bound FL on Fn(θ) is determined using the

method in Jones [68] and Jones et al. [69].

The upper bound on s2n(θ) and lower bound on F are now used to determine an

upper bound on In(θ) over the current subregion by solving for In(θ) substituting

Fn(θ) = FL and sn(θ) = sU as described in Jones [68] and Jones et al. [69]. In addition,

the θ that yielded the maximum uncertainty can be used to evaluate the function In(θ),

yielding a lower bound on In(θ) over the interval lh ≤ θh ≤ uh, h = 1, . . . , k. This value is

used to update the current best lower bound found (i.e., if the current best lower bound is

less than the new lower bound found, the current best lower bound is replaced by the new

one; otherwise, the current best lower bound is unchanged).

If the upper bound is less than the current best lower bound, the subregion is

discarded as not interesting. If the lower and upper bound are very close, we say that

we have found the optimum over the current subregion. Otherwise, the upper bound is

significantly larger than the current lower bound, so the subregion is further divided into

subregions as described below and the procedure is repeated for each of the new regions.

This is the branching step.

At some point, there are no more subregions to consider, as we have either decided

they are not interesting or have found the optimal solution for that subregion. Then, the

algorithm terminates and the current best lower bound is the optimal solution for In(θ)

over the current region.

This branch-and-bound procedure is applied to each of the regions, and the overall

largest In(θ) value is then the maximum In(θ), and the corresponding θ is the next point

at which to evaluate the FMO function.
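The bounding, pruning and branching logic described above can be illustrated with a one-dimensional sketch. This is not the bounding method of this chapter: in place of the upper bound obtained from Problem s2-UB, the sketch assumes that a Lipschitz constant of the function being maximized is known and uses it to bound each subregion. The function name `branch_and_bound_max`, the tolerance and the test function are all illustrative assumptions.

```python
import heapq


def branch_and_bound_max(f, lo, hi, lipschitz, tol=1e-3):
    """Maximize f on [lo, hi] to within tol, given a Lipschitz constant of f."""

    def bound(a, b):
        # Evaluating f at the midpoint gives a lower bound on the maximum over
        # [a, b]; adding lipschitz * (half-width) gives an upper bound.
        mid = 0.5 * (a + b)
        val = f(mid)
        return val, mid, val + lipschitz * 0.5 * (b - a)

    best_val, best_x, upper = bound(lo, hi)
    heap = [(-upper, lo, hi)]  # max-heap on the upper bound
    while heap:
        neg_upper, a, b = heapq.heappop(heap)
        if -neg_upper <= best_val + tol:
            continue  # subregion cannot improve the incumbent: discard it
        mid = 0.5 * (a + b)
        for c, d in ((a, mid), (mid, b)):  # branching step (bisection)
            val, x, child_upper = bound(c, d)
            if val > best_val:
                best_val, best_x = val, x  # update the best lower bound
            if child_upper > best_val + tol:
                heapq.heappush(heap, (-child_upper, c, d))
    return best_x, best_val


# Hypothetical smooth function on [0, 360] with its maximum at 100 degrees;
# its slope never exceeds 0.52 in magnitude, so 0.6 is a valid constant.
x_star, f_star = branch_and_bound_max(
    lambda t: -((t - 100.0) ** 2) / 1000.0, 0.0, 360.0, lipschitz=0.6
)
```

The same three outcomes appear as in the text: a subregion is discarded when its upper bound does not exceed the best lower bound, it is considered solved when the two bounds are within the tolerance, and otherwise it is divided further.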

Selecting the subregions. An important component of the branch-and-bound

algorithm is the method of selecting the subregions. The definition of these subregions,

as well as the order in which they are explored, can have a significant impact on both the

amount of time and memory required to perform the algorithm. As our implementation

of the branch-and-bound method requires that the entire solution space be divided into

subregions before the branch-and-bound algorithm begins, the selection of these initial

regions may also affect the speed of the algorithm.

Initial regions. Before beginning the branch-and-bound process, the solution space

of the decision variables, θh ∈ [0, 360] for all h = 1, . . . , k, is divided into a set of initial

regions. If θ represents non-coplanar orientations, we consider two ways of selecting the

regions defined by the non-coplanar orientations. First, we consider the entire solution

space as the only region, that is, instead of dividing the solution space into several

subregions, we only consider one subregion that encompasses the entire solution space (see

Figure 3-3A).

Second, denote a subset of variable indices H ⊆ {1, . . . , k}. For each index h ∈ H,

order the n previously sampled points in increasing order of their hth component. For each previously sampled

point i = 1, . . . , n − 1, consider the region defined by lh = 0 and uh = 360 for h ∉ H,

and lh = θ_h^{(i)} and uh = θ_h^{(i+1)} for h ∈ H. Also consider the region defined by lh = 0 and uh = 360

for h ∉ H, and lh = 0 and uh = θ_h^{(1)} for h ∈ H. Similarly, consider the region defined by lh = 0 and

uh = 360 for h ∉ H, and lh = θ_h^{(n)} and uh = 360 for h ∈ H. Figures 3-3A–3-3D illustrate the initial

regions for different H values where k = 2. Denote the initial region set where H = ∅

as B0 (Figure 3-3A), H = {1} as B1 (Figure 3-3B), H = {2} as B2 (Figure 3-3C) and

H = {1, 2} as B3 (Figure 3-3D).

Note that in the coplanar case, it is only necessary to test the initial region scheme for

one angle because the angles are interchangeable.

Bounds for discrete and continuous variables. If θ is discrete, the points on the

boundary between the two subregions will be contained in both subregions, thus

creating an inefficiency. This can be seen in Figure ??, where θ(1)b is the point at which

we branch and the blue line represents the division of the region into two subregions. The

boundary line is contained in both the top interval and the bottom interval. This overlap

can be avoided when θ is integral by adjusting the bounds between subregions in such a

Figure 3-3. Initial regions in the branch-and-bound algorithm, plotted as couch angle versus gantry angle (0 to 360 degrees each). A) Initial regions with H = ∅ (B0). B) Initial regions with H = {1} (B1). C) Initial regions with H = {2} (B2). D) Initial regions with H = {1, 2} (B3).

way as to prevent overlapping between any subregions. If the lower bound lh on θh in a

subregion is fractional, then we discard the non-integral solutions by setting lh = ⌈lh⌉.

Similarly, if the upper bound uh on θh in a subregion is fractional, then uh = ⌊uh⌋. If the

lh and uh bounds are integral and lh = uh, overlapping is avoided by setting lh = lh − 1

(see Figure ??). If θ is continuous, the bounds cannot be adjusted.

Branching scheme. The basic principle of the branch-and-bound method is to

decompose regions into smaller subregions in such a way that as many subregions as

possible can be discarded as uninteresting, leaving a reduced number of subregions that

must actually be searched. Branch-and-bound is a well-studied method,

and as such, there are numerous methods of selecting the subregions. Regions may be

divided into two equal subregions (bisection), or more generally, into multiple subregions

which may or may not be equal in size (multisection) (Csallner et al. [70], Lagouanelle and

Soubry [71]). Some other common methods include selecting only a subset of variables on

which to branch (Epperly et al. [72]), using Lagrangian duality to obtain lower bounds

(Barrientos and Correa [73], Thoai [74], Tuy [75]) and applying decomposition algorithms

(Phong et al. [76], Bomze [77], Cambini and Sodini [78]).

In our branching step, we form the subregions based on some point in the region. The

region is divided at this point along one of the indices. In Figure 3-4A, θ(1)b is the point

at which we branch. We branch by dividing the region horizontally into two subregions

at θ(1)b , taking into account the adjustments to the bounds described above so as to avoid

overlapping regions. For k = 2, in each branching step, we alternately divide the region

horizontally (along index 2) and vertically (along index 1) as shown in Figures 3-4B–3-4D.

After branching horizontally once at θ(1)b as shown in Figure 3-4B, we examine the top

region and select θ(2)b as the point at which we branch. We then branch by dividing this

subregion vertically at θ(2)b . We proceed in the same manner for θ(3)b , where we branch

horizontally, and so on until the convergence criterion is met.

In the general case, we divide the region into two subregions along the branching

index while cycling through each of the indices h = 1, . . . , k sequentially. For the

branching index h, one new subregion keeps its lower bound lh and takes the upper bound uh = θb,h − 1,

and the other new subregion takes the lower bound lh = θb,h and keeps its upper bound uh. The lower

and upper bounds on the region for the remaining indices are unchanged for both new

subregions.

In the non-coplanar case, a beam in θ may be represented by more than one index.

For example, if a single non-coplanar beam consisting of couch and gantry rotation is

optimized, the vector θ consists of θ1 representing the gantry angle and θ2 representing

the couch angle. The branching index h ∈ {1, 2} represents branching on either the

gantry angle or on the couch angle. If two such non-coplanar beams are optimized,

then θ consists of θ1 and θ2 representing the gantry and couch angles of the first beam,

respectively, and θ3 and θ4 representing the gantry and couch angles of the second beam,

respectively. The branching index h ∈ {1, 2, 3, 4} then represents branching on a single

component of a single beam.
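The bound adjustments for integral angles and the branching rule can be sketched as follows. The function names and the example region are illustrative assumptions, and indices are zero-based in the code, so the first index of θ is h = 0.

```python
import math


def integral_bounds(lower, upper):
    """Discard non-integral solutions: l_h = ceil(l_h) and u_h = floor(u_h)."""
    return [math.ceil(l) for l in lower], [math.floor(u) for u in upper]


def branch(lower, upper, h, theta_b):
    """Split a region along index h at the branching point theta_b.

    The first subregion ends at theta_b[h] - 1 and the second starts at
    theta_b[h], so the two subregions share no integral point.
    """
    first_upper = list(upper)
    first_upper[h] = theta_b[h] - 1
    second_lower = list(lower)
    second_lower[h] = theta_b[h]
    return (list(lower), first_upper), (second_lower, list(upper))


def next_branching_index(h, k):
    """Cycle through the indices 0, ..., k - 1 sequentially."""
    return (h + 1) % k


# Hypothetical region for one non-coplanar beam (gantry, couch), split along
# the couch angle (index 1) at the branching point (100, 200).
region_a, region_b = branch([0, 0], [360, 360], 1, [100, 200])
```

For two non-coplanar beams the same functions apply unchanged with four-component bounds, and `next_branching_index` then cycles through the gantry and couch angles of both beams.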

Accounting for symmetry. In the case where θ represents a set of coplanar beam

angles, the ordering of the variables in θ is irrelevant to the FMO value obtained at θ. For

example, if θ(1) = (10, 20, 30, 40) and θ(2) = (20, 30, 40, 10), then F (θ(1)) = F (θ(2)). Thus,

it is redundant to consider both θ(1) and θ(2), and elimination of these redundant regions

can greatly decrease the size of the solution space.

For example, if we consider the two-dimensional case (k = 2), the solution space is a

square region with 0 ≤ θ1 ≤ 360 and 0 ≤ θ2 ≤ 360. The points above the line θ1 = θ2 are

equivalent to the points below the line, so we only need to consider one of these regions.

Say we branch by splitting the region into four equal quadrants, as shown in Figure 3-5A.

If we arbitrarily choose to only examine the points on or above the line θ1 = θ2 (that is, those with θ1 ≤ θ2), then quadrant 4

can be eliminated.

Figure 3-4. Partitioning a region into subregions, plotted as couch angle versus gantry angle over the region [l1, u1] × [l2, u2] with branching points θb(1), θb(2) and θb(3). A) Partitioning a region into subregions without accounting for overlap. B) Preventing overlapping regions. C) Regions after two branches. D) Regions after three branches.

Figure 3-5. Accounting for symmetry. A) Accounting for symmetry in 2D. B) Accounting for symmetry in 3D.

In three dimensions, the solution space is a cube. If we branch by splitting the cube

into eight equal cubes, the region to be examined is shown in Figure 3-5B, where the

origin is the back bottom left corner of the cube. From this figure, we can see that a

sizable portion of the solution space can be discarded.

In regions where there are both viable and redundant solutions (for example,

quadrants 2 and 3 in Figure 3-5A), the addition of constraints requiring that θ1 ≤ . . . ≤ θk

in the problem of maximizing the expected improvement ensures that only the unique

portion of the region is considered.
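To give a sense of how much of the solution space the ordering constraints remove, the following sketch enumerates a small discrete candidate set and keeps only the beam vectors satisfying θ1 ≤ . . . ≤ θk. The candidate set of four angles and the function names are illustrative assumptions.

```python
from itertools import product
from math import comb


def is_canonical(theta):
    """True if theta_1 <= theta_2 <= ... <= theta_k."""
    return all(a <= b for a, b in zip(theta, theta[1:]))


def canonical_solutions(candidates, k):
    """All k-beam vectors over the candidate angles that satisfy the ordering."""
    return [t for t in product(candidates, repeat=k) if is_canonical(t)]


# Hypothetical coarse candidate set with m = 4 angles and k = 2 beams:
# 4^2 = 16 ordered vectors, of which only C(m + k - 1, k) = 10 are unique.
angles = range(0, 360, 90)
unique = canonical_solutions(angles, 2)
```

The count of ordered vectors grows as m^k, while the count of unique vectors grows as C(m + k − 1, k), so the fraction retained shrinks toward 1/k! as the candidate set becomes finer.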

If more than one non-coplanar orientation is optimized, a similar symmetry to the

multiple coplanar orientation symmetry exists. Consider an implementation where two

non-coplanar beam orientations are optimized, and these orientations are obtained from

rotating both the gantry and the couch. Each beam is represented by two variables

in the solution vector: one variable indicating the degree of gantry rotation, and one

variable indicating the degree of couch rotation. Let θ1 and θ2 be the gantry rotation and

couch rotation of the first beam, respectively, and θ3 and θ4 be the gantry rotation and

couch rotation of the second beam, respectively. Then, the solution vector {θ1, θ2, θ3, θ4}

is identical to the solution vector {θ3, θ4, θ1, θ2}. Because the couch angle selected is

dependent on the gantry angle (and vice versa), this symmetry can be exploited by only

removing redundant solutions from one of the beam variables, that is, by requiring that

θ1 ≤ θ3 (removing redundancy from the gantry angles) or θ2 ≤ θ4 (removing redundancy

from the couch angles). In general, if d degrees of motion are used to obtain m beam

orientations, and the linear accelerator motion variables are in the same order for each

beam, then θk ≤ θk+d ≤ θk+2d ≤ . . . ≤ θk+(m−1)d for some k ∈ {1, . . . , d}.

3.6.3 Method of Obtaining the Next Observation

The RS algorithm allows for two methods of selecting the next point to observe:

by maximizing the expected improvement or by maximizing the uncertainty. In these

tests, the point to observe is obtained by first selecting the point that maximizes the

expected improvement until the maximum expected improvement falls below a certain

threshold, and then switching to the point that maximizes the uncertainty. Once the

maximum uncertainty also falls below a certain threshold, the algorithm terminates. By

first selecting according to the expected improvement, the method quickly obtains a good

solution. By then selecting according to uncertainty, theoretical convergence to the global

minimum is ensured.

3.7 Neighborhood Search

3.7.1 Introduction

From Aleman et al. [79], we test the simulated annealing algorithm on the BOO

problem, as well as existing and new variants of a greedy neighborhood search heuristic

called the Add/Drop algorithm (see Kumar [80]) to obtain a good solution to the BOO

problem. In each step of the Add/Drop algorithm, a beam in the current beam set is

replaced by a neighboring beam that yields an improving solution. As with the simulated

annealing implementation, we also apply our new neighborhood to the Add/Drop

algorithm and compare its performance to a commonly used neighborhood structure.

3.7.2 Neighborhood Search Approaches

Neighborhood search approaches are common methods of obtaining solutions to

global optimization problems. For a vector of decision variables, a neighbor is obtained by

perturbing one or more of the decision variables. A neighborhood for a particular vector

of decision variables is the set of all its neighbors for a given method of perturbing the

decision variable vector. A solution is considered to be locally optimal if there are no

improving solutions in its neighborhood.

Both deterministic and stochastic neighborhood search algorithms have been applied

to a wide variety of optimization problems. A deterministic neighborhood search algorithm

is one in which the entire neighborhood, or a pre-defined subset of the neighborhood,

is enumerated in each iteration to find an improving solution. Stochastic versions of

neighborhood search approaches, for example, simulated annealing, randomly select

neighboring solutions in an attempt to find an improving solution in each iteration.

For the BOO problem, we consider two neighborhood search methods. The first is

a deterministic neighborhood search algorithm that finds a locally optimal solution, and

the second is the simulated annealing algorithm, which, although based on neighborhood

searches, provably converges to the globally optimal solution for certain neighborhood

structures.

3.7.3 A Deterministic Neighborhood Search Method for BOO

Deterministic neighborhood search methods are optimization algorithms that

start from a given solution and then iteratively select the best point in the current

neighborhood as the next iterate. The best point in the neighborhood can be found by

complete enumeration if the neighborhood is small, or by optimization if the neighborhood

is large or if objective function evaluations are expensive. Due to the complexity of

the BOO problem, even when only a subset of available orientations is considered, we

will focus on smaller neighborhoods and use enumeration. The neighborhood could

alternatively be searched heuristically, for example by searching the neighborhood until

the first improving solution is found, rather than the best improving solution. If no

improving solution can be found, the current solution is a local optimum.

In our implementation of the Add/Drop algorithm, a small neighborhood is desired

for enumeration purposes. In each iteration, a neighborhood for just a single beam is

considered. Say a beam set consisting of k beams is desired. Letting the neighborhood of a

single beam θh in θ be denoted as Nh(θ), the Add/Drop algorithm is as follows:

• Initialization:

1. Choose an initial starting solution θ(0).

2. Set θ∗ = θ(0) and i = 0.

• Iteration:

1. Select h ∈ {1, . . . , k}, then generate θ ∈ Nh(θ(i)).

2. If F (θ) < F (θ∗), set θ∗ = θ(i+1) = θ and set i← i + 1.

3. If all points in ∪_{h=1}^{k} Nh(θ(i)) have been sampled without improvement, stop with θ∗ as a local minimum. Otherwise, repeat Step 1.

3.7.3.1 Neighborhood Definition

In each step of the Add/Drop algorithm, a beam in the current solution is replaced

with an improving beam in its neighborhood. Rather than define a neighbor as related

to an entire beam vector, the neighborhoods of individual beams are considered. The

neighborhood of a single beam θh in θ is defined as

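\[ N_h(\boldsymbol{\theta}) = \left\{ \left( \theta_1, \ldots, \theta_{h-1}, \theta \bmod 360, \theta_{h+1}, \ldots, \theta_k \right) \in B^k : \theta_h - \delta \le \theta \le \theta_h + \delta \right\}. \]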

In other words, the neighborhood of a beam is all beams within ± δ degrees taking into

account the cyclic nature of the angles. The cyclicality of the angles refers to the fact

that all angles can be represented by degrees in [0,360]. For example, 400◦ = 40◦ and

−100◦ = 260◦. The expression θ mod 360 captures this cyclicality.
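The neighborhood definition can be sketched directly in Python, whose `%` operator already returns a non-negative result for a positive modulus (so `-100 % 360` is 260). The function name, the integer-degree grid and the candidate set B below are illustrative assumptions. Angles are reported in [0, 360), so 360 is represented as 0, and indices are zero-based.

```python
def neighborhood(theta, h, delta, candidates):
    """Beam vectors obtained by moving beam h by at most delta degrees.

    theta is a tuple of integer angles, and candidates is the single-beam
    candidate set B. Replacement angles are wrapped modulo 360.
    """
    allowed = set(candidates)
    seen = set()
    neighbors = []
    for offset in range(-delta, delta + 1):
        angle = (theta[h] + offset) % 360
        if angle in allowed and angle not in seen:
            seen.add(angle)  # offsets -180 and +180 map to the same angle
            neighbors.append(theta[:h] + (angle,) + theta[h + 1:])
    return neighbors


# Hypothetical candidate set: every 10 degrees.
B = range(0, 360, 10)
nbrs = neighborhood((10, 350), 0, 20, B)
```

Here the neighborhood of the first beam at 10 degrees with δ = 20 wraps past zero and includes 350 degrees.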

3.7.3.2 Neighbor Selection

The process of selecting a neighboring point in each iteration consists of two steps:

selecting the index h to change and then selecting an improving angle in Nh(θ) to replace

θh. If h is selected as i mod k + 1, the algorithm will cycle through each index sequentially,

similar to a Gibbs Sampler (see, for example, Geman and Geman [81] and Gelfand and

Smith [82]). The Gibbs Sampler also uses a similar two-step approach to generating a

new point by sequentially generating a new value for each variable in turn. If h is selected

randomly in each iteration, the resulting algorithm is similar to a Hit-and-Run method

(see, for example, Smith [83] and Belisle [84]), in which a variable to be changed is selected

randomly, and then a new value for that variable is also selected randomly within a

neighborhood.

Once h is selected, the new value for θh can be generated by enumeration or by a

heuristic method. The Add/Drop algorithm compares the quality of the new solution to

the current solution, and then only accepts improving solutions. This greedy approach

results in a locally optimal solution.

3.7.3.3 Implementation

The index of the beam angle to be changed in each iteration, h in Step 1 of the algorithm

in Section 3.7.3, is chosen as h = i mod k + 1 to cycle through each index in a sequential

manner. In the Add/Drop implementation, once h is determined, θ in iteration i is

chosen as θ = arg min {F (θ) : θ ∈ Nh(θ(i))}. By replacing each beam by the most improving

neighbor, the Add/Drop algorithm is a greedy heuristic which terminates when there is no

improving neighbor for any beam.
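The implementation described above can be sketched as follows. This is a minimal sketch and not the code used in our experiments: the FMO value F is replaced by a hypothetical separable objective (`toy_objective`), the index is cycled with a separate pass counter that advances on every pass (an assumption about how the cycling is realized), and indices are zero-based.

```python
def circular_distance(a, b):
    """Smallest angular separation between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def add_drop(objective, theta0, delta, candidates):
    """Greedy Add/Drop search: replace one beam at a time by its best neighbor."""
    allowed = set(candidates)
    theta = tuple(theta0)
    best_value = objective(theta)
    k = len(theta)
    step = 0     # pass counter used to cycle through the beam indices
    stalled = 0  # consecutive indices for which no improving neighbor exists
    while stalled < k:
        h = step % k  # zero-based counterpart of h = i mod k + 1
        # Enumerate N_h(theta): angles within +/- delta of theta_h, modulo 360.
        angles = {(theta[h] + off) % 360 for off in range(-delta, delta + 1)} & allowed
        trials = [theta[:h] + (a,) + theta[h + 1:] for a in sorted(angles)]
        candidate = min(trials, key=objective, default=theta)
        value = objective(candidate)
        if value < best_value:
            theta, best_value = candidate, value  # accept the improving neighbor
            stalled = 0
        else:
            stalled += 1
        step += 1
    return theta, best_value


# Hypothetical separable objective standing in for the FMO value F.
TARGETS = (40, 160, 280)


def toy_objective(theta):
    return sum(circular_distance(a, t) ** 2 for a, t in zip(theta, TARGETS))


solution, value = add_drop(toy_objective, (0, 0, 0), delta=20, candidates=range(0, 360, 10))
```

Because the toy objective is separable and has no spurious local minima, the greedy search reaches the target angles; for the true FMO function the same procedure only guarantees a local minimum.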

A multi-start aspect is added by repeating the algorithm with multiple initial starting

points. For example, one strategy to select starting points would be to select a random

starting point according to a particular distribution. Another strategy would be to select

an equi-spaced solution and rotate it a fixed number of times to obtain new starting points

until the initial equi-spaced solution is repeated. Equi-spaced beam solutions are common

in clinical practice for an odd number of beams. The reason that such a method is not

generally used in practice for even-numbered beams is that the resulting beam set would

contain parallel-opposed beams (beams that lie on the same line), which are not used by

convention as it is believed that the effect of a parallel-opposed beam is very similar to

simply doubling the radiation delivered from a beam. If an equi-spaced solution is not

possible given a beam set of k beams and the discretization level of the candidate beam

set B, then the solution can be rounded so that θ(0)h ∈ B, h = 1, . . . , k.
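The equi-spaced multi-start strategy can be sketched as follows. The function name is an illustrative assumption, the candidate set B is assumed to be a regular grid with `step` degrees between beams, and each starting point is stored as a sorted tuple because the ordering of coplanar beams is irrelevant.

```python
def equispaced_starts(k, step):
    """Rotations of the equi-spaced k-beam set on a grid of `step` degrees.

    The equi-spaced angles are rounded to the grid, then the set is rotated
    by one grid step at a time until the initial set is repeated.
    """
    base = [(round(h * 360 / k / step) * step) % 360 for h in range(k)]
    starts, seen = [], set()
    shift = 0
    while True:
        rotated = tuple(sorted((a + shift) % 360 for a in base))
        if rotated in seen:
            break  # the initial equi-spaced solution has been repeated
        seen.add(rotated)
        starts.append(rotated)
        shift += step
    return starts


# Three beams on a 10-degree grid: rotating by 120 degrees reproduces the
# initial set, so there are 120 / 10 = 12 distinct starting points.
three_beam_starts = equispaced_starts(3, 10)
```

For four beams on the same grid the set repeats after a 90 degree rotation, giving 9 starting points, each of which contains parallel-opposed beams.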

3.7.4 Simulated Annealing

The simulated annealing algorithm used is similar to the classical simulated annealing

approach proposed in Kirkpatrick et al. [85]. The simulated annealing algorithm is based

on the Metropolis algorithm, wherein a neighboring solution to the current iterate is

generated, and if it is an improving point, it becomes the current iterate. Otherwise, it

becomes the current iterate with probability exp{∆F/T}, where ∆F is the difference

in FMO value between the current iterate and the newly generated point and T is the

temperature, a measure of the randomness of the algorithm. If T = 0, then only improving

points are selected. If T is very large, then any move is accepted, which is essentially a

random search.
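The acceptance rule of the Metropolis step can be written compactly. In this sketch ∆F is taken as the FMO value of the current iterate minus that of the new point, so that ∆F is negative for a non-improving point and exp{∆F/T} is a valid probability. The function names are illustrative assumptions.

```python
import math
import random


def acceptance_probability(f_current, f_new, temperature):
    """Probability of moving from the current iterate to the new point."""
    delta_f = f_current - f_new  # positive when the new point improves
    if delta_f > 0:
        return 1.0  # improving points are always accepted
    if temperature <= 0:
        return 0.0  # T = 0: only improving points are selected
    return math.exp(delta_f / temperature)


def accept(f_current, f_new, temperature, rng=random):
    """Draw the accept/reject decision for one Metropolis step."""
    return rng.random() < acceptance_probability(f_current, f_new, temperature)
```

For example, a point that is worse by 2 units is accepted with probability exp(−1) ≈ 0.37 at T = 2, and with probability close to 1 when T is very large.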

The simulated annealing algorithm starts with an initial temperature T0 and performs

a number of iterations of the Metropolis algorithm using T = T0. Then, the temperature is

decreased according to some cooling schedule such that {Ti} → 0.

Obvious parallels can be drawn between the simulated annealing algorithm and the

Add/Drop neighborhood search method described in Section 3.7.3. While the Add/Drop

algorithm deterministically searches the neighborhood for improving solutions, the

simulated annealing algorithm randomly selects neighboring solutions. Rather than

being limited by the ability to only move to improving solutions, the simulated annealing

algorithm may still move to a non-improving solution with a certain probability, thus

allowing for the escape from local minima. The Add/Drop algorithm, on the other hand,

is a greedy algorithm that is specifically designed to find local minima.

The simulated annealing algorithm is essentially a randomization of the Add/Drop

algorithm. In addition to the added randomness, the possibility of changing more than

one beam in each iteration is allowed by selecting a set of indices H ⊆ {1, . . . , k} to

change, rather than just selecting a single index h. The simulated annealing algorithm is

as follows:

• Initialization:

1. Choose an initial beam set θ(0) and calculate its FMO objective function value F0.

2. Record θ(0) as the best beam set found so far and F0 as the best value found so far, and set i = 0.

• Iteration:

1. Select H ⊆ {1, . . . , k}, generate a candidate beam set θ′ ∈ ∪h∈HNh(θ(i)), and calculate its FMO

objective function value F′.

2. If F′ is less than the best value found so far, record θ′ and F′ as the best beam set and best value, and set Fi+1 = F′ and θ(i+1) = θ′. Otherwise, set Fi+1 = F′ and θ(i+1) = θ′ with probability exp{(Fi − F′)/Ti}.

3. Set i← i + 1 and repeat Step 1.

The simulated annealing algorithm has been previously applied to the BOO problem.

Bortfeld and Schlegel [35] use the “fast” simulated annealing algorithm described by Szu

and Hartley [86] which employs a Cauchy distribution in generating neighboring points.

Stein et al. [40], Rowbottom [39] and Djajaputra et al. [36] also use a Cauchy distribution

in generating neighboring solutions. Lu et al. [37] randomly select new points satisfying

BEV and conventional wisdom criteria and Pugachev and Xing [38] randomly generate

new points and then vary them according to an exponential distribution. All accept

improving solutions, and with the exception of Rowbottom et al. [39] who only accept

improving solutions (essentially Ti = 0 for all i), all accept non-improving solutions with a

Boltzmann probability. None of the previous BOO studies employing simulated annealing

use the exact FMO as a measure of the quality of a beam set.

3.7.4.1 Neighborhood Definition

Two neighborhood structures are explored. The first neighborhood is similar to that

described in Section 3.7.3.1 in that a neighborhood Nh(θ) is considered for only a single

beam index h ∈ {1, . . . , k}, just as in the Add/Drop method.

As an extension to changing a single angle in each iteration, we also consider

a neighborhood that involves changing all beams in each iteration, corresponding to

H = {1, . . . , k} in Step 1 of the simulated annealing algorithm in Section 3.7.4. This

neighborhood is defined as N (θ) = ∪kh=1Nh(θ). Again, the neighborhoods for the

individual beams are defined as in the first method, with bounds of ± δ degrees.

3.7.4.2 Neighbor Selection

The method of selecting a neighbor depends on the neighborhood structure as

described in Section 3.7.4.1. In the first method where only one beam is changed at a

time, a neighbor is selected using the randomized approach described in Section 3.7.3.2.

Once h is selected, the probability of selecting a particular solution in Nh(θ) where the

new θ is d degrees from θh is P{D = d}, where D is a random variable with

some probability distribution defined on the set {−δ, −δ + 1, . . . , δ}.

For the neighborhood N (θ) where all beams are changed in an iteration, the new

value for each beam h ∈ {1, . . . , k} is generated from Nh(θ) in the same manner described

above.

3.7.4.3 Implementation

In addition to basing our algorithms on the exact FMO solution rather than on

heuristics or scoring measures, our simulated annealing approach differs from the previous

studies in the distribution used to generate neighbors, the definition of the neighborhood,

the cooling schedule and the number of iterations/restarts used. Not only do we use a

new neighborhood structure, but also a geometric probability distribution rather than a

uniform or Cauchy distribution on the neighborhood. The geometric distribution is similar

in shape to the Cauchy distribution in that they both can have fat tails depending on

the choice of probability parameters. The fat tails of these distributions allow for points

far away from the current solution to be selected as successive iterates, which potentially

increases the likelihood of finding a globally optimal solution. The geometric distribution

has the added advantage of producing discrete values, which suits the BOO problem,

in which discrete solutions are preferred.

By using the cooling schedule T_{i+1} = αT_i with α < 1, the sequence of temperatures

{T_i} converges to zero as the number of iterations increases. In our approach, the

neighborhood of a beam for both the Nh(θ) and N (θ) neighborhoods is defined using

δ = 180, that is, Nh(θ) = B. By defining the neighborhood of each beam to be the entire

single-beam solution space, the simulated annealing algorithm converges to the global

optimum when using the neighborhood N (θ) defined in Section 3.7.4.1. Though Nh(θ)

is large, each beam in Nh(θ) is assigned a probability so that only the beams closest to

θh have a significant probability of being selected. Figure 3-7A shows the probability of

replacing θh with beams at varying distances using probability p = 0.25 for the geometric

distribution. Note that the current beam cannot be selected as a replacement.

As with the Add/Drop method, a multi-start aspect is added to the simulated

annealing algorithm by repeating the algorithm using several different starting points.
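The implementation described above can be sketched as a generic annealing loop with the geometric cooling schedule and a multi-start wrapper. This is only an illustration: the standard Metropolis acceptance rule is assumed, and `fmo_value` and `propose` are hypothetical callables standing in for the FMO solve and the neighbor-selection step.

```python
import math
import random


def simulated_annealing(fmo_value, propose, theta0, t0=75.0, alpha=0.9,
                        n_temps=100, m_per_temp=1, rng=random):
    """Minimize fmo_value over beam vectors with the schedule T_{i+1} = alpha * T_i.

    fmo_value(theta) returns the objective to minimize and propose(theta, rng)
    returns a neighboring beam vector; both are placeholders for illustration.
    """
    current, f_current = list(theta0), fmo_value(theta0)
    best, f_best = list(current), f_current
    temperature = t0
    for _ in range(n_temps):
        for _ in range(m_per_temp):
            candidate = propose(current, rng)
            f_candidate = fmo_value(candidate)
            increase = f_candidate - f_current
            # Assumed Metropolis rule: always accept improvements; accept a worse
            # point with probability exp(-increase / T) when T > 0.
            if increase <= 0 or (temperature > 0 and
                                 rng.random() < math.exp(-increase / temperature)):
                current, f_current = candidate, f_candidate
                if f_current < f_best:
                    best, f_best = list(current), f_current
        temperature *= alpha
    return best, f_best


def multi_start(fmo_value, propose, starts, **kwargs):
    """Run the annealer from each starting point and keep the best result."""
    results = [simulated_annealing(fmo_value, propose, s, **kwargs) for s in starts]
    return min(results, key=lambda r: r[1])
```

With `t0=0.0` the loop accepts only non-worsening moves, so it reduces to a randomized descent.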

3.7.4.4 Convergence

Unlike many previously proposed simulated annealing algorithms, our algorithm

converges to the globally optimal solution to the BOO problem under mild conditions.

The following theorem summarizes these conditions.

Theorem 3.7.1. Suppose that

• H = {1, . . . , k}

• lim_{i→∞} T_i = 0

• δ = 180

• There is a positive probability of generating any solution in the neighborhood.

Then our simulated annealing algorithm converges to the global optimum solution in the

sense that

lim_{i→∞} F_i = F^* in probability

where F^* is the global optimum value of the BOO problem.

Proof. This follows from Theorem 1 in Belisle et al. [87].

3.7.5 A New Neighborhood Structure

For the BOO problem, the neighborhood structure that is typically used for a vector

of beam orientations is simply the collection of beam vectors obtained from changing one

or more of the beams to a neighboring beam, where each beam has its own neighborhood

Nh(θ).

In addition to Nh(θ), we consider a new neighborhood which we call a “flip”

neighborhood. The flip neighborhood of a beam index h consists of Nh(θ) plus a

neighborhood around the parallel opposed beam of h. The parallel opposed beam is

the beam 180◦ away, that is,

h′ = (θh + 180) mod 360

The flip neighborhood can be defined as

N^F_h(θ) = { (θ_1, . . . , θ_{h−1}, θ mod 360, θ_{h+1}, . . . , θ_k) ∈ B^k : θ ∈ [θ_h − δ, θ_h + δ] ∪ [θ_h + 180 − δ_F, θ_h + 180 + δ_F] }

Note that the values δ and δF may be different. Figure 3-6 depicts a flip neighborhood

for a beam located at 0◦, the center of the top shaded wedge representing Nh(θ),

where θh = 0.

The motivation for the flip neighborhoods arises from the observation that many

of the 3-beam simulated annealing plans generated using the regular neighborhood

contained two beams very close to two beams in the optimal solution (obtained by explicit


enumeration), while the third beam was very close to the parallel opposed beam of

the third beam in the optimal solution. Given this observation, it is intuitive that the

inclusion of the neighborhood around the parallel beam should provide improved solutions.

The neighborhoods Nh(θ) and N^F_h(θ) with varying δF values are applied to both

the Add/Drop and the simulated annealing frameworks. For the geometric probability

distribution used in the simulated annealing method, Figure 3-7B shows the probability of

selecting beams at different distances using a flip neighborhood with probability p = 0.25.

Note that the current beam cannot be selected as its own neighbor.
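As an illustration of the flip neighborhood, the sketch below enumerates the candidate replacement angles for a single beam on a discrete grid. The function name `flip_neighborhood_angles` is hypothetical, and the sketch assumes that δ and δF are multiples of the grid spacing.

```python
def flip_neighborhood_angles(theta_h, delta, delta_f, grid=1):
    """Candidate replacement angles for one beam under the flip neighborhood.

    Returns the grid angles within +/- delta of theta_h (excluding theta_h
    itself) together with those within +/- delta_f of the parallel opposed beam.
    """
    opposed = (theta_h + 180) % 360
    angles = set()
    for offset in range(-delta, delta + 1, grid):
        if offset != 0:
            angles.add((theta_h + offset) % 360)
    for offset in range(-delta_f, delta_f + 1, grid):
        angles.add((opposed + offset) % 360)
    # The current beam is never its own neighbor, even when delta_f = 180.
    angles.discard(theta_h % 360)
    return sorted(angles)
```

For example, with θh = 0, δ = 10 and δF = 0 on a 5◦ grid, the candidates are 5, 10, 350 and 355 plus the parallel opposed beam at 180.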

Figure 3-6. Nh(θ) (top shaded area) and N^F_h(θ) (top and bottom shaded areas) for θh = 0.

Figure 3-7. Selection probabilities under the geometric probability distribution with p = 0.25, plotted as selection probability against distance from the current beam. A) Standard neighborhood Nh(θ). B) Flip neighborhood N^F_h(θ).


3.8 Results

In addition to judging the BOO algorithms according to their computational time, the

plans must also be evaluated for clinical viability. All criteria used are those employed at

the Davis Cancer Center at Shands Hospital at the University of Florida.

3.8.1 Evaluating Plan Quality

In order to formulate an optimization problem, a quantitative measure of the

treatment plan quality is needed. This measure, the FMO function value, needs to

appropriately make the trade-off between the contradictory goals of covering targets and

sparing critical structures.

Typically, a good plan ensures that at least a certain percent of each target receives

the prescription dose. A coldspot occurs where less than a certain percent of the target

receives the prescription dose. Similarly, a hotspot occurs if a significant percentage of the

target receives more than the prescription dose.

3.8.1.1 Target coverage

Each of the plans contains two target structures, or planning tumor volumes (PTV):

one is the tumor mass observed from imaging scans, which we will call PTV2, and the

other is the PTV2 plus some margin specified by the physician, which we will call PTV1.

The PTV1 structure is used by physicians in case there are elements of the tumor mass

that cannot be seen from the imaging scans. The dose prescribed for PTV1 is less than

the dose prescribed for PTV2.

For target structures, we require that at least 95% of the target receives the full

prescription dose, so the dose that is received by at least 95% of each of the targets is

measured. We want to restrict the amount of the target that receives more than the

prescription dose. Because PTV2 is contained inside PTV1, PTV1 will necessarily have a

sizable, but less important, area receiving an overdose. Thus, we are only concerned with

PTV2 overdose. To evaluate the size of the hotspot, we check the percent volume of PTV2

that receives more than 110% of the prescription dose. To evaluate the coldspots, we check


Table 3-1. Sparing criteria vary for each critical structure.

Structure               Percent (%)    ≤ Dose (Gy)
brain stem              100            55
eyes                    50             30
mandible                100            70
optic chiasm            100            55
optic nerves            100            50
parotid glands          50             30
skin                    100            60
spinal cord             100            45
submandibular glands    50             30

the percent volume of both PTV1 and PTV2 that receives at least 93% of the prescription

dose. The prescription doses are set to 54 Gy for PTV1 and 73.8 Gy for PTV2, which are

the dose values used at Shands Hospital at the University of Florida.
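The three target measures described above can be computed directly from the voxel doses of a target. The following is a minimal sketch, assuming equal-volume voxels and a plain list of doses in Gy; the function names and dictionary keys are hypothetical.

```python
import math


def dose_at_volume(doses, volume_percent):
    """Largest dose that at least volume_percent of the voxels receive."""
    ordered = sorted(doses, reverse=True)
    count = math.ceil(len(ordered) * volume_percent / 100)
    return ordered[max(count, 1) - 1]


def target_coverage(doses, prescription):
    """Coverage, hotspot and coldspot measures for one target structure."""
    n = len(doses)
    return {
        # Dose received by at least 95% of the target volume.
        "dose_at_95_volume": dose_at_volume(doses, 95),
        # Hotspot: percent volume receiving more than 110% of the prescription.
        "pct_above_110_rx": 100.0 * sum(d > 1.10 * prescription for d in doses) / n,
        # Coldspot check: percent volume receiving at least 93% of the prescription.
        "pct_at_least_93_rx": 100.0 * sum(d >= 0.93 * prescription for d in doses) / n,
    }
```

For a target prescribed 54 Gy, `target_coverage(doses, 54.0)` returns the quantities that are compared against the 95% coverage requirement and the hotspot and coldspot thresholds.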

3.8.1.2 Critical structure sparing

The critical structures involved in each case vary, depending on their proximity to

the tumor. The critical structures can be classified into two general groups according

to their ability to survive radiation dose. Parallel structures, e.g., saliva glands, will

continue to function as long as a certain percentage of the organ receives less than a certain

amount of dose. Serial structures, on the other hand, will cease to function if any of the

organ receives over a certain amount of dose. The spinal cord is one example of a serial

structure—if it receives too much dose, the effect is equivalent to cutting it in half, leaving

the patient paralyzed. The sparing criteria for each of the common critical structures in

head-and-neck cases are listed in Table 3-1.

There are four saliva glands: one submandibular and one parotid gland on each of

the right and left sides. The saliva glands are of particular importance because their loss

can greatly decrease the patient’s quality of life, but because of their location relative

to the usual tumor positions, they can be difficult to spare. Studies show that a patient

can lead a relatively normal life with three of the four glands spared. The loss of other


organs, especially the spinal cord, will also greatly affect the patient’s quality of life, but

head-and-neck tumors are usually situated in such a way that other organs can be easily

spared in the FMO optimization. Thus, the results presented place particular emphasis on

the sparing of saliva glands.

Rather than relying strictly on FMO value, a tool commonly used by physicians to

judge the quality of a treatment plan is the dose-volume histogram (DVH). This histogram

is a measure of the cumulative dose received by a given structure. It specifies the fraction

of each structure’s volume that receives at least a certain amount of dose. Although there

are several critical structures to be considered in head-and-neck cases, the saliva glands are

notoriously the most difficult to spare due to their proximity to common tumor locations.

Thus, for clarity, the DVH results provided include only target structures and saliva

glands. Each of the treatment plans spares all organs not shown in the DVHs.

In the DVH results provided, vertical lines indicate target prescription doses, and

asterisks mark the sparing criteria for the saliva glands.
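A cumulative DVH and a sparing check can be sketched in a few lines. The example assumes equal-volume voxels and reads a sparing criterion of the form (percent, dose) as "at least that percent of the structure receives no more than that dose"; this reading, along with the function names, is an assumption made for illustration.

```python
def cumulative_dvh(doses, dose_grid):
    """Percent of voxels receiving at least each dose level in dose_grid."""
    n = len(doses)
    return [100.0 * sum(d >= level for d in doses) / n for level in dose_grid]


def is_spared(doses, percent, dose_limit):
    """Check an assumed sparing criterion for one critical structure.

    The structure counts as spared when at least `percent` of its voxels
    receive no more than `dose_limit` Gy.
    """
    within = 100.0 * sum(d <= dose_limit for d in doses) / len(doses)
    return within >= percent
```

Under this reading, a saliva gland with criterion (50, 30) is spared when no more than half of its volume exceeds 30 Gy, while a serial structure with criterion (100, 45) is spared only if no voxel exceeds 45 Gy.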

3.8.2 Response Surface Method Results

The response surface method was tested on six head-and-neck cases using a Windows

XP computer with a 3.2 GHz Pentium IV processor and 2 GB of RAM. The sizes of the

test cases for plans with three beams are shown in Table 3-2. Each algorithm was allowed

to run for 12 hours, which is not an unreasonable run length because BOO will not be

performed on a day-to-day basis. It is anticipated that BOO will be performed once

overnight between the time the patient is imaged and the time the patient begins radiation

therapy. A good beam vector chosen before treatment begins should continue to provide

quality treatment plans throughout the patient’s treatment, which is typically 35 days.

The beam orientations from which linear accelerators are capable of delivering

radiation are not restricted to integer value degrees. In this study, integral beam

orientations are desired to account for setup tolerances. For the same reasons, beam

orientations are considered on a 10◦ grid. To obtain integral solutions, in the subproblem


Table 3-2. Sizes of test cases.

Case    # bixels    # voxels
1       514         345,629
2       546         352,284
3       613         347,233
4       549         268,823
5       423         271,156
6       585         389,565
Avg.    538         329,115
Low     423         268,823
High    613         389,565

of maximizing I(θ), the integer constraint is relaxed in the problem of determining an

upper bound on s2(θ), and the resulting solution is rounded to integer values.

The branching scheme used treats the rounded solution as integral and branches so

as to avoid overlapping subregions as described in Section 3.6.2.3. Results are provided

for each possible initial region scheme. The point at which branching is performed in each

region, θb in Section 3.6.2.3, is chosen as the midpoint of the region. Also, ri and θh in the

underestimating terms in Problem s2-UB in Section 3.6.2.2 are taken to be the midpoints

of their respective intervals.

It is anticipated that the weighted distance measure in equation 3–3 will have

a significant impact on the algorithm’s performance. Intuitively, a small weighted

distance corresponds to a large correlation between points, which will cause the algorithm

to behave locally. In order to induce the algorithm to behave globally, the algorithm

must assume less correlation between two points. If the points are less correlated, the

algorithm will be less likely to stay in the neighborhood of previously sampled points.

The correlation between two points can be decreased by increasing the weighted distance

between the points, which can be done by increasing c or p. If c becomes sufficiently large,

the correlation between points will be effectively zero, thus yielding an effectively random

search algorithm. To test these expectations, c was tested with values of 10.0, 100.0 and


500.0. In each test, five randomly selected starting points were used to initialize the RS

algorithm.
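The effect of c on the correlation can be illustrated numerically. The sketch below assumes a weighted distance of the form c Σ |x_h − y_h|^p and a correlation of exp(−distance), the form commonly used in response surface methods; the exact expression of equation 3–3 may differ, and the function names and inputs are hypothetical.

```python
import math


def weighted_distance(x, y, c=10.0, p=2.0):
    """Assumed weighted distance: c times the sum of |x_h - y_h|**p."""
    return sum(c * abs(a - b) ** p for a, b in zip(x, y))


def correlation(x, y, c=10.0, p=2.0):
    """Assumed correlation between two points: exp(-weighted distance)."""
    return math.exp(-weighted_distance(x, y, c, p))


# Correlation between two nearby (normalized) points for the tested values of c.
for c in (10.0, 100.0, 500.0):
    print(c, correlation([0.1], [0.2], c=c))
```

Under these assumptions the correlation falls rapidly toward zero as c grows, which is the mechanism by which large values of c push the search toward effectively random sampling.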

To evaluate the algorithm’s performance across all of the tested cases, the relative

improvements in FMO value are compared against three reference plans: a 5-beam

equispaced plan (denoted 5 equi), a 7-beam equispaced plan (denoted 7 equi) and a locally

optimal 3-beam coplanar plan (denoted 3 A/D) obtained using the Add/Drop local search

heuristic introduced by Kumar [80].

3.8.2.1 Proof of concept

To test the accuracy of the RS method, a single case was tested wherein the problem

of adding a single coplanar beam to an equi-spaced, coplanar 3-beam solution over a 1◦

grid was considered. The algorithm was initialized with two randomly selected starting

points. By considering such a small scale problem, the solution space in each iteration can

be explicitly enumerated in order to exactly obtain the next best point to sample. The

ability to enumerate the solution space will also allow us to determine how accurately the

RS method models the FMO objective function.

At each point that has been sampled, both the uncertainty and the expected

improvement will be zero. This result is not only theoretically true, but also intuitive

because once the FMO value at a certain point is known, there will be no improvement

over the current best FMO value by sampling that point again. It is also expected that as

the algorithm progresses, the approximation of the FMO function will become increasingly

accurate, with the approximation obtaining the exact FMO values at sampled points.

Figures 3-8A through 3-8D demonstrate how the RS method behaved as predicted at different

points in the RS algorithm. The expected improvement is zero at sampled points and the

approximation of the FMO function almost perfectly fits the true FMO function by

the time the algorithm terminates.
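The behavior at sampled points can be checked with the closed-form expected improvement commonly used in response surface methods. The sketch below assumes a normal predictive distribution with mean `mean` and standard deviation `std` at a point, and minimization of the FMO value; whether I(θ) takes exactly this form here is an assumption.

```python
import math


def expected_improvement(mean, std, best):
    """Expected improvement over the current best value when minimizing."""
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic and cannot be negative.
        return max(best - mean, 0.0)
    z = (best - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + std * pdf
```

At a sampled point the uncertainty is zero and the predicted value equals the known FMO value, which is no better than the current best, so the function returns zero. Away from sampled points a positive uncertainty always yields a positive expected improvement.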

The importance of the starting points, the points sampled before the algorithm begins

to give the method some baseline information about the FMO function, was also tested.


Figure 3-8. Proof of concept results at various stages of the RS algorithm, showing the true FMO value and the FMO approximation F(θ) (left axis) and the expected improvement I(θ) (right axis) against θ. A) After two points. B) After 20 points. C) After 80 points. D) After 148 points, when the algorithm terminates.

The RS method was run with 100 randomly generated sets of starting points, and the RS

method obtained the global optimum in 90.6% of trials, indicating that the performance of

the algorithm is not significantly dependent on the starting points.

3.8.2.2 Adding a non-coplanar beam to a coplanar solution

Next, the problem of adding a non-coplanar beam to a 3-beam locally optimal

coplanar solution was considered. The locally optimal solution is obtained using the

Add/Drop algorithm. The beam data for the non-coplanar beam being optimized is

generated on-the-fly, and consists of gantry and couch rotations, where both the gantry

and couch are allowed to rotate a full 360◦ on a 10◦ grid. As the final solution of the

non-coplanar RS plan will be a 4-beam plan, the results from the response surface solution


are compared to the locally optimal coplanar 4-beam Add/Drop plan, denoted 4 A/D.

The plans will also be compared to an equi-spaced, coplanar 7-beam plan, denoted 7 equi,

which is commonly used in practice to treat head-and-neck cancers.

There is relatively little deviation in the final solutions between the different

parameter choices and initial region schemes, as shown by Table 3-3. The results also

indicate that the starting points chosen do not significantly affect the outcome of the

algorithm. This implies that the response surface algorithm is robust with respect to

varying implementations.

Although the FMO values of the 4 RS solutions were on average 5.44% higher (worse) than

those of the 7 equi plans, the 4 RS solutions obtained an average improvement of 16.12%

in FMO value over the 4 A/D solutions. Despite the differences in FMO

value, all treatment plans examined were similar in clinical quality, as discussed in Section

3.8.2.3.

Although the algorithm was allowed to run for 12 hours in each scenario, the

minimum FMO value obtained by the RS method was found early on. On average,

the best FMO value found was obtained in 6.15 hours after sampling 27-40 points.

For each of the RS method variations tested, both the number of points sampled

and the relative improvements in FMO value are nearly identical. This indicates that the

algorithm is robust with respect to parameter and implementation changes. The time

spent generating beam data comprises approximately 84% of the algorithm’s run time,

while the response surface portion on average accounts for only 13%. Thus, it is expected

that changes to the RS method, including improvements to the branch-and-bound routine,

will not have a very strong impact on the number of points the algorithm will sample in

its allotted run time.

3.8.2.3 Clinical results

The target coverage achieved by the different treatment plans is displayed in Table

3-4. On average, the 7 equi plan was able to deliver the most dose to PTV2,


Table 3-3. Minimum FMO value obtained and time required to obtain it.

        Min. FMO value         Time (hrs)
Case    Avg.      St. Dev.     Avg.    St. Dev.
1       565.24    8.82         5.35    5.07
2       570.51    12.83        7.49    3.78
3       927.34    20.60        7.05    2.21
4       710.92    7.72         6.54    3.39
5       512.22    20.04        6.96    3.33
6       799.95    34.07        3.48    3.40

Table 3-4. Target coverage achieved by the treatment plans

                                   4 RS        4 A/D       7 equi
PTV2 dose at 95% volume            73.16 Gy    72.56 Gy    73.81 Gy
PTV2 % receiving > 110% of Rx      23.18 %     15.07 %     31.63 %
PTV2 % receiving > 93% of Rx       98.87 %     98.67 %     99.57 %
PTV1 dose at 95% volume            54.71 Gy    54.41 Gy    55.09 Gy
PTV1 % receiving > 93% of Rx       97.95 %     98.01 %     97.46 %

but the 4 RS plan is very close. Both of the 4-beam plans obtain smaller hotspots and

better PTV1 target coverage than the 7 equi plan. The 4 A/D plan on average underdoses

PTV2, which could lead to recurrence of the cancer. This underdosage could also account

for the smaller hotspot in the 4 A/D plans.

Figures 3-10 and ?? illustrate two representative cases where the 4 RS, 4 A/D and

7 equi plans each have clinically acceptable target coverage. The vertical line at 73.8 Gy

indicates the prescription dose for PTV2.

The ability of each of the treatment plans to spare the organs in the cases tested is

shown in Table 3-5. Surprisingly, both the 4 RS and the 4 A/D plan are equivalent to

or outperform the 7 equi plan in terms of organ sparing. In the 4-beam plans, the left

submandibular gland is spared in 83% of the treatment plans developed, whereas it is only

spared in 67% of the 7 equi plans. One case illustrating equivalent organ sparing is shown

in Figure 3-10, and one case demonstrating improved organ sparing over the 7 equi plan is

shown in Figure ??. Just as PTV2 underdosage in the 4 A/D plans likely contributed to

the smaller hotspots, it is possible that the improved organ sparing in the 4 A/D plans is

also a result of the underdosage.


Table 3-5. Percentage of plans in which an organ is spared.

Structure              4 RS    4 A/D    7 equi
brain stem             100%    100%     100%
mandible               100%    100%     100%
left optic nerve       100%    100%     100%
right optic nerve      100%    100%     100%
left eye               100%    100%     100%
right eye              100%    100%     100%
optic chiasm           100%    100%     100%
left parotid gland     100%    100%     100%
right parotid gland    67%     67%      67%
left SMB gland         83%     83%      67%
right SMB gland        50%     50%      50%
spinal cord            100%    100%     100%
skin                   100%    100%     100%

Figure 3-9. 7-beam equi-spaced (dotted), 4-beam Add/Drop (dashed) and 4-beam RS non-coplanar (solid) target and select saliva gland DVHs for case001, plotted as volume against dose [Gy]. A) Target coverage (PTV and GTV) is nearly identical. B) The tumor surrounds the right submandibular gland, so the FMO solver recognizes that it cannot be spared and allows it to receive as much dose as necessary to ensure good target coverage in all plans. All other saliva glands are spared in all plans.


Figure 3-10. 7-beam equi-spaced (dotted), 4-beam Add/Drop (dashed) and 4-beam RS non-coplanar (solid) target and select saliva gland DVHs for case005, plotted as volume against dose [Gy]. A) Target coverage (PTV and GTV) is nearly identical. B) The left submandibular gland is spared by the two 4-beam plans, but not by the 7-beam plan. All other saliva glands are spared in all plans.

3.8.3 Neighborhood Search Method Results

The simulated annealing method was tested on six head-and-neck cases using a

Windows XP computer with a 2.13 GHz Pentium M processor and 2 GB of RAM. On

average, approximately 340 FMOs were calculated in the 30-minute run time allowed for the simulated

annealing and Add/Drop algorithms. Beams were selected on a 5-degree grid, yielding 72

candidate coplanar beams.

The simulated annealing and Add/Drop algorithms were used to obtain 4-beam

coplanar plans using regular and flip neighborhoods. In order to compare the quality of

the treatment across different plans, the plans are compared in terms of the percentage

improvement of each plan’s FMO value over the FMO value of the locally

optimal 3-beam plan obtained from the Add/Drop local search heuristic described

by Kumar [80]. The Add/Drop plans are denoted 3 A/D and 4 A/D for the 3- and

4-beam plans, respectively. The 4-beam plans generated by the simulated annealing and

Add/Drop algorithms are compared to the typical 5- and 7-beam equi-spaced plans,


denoted 5 equi and 7 equi, respectively. The simulated annealing plans are denoted by the

implementation numbers, which refer to the parameters used, given in Table 3-6.
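The percentage improvement used for these comparisons can be written as a one-line function. The sketch assumes the improvement is normalized by the reference plan's FMO value, with lower FMO values being better; the function name and the numbers in the usage note are hypothetical.

```python
def percent_improvement(fmo_value, reference_value):
    """Percentage by which fmo_value improves on (is lower than) reference_value."""
    return 100.0 * (reference_value - fmo_value) / reference_value
```

For instance, a plan with an FMO value of 80 compared to a reference plan with a value of 100 gives an improvement of 20 percent, while a plan with a value of 110 gives −10 percent.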

Figure 3-11 demonstrates the improved convergence times possible using the flip

neighborhood.

3.8.3.1 Add/Drop algorithm results

The Add/Drop algorithm was allowed to run for 30 minutes to generate a 4-beam

plan. The Nh(θ) neighborhood with δ = 20 and the N^F_h(θ) neighborhoods with δF = 0 and

δF = 20 are tested for the Add/Drop algorithm. The value δ = 20 is chosen

to approximate the neighborhood size that is expected from the simulated annealing

implementation using a large flip neighborhood, where δF = 180. More details on the

simulated annealing implementations are provided in Section 3.8.3.2.

Using Nh(θ), the 4-beam Add/Drop solution is nearly identical to the

7-beam equi-spaced plan, while the flip neighborhoods allow the Add/Drop algorithm

to find 4-beam solutions that exceed the quality of the 7-beam plans. Figure 3-12

demonstrates the quality of the solutions, while Figure 3-11A illustrates that the flip

neighborhoods provide faster FMO convergence than that of Nh(θ).

3.8.3.2 Simulated Annealing results

Several parameter sets were tested for the simulated annealing algorithm. For

simplicity, each of the parameter sets and methods of generating a neighboring solution

are numbered according to Table 3-6. Each implementation contains a total of 500

iterations, i.e., 500 sampled points, thus yielding a fair comparison between the parameters.

To ensure clinical practicality, the algorithm was allowed to run for a maximum of 30

minutes or 500 iterations, whichever came first.

For the cooling schedule, we update the temperature according to an exponential

cooling schedule, T_{i+1} = αT_i, where α < 1. Due to the random nature of the algorithm,

the algorithm is restarted five times, each time with a different initial starting point. The

first initial starting point is an equi-spaced solution, and each subsequent starting point is


Figure 3-11. Comparison of FMO convergence (minimum FMO value versus time in minutes) for the regular neighborhood and the flip neighborhoods. A) 4-beam Add/Drop, with flip neighborhoods δF = 0 and δF = 20. B) 4-beam simulated annealing, with flip neighborhoods δF = 0 and δF = 180.

the previous initial solution rotated by d degrees, where candidate angles are considered on

a d-degree grid, that is, every dth angle is considered. The number of simulated annealing

and Metropolis iterations are chosen such that the total number of iterations is 500.
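The restart scheme described above can be sketched as follows. The function name `starting_points` is hypothetical, and snapping the equi-spaced angles to the nearest grid angle is an assumption needed when 360/k is not a multiple of d.

```python
def starting_points(k, restarts=5, d=5):
    """Equi-spaced first start, then each later start rotated by d degrees."""
    # Snap each equi-spaced angle to the nearest multiple of d (assumed rule).
    first = [(round(h * 360 / (k * d)) * d) % 360 for h in range(k)]
    starts = [first]
    for _ in range(restarts - 1):
        starts.append([(angle + d) % 360 for angle in starts[-1]])
    return starts
```

For a 4-beam plan on a 5-degree grid, the first start is (0, 90, 180, 270), the second is (5, 95, 185, 275), and so on up to the fifth start.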

The initial temperature values tested are T0 = 0 and T0 = 75. T0 = 0 results in

the acceptance of only improving solutions, while the initial temperature value 75 was

selected as the value that would approximately yield a 50 percent probability of selecting a

non-improving solution for the initial iterations of the algorithm.
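The relationship between the initial temperature and the acceptance probability can be made concrete under the standard Metropolis rule, which is assumed here. The function names are hypothetical, and the text does not state which increase in FMO value the 50 percent figure refers to; the sketch only shows what the assumed rule implies.

```python
import math


def acceptance_probability(increase, temperature):
    """Assumed Metropolis probability of accepting a change in FMO value."""
    if increase <= 0:
        return 1.0  # improving (or equal) solutions are always accepted
    if temperature <= 0:
        return 0.0  # at zero temperature only improving solutions are accepted
    return math.exp(-increase / temperature)


def increase_accepted_with_probability(prob, temperature):
    """FMO increase that is accepted with probability prob at this temperature."""
    return -temperature * math.log(prob)
```

Under this rule, a temperature of 75 accepts a non-improving solution with probability 0.5 when the FMO value worsens by 75 ln 2, or roughly 52, and a temperature of zero never accepts a worsening move.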

For both the Nh(θ) and N^F_h(θ) neighborhoods, δ = δF = 180 is used so that the

entire solution space is considered as a neighborhood. As shown in Figure 3-7A, the

probability of selecting a beam 20◦ away using the Nh(θ) neighborhood with geometric

distribution with p = 0.25 is only 0.39% on a 5◦ grid. We consider this sufficiently small to

not consider neighborhoods larger than δ = 20 for Nh(θ) and δF = 20 for N^F_h(θ) in the

Add/Drop algorithm. Just as in the Add/Drop implementation, the neighborhood N^F_h(θ)

with δF = 0 is also considered.


Table 3-6. Definitions of implementations.

Number    n      m     N      α       T0
1         100    1     1      0.9     0
2         10     10    1      0.9     0
3         100    1     1      0.99    0
4         10     10    1      0.99    0
5         100    1     1      0.9     75
6         10     10    1      0.9     75
7         100    1     1      0.99    75
8         10     10    1      0.99    75
9         100    1     all    0.9     0
10        10     10    all    0.9     0
11        100    1     all    0.99    0
12        10     10    all    0.99    0
13        100    1     all    0.9     75
14        10     10    all    0.9     75
15        100    1     all    0.99    75
16        10     10    all    0.99    75

Figure 3-11B shows that the flip neighborhoods converge in FMO value significantly

faster than does the Nh(θ) neighborhood, while Figure 3-13 shows that the flip neighborhoods

provide comparable solution quality to both the non-flip simulated annealing and 7-beam

equi-spaced solutions.

3.8.3.3 Clinical results

Because there is no fundamental way of quantifying the quality of a treatment plan, a tool

commonly used by physicians to judge the quality of a treatment plan is the dose-volume

histogram (DVH). A DVH is a graphical measure of the cumulative dose received by a

given structure. It specifies the percentage of each structure’s volume that receives at least

a certain amount of dose, thus providing an intuitive means of assessing the quality of a

treatment plan.

The plans tested each contain two target structures. The gross tumor volume

(GTV) is the tumor mass observed from imaging scans. The clinical tumor volume (CTV)

is the GTV plus some margin specified by the physician. The CTV is used by physicians


in case there are elements of the tumor mass that cannot be seen from the imaging scans,

and the dose prescribed for the CTV is less than the dose prescribed for the GTV.

DVHs for a representative case comparing the 7-beam equi-spaced plan with the

simulated annealing plans obtained using a regular neighborhood and flip neighborhoods

with δF = 0 and δF = 180 are shown in Figure 3-13. Comparisons of the 7-beam

equi-spaced plan and the Add/Drop plans using a regular neighborhood and flip

neighborhoods with δF = 0 and δF = 20 are shown in Figure 3-12. The sparing criterion

used for the saliva glands, no more than 50% of the gland receiving 30 Gy, is marked by

the star in Figures 3-13 and 3-12. The prescription dose for the GTV is 73.8 Gy, which

is marked by the vertical line in Figures 3-13 and 3-12. As previously stated, for target

structures, we require that at least 95% of the target receives the full prescription dose.

Figure 3-13 reveals that the 7-beam equi-spaced plan actually overdoses the target

and has a larger hotspot than the 4-beam simulated annealing plans. The 7-beam

equi-spaced plan only spares three of the four saliva glands, whereas the 4-beam simulated

annealing plans spare three or more saliva glands. The simulated annealing plans obtained

using the flip neighborhoods spare all four saliva glands, while the plan obtained using the

Nh(θ) neighborhood only spares three saliva glands, indicating that the flip neighborhoods

do in fact find superior solutions in terms of clinical quality.

Figure 3-12 shows that the 4-beam Add/Drop plans obtain nearly identical solutions

when compared to the 7-beam equi-spaced DVHs. The flip neighborhoods perform

clinically comparably to the regular neighborhood plans, and all of the Add/Drop plans

are comparable to the 7-beam equi-spaced plan in terms of saliva gland sparing and target

coverage.

3.9 Conclusions and Future Directions

3.9.1 Response Surface Conclusions

We have shown that for head-and-neck cases, quality plans with fewer beams than

a standard treatment plan can be obtained if BOO is applied. The response surface


Figure 3-12. Comparison of Add/Drop and 7-beam equi-spaced plans (7-beam equi-spaced; flip, δF = 20; flip, δF = 0; regular neighborhood), plotted as volume against dose [Gy]. A) Target DVHs: the Add/Drop plans achieve nearly identical target coverage when compared to the 7-beam equi-spaced plan. B) Saliva gland DVHs: the saliva gland sparing in the Add/Drop plans and the 7-beam equi-spaced plan is clinically equivalent.

Figure 3-13. Comparison of simulated annealing and 7-beam equi-spaced plans, plotted as volume against dose [Gy]. A) Target DVHs: unlike the 7-beam equi-spaced plan, the 4-beam simulated annealing plans do not overdose the target. B) Saliva gland DVHs: the simulated annealing plans are also capable of sparing more saliva glands than the 7-beam equi-spaced plan.


algorithm operates in a clinically reasonable time frame, and is generally successful in

selecting non-coplanar beam orientations to improve the FMO value over that of locally

optimal coplanar solutions. The FMO value of the 4-beam response surface plans was

also only slightly larger than that of the 7-beam equi-spaced coplanar treatment plans,

indicating comparable treatment plans despite the decrease in the number of beams used.

In terms of clinical results, the most significant benefit of the non-coplanar solutions

over the locally optimal coplanar solutions was the ability to deliver a higher amount

of dose to the target structures. Both the non-coplanar and locally optimal coplanar

solutions were able to obtain treatment plans with organ sparing that is comparable to or

better than that of the 7-beam equi-spaced coplanar treatment plans.

While the inclusion of non-coplanar orientations in BOO is useful in terms of FMO

value and target coverage, the resulting improvements in the treatment plan may not

always be clinically significant. With better parameter tuning or neighborhood structure,

it is possible that the Add/Drop algorithm can obtain coplanar treatment plans with more

desirable target coverage, thus making the response surface plans and the Add/Drop plans

clinically equivalent. This suggests that the inclusion of non-coplanar beam orientations

does not significantly improve the quality of a treatment plan. Although most BOO

research is restricted to coplanar orientations, there has not yet been a study assessing the

solution quality of coplanar versus non-coplanar solutions. With this study as evidence,

both researchers and practitioners now have a basis for restricting the solution space to the

smaller, more tractable set of coplanar beams for head-and-neck beam optimization.

The patient cases in this work were all head-and-neck cases. Different tumor sites,

e.g., breast, lung and prostate, could also benefit from BOO, and perhaps may experience

greater improvements in treatment plan quality. In future work, these sites will be tested

to assess the general clinical usefulness of non-coplanar orientations and the response

surface method.


3.9.2 Neighborhood Search Conclusions

We have shown that for head-and-neck cases, quality plans with fewer beams than

a standard treatment plan can be obtained if BOO is applied. The simulated annealing

and Add/Drop algorithms both regularly obtained quality treatment plans with as few

as four beams in only 30 minutes. The use of the flip neighborhood improves the rate of

FMO convergence in both algorithms, and even has the ability to improve organ sparing

as shown in the simulated annealing results. The simulated annealing and Add/Drop

algorithms performed comparably to each other, with neither algorithm indicating a

significant benefit over the other.

It is possible to incorporate flip neighborhoods into other BOO algorithms that rely

on neighborhood searches to yield improved treatment plans in clinically acceptable time

frames.


CHAPTER 4
FRACTIONATION

4.1 Introduction

Typically, head-and-neck treatment plans each contain two target structures, or planning tumor volumes (PTVs): PTV1 and PTV2. Let PTV1 be the tumor mass observed from imaging scans, and let PTV2 be PTV1 plus some margin specified by the physician.

Rather than deliver an entire treatment plan in one session, a treatment plan is

divided into several sessions, called fractions. This is done to take advantage of the fact

that normal, healthy cells recover faster from the radiation than cancerous cells. To

obtain the treatment plans for the fractions, in practice, a single FMO treatment plan is

developed and then divided into the desired number of fractions, usually around 35. This

division of a treatment plan is a non-trivial task, as the target voxels must receive 1.8-2.0

Gy of radiation in each fraction.

With a single IMRT treatment plan, it is practically impossible to devise a constant

dose-per-fraction delivery technique because only a single FMO problem is solved to

obtain the treatment plan, which is then simply divided into a number of daily fractions.

If a single plan is optimized to deliver doses to multiple target-dose levels, then the dose

per fraction delivered to each target must change in the ratio of a given dose level to the

maximum dose level. For example, say PTV1 has a prescription dose of 70 Gy, PTV2 has

a prescription dose of 50 Gy, and the number of fractions is 35. If a single treatment plan

is divided among the 35 fractions, then PTV1 will receive 70/35 = 2.0 Gy in each fraction,

but PTV2 will only receive 50/35 = 1.4 Gy, and thus any cancerous cells in PTV2 may

not be eradicated by the treatment. Similarly, if only 25 fractions are used in order to

ensure that PTV2 receives 2.0 Gy per fraction, then PTV1 receives 70/25 = 2.8 Gy per

fraction, well above the desired dose.

We propose a new method of approaching the fractionation subproblem wherein an

FMO treatment plan is developed for each target structure, rather than developing a


single treatment plan for all target structures. The individual treatment plans can then be

easily divided into optimal fractions.

The primal-dual interior point algorithm presented by Aleman et al. [88] is used to

solve the FMO and fractionation models to optimality.


The fractionation model builds on the FMO model described in Chapter 2. To solve

the fractionation problem, we consider developing an individual fluence map solution for

each target. For a case with two targets, two plans must be developed: (1) a plan that delivers the lower prescription dose to both PTV1 and PTV2, and (2) a plan that “boosts” the dose received by PTV1 to reach its prescribed dose level. These two fluence maps can then be divided into the appropriate number of fractions easily. For the example of 50 Gy and 70 Gy prescription doses for PTV2 and PTV1, respectively, this would yield 25 fractions treating both PTV1 and PTV2 to 50/25 = 2.0 Gy each, and another 10 fractions treating just PTV1 to (70 − 50)/10 = 2.0 Gy each.
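The arithmetic of the two delivery schemes can be checked with a short sketch. Only the prescription doses (70 Gy and 50 Gy), the fraction counts and the 2.0 Gy target dose per fraction come from the text; the function names and the staging logic are illustrative assumptions, not part of the original work.

```python
def single_plan_dose_per_fraction(prescriptions, n_fractions):
    """Dose per fraction when one plan is divided evenly into n_fractions."""
    return {name: dose / n_fractions for name, dose in prescriptions.items()}


def boost_schedule(prescriptions, dose_per_fraction=2.0):
    """Stage the treatment so every treated target gets the same dose per fraction.

    Targets are sorted by prescription dose; each stage raises all targets
    that still need dose up to the next prescription level.
    """
    stages = []
    delivered = 0.0
    remaining = sorted(prescriptions.items(), key=lambda item: item[1])
    while remaining:
        level = remaining[0][1]
        n = round((level - delivered) / dose_per_fraction)
        stages.append({
            "targets": [name for name, _ in remaining],
            "fractions": n,
            "dose_per_fraction": (level - delivered) / n,
        })
        delivered = level
        remaining = [(name, dose) for name, dose in remaining if dose > level]
    return stages


prescriptions = {"PTV1": 70.0, "PTV2": 50.0}
single = single_plan_dose_per_fraction(prescriptions, 35)
stages = boost_schedule(prescriptions)
```

Here `single` reproduces the 2.0 Gy and roughly 1.4 Gy per-fraction doses of the single-plan approach, while `stages` contains a 25-fraction stage for both targets followed by a 10-fraction stage for PTV1 alone.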

For simplicity, we call these individual fluence maps “fractions”, rather than using the

term to describe the daily treatments. The development of these fluence maps separately

would result in suboptimal solutions. To optimize these fluence map sets simultaneously,

we consider each bixel in each fraction as an individual decision variable. As the number of fractions is equal to the number of targets ($T$), this results in a fluence map developed for each target.

In the single FMO formulation, dose to voxel $j$ in structure $s$ is defined as $z_{js} = \sum_{i=1}^{N} D_{ij} x_i$, $s = 1, \dots, T$, and the penalty associated with it as $F_s(z_{js})$. Because the fractionation model will be concerned with dose-per-fraction as well as cumulative dose, new variables must be defined to express these values.


Define $x_i^f$, $f = 1, \dots, T$, as the fluence of beamlet $i$ in fraction $f$. The amount of dose received by a voxel $j$ in structure $s$ in fraction $f$ is defined as

$$z_{js}^f = \sum_{i=1}^{N} D_{ij} x_i^f, \quad j = 1, \dots, v_s,\ s = 1, \dots, T,\ f = 1, \dots, T \tag{4–1}$$

Critical structures are thought to be affected by only the cumulative dose received from all treatments, rather than just the dose in any one particular fraction. This cumulative dose received by a voxel is

$$z_{js} = \sum_{f=1}^{T} \sum_{i=1}^{N} D_{ij} x_i^f, \quad j = 1, \dots, v_s,\ s = T+1, \dots, S \tag{4–2}$$

Critical structures are penalized in the same manner as in the original FMO model, that is, $F_s(z_{js})$, $s = T+1, \dots, S$.
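Equations (4–1) and (4–2) amount to a matrix-vector product per fluence map and a sum over fluence maps. The following minimal sketch illustrates this on toy data; the matrix entries, fluence values and function names are hypothetical, and for brevity a single structure is used so the structure index is dropped.

```python
def dose_per_fraction(D, x_f):
    """Per-fraction dose: z_j^f = sum_i D[i][j] * x_f[i] for every voxel j."""
    n_voxels = len(D[0])
    return [sum(D[i][j] * x_f[i] for i in range(len(x_f)))
            for j in range(n_voxels)]


def cumulative_dose(D, fluences):
    """Cumulative dose: z_j = sum_f sum_i D[i][j] * x^f[i]."""
    per_fraction = [dose_per_fraction(D, x_f) for x_f in fluences]
    return [sum(z[j] for z in per_fraction) for j in range(len(D[0]))]


# Toy data (hypothetical): 3 beamlets, 2 voxels, T = 2 fluence maps.
D = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 2.0]]
x = [[2.0, 0.0, 1.0],   # fluence map for fraction 1
     [0.0, 4.0, 0.0]]   # fluence map for fraction 2

z1 = dose_per_fraction(D, x[0])
z2 = dose_per_fraction(D, x[1])
z_total = cumulative_dose(D, x)
```

In practice the dose deposition matrix is large and sparse, so a sparse matrix library would be the natural choice; plain lists are used here only to keep the sketch self-contained.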

Targets require a more complex treatment in the fractionation model. In each

fraction, we are primarily concerned with dose received by the targets in that particular

fraction. Thus, new variables are needed to express the amount of dose per fraction received by a voxel ($z_{js}^f$ in Equation (4–1)).

Since we must also ensure that the cumulative dose received by each target reaches the prescribed dose, variables to express the cumulative dose received by a voxel are required. Intuitively, this cumulative dose should be the sum of all the doses received in all fractions. If the cumulative dose for targets is defined this way, however, then over/underdosing in one fraction can result in under/overdosing in another to compensate, which is undesirable. To prevent such a scenario, another new variable called the artificial dose is required ($\bar{z}_{js}^f$ in Equation (4–3)). Rather than simply summing up the dose received in each fraction, we will assume that in the previous fraction, the target voxel received exactly the correct prescription dose for the previous fraction. Thus, no compensating will be necessary. The artificial dose is just the prescription dose from the previous fraction ($P_{f-1}$) plus the dose received in the current fraction:

$$\bar{z}_{js}^f = P_{f-1} + z_{js}^f, \quad j = 1, \dots, v_s,\ s = 1, \dots, T,\ f = 1, \dots, T \tag{4–3}$$

Since each of the target voxels being irradiated in fraction $f$ is treated as target $f$, the penalty function for these voxels is

$$\sum_{s=f}^{T} \sum_{j \in V_s} F_f(z_{js}^f)$$

Once a target has received its prescription dose, ideally, it should not receive any further dose. As target $f$ is treated in fraction $f$, for all fractions after $f$, target $f$ should be treated as normal tissue. Specifically, targets that no longer require dose will be treated as skin, denoted structure $S$. Therefore, these target voxels, along with actual skin voxels, will be penalized with penalty function $F_S$. The dose received by these target voxels is the prescription dose of the voxel ($P_s$) plus the dose received in all subsequent fractions ($\sum_{\ell=s+1}^{T} z_{js}^{\ell}$). This leads to the following penalty functions for voxels penalized as normal tissue in fraction $f$:

$$\sum_{s=1}^{f-1} \sum_{j \in V_s} F_S\left(P_s + \sum_{\ell=s+1}^{T} z_{js}^{\ell}\right) + \sum_{j \in V_S} F_S(z_{jS})$$

As with the traditional FMO model, penalty functions are normalized according to the number of voxels in the structure. For critical structures, this normalization factor is still $1/v_s$ since there are always $v_s$ voxels being treated as critical structure $s$. In each fraction, the number of target voxels depends on which targets still need to be treated. Each fluence map set will only “see” the target voxels that are included in its prescription dose level. Thus, define the number of target voxels treated in fluence map $f$ as

$$v^f = \sum_{s=f}^{T} v_s, \quad f = 1, \dots, T$$

The number of voxels treated as skin in fraction $f$ can be expressed by $v^1 - v^f + v_S$, where $v^1 - v^f$ is the number of target voxels being treated as skin and $v_S$ is the number of actual skin/unspecified tissue voxels.
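The voxel counts used for normalization can be illustrated with a small sketch. The structure sizes below are hypothetical, targets are assumed to be indexed so that target f completes its prescription in fraction f, and the function names are illustrative only.

```python
def target_voxels_per_fluence_map(target_sizes):
    """Number of target voxels seen by each fluence map f (0-based list).

    Fluence map f treats targets f, f+1, ..., T, so the count is the sum of
    the sizes of those targets.
    """
    return [sum(target_sizes[f:]) for f in range(len(target_sizes))]


def skin_voxel_counts(target_sizes, n_skin):
    """Voxels penalized as skin in each fraction.

    This is the number of target voxels already treated to their prescription
    (total target voxels minus those still treated) plus the actual skin voxels.
    """
    v = target_voxels_per_fluence_map(target_sizes)
    return [v[0] - v_f + n_skin for v_f in v]


# Hypothetical sizes: two targets with 700 and 300 voxels, 5000 skin voxels.
target_counts = target_voxels_per_fluence_map([700, 300])
skin_counts = skin_voxel_counts([700, 300], 5000)
```

With these toy sizes, the first fluence map sees all 1000 target voxels and only the actual skin is penalized as skin; the second sees 300 target voxels, and the 700 completed target voxels join the skin count.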

Identical to the traditional FMO, the critical structures are normalized and penalized by

$$\sum_{s=T+1}^{S-1} \frac{1}{v_s} \sum_{j \in V_s} F_s(z_{js})$$

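Let $z$ be a vector of all $z_{js}$, $z_{js}^f$ and $\bar{z}_{js}^f$ variables (the cumulative, per-fraction and artificial doses, respectively). The objective function is obtained by summing the normalized penalty functions:

$$F_{\text{frac}}(z) = \sum_{f=1}^{T} \left\{ \frac{1}{v^1 - v^f + v_S} \left[ \sum_{s=1}^{f-1} \sum_{j \in V_s} F_S\left(P_s + \sum_{\ell=s+1}^{T} z_{js}^{\ell}\right) + \sum_{j \in V_S} F_S(z_{jS}) \right] + \frac{1}{v^f} \sum_{s=f}^{T} \sum_{j \in V_s} F_f(z_{js}^f) + \sum_{s=T+1}^{S-1} \frac{1}{v_s} \sum_{j \in V_s} F_s(z_{js}) \right\}$$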

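The fractionation model is then formulated as

$$
\begin{aligned}
\text{minimize} \quad & F_{\text{frac}}(z) \\
\text{subject to} \quad & z_{js}^f = \sum_{i=1}^{N} D_{ij} x_i^f, && j = 1, \dots, v_s,\ s = 1, \dots, T,\ f = 1, \dots, T \\
& z_{js} = \sum_{f=1}^{T} \sum_{i=1}^{N} D_{ij} x_i^f, && j = 1, \dots, v_s,\ s = T+1, \dots, S \\
& \bar{z}_{js}^f = P_{f-1} + z_{js}^f, && j = 1, \dots, v_s,\ s = 1, \dots, T,\ f = 1, \dots, T \\
& x \ge 0
\end{aligned}
$$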

As the objective function is a sum of quadratic functions and the constraints are all linear, the fractionation formulation has the same structure as the basic FMO formulation.

4.3 Results

The fractionation model is tested using the primal-dual interior point algorithm

in Aleman et al. [88]. One significant benefit of employing a primal-dual interior point

algorithm is that the solution generated is guaranteed to be optimal to within a certain


tolerance that can be specified by the user. Thirteen head-and-neck cases using five

equi-spaced beams are tested. Each test case consists of two targets, PTV1 and PTV2,

with prescription dose levels of 70 Gy and 50 Gy, respectively.

According to the suggestions made on algorithm parameters in Aleman et al. [88], the primal-dual interior point algorithm was implemented with a Single Approximation Hessian and a stopping criterion of a relative duality gap of 0.1%. Although it was also recommended to remove “insignificant” beamlets, the removal of these beamlets actually increases run time in the fractionation model. Thus, insignificant beamlets are left in the fractionation model.

4.3.1 Computational Results

The tests are run in Matlab (MathWorks, Inc.) on a 2.33 GHz Intel Core 2 Duo processor with 2 GB of RAM. Table 4-1 shows the size of each case in terms of the number of decision variables (the number of bixels) and the size of the patient area being treated (the number of voxels). The computation times obtained are also displayed in Table 4-1. On average, the fractionation model was solved in 22.03 seconds. With the same algorithm parameters and weighting parameters, a single FMO treatment plan can be determined in an average of 16.28 seconds; thus there is only a 35% increase in computation time required to develop two FMO plans for the fractionation model. This relatively small increase in time could be accounted for by the fact that the weighting parameters used in the objective function were specifically tuned for the fractionation model. Using parameters specifically tuned to the single-FMO model, the single-FMO model can be solved on average in 9.36 seconds. Compared to this average run time, the fractionation model requires 2.4 times as much computation time to develop two plans as opposed to one, which is a more intuitive expectation of the interior point method’s performance.
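The averages and ratios quoted above can be reproduced from the per-case run times reported in Table 4-1. The sketch below is only a check of that arithmetic; the variable names are illustrative, and the 9.36-second figure is taken from the text rather than recomputed.

```python
# Run times in seconds for the 13 cases, as reported in Table 4-1.
single_fmo_times = [8.39, 82.69, 11.75, 6.87, 13.16, 10.31, 9.14,
                    22.92, 10.84, 7.94, 2.75, 12.30, 12.55]
fractionation_times = [19.60, 55.34, 18.79, 11.47, 29.70, 25.58, 18.88,
                       20.19, 26.12, 12.44, 2.99, 27.13, 18.15]

mean_single = sum(single_fmo_times) / len(single_fmo_times)
mean_frac = sum(fractionation_times) / len(fractionation_times)

# Relative increase when both models use the same parameters.
increase_pct = (mean_frac / mean_single - 1.0) * 100.0

# Ratio against the single-FMO average obtained with its own tuned parameters.
tuned_single_mean = 9.36
ratio_tuned = mean_frac / tuned_single_mean
```

The 35% increase compares both models under identical parameters, while the factor of about 2.4 compares the fractionation model against the single-FMO model run with parameters tuned for it.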

Table 4-1. Case sizes and run times using identical algorithm and weighting parameters.

                              Single FMO          Fractionation
Case     Bixels   Voxels  Iterations  Time (s)  Iterations  Time (s)
1           813   85,017          16      8.39          16     19.60
2          1320  189,234         103     82.69          14     55.34
3           935   86,255          24     11.75          11     18.79
4           692   58,636          15      6.87          11     11.47
5          1044  102,262          14     13.16          12     29.70
6          1005   84,369          13     10.31          12     25.58
7           822   71,873          17      9.14          14     18.88
8           802   92,307          59     22.92          14     20.19
9           911   65,541          18     10.84          17     26.12
10          642   66,634          25      7.94          16     12.44
11          279   56,847          29      2.75          14      2.99
12          994   96,105          17     12.30          12     27.13
13          823   72,729          33     12.55          14     18.15
Average     852   86,755          29     16.28          14     22.03

4.3.2 Clinical Results

Because there is no fundamental way of quantifying treatment plan quality, DVHs are examined in addition to objective function values to assess each treatment plan.

The prescription doses used are 70 Gy for PTV1 and 50 Gy for PTV2. These are

common prescriptions used in the cancer center at Shands Hospital at the University of

Florida. Figures 4-1 through 4-7 show both dose volume histograms (DVHs) and axial slices for

several cases. The DVHs show that in the first fraction, both PTV1 and PTV2 are treated

to 50 Gy, and in the second fraction, only PTV1 is treated to an additional 20 Gy. The

prescription dose for the fraction is marked by a vertical line. The amount of dose received

by each target in each fraction is clinically acceptable.

As this study focuses on head-and-neck cases where the greatest conflict lies in treating the targets while sparing the saliva glands, only DVHs of the saliva glands are shown. All other organs, including skin/unspecified tissue, receive a low enough amount of dose to be spared in the treatment. The sparing criteria for each of the common critical structures in head-and-neck cases are listed in Table 4-2. The critical structures involved in each case vary, depending on their proximity to the tumor, and thus DVHs for some cases do not include all saliva glands.

Table 4-2. Sparing criteria vary for each critical structure

Structure              Percent (%)   ≤ Dose (Gy)
brain stem                     100            55
eyes                            50            30
mandible                       100            70
optic chiasm                   100            55
optic nerves                   100            50
parotid glands                  50            30
skin                           100            60
spinal cord                    100            45
submandibular glands            50            30

DVHs of the saliva gland doses in Fraction 1 show that the saliva glands receive the

majority of dose in the first fraction. Because the cumulative amount of dose received

determines whether or not critical structures can be spared, the DVHs for Fraction 2

depict the cumulative dose of these organs. The sparing criterion used for saliva glands is that no more than 50% of the gland can receive more than 30 Gy. This point is marked as a star. For most cases, all of the saliva glands are spared.
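The saliva gland criterion can be expressed as a simple check on the voxel doses of a structure. The sketch below assumes a list of cumulative voxel doses in Gy; the function names and the toy dose values are hypothetical.

```python
def fraction_above(doses, threshold):
    """Fraction of voxels receiving strictly more than the threshold dose."""
    return sum(1 for d in doses if d > threshold) / len(doses)


def is_spared(doses, max_fraction=0.5, dose_limit=30.0):
    """True if no more than max_fraction of the voxels exceed dose_limit."""
    return fraction_above(doses, dose_limit) <= max_fraction


def dvh(doses, dose_grid):
    """Cumulative DVH: percent of voxels receiving at least each dose level."""
    n = len(doses)
    return [100.0 * sum(1 for d in doses if d >= g) / n for g in dose_grid]


# Toy cumulative doses (Gy) for two hypothetical glands.
gland_a = [10.0, 20.0, 25.0, 35.0, 40.0]
gland_b = [28.0, 31.0, 33.0, 45.0]

spared_a = is_spared(gland_a)
spared_b = is_spared(gland_b)
curve_a = dvh(gland_a, [0.0, 30.0, 50.0])
```

The default arguments match the 50% and 30 Gy values used for saliva glands; the other criteria in Table 4-2 can be checked by passing the corresponding percentage and dose.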

Figures 4-1 through 4-7 also show the dose received in each fraction as a colorwash of a slice

of the patient. Fraction 1 delivers a homogeneous dose of 50 Gy to both PTV1 and PTV2

while generally avoiding overdosing any of the marked critical structures. In Fraction 2,

the dose to PTV1 is boosted by 20 Gy without delivering any unnecessary dose.

4.3.3 Spatial Coefficient Results

The concept of employing spatial information as described in Section 2.4 is also

applied to the fractionation model. One set of spatial coefficients is used to obtain both

fractions. For the fractionation treatment plans, the spatial coefficients are λ = 1.02,

µ = −0.92, β = 0.97 and the minimum coefficient for target voxels is 0.6.

Generally, the DVHs for both targets and critical structures using spatial coefficients are similar to those obtained without using spatial coefficients. In fact, in the cases tested, there were no instances of either the spatial treatment plans or the non-spatial treatment plans yielding clinically significant changes in the DVHs. The slices show that there is improved homogeneity in the target doses when spatial coefficients are used.

[Figure 4-1 plots. Panels: "Target DVHs: Fraction 1 of 2" (PTV1 ∪ PTV2); a panel labeled PTV1; "Target DVHs: Cumulative dose" (PTV2, PTV1); "Saliva gland DVHs: Cumulative dose". Axes: Dose [Gy] versus Volume [Fractional].]

Figure 4-1. Target DVHs, saliva DVHs and axial slices in Fractions 1 (left) and 2 (right).

[Figures 4-2 through 4-7: plots with the same panels and axes as Figure 4-1 for further cases; the final saliva gland panel is labeled left parotid gland and right parotid gland.]

The slices also indicate that since the use of spatial coefficients results in the target

voxels weighing more heavily than other voxels, the model is more willing to deliver dose

to critical structures rather than overdose or underdose the target. This helps provide a

uniform dose in the target, and should still be acceptable as the cumulative dose for all

critical structures remains within acceptable levels and there are no instances of sacrificing

organs that were not already sacrificed in the non-spatial plan.

Because more critical structure voxels receive dose in the spatial plans, the dose

deposited in the target structures is more spread out, and thus the maximum dose

received by the critical structure voxels is less than in the non-spatial plans. This of course

means that more voxels are exposed to radiation, but the levels are lower and the amount

of radiation still falls within clinically acceptable limits. The resulting improvement in

homogeneity is evident for each of the cases, but the effect of the more spread out dose is

best illustrated in the second fraction of each case.

Figures 4-8–4-14 show the DVHs and slices for some of the tested cases. In particular,

Figures 4-9, 4-10, 4-11 and 4-14 demonstrate that the spatial coefficients reduce the

amount of dose delivered outside of the targets when compared to their respective

non-spatial plans in Figures 4-2, 4-3, 4-4 and 4-7.


The fractionation model presented allows for the creation of guaranteed optimal fluence maps for each fraction of a patient’s treatment. These fluence maps can be easily divided into the appropriate number of fractions without sacrificing optimality. Using the primal-dual interior point method, the fractionation model obtains fluence maps for each target in a clinically feasible amount of time. As expected, the computation time required to generate two fluence maps for a two-target case is more than the time necessary to generate a single FMO plan, but the computation times are still acceptable. Further parameter tuning could possibly yield better results.

[Figure 4-8 plots. Panels labeled PTV1 ∪ PTV2; PTV1; PTV2 and PTV1; and a saliva gland panel. Axes: Dose [Gy] versus Volume [Fractional].]

Figure 4-8. Target DVHs, saliva DVHs and axial slices in Fractions 1 (left) and 2 (right) using spatial coefficients.

[Figures 4-9 through 4-14: plots with the same panels and axes as Figure 4-8 for further cases; the final saliva gland panel is labeled left parotid gland and right parotid gland.]

The addition of spatial coefficients in the model allows for improved homogeneity,

but does not seem likely to provide additional organ sparing. The improved homogeneity

alone is enough to warrant the inclusion of spatial information in the model. The model is

sensitive to the changes in the spatial coefficients, so further parameter tuning will have to

be performed in small incremental changes.

Currently, the model assumes that prior to each fraction, each target voxel has

received exactly the prescribed amount of dose up to that point in time. While we

have assumed that over/underdose in one fraction should not be compensated by

under/overdose in another fraction, it may in fact be advantageous to allow for some

degree of compensation. The fractionation formulation proposed affords enough flexibility

to model such a scenario. For example, say a physician would like to allow underdose in

target s in previous fractions to be compensated by up to ξ Gy of overdose in the current

fraction. Then, for target structure s, the Ps term in the objective function would be

replaced by the expression max{zjs, Ps − ξ}. As this type of discontinuity already exists in

the model, the structure of the model would not be altered by making this modification.

Other future research possibilities include further parameter testing to employ the

model on other cancer site treatments, for example, lung and prostate cancers.


CHAPTER 5
A MONTE CARLO METHOD FOR MODELING DOSE DEPOSITION

5.1 Introduction

The FMO problem relies on the calculation of the amount of total radiation dose

received in each voxel. The dose in a voxel is determined by the paths the photons in

the radiation beams follow through the patient. Some photons may collide with particles

inside the patient and scatter in any direction, while others may collide with particles

and be absorbed. Still other photons may pass entirely through the patient with no

collisions. Due to the unpredictable nature of the radiation beam inside the patient, the

dose received in a voxel can only be accurately obtained through Monte Carlo simulations.

A simple linear relationship is assumed between total dose and beamlet fluences and is

commonly accepted as a satisfactory dose approximation in IMRT optimization. Errors of

as much as 30% have been reported for photon beams near tissue inhomogeneities (Ma et

al. [5]).

For IMRT optimization, particularly with the advent of image-guided IMRT (IGIMRT),

or 4D IMRT, the FMO problem must be solved extremely quickly to create real-time

treatment plans. Thus, the speed of the FMO problem is paramount. While Monte Carlo

simulation may provide the most accurate measure of dose, the lengthy computation

time renders the method impractical for clinical use. We propose a Monte Carlo method

that performs a limited number of histories to obtain a noisy approximation of the dose

distribution of each beamlet to which a smoothing function can be applied in order to

determine an accurate dose distribution. The anticipation is that few histories will be

required, and that this approach can be clinically feasible.

Recently, a similar approach has been taken by Jelen and Alber [89] and Jelen et al.

[90] with good results. Jelen et al. [90] acknowledge that there is some loss of accuracy

at the beam’s edge due to a lack of lateral density correction and the effects arising from

MLC systems, for example, tongue-and-groove and inter-leaf scatter. Jelen and Alber


[89] pursue the issue of density scaling, but the MLC effects have not yet been addressed.

Section 5.6 proposes some possible methods of accounting for such MLC effects.

5.2 Monte Carlo Engine

The “Dose Planning Method” (DPM) (Sempau et al. [91]) program will be used

to perform the Monte Carlo simulations. DPM is designed to simulate the transport of

photons in radiotherapy class problems. DPM is primarily based on the public domain

code PENELOPE (Baro et al. [92], Sempau et al. [93]).

This study focuses on modeling a finite-sized pencil beam emanating from a 6 MV linear accelerator. A finite-sized pencil beam is a beam of finite size that is parallel to

the point source of radiation. To determine a reasonably accurate measure of the dose of

a single beamlet in a given tissue, approximately one billion histories are run in DPM.

As fewer histories are run, the inaccuracies of the dose resulting from the Monte Carlo

experiment grow. Figure 5-4 shows how the noise in the depth-dose curve of the beamlet

becomes increasingly pronounced in relation to the number of histories. As shown by

Table 5-1, the amount of time required to run each experiment is approximately linear in

the number of histories recorded. Thus, it is impractical to run the number of histories

necessary for acceptable accuracy.
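The trade-off between run time and noise follows from the general behavior of Monte Carlo estimates: the cost grows linearly with the number of histories, while the statistical error shrinks only with its square root. The toy sketch below illustrates this for a single voxel; it is not the DPM code, and the deposit probability and function names are hypothetical.

```python
import math
import random


def standard_error(p, n_histories):
    """Standard error of a Monte Carlo estimate of a probability p."""
    return math.sqrt(p * (1.0 - p) / n_histories)


def estimate_deposit_probability(p_true, n_histories, rng):
    """Toy simulation: each history deposits dose in the voxel with probability p_true."""
    hits = sum(1 for _ in range(n_histories) if rng.random() < p_true)
    return hits / n_histories


rng = random.Random(0)
p_true = 0.3  # hypothetical deposit probability for one voxel
for n in (100, 10_000, 100_000):
    estimate = estimate_deposit_probability(p_true, n, rng)
    print(n, round(estimate, 4), round(standard_error(p_true, n), 5))
```

Under this scaling, reducing the noise by a factor of 10 requires 100 times as many histories, which is why running enough histories for a clean dose distribution becomes impractical.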

5.3 Dose Distribution of a Beamlet

The accuracy of a treatment plan is contingent upon the accuracy of the calculated

dose deposited by each beamlet in the plan. Because the particles in a beamlet scatter in

three dimensional space, multiple dose distributions must be considered to satisfactorily

model the beamlet’s effect on the patient’s tissue. These distributions arise from the

amount of radiation the beamlet deposits as a function of depth (the depth-dose curve),

and from the amount of radiation radiating outward from the center of the beamlet (the

lateral penumbra).


5.3.1 Depth-Dose Curve

The depth-dose curve represents the radiation intensity deposited by the beamlet in the tissue through which it passes as a function of depth. Figure 5-1 shows the dose distribution of a single 6 MV beamlet in various tissues obtained from the DPM simulations. The dose distribution of a beamlet in water is empirically known, and the results from the DPM simulation in water can be easily verified to be correct. Muscle, which has nearly the same density as water (the densities of muscle and water are 1.04 g/cm³ and 1.00 g/cm³, respectively), has nearly the same depth-dose distribution as water. As expected, a beamlet passing through lung tissue, which is significantly less dense than water, does not lose its intensity as quickly as it travels through the less dense tissue. Lastly, a simulation with inhomogeneous tissue is considered. A simulation of muscle with a 10-cm thick layer of lung located at a depth of 10 cm shows that when the beamlet reaches the less dense segment of lung, its depth-dose curve becomes less steep, indicating that less radiation intensity is lost through the lung than through the muscle. Once the deeper layer of muscle is reached, the steepness of the depth-dose curve increases again.
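The qualitative change in slope across the muscle-lung-muscle slab can be illustrated with a deliberately simplified model: exponential attenuation of the primary beam along the density-scaled (radiological) depth. This is not how DPM computes dose, and it ignores build-up, scatter and electron transport. The attenuation coefficient and the lung density below are assumed values chosen only for illustration; the muscle density is the one quoted in the text.

```python
import math

MU_WATER = 0.05      # 1/cm, assumed effective attenuation coefficient
LUNG_DENSITY = 0.26  # relative to water, assumed value for illustration


def primary_intensity(depth_cm, layers, mu_water=MU_WATER):
    """Relative primary intensity at a depth in a stack of tissue layers.

    layers is a list of (thickness_cm, relative_density) pairs ordered from
    the surface inward. Depth beyond the last layer is ignored.
    """
    remaining = depth_cm
    radiological_depth = 0.0
    for thickness, rel_density in layers:
        step = min(remaining, thickness)
        radiological_depth += step * rel_density
        remaining -= step
        if remaining <= 0:
            break
    return math.exp(-mu_water * radiological_depth)


# 10 cm muscle, 10 cm lung, 10 cm muscle, as in the inhomogeneous simulation.
layers = [(10.0, 1.04), (10.0, LUNG_DENSITY), (10.0, 1.04)]
profile = [primary_intensity(d, layers) for d in (0.0, 10.0, 20.0, 30.0)]
```

In this toy model the intensity lost across the 10 cm of lung is much smaller than the loss across either 10 cm muscle layer, mirroring the flatter segment of the simulated curve.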

[Figure 5-1 plot: "Depth dose curve of beamlet in various tissues after 1B histories". Axes: depth (cm) versus relative dose (%). Curves: water, muscle, lung, muscle-lung-muscle.]

Figure 5-1. Dose distribution of a single beamlet in various tissues.


Although it may seem counterintuitive that the depth-dose curve increases at shallow depths, this behavior, called the build-up curve, is explained by the likelihood of electrons scattering out of the tissue and into air at shallow depths. Because the density of air is extremely small, an electron that reaches air is likely to travel very far away from the tissue, and is therefore unlikely to return to the tissue and deposit radiation dose. Once the depth passes a certain point, electrons can no longer leave the tissue, so the dose received in the tissue rises until that point is reached. Beyond it, the amount of radiation delivered by the beamlet decreases monotonically with depth, as would be expected.

5.3.2 Lateral Penumbra

In addition to the dose distribution occurring as the beamlet penetrates the tissue,

there is a dose distribution spreading away from the beamlet. Just as light emanating

from a flashlight in a dark room does not have a discrete boundary between light and

dark, the radiation delivered by a beamlet also does not have a discrete boundary between

what is and is not irradiated. With a circular flashlight beam shone onto a flat surface,

it is apparent from the distribution of the illuminated portion of the surface that some of

the light is diffused into the surrounding darkness as a result of scatter. If the distribution

of light in the circular projection of the flashlight beam is plotted, a bell-shaped curve

describes the brightest point in the center of the illuminated disc decreasing in brightness

as the edge of the illuminated disc is approached, eventually reaching complete darkness.

This behavior is parallel to the behavior of a beamlet passing through any medium.

From The Physics of Radiation Therapy [94], the penumbra of a beam is the region

at the edge of a radiation beam, over which the dose rate changes rapidly as a function of

distance from the beam axis. Hence, the distribution of radiation dose originating from the

beamlet described above is called the lateral penumbra. Figure 5-2 shows the colorwash of the dose distribution constituting the lateral penumbra, while Figure 5-3 shows the dose distribution of the lateral penumbra at a fixed depth in one dimension obtained from one billion Monte Carlo histories of a 5-cm finite sized pencil beam in water.

Figure 5-2. Colorwash of the lateral penumbra of a finite sized pencil beam. [Image: "Lateral penumbra of a finite sized pencil beam"; axis: distance from beam; color scale: 0 to 11 x 10^-4.]

5.4 Methodology to Model a Beamlet

Modeling the dose distribution of a beamlet is relatively straightforward for a beamlet

in a single medium. The difficulty arises when multiple mediums are traversed by the

beamlet because the varying densities affect the particle scattering of the beam, thus

affecting both the depth-dose curve and the lateral penumbra. As previously stated, errors

of as much as 30% have been reported for photon beams near tissue inhomogeneities

(Ma et al. [5]). Because there are numerous inhomogeneities in most cancer treatment

sites, these inhomogeneities are of particular interest. The beamlet's behavior at the boundary between different tissue types cannot be determined as easily as in a single medium, and thus requires Monte Carlo simulation. In designing an IMRT treatment plan for a patient, there can be more

than a dozen different structures (tissue types) with complicated boundary geometries.

Knowledge of a beamlet's behavior given certain tissue inhomogeneities can be very useful in accurately determining dose in a voxel.

Figure 5-3. Plot of the lateral penumbra of a finite sized pencil beam. [Plot: "Lateral penumbra of 5-cm finite size pencil beam"; x-axis: distance (cm); y-axis: dose (Gy), scale x 10^-3.]

5.4.1 Modeling the Depth-Dose Curve

In this section, we analyze the behavior of the depth-dose curve under both single-tissue and multiple-tissue scenarios. The goal of the analysis is to determine the minimum number of Monte Carlo histories required to obtain a reasonably accurate approximating function of the dose deposited at each depth in the tissue. For both a single medium and multiple mediums, this is done by fitting high-degree polynomial functions to the depth-dose curves from Monte Carlo experiments with varying numbers of histories. The polynomial fits are then compared to the polynomial fit of a very accurate measure of the depth-dose curve obtained from a number of Monte Carlo histories accepted to be satisfactorily accurate.

The number of histories recorded in the Monte Carlo simulation can have a drastic

effect on the accuracy of the data collected. For example, Figure 5-4 demonstrates the vast variation observed in the depth-dose curve of a beamlet in water for histories ranging from one million to one billion. It is hoped that after a certain number of histories, the function approximation of the data will closely follow the function approximation of very accurate data obtained from a large number of histories.

Figure 5-4. Observed depth-dose curve in water for several histories. [Plot: "Depth dose curve of beamlet in water"; x-axis: depth (cm); y-axis: relative dose (%); curves: 1B, 100M, 10M and 1M histories.]

For a beamlet in both homogeneous and heterogeneous tissue, the depth-dose curve

can be modeled using a polynomial function of order k. Although the depth-dose curve

may exhibit changes in concavity in the presence of tissue inhomogeneity, a high degree

polynomial will capture the curve’s behavior.

The variation of a k-degree polynomial fitted to n-history Monte Carlo data is measured by

\[ v_{k,n,n'} = \left\| d^{(n')} - p^{(k,n)} \right\|_2, \]

where $d^{(n')}$ is the actual observed depth-dose curve from $n'$ Monte Carlo histories and $p^{(k,n)}$ is the vector of approximated depth-dose values obtained from a polynomial fit of degree k to data obtained from n Monte Carlo histories. It is desirable to have $n' > n$ to assess the quality of the polynomial fit compared to more accurate data.

Figure 5-5. Polynomial fits of several histories compared to the observed 1B-history depth-dose curve in water. [Plot: "Polynomial fits compared to 1B-history data in water"; x-axis: depth (cm); y-axis: relative dose (%); legend: 1B histories: k=27, var=0.050278; 100M histories: k=23, var=0.080158; 10M histories: k=28, var=0.21877; 1M histories: k=24, var=0.54071.]

In this study, the accuracy of the polynomial obtained is judged by its variation from the observed data from a very large number of Monte Carlo histories, that is, $n' \gg n$ in the calculation of $v_{k,n,n'}$. Figure 5-5 shows that for the illustrated number of histories,

the polynomial fit from 100 million histories closely resembles not only the polynomial fit

from one billion histories, but also the actual data collected from one billion histories. The

polynomial fit to one million histories is clearly an unsatisfactory approximation to the

data collected from one billion histories.

For several numbers of Monte Carlo histories, the best approximating polynomial function with degree in the range $[\underline{k}, \overline{k}]$ is found, that is, $k^* = \arg\min_{k \in [\underline{k}, \overline{k}]} \{v_{k,n,n'}\}$. Several

degrees are tested because the degree of the polynomial can significantly affect the quality

of the fit, even for polynomials that are only one degree apart. Figure 5-6 illustrates the amount of variation observed in the polynomial approximation as a function of the degree of the polynomial for polynomials fitted to the depth-dose curve of a beamlet in water obtained from 1 billion histories.

Figure 5-6. Variation of polynomial fit as function of degree. [Plot: "Variation of polynomial fit (v_{k,n,1e9}) as function of degree (k)"; x-axis: degree of polynomial (k); y-axis: variation (v_{k,n,1e9}).]
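The degree search described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the depth-dose data are synthetic stand-ins rather than DPM output, the degree range 3 to 12 is chosen for speed rather than the range used in the study, and the names `variation` and `best_degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def variation(reference, approx):
    """2-norm distance between reference data d^(n') and fitted values p^(k,n)."""
    return float(np.linalg.norm(np.asarray(reference) - np.asarray(approx)))


def best_degree(depth, dose_n, dose_ref, k_min, k_max):
    """Fit polynomials of degree k_min..k_max to dose_n; return (k*, v_k*, fit)."""
    best = None
    for k in range(k_min, k_max + 1):
        # Polynomial.fit rescales depth to [-1, 1], which helps conditioning.
        fit = Polynomial.fit(depth, dose_n, deg=k)
        v = variation(dose_ref, fit(depth))
        if best is None or v < best[1]:
            best = (k, v, fit)
    return best


# Synthetic stand-in for simulation output (hypothetical curve, not thesis data).
rng = np.random.default_rng(0)
depth = np.linspace(0.25, 29.75, 60)                    # 5 mm voxels over 30 cm
dose_ref = (1 - np.exp(-2.0 * depth)) * np.exp(-0.05 * depth)
dose_n = dose_ref + rng.normal(0.0, 0.02, depth.size)   # noisier, fewer histories

k_star, v_star, fit = best_degree(depth, dose_n, dose_ref, 3, 12)
print(k_star, round(v_star, 4))
```

Because the variation is evaluated against the reference curve rather than the data being fitted, the selected degree need not be the largest one tested.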

5.4.2 Modeling the Lateral Penumbra

In this section, we analyze the behavior of the lateral penumbra under both single-tissue and multiple-tissue scenarios. The lateral penumbra of a beam is a bell-shaped curve that can be approximated as the sum of error function pairs. The error function, erf(x), is twice the integral from 0 to x of the Gaussian density with mean 0 and variance 1/2:

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt. \]
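As a quick numerical check of this definition, the following sketch compares the standard library's `math.erf` with twice the trapezoid-rule integral of a normal density with mean 0 and variance 1/2. The helper names and the step count are illustrative assumptions.

```python
import math


def gaussian_pdf(t, mean=0.0, var=0.5):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((t - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def twice_gaussian_integral(x, steps=20000):
    """2 * integral from 0 to x of the N(0, 1/2) density (composite trapezoid rule)."""
    h = x / steps
    total = 0.5 * (gaussian_pdf(0.0) + gaussian_pdf(x))
    for j in range(1, steps):
        total += gaussian_pdf(j * h)
    return 2.0 * h * total


for x in (0.5, 1.0, 2.0):
    print(x, math.erf(x), twice_gaussian_integral(x))
```

With variance 1/2 the density reduces to exp(-t^2)/sqrt(pi), so the two columns should agree to within the integration error.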

Figure 5-7A demonstrates a sample error function. While a single side of the lateral penumbra of a beamlet resembles an error function, a closer approximation to a single side of the lateral penumbra is represented as the average of two error functions given by

\[ \frac{a}{2} \left[ \operatorname{erf}\left( \frac{x + x_0}{\sigma} \right) - \operatorname{erf}\left( \frac{x - x_0}{\sigma} \right) \right], \]

where a is the amplitude, $x_0$ is the offset and $\sigma$ is the variation of the two error functions. The expression is divided by 2 to take the average of the error function pair. An example of an error function pair is given in Figure 5-7B.

Figure 5-7. An error function and an error function pair. A) Error function. B) Error function pair. [Panels: A "Sample error function: erf(x)"; B "Sample error function pair for lateral penumbra".]

Because the lateral penumbra of a beamlet resembles an error function on both the left- and right-hand sides of the beam center, the lateral penumbra L(x) is represented as the sum of the average of N error function pairs, given by

\[ L(x) = \sum_{i=1}^{N} \frac{a_i}{2} \left[ \operatorname{erf}\left( \frac{x + x_{0i}}{\sigma_i} \right) - \operatorname{erf}\left( \frac{x - x_{0i}}{\sigma_i} \right) \right], \]

where $a_i$ is the amplitude, $x_{0i}$ is the offset and $\sigma_i$ is the variation of error function pair i, i = 1, ..., N.

To determine the parameters $a_i$, $x_{0i}$ and $\sigma_i$ for each of the N error function pairs, a

Levenberg-Marquardt quasi-Newton minimization method is employed. This method takes

as input N and an initial guess of the parameters and returns a locally optimal solution

to the problem of minimizing the variation between the real data and the sum of the error

function pairs.
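A minimal sketch of such a fit is given below, assuming SciPy's `least_squares` with `method="lm"` as a stand-in for whatever Levenberg-Marquardt implementation was actually used in the study. The profile is synthetic, and the function names `erf_pair_sum` and `fit_erf_pairs` are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import erf


def erf_pair_sum(x, params):
    """L(x) = sum_i (a_i / 2) [erf((x + x0_i)/s_i) - erf((x - x0_i)/s_i)].

    params is a flat array [a_1, x0_1, s_1, ..., a_N, x0_N, s_N].
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for a, x0, s in np.reshape(params, (-1, 3)):
        total += 0.5 * a * (erf((x + x0) / s) - erf((x - x0) / s))
    return total


def fit_erf_pairs(x, dose, initial_params):
    """Levenberg-Marquardt least-squares fit; returns a locally optimal parameter vector."""
    result = least_squares(
        lambda p: erf_pair_sum(x, p) - dose,          # residual vector
        x0=np.asarray(initial_params, dtype=float),   # initial guess
        method="lm",
    )
    return result.x


# Synthetic profile (hypothetical): a single pair centred on the axis, plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-6.0, 6.0, 121)
true_params = np.array([1.0, 2.5, 0.5])
dose = erf_pair_sum(x, true_params) + rng.normal(0.0, 0.005, x.size)

fitted = fit_erf_pairs(x, dose, initial_params=[0.8, 2.0, 0.8])
residual_norm = float(np.linalg.norm(erf_pair_sum(x, fitted) - dose))
print(fitted, residual_norm)
```

As the text notes, the result is only locally optimal, so the quality of the initial guess matters; a poor starting point can leave the fit in a different local minimum.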

At a given depth in the tissue, the amplitude of the error function is determined by

the value of the depth-dose curve at that depth. Thus, for each tissue type, it is only necessary to model a single lateral penumbra, and then that model can be extended to all depths simply by manipulating the amplitude according to the depth-dose curve.

Figure 5-8. Lateral penumbra for several numbers of Monte Carlo histories. [Plot: "Lateral penumbra of 5-cm finite sized pencil beam in water"; x-axis: distance (cm); y-axis: relative dose (%).]

Figure 5-3 shows the lateral penumbra of a 5-cm finite sized beamlet at a fixed depth

in water for a number of Monte Carlo histories deemed to yield a satisfactorily accurate

representation of the dose deposited in the tissue. Using the method described above, the

lateral penumbra was modeled to yield the approximation to the observed data collected

for the various Monte Carlo histories shown in Figure 5-8.

In a similar fashion to the method for modeling the depth-dose curve, the method

for modeling the lateral penumbra consists of fitting the sum of error function pairs to

the lateral penumbra data. The quality of these fits is judged by their variation from the

observed data for a sufficiently large number of Monte Carlo histories to obtain accurate

dose information.

Figure 5-9. Error function fits of several histories compared to the observed 1B-history lateral penumbra of a beamlet in water. [Plot: "Error function pair fits compared to 1B-history data in water"; x-axis: distance (cm); y-axis: relative dose (%); legend: 1B histories: var=0.070667; 100M histories: var=0.075414; 10M histories: var=0.14511; 1M histories: var=1.1829.]

Just as in the method for determining the quality of the depth-dose curve approximation, the variation of the error function fit from the actual lateral penumbra is calculated as

\[ \nu_{n,n'} = \left\| L^{(n')} - L^{(n,N)} \right\|_2, \]

where $L^{(n')}$ is the observed lateral penumbra data from a simulation of $n'$ histories, and $L^{(n,N)}$ is the approximated lateral penumbra obtained from the parameters fitted to the expression for L(x) with N error function pairs. It is desirable to have $n' > n$.

Figure 5-9 displays the error function pair fits obtained from the Levenberg-Marquardt

method, as well as the variation of the fits from the observed data from one billion

histories. The variation is measured in the same manner as described in Section 5.4.1.

It is anticipated that although the lateral penumbra exhibits different dose distributions

in materials of different densities, the distribution will only show a fundamental change

in shape if the beam simultaneously hits multiple tissues of varying densities. In such a situation, the penumbra, which is taken to be symmetric about the center of the beam in homogeneous tissue, will no longer be symmetric. To model the lateral penumbra under inhomogeneous material, a sum of error function pairs can still be employed, though it may be necessary to increase the number of error function pairs. The difficulty will lie in correctly determining when an additional error function pair is needed. A possible measure could be the variation between the lateral penumbra approximation and the observed data.

Table 5-1. Computation times in minutes of Monte Carlo simulations

n        Water      Muscle     Lung       Muscle-Lung-Muscle
1e9      222.184    211.887    111.318    186.894
100e6    20.543     21.256     11.239     18.701
10e6     2.210      2.234      1.269      1.986
1e6      0.244      0.339      0.233      0.309

5.5 Results

The homogeneous tissues tested are water, muscle and lung, and the heterogeneous material tested consists of muscle and lung. Each scenario is considered to have a depth of 30 cm. The voxel sizes are 5 mm × 5 mm × 5 mm, and a 5-cm finite sized pencil beam is considered. For each simulation, tests were run with 1 billion, 100 million, 10 million and 1 million Monte Carlo histories in DPM on a Mac OS X 10.4.6 machine with dual 2.3 GHz PowerPC G5 processors and 8 GB of RAM. Due to time constraints, the muscle tests were run to a maximum of 100 million histories, and all comparisons of the fit quality are made to this 100-million-history data instead of the 1-billion-history data used for the other simulations.

As can be seen from the computation times in Table 5-1, the run time of DPM is

approximately linear in the number of histories. Although a larger number of Monte Carlo

histories yields improved accuracy, the maximum number of histories considered is one

billion because of time limitations and the satisfactory accuracy of the 1-billion-history

runs.
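The near-linear scaling can be checked directly from Table 5-1. The sketch below, which assumes only NumPy and the table values as transcribed, fits a straight line to log(time) against log(histories) for each material; a slope of 1 would correspond to exactly linear run time.

```python
import numpy as np

histories = np.array([1e6, 10e6, 100e6, 1e9])
# Run times in minutes, transcribed from Table 5-1.
minutes = {
    "water": [0.244, 2.210, 20.543, 222.184],
    "muscle": [0.339, 2.234, 21.256, 211.887],
    "lung": [0.233, 1.269, 11.239, 111.318],
    "muscle-lung-muscle": [0.309, 1.986, 18.701, 186.894],
}

slopes = {}
for tissue, t in minutes.items():
    # Slope of log10(time) against log10(histories); 1.0 means linear scaling.
    slope, _intercept = np.polyfit(np.log10(histories), np.log10(t), deg=1)
    slopes[tissue] = float(slope)
    print(f"{tissue}: exponent = {slope:.3f}")
```

Exponents slightly below 1 would be consistent with a small fixed start-up cost that matters most for the 1-million-history runs, although the table alone cannot confirm that explanation.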


For each of the tested tissue types, the depth-dose curves and lateral penumbras

were modeled using the methods described in Section 5.4. For the polynomial fits of the

depth-dose curve, the values $\underline{k}$ and $\overline{k}$ are chosen as 10 and 45, respectively. By searching over such a large range of degree values, an acceptably accurate fit is likely to be found.

For the lateral penumbra, N was chosen as 4 because in addition to the obvious need

for two error functions to model the sides of the lateral penumbra, an additional error

function is needed to model each tail with reasonable accuracy. For example, the four

error functions used to model the lateral penumbra of a beamlet in water (Figure 5-9) are

shown separately in Figure 5-10. The computation times required to obtain each of the

function approximations are displayed in Table 5-2.

The initial parameters $a_i$, $x_{0i}$ and $\sigma_i$ for each error function pair i, i = 1, ..., N, used to approximate the lateral penumbra are obtained by the following method. Of the four error function pairs considered, two of the error functions, $I = \{1, 2\}$, are used to model the steep sides of the lateral penumbra, and the other two error functions, $\bar{I} = \{3, 4\}$, are used to model the tails of the dose distribution. At a given depth z, the amplitude $a_i$ is

\[ a_i = \begin{cases} d(z) & i \in I \\ d(z)/50 & i \in \bar{I}, \end{cases} \]

where d(z) represents the value of the depth-dose curve approximation at a depth z. The expression for the amplitude when $i \in \bar{I}$ was obtained by experimenting with several different fractions of d(z).

The σ value of the error functions determines the shape of the error function curve. As σ increases, the curve becomes increasingly spread out. Thus, it is desirable to have a small $\sigma_i$ value for $i \in I$, since the error functions in $I$ only need to model the sides of the lateral penumbra, and a larger $\sigma_i$ value for $i \in \bar{I}$, since the error functions in $\bar{I}$ need to model the elongated tails of the lateral penumbra. For the tissues tested, the $\sigma_i$ values used are

\[ \sigma_i = \begin{cases} 0.4 & i \in I \\ 0.8 & i \in \bar{I}. \end{cases} \]

These values were obtained through experimentation.

Table 5-2. Computation times in seconds of approximating function fits to the dose distribution. The polynomial fits to the depth dose curve are represented by D.D., and the error function fits to the lateral penumbra are represented by Lat.Pen.

         Water              Muscle             Lung               Muscle-Lung-Muscle
n        D.D.    Lat.Pen.   D.D.    Lat.Pen.   D.D.    Lat.Pen.   D.D.    Lat.Pen.
1e9      0.078   2.640      0.078   2.422      0.094   1.062      0.078   n/a
100e6    0.078   1.172      0.078   2.625      0.828   0.906      0.109   n/a
10e6     0.110   3.454      0.109   1.390      2.609   2.594      0.093   n/a
1e6      0.094   1.407      0.094   1.172      1.063   0.953      0.078   n/a

For the 5-cm finite sized pencil beams used in this experiment, the offsets $x_{0i}$ were empirically set at values of 8.5, -3.5, 11 and -1 for i = 1, ..., N, respectively. A method of

identifying the locations of these offsets based on the Monte Carlo data can be developed

by basing the offsets on the slope of the observed data, and is planned for future research.
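To make the initial guess concrete, the sketch below assembles the amplitudes, offsets and σ values quoted above and evaluates the resulting sum of error function pairs at a few lateral positions. The function names, the zero-based indexing of the pairs and the choice d(z) = 1 are illustrative assumptions.

```python
import math

STEEP = (0, 1)   # pairs modelling the steep sides (set I in the text, zero-based here)
OFFSETS = (8.5, -3.5, 11.0, -1.0)   # x0_i in cm, as quoted in the text


def initial_parameters(d_z):
    """Return [(a_i, x0_i, sigma_i)] for the four pairs at depth-dose value d_z."""
    params = []
    for i, x0 in enumerate(OFFSETS):
        a = d_z if i in STEEP else d_z / 50.0     # tails use d(z)/50
        sigma = 0.4 if i in STEEP else 0.8        # tails are more spread out
        params.append((a, x0, sigma))
    return params


def penumbra(x, params):
    """Sum of error function pairs L(x) for a scalar position x (cm)."""
    return sum(
        0.5 * a * (math.erf((x + x0) / s) - math.erf((x - x0) / s))
        for a, x0, s in params
    )


guess = initial_parameters(d_z=1.0)
for x in (0.0, 2.0, 6.0, 10.0, 12.0):
    print(x, round(penumbra(x, guess), 4))
```

With these values the two steep pairs largely cancel outside roughly 3.5 to 8.5 cm, leaving a plateau about 5 cm wide, while the two tail pairs add a low shelf between roughly 1 and 11 cm.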

The results for the fits of both the depth-dose curve and the lateral penumbra of a

beamlet in water are shown in the examples in Section 5.4. Figures 5-11 through 5-14 show the results of the fits for the muscle and lung tissues. From the computational results, it is

clear that the time to obtain fits to the Monte Carlo data is insignificant compared with

the amount of time required to run the Monte Carlo histories, even for as few as 1 million

histories.

To test the model in the presence of tissue inhomogeneity, a 10-cm-thick layer of lung between two 10-cm-thick layers of muscle is considered. As expected, for the first 10 cm, the depth-dose curve of the muscle-lung-muscle case is identical to the muscle depth-dose curve. Once the beamlet reaches the significantly less dense layer of lung (lung has a density of 0.30 g/cm3), a pronounced change in the depth-dose curve is evident (Figure 5-1): the rate of decrease in the amount of dose deposited in the tissue decreases, that is, less radiation intensity is lost as the beamlet passes through the lung tissue. When the beamlet reaches the second layer of muscle, this rate increases again. The same approach used to model the depth-dose curve in a single tissue continues to work well in multiple tissues. Figures 5-15A and 5-15B illustrate the ability of a polynomial to approximate the depth-dose curve in inhomogeneous tissue.

Figure 5-10. Error function pairs summed to approximate a beamlet in water. [Plot: "Error functions for 5-cm finite sized pencil beam in water"; x-axis: distance (cm); y-axis: relative dose (%); curves: erf pair 1, erf pair 2, erf pair 3, erf pair 4.]

Figure 5-11. Depth-dose curves in muscle tissue. A) Monte Carlo histories. B) Polynomial fits. [Panels: A "Depth dose curve of beamlet in muscle"; B "Polynomial fits compared to 1B-history data in muscle"; x-axis: depth (cm); y-axis: relative dose (%).]

Figure 5-12. Lateral penumbra curves in muscle tissue. A) Monte Carlo histories. B) Error function fits. [Panels: A "Lateral penumbra of 5-cm finite sized pencil beam in muscle"; B "Error functions for 5-cm finite sized pencil beam in muscle"; x-axis: distance (cm); y-axis: relative dose (%).]

Figure 5-13. Depth-dose curves in lung tissue. A) Monte Carlo histories. B) Polynomial fits. [Panels: A "Depth dose curve of beamlet in lung"; B "Polynomial fits compared to 1B-history data in lung"; x-axis: depth (cm); y-axis: relative dose (%).]

Figure 5-14. Lateral penumbra curves in lung tissue. A) Monte Carlo histories. B) Error function fits. [Panels: A "Lateral penumbra of 5-cm finite sized pencil beam in lung"; B "Error function pair fits compared to 1B-history data in lung"; x-axis: distance (cm); y-axis: relative dose (%).]

Because testing the beamlet in a scenario where it could hit multiple tissues

simultaneously is reserved for future research, results for modeling the lateral penumbra in

the multiple-tissue scenario tested are identical to those for the single-tissue scenario. The

lateral penumbra at a given depth in a certain tissue can be modeled by using the dose

from the depth-dose curve at the given depth as the amplitude of the lateral penumbra.

The dose distribution in the lateral penumbra can then be modeled according to the same

error function pairs used in modeling the lateral penumbra in a single-tissue scenario of

the same medium.
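The combination of the two models can be sketched as follows. Everything numeric here is a placeholder: the per-tissue pair parameters, the exponential depth-dose function and the names `tissue_at`, `unit_penumbra` and `dose` are assumptions for illustration, not fitted values from this study. Only the slab geometry (10 cm of lung starting at a depth of 10 cm, between layers of muscle) follows the text.

```python
import math

# Hypothetical unit-amplitude pair parameters (a_i, x0_i, sigma_i) per tissue;
# real values would come from fits to single-tissue Monte Carlo data.
UNIT_PAIRS = {
    "muscle": [(1.0, 2.5, 0.4), (0.02, 5.0, 0.8)],
    "lung": [(1.0, 2.5, 0.7), (0.02, 5.0, 1.2)],
}


def tissue_at(depth_cm):
    """Slab geometry: muscle, then lung from 10 cm to 20 cm depth, then muscle."""
    return "lung" if 10.0 <= depth_cm < 20.0 else "muscle"


def unit_penumbra(x, pairs):
    """Sum of error function pairs at lateral position x (cm from the beam axis)."""
    return sum(
        0.5 * a * (math.erf((x + x0) / s) - math.erf((x - x0) / s))
        for a, x0, s in pairs
    )


def dose(depth_cm, x, depth_dose):
    """Scale the tissue's unit penumbra by the depth-dose value d(z)."""
    return depth_dose(depth_cm) * unit_penumbra(x, UNIT_PAIRS[tissue_at(depth_cm)])


def toy_depth_dose(z):
    """Placeholder for the fitted polynomial d(z): a simple exponential fall-off."""
    return math.exp(-0.05 * z)


for z in (5.0, 15.0, 25.0):
    print(z, tissue_at(z), round(dose(z, 0.0, toy_depth_dose), 4))
```

In practice `toy_depth_dose` would be replaced by the polynomial fitted to the multiple-tissue depth-dose curve, so that the tissue boundaries affect the amplitude while the lateral shape is looked up per medium.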

Figure 5-16 illustrates the variations of the fits used to approximate the depth-dose and lateral penumbra distributions of a beamlet in water as a function of the number of histories. From this data, it is very clear that the accuracy of the beamlet model is directly correlated with the number of Monte Carlo histories. It is interesting that there is not a significant improvement in the beamlet model accuracy from 100 million to 1 billion histories, and computing 100 million histories requires approximately one tenth of the amount of time as computing 1 billion histories. Depending on the composition of the tissue, it may be reasonably accurate to only require 10 million histories, particularly in the depth-dose curve approximation.

Figure 5-15. Depth-dose curves in heterogeneous muscle and lung tissue. A) Monte Carlo histories. B) Polynomial fits. [Panels: A "Depth dose curve of beamlet in muscle-lung-muscle"; B "Polynomial fits compared to 1B-history data in muscle-lung-muscle"; x-axis: depth (cm); y-axis: relative dose (%).]

Table 5-3. Variation of fits to several numbers of histories with n′ = 1 billion. The column v is the depth-dose variation $v_{k^*,n,n'}$ and the column ν is the lateral penumbra variation $\nu_{n,n'}$.

         Water            Muscle           Lung             Muscle-Lung-Muscle
n        v       ν        v       ν        v       ν        v       ν
1e9      0.050   0.071    0.046   0.105    0.057   0.097    0.052   n/a
100e6    0.080   0.075    0.075   0.118    0.110   0.103    0.101   n/a
10e6     0.219   0.145    0.203   0.124    0.330   0.125    0.213   n/a
1e6      0.541   1.183    0.698   0.209    1.055   0.129    0.881   n/a
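The trade-off between 100 million and 1 billion histories can be tabulated from the reported numbers. The short sketch below uses only values transcribed from Table 5-1 (run time in minutes) and Table 5-3 (depth-dose variation); the dictionary layout and key names are illustrative.

```python
# (100M-history value, 1B-history value) per material.
minutes = {"water": (20.543, 222.184), "muscle": (21.256, 211.887),
           "lung": (11.239, 111.318), "muscle-lung-muscle": (18.701, 186.894)}
variation = {"water": (0.080, 0.050), "muscle": (0.075, 0.046),
             "lung": (0.110, 0.057), "muscle-lung-muscle": (0.101, 0.052)}

summary = {}
for tissue in minutes:
    t_100m, t_1b = minutes[tissue]
    v_100m, v_1b = variation[tissue]
    summary[tissue] = {
        "time_saved_min": t_1b - t_100m,        # minutes saved by stopping at 100M
        "time_ratio": t_1b / t_100m,            # roughly 10 if run time is linear
        "variation_increase": v_100m - v_1b,    # accuracy given up in the fit
    }
    print(tissue, summary[tissue])
```

Whether the resulting increase in variation is acceptable depends on the accuracy required by the downstream FMO model, which this sketch does not attempt to judge.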


In conclusion, the Monte Carlo approach presented is employed to model the dose

distribution of a beamlet using a limited number of histories. Using the polynomial

and error function pair fitting techniques described, dose distributions with satisfactory

accuracy can be obtained using at least a factor of 10 fewer Monte Carlo histories than

would otherwise be required. This can greatly decrease the amount of time required to

obtain dose data for beamlets in the FMO problem of IMRT treatment planning without

sacrificing accuracy.

Figure 5-16. Variations of the fits used to approximate the depth-dose and lateral penumbra distributions as function of the number of histories. [Plot: "Variation of approximations as a function of number of histories"; x-axis: number of histories (10^6 to 10^9); y-axis: variation; curves: depth-dose and lateral penumbra for water, muscle and lung, and depth-dose for muscle-lung-muscle.]

For future testing, more tests on the number of Monte Carlo histories needed will be run, particularly with histories in the range of 10-100 million. More tests of

varying tissues, both homogeneous and heterogeneous, will be run to determine a smaller

range of degrees to be evaluated for the polynomial fit to the depth-dose curve. An

automated method of determining a quality set of initial parameters to model the lateral

penumbra will also be developed. Lastly, the scenario where a beamlet hits multiple

tissues simultaneously will be tested using our model for approximating the lateral

penumbra.

Jelen and Alber [89] and Jelen et al. [90] have demonstrated that a beamlet can

be modeled very effectively using an approach based on the one described here. This

approach was improved upon by scaling the modeling parameters according to tissue

density in Jelen and Alber [89]. Despite the sophistication of the density scaling method

employed, the model loses accuracy in the penumbra regions and at the edge of tissue

heterogeneities. This study also used a Levenberg-Marquardt algorithm to determine the


modeling parameters, and although the details of the implementation are not provided, it

is possible that with an improved initial guess or damping parameter, the algorithm could

converge to better modeling parameters, thus providing improved prediction of beamlet

behavior at the penumbra.

To further improve upon their work, the effects of the MLC must be considered. One

method of accounting for these effects could be to model the dose deposition of an entire

aperture rather than just the dose deposition of a single beamlet. As the number and

shape of apertures required to deliver an FMO-based IMRT optimization are unknown,

this method would be most practical if an aperture modulation approach—where aperture

fluences from a pre-defined set of apertures are chosen, instead of fluences from individual

beamlets—is employed instead of an FMO approach, as the number and shape of the

apertures in consideration are predetermined.


REFERENCES

[1] American Cancer Society. Cancer Facts and Figures Report. 2006.

[2] Murphy GP, Lawrence WL, Lenlard RE, eds. American Cancer Society Textbook on Clinical Oncology. The American Cancer Society, 1995.

[3] Perez CA, Brady LW. Principles and Practice of Radiotherapy. Lippincott-Raven, 3 edn., 1998.

[4] Steel GG. Basic Clinical Radiobiology for Radiation Oncologists. Edward Arnold Publishers, 1994.

[5] Ma CM, Mok E, Kapur A, Findley D, Brain S, Boyer AL. Clinical implementation of a Monte Carlo treatment planning system. Medical Physics 1999;26:2133–43.

[6] Bortfeld T. Optimized planning using physical objectives and constraints. Semin Radiat Oncol 1999;9:20–34.

[7] Alber M, Nusslin F. An objective function for radiation treatment optimization based on local biological measures. Phys Med Biol 1999;44:479–493.

[8] Jones LC, Hoban PW. Treatment plan comparison using equivalent uniform biologically effective dose (eubed). Phys Med Biol 2000;45:159–170.

[9] Kallman P, Lind BK, Brahme A. An algorithm for maximizing the probability of complication-free tumor-control in radiation-therapy. Phys Med Biol 1992;37:871–890.

[10] Mavroidis P, Lind BK, Brahme A. Biologically effective uniform dose for specification, report and comparison of dose response relations and treatment plans. Phys Med Biol 2001;46:2607–2630.

[11] Niemierko A. Reporting and analyzing dose distributions: a concept of equivalent uniform dose. Medical Physics 1997;24:103–110.

[12] Niemierko A, Urie M, Goitein M. Optimization of 3d radiation-therapy with both physical and biological end-points and constraints. Int J Radiat Oncol Biol Phys 1992;23:99–108.

[13] Wu QW, Djajaputra D, Wu Y, Zhou JN, Liu HH, Mohan R. Intensity-modulated radiotherapy optimization with geud-guided dose-volume objectives. Phys Med Biol 2003;48:279–291.

[14] Wu QW, Mohan R, Niemierko A, Schmidt-Ullrich R. Optimization of intensity-modulated radiotherapy plans based on the equivalent uniform dose. Int J Radiat Oncol Biol Phys 2002;52:224–235.

[15] Hamacher HW, Kufer KH. Inverse radiation therapy planning a multiple objective optimization approach. Discrete Applied Mathematics 2002;118:145–161.

[16] Bednarz G, Michalski D, Houser C, Huq MS, Xiao Y, Anne PR, Galvin JM. The use of mixed-integer programming for inverse treatment planning with pre-defined field segments. Phys Med Biol 2002;47:2235–2245.

[17] Ferris MC, Meyer RR, D’Souza W. Radiation treatment planning: Mixed integer programming formulations and approaches. In G Appa, L Pitsoulis, HP Williams, eds., Handbook on Modelling for Discrete Optimization. Springer-Verlag, New York, NY, 2006;317–340.

[18] Langer M, Brown R, Urie M, Leong J, Stracher M, Shapiro J. Large-scale optimization of beam weights under dose-volume restrictions. Int J Radiat Oncol Biol Phys 1990;18:887–893.

[19] Langer M, Morrill S, Brown R, Lee O, Lane R. A comparison of mixed integer programming and fast simulated annealing for optimizing beam weights in radiation therapy. Medical Physics 1996;23:957–964.

[20] Lee EK, Fox T, Crocker I. Simultaneous beam geometry and intensity map optimization in intensity-modulated radiation therapy treatment planning. Annals of Operations Research 2003;119:165–181.

[21] Lee EK, Fox T, Crocker I. Integer programming applied to intensity-modulated radiation therapy treatment planning. Int J Radiat Oncol Biol Phys 2006;64:301–320.

[22] Shepard DM, Ferris MC, Olivera GH, Mackie TR. Optimizing the delivery of radiation therapy to cancer patients. SIAM Review 1999;41:721–744.

[23] Romeijn HE, Ahuja RK, Dempsey JF, Kumar A, Li JG. A novel linear programming approach to fluence map optimization for intensity modulated radiation therapy treatment planning. Phys Med Biol 2003;38:3521–3542.

[24] Romeijn HE, Ahuja RK, Dempsey JF, Kumar A, Li JG. A column generation approach to radiation therapy treatment planning using aperture modulation. SIAM Journal of Optimization 2005;15:838–862.

[25] Romeijn HE, Dempsey JF, Li JG. A unifying framework for multi-criteria fluence map optimization models. Phys Med Biol 2004;49:1991–2013.

[26] Romeijn HE, Ahuja RK, Dempsey JF, Kumar A. A new linear programming approach to radiation therapy treatment planning problems. Operations Research 2006;54:201–216.

[27] Das SK, Marks LB. Selection of coplanar or noncoplanar beams using three-dimensional optimization based on maximum beam separation and minimized nontarget irradiation. Int J Radiat Oncol Biol Phys 1997;38:643–655.

[28] Haas OC, Burnham KJ, Mills J. Optimization of beam orientation in radiotherapyusing planar geometry. Phys Med Biol 1998;43:2179–2193.

[29] Schreibmann E, Lahanas M, Xing L, Baltas D. Multiobjective evolutionaryoptimization of the number of beams, their orientations and weights forintensity-modulated radiation therapy. Phys Med Biol 2004;49:747–770.

[30] Chao KSC, Blanco AI, Dempsey JF. A conceptual model integrating spatialinformation to assess target volume coverage for IMRT treatment planning. Int JRadiat Oncol Biol Phys 2003;56:1438–1449.

[31] Nocedal J, Wright SJ. Numerical Optimization. Springer-Verlag, 1999.

[32] Ezzell GA. Genetic and geometric optimization of three-dimensional radiation therapy treatment planning. Medical Physics 1996;23:293–305.

[33] Li Y, Yao J, Yao D. Automatic beam angle selection in IMRT planning using genetic algorithm. Phys Med Biol 2004;49:1915–1932.

[34] Li Y, Yao J, Yao D, Chen W. A particle swarm optimization algorithm for beam angle selection in intensity-modulated radiotherapy planning. Phys Med Biol 2005;50:3491–3514.

[35] Bortfeld T, Schlegel W. Optimization of beam orientations in radiation therapy: some theoretical considerations. Phys Med Biol 1993;38:291–304.

[36] Djajaputra D, Wu Q, Wu Y, Mohan R. Algorithm and performance of a clinical IMRT beam-angle optimization system. Phys Med Biol 2003;48:3191–3212.

[37] Lu HM, Kooy HM, Leber ZH, Ledoux RJ. Optimized beam planning for linear accelerator-based stereotactic radiosurgery. Int J Radiat Oncol Biol Phys 1997;39:1183–1189.

[38] Pugachev A, Xing L. Incorporating prior knowledge into beam orientation optimization in IMRT. Int J Radiat Oncol Biol Phys 2002;54:1565–1574.

[39] Rowbottom CG, Oldham M, Webb S. Constrained customization of non-coplanar beam orientations in radiotherapy of brain tumours. Phys Med Biol 1999a;44:383–399.

[40] Stein J, Mohan R, Wang XH, Bortfeld T, Wu Q, Preiser K, Ling CC, Schlegel W. Number and orientations of beams in intensity-modulated radiation treatments. Medical Physics 1997;24:149–160.

[41] Soderstrom S, Brahme A. Selection of suitable beam orientations in radiation therapy using entropy and Fourier transform measures. Phys Med Biol 1992;37:911–924.

[42] Soderstrom S, Brahme A. Which is the most suitable number of photon beam portals in coplanar radiation therapy? Int J Radiat Oncol Biol Phys 1995;33:151–59.

[43] Rowbottom CG, Webb S, Oldham M. Beam-orientation customization using an artificial neural network. Phys Med Biol 1999b;44:2251–2262.

[44] Gokhale P, Hussein EM, Kulkarni N. Determination of beam orientation in radiotherapy planning. Medical Physics 1994;21:393–400.

[45] Meedt G, Alber M, Nusslin F. Non-coplanar beam direction optimization for intensity-modulated radiotherapy. Phys Med Biol 2003;48:2999–3019.

[46] Chen GT, Spelbring DR, Pelizzari CA, Balter JM, Myrianthopoulos LC, Vijayakumar S, Halpern H. The use of beam’s eye view volumetrics in the selection of non-coplanar radiation portals. Int J Radiat Oncol Biol Phys 1992;23:153–163.

[47] Cho BCJ, Roa HW, Robinson D, Murray B. The development of target-eye-view maps for selection of coplanar or noncoplanar beams in conformal radiotherapy treatment planning. Medical Physics 1999;26:2367–2372.

[48] Goitein M, Abrams M, Rowell D, Pollari H, Wiles J. Multi-dimensional treatment planning: II. Beam’s eye-view, back projection, and projection through CT sections. Int J Radiat Oncol Biol Phys 1983;9:789–97.

[49] Pugachev A, Xing L. Computer-assisted selection of coplanar beam orientations in intensity-modulated radiation therapy. Phys Med Biol 2001;46:2467–2476.

[50] Pugachev A, Xing L. Pseudo beam’s-eye-view as applied to beam orientation selection in intensity-modulated radiation therapy. Int J Radiat Oncol Biol Phys 2001;51:1361–1370.

[51] Holder A, Salter B. A tutorial on radiation oncology and optimization. In H Greenberg, ed., Tutorials on Emerging Methodologies and Applications in Operations Research. Kluwer Academic Press, Boston, MA, 2004.

[52] Morrill SM, Lane RG, Jacobson G, Rosen II. Treatment planning optimization using constrained simulated annealing. Phys Med Biol 1991;36:1341–61.

[53] Oldham M, Khoo V, Rowbottom CG, Bedford J, Webb S. A case study comparing the relative benefit of optimising beam-weights, wedge-angles, beam orientations and tomotherapy in stereotactic radiotherapy of the brain. Phys Med Biol 1998;43:2123–46.

[54] Rowbottom CG, Webb S, Oldham M. Improvements in prostate radiotherapy from the customization of beam directions. Medical Physics 1998;25:1171–1179.

[55] Wang X, Zhang X, Dong L, Liu H, Wu Q, Mohan R. Development of methods for beam angle optimization for IMRT using an accelerated exhaustive search strategy. Int J Radiat Oncol Biol Phys 2004;60:1325–37.

[56] Wang X, Zhang X, Dong L, Liu H, Gillin M, Ahamad A, Ang K, Mohan R. Effectiveness of noncoplanar IMRT planning using a parallelized multiresolution beam angle optimization method for paranasal sinus carcinoma. Int J Radiat Oncol Biol Phys 2005;63:594–601.

[57] Woudstra E, Heijmen BJM. Automated beam angle and weight selection in radiotherapy treatment planning applied to pancreas tumors. Int J Radiat Oncol Biol Phys 2003;56:878–88.

[58] D’Souza WD, Meyer RR, Shi L. Selection of beam orientations in intensity-modulated radiation therapy using single-beam indices and integer programming. Phys Med Biol 2004;49:3465–3481.

[59] Ehrgott M, Johnston R. Optimisation of beam directions in intensity modulated radiation therapy planning. OR Spectrum 2003;25:251–264.

[60] Lim J, Ferris M, Shepard D, Wright S, Earl M. An optimization framework for conformal radiation treatment planning. INFORMS Journal On Computing 2006.

[61] Wang C, Dai J, Hu Y. Optimization of beam orientations and beam weights for conformal radiotherapy using mixed integer programming. Phys Med Biol 2003;48:4065–4076.

[62] Fox C, Romeijn HE, Dempsey JF. Fast voxel and polygon ray-tracing algorithms for IMRT treatment planning, 2005. Submitted to Medical Physics.

[63] Siddon RL. Prism representation: a 3D ray-tracing algorithm for radiotherapy applications. Phys Med Biol 1985;30:817–824.

[64] Siddon RL. Fast calculation of the exact radiological path for a three-dimensional CT array. Medical Physics 1985;12:252–255.

[65] Jacobs F, Sundermann E, Sutter BD, Christiaens M, Lemahieu I. A fast algorithm to calculate the exact radiological path through a pixel or voxel space. Journal of Computing and Information Technology (CIT) 1998;6:89–94.

[66] Aleman DM, Romeijn HE, Dempsey JF. Beam orientation optimization methods in intensity modulated radiation therapy treatment planning. IIE Conference Proceedings 2006.

[67] Aleman DM, Romeijn HE, Dempsey JF. A response surface approach to beam orientation optimization in intensity modulated radiation therapy treatment planning. In review 2006.

[68] Jones DR. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 2001;21:345–383.

[69] Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 1998;13:455–492.

[70] Csallner AE, Csendes T, Markot MC. Multisection in interval branch-and-bound methods for global optimization I. Theoretical results. Journal of Global Optimization 2000;16:371–392.

[71] Lagouanelle J, Soubry G. Optimal multisections in interval branch-and-bound methods of global optimization. Journal of Global Optimization 2004;30:23–38.

[72] Epperly TGW, Pistikopoulos EN. A reduced space branch and bound algorithm for global optimization. Journal of Global Optimization 1997;11:287–311.

[73] Barrientos O, Correa R. An algorithm for global minimization of linearly constrained quadratic functions. Journal of Global Optimization 2000;16:77–93.

[74] Thoai NV. Convergence of duality bound method in partly convex programming. Journal of Global Optimization 2002;22:263–270.

[75] Tuy H. On solving nonconvex optimization problems by reducing the duality gap. Journal of Global Optimization 2005;32:349–365.

[76] Phong TQ, An LTH, Tao PD. Decomposition branch and bound method for globally solving linearly constrained indefinite quadratic minimization problems. Operations Research Letters 1995;17:215–220.

[77] Bomze I. Branch-and-bound approaches to standard quadratic optimization problems. Journal of Global Optimization 2002;22:17–37.

[78] Cambini R, Sodini C. Decomposition methods for solving nonconvex quadratic programs via branch and bound. Journal of Global Optimization 2005;33:313–336.

[79] Aleman DM, Kumar A, Ahuja RK, Romeijn HE, Dempsey JF. Neighborhood search approaches to beam orientation optimization in intensity modulated radiation therapy treatment planning. In review 2007.

[80] Kumar A. Novel methods for intensity-modulated radiation therapy treatment planning. Ph.D. thesis, University of Florida, 2005.

[81] Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 1984;6:721–741.

[82] Gelfand AE, Smith AFM. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association 1990;85:398–409.

[83] Smith RL. A Monte Carlo procedure for the random generation of feasible solutions to mathematical programming problems. Bulletin of the TIMS/ORSA Joint National Meeting 1980;:101.

[84] Belisle CJP, Romeijn HE, Smith RL. Hit-and-run algorithms for generating multivariate distributions. Mathematics of Operations Research 1993;18:255–266.

[85] Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science 1983;220:671–680.

[86] Szu H, Hartley R. Fast simulated annealing. Physics Letters 1987;122A:157–162.

[87] Belisle CJP. Convergence theorems for a class of simulated annealing algorithms on R^d. Journal of Applied Probability 1992;29:885–895.

[88] Aleman DM, Glaser D, Romeijn HE, Dempsey JF. A primal-dual interior point algorithm for fluence map optimization in intensity modulated radiation therapy treatment planning. Work in progress 2007.

[89] Jelen U, Alber M. A finite size pencil beam algorithm for IMRT dose optimization: density corrections. Physics in Medicine and Biology 2007;52:617–633.

[90] Jelen U, Sohn M, Alber M. A finite size pencil beam for IMRT dose optimization. Physics in Medicine and Biology 2005;50:1747–1766.

[91] Sempau J, Wilderman SJ, Bielajew AF. DPM, a fast, accurate Monte Carlo code optimized for photon and electron radiotherapy treatment planning dose calculations. Phys Med Biol 2000;45:2263–91.

[92] Baro J, Sempau J, Fernandez-Varea JM, Salvat F. PENELOPE: An algorithm for Monte Carlo simulation of the penetration and energy loss of electrons and positrons in matter. Nuclear Instruments and Methods 1995;B100:31–46.

[93] Sempau J, Baro J, Fernandez-Varea JM, Salvat F. An algorithm for Monte Carlo simulation of coupled electron-photon showers. Nuclear Instruments and Methods 1997;B132:377–90.

[94] Khan FM. The Physics of Radiation Therapy. Lippincott Williams and Wilkins, 1994.

BIOGRAPHICAL SKETCH

Dionne M. Aleman completed her bachelor’s degree in industrial and systems

engineering at the University of Florida. She went on to study intensity modulated

radiation therapy (IMRT) treatment planning optimization in the graduate program of the

Department of Industrial and Systems Engineering at the University of Florida. She will

receive her Doctor of Philosophy in Industrial and Systems Engineering in December of

2007, after which she will pursue a career in the Department of Mechanical and Industrial

Engineering at the University of Toronto. Dionne plans to continue her research in cancer

treatments, as well as other applications of operations research techniques to the medical

and healthcare industries.
