GHENT UNIVERSITY
FACULTY OF ECONOMICS AND BUSINESS
ADMINISTRATION
ACADEMIC YEAR 2013 – 2014
A BASIC EVOLUTIONARY
ALGORITHM FOR THE PROJECT
STAFFING PROBLEM
Master thesis presented in order to acquire the degree of
Master of Science in Applied Economics: Business Engineering
Piet Peene
Under the supervision of
Prof. Dr. Broos Maenhout
GHENT UNIVERSITY
FACULTY OF ECONOMICS AND BUSINESS
ADMINISTRATION
ACADEMIC YEAR 2013 – 2014
A BASIC GENETIC ALGORITHM FOR
THE PROJECT STAFFING PROBLEM
Master thesis presented in order to acquire the degree of
Master of Science in Applied Economics: Business Engineering
Piet Peene
Under the supervision of
Prof. Dr. Broos Maenhout
Permission
The undersigned declares that the content of this master thesis may be consulted and
reproduced, provided it is referenced.
Piet Peene
I. Preface
This master thesis denotes the end of my studies in Business Engineering,
Operations Management at the University of Ghent. It is the conclusion of a
fascinating road through the fields of knowledge in operations management.
However, exploring these paths sometimes presented unforeseen challenges. Being
able to overcome these challenges will only strengthen motivation and courage
towards future trials. May that be an important lesson I have learned in the process.
In my opinion, writing a thesis is a long-haul task in a subject of personal interest. My
spark of interest in mathematical modelling was ignited when taking a 3rd Bachelor
class in Operations Research taught by Prof. Dr. Broos Maenhout. Further classes in
the master’s degree that broadened my interest in the subject of planning and
scheduling included Project Management and Applied Operations Research, taught
by Prof. Dr. Mario Vanhoucke. A thesis on the project scheduling and staffing
problem is therefore a perfect match with my interests.
This thesis required a lot of effort, but it would not be accomplished without the help
and support of others. Special thanks go to my promoter Prof. Dr. Broos Maenhout,
for guiding me through the process, offering advice and working material, not to
mention his flexibility in scheduling of consultation meetings, even outside regular
working hours.
Furthermore, I owe my parents, Annie Lips and Yves Peene, and my sister Tine and
her boyfriend Geert Depuydt many thanks for the love and support they provided, not
only during the making of this thesis but throughout the completion of my higher
education. I also want to thank my friends, especially Annelies Deleersnyder, Arno
Wallays and Michelle Vu, whom I got to know during my time at the university. They
were always available for mental support and distractions.
II. Table of Contents

1. Introduction .......................................................... 1
2. Problem description and model formulation ............................ 4
2.1. Project Scheduling Problem Area .................................... 4
2.2. Problem description ................................................ 6
2.2.1. Project scheduling problem description ........................... 6
2.2.2. Project staffing problem description ............................. 8
2.3. Mathematical model formulation .................................... 10
3. Methodology ......................................................... 14
4. Literature Overview ................................................. 15
4.1. Genetic algorithms ................................................ 15
4.1.1. What are genetic algorithms ..................................... 15
4.1.2. Genetic algorithm framework ..................................... 16
4.2. Data representation ............................................... 19
4.2.1. Project Schedule representation ................................. 19
4.2.2. Project Staffing representation ................................. 24
4.3. Solution methods .................................................. 26
4.3.1. Initialization .................................................. 26
4.3.2. Selection ....................................................... 29
4.3.3. Operation ....................................................... 32
4.3.4. Local Optimization .............................................. 37
4.3.5. (Partial) Evaluation ............................................ 41
4.3.6. Reinsert ........................................................ 44
4.3.7. Ending condition ................................................ 47
5. The algorithm ....................................................... 49
6. Computational experiments ........................................... 50
6.1. Observation link AVGSQDEV – Total cost ............................ 50
6.2. Benchmark ......................................................... 51
6.3. Results ........................................................... 54
6.3.1. Basic cycles .................................................... 55
6.3.2. Stage contributions ............................................. 58
6.3.3. Sensitivity Analysis ............................................ 59
7. Conclusions and further research .................................... 69
III. List of figures

Figure 1 Extended Project Management triangle ........................... 1
Figure 2 Flowchart Genetic Algorithm, framework 1 ...................... 16
Figure 3 Flowchart Genetic Algorithm, framework 2 ...................... 17
Figure 4 Activity-on-the-node project schedule activity network (AN2) .. 20
Figure 5 Example project schedule, PS2 (based on AN2) .................. 23
Figure 7 Labor supply and demand for AN2, PS2 .......................... 25
Figure 6 Flowchart Genetic Algorithm, framework 2 ...................... 26
Figure 8 Pseudo code Initialization methods ............................ 28
Figure 9 Example Roulette Wheel ........................................ 31
Figure 10 Pseudo code Roulette Wheel Selection ......................... 31
Figure 11 Pseudo code Tournament Selection ............................. 32
Figure 12 Pseudo code Blend CrossOver .................................. 35
Figure 13 Local search simplified ...................................... 38
Figure 14 Pseudo code LS1 Burgess and Killebrew simplified RLP ......... 39
Figure 15 Pseudo code Double-Justification ............................. 40
Figure 16 Pseudo code 2-exchange neighborhood .......................... 41
Figure 17 Average squared deviation of resource consumption ............ 43
Figure 18 AVGSQDEV - Total Cost dispersion for all project lengths ..... 50
Figure 19 AVGSQDEV - Total Cost dispersion for project length of 11 days 51
Figure 20 Benchmark setup .............................................. 52
Figure 21 Lower bound benchmark in function of project duration ........ 53
Figure 22 Total Cost evolution basic cycle AN2 Framework1 .............. 56
Figure 23 Total Cost evolution basic cycle AN2 Framework2 .............. 56
Figure 24 Total cost Vs number of operations for different population sizes 60
Figure 25 Total cost Vs population size for different number of operations 61
Figure 26 Trade-Off Total Cost Vs. Execution Time ...................... 62
Figure 27 Efficient Frontier Total Cost Vs. Execution Time ............. 63
Figure 28 Efficient Frontier Total Cost Vs. Execution Time, with / without doubles 65
Figure 29 Total Cost Vs. Number of Iterations .......................... 65
Figure 30 Total Cost Vs. Mutation Percentage ........................... 66
Figure 31 BSP Vs number of operations .................................. 67
Figure 32 Total cost Vs ending condition ............................... 68
Figure 33 Performed operations Vs ending condition ..................... 68
IV. List of tables

Table 1 Example Duration Vector ........................................ 21
Table 2 Absolute Starting Times ........................................ 21
Table 3 Relative Starting Times ........................................ 22
Table 4 Resulting Starting Times ....................................... 22
Table 5 Work patterns forming labor supply ............................. 24
Table 6 Total Cost evolution basic cycle AN2 Framework1 ................ 55
Table 7 Total Cost evolution basic cycle AN2 Framework2 ................ 56
Table 8 Best solution methods, total cost and lower bound per basic cycle 57
Table 9 Total cost comparison with and without doubles in the population 64
V. Abbreviations

AN Activity network
AVG Average
AVGSQDEV Average squared deviation
BSP Best schedule percentage
CP Critical path
GA Genetic algorithm
PS Project schedule
RACP Resource availability cost problem
RCPSP Resource-constrained project scheduling problem
RLP Resource levelling problem
RRP Resource renting problem
SPI Serial parallel indicator
TSP Travelling salesman problem
1. Introduction

This thesis deals with a project scheduling and staffing problem. It fits in the
functional area of project management. Many people and institutions have tried to
give a meaningful definition of project management. The Association for Project
Management (APM) came up with an apt definition stating that project management is
“the planning, organisation, monitoring and control of all aspects of a project and the
motivation of all involved to achieve the project objectives safely and within agreed
time, cost and performance criteria. The project manager is the single point of
responsibility for achieving this.” (APM BOK, 1995) A project is “a temporary
endeavour undertaken to create a unique product, service or result”. (PMBOK, 2004)
Short as it is, this definition of a project carries much of its meaning in two words:
temporary and unique. ‘Temporary’ indicates that there is a well-defined start and
end. ‘Unique’ signifies that there is no predefined scheme for executing the project,
although there may be similarities to previous projects. A project consists of multiple
tasks or activities that need to be executed in a certain order; this order and the
precedence relations between the activities are defined in an activity network. The
actual timing of all activities is defined in a project schedule.
A classic combination of criteria measuring the success or failure of a project is
depicted in the iron triangle or project management triangle (Atkinson, 1999). In
figure 1, we extend this triangle to include the link of project management with
project staffing and project scheduling. The triangle has a performance measure on
each of its corner points. The scope contains the content of the project: what should
be done. The cost refers to the budget of the project. Time refers to the amount of
time needed to complete the whole project.

Figure 1 Extended Project Management triangle

It is often stated that if one of the measures is altered, it will have an impact on the
other two. For example, when extending the scope of a project, the cost and time are
likely to increase as well. However, these relations are not necessarily strict: a
decrease in time does not necessarily imply an increase or decrease in cost (cf.
non-regular performance objectives).
The goal of the thesis is to find an intelligent way to construct a schedule of activities
that has the lowest staffing cost. This construction of a schedule is called project
scheduling. Every activity in the schedule has a certain need for resources; in our
research, these resources are labor. Project scheduling creates a demand for labor
over the span of the project and results in a project makespan (time criterion in
figure 1).
However, this demand cannot always be met exactly by the supply of labor, which is
defined by project staffing. Once the schedule and resource needs are known, the
project staffing can be executed. This project staffing results in the supply of
resources and gives the eventual staffing cost. A close match between supply and
demand of resources is more likely to result in a lower staffing cost. This matching of
supply and demand and its link to project scheduling and project staffing is shown in
the bottom part of figure 1.
In order to attain the goal of constructing a good schedule, we implement a basic
evolutionary algorithm, coded in C++. This research and its coded implementation
are subject to various limitations. Since the focus here is mainly on the quantitative
aspects of project staffing and project scheduling, the qualitative aspects are
neglected. An example of a qualitative aspect is job satisfaction, or the loss of it
resulting from irregular or acyclic working patterns, including overtime and idle time.
Besides the lack of qualitative information, some quantitative aspects also show
deficiencies. These deficiencies are mainly due to the many assumptions included
in the modelling. Examples include the assumption that the time and resource
consumption of each activity is exact and known a priori, and that the costs assigned
to the different types of labor time are only estimates.
However, these assumptions and estimations are vital to the construction of a
mathematical model and are set to resemble reality as closely as possible.
There are also practical limitations on the execution of the algorithm, in the
sense that the available computing power sets a boundary to this research. Although
computing power is ever increasing, computationally testing all possible
combinations for constructing the algorithm remains infeasible.
To conclude this introductory chapter, a brief overview of the structure of this thesis is
given.
Chapter two will dig deeper into project staffing, project scheduling and the
interactions between them. The chapter concludes with an unambiguous definition of
both problems and their mathematical model.
Chapter three presents the methodology, i.e. the way in which the research is
conducted. Chapter four provides a literature overview of the different types of
solution algorithms and their building blocks. Chapter five describes the algorithm
that proves to perform best. In chapter six, the computational results are given
in combination with general observations and the calculation of a benchmark.
Conclusions and recommendations for further research are presented in chapter
seven.
2. Problem description and model formulation

The first chapter situated our problem in the functional area of project management.
This second chapter defines the problem in more detail. In the first section, the
problem is put into the bigger picture by describing similar problems. The second
section gives the specific description of our problem that will be used throughout the
remainder of this thesis. The third and final section translates the problem
description into a mathematical model.
2.1. Project Scheduling Problem Area

In project scheduling, a set of activities needs to be scheduled, meaning a start time
has to be assigned to every activity. The project scheduling problem has been widely
researched. Overviews and classification methods for the scheduling problem are
given by Icmeli et al. (1993), Elmaghraby (1995), Herroelen et al. (1997, 1998, 1999),
Brucker et al. (1999) and Hartmann and Briskorn (2010). Different classifications can
be made based upon differences in characteristics between the problems. The use
of a different type of resource, i.e. renewable or non-renewable, leads to a different
kind of problem, as do the activity characteristics and the type of scheduling
objective. Examples of objectives for the project scheduling problem are minimization
of the duration of the project, levelling resources over the course of the project,
minimizing resource idle time and maximizing the net present value. The most
popular problem is the resource-constrained project scheduling problem or
RCPSP. The goal of this problem is to minimize the total length of the project taking
into account a certain renewable resource constraint. Other similar problems are the
resource availability cost problem (RACP), the resource levelling problem (RLP), the
time-constrained project scheduling problem (TCPSP) and the resource renting
problem (RRP). The RACP aims at minimizing the total cost of the unlimited
renewable resources required to complete the project before a certain deadline. The
RLP has the objective to schedule the activities such that the resulting resource
demand over the span of the project is as levelled as possible. The TCPSP aims at
meeting project deadlines, starting with a fixed capacity of resources. In order to
meet the deadlines, decisions have to be made concerning working overtime and
hiring additional resources to enlarge the existing fixed capacity. The RRP has as
objective to minimize the renting costs incurred by renewable resources; these costs
comprise both fixed and variable renting costs.
After the project activities are scheduled, the staffing needs to provide sufficient labor
resources to cover the resource demand of the schedule.
The main objective is to minimize the total staffing cost of the project. Total costs are
the sum of the cost of regular personnel, the cost of overtime, the cost of idle time
and the cost of temporary personnel. This problem setting is often referred to as ‘the
deadline problem’ (Brucker et al., 1999), meaning that there is a given deadline on
the makespan of the project and the goal is to find a feasible schedule that minimizes
the costs. This is opposed to ‘the budget problem’, where one is given a certain
budget and needs to find a feasible schedule that minimizes the makespan.
The staffing problem has already been solved deterministically by Maenhout and
Vanhoucke (2014). This thesis will focus on the project scheduling problem.
The goal of this thesis is to develop an algorithm that generates a project schedule
that minimizes the staffing costs. However, it is important to note that a shorter
makespan or a more levelled resource usage does not necessarily mean a lower
total cost of the project. Therefore, translating the global objective into an
intermediate objective for the scheduling problem is not straightforward.
2.2. Problem description

This section presents the problem description of both the project scheduling and the
project staffing problem. There is not a single formulation of these problems. The
basic idea is always the same, but subtle deviations in the goal or the constraints
can make it seemingly result in a totally different problem.
2.2.1. Project scheduling problem description

The basic idea of project scheduling is to determine a start time for each activity in
the project activity network. The assignment of these start times is not random but
should serve a goal. The overall goal is to produce a schedule that yields the
lowest personnel staffing cost. This cost determination is not part of the scheduling
process but of the staffing process, and there is no direct translation of this staffing
cost goal into a goal for the scheduling problem. As an intermediate goal that
approximates the staffing goal, we set a resource levelling objective for the
scheduling problem. This makes the project scheduling problem resemble the
resource levelling problem (RLP) discussed in section 2.1. The RLP has a
non-regular measure of performance; it has no early completion measure.
(Neumann and Zimmermann, 1999)
Besides the objective of the problem under consideration, the scheduling constraints
that are active have an influence on the problem definition. These constraints can be
derived from either activity characteristics or resource characteristics. (Herroelen et
al., 1997)
Activity characteristics:
• No pre-emption (1)
• Finish-start precedence relations (2)
• Fixed and discrete duration per activity (3)
• Predefined project deadline (4)
• Activity resource needs: constant and discrete (5)
• Single execution mode (6)
Pre-emption or splitting of an activity is not allowed. (1) Pre-emption means that a
started activity can be interrupted at some point in time, to be resumed later.
Pre-emption brings more flexibility into the schedule and thus adds complexity. The
scheduling of the activities is constrained by precedence relations. (2) This means
that a certain order of execution of the activities needs to be maintained. This order is
determined by the activity network. The only type of precedence relation used is the
basic PERT/CPM finish-start precedence relation, meaning that the previous activity
in the network has to finish before the next activity can start. Other precedence
relations, referred to as generalized precedence relations, such as start-start,
start-finish and finish-finish relations, are not used. The use of minimal and maximal
time lags is also omitted, for the sake of simplicity. As an illustration, a start-start
precedence relation with a minimal time lag of three days would mean that the next
activity can start no earlier than three days after the start of the previous activity. The
duration of an activity is known in advance and has an integer value. (3) This means
that the duration does not depend on a stochastic process or on events in prior
activities. Forcing the activities to have integer durations simplifies the calculations. A
predefined project deadline of 21 days is applied, for technical rather than functional
reasons. (4) We did not use a deadline expressed as a relative percentage of the
critical path, since the critical paths of the activity networks under consideration differ
heavily. This would strangle the solution space for activity networks with a small
critical path and create an abundant solution space for activity networks with a large
critical path. The activity resource needs are constant, meaning that over the course
of an activity, the resource demand for each time unit is equal. (5) The activity
resource demand is integer for the same reason the activity durations are integer
values. As opposed to discrete and constant, resource needs could be continuous
and the amount necessary could be a function of the duration. There is only a single
execution mode for the activities. (6) Multiple activity modes would imply the
possibility of executing one activity, or a subset of activities, in different ways,
possibly incurring different costs.
Resource characteristics:
• Single resource (7)
• One resource type: renewable resource (8)
• Variable availability of resources (9)
The resource used for executing the activities is labor. Every unit of labor is assumed
to be equal; the labor units do not have different skill levels. (7)
Concerning the resource constraints, we consider only one resource type in our
problem. This resource type is a renewable resource. (8) A renewable resource is a
resource that gets renewed from period to period. Besides labor, machines are
another example of a renewable resource. Examples of non-renewable resources
are materials, energy and money; once they are used, they are gone. The availability
of resources is variable, and defined by the staffing problem. (9) The amount of labor
available at each time unit depends on the number of workers employed and their
different working patterns.
2.2.2. Project staffing problem description

The basic idea of project staffing is to find the combination of work patterns that
covers the resource needs of an activity schedule. Each work pattern is a serial string
of work days and days off and is executed by a single worker. The work patterns are
non-cyclic, meaning that there is no predefined, recurring pattern of days off and
days on. Not all patterns are allowed, however: minimum and maximum constraints
are defined on the number of consecutive days off and consecutive days on.

• Minimum consecutive days on: 2
• Maximum consecutive days on: 6
• Minimum consecutive days off: 1 (does not result in an actual constraint)
• Maximum consecutive days off: 2

The goal of the staffing problem is to find the combination of work patterns that
satisfies the resource needs of the activity schedule and minimizes the labor costs
incurred by the staffing. An overview of the different costs and their weights is
presented below. (Maenhout & Vanhoucke, 2014)

• Regular personnel time units: 2
• Overtime units: 3
• Temporary personnel time units: 4
• Idle time units: 1
The cost of regular personnel time units is a variable cost, depending on the project
makespan. It does not take into account the actual number of days worked: each
project day incurs a cost of 2. A work pattern is subdivided into several periods, each
containing seven days. A regular period of seven days has five days on and two days
off. If a work pattern has an extra day on, on top of these five days, an overtime unit
cost is incurred. For every work pattern, its cost can thus be calculated based upon
the regular time units and the overtime units; the cost of a work pattern is known a
priori, before the actual staffing takes place. The other two costs, i.e. temporary
personnel units and idle time units, result from the combination of several work
patterns. If the combination of work patterns does not supply enough labor on a
certain day, external labor has to be hired for that day, incurring a temporary-
personnel cost per extra unit. If, however, the combination of work patterns supplies
excess labor on a certain day, a penalty cost per idle time unit is added.
2.3. Mathematical model formulation

A mathematical formulation of the scheduling and staffing problem leaves no room
for misconception and can represent the problem concisely. The notation will be
explained in the form of sets, input data and decision variables (Maenhout and
Vanhoucke, 2010). Sets are well-defined groups of elements that have common
characteristics. An individual element in a set is recognized by its index. A set will be
denoted by a capital letter: if W is the set of workers, w1 represents the first individual
worker in the set. Input data is static data, known beforehand. It is important that this
data is as close to reality as possible, since it will have a great impact on the
behavior of the algorithm and ultimately on the results. Decision variables are the
unknown factor in the model. Solving the mathematical problem means determining
the value of these decision variables.
Sets
W set of workers (index i)
T set of days in the scheduling horizon (index t)
A set of activities in the project (index j)
Input Data

c^r  cost per worker per day
c^o  cost per worker per day of overtime
c^x  cost per worker per day of outsourced labor
c^l  cost per worker per day of idle time
d_j  duration of activity j
p_pj  1 if p is a preceding activity of activity j
r_j  number of resources necessary to execute activity j
PD  project deadline
DO^min  minimum consecutive days off
DO^max  maximum consecutive days off
DW^min  minimum consecutive days working
DW^max  maximum consecutive days working
Decision Variables

st_j  starting time of activity j
  (resulting parameter: a_jt, 1 if activity j is performed on day t)
PL  project schedule length
w_it^r  1 if worker i works a regular shift on day t
w_it^o  1 if worker i works an overtime shift on day t
w_t^x  number of workers outsourced externally on day t
w_t^l  number of workers in excess on day t
The actual model consists of an objective function and constraints. The objective
function represents the ultimate goal; this could be the minimization of the
makespan, the levelling of the workload, the maximization of profits, etc. In this case
the objective is to minimize the total personnel staffing cost. Underneath the objective
function, a number of constraints are formulated. These constraints find their origin in
a regulatory domain (e.g. the number of allowed consecutive working days) or are
driven by feasibility boundaries (e.g. activity 1 needs to be completed before activity
2 can start).
Objective function

min  Σ_{i∈W} c^r · PL  +  Σ_{i∈W} Σ_{t∈T} c^o · w_it^o  +  Σ_{t∈T} c^x · w_t^x  +  Σ_{t∈T} c^l · w_t^l   (1)

Subject to constraints

Σ_{j∈A} r_j · a_jt  =  Σ_{i∈W} (w_it^r + w_it^o) + w_t^x − w_t^l   ∀ t ∈ T   (2)

st_j ≥ st_p + d_p   ∀ j ∈ A, ∀ p ∈ A with p_pj = 1   (3)

a_jt = 1 if st_j ≤ t ≤ st_j + d_j − 1, 0 otherwise   ∀ j ∈ A, ∀ t ∈ T   (4)

PL ≥ st_j + d_j   ∀ j ∈ A   (5)

PL ≤ PD   (6)

w_it^r + w_it^o ≤ 1   ∀ i ∈ W, ∀ t ∈ T   (7)

w_t^x · w_t^l = 0   ∀ t ∈ T   (8)

with y_it = w_it^r + w_it^o denoting whether worker i works on day t:

Σ_{s=t}^{t+DO^min−1} (1 − y_is) ≥ DO^min · (y_i,t−1 − y_it)   ∀ i ∈ W, t = 2, …, |T| − DO^min + 1   (9a)

Σ_{s=t}^{|T|} (1 − y_is) ≥ (|T| − t + 1) · (y_i,t−1 − y_it)   ∀ i ∈ W, t = |T| − DO^min + 2, …, |T|   (9b)

Σ_{s=t}^{t+DO^max} y_is ≥ 1   ∀ i ∈ W, t = 1, …, |T| − DO^max   (10)

Σ_{s=t}^{t+DW^min−1} y_is ≥ DW^min · (y_it − y_i,t−1)   ∀ i ∈ W, t = 2, …, |T| − DW^min + 1   (11)

Σ_{s=t}^{t+DW^max} y_is ≤ DW^max   ∀ i ∈ W, t = 1, …, |T| − DW^max   (12)

a_jt, w_it^r, w_it^o ∈ {0, 1}   ∀ i ∈ W, ∀ j ∈ A, ∀ t ∈ T   (13)

st_j, PL, w_t^x, w_t^l ∈ ℤ≥0   ∀ j ∈ A, ∀ t ∈ T   (14)
The objective function (1) represents the total personnel cost of the project. It can be
broken down into four parts. The first part is the cost incurred for each worker in a
regular schedule. This cost is independent of the actual number of days worked but
depends entirely on the length of the project schedule; it represents a fixed cost per
hired worker. The second part of the cost calculation accounts for the overtime units.
A worker is supposed to work a normal schedule of five days per week; when he
works more, an extra cost is incurred on top of the regular cost. The third part is the
cost related to outsourcing extra workers. Sometimes the regular hires cannot carry
all the workload, so extra workers are sourced externally. This has the advantage
that there is no fixed cost over the whole length of the project; however, these units
of work are usually more expensive than regular or overtime work units. The last part
of the cost function represents the excess supply of workers. When more workers
are available than necessary for the amount of work, an extra cost is incurred.
The first constraint (2) shows the connection between the project scheduling and the
project staffing. The project scheduling results in a demand for labor on each day of
the project, represented on the left-hand side of the equation. The right-hand side
represents the supply of labor on each day of the project, which is the result of the
project staffing. Supply and demand of labor need to be in balance on a daily basis.
If no balance can be found between the demand for labor and the supply generated
by the regular and overtime work units of the hired workers, extra outsourced labor
or excess labor rectifies the total balance.
Equations (3) - (6) are constraints exclusively related to the project scheduling
problem. (3) is the mathematical representation of the finish-start precedence
relations. The fourth equation forces the non-preemptive nature of the activities in the
project schedule. The project length (5) is defined as the end of the last activity. This
project length is bound to a certain predetermined project deadline (6).
Equations (7) - (12) are constraints exclusively related to the project staffing problem.
The seventh equation enforces that a worker can execute a regular work unit or an
overtime work unit, but never both. Constraint (8) states that on one day either extra
workers are outsourced or there is excess labor, or neither: it does not make sense to
attract an external workforce while regular workers are still available. Constraints (9) -
(12) represent working agreements between the employees and the employer.
Constraints (9) and (10) ensure the minimum and maximum number of consecutive
days off work, respectively, while (11) and (12) ensure the minimum and maximum
number of consecutive days a worker is allowed to work.
Equation (13) limits certain variables to binary values, while equation (14) restricts the
other variables to positive integers.
3. Methodology

The problem explained in the previous chapters shows a complex interaction of two
subproblems. A structural approach towards the solution of the problem is essential,
and basic assumptions are necessary to limit its complexity. As mentioned before,
the project staffing problem is perceived as a given and the focus will go almost entirely
to the project scheduling problem.
First, the project activity network dataset will be described in more depth. This is
important to show that we are solving a real-life problem and not an abstract
theoretical one. Furthermore, it must be noted that there is no single best
method to solve all kinds of different project activity networks. The characteristics of
these networks might play an important role in the final selected solution method.
In a second phase, existing basic genetic algorithm methods from the literature will
be discussed. These methods can be grouped into generation methods, selection
methods, operation methods, optimisation methods, reinsert methods and population
management methods. These are picked from a broad range of applications, not
limited to the project scheduling problem. This phase is concluded by placing these
methods into a framework of a genetic algorithm.
The third phase consists of programming the methods in C++ and connecting them in
a logical way. The resulting program will then be executed in several runs on three
prototype datasets. Every run will narrow the number of methods by either excluding
the weakest methods or by withholding only the best methods. For each prototype
dataset, a genetic algorithm will be formulated.
As genetic algorithms do not promise an optimal solution, the goal is to reach a good
solution. It is possible to determine an upper and lower bound for the cost objective
function. These two values will then be used for benchmarking purposes of the
proposed genetic algorithms. Besides benchmark testing, we will also perform tests
to determine the effectiveness of each method. General parameters will be varied to
check their influence on the applied algorithm and its results.
4. Literature Overview

In chapter four, the literature overview is given. The first section gives an introduction
to genetic algorithms and presents the genetic algorithm frameworks. The second
section shows how the data is represented. The third and last section elaborates
in depth on the different solution methods that are applied within the genetic
algorithm framework.
4.1. Genetic algorithms
4.1.1. What are genetic algorithms

A genetic algorithm (Holland, 1975) is a heuristic that imitates the natural process of
evolution to find good (but often suboptimal) solutions to a problem in a reasonable
amount of time. In contrast, exact solution methods will always come up with the
optimal solution. A genetic algorithm, however, intelligently exploits random search.
It has been shown that genetic algorithms, in combination with local search,
simulated annealing or tabu search, provide very good solutions among heuristics.
(Brucker, 1999)
We will add local search to the genetic algorithm framework. Genetic algorithms are
population-based algorithms, i.e. they work on a set of solutions.
The general procedure is described below.
Firstly, an initial population of member solutions is generated (1). Out of this
population, 'parents' are selected for mating (2). The parents are combined in a
certain way to generate new solutions (= mating), called the 'children' (3). The
children enter the population and a new cycle can start from the selection
procedure. Steps 2 and 3 are repeated until an ending condition is reached (4).
The underlying principle is survival of the fittest: the stronger members of the
population survive while the inferior members are eliminated.
The next sections will go deeper into this general setup.
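The four steps above can be sketched as a toy genetic algorithm. The sketch below minimises the sum of an integer vector; it is purely illustrative and is not the algorithm developed in this thesis.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Toy GA illustrating the four-step loop: (1) initialize, (2) select,
// (3) mate, (4) repeat until the ending condition. Everything here is
// an illustrative stand-in, not the thesis' actual algorithm.
using Member = std::vector<int>;

int fitness(const Member& m) {                 // lower is better in this toy
    int s = 0;
    for (int v : m) s += v;
    return s;
}

Member runToyGA(int populationSize, int generations) {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> gene(0, 9);
    // (1) generate an initial population
    std::vector<Member> pop(populationSize, Member(6));
    for (Member& m : pop)
        for (int& v : m) v = gene(rng);
    std::uniform_int_distribution<int> pick(0, populationSize - 1);
    auto worse = [](const Member& a, const Member& b) {
        return fitness(a) < fitness(b);
    };
    for (int g = 0; g < generations; ++g) {    // (4) ending condition: generation count
        // (2) select two parents at random
        const Member& p1 = pop[pick(rng)];
        const Member& p2 = pop[pick(rng)];
        // (3) mate: one-point crossover plus a one-gene mutation
        Member child(p1.begin(), p1.begin() + 3);
        child.insert(child.end(), p2.begin() + 3, p2.end());
        child[gene(rng) % child.size()] = gene(rng);
        // survival of the fittest: the child replaces the worst member if better
        auto worst = std::max_element(pop.begin(), pop.end(), worse);
        if (fitness(child) < fitness(*worst)) *worst = child;
    }
    return *std::min_element(pop.begin(), pop.end(), worse);
}
```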
4.1.2. Genetic algorithm framework

This subsection discusses the integration of the project scheduling and project
staffing problem and molds it into a genetic algorithm framework.
The project scheduling problem is leading throughout the execution of the genetic
algorithm; the staffing problem is called at the appropriate time, when evaluation of
the resulting schedule is needed.
The integration of the scheduling and staffing problem is molded into two slightly
different forms of genetic algorithms, presented in the diagrams in figure 2 and figure
3.
Figure 2 Flowchart Genetic Algorithm, framework 1

The first framework consists of an initialisation phase, a selection phase and an
operation phase. The child obtained after the operation phase will undergo local
optimization, where the algorithm looks for incremental improvements in the
neighborhood of the child. This local optimization comes with a partial evaluation to
evaluate every instance of the explored neighborhood. We call it a partial evaluation
because the actual objective function is never calculated in this phase; instead,
another measurement, which is a close approximation of the objective function under
certain conditions, is calculated. The reason for this partial evaluation is that the
regular evaluation, which includes calculating the objective function by calling the
staffing algorithm, consumes a considerable amount of time and would extend the
execution time of the algorithm unnecessarily. After the local optimization, only one
schedule, assumed to be the best based upon the partial evaluation, will undergo the
complete evaluation phase, including the staffing part. After the evaluation, a
decision has to be made whether the newly generated and optimized
child can enter the population. This decision is made in the reinsert phase. The last
phase checks an ending condition. If a certain ending condition is reached, the
execution will stop and the best schedule found up to that moment will be the output
of the algorithm. If the ending condition is not yet reached, the phases described
above will be repeated starting from the selection phase.
The advantage of this form of the algorithm is that it does not allow deterioration of
the resulting schedule, i.e. the outcome at the end, because at the end of each cycle
the objective function is calculated and the best schedule is stored.
The biggest disadvantage of this form of the algorithm is that an evaluation is still
performed during each cycle, which consumes a considerable amount of time.
This is the reason why an alternative framework is formulated, represented in figure
3.

Figure 3 Flowchart Genetic Algorithm, framework 2

The only difference between the second framework and the first is that the second
framework postpones the evaluation phase until the very end. This second
framework will not perform the evaluation in every cycle and thus saves a lot of
execution time. In this case, the quality of the schedules in the population is entirely
controlled by the partial evaluation. When the ending condition is reached, the
evaluation is executed on every schedule in the population. The biggest advantage
of this framework is the amount of time that can be saved in each cycle. The
disadvantage, however, is that the best schedule present in the population may get
replaced by another one during the execution of the algorithm, because only at the
very end do we know which schedule is the best. Before that, we rely on an
approximation of the objective function to get an indication of which schedule will
probably be good, and thus should not be replaced and which schedule is bad and
thus should be replaced. It is expected that the second framework yields an inferior
quality of the schedules.
4.2. Data representation

The way in which the data is represented can have a great influence on the range of
methods that can be applied. In this section, the data representation for both the
scheduling and the staffing problem will be shown.
4.2.1. Project Schedule representation
Good project management starts with a solid representation of the project schedule.
A good tool for this is PERT, the Program Evaluation and Review Technique. (Cottrell,
1999) It is used for analyzing and representing activities in a project and was first
developed in the late 1950s by the U.S. Navy as a tool for measuring and controlling
the development progress of the Polaris Fleet Ballistic Missile program. (Malcolm,
1959) The method perceives a project as a network of activities and events.
An activity network shows the activities and the relations between them, often
referred to as precedence relations. There are two types of activity networks, an
activity-on-the-node (AON) network and an activity-on-the-arc (AOA) network
representation.
Activity-on-the-arc
In this representation, each arc or arrow represents an activity or a task. The nodes
define a milestone which is achieved when all activities on the arrows leading up to
this node are completed. Dummy arcs can be introduced to enforce additional
precedence relations.
Activity-on-the-node
In this representation, each node represents an activity or a certain task that has to
be executed. The arcs or arrows represent the precedence relation. Figure 4 shows
an example of such a network. Each node gets an activity number inside the node,
the duration of the node is put on top and the necessary labor to execute the activity
is put below the node. The network clearly visualizes that activity five can only be
executed when both activity three and activity four have been executed. Activity five
is called the successor of activities three and four, while activities three and four are
predecessors of activity five. The activity-on-the-node network has two dummy
activities to start and end the network; they consume neither time nor resources.
Their sole function is to provide a clear single node at the beginning and end of the
network. The network in Figure 4 will be used as an example in the next chapters.
In order to limit the complexity, it is assumed that the durations of the activities are
deterministic. PERT often takes a certain variance on the duration of an activity into
account when analyzing project schedules. Another complexity limiting factor is the
type of relationships that are used. In this thesis, only finish-start relationships are
considered. This means that the successor can only start when the predecessor has
finished. When using generalized precedence relations, also start-start, finish-finish
and start-finish relationships can be defined.(Dawson, 1995) Their respective
meanings are: the successor can start when the predecessor has started, the
successor can finish when the predecessor is finished and the successor finishes
when the predecessor starts.
In combination with PERT, CPM or critical path method is often used. The critical
path represents the group of activities that cannot be delayed without increasing the
length of the project. It is thus the chain of activities that determines the minimal
length of the project.
Figure 4 Activity-on-the-node project schedule activity network (AN2)

Throughout this thesis we have used the activity-on-the-node network representation
method since it emphasizes the activities rather than the milestones. Furthermore,
it is easier to interpret at first sight and there is no need to define any dummy
activities besides the start and end activity. Other advantages identified by Turner
include the ease of drawing activity-on-the-node networks and the ability to write
network software more easily. (Turner, 1993)
Programming data representation
The network can be translated or decoded into static and dynamic data. The static
data includes the duration of the activities, the necessary resources for the execution
of each activity and the precedence relations of the activities. These will remain
identical throughout the scheduling process. The dynamic data are the starting times
of each activity, these will change throughout the process and are the eventual
outcome.
Both the aforementioned static and dynamic data will be saved into vectors, i.e. a
vector of durations, a vector of resource usages, a vector of successors and a vector
of starting times. For the example of figure 4, this results in the duration vector
represented in table 1.
Activity:  a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12
d[aj]:      0   5   1   4   2   2   2   2   3    4    4    0
Table 1 Example Duration Vector
For the decoding of the activity starting times, there are two options. The first one
consists of the absolute starting time of the activities and the second considers a
relative starting time. (Wall, 1996) The first method is straightforward and states an
exact starting time, independent of the starting time of other activities. Table 2 shows
an example of a vector with absolute starting times.
Activity:  a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12
st[aj]:     1   4   6   8   9  14  10  18   8   17   19   19
Table 2 Absolute Starting Times
Activity one starts at day one, activity two starts at day four and activity three starts at
day six.
The second method does not state an absolute starting time but rather a relative
starting time of the activity i.e. the relative starting time indicates how many days
there are between the start of an activity and the end of the latest predecessor. The
vector of these relative starting times will be referred to as the float vector in the
remainder of the thesis.
Table 3 shows how a float vector is represented.
Activity:  a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12
fl[aj]:     0   4   6   8   4   2   6   3   8    5    5    0
Table 3 Relative Starting Times
This float vector has to be interpreted in combination with the precedence relations to
result into the actual starting times of the activities. If you combine this with the
network of figure 4, the resulting starting times are calculated in table 4.
Activity:  a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12
st[aj]:     0   4  15   8  20  11   6  19  30   27   27   33
Table 4 Resulting Starting Times
The start time of activity three is the end time of the latest predecessor (activity two)
which is nine. Add to this number the float value of six and you get the starting time
for activity three, i.e. fifteen.
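The decoding described above can be expressed as a small C++ sketch. It assumes activities are numbered in topological order and that pred[j] lists the predecessors of activity j; these conventions are assumptions made for the example, but the calculation mirrors the one behind table 4.

```cpp
#include <algorithm>
#include <vector>

// Illustrative decoding of a float vector into absolute start times.
// dur[j] is the duration of activity j, pred[j] its predecessors and
// fl[j] its float (slack after the end of the latest predecessor).
// Activities are assumed to be topologically ordered.
std::vector<int> decodeStartTimes(const std::vector<int>& dur,
                                  const std::vector<std::vector<int>>& pred,
                                  const std::vector<int>& fl) {
    std::vector<int> st(dur.size(), 0);
    for (std::size_t j = 0; j < dur.size(); ++j) {
        int latestFinish = 0;
        for (int p : pred[j])
            latestFinish = std::max(latestFinish, st[p] + dur[p]);
        st[j] = latestFinish + fl[j];  // add the float on top of the latest finish
    }
    return st;
}
```

Because the float is always added on top of the latest predecessor's finish, no schedule produced this way can ever violate a precedence relation.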
We opted for the relative time representation for the simple reason that it is
impossible to break any precedence constraint, since the constraint is embedded into
the definition of the float vector. If absolute starting times are used, every schedule
that is produced needs to be checked for precedence violations, which then have to
be repaired where necessary. An example of this is activity nine in table 2, which
starts at day eight while its preceding activity seven starts at day ten. Table 2 is
thus an example of an infeasible schedule.
Dataset
We execute the algorithm on three prototype activity network datasets. These activity
networks contain the same activities, i.e. twelve activities including a dummy start
and dummy end activity. Even the activity characteristics, concerning duration and
resource demand are identical. The only aspect that is different between the three
activity networks under research is the order in which the activities should be
executed. This order is visually represented by the arrows as the precedence
relations in the activity networks. Appendix A shows all three activity networks,
further denoted as activity network AN1, AN2 and AN3. Note that AN2 is discussed
previously in this section. These three networks are not chosen at random but have a
distinctive topological structure. We want to test how the algorithm reacts to activity
networks with a tendency to a very parallel structure compared to activity networks
with a tendency to a very serial structure. This is done by measuring the serial or
parallel indicator as a topological indicator to measure the network structure.
(Vanhoucke et al., 2008) This indicator has a value ranging from 0 to 1, with 0
meaning a complete parallel network structure and 1 meaning a complete serial
network structure. This indicator (I) is calculated using the formula below.

I = (m - 1) / (n - 1)

In this calculation, n indicates the number of activities excluding the dummy start and
end nodes and m denotes the maximum progressive level of the network.
(Elmaghraby, 1977) AN1, AN2 and AN3 have 0.11, 0.33 and 0.44 as respective
values for the serial parallel indicator (SP indicator). These values seem very low.
However, when setting higher values, the tendency towards a serial structure is so
overwhelming that there is very limited scheduling flexibility. When the SP indicator is
0, which means that all activities are in a parallel structure, the scheduling flexibility is
maximal since there is no strict order among the activities. When the SP indicator is 1,
which means that all activities are in a serial structure, there is no scheduling
flexibility since the order of the activities is completely fixed.
We assume that parallel networks have a broader solution space and offer more
possibilities to the staffing of a project, possibly resulting in lower staffing costs.
When the scheduling problem is solved, all activities have received a start time and a
project schedule can be printed (figure 5). The red line represents the labor demand
per day; this is the demand that needs to be covered by the project staffing.

Figure 5 Example project schedule, PS2 (based on AN2)
4.2.2. Project Staffing representation
The project staffing representation revolves around the representation of the work
pattern. This work pattern is a binary vector indicating whether a day in the pattern is
a working day or a day off. An example of a work pattern and a combination of work
patterns to form the labor supply is shown below.
Pattern  Workers  Day: 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
w1          1          0  0  1  1  1  1  1  1  0  0  1  1  1  1  1  1  0  1  1
w2          1          1  1  0  0  1  1  1  0  1  1  0  0  1  1  0  1  1  1  1
w3          2          1  1  1  1  0  1  1  0  1  1  1  1  0  1  1  1  1  1  1
w4          2          1  1  1  1  0  1  1  1  0  0  1  1  1  1  1  1  0  1  1
w5          3          1  1  1  1  1  0  0  1  1  1  0  0  1  1  1  1  0  1  1

Staffing (Supply)      8  8  8  8  5  6  6  6  6  6  5  5  7  9  8  9  3  9  9
Scheduling (Demand)    8  8  8  8  5  6  6  6  6  6  5  5  9  9  9  9 10 10 10
Ext / Idle (Diff.)     0  0  0  0  0  0  0  0  0  0  0  0  2  0  1  0  7  1  1

Table 5 Work patterns forming labor supply

Table 5 shows an example of staffing executed on AN2 and PS2. Five work patterns
are distinguished to carry out the labor defined by PS2. Nine workers are hired for
the project, work pattern one and two are executed by one worker each, work pattern
three and four are executed by two workers each and 3 workers have working
pattern five. The center of the table indicates whether a work pattern is on or off work
on a certain day. Work patterns three and four have an overtime unit on day four;
their first week contains six days of work instead of the regular five days. In the
bottom part of the table, the total labor supply per day is calculated as the sum of the
individual active work patterns. In the row below, you can find the labor demand per day
as defined by PS2. The lowest row indicates, for every day, whether external labor
units (positive value) should be hired or idle time (negative value) occurs. Towards the
end of the project, external workers are hired. The difference of labor supply and
demand can also be shown in the project schedule graph. Below, you can find an
extended version of figure 5. The red line still shows the labor demand while the
green dotted line indicates the labor supply as defined by the staffing. Demand
exceeds supply towards the end of the project.
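The bottom rows of table 5 can be reproduced with two small helpers. The function names are hypothetical; they merely illustrate how the supply is aggregated over the work patterns and then compared with the demand.

```cpp
#include <cstddef>
#include <vector>

// Aggregate binary work patterns into a daily labor supply: each pattern
// contributes once per worker assigned to it (the "Workers" column).
std::vector<int> dailySupply(const std::vector<std::vector<int>>& patterns,
                             const std::vector<int>& workers) {
    std::vector<int> supply(patterns.front().size(), 0);
    for (std::size_t w = 0; w < patterns.size(); ++w)
        for (std::size_t d = 0; d < supply.size(); ++d)
            supply[d] += workers[w] * patterns[w][d];
    return supply;
}

// Demand minus supply per day: positive values call for external labor,
// negative values mean idle time (the "Ext / Idle" row of table 5).
std::vector<int> shortage(const std::vector<int>& demand,
                          const std::vector<int>& supply) {
    std::vector<int> diff(demand.size());
    for (std::size_t d = 0; d < demand.size(); ++d)
        diff[d] = demand[d] - supply[d];
    return diff;
}
```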
4.3. Solution methods

This section contains an overview of all the methods that were taken into
consideration for the algorithm. The structure of this section is guided by the
flowchart in figure 6. The flowchart represents the different phases in the algorithm,
and each phase contains multiple methods that contribute to the solution of the
problem. The topics will be discussed in this order: Initialization, Selection,
Operation, Local Optimization, Partial Evaluation, Reinsert, Ending Condition and
Evaluation. Figure 6 indicates the sections and subsections in which the different
methods are discussed.
4.3.1. Initialization

To start the algorithm, we need to initialize a population of schedules in the form of
float vectors. Although research has often neglected the importance of the
initialization phase, a bad initial population can lead to an increased time-to-solution
or even to getting trapped in local optima. A minimum of diversity in the population is
necessary to avoid premature convergence of the solutions towards suboptimal
regions of the solution space. To initialize our population, simple constructive
heuristics will be used.
To construct the schedule float vectors, we distinguish three groups of initialization
methods, i.e. random, uniform and Gaussian initialization.
Figure 7 Flowchart Genetic Algorithm, framework 2
I1 Random Initialization
In the random initialization method, a maximum float value (MFV) is determined.
Then a value is randomly generated in [0, MFV]. The parameterisation for MFV leads
to the following methods.
I1a MFV = 1 x AVG duration of activities
I1b MFV = 2 x AVG duration of activities
I1c MFV = 3 x AVG duration of activities
I2 Uniform Initialization
In the uniform initialization method, a central value (CV) and a deviation value (DV)
are determined. Then a value is uniformly generated in [CV-DV, CV+DV]. The biggest
difference with the random initialization method is that this method does not
necessarily include the value 0. If CV-DV would return a negative value, it is
automatically set to 0. If a large number of float values are generated using
this method, you will notice that they follow a uniform distribution. The parameters for
CV and DV led to the following methods.
I2a CV = 1 x AVG duration of activities DV= 0,5 x AVG duration of activities
I2b CV = 2 x AVG duration of activities DV = 0,5 x AVG duration of activities
I2c CV = 2 x AVG duration of activities DV = 1 x AVG duration of activities
Note that there is no method where CV = DV = AVG duration of activities since this
method is identical to method I1b.
I3 Gaussian Initialization
In the Gaussian method, a central value (CV) and a standard deviation value (SDV)
are determined. Then a value is generated according to the Gaussian distribution with
mean CV and standard deviation SDV. This method differs from the two previous
ones in that it allows more extreme values since it has no maximum value,
i.e. there is no closed upper end.
If this method would return a negative value, it is automatically set to 0.
The parameters for CV and SDV led to the following methods.
I3a CV = 1 x AVG duration of activities SDV= 0,5 x AVG duration of activities
I3b CV = 1 x AVG duration of activities SDV= 1 x AVG duration of activities
I3c CV = 2 x AVG duration of activities SDV= 0,5 x AVG duration of activities
I3d CV = 2 x AVG duration of activities SDV= 1 x AVG duration of activities
I4 Combined Initialization
This method combines the previous methods I1, I2 and I3 to build the initial
population.
The pseudo code for the initialization phase can be found in figure 8.
Pick an Initialization method I1, I2, I3 or I4 (I)
Determine initialization characteristics MFV, CV, DV, SDV
While population not entirely filled
Create new empty float vector
While there are activities without float value
Select a random activity a
If activity a has no float value
Initialize activity a using initialization method I
Endif
Endwhile
Endwhile
Figure 8 Pseudo code Initialization methods
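The three generators I1, I2 and I3 could be implemented with the C++ <random> facilities as sketched below. Note that the float values shown in this thesis are whole days, so a real implementation would presumably round these draws to integers; the function names are illustrative.

```cpp
#include <algorithm>
#include <random>

// I1: random float value in [0, MFV].
double randomFloat(double mfv, std::mt19937& rng) {
    return std::uniform_real_distribution<double>(0.0, mfv)(rng);
}

// I2: uniform float value in [CV-DV, CV+DV], with a negative lower end
// clipped to 0 as described in the text.
double uniformFloat(double cv, double dv, std::mt19937& rng) {
    double lo = std::max(0.0, cv - dv);
    return std::uniform_real_distribution<double>(lo, cv + dv)(rng);
}

// I3: Gaussian draw with mean CV and standard deviation SDV; negative
// draws are set to 0, but there is no closed upper end.
double gaussianFloat(double cv, double sdv, std::mt19937& rng) {
    double v = std::normal_distribution<double>(cv, sdv)(rng);
    return std::max(0.0, v);
}
```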
Other Initialization method
Another initialization method that was considered finds its background in the RACP
(resource availability cost problem). It is based on the maximum / minimum bounding
strategy for determining the cheapest resource availability levels for a project.
(Demeulemeester, 1995)
The general idea is to calculate a resource availability constraint and use this
constraint in combination with a scheduling rule to generate initial schedules.
In a first step, calculate the minimum possible resource usage that would be
necessary if there would be a constant availability of resources over the span of the
project. This, in fact, is equal to the resource usage of the most demanding activity in the
project. Applying this to our example in figure 4, the minimum possible resource
usage is 10. This corresponds to the resource usage of activity 9. Secondly,
calculate the maximum possible resource usage that would be necessary if there
would be a constant availability of resources over the span of the project. For our
example, this resource usage is 19 and is the maximum resource usage that is
possible if activities 9, 10 and 11 would occur simultaneously. In a third step, the
activities are scheduled using a basic priority rule and a schedule generation
scheme. Step 3 is repeated with different priority rules and a resource constraint
ranging from 10 to 19. Each execution of step 3 results in a schedule that is put in the
initial population.
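Step one of this bounding strategy can be sketched as follows. The upper bound (19 in the example, from activities 9, 10 and 11 running simultaneously) requires an analysis of which activities can overlap and is taken as given here; the schedule callback is a hypothetical placeholder for a priority-rule scheduler.

```cpp
#include <algorithm>
#include <vector>

// Step 1: the minimum constant resource availability equals the usage of
// the single most demanding activity (10 in the example of figure 4).
int minResourceBound(const std::vector<int>& usage) {
    return *std::max_element(usage.begin(), usage.end());
}

// Step 3: loop a (hypothetical) priority-rule scheduler over every resource
// limit between the two bounds; each call yields one initial schedule.
template <typename ScheduleFn>
void generateInitialSchedules(int minBound, int maxBound, ScheduleFn schedule) {
    for (int limit = minBound; limit <= maxBound; ++limit)
        schedule(limit);
}
```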
4.3.2. Selection

Once an initial population of schedules is available, we need a way to select
one or more schedules on which a certain operation will be performed later on.
These selected schedules are called parents. The key idea of this selection phase is
to select good parents, in order to give them the opportunity to pass on their good
genes to the next generation. Likewise, this phase should also prevent the worst
solutions from passing on their inferior genes to the next generation. (Sivaraj,
2011)
A distinction can be made between two types of selection methods or schemes, a
proportionate scheme and an ordinal-based scheme. (Sastry and Goldberg, 2001)
Using an ordinal-based scheme, the chance of an individual to be selected depends
on the ranking of the individual in the population based on a fitness measure.
With a proportionate scheme, the chance of an individual to be selected depends on
the relative fitness of the individual in comparison with the other individuals in the
population. In other words, with an ordinal-based scheme, the chance of being
selected merely depends on the fitness rank of the individual in the population, while
a proportionate scheme also takes into account how much one solution is better than
the other to determine the selection likelihood. The latter not only implies an order of
the individuals in the population but also a scaling measure to determine the relative
superiority of one individual to another.
In this section, we will take a closer look at three selection methods, a random
selection method, a roulette wheel selection method (proportionate scheme) and a
tournament selection (ordinal-based scheme).
S1 Random Selection
This selection method does not embed any intelligence. It merely selects 2
individuals randomly. This method does not give preference to individuals that are
more fit than others and therefore this method is perceived to be inferior compared to
other selection methods that make use of more intelligent criteria.
S2 Roulette Wheel Selection
As stated in the introduction, the roulette wheel selection method is a proportional
scheme to select individuals out of a population. The first step is to calculate a fitness
value for each of the individuals in the population. Depending on the algorithm being
used, this fitness value could be either the total cost (figure 2) or the average
squared deviation (figure 3). The second step assigns a probability to each individual
based on the fitness value. These probabilities are laid out on a roulette wheel: the
bigger the probability of an individual being selected, the bigger its share of the
wheel's circumference. In the third step, the wheel is spun and the individual on
which the wheel stops is taken. To select another individual, the procedure is
repeated starting from step two.
A small example will illustrate this method.
Assume five individuals and their respective fitness values, where a higher value
indicates a better individual. The circumference of the roulette wheel gets divided into
five parts, each part belonging to the selection of one individual. An example of the
division of the wheel is given in figure 9. The wheel gets spun and the individual is
chosen where the roulette wheel comes to a standstill.
Figure 9 Example Roulette Wheel
The pseudo code for the roulette wheel selection can be found in figure 10.
Calculate the fitness value of each individual
While necessary to select additional individual
Determine selection probability for each individual based on fitness value
Generate a random number, mimicking the spinning of a roulette wheel
Translate the outcoming number into the underlying individual
Endwhile
Figure 10 Pseudo code Roulette Wheel Selection
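A minimal sketch of the roulette wheel uses std::discrete_distribution, which draws an index with probability proportional to its weight, mimicking the spin. The sketch assumes a "higher is better" fitness; when the fitness measure is a cost, it would first have to be transformed into such a weight.

```cpp
#include <random>
#include <vector>

// Roulette wheel selection: the chance of drawing index i is proportional
// to fitness[i]. Fitness values are assumed to be non-negative weights
// where higher means better.
int rouletteSelect(const std::vector<double>& fitness, std::mt19937& rng) {
    std::discrete_distribution<int> wheel(fitness.begin(), fitness.end());
    return wheel(rng);
}
```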
S3 Tournament Selection
The tournament selection method is an ordinal-based scheme to select individuals
out of a population. The first step consists of determining the rank-order of the
individual based on a fitness value. This value will again be the total cost or the
average squared deviation. This ranking will determine which individual wins in a
tournament. In the second step, two individuals are selected randomly. Thirdly, the
individual with the highest ranking wins the tournament and survives the selection
stage. Repeat steps two and three until enough individuals are selected.
The pseudo code for the tournament selection can be found in figure 11.
Calculate the fitness value of each individual
Make a ranking of the individuals based on fitness value
While necessary to select additional individual
Select 2 individuals randomly
Determine the winner based on the ranking
Endwhile
Figure 11 Pseudo code Tournament Selection

This tournament selection method can be extended either by adding additional
tournament rounds, such that an individual has to win two or more rounds before it is
selected, or by adding more individuals competing in each round.
A big advantage of this method over the roulette wheel is that there are no scaling
issues. Since the tournament selection merely uses a rank, there is no need to
translate the differences in fitness into selection probabilities. (Whitley, 1989)
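A single tournament round can be sketched as follows, assuming rank[i] stores the position of individual i in the fitness ordering, with 0 the best; both the convention and the function names are illustrative.

```cpp
#include <random>
#include <vector>

// Deterministic core of a tournament round: between contestants a and b,
// the individual with the better (lower) rank wins.
int tournamentWinner(const std::vector<int>& rank, int a, int b) {
    return rank[a] < rank[b] ? a : b;
}

// Full round: draw two contestants at random and return the winner.
int tournamentSelect(const std::vector<int>& rank, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, static_cast<int>(rank.size()) - 1);
    return tournamentWinner(rank, pick(rng), pick(rng));
}
```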
S4 Combined Selection
Selection method S4 uses all three aforementioned methods to select individuals.
Which method is used for each selection is determined on a random basis.
4.3.3. Operation

On the (pair of) parents, operators will be executed in order to generate different
solutions, called 'children'. The solutions can change drastically or just slightly.
The two main types of operations are crossover and mutation. (Luke and Spector,
1998) Crossover relies on the hypothesis that highly fit individuals in the population
consist of fit building blocks that can be mixed to form even fitter individuals. It
pushes the population to converge into one or more local optima; this process of
convergence is often called intensification or exploitation. Mutation, on the other
hand, serves the goal of maintaining genetic diversity in the population and thus
fulfils the task of exploring the solution space. Crossover and mutation are
antagonists but are both equally important. On the one hand, we need to cover the
solution space as much as possible through genetic diversity; on the other hand, we
also want to make sure that we find the best solution in the searched solution space
through convergence into the best areas.
The occurrence of crossover and mutation is governed by the crossover rate and
mutation rate respectively, indicating the chance that the operation will be executed
on a selected pair of parents. Fixing these rates to a generally optimal value is very difficult
since they are very problem specific and they even depend on the stage of the
genetic algorithm. Research has been done on the determination of these values for
mutation and crossover both as a static constant and as a dynamic value, changing
over the course of the execution of the algorithm. (Lin et al, 2003)
In this section the following operators will be discussed: 1-point crossover, 2-point
crossover, blend crossover, mean crossover, uniform crossover, combined crossover
and mutation.
C1 1-Point Crossover
This crossover operator heavily relies on the building block hypothesis as it cuts the
parents into two halves or blocks in order to recombine these blocks into the children.
The point where the parents should be cut is determined randomly. (Spears and
Anand, 1991)
An example is presented with two parents given as float vectors containing the float
values for 12 activities. The cut-off point is after the fourth activity.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Parent 1    0   4   6   8   4   2   6   3   8   5   5   0
Parent 2    0   2   1   4   3   9   5   7   3  10   2   0

These parents swap their float values after the cut-off point to create two children.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Child 1     0   4   6   8   3   9   5   7   3  10   2   0
Child 2     0   2   1   4   4   2   6   3   8   5   5   0
Assuming that parent 1 has a very fit first part and parent 2 has a very fit second part,
child 1 has a high probability of outperforming both parents.
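This swap can be sketched in C++, reproducing the example above. The cut-off point is passed as a parameter here so the example stays reproducible (in the genetic algorithm it is drawn randomly); the function name is illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One-point crossover: cut both parents after position `cut` and swap the
// tails to form two children.
std::pair<std::vector<int>, std::vector<int>>
onePointCrossover(const std::vector<int>& p1, const std::vector<int>& p2,
                  int cut) {
    std::vector<int> c1(p1.begin(), p1.begin() + cut);
    c1.insert(c1.end(), p2.begin() + cut, p2.end());
    std::vector<int> c2(p2.begin(), p2.begin() + cut);
    c2.insert(c2.end(), p1.begin() + cut, p1.end());
    return {c1, c2};
}
```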
C2 2-Point Crossover
This crossover operator is identical to the previous one, except that there is not a
single cut-off point but two cut-off points, chopping each parent into three blocks to
be recombined. An example with cut-off points after the third and seventh float values:

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Parent 1    0   4   6   8   4   2   6   3   8   5   5   0
Parent 2    0   2   1   4   3   9   5   7   3  10   2   0

The resulting children are constructed by swapping the middle block.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Child 1     0   4   6   4   3   9   5   3   8   5   5   0
Child 2     0   2   1   8   4   2   6   7   3  10   2   0
C3 Blend Crossover
Using the blend crossover operator, you do not copy and recombine genetic material
but you blend the corresponding genetic material, based on the distance between
them. (Eshelman and Schaffer, 1992; Takahashi and Kita, 2001) The blending will
result in two values which are the boundaries for the newly generated child value.
The steps to be undertaken will be illustrated with an example.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Parent 1    0   4   6   8   4   2   6   3   8   5   5   0
Parent 2    0   2   1   4   3   9   5   7   3  10   2   0
Since the first activity has identical float values in both parents, we take the second
activity as an example. Firstly, the distance between the two parents is calculated as
the absolute difference between the respective float values:

d = |x1 - x2| = |4 - 2| = 2     (1)
In the second step, two boundary values (X1 and X2) are determined, taking the
lowest float value, the highest float value and the distance into account:

X1 = min(x1, x2) - 0.5 · d
X2 = max(x1, x2) + 0.5 · d     (2)
This would result in the following for our example:

X1 = 2 - 0.5 · 2 = 1
X2 = 4 + 0.5 · 2 = 5     (3)
In the third step, a new value gets generated randomly in the interval [X1,X2].
In our example, the interval is [1,5]. This is done for all activities until an entire child is
produced. The pseudo code for the blend crossover operator can be found in figure
12.
For every activity
Calculate the distance d between the two parents
Calculate the lower boundary, using distance d
Calculate the upper boundary, using distance d
Randomly generate a new float value between the boundaries
Endfor
Figure 12 Pseudo code Blend Crossover
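The three steps above can be sketched in C++ as follows. The factor 0.5 is an assumption inferred from the worked example (parent values 4 and 2 yielding the interval [1,5]); the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Blend crossover: for each activity, the child's float value is drawn
// uniformly from [min - 0.5*d, max + 0.5*d], where d is the distance
// between the parents' values. The 0.5 factor is inferred from the worked
// example in the text (assumption).
std::vector<double> blendCrossover(const std::vector<double>& p1,
                                   const std::vector<double>& p2,
                                   std::mt19937& rng) {
    std::vector<double> child(p1.size());
    for (std::size_t i = 0; i < p1.size(); ++i) {
        const double d  = std::fabs(p1[i] - p2[i]);
        const double lo = std::min(p1[i], p2[i]) - 0.5 * d;
        const double hi = std::max(p1[i], p2[i]) + 0.5 * d;
        child[i] = std::uniform_real_distribution<double>(lo, hi)(rng);
    }
    return child;
}
```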
C4 Mean Crossover
The mean crossover operator will calculate the mean value of the two parents for
each activity and take this mean value as the new float value for the child (Wall,
1996). In case a non-integer value gets generated, the value will be randomly
rounded up or down. An example is shown below.
Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Parent 1    0   4   6   8   4   2   6   3   8   5   5   0
Parent 2    0   2   1   4   3   9   5   7   3  10   2   0

Applying the mean crossover operator will result in this child.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Child       0   3   3   6   4   5   5   5   6   8   3   0
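The averaging and the random rounding of non-integer means can be sketched in C++ as follows; the function name is illustrative.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Mean crossover: each child value is the mean of the parents' values;
// a non-integer mean is randomly rounded up or down.
std::vector<int> meanCrossover(const std::vector<int>& p1,
                               const std::vector<int>& p2,
                               std::mt19937& rng) {
    std::bernoulli_distribution roundUp(0.5);
    std::vector<int> child(p1.size());
    for (unsigned i = 0; i < p1.size(); ++i) {
        const int sum = p1[i] + p2[i];
        child[i] = sum / 2 + ((sum % 2 != 0 && roundUp(rng)) ? 1 : 0);
    }
    return child;
}
```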
C5 Uniform Crossover
The uniform crossover is a very straightforward operator. To construct the child, for
each activity it will randomly take the float value of either parent. Extra intelligence
could be added by not selecting the float value for each activity uniformly at random,
but giving the fittest parent a higher probability. (Magalhães-Mendes,
2013)
Applying uniform crossover can result in the child shown below.

Activity   a1  a2  a3  a4  a5  a6  a7  a8  a9 a10 a11 a12
Parent 1    0   4   6   8   4   2   6   3   8   5   5   0
Parent 2    0   2   1   4   3   9   5   7   3  10   2   0
Child       0   2   6   8   3   2   6   7   3  10   5   0
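The basic, unweighted variant can be sketched in C++ as follows, with each parent equally likely to donate a float value; the function name is illustrative.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Uniform crossover: for each activity the child copies the float value of
// either parent with equal probability.
std::vector<int> uniformCrossover(const std::vector<int>& p1,
                                  const std::vector<int>& p2,
                                  std::mt19937& rng) {
    std::bernoulli_distribution coin(0.5);
    std::vector<int> child(p1.size());
    for (unsigned i = 0; i < p1.size(); ++i)
        child[i] = coin(rng) ? p1[i] : p2[i];
    return child;
}
```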
C6 Combined Crossover
This crossover method combines the five aforementioned crossover operators. Every
cycle, a new crossover operator gets chosen randomly.
Mutation
Mutation will be executed on a single individual. It does not combine 2 solutions but
merely alters an individual in a certain spot.
Mutation of an activity float value can be done neglecting the current value, meaning
that a reinitialization occurs. Another option is to take the current value into account
and mutate that value by adding or subtracting some value. Our mutation operator
neglects the current value.
This operator can be useful after a high number of generations, since at that point
solutions can converge. As mentioned in the introduction, mutation will maintain
some diversity that hinders the converging behavior of the population.
As the algorithm proceeds, it can be interesting to let the mutation rate evolve as
well. Modifying this rate inversely proportional to the population diversity could
prevent premature convergence. (Bäck, 1993)
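The reinitialization variant adopted here can be sketched in C++ as follows; the maximum float value is passed as a parameter and the function name is illustrative.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Reinitialization mutation: pick one activity at random and overwrite its
// float value with a fresh random integer in [0, mfv], ignoring the current
// value. mfv stands for the maximum float value used at initialization.
std::vector<int> mutate(std::vector<int> individual, int mfv,
                        std::mt19937& rng) {
    std::uniform_int_distribution<int> pickActivity(
        0, static_cast<int>(individual.size()) - 1);
    std::uniform_int_distribution<int> pickValue(0, mfv);
    individual[pickActivity(rng)] = pickValue(rng);
    return individual;
}
```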
4.3.4. Local Optimization
A regular genetic algorithm would, after a crossover operation, perform an evaluation
of the newly created child and consider whether to reinsert or to discard the child
from the population. However, we opted to insert local optimization or local search
first. Local search can optimize the children by looking into the nearby
neighbourhood in order to discover better solutions.
Local search has an intensification function, as opposed to a diversification function.
Intensification means looking further into an area of the search space in which a
solution has already been found, in order to optimize it further. Diversification has the
goal of looking into unexplored search space in order to discover new valuable
solutions (cf. mutation).
By applying local search iteratively on one solution, and by updating this solution by
its best neighbour, also known as hill climbing (Pisinger and Ropke, 2010), you will
end up in a local optimum.
The local search operator can be a very simple swap operation or a small heuristic
that reschedules a part of the schedule.
This is very abstract but can be explained using a very simplified example in figure
13. This figure represents a two-dimensional landscape where the x-value represents
a location and f(x) represents the height of a certain location. The objective is to find
the location of the valley, i.e. the location with the lowest height. You could check
every location x going from 0 to infinity, calculate its corresponding height and then
conclude that the lowest point is location b. Another method would be to take random
location samples. These random locations are indicated by five arrows. Starting from
these five locations, you can explore the neighborhood for better locations. Starting
from the location at arrow number two, you can search in two directions, right and left.
These two directions are called neighborhoods. When going to the right side you will
soon notice that the height is going up, so we will not explore that side (hill climbing).
However when we go to the left, we notice the height to go down. Repeat this move
until you cannot go any lower. You will end up in point a. When following the same
neighborhood search strategy starting from arrow 3, you will end up in location b etc.
The great advantage of the second method, using local search, is that less effort is
needed in order to find the valley. The disadvantage, however, is that you are not
sure whether you end up in the global optimum b or in a local optimum such as
location a or location c. But when the initial locations are strategically set out, the
solution found should be close to the optimal solution.
Figure 13 Local search simplified
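The walk described above can be sketched in C++ on a toy one-dimensional landscape with a single valley at x = 3; the landscape and names are illustrative stand-ins for figure 13. On a landscape with several valleys, the same loop would stop in whichever local optimum lies nearest to the starting location.

```cpp
#include <cassert>

// Toy one-dimensional landscape with a single valley at x = 3
// (illustrative stand-in for the f(x) of figure 13).
double height(double x) { return (x - 3.0) * (x - 3.0); }

// Hill climbing (here: descent): from location x, keep moving one step
// toward the lower neighbour until neither direction improves. The result
// is a local optimum; it is the global one only on single-valley landscapes.
double hillClimb(double x, double step) {
    for (;;) {
        if (height(x - step) < height(x))      x -= step;
        else if (height(x + step) < height(x)) x += step;
        else return x;
    }
}
```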
This example can relate to our problem in the sense that every x represents a
schedule or its float vector and f(x) represents the cost accompanying this schedule.
We want to find the lowest cost, so what we will do in the local search stage is to
alter a schedule a little bit in order to find better schedules. The project scheduling
problem, however, is a lot more complex because the neighborhood is extremely
large. Therefore good neighborhoods have to be defined in order to explore them
efficiently. Three local search strategies will be discussed in further detail: a
simplified version of the Burgess and Killebrew heuristic, double-justification and the
2-exchange neighborhood.
L1 Burgess and Killebrew simplified RLP
This first local search method finds its origin in the heuristic proposed by Burgess
and Killebrew in 1962. The main reason for adopting this method is the observation
that, when keeping the length of the project constant, a more levelled solution will
return a lower cost. (observation in 6.1.) Since local search will hardly change the
length of the project, this gives the opportunity to translate the original problem into a
resource levelling problem.
We simplified the heuristic of Burgess and Killebrew in the sense that we do not use
any priority rule but merely drag activities back and forth in a random order. The
number of activities that will be considered for this dragging is determined by the
neighborhood depth variable. Furthermore we will not make use of the total sum of
squares but the average squared deviation (AVGSQDEV) as an indicator for
tendency towards a levelled solution. The following steps are executed in this
proposed method. Starting from a schedule, take a random activity and calculate its
earliest start time and latest start time. In terms of float values, these correspond to a
float value of 0 and the free float value respectively. Free float is defined as the
maximum amount of delay that can be added to an activity without disturbing any
subsequent activity. The
second step consists of an iterative procedure in which you schedule the activity at
each point in time between the earliest start and the latest start. At the end of each
iteration, a new schedule is constructed which gets evaluated in the third step using
the partial evaluation method, described in section 4.3.5. The essence of that
section states that the evaluation is not done based upon the fitness i.e. total cost of
the schedule but rather on the tendency towards a levelled solution. Based on this
partial evaluation, the best schedule is retained for further local optimization. These
three steps are repeated for a predetermined number of times that is comprised in
the neighborhood depth variable. The pseudo code for L1 can be found in figure 14.
For 1 to neighborhood depth
Select a random activity
Calculate the free float value or slack
For f: 0 to free float
Schedule the chosen activity using float value f
Calculate the AVGSQDEV for the resulting schedule
If Newly created schedule performs better
Retain this schedule for further local optimization
EndIf
EndFor
EndFor
Figure 14 Pseudo code L1 Burgess and Killebrew simplified RLP
L2 Double-Justification
This method of double-justification comes from the area of the RCPSP in which the
project makespan minimization is the objective. (Muller, 2009) Double-justification
means that a schedule gets sequentially right-justified and left-justified. A right-
justified schedule is a schedule in which all activities are pulled as close as possible
to the end of the project, while a left-justified schedule is one in which all activities are
scheduled as close as possible to the beginning of the project. A small adjustment
that we made to this procedure is the addition of a resource constraint during the
backward and forward scheduling that result in the right- and left-justified schedule
respectively. The following steps explain the procedure in further detail.
Start by calculating the maximum workload; this workload will be used later on as a
resource constraint. The second step consists of backward scheduling, resulting in
a right-justified schedule. The order in which the activities are scheduled is
determined by their finishing time; the latest finishing activity gets scheduled first, as
late as possible and taking the resource constraint into account. The third step
consists of the forward scheduling resulting in a left-justified schedule. The order in
which the activities are scheduled is determined by their starting time; the earliest
starting activity gets scheduled first, as soon as possible and taking the resource
constraint into account. The pseudo code for this local optimization method can be
found in figure 15.
Calculate the maximum resource usage MAX, for resource constraint purposes
Determine order of activities for backward scheduling (~finishing times)
For every activity in order
Schedule activity as late as possible taking MAX into account
EndFor
Determine order of activities for forward scheduling (~starting times)
For every activity in order
Schedule activity as soon as possible taking MAX into account
EndFor
Figure 15 Pseudo code Double-Justification
L3 2-Exchange Neighborhood
The 2-exchange neighborhood or 2-opt neighborhood is a local search application on
the travelling salesman problem (TSP). In the travelling salesman problem, two tours
are neighbors if one tour can be obtained starting from the other by exchanging 2
destinations. We try to apply this swapping mechanism to project scheduling. The
following steps explain how it works. The first step is the random selection of an
activity. The second step consists of determining all direct and indirect predecessors
and successors of the activity. These activities are neglected, since a swap in time of
the selected activity and these activities would result in an infeasible schedule due to
precedence relations. If there are activities with which the selected activity can swap,
execute this swap by exchanging the starting time of both activities. The pseudo
code of this local search method can be found in figure 16.
Select an activity a1 randomly
Find (in)direct predecessors and successors of the chosen activity, add them to list l
If there exist activities, not on list l
Select an activity a2, not on list l, randomly
Execute an exchange of the starting times of a1 and a2
Repair solution if necessary
EndIf
Figure 16 Pseudo code 2-exchange neighborhood
L4 Combined Local Search
During the execution of the genetic algorithm, all three of the previously described
local search methods are used in a random fashion.
4.3.5. (Partial) Evaluation
When new individuals are generated, it is interesting to evaluate them, i.e. calculate a
fitness value. This section contains two measures for evaluation, i.e. the total cost in
the evaluation phase and the average squared deviation of labor consumption
(AVGSQDEV) in the partial evaluation phase. Both these evaluation measures are
non-regular measures of performance.
4.3.5.1. Total Cost (Evaluation)
To check the total cost of a schedule, we will let the staffing algorithm process the
generated schedule. The input is thus a project schedule and the outcome is a set of
workers with a certain work pattern. Based on this outcome, the total staffing cost of
the project can be calculated. The staffing is formulated and solved as a linear
programming problem. The solution of this problem is deterministic and optimal
meaning that given a certain input, always the same output is generated
(deterministic) and this output is the best possible (optimal). This linear problem was
provided by Prof. Dr. Maenhout and encoded into C++ which calls the Gurobi
Optimizer, a mathematical programming solver for different problems such as linear
programming and mixed integer programming problems.
This total cost calculation is the most important measure of fitness for a schedule; it
embodies the objective function as defined in section 2.3. However, this calculation
takes a lot of computational effort, ranging from two to six seconds, depending on the
project length. Therefore, a second genetic algorithm framework is designed to
eliminate the calculation of the total cost as much as possible while affecting the
quality of the final schedule as little as possible.
4.3.5.2. Average Squared Deviation (Partial Evaluation)
When doing a partial evaluation, the staffing algorithm will not be executed. This
performance measure is designed as an alternative for the total cost measure. It
represents a resource levelling approach. Observations in section 6.1 confirm that
the use of this measure is adequate for comparing schedules with an identical project
length. In that case, a lower AVGSQDEV will on average result in a lower total cost.
This assumption can be applied to the local optimization stage. At the end of each
search cycle, a performance measure needs to be calculated to determine the best
schedule in the neighborhood. Since these neighborhood search methods hardly
change the length of the project, the link between AVGSQDEV and total cost is
justified. This previously mentioned use of partial evaluation is present in both
genetic algorithm frameworks.
In order to further decrease computational effort, the second framework goes one
step further. The second framework also uses the AVGSQDEV measure to decide
upon the reinsert of a schedule into the population. This often requires a comparison
of schedules with very different project lengths. Observations in section 6.1 do not
justify the use of the AVGSQDEV in this case; however, they do not necessarily prove
that it should not be used. The big advantage of this method is the small
computational effort required for this calculation. The biggest disadvantage of this
method is that it is an approximation of the total cost measure, which means that a
lower AVGSQDEV will not necessarily always result in a lower total cost. This could
lead to situations where a schedule with a better AVGSQDEV but worse total cost
will replace a schedule with worse AVGSQDEV but better total cost during the
reinsert stage.
The formula to calculate the AVGSQDEV is given in figure 17.

AVGSQDEV = (1/PL) · Σ_{t ∈ T} (r_t − r̄)²

Figure 17 Average squared deviation of resource consumption
T set of days in the scheduling horizon (index t)
A set of activities in the project (index j)
r_j number of resources necessary to execute activity j
r_t resource consumption on day t (sum of r_j over the activities j in A active on day t)
PL project schedule length
r̄ average resource use
This AVGSQDEV gives an impression of how levelled the resource demand of a
project schedule is. A low AVGSQDEV means that the resource consumption
throughout the project schedule is relatively equal and thus tends to be levelled. A
high average squared deviation points out that there will be big differences in
resource consumption between the different days in the schedule. Alternative
formulations of the levelling measure are the sum of squared deviations, the
weighted jumps in resource usage and the sum of absolute deviations. (Herroelen et
al., 1997)
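Given the daily resource consumption values of a schedule, the measure can be computed as sketched below in C++; the function name is illustrative.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// AVGSQDEV: for each day t of the schedule, compare the resource
// consumption r_t with the mean consumption over the project length PL,
// and average the squared deviations. 0 means a perfectly levelled schedule.
double avgSqDev(const std::vector<double>& dailyConsumption) {
    const double pl = static_cast<double>(dailyConsumption.size());
    const double mean =
        std::accumulate(dailyConsumption.begin(), dailyConsumption.end(), 0.0) / pl;
    double sum = 0.0;
    for (double r : dailyConsumption) sum += (r - mean) * (r - mean);
    return sum / pl;
}
```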
4.3.6. Reinsert
When new members (children) are generated, they should have the possibility to
enter the population. This section will handle this entering of the population. In the
first part, reinsert conditions are discussed. These conditions will decide whether a
newly created child is eligible to enter the population. The second part will handle the
population management. Population management will dictate how the population will
evolve throughout the generations.
4.3.6.1. Reinsert Conditions
The algorithm can decide whether to let a child, generated out of existing parent
individuals, enter the population or not. Different conditions can be distinguished.
Some are relatively loose and will be passed very easily, making the population
improve slowly; other conditions are rather strict and as a consequence the
population fitness will improve either very fast or not at all. The insert condition
always results in the comparison of two or more schedules based on a performance
measure. For framework1, this performance measure is the total cost while for
framework2, this performance measure is the AVGSQDEV. The reinsert conditions
are described, assuming a steady-state population. This results in the replacement of
an existing member schedule by a new schedule. However, as 4.3.6.2. points out,
generational populations will also be tested. In that case, there is no replacement of
an existing member but merely the addition of a new member to a new population.
R1 Outperform weakest schedule
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance measure of the weakest
schedule in the population. If the new schedule outperforms the weakest schedule,
this weakest schedule will be replaced by the new one.
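Condition R1 can be sketched in C++ as follows, with a lower cost meaning better performance; the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reinsert condition R1: the new schedule replaces the weakest (highest
// cost) member of the population only if it outperforms it.
bool reinsertIfBetterThanWeakest(std::vector<double>& populationCost,
                                 double newCost) {
    auto weakest = std::max_element(populationCost.begin(), populationCost.end());
    if (newCost < *weakest) {
        *weakest = newCost;  // replace the weakest member
        return true;
    }
    return false;
}
```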
R2 Outperform one parent
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance of both parents. If the new
schedule outperforms at least one parent schedule, the weakest parent will be
replaced by the new schedule.
R3 Outperform both parents
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance of both parents. If the new
schedule outperforms both parent schedules, the weakest parent will be replaced by
the new schedule.
R4 Outperform 25% of existing population
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance measure of all member
schedules in the population. If the new schedule outperforms at least 25% of the
schedules in the existing population, the weakest schedule will be replaced by the
new one.
R5 Outperform 50% of existing population
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance measure of all member
schedules in the population. If the new schedule outperforms at least 50% of the
schedules in the existing population, the weakest schedule will be replaced by the
new one.
R6 Outperform 75% of existing population
This reinsert condition firstly calculates the performance measure of the newly
generated schedule and compares this to the performance measure of all member
schedules in the population. If the new schedule outperforms at least 75% of the
schedules in the existing population, the weakest schedule will be replaced by the
new one.
Doubles
On top of the aforementioned reinsert conditions, an extra condition can be added.
This is the so-called doubles condition. This condition prohibits a new individual from
entering the population if that individual already exists in the population. It thus promotes
diversity in the population and will prevent convergence into local optima.
4.3.6.2. Population management
Population management or reproduction strategies dictate how the population
evolves throughout the execution of the algorithm. Two alternative strategies are
considered: steady-state reproduction, resulting in an overlapping population, and
generational reproduction (Syswerda, 1991), resulting in a non-overlapping
population, as well as a hybrid form. (Noever and Baskaran, 1992)
P1 Steady-state population
This population management mechanism states that only a few individuals get
replaced at a certain point in time, keeping the size of the population constant. Which
individuals get replaced is defined by the reinsert condition. As a result, different
generations will overlap: a child (new generation) can enter the population of parents
(old generation) and can mate with its parents' generation.
P2 Generational population
On the opposite side of the steady-state population, there is the generational
population. This population management mechanism will not replace individuals
separately but replace an entire population at once. It means that no replacement or
intermediate deletion of schedules from the population occurs. When enough
children are produced that pass the reinsert condition, the existing population will be
replaced by this new population of children. However, in order not to lose high-
performance member schedules from the previous population, a form of elitism or
elitist model is applied. (De Jong, 1975) We applied a simple elitist policy, stating that
when replacing the old population by a new population, the top 20% best schedules
of the old population should be forced into the new population. Depending on the
used framework, the performance measure to determine the best schedules is the
total cost and the AVGSQDEV for framework1 and framework2 respectively.
P3 Hybrid populations
This population management mechanism integrates the two previous systems. In the
beginning the population will act as a steady state population. After a certain amount
of replacements (80% of the population size), the population will act as a
generational population and it will be replaced as a whole.
4.3.7. Ending condition
The cyclical character of the genetic algorithm makes it a never-ending process.
Even when a local optimum or good solution is found, it can still generate new
children. A clear ending condition has to be stated to end the cycle. This ending
condition is a limitation and will never improve the objective function but rather limit
the time spent.
Static Ending Condition
This is a predefined, fixed ending condition which is not influenced by the course of
the genetic algorithm. Examples of this kind of ending condition are summed below.
• The algorithm stops after X operations
• The algorithm stops after X time units
• The algorithm stops after X evaluations
Dynamic Ending Condition
A dynamic ending condition is a condition that is influenced by the course of the
genetic algorithm. Examples of this are listed below.
• The algorithm will stop if no improvement is found in X operations
• The algorithm will stop if no improvement is found in X time units
• The algorithm will stop if the population contains X duplicates (if duplicates are
allowed)
• The algorithm stops after the fittest value of the population is lower than X
Hybrid ending condition
Dynamic and static ending conditions can be used in combination. The static ending
condition imposes the upper limit on the execution of the algorithm, expressed in a
number of operations or time units, while the dynamic ending condition plays the role
of early showstopper in case the solution is satisfying enough or there is no hope for
improvement left, making it appropriate to interrupt the execution.
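Such a hybrid condition can be sketched as a single C++ predicate checked after every operation; the parameter names are illustrative.

```cpp
#include <cassert>

// Hybrid ending condition: a static cap on the total number of operations,
// combined with a dynamic early stop once no improvement has been found
// for stallLimit consecutive operations.
bool shouldStop(int operations, int operationsWithoutImprovement,
                int maxOperations, int stallLimit) {
    return operations >= maxOperations
        || operationsWithoutImprovement >= stallLimit;
}
```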
5. The algorithm
Based on the computational results in table 8 of 6.3.1., a definite form of the algorithm
is chosen. The framework of choice is framework 2. Although this framework performs
slightly worse than framework 1, its execution time is significantly shorter. The chosen
combination of methods and parameters is:
• Population size: 20
• Number of operations: 100
• I1b Random Initialization
  o MFV = 2 x AVG duration of activities
• S3 Tournament selection
• C5 Uniform crossover
• No mutation
• L1 Burgess and Killebrew simplified RLP
  o Neighborhood depth = 5
• R1 Reinsert weakest in population
  o P1 Steady-state population
  o No doubles allowed
• No ending condition
6. Computational experiments
In this chapter, the computational experiments of the methods described in chapter
four will be presented. In order to obtain the results, the solution methods are coded
into C++. The coding of this program took a couple hundred hours. This program
was then run on an Intel Core i3 processor at 2.13 GHz with 4 GB of RAM. Executing
this program took several hundred hours of computational time.
In section 6.1. we describe an observation made when first executing test cycles and
exploring the data they produce. Section 6.2. presents the benchmark for our tests;
this benchmark gives an indication of how good the results generated by the
algorithm are. The last section contains the actual test results.
6.1. Observation link AVGSQDEV – Total cost
When executing the first test cycles, we were looking for performance measures besides
the total cost. Other measures, such as average float, project length and average
squared deviation were monitored. Afterwards, we tried to find connections between
these measures. One connection, between AVGSQDEV and total cost, is significant
and very useful in practice. We plotted all schedules on a dispersion graph with
AVGSQDEV on the x-axis and the total cost objective on the y-axis, shown in
figure 18. No clear correlation can be observed.
Figure 18 AVGSQDEV - Total Cost dispersion for all project lengths
However, if we plot the same dispersion graph grouped per project length, we obtain
a graph like figure 19. This graph shows the dispersion for all schedules with a
project length of 11 days; a positive correlation between the total cost objective and
the AVGSQDEV can be observed.
Figure 19 AVGSQDEV - Total Cost dispersion for project length of 11 days
More dispersion graphs for other project lengths can be found in appendix D.
The conclusion of this observation is that the AVGSQDEV measure is a good relative
approximation for the total cost objective when comparing schedules.
6.2. Benchmark
In order to assess the quality of a schedule produced by the algorithm, a benchmark
is necessary. The best benchmark is the comparison of a schedule to the optimal
schedule. However the optimal schedule is unknown and therefore we cannot
calculate its quality for benchmark purposes. As an alternative, we can relax the
problem by eliminating some assumptions and restrictions to the extent that it is
possible to find the optimal solution for this relaxed or simplified problem. Since the
problem is relaxed, it can be assumed that the optimal solution of the relaxed
problem will always outperform the heuristic solution of the original problem. This
creates a lower bound to the cost minimization problem. Figure 20 represents the
scheduling cost minimization problem.
The vertical downward arrow in figure 20 depicts an axis on which schedules can be
ordered from low quality to high quality, i.e. from high staffing costs to low staffing
costs for the scheduling cost minimization problem. The blue dot is the best schedule
found by the genetic algorithm; this is the schedule we want to compare to a
benchmark.
The preferred benchmark is the optimal schedule, which divides the search space
into the feasible region and the infeasible region. We can calculate a lower bound,
which is in the infeasible region because the problem is relaxed in this calculation.
The goal is to find a lower bound as close as possible to the optimal schedule and
thus minimize the GAP in figure 20 as much as possible. This implies a trade-off. On
one hand, increasing relaxation will increase the ease of calculating a lower bound,
but on the other hand, increasing relaxation will increase the gap between the lower
bound and the optimal schedule and thus limit the quality of the benchmark. After the
lower bound to the relaxed problem is found, assumptions or constraints can be
added again in order to improve the lower bound quality, i.e. bring it closer to the
optimal schedule of the initial problem. The relaxations executed on our problem
seem drastic, but section 6.3 will show that they are not excessive.
The first and major simplification is neglecting the complete project scheduling part.
This includes the structure of the activity network with its precedence relations as
well as the possible resulting schedules concerning schedule make span and
resource demand distribution over the duration of the project.
We assume that the total resource demand, defined as the sum of resource
demands of all activities, is spread out equally over the duration of the project. In
order to be able to compare the three different project activity networks, the total
resource demand for each network is equal and set to 143. This means that, if a
project takes ten days, the daily resource demand is 14,3. The focus of the lower
bound calculation is put on the staffing problem. Small eliminations and loosening of
constraints will be applied there.
Figure 20 Benchmark setup
The first relaxation in the staffing problem is that we will only
make use of regular time units, which have a cost of 2, since these are the cheapest.
This means that we do not make use of overtime units and external time units, which
incur a respective cost of 3 and 4. Also, the possible presence of idle time, which
costs 1, is neglected. This relaxation implicitly requires every worker to work five days
per week: he cannot work more, because that would imply an overtime unit, and he
need not work less, because idle time is neglected.
The second relaxation is that we allow ‘fractional employees’ to work. For example, if
a workload would require 9,5 workers to execute, the real problem would require
hiring 10 workers to execute the load. The relaxed problem, however, accepts this
fractional value of 9,5.
Based on these relaxations, a lower bound for the cost of a schedule can easily be
calculated. This lower bound is dependent on the duration of the project and given in
figure 21.
Figure 21 Lower bound benchmark as a function of project duration
An example of the calculation is given for a project duration of 17 days (lower bound
= 374):

Total resource demand                143 units   (A)
Project duration                     17 days     (B)
Max available days per worker        13 days     (C)
Number of workers necessary (A/C)    11 workers  (D)
Cost per worker per day of project   2           (E)
Total Cost (E*B*D)                   374
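The calculation above can be sketched in Python. The days-off rule is inferred from this worked example and the surrounding discussion; the function name and default arguments are illustrative:

```python
def lower_bound(duration, total_demand=143, regular_cost=2):
    """Relaxed lower bound on the total staffing cost of a schedule.

    Assumptions from the relaxation: demand is spread evenly over the
    project, only regular time units (cost 2) are used, every worker works
    five days per week, and fractional workers are allowed.
    """
    # Days off implied by the five-day working week over the project span.
    weeks, rest = divmod(duration, 7)
    days_off = 2 * weeks + max(0, rest - 5)
    max_days_per_worker = duration - days_off          # (C)
    workers = total_demand / max_days_per_worker       # (D), fractional
    return regular_cost * duration * workers           # (E) * (B) * (D)

print(lower_bound(17))         # 374.0, matching the worked example
print(round(lower_bound(12)))  # 343, the cost attainable at duration 12
```

The same function reproduces the dip-and-jump pattern of figure 21 when evaluated over durations 9 to 26.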
The calculation of the maximum available days per worker is based on the fact that
each worker works five days per week, meaning that he will have 4 days off in a span
of 17 days. If the project had a duration of 18 or 19 days, a worker would also have
4 days off. However, if the project had a duration of 20 or 21 days, every worker
would be required to take 5 or 6 days off respectively.
This evolution in the ratio between days off and project duration explains why the
lower bound graph goes down during the first five days of each week and jumps
upwards during the last 2 days of the week (figure 21). The evolution in that ratio also
explains the general upward trend of the cost as a function of the project duration.
The lower bound graph applies to all three project activity networks since the total
resource demand of each project activity network is identical and the remaining
differences are relaxed. The graph can be limited by cutting off an upper and a lower
bound on the project duration. The upper bound is the fixed project deadline of 21
days, which is identical for every activity network. The lower bound represents the
critical path of the activity network and differs between activity networks. If, for
example, the critical path is 14 days, it is impossible to attain a cost of 343, which can
only be accomplished with a project duration of 12 days.
6.3. Results
In the results section, three topics will be discussed. The first topic handles the basic
cycles. These are the cycles necessary to determine the best combination of
methods, described in section 4.3, for each different framework of the algorithm and
for each different activity network. There are 2 frameworks of the algorithm and 3
different activity networks. The second section on the stage contributions will consist
of an analysis that determines the importance of each stage in the genetic algorithm.
By removing or altering these stages and measuring the consequences on the total
cost, the importance is quantified. This will only be done for the best algorithm on
AN2. The third section consists of tuning different parameters in the algorithm; this
also includes a sensitivity analysis on some parameters, emphasizing their
importance or irrelevance. This will also only be done for the best algorithm on
AN2.
6.3.1. Basic cycles
A basic cycle contains the process of finding the best algorithm for an activity
network in combination with an algorithm framework. Since we have two frameworks
and three activity networks, six basic cycles will be performed. A basic cycle consists
of several phases. In each phase, the algorithm framework is run multiple times,
each time with a different combination of the solution methods defined in section
4.3. In the first phase or ‘Base case’, we allow all solution methods to be chosen. In
the last or ‘Best’ phase, only one solution method per stage remains. In the phases in
between, we gradually eliminate the eligible solution methods by removing the least
performing or by retaining the best performing methods. In phase ‘Best X’, we make
sure that only about ten different combinations of solution methods remain.
Elimination is based on the cost that is on average associated with the solution
method. A summary of the elimination process of each basic cycle is given in
appendix B, which states every eligible solution method at a certain phase in the
basic cycle in combination with the average cost of that phase.
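The elimination criterion can be sketched as follows: for every eligible solution method of a stage, average the total cost over all runs in the phase that used it, and drop the worst-scoring methods. A minimal sketch; the data and the helper name are illustrative:

```python
from collections import defaultdict

def average_cost_per_method(runs):
    """Average total cost observed per solution method within one phase.

    `runs` is a list of (method, total_cost) pairs; the elimination step
    keeps the best-scoring methods for the next phase of the basic cycle.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for method, cost in runs:
        totals[method][0] += cost
        totals[method][1] += 1
    return {m: s / n for m, (s, n) in totals.items()}

# Hypothetical phase results for the local optimization stage:
runs = [("L1", 380), ("L1", 384), ("L2", 382), ("L3", 401), ("L3", 399)]
print(average_cost_per_method(runs))  # {'L1': 382.0, 'L2': 382.0, 'L3': 400.0}
```

On these hypothetical numbers, L3 would be eliminated first; note that the manual exception made for L2 in the text overrides this purely cost-based ranking.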
A single exception to the elimination rule, which states that the least performing
solution methods get eliminated, is made for local optimization method L2. This
method is often removed from the candidate set despite its seemingly high
performance. The algorithms that made use of L2 show no or very limited diversity in
their resulting best solutions. This indicates that no real exploration of the
neighborhood occurs, despite finding a good solution.
For each basic cycle, the minimum, maximum and average costs per phase are
indicated in a table and a graph. The tables and graphs for AN2 are shown below,
the tables and graphs for AN1 and AN3 are available in appendix C.
AN2 Framework 1
Phase          Average  Min  Max
Base Case      402,06   383  450
1st Exclusion  391,04   380  410
2nd Exclusion  388,91   378  401
Best X         386,74   373  405
Best           385,34   373  400
Table 6 Total Cost evolution basic cycle AN2 Framework1
Figure 22 Total Cost evolution basic cycle AN2 Framework1
AN2 Framework 2
Phase          Average  Min  Max
Base Case      403,51   383  458
1st Exclusion  389,67   373  416
2nd Exclusion  389,09   377  414
Best X         387,03   378  402
Best           386,29   378  403
Table 7 Total Cost evolution basic cycle AN2 Framework2
Figure 23 Total Cost evolution basic cycle AN2 Framework2
Both frameworks for AN2 show similar behaviour and return similar results, whereby
framework1 slightly outperforms framework2 in total cost for the best case. The
most significant improvement is made during the first exclusion. This is because in
the preceding phase, i.e. the base case, the local optimization method L1 proves to
be significantly better than all others, so in the first exclusion phase all inferior local
optimization methods are eliminated. The local optimization stage proves to be the
most vital stage of the algorithm (cf. section 6.3.2).
The average total cost of the solutions generated throughout the different phases
decreases steadily. This is primarily achieved by eliminating the possibility of
returning a bad result, which is represented in the maximum cost generated per
phase.
Basic cycle    | Init. | Sel. | Oper. | Local opt. | Reinsert | Doubles | Pop. mgmt. | Avg. Total Cost | Best Total Cost | Lower bound | Avg. dev. from LB | Best dev. from LB
AN1 framework1 | I1a   | S2   | C5    | L1         | R1       | N       | P1         | 355             | 352             | 343         | 3,57%             | 2,62%
AN1 framework2 | I1a   | S2   | C5    | L1         | R2       | N       | P1         | 366             | 355             | 343         | 6,61%             | 3,50%
AN2 framework1 | I1b   | S2   | C5    | L1         | R1       | N       | P1         | 385             | 373             | 362         | 6,45%             | 3,04%
AN2 framework2 | I1b   | S2   | C5    | L1         | R1       | N       | P1         | 386             | 378             | 362         | 6,71%             | 4,42%
AN3 framework1 | I3b   | S2   | C5    | L1         | R1       | N       | P1         | 384             | 378             | 362         | 6,16%             | 4,42%
AN3 framework2 | I3b   | S2   | C5    | L1         | R2       | N       | P2         | 386             | 380             | 362         | 6,63%             | 4,97%
(Init. = Initialization, Sel. = Selection, Oper. = Operation, Pop. mgmt. = Population management, dev. from LB = total cost deviation from the lower bound)
Table 8 Best solution methods, total cost and lower bound per basic cycle
Table 8 is the summary of the execution of the basic cycles. It shows the best
combination of solution methods for each basic cycle. The best selection, operation
and local optimization methods are identical for all basic cycles. Likewise, the use of
doubles in the population is not recommended, regardless of the activity network or
framework. The best initialization method is independent of the chosen framework
and seems to rely on characteristics of the activity network. The best reinsert method
for framework1 is R1, where the weakest member of the population gets replaced,
while R2, i.e. the replacement of a random parent, is the best reinsert method for
framework2 in combination with AN1 and AN3. The most widely used population
management mechanism is the use of steady-state populations (P1). Only for AN3
framework2 does the use of generational populations (P2) seem more beneficial.
The best solution method combinations for all basic cycles are very similar and also
yield similar results compared to the lower bound benchmark. The distance between
the average total cost, which is the total cost the basic cycle using the best methods
returns on average, and the lower bound benchmark is 6%-7%. AN1 framework1 is
an exception with a distance of only 3,5%. Since the lower bound is at most the
optimal total cost, the actual distance between the average total cost and the
optimal total cost is even smaller than these 6%-7% and 3,5%. If the best total cost,
which is the best cost the basic cycle using the best methods returns, is compared to
the lower bound benchmark, a gap of 2%-5% is noticed. The relative gaps between
the total costs and the lower bound also illustrate that framework1 consistently
outperforms framework2 in terms of fitness.
6.3.2. Stage contributions
In this section, we will further investigate to which extent each stage in the genetic
algorithm contributes to the total cost objective function. This contribution testing is
done by either removing a certain stage from the genetic algorithm or by replacing
the method by its worst alternative. This is only applied to AN2 framework2.
Initialization
Removing the initialization stage is simply impossible since an initial population
needs to be constructed with some logic. Therefore we compare the best initialization
method (I1b) to the worst tested initialization method. A deterioration of the total cost
from 386,29 to 392,86 or 1,7% is determined.
Selection
The best selection method is the tournament selection (S2). When replacing this
method by random selection, while keeping all other methods equal, only a small
deterioration from 386,29 to 386,92 or 0,16% is assessed.
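Tournament selection, as in S2, can be sketched as follows; the tournament size `k` and the fitness interface are assumptions for illustration, not the exact implementation of the thesis:

```python
import random

def tournament_select(population, fitness, k=2):
    """Return the fittest of k randomly sampled schedules (lower cost wins).

    Sketch of tournament selection; with k = 1 this degenerates to
    random selection.
    """
    contenders = random.sample(population, k)
    return min(contenders, key=fitness)

# Illustrative schedules with hypothetical total costs:
population = ["sched_a", "sched_b", "sched_c"]
costs = {"sched_a": 390, "sched_b": 380, "sched_c": 400}
print(tournament_select(population, costs.get, k=3))  # sched_b
```

With `k` equal to the population size, the tournament always picks the global best, so `k` tunes the selection pressure between random and purely elitist selection.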
Operation
The best method for this stage is the uniform crossover operator. When executing
the same algorithm but without this operator, the total cost goes up from 386,29 to
390,77 or 1,16%.
Local Optimization
The best local optimization method tested is L1, which searches the neighborhood
for more levelled schedules. However, if we remove this stage from the algorithm
while keeping all other stages equal, the total cost worsens drastically from 386,29
to 403,36 or 4,42%.
Reinsert condition
The best reinsert condition under consideration is the replacement of the weakest
member in the current population (R1). When removing this condition and thus
always reinserting a newly created child into the population, replacing an existing
member randomly, the total cost increases from 386,29 to 391,35 or 1,31%.
Population management
The best population management mechanism is the use of steady-state populations
(P1). However when applying the worst alternative method, i.e. generational
populations (P2), the total cost only worsens from 386,29 to 387,76 or 0,38%.
Conclusion
The local optimization stage has the most significant impact on the total cost: it
accounts for over 4% when this stage is removed while keeping all other stages
equal. The initialization, operation and reinsert condition have a moderate impact of
a little over 1%. The selection stage and the population management have a very
minor impact on the total cost.
6.3.3. Sensitivity Analysis
In this section, the computational results of a sensitivity analysis are shown. The start
case is the best algorithm associated with AN2 framework2. (also specified in
chapter 5) In the remainder of the section, we will refer to the configuration of the
algorithm as the ‘start case’. In this section we will focus on the influence of changing
some parameters on the outcome of the algorithm.
First of all, we will have a closer look at the population size and the number of cycles
or operations performed. Then we will check the influence of allowing doubles to
enter the population, the number of iterations or neighborhood depth in local
optimization, the use of mutation and an ending condition on the total cost of the
resulting schedule and the execution times necessary to obtain these schedules.
Population size and number of operations
The start case considers a population size of 20 schedules and the execution of 100
operations. We now expand these possibilities to 10, 20, 50 and 100 schedules as
population size and 50, 100, 250, 625, 1500 and 3000 as number of executed
operations. Besides these two guiding dimensions, the total cost and the execution
time are two resulting dimensions that will be monitored.
Increasing the population size and the number of operations to be executed will
increase the execution time. However the relationship between the population size /
number of operations and the total cost is not that straightforward. One might
assume that more operations will result in a better total cost. This is not necessarily
true because of the setup of framework2. As section 4.1.2. states ‘The disadvantage
(of framework2) however is that it is possible that the best schedule present in the
population gets replaced by another one during the execution of the algorithm.’ This
is exactly what happens after a very large number of operations. The AVGSQDEV in
the population becomes extremely low and good solutions get replaced by worse
ones. This is illustrated by figure 24.
Figure 24 Total cost Vs number of operations for different population sizes
For small population sizes of 10 or 20 schedules, the total cost seems to deteriorate
when performing a high number of operations. For population sizes of 50 and 100
schedules, this phenomenon is also assumed to happen, be it at much higher
numbers of executed operations. In general, we can assume the graphs to be
U-shaped, whereby the second leg of the U does not rise as high. The optimal
number of operations, i.e. the bottom of the U-shape, increases with the size of the
population.
Similarly, the influence of increasing population sizes on the total cost, when keeping
the number of operations constant can be depicted. This is done in figure 25.
Figure 25 Total cost Vs population size for different numbers of operations
For algorithms with a large number of operations, an increased population size has
a positive influence on the total cost. This can be explained by the fact that a larger
population accommodates a larger extent of diversity and thus a better exploration
of the solution space, which increases the chance of finding better solutions.
However, for algorithms with a relatively low number of operations, a large
population size has a negative influence on the total cost. This is graphically shown
in figure 25, where the pink and yellow graphs go upwards when applying large
population sizes. This can be explained by the fact that the number of operations is
so low, in comparison with the population size, that the algorithm has no chance to
settle into a (local) optimum.
For academic purposes, the interaction between the population size / number of
operations and the total cost is interesting. However for practical purposes, the
relationship between the execution time and the total cost is much more interesting.
Therefore we combine the two aforementioned logic statements:
• The number of operations and the population size have an influence on the
execution time (logic statement 1)
• The number of operations and the population size have an influence on the
total cost (figures 24 and 25) (logic statement 2)
We translate these into a relationship between execution time and the total cost. This
logic and its four dimensions are embodied in figure 26 and represent a trade-off
between total cost and execution time.
Figure 26 Trade-Off Total Cost Vs. Execution Time
Every dot in figure 26 represents a combination of population size and number of
executed operations. For example, the blue graph represents the dots with a
population size of 10 schedules. The first dot in the blue graph stands for 50
operations, the second for 100 operations, the third for 250 operations and so on.
Every dot also has a corresponding execution time (logic statement 1) and a total
cost (logic statement 2), which are both dependent on the population size and the
number of operations. This results in a total cost versus execution time trade-off. The
trade-off should be read as follows: when keeping the population size constant, an
increasing execution time will on average improve the total cost, and vice versa. Note
the upward trend in the graph for population sizes of 10 and 20 schedules; this
follows the same reasoning as explained for figure 24.
The trade-off in figure 26 can be molded into an efficient frontier by removing all
inefficient dots from the graph. An inefficient dot is a dot that is outperformed by at
least one other dot on both the time and cost dimensions. Conversely, we retain only
the efficient dots, defined as every dot that is not outperformed by any other dot on
both dimensions. This efficient frontier is shown in figure 27. The color of the frontier
graph indicates the population size as in figure 26. A practical application of this
efficient frontier is that the algorithm chooses the population size and number of
operations based on the amount of time the user allows it to run. For example,
suppose we want the algorithm to deliver its best schedule within 250 seconds.
According to the efficient frontier, the best schedule that can be found within 250
seconds of execution time comes from the algorithm with a population size of 20 and
625 operations. The expected total cost of the resulting schedule is slightly below
385.
Figure 27 Efficient Frontier Total Cost Vs. Execution Time
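The dominance rule described above can be sketched in Python; the (execution time, total cost) dots are hypothetical:

```python
def efficient_frontier(points):
    """Keep only the non-dominated (time, cost) combinations.

    A dot is inefficient when another dot is at least as good on both
    dimensions and strictly better on at least one of them.
    """
    frontier = []
    for t, c in points:
        dominated = any(
            (t2 <= t and c2 < c) or (t2 < t and c2 <= c)
            for t2, c2 in points
        )
        if not dominated:
            frontier.append((t, c))
    return sorted(frontier)

# Hypothetical (execution time in seconds, total cost) dots:
dots = [(50, 390), (120, 386), (130, 388), (400, 385), (900, 384)]
print(efficient_frontier(dots))  # [(50, 390), (120, 386), (400, 385), (900, 384)]
```

Given a time budget, the user then picks the rightmost frontier dot whose execution time still fits the budget.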
Doubles
In the start algorithm, we do not allow doubles to enter the population. This primarily
inhibits premature convergence of the schedules in the population. However, if we
allow doubles to enter, the total cost of 386,29 deteriorates to 390,62 or 1,12%.
The influence of allowing doubles or disallowing doubles to enter the population is
dependent on the size of the population and the number of operations.
Allowing doubles has the direct implication that the population can converge faster
towards a (local) optimum. Nevertheless, this optimum seems to be of lower quality
in most cases. This is because the convergence of the population prevents the
algorithm from exploring the broader solution space in order to find better solutions.
In table 9 we calculated the difference in total cost between the start algorithm with
doubles and the start algorithm without doubles, for each combination of population
size and number of operations. An example of how the table should be read: When
using the start algorithm with population size 10 and 50 operations, the total cost of
the algorithm that uses doubles is 5,24 higher than the algorithm version without
doubles. Overall, the algorithm without doubles is the best; however, we indicated a
region in yellow where the algorithm with doubles outperforms or returns results
similar to the algorithm without doubles. This area has relatively large population
sizes and a relatively small number of operations. The reason why the algorithm with
doubles is able to perform well in this area is that it forces the population to converge
faster into (local) optima, while the algorithm without doubles converges more slowly
towards possible (local) optima.
Operations \ Population size | 10    | 20    | 50    | 100
50                           | 5,24  | -1,17 | 0,22  | -0,69
100                          | 5,17  | 4,33  | 0,2   | 0,46
250                          | 10,76 | 8,54  | 2,13  | -0,2
625                          | 9,44  | 10,57 | 7,61  | 6,06
1500                         | 8,75  | 11,13 | 11,69 | 12,86
3000                         | 8,25  | 9,74  | 12,05 | 14,51
Table 9 Total cost comparison with and without doubles in the population
When plotting this algorithm as an efficient frontier, we assess that this new efficient
frontier is entirely outperformed by the efficient frontier of the algorithm without
doubles. We can conclude that none of the combinations of population size and
number of operations in the yellow area of table 9 is on the efficient frontier of the
algorithm with doubles.
The newly added efficient frontier in figure 28 is plotted above the previously
determined efficient frontier.
Figure 28 Efficient Frontier Total Cost Vs. Execution Time, with / without doubles
Local search iterations
For local optimization method L1, an extra parameter can be tuned. This parameter
defines the number of local search iterations or the neighborhood depth of the
method. As defined in section 4.3.4. this parameter determines how many times we
consecutively check the neighborhood starting from a different activity. Logically, the
more iterations, the better the outcome will be.
Figure 29 Total Cost Vs. Number of Iterations
Figure 29 shows a decreasing improvement of the total cost. While the improvements
are significant between one and five iterations, the improvement at five or more
iterations stagnates.
Mutation
Figure 30 Total Cost Vs. Mutation Percentage
Figure 30 shows the evolution of the total cost when increasing the mutation
probability. This graph does not provide enough information to make a clear
statement on the use of mutation. Nevertheless, we can state that a high mutation
percentage will probably not enhance the working of the algorithm. The reason could
be that the operation and local search process probably already contain enough
exploration possibilities. The start algorithm does not make use of mutation. As
stated in chapter 5, the local search method is very dominant and has a very
intensifying function. This could push the cross-over operator towards an exploring
function, in order to obtain better solutions. The used cross-over operator C5 is
uniform cross-over, which is very exploratory, especially in the beginning of the
algorithm execution. Since the cross-over method bears the exploratory function,
mutation is no longer necessary. We must note that this is merely a conjecture.
Ending condition
An ending condition will end the execution of the algorithm before it reaches the
predefined number of operations and will thus shorten the execution time of the
algorithm. However the goal is to shorten the execution time without affecting the
objective function too much. Before fine-tuning this parameter, we created an
indicator stating the moment when the best solution is found. This indicator is
represented as a percentage, the best schedule percentage or BSP. A value of 40%
means that the best schedule of the algorithm was found after 40 percent of the
execution time and that 60 percent of the execution time is wasted. The BSP
however does not give any indication on the quality of the solution.
Figure 31 BSP Vs number of operations
Figure 31 shows the influence of the number of operations on the BSP. An increase
of the number of operations logically decreases the BSP.
Overall, a larger population size increases the BSP. This is also very logical, since it
takes more operations in a big population to obtain good solutions than in smaller
populations.
In this section, we will zoom in on the algorithm setup with a population of 20
schedules and 100 operations. This algorithm has a BSP of 57% and an expected
total cost of 386,29. Since the objective function calculation, in framework2 of the
algorithm, happens completely at the end of the algorithm, the BSP is only known
afterwards when the whole algorithm is run. Therefore we cannot calculate whether
improvement is made or not during execution. We will again rely on the AVGSQDEV
as a relevant approximation for the total cost objective. When no improvement in this
indicator is assessed, the execution can stop.
We ran the algorithm with an ending condition of 5, 10, 20 and 30. This means that,
if no improvement in AVGSQDEV is found after respectively 5, 10, 20 or 30
operations, no more operations will be executed and the termination of the algorithm
will start by evaluating the schedules present in the population.
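The ending condition can be sketched as an early-termination loop; the `step` callback, which performs one operation and returns the new population AVGSQDEV, and the default values are illustrative, not the exact implementation of the thesis:

```python
def run_with_ending_condition(step, initial_avgsqdev,
                              max_operations=100, patience=10):
    """Perform GA operations until max_operations is reached or until the
    population AVGSQDEV has not improved for `patience` consecutive
    operations (the ending condition)."""
    best = initial_avgsqdev
    stall = 0
    performed = 0
    for _ in range(max_operations):
        current = step()
        performed += 1
        if current < best:
            best = current
            stall = 0
        else:
            stall += 1
            if stall >= patience:
                break  # stop early and evaluate the population's schedules
    return performed, best

# Simulated run: two improvements, then stagnation; with patience 5 the
# loop stops after 7 operations instead of 100.
values = iter([9, 8] + [8] * 200)
print(run_with_ending_condition(lambda: next(values), 10, patience=5))  # (7, 8)
```

The `patience` parameter plays the role of the ending conditions 5, 10, 20 and 30 tested above.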
Figure 32 shows the influence of the ending condition on the total cost of the
schedule. A strict ending condition can lead to drastic deterioration of the cost of the
schedule.
Figure 32 Total cost Vs ending condition
Figure 33 shows the influence of the ending condition on performed operations. The
maximum possible number of operations is 100. Using a very strict ending condition
of 5, we observe that only 11 operations are executed. Using an ending condition of
30, this amount increases to 65 which is still significantly below 100.
Figure 33 Performed operations Vs ending condition
We come to the conclusion that, for a strict ending condition, the gain in number of
performed operations can be significant. Figure 33 shows the number of performed
operations going down from 100 to 11, which is a decrease of 89%.
7. Conclusions and further research
The first important conclusion is that a basic genetic algorithm with limited complexity
is able to handle and manage the integrated project scheduling and project staffing
problem. With a total cost expectation of at most 7% above the lower bound
benchmark, this GA proves it is able to handle this problem.
The resource levelling objective has proven to be an effective approximation for the
total cost. The alternative GA framework, i.e. framework2, which heavily relies on
that premise, does not perform drastically worse than the original framework (except
for AN1). The observation that the AVGSQDEV is a good approximation for the total
cost, especially when keeping the project length constant, is confirmed.
Framework1 always yields better solutions than framework2, but framework2
drastically outperforms framework1 in computational effort.
Extending the classical GA with local optimization provides a significant boost to the
quality of the resulting schedule. While the cross-over operator steadily guides the
population towards better solutions, the local search optimization actively and
aggressively looks for better solutions.
Another important conclusion is the existence of a time-cost trade-off. Knowing that
this exists and being able to predict the location of the efficient frontier, it is possible
to optimize the expected total cost as a function of the computational time one is willing
to spend.
To conclude, we can install intelligent ending conditions. These conditions will never
improve the quality of the solutions since they are an extra constraint on the
execution time spent. However, smart ending conditions can determine when the
algorithm no longer makes any progress and thus decide to stop earlier in order to
save computational effort.
Further research topics related to this thesis that deserve consideration include a
more in-depth analysis of the interdependencies. I conducted research on the
different GA methods in a one- or two-dimensional way; statistical analysis is
necessary to discover more complex interdependencies between the applied
methods.
Another interesting topic for further investigation is the application of hyperheuristics.
This kind of heuristic will, depending on the kind of problem or data it receives, alter
its way of working to best fit the problem. Applied to the project scheduling problem,
this could mean that characteristics of the activity network are measured and, based
upon these characteristics, different methods in the GA are applied. Examples of
such characteristics are the size of the network, the serial/parallel indicator and the
activity distribution indicator (Vanhoucke et al., 2008).
Since the local optimization stage contributes the most significant value to the
complete algorithm, more research could be devoted to it, in the sense that more
complex and intelligent local search methods should be tested. Large neighborhood
search and very large neighborhood search methods could be interesting options to
consider.
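At its core, a (very) large neighborhood search is a destroy-and-repair loop (Shaw, 1998; Pisinger and Ropke, 2010). The sketch below shows that loop in its most generic form; the destroy and repair operators are assumed to be supplied by the caller and are not methods from this thesis:

```python
import random

def lns(schedule, cost, destroy, repair, iters=1000, frac=0.2, seed=0):
    """Generic destroy-and-repair loop for a minimization problem.

    destroy(schedule, frac, rng) removes a fraction of the assignments,
    repair(partial, rng) rebuilds a complete schedule; only improving
    candidates are accepted here (a steepest-descent variant).
    """
    rng = random.Random(seed)
    best, best_cost = schedule, cost(schedule)
    for _ in range(iters):
        candidate = repair(destroy(best, frac, rng), rng)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Acceptance criteria other than pure improvement (e.g. simulated-annealing-style acceptance) are common in adaptive variants of this loop.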
An interesting topic to investigate further would be the robustness of the presented
algorithm. What happens to the solution quality when the problem definition is
altered slightly? Is the GA still appropriate when it is executed on datasets with a
larger number of activities in the network? Framework1 is probably more robust
towards changes in the problem, since it cannot worsen the resulting solution in the
course of the execution of the algorithm. Framework2 can worsen the solution
throughout the execution and would thus be less robust to certain changes.
A fifth and last point I would like to mention for further research is the application of
adaptive systems in the GA. This would mean that both the methods used and the
parameter values are maintained automatically by the algorithm. The algorithm is
then intelligent to the extent that it can distinguish the different circumstances in
which each method or parameter is the most appropriate.
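As a very small illustration (the rule and bounds are assumptions, not results from this thesis), a mutation rate can be made self-adaptive by reacting to whether the last generation improved the best solution (cf. Bäck, 1993; Lin et al., 2003):

```python
def adapt_mutation_rate(rate, improved, low=0.01, high=0.5):
    """Shrink the mutation rate while the search improves (exploit),
    grow it when the search stagnates (explore), clamped to [low, high]."""
    rate = rate * 0.9 if improved else rate * 1.1
    return min(high, max(low, rate))
```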
VI. References

Ahuja R., Ergun Ö., Orlin J.B. and Punnen A.P. (2002) A survey of very large-scale
neighborhood search techniques. Discrete Applied Mathematics, 123: 75–102.
Association for Project Management (1995) Body of Knowledge (BoK), Revised,
January 1995 (version 2).
Atkinson R. (1999) Project management: cost, time and quality, two best guesses
and a phenomenon, its time to accept other success criteria. International
Journal of Project Management, 17(6), pp. 337-342.
Bäck T. (1993) Optimal mutation rates in genetic search. In Proceedings of the Fifth
International Conference on Genetic Algorithms, pp. 2-8.
Brucker P., Drexl A., Möhring R., Neumann K. and Pesch E. (1999) Resource-
constrained project scheduling: notation, classification, models, and methods.
European Journal of Operational Research, 112: 3-41.
Burgess A. R., Killebrew, J. B., (1962). “Variation in Activity Level on a
Cyclic Arrow Diagram”, Industrial Engineering, March-April, pp. 76-83.
Cottrell W. (1999) Simplified program evaluation and review technique (PERT). J.
Constr. Eng. Manage. , 125 (1), 16–22
Dawson C., Dawson R. (1995) Generalised activity-on-the-node networks for
managing uncertainty in projects. International Journal of Project
Management, 13, pp. 353–362
De Jong K. (1975) An analysis of the behavior of a class of genetic adaptive
systems, Dept. Comput. Sci., Univ. Michigan
Demeulemeester E. (1995) Minimizing resource availability costs in time-limited
project networks. Management Science, Vol. 41, No. 10, 1590-1598.
Elmaghraby S.E., (1977) Activity networks - Project planning and control by network
models, Wiley Interscience, New York.
Elmaghraby S. (1995). Activity nets: A guided tour through some recent
developments. European Journal of Operational Research, 82:383-408.
Eshelman L. and Schaffer J.D. (1992) Real-Coded Genetic Algorithms and Interval-
Schemata. In L. Darrell Whitley (ed.), Foundations of Genetic Algorithms 2.
San Mateo, CA: Morgan Kaufmann Publishers.
Guldemond T., Hurink J., Paulus J., and Schutten J. (2008). Time-constrained
project scheduling. Journal of Scheduling, 11:137-148.
Hartmann S. and Briskorn D. (2010). A survey of variants and extensions of the
resource-constrained project scheduling problem. European Journal of
Operational Research, 207:1-15.
Herroelen W., Demeulemeester E. and De Reyck B., (1997) Resource-constrained
project scheduling – A survey of recent developments, Computers and
Operations Research, 25 (4), 279-302
Herroelen W., de Reyck B., Demeulemeester E. (1998) Resource-constrained
project scheduling: A survey of recent developments, Computers and
Operations Research 25 (4) 279–302.
Herroelen W., Demeulemeester E. and De Reyck B., (1999) An integrated
classification scheme for resource scheduling. DTEW Research Report 9905,
1-16
Holland J.H. (1975) Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
University of Michigan Press.
Icmeli O., Erenguc S. and Zappe C. (1993) Project scheduling problems: a survey.
International Journal of Operations and Production Management, 13, 80-91.
Lin W., Lee W., Hong T. (2003) Adapting crossover and mutation rates in genetic
algorithms J Info Sci Eng, 19 (5), pp. 889–903
Luke S. and Spector L. (1998) A revised comparison of crossover and mutation in
genetic programming. Proc. 3rd Annual Genetic Programming Conf.,
pp. 208-213.
Maenhout B. and Vanhoucke M. (2010). Branching strategies in a branch-and-price
approach for a multiple objective nurse scheduling problem. Journal of
Scheduling, 13:77-93.
Maenhout B. and Vanhoucke M. (2014). An exact algorithm for an integrated project
staffing problem with a homogeneous workforce. Working paper
Magalhães J., Mendes A. (2013) Comparative Study of Crossover Operators for
Genetic Algorithms to Solve the Job Shop scheduling Problem, WSEAS
Transactions on computers, Vol. 12, No. 4, pp. 164-173.
Malcolm D., Roseboom J., Clark C. and Fazar W. (1959) Application of a technique
for research and development program evaluation. Operations Research,
Vol. 7, pp. 646-669.
Muller L. (2009) An adaptive large neighborhood search algorithm for the resource-
constrained project scheduling problem. In MIC 2009: The VIII Metaheuristics
International Conference.
Neumann K. and Zimmermann J. (1999) Methods for the resource-constrained
project scheduling problem with regular and nonregular objective functions
and schedule-dependent time windows. In: Weglarz J. (ed.), Project
Scheduling: Recent Models, Algorithms and Applications, pp. 261–288.
Noever D. and Baskaran S. (1992) Steady State vs. Generational Genetic
Algorithms: A Comparison of Time Complexity and Convergence Properties.
Santa Fe Institute preprint series, 92-07-032.
Palpant M., Artigues C. and Michelon P. (2004) LSSPER: Solving the resource-
constrained project scheduling problem with large neighbourhood search.
Annals of Operations Research, 131: 237–257.
Pisinger D. and Ropke S. (2010) Large neighborhood search. In Handbook of
Metaheuristics, International Series in Operations Research & Management
Science, vol. 146. Springer, Boston, pp. 399–419.
Project Management Institute (2004) A Guide to the Project Management Body of
Knowledge: PMBOK® Guide, 3rd Edition. Newtown Square, Pennsylvania,
Project Management Institute, p. 5.
Ropke S. and Pisinger D. (2006) A unified heuristic for a large class of vehicle
routing problems with backhauls. European Journal of Operational Research,
171: 750–775.
Sabuncuoglu I. and Lejmi T. (1999) Scheduling for non-regular performance
measures under the due window approach. Omega - International Journal of
Management Science, vol. 27, pp. 555-568.
Sastry K., Goldberg D. (2001) Modeling tournament selection with replacement
using apparent added noise. Intelligent Engineering Systems Through
Artificial Neural Networks, vol. 11, pp.129 -134
Shaw P. (1998) Using constraint programming and local search methods to solve
vehicle routing problems. In CP-98 (Fourth International Conference on
Principles and Practice of Constraint Programming), volume 1520 of Lecture
Notes in Computer Science, pages 417–431, 1998.
Sivaraj R., Ravichandran T. (2011) A review of selection methods in genetic
algorithm, Int. J. Eng. Sci. Tech., 3, p. 3792
Spears W. and Anand V. (1991) A study of crossover operators in genetic
programming. In Ras Z. and Zemankova M. (eds), Proc. 6th Int. Symp. on
Methodologies for Intelligent Systems (ISMIS'91), pp. 409-418.
Springer-Verlag.
Syswerda G. (1991) A Study of Reproduction in Generational and Steady-State
Genetic Algorithms. Rawlins, G.J.E., Foundations of genetic algorithms. San
Mateo, CA:Morgan Kaufmann Publishers.
Takahashi M., Kita H. (2001) A Crossover Operator Using Independent Component
Analysis for Real-Coded Genetic Algorithm, in Proceedings of the 2001
Congress on Evolutionary Computation, pp. 643-649
Turner J.R. (1993) The Handbook of Project-Based Management. McGraw-Hill,
London, 540 p.
Vanhoucke M., Coelho J., Debels D., Maenhout B. and Tavares L.V. (2008)
An evaluation of the adequacy of project network generators with
systematically sampled networks. European Journal of Operational Research,
187, pp. 511–524.
Wall B. (1996) A Genetic Algorithm for Resource Constrained Scheduling, PhD
Thesis, Department of Mechanical Engineering, Massachusetts Institute of
Technology, USA.
Whitley D. (1989) The GENITOR algorithm and selective pressure. Proceedings of
the Third International Conference on Genetic Algorithms, pp. 116–121.
Morgan Kaufmann, San Mateo, CA.
Appendix B: Method elimination in basic cycles

[The original tables additionally mark, per phase, which methods of each stage
remain active (Initialization I1a-I4, Selection S1-S4, Crossover C1-C6, Local
Optimization L1-L4, Reinsert condition R1-R6, Doubles N/Y, Population
management P1-P3); these markings could not be recovered from the extraction.
The total cost per phase is reproduced below.]

AN1 Framework1
Phase           Cost
Base Case       400,10
1st Exclusion   368,73
2nd Exclusion   363,96
Best X          358,23
Best            355,25

AN1 Framework2
Phase           Cost
Base Case       400,67
1st Exclusion   378,34
2nd Exclusion   371,36
Best X          366,32
Best            365,67

AN2 Framework1
Phase           Cost
Base Case       402,06
1st Exclusion   391,04
2nd Exclusion   388,91
Best X          386,74
Best            385,34

AN2 Framework2
Phase           Cost
Base Case       403,51
1st Exclusion   389,67
2nd Exclusion   389,09
Best X          387,03
Best            386,29

AN3 Framework1
Phase           Cost
Base Case       399,82
1st Exclusion   391,16
2nd Exclusion   386,60
Best X          384,42
Best            384,29

AN3 Framework2
Phase           Cost
Base Case       404,79
1st Exclusion   388,14
2nd Exclusion   388,19
Best X          387,83
Best            386,00
Appendix C: Cost evolutions of basic cycles

AN1 Framework 1
Phase           Average   Min   Max
Base Case       400,10    355   554
1st Exclusion   368,73    351   390
2nd Exclusion   363,96    354   379
Best X          358,23    352   374
Best            355,25    352   359
[Figure: total cost evolution over the phases (Average, Min, Max)]

AN1 Framework 2
Phase           Average   Min   Max
Base Case       400,67    361   414
1st Exclusion   378,34    352   403
2nd Exclusion   371,36    355   405
Best X          366,32    355   396
Best            365,67    355   382
[Figure: total cost evolution over the phases (Average, Min, Max)]

AN2 Framework 1
Phase           Average   Min   Max
Base Case       402,06    383   450
1st Exclusion   391,04    380   410
2nd Exclusion   388,91    378   401
Best X          386,74    373   405
Best            385,34    373   400
[Figure: total cost evolution over the phases (Average, Min, Max)]

AN2 Framework 2
Phase           Average   Min   Max
Base Case       403,51    383   458
1st Exclusion   389,67    373   416
2nd Exclusion   389,09    377   414
Best X          387,03    378   402
Best            386,29    378   403
[Figure: total cost evolution over the phases (Average, Min, Max)]

AN3 Framework 1
Phase           Average   Min   Max
Base Case       399,82    382   439
1st Exclusion   391,16    382   404
2nd Exclusion   386,60    380   394
Best X          384,42    378   392
Best            384,29    381   391
[Figure: total cost evolution over the phases (Average, Min, Max)]

AN3 Framework 2
Phase           Average   Min   Max
Base Case       404,79    383   514
1st Exclusion   388,14    378   404
2nd Exclusion   388,19    378   400
Best X          387,83    378   400
Best            386,00    382   390
[Figure: total cost evolution over the phases (Average, Min, Max)]
Appendix D: Observation link AVGSQDEV – Total cost

[Scatter plots of the objective value (vertical axis, range 170-270) against the
average squared deviation (horizontal axis): one plot for all project lengths
combined and one for each project length 8, 11, 14, 17 and 20. The horizontal
range shrinks with increasing project length: 0-300 for all lengths combined and
for length 8, 0-200 for length 11, 0-140 for length 14, 0-100 for length 17 and
0-80 for length 20.]