Experimental Design in Game Testing
Rochester Institute of Technology
RIT Scholar Works
Theses
5-19-2016
Experimental Design in Game Testing
Bhargava Rohit Sagi
Recommended Citation: Sagi, Bhargava Rohit, "Experimental Design in Game Testing" (2016). Thesis. Rochester Institute of Technology. Accessed from
This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected].
Rochester Institute of Technology
Experimental Design in Game Testing
A Thesis submitted in partial fulfillment of the
requirements for the degree of
Master of Science in Industrial and Systems Engineering in the
Department of Industrial & Systems Engineering
Kate Gleason College of Engineering
by
Bhargava Rohit Sagi
May 19, 2016
DEPARTMENT OF INDUSTRIAL AND SYSTEMS ENGINEERING
KATE GLEASON COLLEGE OF ENGINEERING
ROCHESTER INSTITUTE OF TECHNOLOGY
ROCHESTER, NEW YORK
CERTIFICATE OF APPROVAL
M.S. DEGREE THESIS
The M.S. Degree Thesis of Bhargava Rohit Sagi
has been examined and approved by the
thesis committee as satisfactory for the
thesis requirement for the
Master of Science degree
Approved by:
Dr. Rachel Silvestrini (Thesis Advisor), Associate Professor, Industrial and Systems Engineering
Dr. Brian K. Thorn, Associate Professor, Industrial and Systems Engineering
Dr. Jessica Bayliss, Associate Professor, Interactive Games and Media
Dr. David Schwartz, Associate Professor, Interactive Games and Media
Abstract
The gaming industry has been on a constant rise over the last few years. Companies invest huge amounts
of money in the release of their games, and part of this money is spent on testing. Current
game testing methods rely on the manual execution of pre-written test cases in the game. Each test case may
or may not reveal a bug. In a game, a bug is said to occur when the game does not behave according to
its intended design. The process of writing test cases to test games requires standardization. We
believe that this standardization can be achieved by applying experimental design to video game
testing. In this thesis, we discuss the implementation of combinatorial testing to test games.
Combinatorial testing is an experimental design method that is used to generate test cases and is
primarily used for commercial software testing. In addition to discussing the implementation of
combinatorial testing techniques in video game testing, we present a method for finding the combinations
of factors that result in video game bugs.
Contents
Abstract
1 Introduction
2 Literature Review
2.1 Game testing techniques
2.2 Combinatorial testing
2.2.1 Overview of combinatorial testing
2.2.2 Tools for generating combinatorial tests
2.2.3 Identifying failure inducing combinations
3. Methodology
3.1 Combinatorial testing for video game debugging
3.2 Classification of factors
3.3 Methodology for finding bugs
3.3.1 Sorting test cases to find Failure Inducing Combination
4. Results
4.1 Combinatorial testing in video games
4.1.1 Previously tested games
4.1.2 Games not previously tested
4.2 Identifying failure inducing combinations
4.2.1 Notional example with Grand Prix
4.2.2 Example from the game Erator
5. Discussion
5.1 Conclusion
5.2 Future work
Bibliography
Appendix I
Tables:
Table 1 Coverage 1 design
Table 2 Coverage 2 design
Table 3 Tools meeting our requirements
Table 4 Factors and levels of Grand Prix
Table 5 Number of test cases per coverage for Grand Prix
Table 6 Coverage 1 test set for Grand Prix
Table 7 Breakdown of bugs into factor for Grand Prix
Table 8 Grand Prix results
Table 9 Factors and levels of game Erator
Table 10 Number of test case per coverage for Erator
Table 11 Coverage Vs Bugs in Erator
Table 12 Factors and levels of Utopia
Table 13 Number of test cases per coverage for Utopia
Table 14 Coverage Vs Bugs in Utopia
Table 15 Reference test case
Table 16 Test case with different factor levels
Table 17 Test case with same Kart
Table 18 Test case with same Racer
Table 19 Test case with same Race
Table 20 Test case with same World
Table 21 Test case with same Racer and Kart
Table 22 Test case with same Race and Kart
Table 23 Reference test case
Table 24 Test case with same Action
Table 25 Test case with same dialogue box
Figures:
Figure 1 Game testing cycle
Figure 2 Coverage Vs % of errors
Figure 3 Different types of testing performed using combinatorial testing
Figure 4 Flow chart for finding FCC for a two factor interaction
Figure 5 Coverage Vs Number of test cases for Grand Prix
1 Introduction
The gaming industry has been on the rise over the last decade. The net value of the gaming industry is
predicted to increase from $67 billion in 2012 to $82 billion by 2017 [1]. Part of a gaming company's
revenue is used for game testing, to find and fix software defects. In the US, more than $22.2 billion
could be saved annually by implementing an improved infrastructure that enables more effective
identification and removal of software defects [2].
Gaming companies are typically divided into the following teams: Development, Production, Distribution
and QA. The development team is responsible for designing and developing games. It is also
responsible for fixing the bugs reported by the testers. The production team finances the development of
the game and is involved in all monetary transactions. The distribution team is responsible for making the
game accessible to users. It explores the distribution channels for releasing the game into the
market and online stores. The QA team's task is to identify the bugs in the game and log them in the
company's bug database, where the developers can access and fix them.
Game testing is a quality control process [3]. The main aim of game testing is to find bugs in the game
software. The Quality Assurance (QA) team is responsible for identifying as many as possible of the bugs
that ruin the gaming experience for the end user. A bug occurs when the game software does not behave
according to its intended design. The method QA adopts to test games is called manual testing: the
testers execute pre-written checklists that cover various aspects of the game. These checklists are
intended to include test cases covering all the scenarios in the game, thereby making sure that each area of
the game is tested.
Game testing is similar to software testing in many aspects. A standard game testing cycle is shown in
Figure 1. When a game is developed, the development lead and the QA lead develop test cases. The game
testers then execute the test cases to find bugs and report these bugs to the developers, who then fix the
bugs. After the bug fixes are completed, an updated version of the game is sent back to the testers for
testing with a new set of test cases. This cycle repeats until the production team is satisfied with the
number of bugs fixed and the quality of the game.
Figure 1 Game testing cycle (develop game / fix bugs → generate test cases → test game → find bugs → report bugs)
Game testing is time and labor intensive. Resources must be spent to deliver a defect-free (or nearly
defect-free) game, but it is impossible to exhaustively test all scenarios. Therefore, a balance
between resource spending and finding bugs must be achieved.
Several incidents in popular games show what happens when testing is not comprehensive
enough. In Halo: The Master Chief Collection on Xbox One, the multiplayer mode was inaccessible to
users. Users could not access a single match of any kind, encountering various error messages or
endless queues, and in at least one case a full game crash to the Xbox One dashboard [4]. Similarly, Sega’s Sonic Boom,
the latest in its long-running Sonic the Hedgehog series, shipped with bugs. In one serious bug,
the user could jump infinitely into the air by pausing and un-pausing the game, making it possible to complete the game
in under an hour [5]. Such incidents lower the quality of a game and can be detrimental to a company
and its reputation.
Serious bugs such as these, and their cost, suggest that a more thorough game testing process is
necessary. The test cases executed by game testers are written by the Development
Team Lead or the QA Team Lead. A better and more comprehensive method for generating test cases may
not guarantee a game completely free of bugs, but it can help increase the number of bugs found,
decrease the time required to find them and thus decrease the overall cost of testing.
One way to improve the game testing infrastructure is through the use of experimental design.
Experimental design is a procedure for planning experiments so that the data can be analyzed to yield
valid conclusions [6]. We found no applications of experimental design to manual game testing in the
literature. However, there are applications in software testing, which in many aspects is similar to game
testing. In this thesis, experimental design and analysis methods are applied to the manual game testing
process. Specifically, we show that combinatorial testing is an effective approach for generating test cases
to test games.
2 Literature Review
The literature review has been divided into two sections. The first section discusses current game testing
techniques. The second section gives insight into combinatorial testing and its application in the software
industry.
2.1 Game testing techniques
The game development cycle, on any platform, has stages known as milestones [7]. The
milestones indicate that the game is at a particular level of development. Generally, the milestones are
first playable, alpha stage, beta stage, gold master and code release. The first playable version is similar to
a demo version, in which the feel of the game is observed and assessed. In the alpha stage, the game
is said to be feature complete, i.e. all the features that the game is intended to have are present. This is
when the testing cycle begins. From this stage on, the developers do not make any changes to the features of the
game but only fix bugs. The beta stage represents a feature-complete and mostly bug-free game. After
the beta stage, a gold master version of the game is released. Ideally, there should be no bugs at this stage.
Finally, the game's code is released to the market.
There are two types of game testing: automated and manual. Runtime monitoring of video games is a
method of automated bug finding [8]. It is a form of white box testing, in which knowledge of the game's
source code is necessary. Such testing methods may be effective but are not simple. Additionally,
in runtime monitoring, the number of rules that must be specified for the monitors to find bugs grows with
the size of the game, and no set of rules gives a complete enumeration of the expected behavior of the game [8].
In manual testing of a video game, the tester is unaware of the game's source code. This type of testing is
called black box testing. The tester executes the test cases and observes their effect on the game. If a test
case results in a bug, the tester reports the bug and the developers fix it. Because no knowledge of the
source code is required, manual testing of a video game is a simpler process than runtime monitoring.
There are various manual game testing techniques that can be used to identify bugs in any given
game. Combinatorial testing, test flow diagrams, cleanroom testing, test trees, play testing and ad hoc
testing are a few examples [7]. Each of these methods can be used to generate a set of test cases. In
a previous study of game testing methods [7], combinatorial testing is suggested to have the
highest efficiency and to reduce the cost and resources needed to test a game. However, there seem to be no papers
that discuss the application of combinatorial testing to manual game testing. To facilitate the process of
generating test cases, experimental design is used in this thesis.
2.2 Combinatorial testing
Combinatorial testing is an experimental design method that is used to generate test cases. It is similar
to a fractional factorial design. Combinatorial testing is explained further in this section, which is
divided into three subsections. The first subsection gives an overview of combinatorial testing,
with an example of how combinatorial testing works on a simple software application. The second
subsection discusses the tools that can be used to generate combinatorial test cases. The third subsection
reviews the literature on identifying the combinations of factors that result in a bug.
2.2.1 Overview of combinatorial testing
Although we could find no literature that discusses the implementation of combinatorial testing in
manual testing of video games, literature does exist on the application of combinatorial testing to
software testing. This section gives a brief introduction to combinatorial testing and its applications in testing
software.
Combinatorial testing uses covering arrays to generate test cases. A covering array can be denoted as
CA(t,k,v) [9]. ‘t’ stands for the strength of the covering array, otherwise known as its coverage; the
concept of coverage is explained with an example below. ‘k’ indicates the number of factors (variables)
in the covering array, and ‘v’ represents the number of levels of each factor. Each row of the array is one
test case, and every combination of levels of any t factors appears in at least one row.
Consider a situation in which tests must be done to ensure that a software application can run on a
computer [10]. Assume the factors involved in the test are the operating system (Windows, Linux),
the processor type (Intel, AMD) and the protocol (IPv4 or IPv6). In a test with three factors, each at two
levels, a full factorial design consists of 2 × 2 × 2 = 8 runs.
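The run count of the full factorial design can be checked by enumerating every combination of levels. A minimal Python sketch using the three factors of the example above (the variable names are ours):

```python
from itertools import product

# Factors and levels from the software application example
factors = {
    "Operating System": ["Windows", "Linux"],
    "Processor": ["Intel", "AMD"],
    "Protocol": ["IPv4", "IPv6"],
}

# A full factorial design contains every combination of levels
full_factorial = list(product(*factors.values()))

for run, levels in enumerate(full_factorial, start=1):
    print(run, *levels)
print("Total runs:", len(full_factorial))  # Total runs: 8
```

Each added factor multiplies the run count by its number of levels, which is why full factorial designs quickly become impractical for games with many factors.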
Combinatorial testing is an alternative to a factorial design that requires considerably fewer
runs. An essential part of generating combinatorial test cases is the concept of coverage.
Coverage describes how well a test set covers the possible combinations of levels of a certain number of
factors. By varying the coverage of a test set, we generate different numbers of test cases to test the
software. In other words, coverage is a measure of the combinations of levels between factors (otherwise
known as interactions). Table 1 shows the design for testing the software application example using
combinatorial testing with coverage 1. When coverage is 1, no factor interactions are required: each
level of each factor appears somewhere in the design, irrespective of its combination with other factors. Thus the
minimum number of test cases in a coverage 1 test set is equal to the largest number of levels among
all the factors in the design.
Table 1 Coverage 1 design
Run | Operating System | Processor | Protocol
1 | Windows | Intel | IPv4
2 | Linux | AMD | IPv6
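One simple way to build a coverage 1 test set like Table 1 is to cycle through each factor's levels in parallel. This is a sketch under our own assumption about the construction, not how ACTS or any other tool generates coverage 1 designs; the function name is hypothetical:

```python
def coverage_one_design(factors):
    """Build a coverage 1 test set: every level of every factor appears at least once.

    The number of runs equals the largest number of levels among the factors;
    factors with fewer levels reuse their levels cyclically.
    """
    n_runs = max(len(levels) for levels in factors.values())
    return [
        tuple(levels[run % len(levels)] for levels in factors.values())
        for run in range(n_runs)
    ]

factors = {
    "Operating System": ["Windows", "Linux"],
    "Processor": ["Intel", "AMD"],
    "Protocol": ["IPv4", "IPv6"],
}
design = coverage_one_design(factors)
for run, test in enumerate(design, start=1):
    print(run, *test)
# 1 Windows Intel IPv4
# 2 Linux AMD IPv6
```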
Table 2 shows the design with coverage 2. It has only four runs, which together test every level of each
factor in combination with every level of every other factor at least once. This is also known as pairwise testing.
Table 2 Coverage 2 design
Run | Operating System | Processor | Protocol
1 | Windows | Intel | IPv4
2 | Windows | AMD | IPv6
3 | Linux | Intel | IPv6
4 | Linux | AMD | IPv4
Note that in Table 2, combinations such as Operating System = Windows, Processor = Intel and Protocol
= IPv6 are not present in the design. This is because the coverage is 2 and not 3. When the coverage of this
design is increased to 3, the above combination is tested. Therefore, by increasing the coverage up to the
number of factors, it is possible to test a software application exhaustively. The choice of coverage plays an
important role in determining the effectiveness of a combinatorial test set: it determines which interactions
between factors are tested and thus results in better fault location. However, this comes at a cost.
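Whether a test set reaches a given coverage can be checked by counting which t-way level combinations it contains. A minimal sketch (the function name `t_way_coverage` is ours, not taken from any tool), applied to Tables 1 and 2:

```python
from itertools import combinations, product

def t_way_coverage(tests, levels, t):
    """Fraction of all t-way level combinations that appear in at least one test.

    tests  -- list of tuples, one value per factor
    levels -- list of level lists, one per factor (same order as in tests)
    """
    required = covered = 0
    for cols in combinations(range(len(levels)), t):
        possible = set(product(*(levels[c] for c in cols)))
        seen = {tuple(test[c] for c in cols) for test in tests}
        required += len(possible)
        covered += len(possible & seen)
    return covered / required

levels = [["Windows", "Linux"], ["Intel", "AMD"], ["IPv4", "IPv6"]]
table_1 = [("Windows", "Intel", "IPv4"), ("Linux", "AMD", "IPv6")]
table_2 = [
    ("Windows", "Intel", "IPv4"),
    ("Windows", "AMD", "IPv6"),
    ("Linux", "Intel", "IPv6"),
    ("Linux", "AMD", "IPv4"),
]

print(t_way_coverage(table_1, levels, 2))  # 0.5
print(t_way_coverage(table_2, levels, 2))  # 1.0
print(t_way_coverage(table_2, levels, 3))  # 0.5
```

Table 1 covers only half of the 2-way combinations, Table 2 covers all of them, and Table 2 covers four of the eight 3-way combinations, which is why (Windows, Intel, IPv6) is missing.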
As coverage increases, the number of test cases can increase dramatically, so determining the appropriate
coverage is important. Previous studies [11] [12] of software failures involving large-scale tests suggest
that all failures could be triggered by interactions of at most 4 to 6 factors. Therefore, a coverage strength
between 3 and 6 is effective for finding more bugs in a software application.
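The growth in cost can be made concrete by counting the t-way level combinations a test set must cover, together with a simple lower bound on the number of tests: no t-way test set can be smaller than the product of the t largest level counts. A sketch; the 10-factor, 3-level configuration is hypothetical:

```python
from itertools import combinations
from math import prod

def t_way_tuples(level_counts, t):
    """Number of distinct t-way level combinations a test set must cover."""
    return sum(prod(group) for group in combinations(level_counts, t))

def lower_bound(level_counts, t):
    """Minimum possible number of tests: product of the t largest level counts."""
    return prod(sorted(level_counts, reverse=True)[:t])

example = [2, 2, 2]   # the operating system / processor / protocol example
game = [3] * 10       # hypothetical game: 10 factors with 3 levels each

for t in range(1, 7):
    print(t, t_way_tuples(game, t), lower_bound(game, t))
```

For the three-factor example, coverage 2 has 12 combinations to cover and a lower bound of 4 tests, which Table 2 meets exactly. For the hypothetical game, the combinations to cover grow from 405 at coverage 2 to 153,090 at coverage 6.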
2.2.2 Tools for generating combinatorial tests
To further understand combinatorial testing techniques, we need to look at how the test suites are
generated. As explained in [13], combinatorial test suites can be generated using two techniques:
Orthogonal Arrays (OA) and Covering Arrays (CA). For testing video games, we believe that CAs are better
suited than OAs for two reasons. First, game events have constraints; CAs allow constraints to be implemented
in developing a test suite, whereas OAs do not. Second, our focus is to create test suites with
fewer test cases, and generating combinatorial test suites with CAs results in fewer test cases than with OAs.
In [14], a detailed analysis and comparison of 75 tools/algorithms for the generation of combinatorial test
suites is given. Covering arrays generated using greedy techniques were found to be popular due to their
simplicity. Greedy techniques are a class of algorithms used to generate covering arrays. They
support large system configurations, including constraints and higher strengths. Keeping the testing of
video games in mind, we came up with the following requirements for effective testing. First, we need a
tool that can generate a covering array with a coverage strength of up to 6. Second, game events
require constraints, so the tool should facilitate the implementation of constraints. Third, base
choice selection criteria can be necessary when testing relatively large games. Base choice allows
us to test particular sections of a game: a level of a particular factor is fixed as the base choice, and
then test cases are generated. Fourth, mixed (variable) covering array strength could be useful for testing the
effect of important factors such as interrupts in the game.
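The idea behind greedy generation can be sketched in a few lines: repeatedly pick the valid test that covers the most not-yet-covered t-way combinations. This is a simplified illustration that we wrote for small systems, not the algorithm used by ACTS or any other tool in Table 3; the constraint used in the usage example is hypothetical:

```python
from itertools import combinations, product

def greedy_covering_array(levels, t, is_valid=lambda test: True):
    """Build a t-way test suite greedily, one test at a time.

    A simple sketch for small systems: every candidate test is enumerated,
    so this does not scale to large games.

    levels   -- list of level lists, one per factor
    t        -- coverage strength
    is_valid -- constraint predicate; tests violating it are never chosen
    """
    candidates = [test for test in product(*levels) if is_valid(test)]

    def tuples_of(test):
        # Every (factor columns, level values) combination this test covers
        return {(cols, tuple(test[c] for c in cols))
                for cols in combinations(range(len(levels)), t)}

    # Only combinations that occur in at least one valid test can be covered
    uncovered = set().union(*(tuples_of(test) for test in candidates))
    suite = []
    while uncovered:
        # Choose the valid test covering the most still-uncovered combinations
        best = max(candidates, key=lambda test: len(tuples_of(test) & uncovered))
        suite.append(best)
        uncovered -= tuples_of(best)
    return suite

levels = [["Windows", "Linux"], ["Intel", "AMD"], ["IPv4", "IPv6"]]
suite = greedy_covering_array(levels, t=2)
for test in suite:
    print(*test)

def linux_needs_ipv6(test):
    # Hypothetical constraint: (Operating System = Linux) => (Protocol = IPv6)
    return test[0] != "Linux" or test[2] == "IPv6"

constrained = greedy_covering_array(levels, t=2, is_valid=linux_needs_ipv6)
```

On this small example, the unconstrained run happens to reproduce the four rows of Table 2. With the constraint, the pair (Linux, IPv4) can no longer occur in any valid test, so it is excluded from the combinations to cover instead of blocking the loop.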
Of the 75 tools identified, those that meet these requirements are presented in Table 3.
Table 3 Tools meeting our requirements
Software | Max. coverage | Constraints | Base Choice | Variable Strength | Uniform Strength | Availability
ACTS | 6 | Full support | Yes | Yes | Yes | Yes
tTuples | 6 | Forbidden Tuples | No | No | Yes | Yes
PICT | 6 | Full support | Yes | Yes | Yes | Yes
Intelligent Test Case Handler | 6 | Forbidden Tuples | No | Yes | Yes |
Jenny | <=8 | Forbidden Tuples | No | No | Yes | Yes
Test Vector Generator | 6 | Full support | No | Yes | Yes | Yes
IPOD | 6 | No information | No | No | Yes | Algorithm present in ACTS
MIPOG | 11 | No information | No | No | Yes | Modified IPOG. Present in ACTS
ITTDG | 12 | No information | No | Yes | Yes | Algorithm
Harmony Search Strategy | 14 | No information | No | Yes | No information | No
Particle Swarm Test Generator | 6 | No information | No | Yes | No information | No
HSTCG | 7 | Full support | No | Yes | No information | No
Hexawise | 6 | Forbidden Tuples | No | Yes | Yes | Web based tool
PictMaster | 6 | Full support | No | No | Yes | Yes
Constraint handling is generally of two types: constraint solving based and forbidden tuples based [15].
In the constraint solving based approach, constraint solvers are used, and a test is valid if it satisfies the
constraints. In the forbidden tuples based approach, a set of forbidden combinations of factor levels is
identified, and a test case is valid only if it does not contain any of those forbidden tuples [15].
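The two styles can be compared on a hypothetical constraint for the earlier example: Linux is only tested with IPv6. In this minimal sketch, a plain Python predicate stands in for a real constraint solver, which would handle much richer expressions:

```python
from itertools import product

levels = [["Windows", "Linux"], ["Intel", "AMD"], ["IPv4", "IPv6"]]

# Forbidden tuples: partial assignments {factor index: level} that no test may contain
forbidden = [{0: "Linux", 2: "IPv4"}]

def valid_by_forbidden_tuples(test):
    """A test is valid only if it contains none of the forbidden tuples."""
    return not any(all(test[i] == level for i, level in f.items()) for f in forbidden)

def valid_by_constraint(test):
    """Constraint as a logical expression: (OS = Linux) => (Protocol = IPv6)."""
    os_name, _processor, protocol = test
    return os_name != "Linux" or protocol == "IPv6"

all_tests = list(product(*levels))
valid_a = [t for t in all_tests if valid_by_forbidden_tuples(t)]
valid_b = [t for t in all_tests if valid_by_constraint(t)]
print(len(valid_a), valid_a == valid_b)  # 6 True
```

Because Protocol has only two levels, forbidding the tuple (Linux, IPv4) and requiring Linux => IPv6 select the same six tests. With more levels, or with arithmetic relations, a single expression can replace many forbidden tuples.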
Based on the requirements of game testing, we felt four tools were the most suitable: ACTS, PICT,
Hexawise and Jenny. These tools were downloaded and compared. They differ in terms
of usability. Of all of them, ACTS has the best user interface, and its system creation process is
simple. PICT requires you to write a small, uncomplicated piece of code, which has to be uploaded to the PICT
website to generate the test cases. Hexawise has a decent user interface, but implementing
constraints in it is difficult. Jenny works from the command prompt and is more complicated than the
others for generating test cases. Therefore, we proceeded with ACTS for generating
combinatorial test cases.
2.2.3 Identifying failure inducing combinations
To further assist developers and make the game debugging process easier, we intended to develop a
technique to identify the combination of factors that leads to a bug. Bugs in a game are caused not only by
individual factors but also by interactions between factors. It is important to identify the interaction of
factors causing a bug, as this reduces the effort needed to fix the bug. We call an interaction causing a bug a Failure
Causing Combination (FCC). Effective ways of finding failure inducing combinations can reduce the amount of
work developers spend on bug fixes.
From the literature, we found that there are two methods of fault detection in software testing: the adaptive
method and the non-adaptive method. In the adaptive method, test cases for fault location are generated
after a set of test cases has been executed, and the outputs of the executed test cases are used to
generate further test cases for fault location. In the non-adaptive method, all test cases are generated in
advance and executed together, without using the results of earlier tests.
A new fault characterization method, called faulty interaction characterization (FIC), has been proposed in [16],
together with a binary search alternative (FIC_BS) that locates one failure causing interaction in a single
failing test case. This is a form of adaptive fault detection, in which a single failing test case, called the
seed test case, is used as a reference to generate further adaptive tests. The basic idea is to
repeatedly compare the factor interactions in the reference test case with a set of parameters. Based on the
result of this comparison, factor levels in the reference test case are exchanged with those in the set of
parameters. The main drawback of FIC and FIC_BS is that they generate the fault locating test cases
based on a single failed test, which may result in a larger number of test cases.
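The core of adaptive fault location around a seed test can be illustrated with a simpler one-factor-at-a-time strategy; this is not FIC or FIC_BS themselves. Each factor of the failing seed is changed to another level in turn, and factors whose change makes the test pass are reported as suspects. The factor levels and the bug below are hypothetical (the factor names follow the Grand Prix example), and the sketch assumes a single failure-causing combination and that substituted levels do not trigger a new bug:

```python
def one_factor_at_a_time(seed, levels, passes):
    """Adaptive fault localization sketch around one failing seed test.

    Changes one factor of the seed at a time to a different level and reruns
    the test. Returns the indices of factors whose change makes the test pass.
    """
    assert not passes(seed), "the seed test must fail"
    suspects = []
    for i, factor_levels in enumerate(levels):
        other = next(v for v in factor_levels if v != seed[i])
        probe = seed[:i] + (other,) + seed[i + 1:]
        if passes(probe):
            suspects.append(i)
    return suspects

# Hypothetical factors: Kart, Racer, World, Race
levels = [["Light", "Heavy"], ["Racer A", "Racer B"], ["Grass", "Ice"], ["Race 1", "Race 2"]]

def passes(test):
    # Hypothetical bug: a heavy kart in the ice world fails
    kart, _racer, world, _race = test
    return not (kart == "Heavy" and world == "Ice")

print(one_factor_at_a_time(("Heavy", "Racer A", "Ice", "Race 1"), levels, passes))  # [0, 2]
```

This strategy costs one extra test per factor for each failing seed, which illustrates why approaches based on a single failed test can require many additional test cases.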
A technique to locate interaction faults in combinatorial testing using an iterative interaction fault
location strategy (IterAIFL) has been presented in [17]. In this technique, a complete test set is used for
the generation of new test cases. The minimal combination of factor levels that causes a failure is called a
minimal failure causing schema (MFS). After a full test set has been executed, it is separated into two sets:
one consists of all the failed tests and is believed to contain the MFSs, and the other consists of all the
passed tests. The authors then form the schema sets by removing all the schemas that are common to the
two sets. The remaining schemas are then used to generate new test cases. The schema set of a test case is
the set of all the combinations of factor levels in that test case. If a test case is (A, B, C, D), then
(A, -, -, -) is one schema in its schema set. In this particular example, there are 2^4 = 16 schemas,
counting the empty schema (-, -, -, -). The process of subtracting schema sets is repeated until all
the MFSs are found. This process appears to generate a large number of test cases, because every iteration of
IterAIFL generates new test cases. Additionally, many assumptions are needed to apply the IterAIFL
strategy. One of them is that the test set cannot have constraints. Since game events require constraints,
the IterAIFL strategy cannot be applied to game testing.
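The schema subtraction step can be sketched in a few lines. This is a simplified, non-iterative illustration of the idea described above, not the full IterAIFL strategy; the test results and the bug triggered by (Linux, AMD) are hypothetical. The sketch counts only non-empty schemas, so a four-factor test case yields 15 of them:

```python
from itertools import combinations

def schemas(test):
    """All non-empty schemas of a test case; '-' marks an unspecified factor."""
    k = len(test)
    return {
        tuple(test[i] if i in kept else "-" for i in range(k))
        for r in range(1, k + 1)
        for kept in combinations(range(k), r)
    }

def contains(bigger, smaller):
    """True if `smaller` is a proper sub-schema of `bigger`."""
    return bigger != smaller and all(s == "-" or s == b for s, b in zip(smaller, bigger))

def candidate_fccs(results):
    """Schemas seen only in failing tests, keeping the minimal ones.

    results -- list of (test, passed) pairs
    """
    failing = set().union(*(schemas(t) for t, ok in results if not ok))
    passing = set().union(*(schemas(t) for t, ok in results if ok))
    suspicious = failing - passing
    return {s for s in suspicious if not any(contains(s, o) for o in suspicious)}

print(len(schemas(("A", "B", "C", "D"))))  # 15

# Table 2 with a hypothetical bug triggered by (Linux, AMD)
results = [
    (("Windows", "Intel", "IPv4"), True),
    (("Windows", "AMD", "IPv6"), True),
    (("Linux", "Intel", "IPv6"), True),
    (("Linux", "AMD", "IPv4"), False),
]
for fcc in sorted(candidate_fccs(results)):
    print(fcc)
```

Even with a single failing test, three 2-way candidates survive the subtraction: (Linux, AMD, -), (Linux, -, IPv4) and (-, AMD, IPv4). The test set alone cannot tell them apart, which is why IterAIFL generates new test cases and iterates.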
An approach to identifying failure inducing combinations based on suspicious combinations is
explained in [18]. The first step is to rank suspicious combinations and then generate tests based on this
ranking. The next step is called reduction, in which the final analysis of all the rankings takes
place and lower-ranked suspicious combinations are rejected. This process repeats until a stopping condition
is satisfied. The ranking of suspicious combinations combines three measures: suspiciousness
of components, suspiciousness of combinations and suspiciousness of the environment. Test cases are
generated based on the top few ranks from the ranking step. As this process is iterated, a suspicious
set of size 1 is eventually reached, which indicates that the combination in that set causes the failure. Due
to its complexity, we do not use this strategy in this thesis. Our technique for finding failure
causing combinations (FCC) is explained in the methodology section.
3. Methodology
The proposed work aims to accomplish two main research objectives:
1) Apply experimental design to a new area of research (specifically, game testing).
2) Create a methodology for finding the combinations of factors that cause bugs in a game.
The first objective can be broken down into two goals. The first goal is to develop a framework to help
game testers implement experimental design in games. The second goal is to illustrate the
implementation of the experimental design approach using available games. In this section, we present the
methodology associated with accomplishing these research objectives.
3.1 Combinatorial testing for video game debugging
As the number of factors and levels increases, the number of test cases also increases. Also, with an
increase in coverage, the total number of errors increase. 1-way coverage test set generates 67% of the
bugs. 93% of the bugs are generated with 2-way coverage. 98% with 3-way and reaching 100% between
4-way and 6-way coverage. Figure 2 shows the graphical representation of coverage vs. cumulative
15 | P a g e
percentage of bugs found.
Figure 2 Coverage Vs % of errors
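To make "t-way coverage" concrete, the sketch below measures what fraction of all t-way level combinations a given test set contains. It is a minimal Python illustration written for this discussion, not part of ACTS, and it assumes every level combination is feasible (constraints are ignored).

```python
from itertools import combinations, product


def t_way_coverage(tests, levels, t):
    """Return the fraction of all t-way level combinations present in `tests`.

    tests  -- list of tuples, one value per factor, in the same order as `levels`
    levels -- list of lists holding the levels of each factor
    t      -- interaction strength (1 = single levels, 2 = pairs, ...)
    """
    required_total = 0
    covered_total = 0
    for cols in combinations(range(len(levels)), t):
        # All level combinations that must appear for this choice of t factors.
        required = set(product(*(levels[c] for c in cols)))
        # Level combinations that the test set actually exercises.
        seen = {tuple(test[c] for c in cols) for test in tests}
        required_total += len(required)
        covered_total += len(required & seen)
    return covered_total / required_total


# Three two-level factors; these four tests contain every pair of levels.
levels = [[0, 1], [0, 1], [0, 1]]
tests = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(t_way_coverage(tests, levels, 2))  # 1.0
print(t_way_coverage(tests, levels, 3))  # 0.5
```

Four tests achieve full 2-way coverage but only half of the eight 3-way combinations, which is why higher coverage needs more tests.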
To generate combinatorial test cases for finding bugs in video games, we propose using the ACTS software.
ACTS stands for Advanced Combinatorial Testing Software and was developed by the National Institute of
Standards and Technology (NIST). ACTS is a tool that generates t-way combinatorial test sets with
support for constraints and variable-strength relations.
The first step in generating combinatorial test sets with ACTS is to identify the parameters of the System
Under Test (SUT) and their levels. Our SUT is the video game being tested. There are four types of
parameters: Boolean, Enum, Number and Range. A Boolean parameter has two levels, true and false. An
Enum parameter specifies a categorical factor; for example, the different game modes available for
testing. A Number parameter specifies a numerical factor, such as the number of times the tester needs to
perform a particular action in a test case (for instance, jumping 3 times before performing another action
in the game). A Range parameter specifies a range of values that the parameter can take; the speed of the
car in a racing game is an example, with the range specified as 60 to 100 mph. A parameter can have any
number of levels.
After specifying the parameters and their levels, we identify the constraints between parameters. ACTS
supports Boolean, relational and arithmetic operators in constraints. An example of a constraint in ACTS is
(Vehicle = "Car") => (Speed = 100). This constraint ensures that whenever the factor Vehicle has the level
'Car' in a test case, the factor Speed has the level 100.
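ACTS evaluates constraints like the one above internally. As a hedged illustration of what such a constraint does to the test space, the sketch below models one parameter of each of the four types as plain Python lists and filters the full factorial with the implication (Vehicle = "Car") => (Speed = 100). The level 'Bike', the jump counts and the Boost flag are hypothetical, the 60 to 100 mph range is discretized into three values by assumption, and this is not the ACTS input format.

```python
from itertools import product

# Hypothetical parameter model (plain Python, not the ACTS file format).
parameters = {
    "Vehicle": ["Car", "Bike"],   # Enum: categorical levels
    "Speed": [60, 80, 100],       # Range 60-100 mph, discretized by assumption
    "Jumps": [1, 3],              # Number: how many times to jump
    "Boost": [True, False],       # Boolean
}


def implies(antecedent, consequent):
    """Logical implication A => B: false only when A holds and B does not."""
    return (not antecedent) or consequent


def satisfies_constraints(case):
    # (Vehicle = "Car") => (Speed = 100)
    return implies(case["Vehicle"] == "Car", case["Speed"] == 100)


names = list(parameters)
all_cases = [dict(zip(names, values)) for values in product(*parameters.values())]
valid_cases = [case for case in all_cases if satisfies_constraints(case)]
print(len(all_cases), len(valid_cases))  # 24 16
```

The constraint removes the eight Car cases whose speed is not 100, which is the kind of pruning ACTS applies before it tries to cover combinations.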
To build a combinatorial test set, we go to the operations menu and click on ‘Build’. A new session
window pops up, in which we choose the algorithm used to generate the test cases. The algorithms available
in ACTS are IPOG, IPOG-F, IPOG-F2, IPOG-D and Base Choice. IPOG, IPOG-F and IPOG-F2 are
recommended for smaller systems (fewer than 20 parameters, with 10 levels per parameter on average),
while IPOG-D is intended for larger systems [19]. The IPOG (Input-Parameter-Order-General) strategy
generalizes the IPO (Input-Parameter-Order) strategy from pairwise testing to t-way testing [20].
For a system with t or more parameters, IPOG builds a t-way test set for the first t parameters, extends it
to a t-way test set for the first t + 1 parameters, and continues extending it until it builds a t-way test set
for all the parameters. Each extension of an existing t-way test set for an additional parameter is done in
two steps: horizontal growth, which extends each existing test by adding one value for the new parameter,
and vertical growth, which adds new tests, if needed, to cover the combinations that horizontal growth left
uncovered [20].
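The horizontal and vertical growth steps can be illustrated for the pairwise case (t = 2). The sketch below is a simplified IPO-style generator written for this discussion: horizontal growth greedily picks, for each existing test, the new value that covers the most uncovered pairs, and vertical growth fills don't care slots (None) or appends new tests. It is not the ACTS implementation, it ignores constraints, and it does not claim to produce minimal test sets. The factor "Easy"/"Hard" in the usage example is hypothetical.

```python
from itertools import combinations, product


def ipo_pairwise(levels):
    """Build a pairwise (t = 2) test set one parameter at a time, IPO style.

    levels -- list of lists; levels[i] holds the values of parameter i
              (at least two parameters). None in the result marks a don't care value.
    """
    # Start with every combination of the first two parameters.
    tests = [list(pair) for pair in product(levels[0], levels[1])]
    for k in range(2, len(levels)):
        # Pairs (earlier parameter j, its value a, new value b) still to cover.
        uncovered = {(j, a, b) for j in range(k) for a in levels[j] for b in levels[k]}

        # Horizontal growth: extend each test with the value covering the most new pairs.
        for test in tests:
            best = max(levels[k],
                       key=lambda b: sum((j, test[j], b) in uncovered for j in range(k)))
            test.append(best)
            uncovered -= {(j, test[j], best) for j in range(k)}

        # Vertical growth: cover what is left, reusing don't care slots where possible.
        for j, a, b in sorted(uncovered, key=repr):
            for test in tests:
                if test[k] == b and test[j] is None:
                    test[j] = a
                    break
            else:
                new_test = [None] * (k + 1)
                new_test[j], new_test[k] = a, b
                tests.append(new_test)
    return tests


levels = [["Portrait", "Landscape"], ["4G", "Wi-Fi"],
          ["Call", "Text", "Lock"], ["Easy", "Hard"]]
result = ipo_pairwise(levels)
for test in result:
    print(test)
```

Every pair of levels appears in some test, yet the result is much smaller than the 24-test full factorial; any None entries left over are the don't care values discussed next.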
In the session window, ACTS also lets us select the strength of the test set, which is its coverage. We can
leave constraint handling at its default setting and choose to randomize the don't care values. Don't care
values are factor levels that are not needed to satisfy the coverage: the coverage of the test set is
achieved even if those levels are absent from a test case or are chosen arbitrarily. Randomizing the don't
care values assigns a concrete level to every factor in every test case, so that each test case can be
executed as written. Clicking 'Build' then generates a combinatorial test set of the selected
strength/coverage, which ACTS can export in Excel and CSV formats.
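As a rough sketch of these last two options, the code below fills don't care entries (represented here by None, an assumption of this example) with randomly chosen levels and serializes the result as CSV. It mimics the effect of the ACTS options rather than calling ACTS; the factor names are borrowed from Table 4 for illustration.

```python
import csv
import io
import random


def fill_dont_cares(tests, levels, seed=None):
    """Replace each don't care entry (None) with a random level of that factor.

    Coverage is unaffected, because don't care positions are not needed by
    any required combination; the filled values only make tests executable.
    """
    rng = random.Random(seed)
    return [[rng.choice(levels[i]) if value is None else value
             for i, value in enumerate(test)]
            for test in tests]


def to_csv(tests, factor_names):
    """Return the test set as CSV text with a header row of factor names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(factor_names)
    writer.writerows(tests)
    return buffer.getvalue()


factor_names = ["Orientation", "Network", "Interrupts"]
levels = [["Portrait", "Landscape"], ["4G", "Wi-Fi"], ["Call", "Text", "Lock"]]
raw_tests = [["Portrait", "4G", "Call"], [None, "Wi-Fi", "Text"], ["Landscape", None, "Lock"]]
filled = fill_dont_cares(raw_tests, levels, seed=1)
print(to_csv(filled, factor_names))
```

Passing a seed makes the random fill reproducible, which helps when a bug found with a filled test case has to be rerun.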
3.2 Classification of factors
The factors used to generate combinatorial test cases can be divided into five types, depending on the type
of testing to be performed. In our methodology, factors are categorized as game behavior (game specific),
interrupt based, language based, hardware/software based, and first party requirement based. Other
types of testing, such as soak testing, in which the game is left running for long periods without any user
input, use different factors. Since soak testing is part of game logic testing, we do not treat its factors as a
separate category.
Factors can thus be broken down into the five categories shown in Figure 3. If a factor is specific to the
game, it can be used for game logic or functionality testing. In this type of testing, the tester looks for
behavior that does not comply with the game logic; for example, a character moving backwards when the
forward key is pressed. Interrupt based factors are used for interrupt testing, which disrupts the normal
functioning of the game through external events. Receiving phone calls or text messages while playing a
game on a cell phone is an example of interrupt testing. Games are generally released in several
languages, and the tester has to make sure that the translation into each language is accurate; this is
localization testing. Compatibility testing assesses whether the game performs similarly on high end and
low end devices. Testing the same game on a computer with different amounts of RAM is a form of
compatibility testing. Finally, to be released into the market, a game has to meet certain requirements that
are specific to the platform on which it is released. If a game is to be released on Apple devices, for
instance, Apple has a checklist with which the game must comply. Compliance testing checks whether
the game meets all of these requirements.
Figure 3 Different types of testing performed using combinatorial testing
The first part of our methodology is to identify the factors and then categorize them according to the type
of testing to be performed. Categorizing factors by type of testing makes logging an issue and writing the
bug report much simpler, and it also tells us what kind of bugs to look for. For example, a bug caused by a
game logic factor will be a crash, a freeze, a progression block, or a text or graphics issue.
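One simple way to carry this categorization into test logs is to tag each factor with its testing type and group the factors per type. The Python sketch below is illustrative only: the "Racer", "Screens" and "Interrupts" assignments mirror the Type column of Table 4, while "Language" and "Device RAM" are hypothetical factors.

```python
from collections import defaultdict
from enum import Enum


class TestingType(Enum):
    """The five factor categories described in this section."""
    GAME_LOGIC = "game logic / functionality"
    INTERRUPT = "interrupt"
    LOCALIZATION = "localization"
    COMPATIBILITY = "compatibility"
    COMPLIANCE = "compliance"


def group_by_type(factor_types):
    """Group factor names by testing type, preserving insertion order."""
    groups = defaultdict(list)
    for factor, testing_type in factor_types.items():
        groups[testing_type].append(factor)
    return dict(groups)


# "Racer", "Screens" and "Interrupts" follow Table 4; the other two are hypothetical.
factor_types = {
    "Racer": TestingType.GAME_LOGIC,
    "Screens": TestingType.GAME_LOGIC,
    "Interrupts": TestingType.INTERRUPT,
    "Language": TestingType.LOCALIZATION,
    "Device RAM": TestingType.COMPATIBILITY,
}
groups = group_by_type(factor_types)
for testing_type, factors in groups.items():
    print(testing_type.value, factors)
```

When a test case fails, the types of the factors involved indicate which test category the bug report belongs to.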
The choice of factor levels depends on the type of factor. Factor levels for game logic/functionality testing
are specific to the game being tested and hence change from game to game; generally, they are the
choices that the player makes in the game. Localization testing is also game specific: the translations
depend on the original text in English and vary between games. Factor levels for interrupt testing, by
contrast, remain the same irrespective of the game. For computer games, the levels of the interrupt factor
would be minimizing the game, locking the screen and turning off the computer while the game is
running, and these levels would be the same for all games. Compatibility testing depends on the tester's
access to devices that can run the game; its factor levels are the different devices/configurations available
to the tester. Compliance testing remains the same for all games, since it depends only on the platform on
which the game is being released.
Constraints form a major part of generating test cases with combinatorial testing, because game events are
generally subject to constraints. ACTS lets the tester specify the constraints for any system. For example,
the controls for a character can be used only during gameplay; they cannot be used on the main menu.
Hence, a constraint restricting when the controls are used has to be applied to generate meaningful test
cases.
3.3 Methodology for finding bugs
Now that we have described how to generate test cases with the ACTS software and how to classify factors
for different types of testing, we explain how combinatorial test cases are executed in the game. All the
factor levels in a test case are executed in the game to see whether they result in a bug. In this way the
game can be tested at varying levels of coverage.
3.3.1 Sorting test cases to find Failure Inducing Combination
As mentioned in the literature review, sorting tests to identify the FCC can greatly reduce the effort
developers spend on bug fixes. We developed Python code for sorting the combinatorial test cases. After
reviewing the methods discussed in the literature, we chose to reorder the existing test cases of a
combinatorial test suite rather than use an algorithm to generate new test cases. The test cases are
reordered following the SOFOT (Simplified One Factor One Time) approach to identify the factors, or
combinations of factors, that cause bugs in a game.
The first step was to develop a logic for finding the FCC. When a bug is discovered, the tester logs it in an
online database for the developer to read it. This bug report follows a standard format. It consists of
information on how to reproduce the bugs. The steps to reproduce the bug can be traced backwards to
identify the FCC. The logic has been depicted in Figure 4.
The efficiency of our method for identifying the failure inducing combination depends strongly on the type
of bug. Every bug in a game can be classified as either a general bug or a specific bug. A bug is general
when it is tied only to a level of the factor "screens": it occurs at that screen whatever the levels of the
other factors. A specific bug, on the other hand, occurs only for a particular combination of factor levels.
Figure 4 presents the logic for two factor interactions only; our code can also handle interactions of more
than two factors.
Figure 4 Flow chart for finding FCC for a two factor interaction
Let the test case that results in a bug be called the reference test case. When the reference test case is
executed, we can see the screen at which the bug occurs. The next step is to determine whether the bug is
general or specific. To do so, we first reorder our test cases so that in the next test case only the level of the
factor "screens" remains the same as in the reference test case. There are two ways to proceed from here.
First, if the new test case results in a bug, we reorder the test set again, keeping the 'screens' level the
same. If that test case also results in a bug, the bug is likely general. This process can be repeated a few
times to check whether the bug occurs with every new reordering of the test cases. If all the iterations
result in a bug, it is a general bug that occurs at that particular screen irrespective of any other factor
interaction. Second, if the new test case does not produce a bug, the bug is specific: it is caused by a
specific combination of factor levels. At this point we do not know which factor level or combination of
factor levels causes the bug, so individual factors and combinations of factors have to be tested to
determine the failure inducing combination. We proceed to the penultimate step in the execution of the
reference test case: the test cases are reordered so that only the factor level set in that step is the same
as in the reference test case. If this test case results in a bug, that factor level alone is the failure inducing
combination; otherwise we move back to the antepenultimate step and repeat.
After checking each individual factor level in this fashion, we check for interactions among factors, again
starting with the interaction between the penultimate factor level and each of the others. We check all
two-way interactions before proceeding to three-way interactions. If at any point in the analysis a test
case results in a bug, the factor levels that this test case has in common with the reference test case form
the failure inducing combination.
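The reordering procedure above can be sketched in Python as follows. This is our own illustration, not the code of Appendix I: test cases are assumed to be dictionaries from factor name to level, `has_bug` stands for executing a test case in the game (a hypothetical oracle), and a probe is the first test in the existing suite that matches the reference test case on the chosen factors and differs on every other factor. After the first check, the screen is treated as unconstrained, which is an assumption of this sketch. The demo reuses the notional Grand Prix bug of Section 4.2.1 with the Racer factor dropped for brevity.

```python
from itertools import combinations


def find_probe(suite, reference, same, free=frozenset()):
    """Return the first test that matches `reference` on every factor in `same`,
    differs from it on every other factor not listed in `free`, or None."""
    for test in suite:
        if all(f in free or (test[f] == reference[f]) == (f in same) for f in reference):
            return test
    return None


def locate_fcc(suite, reference, step_order, screen, has_bug, max_size=3):
    """Select tests from an existing suite to isolate a failure inducing combination.

    step_order    -- non-screen factors in the order they are set when the
                     reference test is executed; probed from the last step backwards
    screen        -- name of the screens factor
    has_bug(test) -- hypothetical oracle: True if executing `test` reproduces the bug
    Returns ("general", fcc), ("specific", fcc) or (None, None).
    """
    # Step 1: same screen, everything else different -> general bug?
    probe = find_probe(suite, reference, {screen})
    if probe is not None and has_bug(probe):
        return "general", {screen: reference[screen]}

    # Step 2: single factors, then pairs, then triples, starting from the
    # penultimate step of the reference test case and moving backwards.
    backwards = list(reversed(step_order))
    for size in range(1, max_size + 1):
        for same in combinations(backwards, size):
            probe = find_probe(suite, reference, set(same), free={screen})
            if probe is not None and has_bug(probe):
                return "specific", {f: reference[f] for f in same}
    return None, None


reference = {"Worlds": "Phineas and Ferb", "Races": "Elimination",
             "Kart": "Wasabi Dragster", "Screens": "In-game Screen"}
suite = [
    {"Worlds": "Lab Rats", "Races": "Race", "Kart": "Speeder Bike", "Screens": "In-game Screen"},
    {"Worlds": "Lab Rats", "Races": "Race", "Kart": "Wasabi Dragster", "Screens": "Options"},
    {"Worlds": "Kickin It", "Races": "Elimination", "Kart": "Speeder Bike", "Screens": "In-game Screen"},
    {"Worlds": "Phineas and Ferb", "Races": "Race", "Kart": "Speeder Bike", "Screens": "In-game Screen"},
    {"Worlds": "Mighty Red", "Races": "Elimination", "Kart": "Wasabi Dragster", "Screens": "In-game Screen"},
]


def crashes(test):
    """Simulated bug: the Wasabi Dragster kart combined with the Elimination race."""
    return test["Kart"] == "Wasabi Dragster" and test["Races"] == "Elimination"


print(locate_fcc(suite, reference, ["Worlds", "Races", "Kart"], "Screens", crashes))
# ('specific', {'Kart': 'Wasabi Dragster', 'Races': 'Elimination'})
```

If the suite contains no test with the required pattern, that probe is skipped, which reflects the limitation noted below for interactions stronger than the coverage of the test set.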
Note that a t-way combinatorial test set can only be relied on to find an FCC that involves at most t
factors. If the failure is caused by an interaction of more factors than the coverage of the test set, there
may not be enough test cases in the test set to determine the FCC.
Bugs with one, two and three factor interactions were used to check the efficiency of our code. The code
appears to work well for one and two factor interactions. As the number of interacting factors increases,
the number of steps required to find the FCC also increases. In general, the number of steps our code
requires to identify the failure inducing combination is 2k for specific bugs and 2k-1 for general bugs,
where k is the number of factors that resulted in the bug.
The python code for reordering our test cases to find the failure inducing combinations has been given in
Appendix I.
4. Results
The results are divided into two sections. The first section compares the efficiency of combinatorial
testing against the bug record of a game that has already been tested, and presents the application of
combinatorial testing to two new games. The second section discusses the results of the methodology for
finding failure inducing combinations, with an example from one of the games we tested.
4.1 Combinatorial testing in video games
After the test cases are generated with the ACTS software, they are executed in the game: all the factor
levels in a test case are executed to see whether they result in a bug. Depending on the complexity of the
game, the number of test cases can vary from a few hundred to several thousand, and the complexity of
the game also dictates the time required to perform the testing. Here we present the results of
combinatorial testing on games.
4.1.1 Previously tested games
In this section we present the results of combinatorial testing on the game Grand Prix. Disney XD's Grand
Prix game was developed by Workinman Games, Rochester. It is a racing game in which the user can race
in different modes. The choices in this game are the world, the race, the racer and the kart that the user
selects to play the game.
As discussed in the methodology section, the Grand Prix game was broken down into factors and their
respective levels. The game has 10 factors with varying numbers of levels. We generated test cases for
game logic, interrupt and network testing for this game; we could not perform compliance and
compatibility testing because the game was running on Unity. The factors and their levels are listed in
Table 4. The last column indicates the kind of testing that can be performed with each factor.
Table 4 Factors and levels of Grand Prix
Factor | Levels | Number of levels | Type of factor
Worlds | Grand Prix, Phineas and Ferb, Gravity Falls, Lab Rats, Kickin It, Mighty Red, Star Wars Rebels | 7 | Game logic
Races | Race, Elimination, Coin Challenge, Extreme Coin Challenge, Boost Challenge, Extreme Boost Challenge, Missile Mania Challenge, Missile Mania Xtreme | 8 | Game logic
Racer | Wander, Rob the Shark, Agent P, Bruce the Sumo, Phineas, Dipper, Lord Hater, Randy Cunningham, Waddles, Steve the Llama, Ezra, Chopper | 12 | Game logic
Kart | Mighty Med, Wasabi Dragster, Davenport SFC, Coolest Coaster, Nomirocket Bus, Mystery Shack Cart, Silvia, Speeder Bike | 8 | Game logic
Power-Ups | Jump +1, Jump +2, Jump +3, Speed +1, Speed +2, Speed +3, Boost +1, Boost +2, Boost +3 | 9 | Game logic
Controls | Left, Right, Jump, Deploy Power-Up, No input, Ignore | 6 | Game logic
Device Orientation | Portrait, Landscape | 2 | Game logic
Interrupts | Call, Text, Lock, Minimize, Back, Force stop | 6 | Interrupt
Network | 4G, Wi-Fi | 2 | Network
Screens | Initial loading, World Select, Race Select, Racer Select, Kart Select, Pre-game loading, In game, Post game results, Pause/Quit, Post-game loading, Options, Setting, About, Store | 13 | Game logic
Table 5 shows the number of test cases for the Grand Prix game at coverage 1 to 5. Because of the
complexity of the system and its various constraints, ACTS takes around 10 minutes to generate the test set
for coverage 5, and it could not generate a test set for coverage 6. The reason is that some factors have a
high number of levels (close to 10, and one factor with more than 10). The algorithms IPOG-F and
IPOG-F2 do not support factors with more than 10 levels. The IPOG-D algorithm can handle factors with
more levels, but it does not support constraints, so the test set for coverage 5 is unconstrained.
Table 5 Number of test cases per coverage for Grand Prix
Coverage | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4 | Coverage 5
No. of test cases | 16 | 193 | 2,196 | 20,675 | 222,032
As discussed in the methodology section, the number of test cases increases with coverage. Figure 5
plots the number of test cases generated for this game against coverage, with and without constraints. It
shows that when constraints are applied to the Grand Prix game, the number of test cases increases.
Figure 5 Coverage Vs Number of test cases for Grand Prix
Table 6 shows the test set for coverage 1.
Table 6 Coverage 1 test set for Grand Prix
Test Case # | Worlds | Races | Racer | Kart | Power-Up | Controls | Orientation | Network | Interrupts | Screens
1 | Phineas and Ferb | Elimination | Wander | Wasabi Dragster | Speed +1 | Ignore | Landscape | Wi-Fi | Text | World Select Screen
2 | Gravity Falls | Coin Challenge | Rob the Shark | Davenport SFC | Boost +1 | Ignore | Portrait | 4G | Lock/Unlock | Race Select Screen
3 | Lab Rats | Xtreme Coin Challenge | Agent P | Coolest Coaster | Jump +2 | Ignore | Portrait | Wi-Fi | Home key | Racer Select Screen
4 | Kickin It | Boost Challenge | Bruce the Sumo | Nomirocket Bus | Speed +2 | Ignore | Portrait | 4G | Back key | Kart Select Screen
5 | Mighty Red | Xtreme Boost Challenge | Phineas | Mystery Shack Cart | Boost +2 | Ignore | Landscape | Wi-Fi | Call | Pre-game Loading Screen
6 | Star Wars Rebels | Missile Mania Challenge | Dipper | Sylvia | Jump +3 | Left | Landscape | 4G | Call | In-game Screen
7 | Phineas and Ferb | Missile Mania Xtreme | Lord Hater | Mr.Tank | Speed +3 | Ignore | Landscape | 4G | Lock/Unlock | Pause/Quit game
8 | Grand Prix | Race | Randy Cunningham | Speeder Bike | Boost +3 | Ignore | Landscape | 4G | Call | Post-game Results Screen
9 | Kickin It | Coin Challenge | Waddles | Mighty Red 4x4 | Jump +1 | Ignore | Landscape | Wi-Fi | Home key | Post-game Loading Screen
10 | Star Wars Rebels | Missile Mania Challenge | Steve the Llama | Mr.Tank | Boost +3 | Ignore | Landscape | Wi-Fi | Lock/Unlock | Options
11 | Kickin It | Elimination | Ezra | Sylvia | Speed +3 | Ignore | Landscape | Wi-Fi | Home key | About/Information
12 | Mighty Red | Elimination | Chopper | Coolest Coaster | Jump +3 | Ignore | Portrait | Wi-Fi | Text | Initial Loading Screen
13 | Star Wars Rebels | Missile Mania Xtreme | Dipper | Nomirocket Bus | Speed +1 | Right | Landscape | Wi-Fi | Lock/Unlock | In-game Screen
14 | Lab Rats | Missile Mania Challenge | Wander | Davenport SFC | Speed +2 | Jump | Landscape | 4G | Call | In-game Screen
15 | Phineas and Ferb | Missile Mania Xtreme | Steve the Llama | Wasabi Dragster | Speed +3 | Deploy Power-Up | Portrait | Wi-Fi | Call | In-game Screen
16 | Lab Rats | Missile Mania Challenge | Agent P | Mighty Red 4x4 | Boost +2 | No input | Landscape | Wi-Fi | Text | In-game Screen
This game has already been through the testing process, so a record of bugs was available. We used the
recorded bugs to map each bug flagged by Workinman to an appropriate test case that would have flagged
the bug. From the record of bugs posted for the Grand Prix game, factor interactions for each bug were
identified. The largest number of interacting factors that resulted in a bug was three. All the bugs were
classified by the number of interacting factors that caused them, and each bug was then mapped to the
generated test sets. Table 7 gives a few example bugs and the factors responsible for each.
Table 7 Breakdown of bugs into factors for Grand Prix
Bug | Factors responsible | Number of factors
Loading screen lasts longer than 25 seconds | Post-game loading screen | 1
User is unable to tap on Terms of Use button | Information screen | 1
Pixelated image appears for 'Bruce the Sumo' character | Racer select screen, Bruce the Sumo | 2
Result screen loops the jumping animation if user taps on jump icon at the end race point | Star Wars Rebels, Jump, Rob the Shark | 3
The first bug in Table 7, "Loading screen lasts longer than 25 seconds", was posted by the testers at
Workinman Games for the Grand Prix game. This bug is present at the post-game loading screen of the
game and is caused by a single factor. Therefore, every generated test case that had 'Post-game loading
screen' as its level for the screens factor was marked as a bug. The response variable took the value 1 if a
test case resulted in a bug and 0 otherwise. The same method was applied to all the other bugs.
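The mapping can be sketched as a simple predicate: a test case scores 1 when it contains every factor level of at least one recorded bug. In the Python sketch below, the factor interactions come from Table 7 and the two test cases from rows 9 and 4 of Table 6 (other factors omitted), but the dictionary representation, and treating "Jump" as a level of the Controls factor, are assumptions of this sketch.

```python
def response(test, bug_combinations):
    """Return 1 if `test` contains every factor level of at least one recorded bug, else 0."""
    return int(any(all(test.get(factor) == level for factor, level in combo.items())
                   for combo in bug_combinations))


# Factor interactions of three of the bugs in Table 7, using the factor names of Table 4.
bugs = [
    {"Screens": "Post-game Loading Screen"},
    {"Screens": "Racer Select Screen", "Racer": "Bruce the Sumo"},
    {"Worlds": "Star Wars Rebels", "Controls": "Jump", "Racer": "Rob the Shark"},
]

# Two test cases with levels taken from rows 9 and 4 of Table 6.
row_9 = {"Worlds": "Kickin It", "Racer": "Waddles", "Controls": "Ignore",
         "Screens": "Post-game Loading Screen"}
row_4 = {"Worlds": "Kickin It", "Racer": "Bruce the Sumo", "Controls": "Ignore",
         "Screens": "Kart Select Screen"}
print(response(row_9, bugs), response(row_4, bugs))  # 1 0
```

Row 4 contains Bruce the Sumo but on the kart select screen, so it does not reproduce the two-factor pixelation bug.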
Table 8 shows the number of bugs found by each design: coverage 1, 2 and 3. Close to 50% of the bugs
were covered just by executing the coverage 1 test cases, and approximately 90% of the bugs could be
discovered by running the coverage 3 design. The total number of test cases required to discover 90% of
the bugs is around 2360. Compared with the 56 million test cases required by running a full factorial,
combinatorial testing discovered 90% of the bugs with a mere 0.004% of the test cases of a general
factorial design. This demonstrates that combinatorial testing is highly efficient.
Generation of test cases for coverage 4, 5 and 6 was not necessary for this particular game. From the
breakdown of bugs, we identified that none of the bugs were caused by more than 3 factor interactions.
Testing the game for coverage values above 3 is not necessary in this case as we would not be able to map
them to the bugs identified.
Table 8 Grand Prix results
Design | No. of issues | Percentage | Cumulative percentage
Coverage 1 | 15 | 45.45 | 45.45
Coverage 2 | 10 | 30.3 | 75.75
Coverage 3 | 5 | 15.15 | 90.91
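The percentages in Table 8 follow directly from the issue counts. The short Python computation below reproduces them from the 33 issues posted; computed from the raw counts, the cumulative value for coverage 2 rounds to 75.76, whereas the table's 75.75 is the sum of the rounded percentages 45.45 and 30.3. It also checks the 0.004% figure using the 2360 and 56 million test-case counts quoted in the text.

```python
# Issues found by each design (Table 8) out of the 33 issues posted.
found = [("Coverage 1", 15), ("Coverage 2", 10), ("Coverage 3", 5)]
total_issues = 33

rows = []
cumulative = 0
for design, count in found:
    cumulative += count
    rows.append((design, round(100 * count / total_issues, 2),
                 round(100 * cumulative / total_issues, 2)))

for row in rows:
    print(row)

# Share of the full factorial needed, using the figures quoted in the text.
tests_run = 2360
full_factorial = 56_000_000
print(f"{100 * tests_run / full_factorial:.4f}%")  # 0.0042%
```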
Note that Workinman Games posted a total of 33 issues. There were 22 bugs that were not caused by an
interaction of factors, for instance "The incorrect version of MoPub ad server is being used". MoPub is a
mobile ad server through which advertisements are pushed into the game. Its functioning and server calls
are not part of the tester's job; the tester can only check whether the advertisements are being displayed.
With coverage 3, we found 90% of the bugs; three bugs were not found. One of them was "Game crashes
after locking/unlocking the screen and then tapping on the device back button twice". This bug requires
multiple interrupts in one run (lock/unlock and the back key in this case). Since our design has only one
interrupt factor, this bug could not be tracked. Another bug that could not be tracked was "Perry the
Platypus remains in a static T pose". The character Perry the Platypus was not present in the versions of
the game we tested, so it is not a level of our Racer factor and the bug could not be found. The last bug
that we could not account for was "Return to main menu appears if the user taps on the loading screen".
Tapping on the screen does not fall under any of the factor categories we defined for this game; it would
require a separate factor of its own. Moreover, tapping on the screen is arbitrary: even with such a factor,
we could not be sure how many levels to give it.
This game has around 30 bugs that were caused by interactions of factors. As explained earlier, each bug
was broken down into the factor interactions causing it. Using our method for finding the faulty
interactions, all the bugs posted by Workinman Games could be found with only about 80 to 90 test cases,
which is about 3.7% of the total number of test cases.
4.1.2 Games not previously tested
Erator - Game of war
Erator is a turn based card game. The player and the opponent each choose a hero, and the goal of the
game is to destroy the opponent's hero. In every turn, depending on the game mode, the player and the
opponent are dealt a single card or a set of cards with predetermined functionality. Using these cards, the
player attacks the opponent's hero; the player wins when the opponent's hero's health reaches zero.
We had the pre-alpha version of this game, in which many functionalities do not work yet. Only two levels
in the training can be tested, and playing against another player is disabled. Table 9 shows the factors and
levels associated with this game.
Table 9 Factors and levels of game Erator
Factor | Levels | Number | Type
Game modes | Basic Training, Advancements in Face Punching, Heroes and Villains, Trial by fire, Secrets and Lies, Against AI, Against player, Online | 8 | Game logic
Heroes | Alanran Elmere Cenaturs, Tan Fillian Bards College | 2 | Game logic
Decks | Overgrowing with love, The bigger they are, Rotten rascals, Another one bites the dust | 4 | Game logic
Actions | Lock, Minimize, Pause, Do not follow | 4 | Interrupt, Functionality
The numbers of test cases for this game are shown in Table 10.
Table 10 Number of test cases per coverage for Erator
Test set | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4
General tests | 8 | 32 | 127 | 256
Basic Training | 31 | 127 | * | *
Advanced Training | 26 | 107 | * | *
For this game, general testing with the factors in Table 9 did not result in any bugs. In addition, many
game modes are not yet accessible: in the tutorial section, only basic training and trial by fire work, and
among the other modes, playing against another player and online play have not yet been enabled. We
therefore decided to look for bugs by performing in-depth testing of the training modes that were
available. Because not all tutorial levels are available, the functionality of most of the cards used when
playing against the AI is unknown, and testing these cards without knowing their functionality might not
give accurate results.
For the in-depth testing, we divided the training mode of the game into dialogue boxes and sessions. The
dialogue boxes are the instructions that the player follows to advance in the tutorial, and the sessions are
where the instructions from the dialogue boxes are carried out. Here, functionality and interrupt testing
can be done together by combining both sets of factor levels under a common factor called 'Actions'. The
factors and levels were chosen based on the dialogue boxes and sessions. The basic training had 25
dialogue boxes and 5 sessions in between them; the trial by fire training (the advanced training) had 21
dialogue boxes and 6 sessions. The results of testing the available training modes are given in Table 11.
Table 11 Coverage Vs Bugs in Erator
Coverage | Basic training | Advanced training
Coverage 1 | 3 | 3
Coverage 2 | 4 | 5
The bugs in both the basic training and the advanced training are progression blockers and graphical
bugs. The progression block is triggered by the 'Do not follow' action in the in-depth testing of the training
modes. The graphical bug is present throughout the game: at any point, clicking on a card makes it
disappear from the screen before it is cast.
Utopia
This is a character based game in which the player has to protect a tower. The tower is attacked by
computer-controlled opponents, and the game ends when the tower is destroyed. There are roughly two
kinds of AI opponent: a small AI that causes little damage and a big AI that causes high damage. The player
can shoot the AI opponents and, as a special power, can also build shooting towers. There are three
different game modes. This game has been very well developed, so we could not find any functionality or
game logic bugs. However, we illustrate the factors and coverage required for testing at this stage. Table
12 shows the factors and levels for this game.
Table 12 Factors and levels of Utopia
Factor | Levels | Number | Type
Levels | Defense demo, Offense 1, Offense 2, Tutorial, Advanced Tutorial, Survival | 6 | Game logic
Controls | Up, Down, Left, Right, Dash, Power 1, Power 2, Power 3 | 8 | Game logic
Resolution | 512x384, 640x400, 640x400, 800x600, 1024x768, 1280x600, 1280x720, 1280x768, 1360x768, 1366x768 | 10 | Compatibility
Graphics | Fast, Fastest, Simple, Good, Beautiful, Fantastic | 6 | Compatibility
Interrupts | Lock, Minimize | 2 | Interrupt
Modes | 1, 2, 3 | 3 | Game logic
We generated test cases for coverage values 1 through 6. The results can be seen in Table 13.
Table 13 Number of test cases per coverage for Utopia
Test set | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4 | Coverage 5 | Coverage 6
No. of total tests | 10 | 81 | 514 | 2889 | 8640 | 17280
Compatibility tests | 8 | 43 | * | * | * | *
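Table 13 can be sanity-checked with a simple bound: a t-way test set must contain every combination of levels of the t factors with the most levels, so it needs at least as many tests as the product of those t level counts. With six factors, coverage 6 equals the full factorial. The Python check below is our own arithmetic, not ACTS output; it shows that the coverage 1, 5 and 6 counts sit exactly at this bound and the others slightly above it.

```python
from math import prod

# Number of levels per factor (Table 12) and ACTS test counts (Table 13).
levels_per_factor = {"Levels": 6, "Controls": 8, "Resolution": 10,
                     "Graphics": 6, "Interrupts": 2, "Modes": 3}
acts_counts = {1: 10, 2: 81, 3: 514, 4: 2889, 5: 8640, 6: 17280}

sizes = sorted(levels_per_factor.values(), reverse=True)
for t, count in acts_counts.items():
    # Every t-way combination of the t largest factors must appear in some test,
    # so no t-way test set can be smaller than the product of their level counts.
    lower_bound = prod(sizes[:t])
    print(f"t={t}: lower bound {lower_bound}, ACTS {count}")

print(prod(sizes))  # 17280, the size of the full factorial
```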
Since the game was very well developed, we could not uncover functionality or game logic bugs. However,
we discovered some bugs related to compatibility. The results of the compatibility check are shown in
Table 14.
Table 14 Coverage Vs Bugs in Utopia
Coverage | No. of bugs
Coverage 1 | 5
Coverage 2 | 5
A screenshot from the game showing one of the compatibility issues is given in Figure 6.
Figure 6 Compatibility bug in Utopia
All the bugs found in this game are compatibility issues. When the game is set to a lower resolution, the
items in the main menu overlap each other, producing a graphical bug, and the icons lose their
functionality. This bug is present throughout the game. No bugs related to functionality or interrupts were
found in this game.
4.2 Identifying failure inducing combinations
4.2.1 Notional example with Grand Prix
Here we explain further, with an example, how our sorting code works. We made up an issue to illustrate
it. Assume that, in the Grand Prix game, the test case shown in Table 15 resulted in a bug: the game
crashes when the Phineas and Ferb world, the Elimination race, the Wander racer and the Wasabi Dragster
kart are combined. Assume further that this bug is caused by the two factor interaction of the Wasabi
Dragster kart with the Elimination race, and that the crash occurs when the game is launched. We do not
know which combination of factors caused the crash, but we do know the steps we took while finding the
bug. We call this test case the reference test case.
Table 15 Reference test case
Since the game crashed with the combination of the four factors given above, we assume that the crash
occurred at the in-game screen. We now reorder our test cases so that the next test case leads to the same
in-game screen with all the other factor levels different. This test case, shown in Table 16, does not result
in a bug, so we can infer that this is not a general bug: it occurs only when specific factor levels interact
with each other.
Table 16 Test case with different factor levels
Since there was no bug, we move on to the last step in the reference test case. We reorder our test cases
so that the Wasabi Dragster kart remains the same and all the other factor levels are different (as shown in
Table 17). This test case also does not result in a bug. We repeat this process for the same racer, race and
world in three further test cases.
Table 17 Test case with same Kart
Table 18 Test case with same Racer
Table 19 Test case with same Race
Table 20 Test case with same World
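Each of these steps selects, from the remaining generated test cases, the first one that keeps exactly the chosen reference levels and changes every other factor; the code in Appendix I does this for one fixed column of a CSV file. A generalized sketch of that selection follows. Apart from the four reference levels named above, the levels in `test_cases` are hypothetical placeholders, because the contents of Tables 15 to 20 are not reproduced here, and the function names are illustrative.

```python
# Columns: (World, Race, Racer, Kart).
reference = ("Phineas and Ferb", "Elimination", "Wander", "Wasabi Dragster")

# Hypothetical remaining test cases (placeholders, not the thesis data).
test_cases = [
    ("World B", "Time Trial", "Racer B", "Kart B"),
    ("World C", "Elimination", "Racer C", "Kart C"),
    ("World D", "Time Trial", "Racer D", "Wasabi Dragster"),
]


def keeps_exactly(candidate, reference, keep):
    """True if the candidate matches the reference in the columns listed in
    `keep` and differs from it in every other column."""
    return all((c == r) == (i in keep)
               for i, (c, r) in enumerate(zip(candidate, reference)))


def next_test_keeping(test_cases, reference, keep=()):
    """Index of the first test case that keeps exactly `keep`, or None.

    keep=() is the general-bug check (every level changed);
    keep=(3,) keeps only the kart, as in Table 17.
    """
    for i, candidate in enumerate(test_cases):
        if keeps_exactly(candidate, reference, keep):
            return i
    return None


print(next_test_keeping(test_cases, reference))        # → 0
print(next_test_keeping(test_cases, reference, (3,)))  # → 2
print(next_test_keeping(test_cases, reference, (2,)))  # → None
```

A `None` result means that no remaining test case has the required shape, so a new one would have to be added instead of reordering the existing ones.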
We now move on to checking the interactions between factors. We keep the Wasabi Dragster kart and the Wander racer the same as in the reference test case. This test case can be seen in Table 21. As it does not result in a bug, we move on to checking the interaction of the Wasabi Dragster kart with the Elimination race, as shown in Table 22. This test case results in a bug. Since the Wasabi Dragster kart and the Elimination race in this test case are the same as in our reference test case, this pair is our FCC.
Table 21 Test case with same Racer and Kart
Table 22 Test case with same Race and Kart
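The whole notional search can be simulated by replacing the game with a stand-in oracle that fails exactly when the assumed faulty pair (Wasabi Dragster kart with Elimination race) is present. The sketch below first runs the general-bug check, then keeps one factor at a time, then pairs of factors, and reports the first kept subset whose test case fails. The oracle `crashes`, the helper `build_test_case` and the alternative levels in `OTHER` are hypothetical: here each test case is constructed directly instead of being found by reordering, and pairs are visited in `itertools.combinations` order, which is not necessarily the order of Tables 21 and 22.

```python
from itertools import combinations

FACTORS = ("World", "Race", "Racer", "Kart")
REFERENCE = {"World": "Phineas and Ferb", "Race": "Elimination",
             "Racer": "Wander", "Kart": "Wasabi Dragster"}
# Hypothetical alternative level for each factor (placeholders).
OTHER = {"World": "World B", "Race": "Race B",
         "Racer": "Racer B", "Kart": "Kart B"}


def crashes(test_case):
    """Stand-in oracle for executing the game: the assumed bug is the
    two-factor interaction of the Wasabi Dragster kart with the Elimination race."""
    return test_case["Kart"] == "Wasabi Dragster" and test_case["Race"] == "Elimination"


def build_test_case(keep):
    """Keep the reference level for the factors in `keep`; change all others."""
    return {f: REFERENCE[f] if f in keep else OTHER[f] for f in FACTORS}


def find_fcc(max_strength=len(FACTORS) - 1):
    """Return the first kept subset, in order of increasing size, whose test
    case still fails; () means a general bug, None means nothing was found.
    Keeping all four factors is the reference test case itself, so the
    search stops one factor short of that."""
    if crashes(build_test_case(keep=())):
        return ()
    for t in range(1, max_strength + 1):
        for keep in combinations(FACTORS, t):
            if crashes(build_test_case(keep)):
                return keep
    return None


print(find_fcc())  # → ('Race', 'Kart')
```

With only one alternative level per factor this simulation is deliberately simple; it only illustrates the order of the checks, not how many test cases a real covering array would supply.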
4.2.2 Example from the game Erator
We now present a practical example of our logic for finding FCCs, using the game Erator. Since Erator is not a complicated game, we decided to combine game logic testing with interrupt testing. Table 23 shows a test case, from those generated for Erator, that resulted in a bug; this is our reference test case. A screenshot of the bug is shown in Figure 7.
Table 23 Reference test case
Figure 7 Screenshot of bug from reference test case
The tutorial in this game is divided into dialogue boxes (D) and session screens (S). For example, D1 indicates that the test case is executed at dialogue box 1. To execute the reference test case, we therefore do not follow the instruction given in dialogue box 14. When this happens, a progression block occurs, as shown in Figure 7: the next dialogue box points to an empty space where another card should have been.
We now reorder our test cases so that the ‘Do not follow’ action stays the same and the other factor level is different. This new test case is shown in Table 24.
Table 24 Test case with same Action
This test case combines dialogue box 25 with the ‘Do not follow’ action and does not result in a bug. The corresponding screenshot from the game is shown in Figure 8.
Figure 8 Screenshot of test case with same Action
We then go back to check whether the bug is caused by dialogue box 14. We reorder our test cases so that the next one keeps dialogue box 14 but uses a different level for the action. This test case is shown in Table 25.
Table 25 Test case with same dialogue box
The screenshot from the game after executing the above test case can be seen below in Figure 9.
Figure 9 Screenshot of test case with same Dialogue box
Accessing the pause menu while dialogue box 14 is displayed does not result in any kind of bug. Therefore, we conclude that the bug is caused by the combination of dialogue box 14 and the action ‘Do Not Follow’.
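With only two factors, the reasoning above reduces to a small decision rule: the reference test case fails, and each of its two levels kept on its own (with the other factor changed) passes, so neither level causes the failure by itself and the pair is the FCC. The sketch below encodes that rule and feeds it the three outcomes reported for Tables 23 to 25. The function name `infer_two_factor_fcc` and the label "Access pause menu" for the action in Table 25 are illustrative choices, not taken from the thesis code.

```python
def infer_two_factor_fcc(reference, outcomes):
    """Infer the FCC of a failing two-factor reference test case.

    `outcomes` maps each executed test case (dialogue box, action) to True
    if it produced a bug.
    """
    box, action = reference
    if not outcomes.get(reference):
        return None  # the reference test case does not reproduce the bug
    same_action = [tc for tc in outcomes if tc[1] == action and tc[0] != box]
    same_box = [tc for tc in outcomes if tc[0] == box and tc[1] != action]
    if any(outcomes[tc] for tc in same_action):
        return (action,)  # the action alone triggers the bug
    if any(outcomes[tc] for tc in same_box):
        return (box,)  # the dialogue box alone triggers the bug
    if same_action and same_box:
        return (box, action)  # both single levels pass: the pair is the FCC
    return None  # not enough test cases executed yet


outcomes = {
    ("D14", "Do not follow"): True,       # Table 23: progression block
    ("D25", "Do not follow"): False,      # Table 24: no bug
    ("D14", "Access pause menu"): False,  # Table 25: no bug
}
print(infer_two_factor_fcc(("D14", "Do not follow"), outcomes))
# → ('D14', 'Do not follow')
```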
The number of steps required differs between general and specific bugs. For example, a crash that occurs when a call interrupt is made at the loading screen is a general bug: the game crashes irrespective of the steps taken to reach the loading screen. In this case, our code takes only 3 steps to find the failure inducing combination, even though 2 factors cause the failure. On the other hand, a specific bug, such as a progression block when the Elimination race is selected with the character Ezra, takes 4 steps to discover. This is because we first check whether the bug is general and only then proceed to its specifics.
Compared with the other methods for finding failure inducing combinations mentioned in the literature, our method is relatively simple. Its main advantage is that we only reorder the test cases to find the failure inducing combination, which keeps the number of test cases low. Its only disadvantage is that a section of the code must be modified after executing every test case. This may not seem like much for one- and two-factor interactions, but as the number of interacting factors increases, so does the number of steps needed to discover the failure causing combination, and with it the number of modifications to the code.
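The growth described above can be made concrete with a rough counting model. If the search could, in the worst case, have to check every subset of k factors of size 1 up to t, the number of candidate interactions is the sum of the binomial coefficients C(k, i) for i = 1, ..., t. This model is an illustrative assumption and not the exact step count of the code in Appendix I, which stops as soon as a failing combination is found.

```python
from math import comb


def candidate_interactions(k, t):
    """Number of factor subsets of size 1..t that could be checked for k factors."""
    return sum(comb(k, i) for i in range(1, t + 1))


for t in (1, 2, 3):
    print(t, candidate_interactions(10, t))
# 1 10
# 2 55
# 3 175
```

For the four factors of the Grand Prix example, checking up to pairs gives 4 + 6 = 10 candidates, while ten factors checked up to three-way interactions already give 175.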
5. Discussion
5.1 Conclusion
We used combinatorial testing to establish a framework for improving the efficiency of video game testing. The methodology section explained the steps necessary to test games using combinatorial testing. First, we select the factors and categorize them by type of testing. Then we decide the number of levels for each factor. The combinatorial test cases are generated using the ACTS software and are then executed in the game to check for bugs. Finally, we proposed a methodology to identify the FCC.
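A t-way combinatorial test set, such as one generated by ACTS, is one in which every combination of levels of any t factors appears in at least one test case. The sketch below checks that property for a small test set. The three two-level factors and the four test cases are hypothetical, and the function is an independent check written for illustration, not part of ACTS.

```python
from itertools import combinations, product


def covers_t_way(tests, levels, t=2):
    """Return True if every t-way combination of factor levels appears in some test.

    `levels` lists the possible levels of each factor, in column order.
    """
    for cols in combinations(range(len(levels)), t):
        needed = set(product(*(levels[c] for c in cols)))
        seen = {tuple(test[c] for c in cols) for test in tests}
        if not needed <= seen:
            return False
    return True


levels = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
# Four test cases cover every pair of levels of three two-level factors,
# compared with 2 * 2 * 2 = 8 test cases for exhaustive testing.
tests = [("A1", "B1", "C1"), ("A1", "B2", "C2"),
         ("A2", "B1", "C2"), ("A2", "B2", "C1")]
print(covers_t_way(tests, levels))      # → True
print(covers_t_way(tests[:3], levels))  # → False
```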
In the results section, we showed how combinatorial testing can be applied to games. A comparison of general testing and combinatorial testing was presented using the game Grand Prix. We also applied combinatorial testing to games that had not previously been tested and discussed the results.
The results indicate that our methodology generates far fewer test cases for complete testing of the game. It
is also effective in finding the bugs that were missed by the developers. Our logic for finding FCCs was
capable of finding the faulty interactions without generating additional test cases. Overall, combinatorial
testing can be implemented to improve the current game testing methods.
5.2 Future work
Future work for this thesis includes analyzing the extent to which ad hoc testing can be accomplished using combinatorial testing. This extent clearly cannot be expressed as a simple percentage. However, mixed covering arrays or base choice covering arrays can uncover bugs that do not occur with the traditional combinatorial testing process.
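Base choice coverage, one of the alternatives mentioned above, starts from a single base test case that uses a chosen base level for every factor, then adds one test case for each non-base level, varying one factor at a time. A minimal sketch of that generation rule follows; the factors, levels and base choices are hypothetical.

```python
def base_choice_tests(levels, base):
    """Generate base choice test cases.

    `levels` maps each factor to its levels; `base` maps each factor to its
    base level. The result is the base test plus one test per non-base level,
    each differing from the base test in exactly one factor.
    """
    tests = [dict(base)]
    for factor, options in levels.items():
        for level in options:
            if level != base[factor]:
                test = dict(base)
                test[factor] = level
                tests.append(test)
    return tests


levels = {"World": ["W1", "W2", "W3"], "Race": ["Elimination", "Time Trial"]}
base = {"World": "W1", "Race": "Elimination"}
for test in base_choice_tests(levels, base):
    print(test)
```

The number of test cases is 1 plus the sum over factors of (levels minus 1), four here, so it grows linearly rather than combinatorially; the trade-off is that interactions not involving the base levels are never exercised.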
In addition, the stage of development at which combinatorial testing should be applied to games needs to be decided. Research in this area should focus on identifying how and when to implement combinatorial testing during game development.
Since combinatorial testing of untested games was carried out on games from independent developers, we could not find many bugs. Applying combinatorial testing to a game that is still in development would allow a more conclusive assessment of its benefits over other forms of game testing.
Bibliography:
[1] “New Reports Forecast Global Video Game Industry Will Reach $82 Billion By 2017 - Forbes.” [Online]. Available: http://www.forbes.com/sites/johngaudiosi/2012/07/18/new-reports-forecasts-global-video-game-industry-will-reach-82-billion-by-2017/. [Accessed: 25-Feb-2015].
[2] B. Schechner, “Getting Started with Software Testing,” pp. 1–9, 2008.
[3] B. Bates, Game Design (2nd Ed.). 2004.
[4] “Halo: The Master Chief Collection review: the library | Polygon.” [Online]. Available: http://www.polygon.com/2014/11/7/7076007/halo-the-master-chief-collection-review-xbox-one. [Accessed: 27-Apr-2015].
[5] “Sonic Boom speed run takes less than an hour, thanks to a Knuckles glitch | Polygon.” [Online]. Available: http://www.polygon.com/2014/11/12/7211863/sonic-boom-speed-run-takes-less-than-an-hour. [Accessed: 27-Apr-2015].
[6] “5.1.1. What is experimental design?” [Online].
Available: http://www.itl.nist.gov/div898/handbook/pri/section1/pri11.htm.
[Accessed: 30-Nov-2015].
[7] C. Redavid and A. Farid, “An Overview of Game Testing Techniques,” idt.mdh.se.
[8] S. Varvaressos, K. Lavoie, A. B. Massé, S. Gaboury, and S. Hallé, “Automated bug finding in video
games: A case study for runtime monitoring,” Proc. - IEEE 7th Int. Conf. Softw. Testing, Verif.
Validation, ICST 2014, pp. 143–152, 2014.
[9] “NIST Covering Array Tables - What is a covering array?” [Online]. Available:
http://math.nist.gov/coveringarrays/coveringarray.html. [Accessed: 10-Apr-2015].
[10] R. Kuhn, R. Kacker, Y. Lei, and J. Hunter, “Combinatorial software testing,” Computer, vol. 42, no. 8, pp. 94–96, 2009.
[11] “Practical Combinatorial Testing.” [Online]. Available: http://csrc.nist.gov/groups/SNS/acts/documents/SP800-142-101006.pdf. [Accessed: 10-Apr-2015].
[12] D. R. Kuhn and M. J. Reilly, “An investigation of the applicability of design of experiments to
software testing,” 27th Annu. NASA Goddard/IEEE Softw. Eng. Work. 2002. Proceedings., 2002.
[13] R. N. Kacker, D. R. Kuhn, Y. Lei, and J. F. Lawrence, “Combinatorial testing for software: An
adaptation of design of experiments,” Meas. J. Int. Meas. Confed., vol. 46, no. 9, pp. 3745–3752, 2013.
[14] S. K. Khalsa and Y. Labiche, “An orchestrated survey of available algorithms and tools for
Combinatorial Testing,” 2014.
[15] L. Yu, “Constraint Handling In Combinatorial Test Generation Using Forbidden Tuples.”
[16] Z. Zhang and J. Zhang, “Characterizing Failure-Causing Parameter Interactions by Adaptive Testing,” ISSTA ’11 Proc. 2011 Int. Symp. Softw. Test. Anal., pp. 331–341, 2011.
[17] Z. Wang, B. Xu, L. Chen, and L. Xu, “Adaptive interaction fault location based on combinatorial
testing,” Proc. - Int. Conf. Qual. Softw., pp. 495–502, 2010.
[18] L. S. G. Ghandehari, Y. Lei, T. Xie, R. Kuhn, and R. Kacker, “Identifying failure-inducing
combinations in a combinatorial test set,” Proc. - IEEE 5th Int. Conf. Softw. Testing, Verif. Validation,
ICST 2012, pp. 370–379, 2012.
[19] “User Guide for ACTS Core Features,” pp. 1–15.
[20] Y. Lei, R. Kacker, D. R. Kuhn, V. Okun, and J. Lawrence, “IPOG: A general strategy for
T-way software testing,” Proc. Int. Symp. Work. Eng. Comput. Based Syst., pp. 549–556, 2007.
Appendix I:
The code for finding the FCC is given below.

import csv


def getRow(file_name, row_number):
    """Return one test case (row) from the CSV file of test cases."""
    with open(file_name, newline='') as f:
        return list(csv.reader(f))[row_number]


def swap_row(mycsv, first_row, second_row):
    """Swap two rows of the test-case table in place and return the table."""
    mycsv[first_row], mycsv[second_row] = mycsv[second_row], mycsv[first_row]
    return mycsv


def writeCSV(file_name, result):
    """Overwrite the CSV file with the reordered test cases."""
    with open(file_name, 'w', newline='') as f:
        csv.writer(f).writerows(result)


def search(file_name, reference_test_case, reference_test_line, same_column=9):
    """Find the first later test case that matches the reference test case
    only in `same_column` (every other column differs) and move it to the row
    right after the reference test case, so that it is executed next.

    Returns True if such a test case was found and the file was rewritten.
    """
    with open(file_name, newline='') as f:
        mycsv = list(csv.reader(f))
    n = len(reference_test_case)
    for i in range(reference_test_line + 1, len(mycsv)):
        row = mycsv[i]
        if len(row) == n and all(
                (row[c] == reference_test_case[c]) == (c == same_column)
                for c in range(n)):
            writeCSV(file_name, swap_row(mycsv, i, reference_test_line + 1))
            return True  # stop at the first matching test case
    return False


file_name = 'expt.csv'
reference_test_case = getRow(file_name, 28)
search(file_name, reference_test_case, 28)