Experimental Design in Game Testing

Rochester Institute of Technology

RIT Scholar Works

Theses

5-19-2016

Experimental Design in Game Testing

Bhargava Rohit Sagi

Recommended Citation: Sagi, Bhargava Rohit, "Experimental Design in Game Testing" (2016). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected].

Rochester Institute of Technology

Experimental Design in Game Testing

A Thesis submitted in partial fulfillment of the

requirements for the degree of

Master of Science in Industrial and Systems Engineering in the

Department of Industrial & Systems Engineering

Kate Gleason College of Engineering

by

Bhargava Rohit Sagi

May 19, 2016

DEPARTMENT OF INDUSTRIAL AND SYSTEMS ENGINEERING

KATE GLEASON COLLEGE OF ENGINEERING

ROCHESTER INSTITUTE OF TECHNOLOGY

ROCHESTER, NEW YORK

CERTIFICATE OF APPROVAL

M.S. DEGREE THESIS

The M.S. Degree Thesis of Bhargava Rohit Sagi

has been examined and approved by the

thesis committee as satisfactory for the

thesis requirement for the

Master of Science degree

Approved by:

____________________________________

Dr. Rachel Silvestrini (Thesis Advisor), Associate Professor, Industrial and Systems Engineering

____________________________________

Dr. Brian K. Thorn, Associate Professor, Industrial and Systems Engineering

____________________________________

Dr. Jessica Bayliss, Associate Professor, Interactive Games and Media

____________________________________

Dr. David Schwartz, Associate Professor, Interactive Games and Media

Abstract

The gaming industry has been on a constant rise over the last few years. Companies invest huge amounts

of money for the release of their games. A part of this money is invested in testing the games. Current

game testing methods include manual execution of pre-written test cases in the game. Each test case may

or may not result in a bug. In a game, a bug is said to occur when the game does not behave according to

its intended design. The process of writing the test cases to test games requires standardization. We

believe that this standardization can be achieved by applying experimental design to video game

testing. In this thesis, we discuss the implementation of combinatorial testing to test games.

Combinatorial testing is a method of experimental design that is used to generate test cases and is

primarily used for commercial software testing. In addition to the discussion of the implementation of

combinatorial testing techniques in video game testing, we present a method for finding combinations

resulting in video game bugs.

Contents

Abstract
1 Introduction
2 Literature Review
2.1 Game testing techniques
2.2 Combinatorial testing
2.2.1 Overview of combinatorial testing
2.2.2 Tools for generating combinatorial tests
2.2.3 Identifying failure inducing combinations
3. Methodology
3.1 Combinatorial testing for video game debugging
3.2 Classification of factors
3.3 Methodology for finding bugs
3.3.1 Sorting test cases to find Failure Inducing Combination
4. Results
4.1 Combinatorial testing in video games
4.1.1 Previously tested games
4.1.2 Games not previously tested
4.2 Identifying failure inducing combinations
4.2.1 Notional example with Grand Prix
4.2.2 Example from the game Erator
5. Discussion
5.1 Conclusion
5.2 Future work
Bibliography
Appendix I

Tables:

Table 1 Coverage 1 design
Table 2 Coverage 2 design
Table 3 Tools meeting our requirements
Table 4 Factors and levels of Grand Prix
Table 5 Number of test cases per coverage for Grand Prix
Table 6 Coverage 1 test set for Grand Prix
Table 7 Breakdown of bugs into factor for Grand Prix
Table 8 Grand Prix results
Table 9 Factors and levels of game Erator
Table 10 Number of test case per coverage for Erator
Table 11 Coverage Vs Bugs in Erator
Table 12 Factors and levels of Utopia
Table 13 Number of test cases per coverage for Utopia
Table 14 Coverage Vs Bugs in Utopia
Table 15 Reference test case
Table 16 Test case with different factor levels
Table 17 Test case with same Kart
Table 18 Test case with same Racer
Table 19 Test case with same Race
Table 20 Test case with same World
Table 21 Test case with same Racer and Kart
Table 22 Test case with same Race and Kart
Table 23 Reference test case
Table 24 Test case with same Action
Table 25 Test case with same dialogue box

Figures:

Figure 1 Game testing cycle
Figure 2 Coverage Vs % of errors
Figure 3 Different types of testing performed using combinatorial testing
Figure 4 Flow chart for finding FCC for a two factor interaction
Figure 5 Coverage Vs Number of test cases for Grand Prix

1 Introduction

The gaming industry has been on the rise over the last decade. It is predicted that the net value of the

gaming industry will increase from $67 billion in 2012 to $82 billion by 2017 [1]. A part of a gaming

company's revenue is used for game testing, to remove or fix software defects. In the US, more than $22.2

billion could be saved annually by implementing an improved infrastructure to enable more effective

identification and removal of software defects [2].

Gaming companies are typically divided into the following teams: Development, Production, Distribution

and QA. The development team is responsible for the design and development of games. They are also

responsible for fixing the bugs reported by the testers. The production team finances the development of

the game and is involved in all the monetary transactions. The distribution team is involved in making the

game accessible to the users. They explore the distribution channels for releasing the game into the

market and online stores. The QA team's task is to identify the bugs in the game and log them in the

company's database so that the developers can access and fix them.

Game testing is a quality control process [3]. The main aim of game testing is to find bugs in the game

software. The Quality Assurance (QA) team is responsible for identifying as many bugs as possible that

ruin the gaming experience for the end user. A bug occurs when the game software does not behave

according to its intended design. The method adopted by the QA team to test games is called manual testing. The

QA team executes pre-written checklists, which cover various aspects of the game. These checklists are

intended to include test cases that cover all the scenarios in the game, thereby making sure that each area of

the game is tested.

Game testing is similar to software testing in many aspects. A standard game testing cycle is shown in

Figure 1. When a game is developed, the development lead and the QA lead develop test cases. The game

testers then execute test cases to find bugs. They report these bugs to the developers, who then fix the

bugs. After the bug fixes are completed, an updated version of the game is sent back to the testers for

testing with a new set of test cases. This cycle repeats until the production team is satisfied with the

number of bugs fixed and the quality of the game.

Figure 1 Game testing cycle

Game testing is time and labor intensive. It is important to spend resources in order to provide a defect-free

(or nearly defect-free) game, but it is impossible to exhaustively test all scenarios. Therefore, a balance

between resource spending and finding bugs must be achieved.

There are examples of incidents in popular games where the testing for a game was not comprehensive

enough. In Halo: The Master Chief Collection on Xbox One, the multiplayer mode was inaccessible to the

users. The users could not access a single match of any kind, encountering various error messages or

endless queues, and even one full game crash to the Xbox One dashboard [4]. Also, Sega’s Sonic Boom,

the latest in its long-running Sonic the Hedgehog series, was shipped with bugs. One serious bug was that

the user could jump infinitely into the air by pausing and un-pausing the game, thus completing the game

in under an hour [5]. Such incidents lower the quality of the game and can be detrimental to a company

and its reputation.

Serious bugs such as these, and their cost, suggest that a more thorough game testing process is

necessary. The test cases that are executed by the game tester are written by the Development

Team Lead or the QA Team Lead. A better and more comprehensive method to generate test cases may

not guarantee a game completely free of bugs, but it will help increase the number of bugs found,

decrease the time required to do so and thus decrease the overall cost of testing.

One way to improve the game testing infrastructure is through the use of experimental design.

Experimental design is a procedure for planning experiments so that the data can be analyzed to yield

valid conclusions [6]. In the literature, there are no applications of experimental design to manual game

testing. However, there are applications in software testing, which in many aspects is similar to game

testing. In this thesis, experimental design and analysis methods are applied to the manual game testing

process. Specifically, we show that combinatorial testing is an effective approach for generating test cases

to test games.

2 Literature Review

The literature review has been divided into two sections. The first section discusses current game testing

techniques. The second section gives insight into combinatorial testing and its application in the software

industry.

2.1 Game testing techniques

The game development cycle, on any platform, has stages which are known as milestones [7]. The

milestones indicate that the game is at a particular level of development. The milestones, generally, are

first playable, alpha stage, beta stage, gold master and code release. The first playable version is similar to

a demo version, where the feel of the game is observed and assessed. In the alpha stage, the game

is said to be feature complete, i.e. all the features that the game is intended to have are present. This is

when the testing cycle begins. In this stage, the developers do not make any changes to the features of the

game but only fix bugs. The beta stage represents the feature complete and mostly bug free game. After

the beta stage, a gold master version of the game is released. Ideally, there should be no bugs at this stage.

Then the game is code released into the market.

There are two types of game testing: automated and manual. Runtime monitoring of video games is a

method of automated bug finding [8]. It is a form of white box testing, which requires knowledge of the game's

source code. These kinds of testing methods may be effective but are not simple. Additionally,

in runtime monitoring, the number of rules that must be specified for the monitors grows with the size of the

game, and no number of rules gives a complete enumeration of the expected behavior of the game [8].

In manual testing of a video game, the tester is unaware of the game's source code. This type of testing is

called black box testing. The tester executes the test cases and observes their effect on the game. If a test

case results in a bug, the tester reports the bug and the developers fix it. As a result, manual testing of a

video game is a simpler process than runtime monitoring.

There are various types of manual game testing techniques that can be used to identify bugs in any given

game. Combinatorial testing, test flow diagrams, cleanroom testing, test trees, play testing and ad hoc

testing are a few of the examples [7]. Each of these methods can be used to generate a set of test cases. In

a previous study of game testing methods [7], combinatorial testing is suggested to have the

highest efficiency and to reduce the cost and resources for testing a game. However, there seem to be no papers

that discuss the application of combinatorial testing to manual game testing. To facilitate the process of

generating test cases, experimental design is used in this thesis.

2.2 Combinatorial testing

Combinatorial testing is a form of experimental design that is used to generate test cases. It is similar

to a fractional factorial design. Combinatorial testing is further explained in the following sections. This

section is divided into three subsections. The first subsection gives an overview of combinatorial testing

with an example of how combinatorial testing works in a simple software application. The second

subsection discusses the tools that can be used to generate combinatorial test cases. The third subsection

reviews the literature on identifying the combination of factors that results in a bug.

2.2.1 Overview of combinatorial testing

While we could find no literature that discusses the application of combinatorial testing to

manual testing of video games, literature exists on the application of combinatorial testing to test

software. This section gives a brief introduction to combinatorial testing and its applications in testing

software.

Combinatorial testing uses covering arrays to generate test cases. A covering array can be denoted as

CA(t,k,v) [9]. ‘t’ stands for the strength of the covering array, otherwise known as coverage. We explain the

concept of coverage with an example later in this section. ‘k’ indicates the number of variables (factors)

in the covering array, and ‘v’ represents the number of levels of each variable.

Consider a situation in which tests must be done to ensure that a software application can run on a

computer [10]. Let's assume the factors involved in the test are the operating system (Windows, Linux),

processor type (Intel, AMD) and network protocol (IPv4, IPv6). In a test with three factors, each factor at two

levels, a complete factorial design consists of 2*2*2 = 8 runs.
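
The run count of the complete factorial design can be checked by enumerating every combination of levels. The following Python sketch does this for the three factors of the example; the variable names `factors` and `full_factorial` are chosen here for illustration only.

```python
from itertools import product

# Factors and levels from the software compatibility example.
factors = {
    "Operating System": ["Windows", "Linux"],
    "Processor": ["Intel", "AMD"],
    "Protocol": ["IPv4", "IPv6"],
}

# A complete factorial design contains every combination of levels.
full_factorial = list(product(*factors.values()))

for run, levels in enumerate(full_factorial, start=1):
    print(run, *levels)

print("Number of runs:", len(full_factorial))
```
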

Combinatorial testing is an alternative to a factorial design that requires considerably fewer

runs. An essential part of generating combinatorial test cases is the concept of coverage.

Coverage is used to identify how well a test set covers the possible combinations of a certain number of

factors. By varying the coverage of a test set, we generate a varying number of test cases to test the

software. In other words, coverage is a measure of the combinations of levels between factors (otherwise

known as interactions) that a test set includes. Table 1 shows the design for testing the software application example using

combinatorial testing with coverage 1. When coverage is 1, no factor interactions are required. Each

level of each factor is present in the design irrespective of its combination with another factor, thus the

number of test cases required in a coverage 1 test set is equal to the largest number of levels among

all the factors in the design.

Table 1 Coverage 1 design

Test  Operating System  Processor  Protocol
1     Windows           Intel      IPv4
2     Linux             AMD        IPv6
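
A coverage 1 design of this kind can be built by stepping through the levels of every factor in parallel. The sketch below is one simple way to do so, not the procedure used by any particular tool; the function name `one_way_design` is hypothetical.

```python
def one_way_design(factors):
    """Build a coverage 1 test set: every level of every factor appears at least once."""
    # The number of test cases equals the largest number of levels of any factor.
    n_tests = max(len(levels) for levels in factors.values())
    # Test i takes level i of each factor, wrapping around for factors with fewer levels.
    return [
        tuple(levels[i % len(levels)] for levels in factors.values())
        for i in range(n_tests)
    ]

factors = {
    "Operating System": ["Windows", "Linux"],
    "Processor": ["Intel", "AMD"],
    "Protocol": ["IPv4", "IPv6"],
}

design = one_way_design(factors)
for run, levels in enumerate(design, start=1):
    print(run, *levels)
```

For the three two-level factors of the example, this produces the two test cases of the coverage 1 design.
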

Table 2 shows the design with coverage 2. It has only four runs, which test every combination of levels for

every pair of factors at least once. This is also known as pair-wise testing.

Table 2 Coverage 2 design

Test  Operating System  Processor  Protocol
1     Windows           Intel      IPv4
2     Windows           AMD        IPv6
3     Linux             Intel      IPv6
4     Linux             AMD        IPv4

Note that in Table 2, the combination Operating System = Windows, Processor = Intel and Protocol

= IPv6 is not present in the design. This is because the coverage is 2 and not 3. When the coverage for this

design is increased to 3, the above mentioned combination is tested. Therefore, by varying coverage, it

is possible to test software exhaustively. The choice of coverage plays an important role in determining the

effectiveness of a combinatorial test set. It determines which interactions between factors are tested and thus results in

better fault location. However, this comes at a cost.
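
The coverage of a given test set can be checked mechanically. The sketch below lists the t-way level combinations that a test set does not contain, and applies it to the four test cases of the coverage 2 design; the function name `missing_combinations` is introduced here for illustration.

```python
from itertools import combinations, product

factors = {
    "Operating System": ["Windows", "Linux"],
    "Processor": ["Intel", "AMD"],
    "Protocol": ["IPv4", "IPv6"],
}

# The four test cases of the coverage 2 design.
table_2 = [
    ("Windows", "Intel", "IPv4"),
    ("Windows", "AMD", "IPv6"),
    ("Linux", "Intel", "IPv6"),
    ("Linux", "AMD", "IPv4"),
]

def missing_combinations(tests, factors, t):
    """Return the t-way level combinations that no test case contains."""
    names = list(factors)
    levels = list(factors.values())
    missing = []
    for cols in combinations(range(len(names)), t):
        # Level combinations that the test set actually contains for these factors.
        covered = {tuple(test[c] for c in cols) for test in tests}
        for combo in product(*(levels[c] for c in cols)):
            if combo not in covered:
                missing.append(dict(zip((names[c] for c in cols), combo)))
    return missing

print("Missing 2-way combinations:", missing_combinations(table_2, factors, 2))
print("Missing 3-way combinations:")
for combo in missing_combinations(table_2, factors, 3):
    print(" ", combo)
```

The design contains every 2-way combination, while half of the eight 3-way combinations, including Windows, Intel and IPv6, are absent.
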

As coverage is increased, the number of test cases can dramatically increase. Determining the appropriate

coverage is important. Previous studies [11] [12] on software failures involving large scale tests suggest

that all failures could be triggered by a maximum of 4-way to 6-way interactions. So, a coverage strength

between 3 and 6 is effective for finding more bugs in a software application.
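
To give a sense of how quickly the testing effort grows with coverage, the sketch below counts the distinct t-way level combinations that a test set must contain when all factors have the same number of levels. The values of `k` and `v` are hypothetical and are not taken from any game discussed in this thesis. Since any t factors can be combined in v^t ways and a single test case contains only one of these, v^t is also a lower bound on the number of test cases.

```python
from math import comb

def t_way_combinations(k, v, t):
    """Number of distinct t-way level combinations for k factors with v levels each."""
    # Choose t of the k factors, then one of v levels for each chosen factor.
    return comb(k, t) * v ** t

k, v = 10, 3  # hypothetical system: 10 factors with 3 levels each

print("coverage  combinations to cover  minimum test cases")
for t in range(1, 7):
    print(f"{t:>8}  {t_way_combinations(k, v, t):>21}  {v ** t:>18}")

print("Exhaustive test cases:", v ** k)
```
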

2.2.2 Tools for generating combinatorial tests

To further understand the combinatorial testing techniques, we need to delve into how the test suites are

generated. As explained in [13], combinatorial test suites can be generated using two techniques:

Orthogonal Array (OA) and Covering Array (CA). For testing video games we believe that CAs are more

suited than OAs for two reasons. First, game events have constraints. CAs allow the implementation

of constraints in developing a test suite whereas OAs do not. Second, our focus is to create test suites with

fewer test cases. Generation of combinatorial test suites with CAs results in fewer test cases than with OAs.

In [14], a detailed analysis and comparison of 75 tools/algorithms for generation of combinatorial test

suites is given. Covering arrays generated using greedy techniques were found to be popular due to their

simplicity. Greedy techniques are a class of algorithms used to generate covering arrays. They

support large system configurations, including constraints and higher strengths. Keeping the testing of

video games in mind, we came up with the following requirements for effective testing. First, we need a

tool that can generate covering arrays with a coverage strength of up to 6. Second, game events

require constraints, so the tool that we use should facilitate the implementation of constraints. Third, base

choice selection criteria can be necessary when testing games of relatively large sizes. Base choice allows

us to test particular sections of a game. A level of a particular factor can be fixed as the base choice and

then test cases are generated. Fourth, mixed covering array strength could be useful to test the effect of

important factors such as interrupts in the game.
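
As an illustration of the greedy idea combined with constraints, the sketch below builds a coverage 2 test set by repeatedly choosing the valid test case that covers the most pairs not yet covered. It enumerates all valid test cases first, so it is only practical for small examples, and it is not the algorithm implemented in ACTS or in any other tool discussed here. The factors, their levels and the constraint are hypothetical.

```python
from itertools import combinations, product

def greedy_pairwise(factors, is_valid=lambda test: True):
    """Greedily pick valid test cases until every coverable pair of levels is covered."""
    names = list(factors)

    def pairs(test):
        # All 2-way (factor, level) combinations contained in one test case.
        return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}

    # Candidate test cases: the full factorial with constraint-violating tests removed.
    candidates = [dict(zip(names, combo)) for combo in product(*factors.values())]
    candidates = [test for test in candidates if is_valid(test)]

    # Only pairs that occur in at least one valid test case can be covered.
    uncovered = set().union(*(pairs(test) for test in candidates))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda test: len(pairs(test) & uncovered))
        suite.append(best)
        uncovered -= pairs(best)
    return suite

# Hypothetical game factors and constraint, for illustration only.
factors = {
    "Mode": ["Single player", "Multiplayer"],
    "Vehicle": ["Kart", "Bike"],
    "Weather": ["Sun", "Rain"],
}

def no_bike_in_rain(test):
    return not (test["Vehicle"] == "Bike" and test["Weather"] == "Rain")

suite = greedy_pairwise(factors, no_bike_in_rain)
for run, test in enumerate(suite, start=1):
    print(run, test)
```
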

The tools that meet those requirements, out of the 75 identified, are presented in Table 3.

Table 3 Tools meeting our requirements

Software                       Coverage  Constraints       Base Choice  Variable Strength  Uniform Strength  Availability
ACTS                           6         Full support      Yes          Yes                Yes               Yes
tTuples                        6         Forbidden Tuples  No           No                 Yes               Yes
PICT                           6         Full support      Yes          Yes                Yes               Yes
Intelligent Test Case Handler  6         Forbidden Tuples  No           Yes                Yes
Jenny                          <=8       Forbidden Tuples  No           No                 Yes               Yes
Test Vector Generator          6         Full support      No           Yes                Yes               Yes
IPOD                           6         No information    No           No                 Yes               Algorithm present in ACTS
MIPOG                          11        No information    No           No                 Yes               Modified IPOG. Present in ACTS
ITTDG                          12        No information    No           Yes                Yes               Algorithm
Harmony Search Strategy        14        No information    No           Yes                No information    No
Particle Swarm Test Generator  6         No information    No           Yes                No information    No
HSTCG                          7         Full support      No           Yes                No information    No
Hexawise                       6         Forbidden Tuples  No           Yes                Yes               Web based tool
PictMaster                     6         Full support      No           No                 Yes               Yes

Constraint handling is generally of two types: constraint solving based and forbidden tuples based [15].

In the constraint solving based approach, a constraint solver is used and a test is valid if it satisfies the

constraints. The forbidden tuples based approach identifies a set of combinations of factor levels that are forbidden, and

a test case is valid only if it does not contain any of those forbidden tuples [15].
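
The difference between the two types can be illustrated with a small sketch. The constraint solving based variant is reduced here to evaluating a list of rules on a test case, whereas real tools use a constraint solver; the function names, the factors and the rule itself are hypothetical.

```python
def valid_by_constraints(test, constraints):
    """Constraint based check: the test case must satisfy every rule."""
    return all(rule(test) for rule in constraints)

def valid_by_forbidden_tuples(test, forbidden):
    """Forbidden tuples check: the test case must not contain any forbidden tuple."""
    return not any(
        all(test.get(factor) == level for factor, level in tup.items())
        for tup in forbidden
    )

# Hypothetical rule: a multiplayer match cannot be paused.
constraints = [lambda t: not (t["Mode"] == "Multiplayer" and t["Action"] == "Pause")]
forbidden = [{"Mode": "Multiplayer", "Action": "Pause"}]

tests = [
    {"Mode": "Multiplayer", "Action": "Pause", "Level": 1},
    {"Mode": "Multiplayer", "Action": "Jump", "Level": 1},
    {"Mode": "Single player", "Action": "Pause", "Level": 2},
]

for test in tests:
    print(
        test,
        valid_by_constraints(test, constraints),
        valid_by_forbidden_tuples(test, forbidden),
    )
```

Both checks reject the first test case and accept the other two; they differ only in how the restriction is written down.
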

Based on the requirements of game testing, we felt four software tools were the most suitable. They are

ACTS, PICT, Hexawise and Jenny. These tools were downloaded and compared. They differ in terms

of usability. Of the four, ACTS has the best user interface and its system creation process is also

simple. PICT requires the user to write a short, simple piece of code, which has to be uploaded to the PICT

website to generate the test cases. Hexawise has a decent user interface but the implementation of

constraints is difficult. Jenny works from the command prompt and is complicated compared to the

others in terms of generating the test cases. Therefore, we proceeded with ACTS for generating

combinatorial test cases.

2.2.3 Identifying failure inducing combinations

To further assist the developers and make the game debugging process easier, we intended to develop a

technique to identify the combination of factors that lead to a bug. Bugs in a game are caused not only by

individual factors but also by an interaction between factors. It is important to identify the interaction of

factors causing a bug as it reduces the effort to fix the bug. We call the interaction causing a bug the Failure

Causing Combination (FCC). The amount of work done by the developers for bug fixes can be reduced

with effective ways of finding failure inducing combinations.

From the literature, we found that there are two methods of fault detection in software testing: adaptive

method and non-adaptive method. In the adaptive method, the generation of test cases for fault location is

done after the execution of a set of test cases. The output from the executed test cases is used for further

generation of test cases for fault location. In the non-adaptive method, all test cases are executed

simultaneously.

A new fault characterization called the faulty interaction characterization (FIC) has been proposed in [16].

Additionally, a binary search alternative (FIC_BS) to locate one failure causing interaction in a single failing test

case has also been introduced. This is a form of adaptive fault detection, which uses a single test case,

called the seed test case, as a reference to generate further adaptive tests. The basic idea is to

repeatedly compare the factor interactions in the reference test case to the set of parameters. Based on the

result of this comparison, the factor levels in the reference test case are exchanged with those in the set of

parameters. The main drawback of FIC and FIC_BS is that they generate the fault locating test cases

based on a single failed test. This may result in a higher number of test cases.
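
The general idea of adaptive fault location from a single failing test case can be illustrated as follows. This is a simplified sketch and not the FIC or FIC_BS algorithm of [16]: one factor level at a time is exchanged for an alternative level and the test is run again. It assumes that the alternative levels do not introduce failures of their own. The factor names follow the Grand Prix example, while the levels, the function `locate_failing_levels` and the simulated bug are hypothetical.

```python
def locate_failing_levels(seed, alternatives, run_test):
    """Return the factor levels of a failing seed test that are needed for the failure.

    seed         -- dict mapping each factor to its level in the failing test case
    alternatives -- dict mapping each factor to a different level
    run_test     -- function returning True if a test case passes
    """
    current = dict(seed)
    involved = {}
    for factor in seed:
        trial = dict(current)
        trial[factor] = alternatives[factor]
        if run_test(trial):
            # The failure disappeared, so this level is part of the failing interaction.
            involved[factor] = seed[factor]
        else:
            # The failure persists, so this level is not needed; keep the exchanged level.
            current = trial
    return involved

# Simulated bug: the game fails whenever Kart K1 is used in Race R1.
def run_test(test):
    return not (test["Kart"] == "K1" and test["Race"] == "R1")

seed = {"Racer": "A", "Kart": "K1", "Race": "R1", "World": "W1"}
alternatives = {"Racer": "B", "Kart": "K2", "Race": "R2", "World": "W2"}

print(locate_failing_levels(seed, alternatives, run_test))
```

In this sketch, one additional test case is executed per factor of the seed test case.
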

A technique to locate interaction faults in combinatorial testing, which uses an iterative interaction fault

location strategy (IterAIFL), has been presented in [17]. In this technique, a complete test set is used for

the generation of new test cases. The combination of factor levels that causes a failure is called the minimal

failure causing schema (MFS). After the execution of a full test set, the test set is separated into two sets.

One set consists of all the failed tests, and this set is believed to contain the MFSs; the other consists of the passed tests. The authors then

formulate the schema sets by subtracting all the schemas that are common between the two sets. The

remaining schemas are then used to generate new test cases. A schema set is the set of all the combinations of

factor levels in a test case. If a test case is (A, B, C, D) then (A, -, -, -) is a schema of the schema set. In this

particular example, there can be 2^4 schemas. The process of subtraction of schema sets is repeated until all

the MFSs are found. This process appears to generate a large number of test cases, because new test cases

are generated in every iteration of IterAIFL. Additionally, there are many

assumptions made to apply the IterAIFL strategy. One of them is that the test set cannot have constraints.

Since game events require constraints, applying the IterAIFL strategy to game testing is not

possible.
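
The schema set of a test case and the subtraction step can be sketched as follows. Only the first subtraction is shown, not the full iterative strategy of [17], and the function names and example test cases are hypothetical. The enumeration includes the schema with no fixed levels, (-, -, -, -), which gives 2^4 = 16 schemas for a test case with four factors.

```python
from itertools import combinations

def schemas(test):
    """All schemas of a test case: a subset of positions keeps its level, the rest become '-'."""
    n = len(test)
    result = set()
    for size in range(n + 1):
        for keep in combinations(range(n), size):
            result.add(tuple(test[i] if i in keep else "-" for i in range(n)))
    return result

def candidate_schemas(failed_tests, passed_tests):
    """Schemas of failed tests that do not occur in any passed test."""
    failed = set().union(*(schemas(test) for test in failed_tests))
    passed = set().union(*(schemas(test) for test in passed_tests))
    return failed - passed

print(len(schemas(("A", "B", "C", "D"))))

# Hypothetical results: the first test case failed, the second passed.
failed_tests = [("A", "B", "C", "D")]
passed_tests = [("A", "B", "C", "E")]

for schema in sorted(candidate_schemas(failed_tests, passed_tests)):
    print(schema)
```

Every schema that remains after the subtraction contains the level D, the only level in which the failed test case differs from the passed one.
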

An approach to identifying failure inducing combinations based on suspicious combinations has been

explained in [18]. The first step is to rank suspicious combinations and then generate tests based on this

ranking. The next step is called reduction. In the reduction step, the final analysis of all the rankings takes

place and the lower ranked suspicious combinations are rejected. This process repeats until a stopping condition

is satisfied. The ranking of suspicious combinations combines three measures: suspiciousness

of component, suspiciousness of combination and suspiciousness of environment. The test cases are

generated based on the few top-ranked combinations from the ranking step. As this process is iterated, a suspicious

combination set of size 1 is reached, which indicates that all the combinations in that set cause failure. Due

to its complexity, we do not use this strategy in this thesis. Our technique for finding the failure

causing combinations (FCC) is explained in the methodology section.

3. Methodology

The proposed work aims to accomplish two main research objectives:

1) Apply experimental design to a new area of research (specifically game testing).

2) Create a methodology for finding the combination of factors that cause bugs in a game.

The first objective can be broken down into two goals. The first goal is to develop a framework for helping

the game tester implement experimental design in games. The second goal is to illustrate the

implementation of experimental design approach using available games. In this section, we present the

methodology associated with accomplishing the research objectives.

3.1 Combinatorial testing for video game debugging

As the number of factors and levels increases, the number of test cases also increases. Also, with an

increase in coverage, the total number of errors detected increases. A 1-way coverage test set detects 67% of the

bugs, 2-way coverage detects 93%, 3-way coverage detects 98%, and detection reaches 100% between

4-way and 6-way coverage. Figure 2 shows the graphical representation of coverage vs. cumulative

percentage of bugs found.

Figure 2 Coverage Vs % of errors

For generating combinatorial test cases to find bugs in video games, we propose using the ACTS software.

ACTS stands for Advanced Combinatorial Testing System and was developed by the National Institute of

Standards and Technology (NIST). ACTS generates t-way combinatorial test sets with

constraints and variable-strength relations.

The first step in ACTS to generate combinatorial test sets is to identify the parameters and their levels for

the System Under Test (SUT). Our SUT will be the video game to undergo testing. We can have four

different types of parameters: Boolean, Enum, Number and Range. A Boolean parameter has two levels,

true and false. Enum is the type of parameter used to specify categorical factors. An example of an

Enum parameter is the set of game modes that can be used for testing. The Number

parameter is used when the factor is numerical. An example of a numerical parameter is the

number of times the tester needs to access a particular factor level in a test case, for instance jumping 3

times before performing another action in the game. A Range parameter specifies a range of values that

the parameter can take. The speed of the car in a racing game is an example of a Range parameter;

in this case we can specify the range to be 60 to 100 mph. A parameter can have any number of levels.

After specifying the parameters and their levels, we move on to identifying the constraints between

parameters. ACTS supports Boolean, relational and arithmetic operators for constraints. An example of a

constraint in ACTS is (Vehicle = “Car”) => (Speed = 100). This constraint makes sure that when the

factor Vehicle has the level ‘Car’ in a test case, the factor Speed will only have the level 100.
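
The parameter and constraint model described above can be sketched in Python. This is only an illustration of the semantics and not ACTS itself: the parameter `Nitro`, the extra levels of `Vehicle` and the sampled `Speed` values are hypothetical, and the implication (Vehicle = "Car") => (Speed = 100) is written as an ordinary Python predicate that filters a full factorial enumeration.

```python
from itertools import product

# Hypothetical SUT model: one Enum, one Number and one Boolean parameter.
parameters = {
    "Vehicle": ["Car", "Bike", "Truck"],  # Enum (levels other than "Car" are assumed)
    "Speed": [60, 80, 100],               # Number, sampled from the 60-100 mph range
    "Nitro": [True, False],               # Boolean (assumed parameter)
}


def implies(antecedent, consequent):
    """Logical implication: (a => b) is equivalent to (not a) or b."""
    return (not antecedent) or consequent


def satisfies_constraints(test):
    """Constraint from the text: (Vehicle = "Car") => (Speed = 100)."""
    return implies(test["Vehicle"] == "Car", test["Speed"] == 100)


names = list(parameters)
all_tests = [dict(zip(names, levels)) for levels in product(*parameters.values())]
valid_tests = [t for t in all_tests if satisfies_constraints(t)]

print(len(all_tests), "unconstrained tests")
print(len(valid_tests), "tests satisfy the constraint")
```

Tests in which Vehicle is not "Car" are unaffected by the constraint; only the tests that pair "Car" with a speed other than 100 are dropped.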

To build a combinatorial test set, we go to the operations menu and click on ‘Build’. A new session


window pops up where we choose the algorithm to generate our test cases. The algorithms

available in ACTS are IPOG, IPOG-F, IPOG-F2, IPOG-D and Base choice. IPOG, IPOG-F and

IPOG-F2 are recommended for smaller systems (fewer than 20 parameters and 10 levels per factor on

average). IPOG-D is for bigger systems [19]. Generalizing the IPO (Input-Parameter-Order) strategy

from pairwise testing to t-way testing results in the IPOG (Input-Parameter-Order-General) strategy [20].

For a system with t or more parameters, the IPOG strategy builds a t-way test set for the first t parameters,

extends the test set to build a t-way test set for the first t + 1 parameters, and then continues to extend the

test set until it builds a t-way test set for all the parameters. The extension of an existing t-way test set for

an additional parameter is done in two steps: horizontal growth, which extends each existing test by

adding one value for the new parameter; and vertical growth, which adds new tests, if needed, to the test set

produced by horizontal growth [20].
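
The two growth steps can be illustrated with a simplified sketch of the IPO strategy for the pairwise case (t = 2). This is not the ACTS implementation: it ignores constraints, uses a simple greedy choice during horizontal growth, and the function name `ipo_pairwise` and the four game parameters are hypothetical. It assumes at least two parameters and that `None` is not used as a level.

```python
from itertools import combinations, product


def ipo_pairwise(params):
    """Build a pairwise test set one parameter at a time (simplified IPO)."""
    names = list(params)
    # Start with every combination of the first two parameters.
    tests = [dict(zip(names[:2], combo))
             for combo in product(params[names[0]], params[names[1]])]

    for i in range(2, len(names)):
        new, earlier = names[i], names[:i]
        # Pairs (earlier parameter, its level, new level) still to be covered.
        # A dict is used as an insertion-ordered set.
        uncovered = dict.fromkeys(
            (old, ov, nv)
            for old in earlier for ov in params[old] for nv in params[new])

        def newly_covered(test, nv):
            return [(old, test[old], nv) for old in earlier
                    if (old, test[old], nv) in uncovered]

        # Horizontal growth: give each existing test the level of the new
        # parameter that covers the most still-uncovered pairs.
        for test in tests:
            best = max(params[new], key=lambda nv: len(newly_covered(test, nv)))
            for pair in newly_covered(test, best):
                del uncovered[pair]
            test[new] = best

        # Vertical growth: add tests for the pairs that are still uncovered.
        extra = []
        for old, ov, nv in uncovered:
            for row in extra:
                if row[new] == nv and row[old] is None:
                    row[old] = ov
                    break
            else:
                row = dict.fromkeys(earlier)  # None marks a don't care value
                row[old], row[new] = ov, nv
                extra.append(row)
        tests.extend(extra)

    # Fill the remaining don't care values with the first level.
    for test in tests:
        for name in names:
            if test[name] is None:
                test[name] = params[name][0]
    return tests


# Hypothetical game parameters; the full factorial would need 36 tests.
params = {
    "Mode": ["Race", "Elimination", "Coin Challenge"],
    "Network": ["4G", "Wi-Fi"],
    "Orientation": ["Portrait", "Landscape"],
    "Interrupt": ["Call", "Text", "Lock"],
}
tests = ipo_pairwise(params)

# Every pair of levels must appear together in at least one test.
for p, q in combinations(params, 2):
    for a, b in product(params[p], params[q]):
        assert any(t[p] == a and t[q] == b for t in tests)
print(len(tests), "pairwise tests")
```

The final fill step chooses the first level for simplicity; choosing a random level instead would correspond to randomizing the don't care values.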

In the session window in ACTS, we also have the option of selecting the strength of the test set, which is

the coverage. We can leave the constraint handling at its default setting. While

generating combinatorial test cases, ACTS also gives us the choice of randomizing the don’t care values. Don’t

care values are those factor levels which are not necessary to satisfy the coverage value. They are called

don’t care values because the coverage of the test set is achieved even if those factor levels are absent

from the test set or are arbitrarily chosen. By randomizing the don’t care values, we assign a randomly chosen factor level

to each such position, so that every test case is complete and can be executed. When we click on the ‘Build’

option, a combinatorial test set with the selected strength/coverage is generated. ACTS lets us export this

test set into Excel and CSV formats.
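
To make the notion of strength concrete, the following sketch measures what fraction of all t-way level combinations appears in a given test set. It is an illustration only; the function name `t_way_coverage`, the three factors and the four test cases are hypothetical and are not output from ACTS.

```python
from itertools import combinations, product


def t_way_coverage(tests, params, t):
    """Fraction of all t-way level combinations found in at least one test."""
    total = covered = 0
    for factors in combinations(params, t):
        seen = {tuple(test[f] for f in factors) for test in tests}
        for combo in product(*(params[f] for f in factors)):
            total += 1
            covered += combo in seen
    return covered / total


# Hypothetical three-factor model and a four-row test set.
params = {
    "Mode": ["Race", "Elimination"],
    "Network": ["4G", "Wi-Fi"],
    "Orientation": ["Portrait", "Landscape"],
}
tests = [
    {"Mode": "Race", "Network": "4G", "Orientation": "Portrait"},
    {"Mode": "Race", "Network": "Wi-Fi", "Orientation": "Landscape"},
    {"Mode": "Elimination", "Network": "4G", "Orientation": "Landscape"},
    {"Mode": "Elimination", "Network": "Wi-Fi", "Orientation": "Portrait"},
]

for t in (1, 2, 3):
    print(f"{t}-way coverage: {t_way_coverage(tests, params, t):.2f}")
```

In this small example the four tests cover every single level and every pair of levels, but only half of the eight 3-way combinations, so the set has strength 2.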

3.2 Classification of factors

The factors that are used to generate combinatorial test cases can be divided into five different types

depending on the type of testing that is to be performed. For our methodology, factors can be categorized

as game behavior or game specific, interrupt based, language based, hardware/software based and first

party requirement based. Other types of testing, such as soak testing, where the game is left running for

long periods of time without any user input, have different factors that can be used. Since soak testing is a

part of game logic testing, we do not introduce these factors as a separate category.

These five categories are shown in Figure 3. If the factor is specific

to the game then it can be used for game logic or functionality testing. In this type of testing, the tester is

looking for behavior that does not comply with the game logic, for example a character moving backwards

in a game when the forward key is pressed. Interrupt based factors are used for interrupt testing. This

testing interrupts the normal functioning of the game using external factors. Receiving phone

calls or text messages on a cell phone while playing a game on that phone is an example of interrupt

testing. Language based factors are used for localization testing: games are generally released in different

languages and the tester has to make sure that the translation into each language is accurate.

Hardware/software based factors are used for compatibility testing, which assesses whether the game’s

performance is similar on both high end and low end devices. Testing the same game with different amounts

of RAM on a computer is a form of compatibility testing. To release a game into the market, it has to meet

certain requirements. These requirements are specific to the platform on which the game is being

released. If a game is to be released on Apple devices, then Apple has a checklist with which the game should

comply. Compliance testing is done to check if the game meets all the requirements.

Figure 3 Different types of testing performed using combinatorial testing

The first part of our methodology is to identify the factors. It then helps to categorize these factors

based on the testing that is to be performed. If we categorize the factors based on the type of testing,

then logging an issue and writing the bug report becomes much simpler. Also, by categorizing factors into

types of testing, we know what kind of bugs to look for. For example, a bug caused by a game logic factor

will typically be a crash, a freeze, a progression block, or a text or graphics issue.


Choosing the factor levels depends on the type of factor. Factor levels for game logic/functionality are

based on the game. They are specific to the game that is being tested and hence change from game to

game. Generally, the factor levels for game logic testing are the choices that the player makes in the game.

Localization testing is also game specific. The translations are based on the original text in English and

vary between games. Factor levels for interrupt testing remain the same irrespective of the game. For

computer games, the interrupt factor’s levels would be minimizing the game, locking the screen and turning off

the computer while the game is running. These factor levels would be the same for all games.

Compatibility testing depends on the testers’ access to various devices that can be used for running the

game. The factor levels would be the different types of devices/configurations that the tester can use.

Compliance testing remains the same for all games. It is dependent only on the platform on which the

game is being released.

Constraints form a major part of generation of test cases using combinatorial testing. Game events

generally have constraints. ACTS lets the tester form the constraints for any system. For example, the

controls for a character in a game can be accessed only in the game. They cannot be accessed on the main

menu. Hence, a constraint specifying the use of the controls has to be applied to generate meaningful test

cases.

3.3 Methodology for finding bugs

Now that we have identified the method to generate test cases using ACTS software and the classification

of factors for different types of testing, we explain how combinatorial test cases are executed in the game.

All the factor levels mentioned in a particular test case are to be run in the game to see if they are resulting in

a bug. This way the game can be tested for varying levels of coverage.

3.3.1 Sorting test cases to find Failure Inducing Combination

As mentioned in the literature review, sorting tests for identifying the FCC can greatly reduce the effort of

developers in fixing bugs. We used Python to write code for sorting the combinatorial test cases. After

reviewing all the methods discussed in the literature, we came up with the idea of reordering the existing

test cases of a combinatorial test suite as opposed to using an algorithm to generate new test cases. The

reordering of the test cases can be done using the SOFOT (Simplified One Factor One Time) strategy to identify

the factors or their combinations that cause bugs in a game.

The first step was to develop a logic for finding the FCC. When a bug is discovered, the tester logs it in an

online database for the developer to read it. This bug report follows a standard format. It consists of

information on how to reproduce the bugs. The steps to reproduce the bug can be traced backwards to

identify the FCC. The logic has been depicted in Figure 4.


The efficiency of our method for identifying the failure inducing combination depends strongly on the type

of bug. All the bugs for any game can be classified into two types: general bugs and specific bugs. A bug

is general when it depends only on the factor “screens”, i.e. the bug occurs at that screen for all levels of the

other factors. On the other hand, a specific bug occurs because of the combination of definite factor levels.

This type of bug is specific to that combination of factor levels. Figure 4 presents the logic only for two

factor interactions. Our code can be used for more than two factor interactions.

[Figure 4 flow chart. Node labels: Start; Execute test; Bug? (No: move to next test case); Run code with the same 'screen'; Bug? (Yes: it's a general bug; No: it's a specific bug); Run code with same penultimate factor; Run code with same antepenultimate factor; Bug? (Yes: this is our FCC); Check for two factor interaction; Bug? (Yes: this is our FCC; No: proceed to check for 3 factor interactions); End]

Figure 4 Flow chart for finding FCC for a two factor interaction

Let the test case that results in a bug be called the reference test case. When this reference test case is

executed, we can see the screen at which the bug occurs. The next step is to identify whether the bug is a

general bug or a specific bug. To reach this conclusion we first reorder our test cases so that

only the factor “screens” remains the same in the next test case. There are two ways we can proceed from

here. First, if the new test case results in a bug, we reorder our test set again with the ‘screens’ factor

level remaining the same. If the next test case also results in a bug then it is an indication that the bug is a

general bug. This process can be repeated a couple of times to see if the bug occurs with every new

reordering of test cases. If all the iterations result in a bug then it is a general bug that is caused at the

particular screen irrespective of any other factor interaction. Second, if the new test case does not produce a

bug, it is an indication that it is a specific bug, caused by a specific combination of factor

levels. At this point we do not know what factor level or combination of factor levels is causing this bug.

It is necessary to test individual factors and combinations of different factors to determine the failure

inducing combination. We proceed to the penultimate step in the execution of the reference test case

and reorder the test cases so that only the factor level chosen at that step matches the reference test case.

If this does not produce the bug, we repeat the check for the antepenultimate step, and so on.

After checking each individual factor level in this fashion, we proceed to check for interactions among

factors. We again start with the interaction between the penultimate factor level and the others. We

check all two-way interactions before proceeding to 3-way interactions. If at any point in the

analysis a test case results in a bug, then the factor levels that this test case has in common with our

reference test case are the failure inducing combination.
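
The search described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the code from Appendix I: the function `find_fcc`, the oracle `has_bug` (which stands in for a tester executing a test case in the game) and the reduced two-level factor model are hypothetical. The candidate pool is a full factorial for simplicity, whereas in practice it would be the existing t-way test set, and the preliminary general bug check on the screen is left out.

```python
from itertools import combinations, product


def find_fcc(reference, pool, has_bug, order):
    """Search for the failure causing combination (FCC) of a failing test.

    reference: the failing test case (dict of factor -> level)
    pool:      existing test cases that can be reordered and executed
    has_bug:   callable that executes a test and reports whether the bug occurs
    order:     factors in the order they are checked (last step first)
    Returns (fcc, executed); fcc is None if no combination reproduces the bug.
    """
    executed = 0
    for size in range(1, len(order) + 1):
        for kept in combinations(order, size):
            # Pick a test that agrees with the reference only on `kept`.
            candidate = next(
                (t for t in pool
                 if all((t[f] == reference[f]) == (f in kept) for f in reference)),
                None)
            if candidate is None:
                continue  # the test set contains no suitable test for this check
            executed += 1
            if has_bug(candidate):
                return {f: reference[f] for f in kept}, executed
    return None, executed


# Hypothetical reduced model with two levels per factor.
levels = {
    "World": ["Phineas and Ferb", "Gravity Falls"],
    "Race": ["Elimination", "Coin Challenge"],
    "Racer": ["Wander", "Agent P"],
    "Kart": ["Wasabi Dragster", "Speeder Bike"],
}
pool = [dict(zip(levels, combo)) for combo in product(*levels.values())]
reference = {"World": "Phineas and Ferb", "Race": "Elimination",
             "Racer": "Wander", "Kart": "Wasabi Dragster"}


def has_bug(test):
    # Notional bug: this kart and this race together crash the game.
    return test["Kart"] == "Wasabi Dragster" and test["Race"] == "Elimination"


fcc, executed = find_fcc(reference, pool, has_bug,
                         order=["Kart", "Racer", "Race", "World"])
print(fcc, "found after", executed, "test executions")
```

When the pool is a t-way test set rather than a full factorial, some checks may find no suitable candidate and are skipped, which is why an interaction larger than the coverage of the test set may not be identifiable.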

It is to be noted that in a t-way combinatorial test set, it is only possible to find an FCC that involves at most a t-way

interaction. The reason is that there may not be enough test cases in a test set to determine the FCC if the

failure is caused by an interaction of more factors than the coverage of the test set.

Bugs with one, two and three factor interactions were used to check the efficiency of our code. The code

seems to function fairly well with one factor and two factor interactions. As the

number of interacting factors increases, the number of steps required to find a bug also increases.

Additionally, a general formula for the number of steps required by our code to identify the failure inducing

combination is 2k for specific bugs and 2k-1 for general bugs, where k = number of factors

that resulted in the bug.

The Python code for reordering our test cases to find the failure inducing combinations is given in

Appendix I.


4. Results

The results have been divided into two sections. The first section compares the efficiency of

combinatorial testing techniques against a game that has already been tested. It also presents the

implementation of combinatorial testing on two new games. The second section discusses the results of

the methodology for finding failure inducing combinations with an example from one of the games we

tested.

4.1 Combinatorial testing in video games

After the generation of test cases using the ACTS software, the test cases are then executed in the game.

All the factor levels present in a test case are executed in the game to see if they result in a bug. Depending

on the complexity of the game, the number of test cases can vary from a few hundred to thousands. Also,

the complexity of the game dictates the time required to perform the testing. Here we present the results

of combinatorial testing on games.

4.1.1 Previously tested games

We present the results of combinatorial testing on the game Grand Prix in this section. Disney's XD Grand

Prix game was developed by Workinman games, Rochester. This is a racing game where the user can race in

different modes. The choices in this game are the worlds, the races, the racers and the karts that the user

can select to play the game.

As discussed in the methodology section, the Grand Prix game has been broken down into factors and

their respective levels. This game consists of 10 factors with varying numbers of levels. We decided to generate test

cases for game logic, interrupt and network testing for this game. We could not perform

compliance and compatibility testing as the game was running on Unity. The factors and their levels are

listed in Table 4. The last column indicates the kind of testing that can be performed using that

particular factor.

Table 4 Factors and levels of Grand Prix

| Factors | Levels | Number | Type of factor |
|---|---|---|---|
| Worlds | Grand Prix, Phineas and Ferb, Gravity Falls, Lab Rats, Kickin It, Mighty Red, Star Wars Rebels | 7 | Game logic |
| Races | Race, Elimination, Coin Challenge, Extreme Coin Challenge, Boost Challenge, Extreme Boost Challenge, Missile Mania Challenge, Missile Mania Xtreme | 8 | Game logic |
| Racer | Wander, Rob the Shark, Agent P, Bruce the Sumo, Phineas, Dipper, Lord Hater, Randy Cunningham, Waddles, Steve the Llama, Ezra, Chopper | 12 | Game logic |
| Kart | Mighty Med, Wasabi Dragster, Davenport SFC, Coolest Coaster, Nomirocket Bus, Mystery Shack Cart, Silvia, Speeder Bike | 8 | Game logic |
| Power-Ups | Jump +1, Jump +2, Jump +3, Speed +1, Speed +2, Speed +3, Boost +1, Boost +2, Boost +3 | 9 | Game logic |
| Controls | Left, Right, Jump, Deploy Power-Up, No input, Ignore | 6 | Game logic |
| Device Orientation | Portrait, Landscape | 2 | Game logic |
| Interrupts | Call, Text, Lock, Minimize, Back, Force stop | 6 | Interrupt |
| Network | 4G, Wi-Fi | 2 | Network |
| Screens | Initial loading, World Select, Race Select, Racer Select, Kart Select, Pre game loading, In game, Post game results, Pause/Quit, Post-game loading, Options, Setting, About, Store | 13 | Game logic |

Table 5 shows the number of test cases for the Grand Prix game for coverage 1-5. Due to the complexity of

the system and various constraints, ACTS takes around 10 minutes to generate the test set for coverage

5. Also, it could not generate the tests for coverage 6. The reason is that there are factors that have a high

number of levels (close to 10, and one factor over 10). The algorithms IPOG-F and IPOG-F2 do not

support factors with more than 10 levels. Even though the algorithm IPOG-D is compatible with factors with

more levels, it does not support constraints. So the number of test cases for coverage 5 is non-constrained.

Table 5 Number of test cases per coverage for Grand Prix

| Coverage | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4 | Coverage 5 |
|---|---|---|---|---|---|
| No of test cases | 16 | 193 | 2196 | 20675 | 222032 |

As discussed in the methodology section, the number of test cases increases with an increase in coverage.

We present a graph comparing the number of test cases generated for this game with and without constraints

at each coverage level. Figure 5 shows that when constraints are

implemented for the game Grand Prix, the number of test cases increases.


Figure 5 Coverage Vs Number of test cases for Grand Prix

Table 6 shows the test set for coverage 1.

Table 6 Coverage 1 test set for Grand Prix

| Test Case# | Worlds | Races | Racer | Kart | PowerUp | Controls | Orientation | Network | Interrupts | Screens |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Phineas and Ferb | Elimination | Wander | Wasabi Dragster | Speed +1 | Ignore | Landscape | Wi-Fi | Text | World Select Screen |
| 2 | Gravity Falls | Coin Challenge | Rob the Shark | Davenport SFC | Boost +1 | Ignore | Portrait | 4G | Lock/Unlock | Race Select Screen |
| 3 | Lab Rats | Xtreme Coin Challenge | Agent P | Coolest Coaster | Jump +2 | Ignore | Portrait | Wi-Fi | Home key | Racer Select Screen |
| 4 | Kickin It | Boost Challenge | Bruce the Sumo | Nomirocket Bus | Speed +2 | Ignore | Portrait | 4G | Back key | Kart Select Screen |
| 5 | Mighty Red | Xtreme Boost Challenge | Phineas | Mystery Shack Cart | Boost +2 | Ignore | Landscape | Wi-Fi | Call | Pre-game Loading Screen |
| 6 | Star Wars Rebels | Missile Mania Challenge | Dipper | Sylvia | Jump +3 | Left | Landscape | 4G | Call | In-game Screen |
| 7 | Phineas and Ferb | Missile Mania Xtreme | Lord Hater | Mr.Tank | Speed +3 | Ignore | Landscape | 4G | Lock/Unlock | Pause/Quit game |
| 8 | Grand Prix | Race | Randy Cunningham | Speeder Bike | Boost +3 | Ignore | Landscape | 4G | Call | Post-game Results Screen |
| 9 | Kickin It | Coin Challenge | Waddles | Mighty Red 4x4 | Jump +1 | Ignore | Landscape | Wi-Fi | Home key | Post-game Loading Screen |
| 10 | Star Wars Rebels | Missile Mania Challenge | Steve the Llama | Mr.Tank | Boost +3 | Ignore | Landscape | Wi-Fi | Lock/Unlock | Options |
| 11 | Kickin It | Elimination | Ezra | Sylvia | Speed +3 | Ignore | Landscape | Wi-Fi | Home key | About/Information |
| 12 | Mighty Red | Elimination | Chopper | Coolest Coaster | Jump +3 | Ignore | Portrait | Wi-Fi | Text | Initial Loading Screen |
| 13 | Star Wars Rebels | Missile Mania Xtreme | Dipper | Nomirocket Bus | Speed +1 | Right | Landscape | Wi-Fi | Lock/Unlock | In-game Screen |
| 14 | Lab Rats | Missile Mania Challenge | Wander | Davenport SFC | Speed +2 | Jump | Landscape | 4G | Call | In-game Screen |
| 15 | Phineas and Ferb | Missile Mania Xtreme | Steve the Llama | Wasabi Dragster | Speed +3 | Deploy Power-Up | Portrait | Wi-Fi | Call | In-game Screen |
| 16 | Lab Rats | Missile Mania Challenge | Agent P | Mighty Red 4x4 | Boost +2 | No input | Landscape | Wi-Fi | Text | In-game Screen |

This game has already been through the testing process, so a record of bugs was available. We used the

recorded bugs to map each bug flagged by Workinman to an appropriate test case that would have flagged

the bug. From the record of bugs posted for the Grand Prix game, factor interactions for each bug were


identified. The maximum number of interactions in the game that resulted in a bug is determined to be

three. All the bugs were classified based on the number of interactions that caused them. Mapping was

then performed between the bugs and the test sets generated. Table 7 gives an example of a few bugs and

their associated interactions. It shows the factors responsible for the cause of the particular bug.

Table 7 Breakdown of bugs into factors for Grand Prix

| Bug | Factors Responsible | Number |
|---|---|---|
| Loading screen lasts longer than 25 seconds | Post-game loading screen | 1 |
| User is unable to tap on Terms of Use button | Information screen | 1 |
| Pixelated image appears for 'Bruce the Sumo' character | Racer select screen, Bruce the Sumo | 2 |
| Result screen loops the jumping animation if user taps on jump icon at the end race point | Star wars rebels, Jump and Rob the Shark | 3 |

The first bug in Table 7, "Loading screen lasts longer than 25 seconds", was posted by the testers at

Workinman games for the Grand Prix game. This bug is present at the post-game loading screen of the

game and is caused by only one factor. Therefore, every generated test case that had 'Post-game loading

screen' as a factor level for screens was marked as a bug. The response variable took the value of 1 if a

test case resulted in a bug and 0 otherwise. This method was applied similarly to all the other bugs.

Table 8 shows the number of bugs found for each design: coverage 1, 2, and 3. Close to 50% of the bugs

were covered just by executing the coverage 1 test cases. Approximately 90% of the bugs could be

discovered by running the coverage 3 design. The total number of test cases required to discover 90%

of the bugs is around 2360. Compared to the 56 million test cases required by a full

factorial design, combinatorial testing discovered 90% of the bugs with a mere 0.004% of the test cases.

This demonstrates that combinatorial testing is highly efficient.

Generation of test cases for coverage 4, 5 and 6 was not necessary for this particular game. From the

breakdown of bugs, we identified that none of the bugs were caused by more than 3 factor interactions.

Testing the game for coverage values above 3 is not necessary in this case as we would not be able to map

those test cases to the bugs identified.

Table 8 Grand Prix results

| Design | No. of issues | Percentage | Cumulative Percentage |
|---|---|---|---|
| Coverage 1 | 15 | 45.45 | 45.45 |
| Coverage 2 | 10 | 30.3 | 75.75 |
| Coverage 3 | 5 | 15.15 | 90.91 |
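
The percentages in Table 8 and the 0.004% figure quoted above can be rechecked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (33 posted issues, around 2360 test cases, 56 million full factorial test cases); the variable names are ours. Note that rounding the cumulative fraction 25/33 directly gives 75.76, whereas adding the rounded entries 45.45 and 30.30 gives the 75.75 shown in the table.

```python
issues_found = {"Coverage 1": 15, "Coverage 2": 10, "Coverage 3": 5}
total_issues = 33  # total issues posted by Workinman games

running = 0
for design, count in issues_found.items():
    running += count
    percentage = 100 * count / total_issues
    cumulative = 100 * running / total_issues
    print(f"{design}: {percentage:.2f}%  cumulative {cumulative:.2f}%")

tests_run = 2360             # "around 2360" in the text
full_factorial = 56_000_000  # "56 million" in the text
share = 100 * tests_run / full_factorial
print(f"{share:.4f}% of the full factorial test cases")
```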


Note that there were a total of 33 issues posted by Workinman games. There were 22 bugs that were not

caused by interaction of factors. For instance, "The incorrect version of MoPub ad server is being used".

MoPub is a mobile ad server through which advertisements are pushed into the game. The functioning

of server calls is not a part of the tester's job. The only thing that a tester can check is whether the

advertisements are being displayed or not.

With coverage 3, we found 90% of the bugs. Three bugs were not found. One of them was "Game crashes

after locking/unlocking the screen and then tapping on the device back button twice". This issue calls for

multiple interrupts in one run (lock/unlock and back key in this case). Since our design only contains one

interrupt factor, this bug could not be tracked. Another bug that could not be tracked in the game was

“Perry the Platypus remains in a static T pose”. The character Perry the Platypus was not present in

the versions of the game we were testing. It is not one of the factor levels for our Racer factor and

hence the bug could not be found. The last bug that we could not account for was “Return to main menu

appears if the user taps on the loading screen”. The action of tapping on the screen does not come under any

specific category of factors that we decided on for this game. It would have to be a separate factor by

itself. Also, tapping on the screen is arbitrary. Even if we had a factor for such actions, we cannot

be sure how many levels to add to this factor.

This game has around 30 bugs that were caused due to interaction of factors. As explained earlier, each

bug has been broken down into factor interactions that were causing the bug. By using our method for

finding the faulty interactions, all the bugs posted by Workinman games can be found with just around

80-90 test cases. This accounts for about 3.7% of the total number of test cases.

4.1.2 Games not previously tested

Erator - Game of war

Erator is a turn based card game. The player and the opponent each choose a hero. The goal of

the game is to destroy the opponent’s hero. For every turn, depending on the mode of the game, the player

and the opponent are dealt a single card or a set of cards which have a predetermined functionality.

Using these cards, the player can attack the opponent’s hero. The opponent’s hero has to be attacked until

its health reaches zero for the player to win.

We have the pre-alpha version of this game. Many functionalities of the game do not yet

work. There are only two levels in the training which can be tested. Playing against another player is

also disabled. Table 9 shows the factors and levels associated with this game.

Table 9 Factors and levels of game Erator

| Factors | Levels | Number | Type |
|---|---|---|---|
| Game modes | Basic Training, Advancements in Face Punching, Heroes and Villains, Trial by fire, Secrets and Lies, Against AI, Against player, Online | 8 | Game logic |
| Heroes | Alanran Elmere Cenaturs, Tan Fillian Bards College | 2 | Game logic |
| Decks | Overgrowing with love, The bigger they are, Rotten rascals, Another one bites the dust | 4 | Game logic |
| Actions | Lock, Minimize, Pause, Do not follow | 4 | Interrupt, Functionality |

The number of general test cases for this game is shown in Table 10.

Table 10 Number of test cases per coverage for Erator

| | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4 |
|---|---|---|---|---|
| General tests | 8 | 32 | 127 | 256 |
| Basic Training | 31 | 127 | * | * |
| Advanced Training | 26 | 107 | * | * |

For this game, the general testing using the factors mentioned in Table 9 did not result in bugs. Also, there

are many modes of the game which are not accessible yet. In the tutorial section, only basic training and

trial by fire are working. Among the other game modes, against player and online play have not yet been enabled

in the game. Therefore, we decided to identify the bugs by performing in-depth testing of the training modes

that were available. Due to the unavailability of all tutorial levels, the functionality of the majority of the cards

in the game against AI is unknown. Hence, testing these cards without being aware of their functionality

might not give accurate results.

For the in-depth testing, we divided the training mode of the game into dialogue boxes and sessions. The

dialogue boxes are the instructions for the player to follow to advance in the tutorial and the sessions are

where the instructions from the dialogue boxes are executed. Here, functionality and interrupt testing can

be done by combining both sets of factor levels under a common factor called ‘Actions’. The factors and levels

were chosen based on the dialogue boxes and sessions. The basic training had 25 dialogue boxes and 5

sessions in between those boxes. The trial by fire training, or the advanced training, had 21 dialogue boxes

and 6 sessions. The results for this game after performing testing on the available training modes are

given in Table 11.

Table 11 Coverage Vs Bugs in Erator

| | Basic Training | Advanced Training |
|---|---|---|
| Coverage 1 | 3 | 3 |
| Coverage 2 | 4 | 5 |

The bugs in both basic training and advanced training are progression blockers and graphical bugs. The

game exhibits the progression block when the ‘Do not follow’ action is used in the in-depth testing of the training modes.

The graphics bug is present throughout the game. At any point, when a card is clicked, it disappears off the

screen before it is cast.

Utopia

This is a character based game, where the player has to protect a tower. The tower is attacked by

computer generated opponents and the game ends when the tower is destroyed. There are roughly two

kinds of AI: a small AI that causes less damage and a big AI that causes high damage. The player is

equipped with the power to shoot the AI and can also build shooting towers as a special power. There are

three different game modes. This game has been very well developed so we could not find any

functionality or game logic bugs. However, we illustrate the factors and coverage required for testing at

this stage. Table 12 shows the factors and levels for this game.

Table 12 Factors and levels of Utopia

| Factors | Levels | Number | Type |
|---|---|---|---|
| Levels | Defense demo, Offense 1, Offense 2, Tutorial, Advanced Tutorial, Survival | 6 | Game logic |
| Controls | Up, Down, Left, Right, Dash, Power 1, Power 2, Power 3 | 8 | Game logic |
| Resolution | 512x384, 640x400, 640x400, 800x600, 1024x768, 1280x600, 1280x720, 1280x768, 1360x768, 1366x768 | 10 | Compatibility |
| Graphics | Fast, Fastest, Simple, Good, Beautiful, Fantastic | 6 | Compatibility |
| Interrupts | Lock, Minimize | 2 | Interrupt |
| Modes | 1, 2, 3 | 3 | Game logic |

We generated test cases for coverage values 1 through 6. The results can be seen in Table 13.

Table 13 Number of test cases per coverage for Utopia

| | Coverage 1 | Coverage 2 | Coverage 3 | Coverage 4 | Coverage 5 | Coverage 6 |
|---|---|---|---|---|---|---|
| No of total tests | 10 | 81 | 514 | 2889 | 8640 | 17280 |
| Compatibility tests | 8 | 43 | * | * | * | * |

Since the game was very well developed, we could not uncover functionality or game logic bugs. However,

we discovered some bugs related to compatibility. The results of compatibility check have been shown in

Table 14.

Table 14 Coverage Vs Bugs in Utopia

| | No of bugs |
|---|---|
| Coverage 1 | 5 |
| Coverage 2 | 5 |


A screenshot from the game depicting one of the compatibility issues is shown in Figure 6.

Figure 6 Compatibility bug in Utopia

All the bugs found in this game are compatibility issues. When the game is set to a lower

resolution, the items in the main menu overlap with each other, leading to a graphical bug. Also, the icons

lose their functionality. This bug is present throughout the game. Other bugs related to functionality or

interrupts were not present in this game.

4.2 Identifying failure inducing combinations

4.2.1 Notional example with Grand Prix

An example of how our sorting code works is further explained here. We made up an issue to explain the

working of our code. Let us assume, from the Grand Prix game, that the test case shown in Table 15

resulted in a bug. The bug is that the game crashes when Phineas and Ferb world, Elimination race,

Wander racer and Wasabi Dragster kart are combined together. Let us assume that this bug is caused by

the two factor interaction of Wasabi Dragster kart with Elimination race. This crash occurs when the game

is launched. We do not know the combination of factors that caused the crash. But we do know the steps

we have taken while finding this bug. Let us call this test case the reference test case.

Table 15 Reference test case


Since the game crashed with the combination of the four factors given above, we assume that the crash occurred at the in-game screen. We now reorder our test cases so that the next test case leads to the same in-game screen with all factor levels different from those in the reference test case. This test case can be seen in Table 16. It does not result in a bug, so we can infer that this is not a general bug: it occurs only when specific factor levels interact with each other.

Table 16 Test case with different factor levels

Since there was no bug, we move on to the last step in the reference test case. We reorder our test cases so that the Wasabi Dragster cart remains the same and all the other factor levels are different (as shown in Table 17). This test case also does not result in a bug. We repeat this process for the same racer, race and world in three further test cases (Tables 18 to 20).

Table 17 Test case with same Kart

Table 18 Test case with same Racer

Table 19 Test case with same Race

Table 20 Test case with same World

We now move on to checking the interactions between factors. We keep the Wasabi Dragster cart and Wander racer the same as in the reference test case. This test case can be seen in Table 21. As this test case does not result in a bug, we move on to checking the interaction of the Wasabi Dragster cart with the Elimination race, as shown in Table 22. This test case results in a bug. Since the Wasabi Dragster cart and Elimination race in this last test case are the same as in our reference test case, this combination is our FCC.

Table 21 Test case with same Racer and Kart

Table 22 Test case with same Race and Kart
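The sequence of checks in this notional example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the appendix: the level names other than those of the reference test case ("World B", "Race B", "Racer B", "Kart B") are hypothetical, the suite is a small full factorial so that every test case the search asks for exists, and `notional_bug` stands in for actually running the test case in the game. In a real combinatorial test set, a test case sharing exactly the requested factor levels may be missing.

```python
from itertools import combinations, product

# Factors ordered from the last selection step back to the first, so the
# single-factor checks run kart, racer, race, world as in the walkthrough.
FACTORS = ("kart", "racer", "race", "world")

# Reference test case from the notional example (Table 15).
REFERENCE = {"world": "Phineas and Ferb", "race": "Elimination",
             "racer": "Wander", "kart": "Wasabi Dragster"}

# Hypothetical alternative levels, invented only for this sketch.
LEVELS = {
    "world": ["Phineas and Ferb", "World B"],
    "race": ["Elimination", "Race B"],
    "racer": ["Wander", "Racer B"],
    "kart": ["Wasabi Dragster", "Kart B"],
}

# Full factorial suite so that every requested test case exists.
SUITE = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]


def find_test_sharing(suite, reference, shared):
    """Return the first test that equals the reference on the factors in
    `shared` and differs from it on every other factor."""
    for test in suite:
        if all((test[f] == reference[f]) == (f in shared) for f in FACTORS):
            return test
    return None


def isolate(suite, reference, fails):
    """Try shared-factor sets of growing size (none, one factor, two
    factors, ...) and return the first set whose test case fails."""
    for size in range(len(FACTORS) + 1):
        for shared in combinations(FACTORS, size):
            test = find_test_sharing(suite, reference, shared)
            if test is not None and fails(test):
                return shared  # an empty tuple would mean a general bug
    return None


def notional_bug(test):
    """Stand-in for running the game: the made-up two-factor crash."""
    return test["kart"] == "Wasabi Dragster" and test["race"] == "Elimination"


culprit = isolate(SUITE, REFERENCE, notional_bug)
print(culprit)
```

With the made-up bug above, the search passes through the all-different test case, the four single-factor test cases and the kart and racer pair without a failure, and stops at the kart and race pair, mirroring Tables 16 to 22.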

4.2.2 Example from the game Erator

We now present a practical example of our logic for finding FCCs. Consider the game Erator. Since Erator is not a complicated game, we decided to combine game logic testing with interrupt testing. Among the test cases generated for Erator, one that resulted in a bug is shown in Table 23. This is our reference test case. A screenshot of that particular bug is shown in Figure 7.

Table 23 Reference test case

Figure 7 Screenshot of bug from reference test case

The tutorial in this game is divided into dialogue boxes (D) and session screens (S). D1 indicates that we have to execute the test case at dialogue box 1. To execute the given test case, we must therefore not follow dialogue box 14. When this happens, a progression block occurs, as shown in Figure 7. The next dialogue box points to an empty space where another card should have been.


We now reorder our test cases so that the ‘Do not follow’ action stays the same and the other factor level is different. This new test case is shown in Table 24.

Table 24 Test case with same Action

The above test case combines dialogue box 25 with the ‘Do not follow’ action. This test case does not result in a bug. The screenshot from the game is shown in Figure 8.

Figure 8 Screenshot of test case with same Action

We then go back to check whether there is a bug with dialogue box 14. We reorder our test cases to keep dialogue box 14 but with a different factor level under actions. This test case is shown in Table 25.

Table 25 Test case with same dialogue box

The screenshot from the game after executing the above test case can be seen below in Figure 9.


Figure 9 Screenshot of test case with same Dialogue box

Accessing the pause menu when dialogue box 14 is displayed does not result in any kind of bug. Therefore, we can conclude that the bug is caused by the combination of dialogue box 14 and the action ‘Do Not Follow’.

The number of steps required differs for general and specific bugs. For example, a crash occurring when a call interrupt is made at the loading screen is a general bug: the game crashes irrespective of the steps taken to reach the loading screen. In this case, our code takes only 3 steps to find the failure-inducing combination even though 2 factors cause the failure. On the other hand, a specific bug, such as a progression block when the Elimination race is selected with the character Ezra, takes 4 steps to be discovered. This is because we first check whether the bug is general and then proceed to the specifics of the bug.

Compared to the other methods for finding failure-inducing combinations mentioned in the literature, our method is comparatively easy. The main advantage is that we only reorder the test cases to find the failure-inducing combination, thereby keeping the number of test cases low. The only disadvantage is that a section of the code has to be modified after executing every test case. This may not seem like a lot for one- and two-factor interactions, but as the number of interacting factors increases, the number of steps needed to discover the failure-causing combination also increases, resulting in more modifications to the code.
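As an illustration of the reordering step, the sketch below performs the swap on an in-memory list of test cases and takes the columns to keep fixed as an argument. This is a hypothetical variant written for illustration, not the code in Appendix I: the function name `reorder`, the `same_columns` parameter and the three-factor suite with levels such as "A1" are all our assumptions.

```python
def reorder(rows, reference_line, same_columns):
    """Move the first later test case that matches the reference test case
    only in `same_columns` to the line directly below the reference.

    Returns True if such a test case was found, False otherwise."""
    reference = rows[reference_line]
    target = reference_line + 1
    for i in range(target, len(rows)):
        matches = all((rows[i][c] == reference[c]) == (c in same_columns)
                      for c in range(len(reference)))
        if matches:
            rows[i], rows[target] = rows[target], rows[i]
            return True
    return False


# Hypothetical three-factor suite; row 0 is the reference test case.
rows = [
    ["A1", "B1", "C1"],
    ["A1", "B2", "C1"],
    ["A2", "B2", "C1"],
    ["A2", "B2", "C2"],
]

# Keep only the last factor (column 2) the same as in the reference.
found = reorder(rows, 0, same_columns={2})
print(found, rows[1])
```

Passing an empty set for `same_columns` asks for a test case that differs from the reference in every factor, which corresponds to the first check for a general bug.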


5. Discussion

5.1 Conclusion

We used combinatorial testing to establish a framework to improve the testing efficiency of video games. The methodology section explained the steps necessary to test games using combinatorial testing. First, we decide the factors and categorize them based on the type of testing. Then we decide the number of levels for each factor. The combinatorial test cases are generated using the ACTS software. These test cases are then executed in the game to check for bugs. We also proposed a methodology to identify the FCC.

In the results section, we showed the application of combinatorial testing to games. A comparison of general testing and combinatorial testing was presented using the game Grand Prix. We also applied combinatorial testing to games that had not been tested, and the results were discussed.

The results indicate that our methodology generates far fewer test cases for complete testing of the game. It

is also effective in finding the bugs that were missed by the developers. Our logic for finding FCCs was

capable of finding the faulty interactions without generating additional test cases. Overall, combinatorial

testing can be implemented to improve the current game testing methods.

5.2 Future work

Future work for this thesis includes analyzing to what extent ad hoc testing can be accomplished using combinatorial testing. It is evident that we cannot associate this with a percentage value. However, implementing mixed covering arrays or base choice covering arrays can uncover bugs that do not occur with the traditional combinatorial testing process.

Also, a decision should be made regarding the implementation of combinatorial testing in games with

respect to the stage of development. Identifying how and when to implement combinatorial testing during

the game development should be the research focus in this regard.

Since we applied combinatorial testing to untested games from independent developers, we could not find many bugs. Applying combinatorial testing to a game that is currently in development would make it possible to draw firmer conclusions about its benefits over other forms of game testing.


Bibliography:

[1] “New Reports Forecast Global Video Game Industry Will Reach $82 Billion By 2017 - Forbes.” [Online]. Available: http://www.forbes.com/sites/johngaudiosi/2012/07/18/new-reports-forecasts-global-video-game-industry-will-reach-82-billion-by-2017/. [Accessed: 25-Feb-2015].

[2] B. Schechner, “Getting Started with Software Testing,” pp. 1–9, 2008.

[3] B. Bates, Game Design (2nd Ed.). 2004.

[4] “Halo: The Master Chief Collection review: the library | Polygon.” [Online]. Available: http://www.polygon.com/2014/11/7/7076007/halo-the-master-chief-collection-review-xbox-one. [Accessed: 27-Apr-2015].

[5] “Sonic Boom speed run takes less than an hour, thanks to a Knuckles glitch | Polygon.” [Online]. Available: http://www.polygon.com/2014/11/12/7211863/sonic-boom-speed-run-takes-less-than-an-hour. [Accessed: 27-Apr-2015].

[6] “5.1.1. What is experimental design?” [Online]. Available: http://www.itl.nist.gov/div898/handbook/pri/section1/pri11.htm. [Accessed: 30-Nov-2015].

[7] C. Redavid and A. Farid, “An Overview of Game Testing Techniques,” Idt.Mdh.Se.

[8] S. Varvaressos, K. Lavoie, A. B. Massé, S. Gaboury, and S. Hallé, “Automated bug finding in video games: A case study for runtime monitoring,” Proc. - IEEE 7th Int. Conf. Softw. Testing, Verif. Validation, ICST 2014, pp. 143–152, 2014.

[9] “NIST Covering Array Tables - What is a covering array?” [Online]. Available: http://math.nist.gov/coveringarrays/coveringarray.html. [Accessed: 10-Apr-2015].


[10] R. Kuhn, R. Kacker, Y. Lei, and J. Hunter, “Combinatorial software testing,” Computer (Long. Beach. Calif)., vol. 42, no. 8, pp. 94–96, 2009.

[11] “Practical Combinatorial Testing.” [Online]. Available: http://csrc.nist.gov/groups/SNS/acts/documents/SP800-142-101006.pdf. [Accessed: 10-Apr-2015].

[12] D. R. Kuhn and M. J. Reilly, “An investigation of the applicability of design of experiments to software testing,” 27th Annu. NASA Goddard/IEEE Softw. Eng. Work. 2002. Proceedings., 2002.

[13] R. N. Kacker, D. R. Kuhn, Y. Lei, and J. F. Lawrence, “Combinatorial testing for software: An adaptation of design of experiments,” Meas. J. Int. Meas. Confed., vol. 46, no. 9, pp. 3745–3752, 2013.

[14] S. K. Khalsa and Y. Labiche, “An orchestrated survey of available algorithms and tools for Combinatorial Testing,” 2014.

[15] L. Yu, “Constraint Handling In Combinatorial Test Generation Using Forbidden Tuples.”

[16] A. Testing, “Characterizing Failure-Causing Parameter Interactions by Categories and Subject Descriptors,” ISSTA ’11 Proc. 2011 Int. Symp. Softw. Test. Anal., pp. 331–341, 2011.

[17] Z. Wang, B. Xu, L. Chen, and L. Xu, “Adaptive interaction fault location based on combinatorial testing,” Proc. - Int. Conf. Qual. Softw., pp. 495–502, 2010.

[18] L. S. G. Ghandehari, Y. Lei, T. Xie, R. Kuhn, and R. Kacker, “Identifying failure-inducing combinations in a combinatorial test set,” Proc. - IEEE 5th Int. Conf. Softw. Testing, Verif. Validation, ICST 2012, pp. 370–379, 2012.

[19] T. T. S. Generation, “User Guide for ACTS Core Features,” pp. 1–15.


[20] Y. Lei, R. Kacker, D. R. Kuhn, V. Okun, and J. Lawrence, “IPOG: A general strategy for T-way software testing,” Proc. Int. Symp. Work. Eng. Comput. Based Syst., pp. 549–556, 2007.

Appendix I:

The code for FCC is given below.

    import csv


    def getRow(file_name, row_number):
        # Return one test case (one row of the CSV test suite) as a list of strings.
        with open(file_name, newline='') as f:
            mycsv = list(csv.reader(f))
        return mycsv[row_number]


    def search(file_name, reference_test_case, reference_test_line):
        # Find the first later test case that keeps column 9 of the reference
        # test case and differs from it in columns 0 to 8, then move that test
        # case to the line directly below the reference test case.
        # This condition is the section that is edited after each executed test case.
        with open(file_name, newline='') as f:
            mycsv = list(csv.reader(f))
        for i in range(reference_test_line + 1, len(mycsv)):
            others_differ = all(mycsv[i][c] != reference_test_case[c] for c in range(9))
            if others_differ and mycsv[i][9] == reference_test_case[9]:
                result = swap_row(mycsv, i, reference_test_line + 1)
                writeCSV(file_name, result)
                break  # stop at the first match so that it is not swapped away again


    def swap_row(mycsv, first_row, second_row):
        # Exchange two rows of the test suite in place.
        mycsv[first_row], mycsv[second_row] = mycsv[second_row], mycsv[first_row]
        return mycsv


    def writeCSV(file_name, result):
        # Write the reordered test suite back without a trailing comma on each row.
        with open(file_name, 'w', newline='') as f:
            csv.writer(f).writerows(result)


    file_name = 'expt.csv'
    reference_test_case = getRow(file_name, 28)
    search(file_name, reference_test_case, 28)
