Solving the Times T2 Crossword

Robert Yaw

BSc Computer Science

2004/2005

The candidate confirms that the work submitted is their own and the appropriate credit has been given

where reference has been made to the work of others.

I understand that failure to attribute material which is obtained from another source may be considered

as plagiarism.

(Signature of student)

Summary

The aim of this project was to create a piece of software that attempted to automatically solve

the Times T2 crossword. The software incorporated web search technologies and elements of Natural

Language Processing in order to obtain answers to clues.

On average, the solver managed to correctly find answers to 33% of clues given to it. Whilst this

compares poorly to existing systems (and especially to unassisted human performance), this report high-

lights in its conclusion some of the failings of the current system, and how these can be overcome in

order to improve the crossword solver’s performance.


Acknowledgements

Thanks go to the following people:

• Tony Jenkins, for his support and guidance throughout the project,

• Katja Markert, for her invaluable guidance given in the Mid-Project Report,

• Nobuo Tamemasa, whose original MThumbSlider class1 was taken and customised for use in

the crossword solver,

• Everyone who helped in the crossword survey,

• My parents Lynne and Peter, who have supported me throughout the last three years!

1 Downloaded from http://physci.org/codes/tame/index.jsp/


Contents

1 Introduction 1

1.1 The Problem Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Project Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.4 Minimum Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.5 Deliverables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.6 Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6.1 Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6.2 Revised Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.6.3 Revised Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.7 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Background Reading 7

2.1 Crossword History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2 Crossword Grid Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3 Other Crossword Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.4 The Times T2 Crossword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.5 Clue Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.5.1 Definition Clues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.5.2 Knowledge Clues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.5.3 Other Clue Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.5.4 Clue Type Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13


3 Existing Crossword Solvers 14

3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.2 PROVERB and One Across . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3 Jumble and Crossword Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4 Crossword Maestro for Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.5 Testing of Existing Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4 Answer Acquisition Technologies 20

4.1 Google Web APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.2 Question Answering and RASP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.3 Word Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.3.1 WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.3.2 Moby Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.3.3 CELEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.3.4 Word List Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.4 Brute Forcing and Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Methodology and Design 27

5.1 Development Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

5.2 Requirements Gathering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.2.1 Crossword Puzzle Representation . . . . . . . . . . . . . . . . . . . . . . . . . 28

5.2.2 GUI Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5.3 Programming Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.3.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.3.2 Perl and Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.3.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

5.4 UML and Class Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.4.1 CrosswordGrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.4.2 Clue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.4.3 Answer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32


5.4.4 CrosswordGridGUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.4.5 RASPService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.4.6 Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.4.6.1 MobyListSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.4.6.2 WordNetSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.4.6.3 GoogleService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.4.7 Documentation of Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.5 Development Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

5.6 The Solving Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6 Development and Testing 37

6.1 Test Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.2 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.2.1 Solving Process Refinements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.2.2 Entry of Crossword Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

7 Evaluation 43

7.1 Project Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

7.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

7.3 Minimum Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7.4 Comparison with Existing Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

7.5 Comparison with Human Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

8 Conclusion 48

8.1 Failings of the Solving Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

8.2 Suggested Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Bibliography 51

9 Appendix A 53

9.1 Project Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53


10 Appendix B 55

10.1 Results of Human Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

11 Appendix C - Gantt Chart Schedule 59


Chapter 1

Introduction

1.1 The Problem Domain

The crossword is one of the most popular word games in the world. The aim of the game is simple: to

answer as many of the crossword’s clues as possible, and consequently fill in the crossword grid. The

Times T2 Crossword was first published in 1993 and, as of the start of 2005, has had over 3400 editions

published, six times a week for the last eleven years.

The majority of clues in the T2 crossword are definition clues. These mostly comprise no more than four words, or in some instances several single words where the answer fits each word, usually in a different context. The remainder of the clues are general-knowledge based, and therefore cannot be solved simply by looking up possible answers in a dictionary. The difficulty of the T2 crossword lies in the ambiguous nature of these clues. Some of the clues have more than one possible answer, given only the answer length. For example, “River of NE England”(4) (3427: 23A) could have been TYNE, TEES or WEAR, to name three perfectly acceptable alternatives - until other answers have been found, the human solver has no indication as to which alternative is the correct answer (the answer, it turns out, was TYNE). Other answers to clues may not be as immediately obvious, due to

the clue’s minimal context. An example of this is the clue “Common; inexperienced”(5) (3415: 9A). Taking the first part of the clue: the word “common” is used in modern everyday English primarily as an adjective; phrases such as “he is common”, or “that is a common occurrence these days”, illustrate this point. However, in this particular instance, the word “common” is being used as a noun, to describe an area of parkland or greenery. Indeed, it is the word GREEN that is the correct answer. The second part of the clue confirms this, as GREEN is a fitting, yet once again not oft-used, synonym for the word “inexperienced”.

Traditionally, human solvers have had to rely on printed dictionaries and thesauruses to find answers

to those clues they could not solve by themselves. In more recent times Internet search engines such

as Google and Ask have provided additional means of acquiring answers. However, there are still few

freely-available computerised systems that allow users to retrieve answers to specific crossword clues.

Those that do exist are tailored to solve American crosswords or do not allow users to enter clues into

their system.

By utilising developments in computing power and Internet connection speeds, along with research

done in the field of Natural Language Processing, this project aims to develop a system for solving the

Times T2 crossword.

1.2 Project Aim

To develop a piece of software which, given any edition of the Times T2 crossword (comprising its

grid and corresponding clues), can automatically give correct answers to the clues, and thus solve the

crossword.

1.3 Objectives

The objectives of the project are as follows:

• Research the history of crosswords, and understand the difficulties faced by human solvers in com-

pleting crosswords,

• Evaluate existing crossword solvers, noting any features used by such solvers that could be incor-

porated into a final solution,

• Research possible technologies used to acquire answers to clues, which could be incorporated into

a final solution,


• Evaluate software development methodologies and programming languages which could be used

to develop the system, giving reasoning for final choices,

• Develop a software solution which incorporates at least all the minimum requirements stated in

this report, and

• Test the developed system using various editions of the Times T2 crossword, against a number of

evaluation criteria.

1.4 Minimum Requirements

The minimum requirements of the project are:

• To develop a system which can return a list of possible answers given a clue and the length of its

answer.

• To develop a method of inputting a crossword grid into an application.

• To develop a method of inputting a crossword clue into an application, such that it relates to a

specific location on a crossword grid.

Possible extensions to the project include:

• To allow for the loading and saving of a crossword grid, with an optional clue set.

• To develop a GUI for the software.

• To provide a means of informing the end user of the solving process for both individual clues and

the grid as a whole.

• To allow users to manually enter answers to clues they have already solved.

1.5 Deliverables

The project deliverables are:

• A piece of software which can attempt to solve the Times T2 crossword.

• A final report.


1.6 Schedule

The original project schedule was drawn up as a Gantt chart, and has been included in Appendix C. The

overview of the original schedule was as follows:

01/11/2004 - 12/11/2004 Research the Times T2 crossword.

01/11/2004 - 19/11/2004 Research existing crossword solvers, and if possible, test them against the Times T2 crossword.

08/11/2004 - 03/12/2004 Research available Internet resources and available word lists that could be incorporated into a final solution.

15/11/2004 - 03/12/2004 Look into possible programming languages to be used to code a final solution.

22/10/2004 - 03/12/2004 Make final choices regarding programming languages and technologies to use.

29/11/2004 - 10/12/2004 Mid-Project Report

13/12/2004 - 14/01/2005 HOLIDAY AND EXAM PERIODS

06/12/2004 - 21/01/2005 Complete evaluation criteria

15/12/2004 - 18/03/2005 Software development and testing

21/02/2005 - 11/03/2005 Submission of draft chapter and table of contents

07/03/2005 - 08/04/2005 Evaluation of software product

21/03/2005 - 08/04/2005 Project reflection

1.6.1 Milestones

10/12/04 Mid-Project Report

28/01/05 Research complete

11/03/05 Draft Chapter and Table of Contents

25/03/05 Software development and testing complete

08/04/05 Evaluation complete

27/04/05 Final Report

1.6.2 Revised Schedule

Following feedback from the mid-project report, the schedule was heavily revised due to the need to in-

corporate research from the field of natural language processing into the project. Other critical stages of


the project were also missing from the original schedule and have been included in the revised schedule,

which is detailed below, and which commenced on the 24th January 2005:

06/12/2004 - 28/01/2005 Compile evaluation criteria

24/01/2005 - 25/02/2005 Research NLP technologies to be incorporated into the final solution

14/02/2005 - 25/02/2005 Research software development methodologies

31/01/2005 - 01/04/2005 Development of software components

21/02/2005 - 11/03/2005 Submit table of contents and draft chapter

28/02/2005 - 08/04/2005 Software testing and additional improvements

28/03/2005 - 15/04/2005 Software evaluation

04/04/2005 - 22/04/2005 Project evaluation

04/04/2005 - 25/04/2005 Project write-up

1.6.3 Revised Milestones

The milestones of the project were revised to reflect the changes to the schedule, and are listed below.

10/12/04 Mid-Project Report

25/02/05 Research into NLP and methodologies complete

11/03/05 Draft Chapter and Table of Contents

15/04/05 Software development and testing complete

22/04/05 Evaluation complete

27/04/05 Final Report

1.7 Evaluation Criteria

The success of the project will be judged at the end of this report in the evaluation. To ensure this could

be done successfully, a set of evaluation criteria were drawn up. These criteria are as follows:

1. How successfully does the software solve the problem, and thus achieve the project’s aim?

2. Have the objectives of the project been fully realised?

3. Have the minimum requirements been met and exceeded?

4. How does the system compare to existing solutions?

5. How does the system compare to unassisted human performance?


The first criterion was based on results gained in testing the software developed as part of this project.

The second compared the objectives stated in this chapter with the findings in the report. The third

judged how well the project completed its stated minimum requirements, and how these were expanded

upon. The last two criteria were designed to judge the success of the system, relative to other existing

crossword solvers and human performance respectively.


Chapter 2

Background Reading

This chapter will detail how the modern crossword came into being, and the forms that modern crosswords take. The Times T2 crossword will then be defined in terms of these forms. The remainder of this chapter

will define the clue types found within the T2 crossword.

2.1 A History of the Crossword

The origins of the modern crossword can be traced back to 19th century Britain, where early crosswords were found in children’s books and periodicals. Simple grids, like the one shown in Figure 2.1, were used, with the

answers reading the same horizontally as they did vertically.

C I R C L E

I C A R U S

R A R E S T

C R E A T E

L U S T R E

E S T E E M

Figure 2.1: An example of an early form of crossword, taken from Godey’s Lady’s Book and Magazine

(1862): Note how the answers are identical both across and down.


The first modern crossword was devised by British-born journalist Arthur Wynne, and was published

in the New York World on the 21st December 1913, the grid for which can be seen in Figure 2.2.

Figure 2.2: The grid from Arthur Wynne’s first modern-day crossword.

The popularity of the crossword increased from this point onwards, eventually catching on in Britain

when Pearson’s Magazine published a crossword in February 1922. The first Times cryptic crossword

followed in 1930. Today, crosswords in Britain can be found in all national daily newspapers, as well as

in many magazines and local newspapers.

Typical modern crosswords consist of two parts - the grid, which comprises white answer squares

and (usually) black ‘blank’ squares, and the crossword clues themselves, where each clue’s answer is a

word or number of words that fit into a specific position on the grid, either ‘across’ (going along a row

of the grid) or ‘down’ (going down a column of the grid). The answer squares themselves can be either

‘checked’ or ‘unchecked’ - checked squares are those whose letter intersects with two answers (that is,

an across clue and a down clue), and unchecked squares are those whose letter intersects with only one. As answers

are placed into the grid, some letters of other answers are revealed via the checked squares, helping the

solver to determine the answer to these other clues.


2.2 Crossword Grid Types

In the English-speaking world, there are two main types of crossword grid. North American style grids

are tightly packed, where many, if not all, the letter squares are checked. Blank squares are used as infre-

quently as possible. In these grids, 180° (sometimes even 90°) rotational symmetry is commonplace; for

this, grid sizes are usually an odd number of squares [9]. British (or cryptic) style grids on the other hand

look quite different, with more blank squares and fewer checked letters, resulting in relatively fewer

clues. The grid may have rotational symmetry; however, this may depend on what will be referred to

in this report as the “house rules” of a particular crossword. For example, the two “house rules” of the

British-style T2 crossword grid are:

• That the grid is always thirteen squares by thirteen squares in size, and

• That the grid has 180° rotational symmetry.
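Both of these rules lend themselves to a simple programmatic check. The following is a minimal Java sketch (class and method names are illustrative, not taken from the project), assuming the grid is held as a boolean matrix marking black squares:

public class GridRules {

    // Returns true when the grid is 13 x 13 and has 180-degree
    // rotational symmetry: each square must match the square
    // diagonally opposite it about the grid's centre.
    public static boolean obeysT2Rules(boolean[][] blocked) {
        int n = blocked.length;
        if (n != 13) {
            return false;                    // rule 1: 13 x 13 grid
        }
        for (boolean[] row : blocked) {
            if (row.length != n) {
                return false;
            }
        }
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                if (blocked[r][c] != blocked[n - 1 - r][n - 1 - c]) {
                    return false;            // rule 2 violated
                }
            }
        }
        return true;
    }
}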

2.3 Other Crossword Styles

It should be noted that, in the UK at least, there are other word puzzles that are thought of as derivatives of

the standard crossword; a few examples include:

• “barred crosswords” - thick black lines, as opposed to squares, separate the clues, the result being

that every square on the grid is white.

• “arrow words” - the clues are contained within squares on the grid; this crossword variant is

particularly popular in puzzle magazines in the UK.

• “Codebreakers”, where each white square is given a number between 1 and 26; the number rep-

resents a unique letter of the alphabet; these letters are discovered using a knowledge of the fre-

quency of letters in the English language, and through a process of elimination.

2.4 The Times T2 Crossword

As the focus of this project was on solving the Times T2 crossword, it was important to gain an under-

standing of the clue types which appear within it. The Times T2 grid, like most published in the United


Kingdom, is in the British style and comprises thirteen rows by thirteen columns, with 180° rotational

symmetry. An example of a completed Times T2 crossword grid is shown in Figure 2.3.

Figure 2.3: The grid of Times T2 crossword #3415

The grid in Figure 2.3 will be one of three examples of the Times T2 crossword used throughout

this project to test existing crossword solvers, to judge the performance of human solvers in attempting

the Times T2 crossword, and to test the software developed through the course of this project during

its development. Throughout this report, the clues of the Times T2 crossword will be referred to in the

following format:

“Clue text”(answer-length) - ANSWER (Crossword edition: clue number)

That is, a reference gives the clue text and the number of letters in the clue’s answer, together with the actual answer and the crossword edition and clue number from which it was taken.

2.5 Clue Types

The Times T2 crossword has two main clue types: definition clues and knowledge clues. It was not unrealistic to assume that the two clue types would require different forms of processing in order to obtain possible answers. A human solver trying to obtain the answer to a definition clue may resort to using a thesaurus to look up synonyms of the clue. For a knowledge clue, however, a web search may prove more useful in finding the correct answer. Therefore, an automatic crossword solver, given any clue from a T2 crossword, needs the ability to determine to which type that clue belongs.

2.5.1 Definition Clues

The most basic form of definition clue can be categorised as the “synonym” clue, as answers to these are a synonym of the clue word. Two examples of this clue type are “Select”(6) - CHOOSE (3414: 1D), and “Blissful”(7) - IDYLLIC (3419: 3D). As synonym clues are always one word in length, an automated solver would need to give little consideration to the word’s type or (to some extent) its context, and such clues could be instantly labelled as synonym clues.

Definition clues which could prove harder for a computer system to identify as such are those that

are made up of multiple words. These were identified as “multiple-word definition” clues. Taking as an

example, “Ran away”(4) - FLED (3415: 17A), although a human can easily determine that the clue is

clearly a definition of the answer, a computer system would have to perform some kind of word overlap

technique to check the answer against the clue.

In some instances, synonym and multiple-word definition clues were made more complex by a context specified in the clue. Fortunately, context in the Times T2 can be easily identified, as it takes the form of italicised, parenthesised text placed at the end of the clue (or section of clue); for example, “Sleeping (archaic)”(4) - ABED (3415: 19D). The number of different contexts used in the T2 crossword is limited to word classes, such as acronyms and colloquialisms, and a few European languages.

2.5.2 Knowledge Clues

Although knowledge clues usually have more words in the actual text of the clue than definition-type

clues, they can be much harder to determine as such. For example, “Respectful of opinions different

from one’s own”(7) - LIBERAL (3415: 21A) contains seven words, but the clue itself is essentially a

definition of the answer. Despite this, there are some give-away signs of a knowledge clue:

• Missing words in the clue: for example, “Yehudi -, violinist”(7) - MENUHIN (3410: 14A). The answer will always be the missing word, and will come immediately after a “target word” in the clue, that is, the word in the clue immediately preceding the blank, or the word immediately succeeding the blank, depending on the clue’s structure. By using this “target word”, and ensuring the potential answer fits the answer pattern, it should be possible to return a highly confident result for these clue types (a sketch of this approach appears after this list).

• A proper noun within the clue’s text is usually a strong indicator of a knowledge clue, as the answer more often than not will relate to that noun. For example, “Mountain where Delphi is located”(9) - PARNASSUS (3427: 5D). The difficulty in using this approach in an automated system would be determining which words, if any, are proper nouns. The rules of the English language state that a proper noun is always capitalised. It is not enough to simply look for a capitalised word in a given T2 clue, as the first word in every Times T2 clue is capitalised. Therefore, one of the checks a working solution may need to perform is to determine the class of the first word of a clue (if necessary).

• Italicised phrases or words in a clue are once again used, though this time to indicate the title of

a work, such as a book or a television programme. The answer will relate to the work mentioned.

For example, “A Study in Scarlet author”(5,5) - CONAN DOYLE (3415: 22A); Arthur Conan

Doyle being the author of the book “A Study in Scarlet”.

In all cases, having knowledge of some of the letters of an answer limits the number of potential

answers, giving higher confidence for any words that still match the answer pattern.
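As promised in the first point above, a sketch of the missing-word approach follows. All names here are hypothetical, and the snippets are assumed to have been retrieved already (for example, via a web search on the clue text):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MissingWordSearch {

    // Scan each snippet for an alphabetic word that immediately
    // follows the target word; return the first one of the right
    // length (e.g. target "Yehudi", length 7 should yield MENUHIN).
    public static String findAfterTarget(String[] snippets,
                                         String target, int length) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(target) + "\\s+([A-Za-z]+)",
                Pattern.CASE_INSENSITIVE);
        for (String snippet : snippets) {
            Matcher m = p.matcher(snippet);
            while (m.find()) {
                if (m.group(1).length() == length) {
                    return m.group(1).toUpperCase();
                }
            }
        }
        return null;    // no confident answer found
    }
}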

2.5.3 Other Clue Types

One extension of these basic categories is clues which consist of multiple parts, delimited by semi-colons (or in some cases, commas). In one sense, an automated solver could have greater confidence in finding the correct answer here, as potential answers found using one part of the clue can be cross-checked with those of another. An example of this clue type is “Rod; shaft; European”(4) - POLE (3414: 8A).

A further complication in clue categorisation, though one that occurs very rarely (as shown later in Figure 2.4), is that the text of a clue may be dependent on another clue’s answer. For example, “East 14 Dn town”(5) (3413: 17A) cannot be solved without first knowing the answer to 14 Down, “English breed of chicken”(6) - SUSSEX (3413: 14D). Substituting this answer into 17A gives “East SUSSEX town”(5), and this in turn gives the correct answer of LEWES.
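Taken together, the surface cues described in sections 2.5.1 to 2.5.3 suggest a simple first-pass labelling of clues. The Java sketch below is illustrative only: the category names and heuristics are assumptions, and a real solver would still need the further checks discussed above (proper-noun detection, context markers, and so on):

public class ClueClassifier {

    public enum ClueType { SYNONYM, MULTI_PART, MISSING_WORD, MULTI_WORD }

    // Hypothetical first-pass labelling using surface cues only.
    public static ClueType classify(String clueText) {
        if (clueText.contains(";")) {
            return ClueType.MULTI_PART;    // e.g. "Rod; shaft; European"
        }
        if (clueText.contains("-,") || clueText.contains(" - ")) {
            return ClueType.MISSING_WORD;  // e.g. "Yehudi -, violinist"
        }
        if (clueText.trim().split("\\s+").length == 1) {
            return ClueType.SYNONYM;       // e.g. "Select"
        }
        return ClueType.MULTI_WORD;        // definition or knowledge clue
    }
}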


2.5.4 Clue Type Frequencies

Different clue types will undoubtedly require different processes in order to retrieve an answer or set

of potential answers. To help establish which clue types form the majority of clues in the Times T2

crossword, a frequency count of clue types was acquired using Times T2 Crosswords #1 - #10 found in

[2]. For multi-part clues, each part of the clue was categorised independently.

Clue Type                   Times Found    % of Matches
Synonym                     89             30.1
Multiple-Word Definition    116            39.2
Context-Dependent           19             6.4
ALL DEFINITION CLUES        224            75.7
Missing Word                5              1.7
Proper Noun answers         21             7.1
Other noun answers          45             15.2
Answer-dependent clues      1              0.3
ALL KNOWLEDGE CLUES         72             24.3

Figure 2.4: The frequencies for the clue categories of 296 clues or part-clues, taken from Crosswords

#1-#10 of The Times T2 Crossword Book 7 (2004)

As can be seen from Figure 2.4, the overwhelming majority of clues fit into one of the two main

definition clue categories. These, together with the noun answer clues, make up over 90% of the clues studied, and are therefore the clue types for which the final solution needs the most developed and accurate search methods. The category of “Missing Word”, although making up less than 2% of clues studied, is arguably the one category for which the most accurate answers can be retrieved; therefore, if such clues are present within a given crossword, they would make the most logical place to begin solving that crossword. Consequently, it is logical to also develop a robust method for accurately solving clues of this type.


Chapter 3

Existing Crossword Solvers

3.1 Overview

For such a popular pastime, it may come as a surprise that there are few existing applications that can

give out answers to entered crossword clues. The more popular ones that were available are detailed

below. Each was tested using three editions of the Times T2 crossword, with the results compared to a

single human solver to get an estimate of how well these systems performed.

3.2 PROVERB and One Across

The PROVERB system was developed as part of research carried out in 2002 by Littman et al[10].

The system was designed to automatically solve editions of the New York Times crossword. It uses

a probabilistic model that generates for each clue a “weighted candidate [answer] word list” from one or more “expert modules”. Over a number of iterations, answers are inserted into the grid, as

other candidate answers are ruled out. The system achieved a success rate of 95.3%.

One Across[20] is an on-line crossword clue solving system that stemmed from the research of

Littman et al[10]. The site allows a user to input a clue, together with the number of letters in the clue’s

answer. An unknown letter is marked using the ‘?’ character. Known letters can be given one of two

confidence ratings indicated by the letter’s case - lower-case letters for a 60% confidence, and upper-case

letters for the higher 95% confidence. Once submitted, the site returns a list of possible answers; these


are sorted by the system’s own confidence rating, which ranges from **** (very confident) down to *

(not so confident). An example of these results is shown in Figure 3.1.

The system works in a straightforward manner. All the user has to do is enter their clue and answer

pattern, and press “Go!” to retrieve the set of answers. However, in practice, these answers are not

always accurate; sometimes the system returns words which clearly do not fit the user’s input. As the

PROVERB system was designed to solve American-style crossword grids, the implication is that it

has a greater dependency on checked letters being present in answers (indeed, the example grid cited in

[10] has every white square checked). To the system’s credit, although it was developed with American

crosswords in mind, it does return both American and Anglicised spellings when such a difference oc-

curs.

Figure 3.1: The One Across system

3.3 Jumble and Crossword Solver

Similar to One Across in concept, the Jumble and Crossword Solver (JCS) is an on-line answer retrieval system[6]. Users enter a pattern (‘?’ for blanks, letters where known), select whether the pattern is an anagram or a standard crossword answer pattern, then submit this information to obtain results. The


system’s main strength is the depth of its word list; it claims to have “over 300,000 words” in its database;

indeed, based on some of the patterns entered, its word list does appear to be more comprehensive than

that of One Across. Unfortunately, there is no way to enter a clue into the system to help narrow the

search further.

3.4 Crossword Maestro for Windows

This is a commercial package that claims to be the “world’s first expert system for solving cryptic and non-cryptic crosswords”. From the descriptions on the website[11], the system’s most prominent feature is the way it tells the user, in natural language, the processes it goes through to obtain its answers, as can be seen in Figure 3.2. Unfortunately, due to its high retail price ($79.99), and the lack of a

freely-available demonstration version, the system was unable to be tested.

Figure 3.2: Screenshot of the Crossword Maestro system.

3.5 Testing of Existing Solutions

To establish the success rate of the available existing computer-based solvers (CBSs), each was tested using three editions of the Times T2 crossword. The intention of these tests was to find out how these automated solvers compare to a human solver. As One Across was the only readily-available system that could take both a given crossword clue and its answer pattern, it was used as the foundation for the tests. The basic algorithm for the test was as follows:


1. Set the number of newly inserted answers N = 0.

2. For each unsolved across clue, input the clue and pattern into the One Across system.

3. If one answer A appears with a confidence rating of *** or ****, and all other alternative answers have a rating of ** or less, insert A into the grid; increment N by 1.

4. If all answers are given ratings of ** or less, do not insert anything into the grid.

5. Repeat steps 2 to 4 for all unsolved down clues.

6. If N > 0, go to step 1; else we have either solved the crossword, or the system cannot find any more answers. (Other solvers only) If N = 0, go to step 7.

7. (Other solvers only) If there exists an incomplete answer A = (l1, ..., ln) (where n is the number of letters in the answer) within the crossword grid G such that at least one letter li of A is known, input A's pattern into the solver. Let S be the set of possible answers returned by the solver. If |S| = 1 then let A = S1, else leave A unchanged.

8. (Other solvers only) Repeat step 7 until no incomplete answers remain, or every remaining incomplete answer returns |S| > 1.

Steps 1-6 were used for the One Across system only. Steps 3 and 4 ensured that only the most

confident, most probably correct answers were inserted into the grid. Across clues were solved first,

then down clues afterwards, in each iteration to maximise the use of checked letters.

For all other solving systems used in conjunction with One Across, steps 7 and 8 were additional.

Step 7 ensures that totally incomplete answers are not searched, as this would return all words of that

answer’s length; as there would undoubtedly be thousands of alternative matches returned for words

between three and thirteen letters in length, the task of determining the correct answer would be nigh on

impossible. Step 8 ensures that once the crossword has been completed, or the solver cannot find any more confident solutions, the algorithm terminates.

As well as testing the CBSs, the author attempted to solve each of the three crosswords, to get a

rough estimate of how the CBSs compared to a human solver. To ensure a fair comparison could be

made with the CBSs, the author’s attempts to solve the crossword were done before testing the CBSs to

eliminate the possibility of answers being recalled from memory. No dictionaries, thesauruses or other

sources of help were used during the author’s tests.


Solver Used           T2 Crossword    Total            Correct          % Correct
                                      Clues  Letters   Clues  Letters   Clues  Letters
One Across            3415            24     118       8      40        33.3   33.9
                      T2 Book 7 #1    28     118       12     59        42.8   50.0
                      T2 Book 7 #2    22     108       10     56        45.5   51.9
                      AVERAGE                                           40.5   45.3
One Across & JCS      3415            24     118       8      40        33.3   33.9
                      T2 Book 7 #1    28     118       12     59        42.8   50.0
                      T2 Book 7 #2    22     108       10     56        45.5   51.9
                      AVERAGE                                           40.5   45.3
Human Solver          3415            24     118       16     91        66.7   77.1
                      T2 Book 7 #1    28     118       17     78        60.7   66.1
                      T2 Book 7 #2    22     108       13     68        59.1   63.0
                      AVERAGE                                           62.2   68.7

Figure 3.3: CBS and human solver testing results.

As can be seen from Figure 3.3, the performance of the CBSs was reasonably consistent, though

comparatively poor. One Across, on average, managed to find answers to only 41% of the clues fed into

it, with a success range of just over 12%. Combining this system with the Jumble and Crossword Solver did nothing to improve these results. The tested systems also did not compare favourably with the human solver, who solved an average of 62.2% of the clues.

One clue/answer pair from the crosswords above and two clue/answer pairs from two other editions

of the the T2 crossword have been selected to highlight some of the downfalls of the One Across system:

“Baghdad its capital”(4) - IRAQ (T2 Book 7, #1: 1A)

Searching with any number of checked letters did not bring up the answer IRAQ at all. However,

if the word “Baghdad” was altered to the incorrect spelling of “Bagdad”, the answer was returned with

maximum confidence. This was a typographical error within the One Across system.


“Buffet with roasts”(7) - CARVERY (3416: 17D)

Searching for CARVERY in the Oxford English Dictionary (OED) [18] returned the following definition, which closely fits the clue: “c. A buffet or restaurant where meat is carved from a joint

as required”. However, searching within an American English dictionary such as the on-line implemen-

tation of the American Heritage Dictionary of the English Language [15] gave no result for the word.

A check on One Across reveals that “carvery” is not in their repertoire of words either, rendering this

system alone useless for solving the Times T2 crossword.

“Greek dish of yoghurt with cucumber and garlic”(8) - TZATZIKI (3414: 5D)

Searching for TZATZIKI using the OED did return a result: “Greek dish consisting of yoghurt with chopped cucumber, garlic, and (sometimes) mint, esp. as an hors d’oeuvre or dip”; although not a word of English etymology, the word has been used within the English-speaking world, and therefore finds itself included in the OED. The AHD did not return any results for TZATZIKI. One Across, however, did return the word, which may be an indication that the word has appeared in other crosswords before.

These case studies illustrated the fact that a final solution to the problem needed to look up answers from highly comprehensive word lists containing British English spellings.

3.6 Conclusions

The existing crossword solvers contain a number of features that were considered desirable to have

in this project’s own crossword solver. One Across uses a confidence system to rate its answers to

clues. A similar feature could be used to judge answers in the crossword solver. Crossword Maestro’s

GUI allowed users to see both the grid and the solving process simultaneously. These features were

considered as extensions to the minimum requirements of the project.


Chapter 4

Answer Acquisition Technologies

The clues of the Times T2 crossword were categorised in Chapter 2, where it was also established that

for each type of clue, different processes were required to come up with candidate answers. This chapter

will assess some of the possible answer acquisition technologies which were considered for use in the crossword solver. The chapter concludes with the final choices made regarding which technologies would be used in the final solution.

4.1 Google Web APIs

Google, one of the world’s most widely known and widely used Web search engines, developed a set of APIs,

whereby software engineers can incorporate the search facilities of the Google website within their own

programs. In the context of knowledge clues, using a search engine was considered to be one of the

more successful ways to find an answer to a clue. Although search engines such as Google cannot be

fed any information regarding answer length, a human solver could quickly detect which word or words

are likely to be the answer. Using the example of “Capital of Queensland”(8) - BRISBANE (3418: 1D),

the following snippets are from the first three results given by Google, with the search term “Capital of

Queensland”:

• “... cootha’s travel reports. Brisbane - Capital of Queensland, Australia. 8 votes. Brisbane is a

beautiful river city, and is the capital city of Queensland. ...”


• “Brisbane Australia. South East Queensland. Brisbane - Capital of Queensland. Brisbane, the

sub-tropical capital of Queensland, is approximately ...”

• “... technology industries. My Government has embarked on three key strategies to attract more

venture capital to Queensland: ? Identifying ...”

From these three snippets alone, if only the eight-letter words are taken, the word “Brisbane” appears five times, and “embarked” only once. Furthermore, the word “Brisbane” is easy to identify as a proper noun (its first letter was capitalised on all appearances within the snippets), which gave greater confidence that this was indeed the correct answer to the clue.
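This kind of frequency count is straightforward to automate. The following minimal Java sketch (all names illustrative) tallies every word of the required answer length across a set of returned snippets; for the snippets above and a length of eight, “brisbane” would dominate the counts:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SnippetTally {

    // Count every word of exactly the given length across all snippets.
    public static Map<String, Integer> tally(String[] snippets, int length) {
        Map<String, Integer> counts = new HashMap<>();
        Pattern word = Pattern.compile("\\b[A-Za-z]{" + length + "}\\b");
        for (String snippet : snippets) {
            Matcher m = word.matcher(snippet);
            while (m.find()) {
                counts.merge(m.group().toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }
}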

The Google Web API also allows developers to use all the search modifiers found on the Google

website. One of these modifiers is the ability to pass whole phrases to the search engine, done by quoting sections of the search term using quotation (“ and ”) characters. By applying this to a clue it was possible to narrow the web search with the intention of finding results relevant to the clue. Using as an example the clue “Russian composer (Prince Igor)”(7) (3432: 6D), Figure 4.1 shows that by passing the phrase “Prince Igor” to the search query, the correct answer of BORODIN appears with a higher frequency and proportion.

Without phrasing - Russian composer (Prince Igor) (7):

Word       Times Found    % of Matches
include    2              14.29
Elibron    1              7.14
edition    1              7.14
Borodin    3              21.43
Steppes    2              14.29
Central    2              14.29
unusual    1              7.14
already    1              7.14
MEMOIRS    1              7.14

With phrasing - Russian composer (”Prince Igor”) (7):

Word       Times Found    % of Matches
Borodin    4              40
Steppes    1              10
Central    1              10
pianist    1              10
ancient    1              10
already    1              10
MEMOIRS    1              10

Figure 4.1: A comparison of search engine results on the same clue, one without the use of phrasing and

one with.


There are a number of disadvantages with relying on the Google Web API within a piece of soft-

ware. The most obvious of these is the need to be connected to the Internet. For users who only have

narrowband connections, the solving process will appear to be much slower as the retrieval of results

will take more time to complete. Secondly, the current version of the API only returns the first ten results

of a search; the upshot of this is that searches done through the API will need to be as specific as

possible in order to retrieve only the most relevant results. Finally, the current Web API limits searches

per user to one thousand per day.

4.2 Question Answering and the RASP system

Question Answering (QA) is a natural language processing (NLP) technique whose aim is to give an-

swers to questions phrased in natural language. Applied to knowledge clues, the QA problem becomes

one of finding an answer of pre-determined length to the clue given. It needs to be noted that none of the

Times T2 knowledge clues are ever posed as questions - they never ask, for example “who was X?”, or

“what are Y?” or “where is Z?”. However, for a system to find answers to clues, some of the processes

applied to input text strings in QA can be applied here.

For successful question answering, there is usually a reliance on a vast data store from which to

retrieve facts. There has been much research into using the World Wide Web as this data store; this is usually done by formatting the question into a suitable query for the search engine, performing the query and retrieving the results, and then using these results to extract the correct answer.

Radev et al[19] use the idea of “Query Modulation” to turn a question into a search term, using supported operators such as OR and phrase delimiters.
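As an illustration of this style of query modulation applied to a clue rather than a question, the sketch below (hypothetical names; only the phrase-quoting modifier is shown) wraps any parenthesised phrase in the clue - such as the title in “Russian composer (Prince Igor)” - in quotation marks before it is sent to the search engine:

public class QueryModulator {

    // Turn e.g. "Russian composer (Prince Igor)" into
    // the query: Russian composer "Prince Igor"
    public static String modulate(String clueText) {
        int open = clueText.indexOf('(');
        int close = clueText.indexOf(')', open + 1);
        if (open >= 0 && close > open) {
            String phrase = clueText.substring(open + 1, close);
            String rest = (clueText.substring(0, open) + " "
                    + clueText.substring(close + 1)).trim();
            return rest + " \"" + phrase + "\"";
        }
        return clueText;    // no phrase present; search the clue as-is
    }
}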

For pre-processing data, the Robust Accurate Statistical Parsing (RASP)[1] system may be a very

useful tool. RASP can process a given body of text using the following steps:

1. Tokenisation - the process of separating words from punctuation

2. Part of Speech Tagging - individual words are categorised by their type. RASP recognises that some words can have different meanings depending on the context in which they are used, and accommodates this by assigning probabilities to each type.

3. Lemmatization - words are deconstructed into their individual parts. For example, the word

“unenviable” would be broken down into “un + envy + able”.


4. Parsing - this final step constructs a parse tree based on information from the previous steps.

The resulting parse tree will require post-processing to strip away the encoding, leaving the desired end result of phrase groups. These can then be passed into a Google Web API search or a word-list lookup, with the aim of improving the probability of finding the correct answer from the results. The main RASP program, runall.sh, is a shell script, and can therefore be integrated easily with most programming languages.
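For example, from Java the script could be driven through a ProcessBuilder, piping the clue text into the script’s standard input and reading back the parser’s output. This is only a sketch of what such a RASP wrapper might look like; the installation path is an assumption, and all RASP option handling is omitted:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RASPRunner {

    // Feed the clue text to runall.sh and return the raw parser
    // output, which still needs post-processing as described above.
    public static String parse(String clueText) throws IOException {
        ProcessBuilder pb =
                new ProcessBuilder("sh", "/opt/rasp/scripts/runall.sh");
        pb.redirectErrorStream(true);
        Process rasp = pb.start();
        rasp.getOutputStream().write(clueText.getBytes());
        rasp.getOutputStream().close();
        StringBuilder output = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(rasp.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return output.toString();
    }
}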

4.3 Word Lists

4.3.1 WordNet

WordNet is an “on-line lexical reference system whose design is inspired by current psycholinguistic

theories of human lexical memory”[14]. It has been in development since 1985, and was first published

in 1990. The most basic use of WordNet comes through the WordNet browser, which works in a similar manner to an online dictionary such as that of the OED[18]. However, its power lies in the way it labels words and groups them together: it uses the idea of synonym sets, or “synsets”, to group words [14]. The WordNet data files themselves are organised by word types, and are plain text files that contain the

words, their definitions and links to related words, such as the word’s synonyms and hyponyms.

WordNet has already been used in the development of other applications. It was used in the PROVERB

system that powers One Across[10]. Hovy et al developed a system called Webclopedia[7], which

used the WordNet system and their own specially-built search engine and parser to retrieve answers to

manually-entered questions. The WordNet project is in version 2.0 as of the beginning of 2005, and is

still active.

4.3.2 Moby Project Word Lists

The Moby Lexicon Project was carried out by Grady Ward at the Institute for Language Speech and

Hearing at the University of Sheffield. The project comprises various ASCII text files, each of which holds a different type of word.

With regard to the crossword-solving problem, the Moby Project contains three files that can be regarded as being of great significance. Two of these are pure word-list files; one contains a list of

over 350,000 words, “excluding proper names, acronyms, or compound words and phrases”[21]. The


other list contains 250,000 instances of words and phrases excluded from the first list. There are other word lists available that could prove to be useful, such as those containing male and female names. However, these are merely subsets of words from the two aforementioned lists, which, when combined, form what is claimed to be “the largest [English] word list in the world”[21].

The third file is the Moby Thesaurus. This “is the largest and most comprehensive thesaurus data

source in English available for commercial use”[21]. The file contains in excess of 2.5 million synonyms,

spread across 30,260 root words or phrases. The sheer volume of data from this file alone should prove

to be extremely useful in finding synonyms of words, which, as detailed previously, form a considerable

percentage of the clues in the Times T2 crossword. One of the disadvantages of the thesaurus file, however, is that it has no formal structure - it is in effect one very large text file, and searching through

the whole file each time will be computationally expensive. As the file essentially contains raw data,

with no word senses attached to the synonyms, some form of word sense disambiguation post-processing

may be required on results returned using the thesaurus[12].

The other problem that affects the thesaurus file, and the two main word list files, is that all are several megabytes in size. This will no doubt result in access times being slower than for other

answer acquisition techniques. However, the huge number of words documented in these word lists has

the potential to provide an invaluable resource of data for use in the final solution.
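A simple linear scan is one way to query the thesaurus file despite its lack of structure. The Java sketch below assumes - and this file layout is an assumption, not something stated in the report - that each line holds one root word followed by its synonyms, comma-separated:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MobyThesaurus {

    // Scan the file line by line; the first comma-separated entry is
    // taken to be the root word, the remainder its synonyms.
    public static List<String> synonyms(String root, String path)
            throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] words = line.split(",");
                if (words[0].equalsIgnoreCase(root)) {
                    return Arrays.asList(words).subList(1, words.length);
                }
            }
        }
        return Collections.emptyList();    // root word not present
    }
}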

4.3.3 CELEX

CELEX is similar to WordNet in that it is an electronic database of lexical information. Whereas WordNet

placed more of an emphasis on relationships between words, CELEX focuses on the phonetic makeup

of words and word morphology[3]. Databases for CELEX are available in Dutch, German and English,

with the former two languages having the larger data sets. The CELEX project ended at the start of 2001.

4.3.4 Word List Comparisons

As Figure 4.2 shows, the Moby Project Thesaurus contained by far the largest collection of lexical data1. As it was the only thesaurus file from the various projects researched here, its inclusion in the solving system was considered essential for solving synonym clues.

1 Estimated figure taken from the Moby Project website.


List                       Number of word strings    Word definitions
WordNet 2.0                152,059                   YES
Moby Project Word Lists    611,756                   NO
Moby Project Thesaurus     over 2.5 million          NO
CELEX                      160,594                   NO

Figure 4.2: Comparison of Word list files

For dictionary-style searches, WordNet was the only resource listed in Figure 4.2 that offered full

definitions of word forms. As CELEX was intended to be used as a tool for morphological and phonological study rather than semantic analysis (as shown by Peters[16]), its use within the crossword solver was disregarded. The smaller number of words offered by the WordNet project would not have posed much of a disadvantage for the reason that, unlike CELEX, the WordNet project was (as of April 2005) still active, so it is highly likely that in a future release the content offered by WordNet will eclipse that of

CELEX.

The word lists of the Moby Project, given the size of their content2, contain a number of words used

in the English language comparable to that of world-authority dictionaries such as the Oxford English

Dictionary. For this reason, these lists were used as the basis for any brute-force searching that needed to be done in the final solution.

4.4 Brute Forcing and Pattern Matching

One of the ways in which a human solver may attempt to find an answer to a given clue is simply to look up all possibilities which fit the answer pattern. For example, if a human solver has a six-letter answer and

knows the first two letters, but cannot progress further, they may resort to simply looking up every word

which fits the given pattern, and then use the definitions of the words to see which fits the clue best.

A computerised approach to this would be to give a program a pattern in the form of a regular

expression and return a list of all words (or compound words) which match that pattern. Using the returned list, it may be possible to eliminate letters at certain positions. This can be especially useful

for checked letters, as by reducing the number of possible letters at these positions, it is likely that the

number of possible words which fit the answer in the alternative direction will also decrease.
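As an illustration, an answer pattern such as “c?o??e” (with ‘?’ marking unknown letters) can be converted directly into a regular expression and run over a word list. The Java sketch below uses illustrative names and a plain in-memory list:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternSearch {

    // Each '?' matches exactly one letter; known letters match
    // themselves, so "c?o??e" would match "choose", for example.
    public static List<String> match(String pattern, List<String> wordList) {
        Pattern regex = Pattern.compile(
                pattern.toLowerCase().replace("?", "[a-z]"));
        List<String> matches = new ArrayList<>();
        for (String word : wordList) {
            if (regex.matcher(word.toLowerCase()).matches()) {
                matches.add(word);
            }
        }
        return matches;
    }
}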

2 354,984 single words + 256,772 compound words, taken from two files of the Moby Project.


Ideally, a brute-force search should be used only as a last resort for retrieving answers. To brute-force effectively, there is the need to consider all possible answers used in the Times T2 crossword. There is infrequent usage of archaic words, acronyms and colloquialisms, as well as of proper nouns, which cover a wide range of topics from people’s names to geographical locations and works of film and literature. In short, brute-forcing will require very comprehensive sources of data against which to match the pattern. This in turn will lead to efficiency and memory issues if it is required to look through a number of large word list files.

4.5 Conclusions

As single-word synonym clues were the easiest clue types to categorise, these therefore could be fed

directly into a Moby thesaurus or WordNet search (or some combination of the two), in order to retrieve

matching words. Multi-part clues that are made up of synonym clues can be solved in a very similar

manner, with the additional step of cross-checking results for each definition. For multi-worded defini-

tion clues, the RASP system could be used to extract a key phrase word from the clue and then use this

to process the clue similarly to synonym clues.

Knowledge clues could also be processed using RASP, to again determine key phrases and/or words;

these can then be passed into the Google Web API. If this fails to produce an outright correct answer,

then the key phrases or words can be passed to the WordNet dictionary or the Moby word lists.

In the event that no satisfactory possible answers are returned by one answer acquisition process,

the alternative process should be used. In the worst-case scenario that no answers can be found at all, a

brute-force search on the Moby Project wordlists, using the answer string as a regular expression, will

be used to retrieve words. These words can themselves be checked using WordNet to determine their

viability as correct answers.


Chapter 5

Methodology and Design

5.1 Development Methodologies

There are a number of development methodologies which were considered for developing the final

solution. These are briefly described below:

• Waterfall Model - The waterfall model is one of the more well-known development methodolo-

gies in software engineering. The model, although it may have a varying number of stages, essentially

keeps the same rigid structure. As Jesty[8] states, the waterfall model is particularly useful for

short projects. For larger projects, its rigid structure is unsuitable, as software requirements are

likely to change over time.

• Prototyping - Prototyping has the advantage of accelerated production of software, and allows

for changes in design. Its main disadvantage is that maintenance of code becomes an issue as

documentation of an evolutionary design becomes more difficult[8].

• Spiral Model - This model splits development into four ordered stages, which can be roughly

labelled as “Determination of Objectives”, “Evaluation of Objectives”, “Development” and “Plan-

ning”. Each stage is continually visited in the same order, allowing for incremental development

of software. As the spiral model is risk-based rather than product-based[8], it is suited to large

projects.


• Evolutionary Development - this model’s emphasis is on planned incremental delivery[4, 5].

Software functions are added at planned iterations, with existing functionality either removed,

modified or left unchanged as a result of testing previous iterations. The advantage of this ap-

proach is that feedback can be gained from early iterations (which act more as prototypes). However, careful planning is required, as the model does not consider time constraints[8].

As a consequence of the research into Natural Language Processing techniques, the time allocated for software development decreased. Working with a rigid methodology such as the Waterfall Model would leave no time to correct any errors found within the software design. The solving process is also likely to require many refinements before performing at a satisfactory level, leading to direct conflict with the Waterfall Model, which “assumes everything works perfectly”[8]. The Spiral Model is also inappropriate for this project, as it is suited to longer periods of development. The other two methodologies

focus on iterative development. However, Evolutionary Development has better provision for addition

of functionality over time. “Functionality”, in the case of the crossword solver, can be thought of as the

various solving techniques which can be applied to a crossword puzzle - by adding these to a system in

an iterative manner then evaluating the solver’s success rate, these can be refined to improve the solving

process. Therefore Evolutionary Development will be the methodology used to develop the software.

5.2 Requirements Gathering

To establish the attributes needed to describe the various components of a crossword puzzle, Natural

Language Requirements Gathering was used.

5.2.1 Crossword Puzzle Representation

• A crossword consists of a grid and a set of clues.

• A grid consists of a number of squares; these squares can either be black or white.

• A grid's black squares are always empty. A grid's white squares are either empty, or contain a single alphabetic character.

• Adjacent white squares form answers within the grid. Every answer has a number, a direction and a fixed length. An answer only has one correct word which fits its squares.


• A set of clues directly corresponds to the answers on the grid. Every clue contains clue text, and at least one answer word. The total length of these answer words must be exactly equal to the length of the answer on the grid.

The list above shows that three distinct objects were required to represent a crossword puzzle. A

Grid object will be needed to represent the physical layout of a crossword grid. A Clue object will be

required to hold the attributes of a single crossword clue; a set of Clue objects will hold all clues in the

puzzle. An equal number of Answer objects will also be needed to store all possible candidate answers

which fit the corresponding pattern on the grid.

To allow end users to easily extract information from the crossword grid, some kind of visual repre-

sentation of the crossword grid was required. A number of ways to implement this were considered:

• Via the console, using ASCII characters.

• GUI representation, using graphical components.

• HTML representation, using <table> tags.

ASCII representation, although relatively easy to code, poses two immediate problems: developing

a way of distinguishing between black squares and empty white squares, and developing a suitable

method for inserting the layout of the grid. Using, for example, a co-ordinate input method for setting

black squares is likely to be very time consuming. GUI representation is likely to be the most complex of

these options to implement, but once created will have numerous benefits; users will be able to select

which squares are to be black through direct interaction with the grid using the mouse. The GUI can

also provide the interface for entering clues into the grid, and provide users with feedback with regards

to the solving process as clues are entered into the grid. HTML tables are simple to design, though will

require dynamic components such as CGI to switch squares between black and white, and for the grid

to interact with any automated solving process so it can be updated.

Based on the information above, a GUI is the favoured option of the three, as its design can be tailored to fit the requirements of the system.

5.2.2 GUI Functional Requirements

The functional requirements of the GUI will take the form of user inputs. The most essential user inputs

that the GUI should be capable of performing are:


1. Selection/deselection of crossword grid squares,

2. Confirmation of the grid’s design,

3. Clue selection and clue text entry,

4. Entry of the number of answer sections in an answer (and, if necessary, selection of word delimiter),

and

5. Starting the solving process.

Extensions to these basic user inputs include:

6. Resetting the design of the grid and clearing the grid of answers,

7. Manual entry of answers into the grid, and

8. Providing rotational symmetry design options.

Additionally, the GUI will need to display across and down clues as two separate groups, as is

conventional for nearly all crossword puzzles.

5.3 Programming Languages

Below are some of the possible languages considered for developing the final solution.

5.3.1 Java

Advantages for development of a system using Java include its direct compatibility with the Google Web API, meaning this technology could be incorporated straight into an application. Java has strong GUI support, provided by two class hierarchies: the older, less portable AWT (java.awt.*) and the newer Swing libraries (javax.swing.*). Java code was designed to be portable, which will help speed up development, as there will be no restrictions on the choice of development platform. The author's prior knowledge of working with the Java language is also a strong advantage in the language's favour. Documentation of Java code is highly comprehensive and freely available, which will aid system development. Finally, support for regular expression pattern matching is available via the classes Pattern and Matcher.


Disadvantages of using Java are, firstly, its slow runtime speed; tests by Prechelt[17] show that for basic text manipulation programs, Java runs approximately three times slower than C and C++. Java is also a verbose language to write code in - the number of lines of code needed to write identical programs in C++ and Perl is much smaller, as Java coding standards require that accessors and mutators be declared for individual attributes of a class.

5.3.2 Perl and Python

The biggest advantages of using either Perl or Python as a development language are their built-in regular expression support. Perl especially excels at text processing, which has been established as forming a substantial part of the final solution's solving process. Both languages' computation speeds are also better than Java's: Prechelt[17, p.17] showed that Perl runs approximately eight times faster than Java, with Python slightly slower, at around four times faster.

Disadvantages of using either language are that the author has no prior experience coding in Perl, and relatively little in Python; the time needed to learn the language would have to be considered as part of the development stage. Also, neither has strong native GUI support; Perl does not support the use of GUIs directly, and Python GUIs rely on the use of third-party or custom-designed modules. This runs the risk of introducing new problems into the design of the final software solution.

5.3.3 Conclusions

In considering the choice of programming language, three criteria were used to evaluate the three languages detailed above: prior knowledge, computation speed, and GUI capabilities. As all three languages support the use of regular expressions, this was not considered as a criterion. Of the three languages, the author has had most experience using Java, which, although a computationally expensive language, has in-built GUI capabilities. Perl and Python are both faster computationally, but given that development time has been shortened, the added time required to learn either language to the degree that a high-quality solution could be produced with it meant that these languages had to be disregarded as viable options. Therefore, Java was the programming language chosen to develop the crossword solver.


5.4 UML and Class Descriptions

In order to visualise the system's requirements as Java classes, UML was used to outline the interaction between the components of the crossword solver. Explanations of the classes outlined in Figure 5.1 are given below:

Figure 5.1: The UML class diagram for the crossword solver.

5.4.1 CrosswordGrid

This class holds all the information needed to describe an actual crossword grid. It contained the grid itself, and stored the set of clue/answer pairs as Vector objects. Two auxiliary classes (not shown in Figure 5.1) were used by this class: one which described the actual data contents of the grid, and one used for rendering the grid within a GUI, to ensure it had the intended look of a printed crossword.

5.4.2 Clue

The Clue class holds all details relating to an individual clue; that is, the clue's text, its number and direction, and its starting coordinates on the grid. This class was also where clues were categorised; using these categories, the solver could select what it thought to be the most appropriate solving technique for the clue/answer pair.

5.4.3 Answer

The Answer class holds all details related to an individual answer; this included the current regular expression of the answer (as derived from the grid) and Vector objects to hold possible answers with their corresponding frequencies (and, when relevant, the bias). The main concept behind this class was speed; when a given clue/answer combination was first searched using one of the search methods, it was anticipated that the process of retrieving answers would be relatively slow; given the size of the word list


and thesaurus files, performing a search was expected to use much processing time (more so when using Java). Similarly, a Google web search may take a number of seconds to retrieve its data. On the first iteration of the solving process, results for a given clue/answer are saved into an instance of the Answer class. Any subsequent searches (that is, once all clue/answer combinations have been searched for the first time) were then performed using the data stored within the Answer object, as sketched below.
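The caching idea can be illustrated as follows; the field and method names here are illustrative rather than the project's actual API, and Search is the abstract superclass described in Section 5.4.6.

import java.util.Vector;

// Illustrative sketch of the caching behaviour: the first call performs
// the (slow) search; later calls only filter the stored results against
// the answer's current grid pattern.
public class Answer {
    private String pattern;         // regex derived from the grid
    private Vector words;           // cached candidates (null until searched)
    private Vector wordFrequencies; // frequency of each candidate

    public Answer(String pattern) {
        this.pattern = pattern;
    }

    public Vector getCandidates(Search search) {
        if (words == null) {        // first iteration: run the slow search
            search.doSearch();
            words = search.getWords();
            wordFrequencies = search.getWordFrequencies();
        }
        Vector viable = new Vector(); // later iterations: filter the cache
        for (int i = 0; i < words.size(); i++) {
            String w = (String) words.get(i);
            if (w.matches(pattern)) {
                viable.add(w);
            }
        }
        return viable;
    }
}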

5.4.4 CrosswordGridGUI

The CrosswordGridGUI class extended the CrosswordGrid class, to provide a visual representation of the attributes stored in its superclass. The decision was taken early to have this class be the one to interact with the Search hierarchy, as it was felt that by separating the data representation from the crossword solving techniques, the Search hierarchy could be implemented iteratively (following the principles of the chosen methodology).

5.4.5 RASPService

The main function of the RASPService was to run the RASP system using a clue object's text as input. Once the output from the generated parse tree had been interpreted, it was intended to return phrase groups, which would be passed to the appropriate Search subclass.

5.4.6 Search

The Search class was designed as an abstract superclass. It provided the concrete methods required to retrieve the various aspects of a search result, and the abstract method doSearch(), to perform a search. All Search subclasses work by calling their implementation of doSearch, which sets three Vector objects based on the results of the search: words, containing the list of words that matched the answer pattern; wordFrequencies, containing the frequency of each word in words; and wordBiases, containing bias information for each word (though this data structure may not be used by all Search subclasses). The following classes are subclasses of Search, each of which implements a different search technique.
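A minimal sketch of this superclass is shown below. The constructor arguments are an assumption, as the report only specifies the doSearch() method and the three result Vectors.

import java.util.Vector;

// Sketch of the abstract superclass: subclasses implement doSearch() and
// populate the three result Vectors, which the concrete accessors expose.
public abstract class Search {
    protected String clueText;      // the clue to search on (assumed field)
    protected String answerPattern; // regex derived from the grid (assumed field)
    protected Vector words = new Vector();           // matching candidate words
    protected Vector wordFrequencies = new Vector(); // frequency per word
    protected Vector wordBiases = new Vector();      // bias per word (optional)

    public Search(String clueText, String answerPattern) {
        this.clueText = clueText;
        this.answerPattern = answerPattern;
    }

    /** Run the search and fill the three result Vectors. */
    public abstract void doSearch();

    public Vector getWords()           { return words; }
    public Vector getWordFrequencies() { return wordFrequencies; }
    public Vector getWordBiases()      { return wordBiases; }
}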


5.4.6.1 MobyListSearch

The MobyListSearch class provided the crossword solver with brute-forcing capabilities. Using the two main Moby Project wordlists, it took an answer pattern and returned all words which matched it.

5.4.6.2 WordNetSearch

This class was used on clues determined to be synonym-type clues. Although the class is named WordNetSearch, it in fact uses a combination of the Moby Thesaurus and the raw WordNet data files. Firstly, the clue string and answer pattern were passed through the Moby Thesaurus file, to return all words related to the clue string. For synonym clues, these results were then compared with the clue string's definition(s) (if these exist) in WordNet. If any of the result strings were found in these definitions, extra bias was given to those words, on the basis that they were more likely to be the correct answer for the clue.
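The bias step might look something like the sketch below, assuming the synonym and gloss lists have already been read from the Moby Thesaurus and WordNet files; the boost factor of 2.0 is an illustrative value, not the one used in the project.

import java.util.Vector;

// Sketch of the overlap bias: a Moby synonym that also appears in one of
// the clue word's WordNet glosses is given a raised bias value.
public class OverlapBias {
    public static Vector biasesFor(Vector synonyms, Vector glosses) {
        Vector biases = new Vector();
        for (int i = 0; i < synonyms.size(); i++) {
            String candidate = ((String) synonyms.get(i)).toLowerCase();
            double bias = 1.0;  // default: no boost
            for (int j = 0; j < glosses.size(); j++) {
                String gloss = ((String) glosses.get(j)).toLowerCase();
                if (gloss.indexOf(candidate) >= 0) { // candidate in definition
                    bias = 2.0;                      // assumed boost factor
                    break;
                }
            }
            biases.add(new Double(bias));
        }
        return biases;
    }
}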

5.4.6.3 GoogleService

The GoogleService1 class took a clue's text as a string, and a regular expression pattern, and performed a search using the Google Web API. The class had two main search types: a basic search, which is this class's implementation of doSearch, with no special parameters of any sort, and a missing word search, which used the preceding or succeeding word in the clue as the “key word”, matching only those words found immediately after or before this keyword, and then only if the word in question matched the answer pattern. In either case, the resulting snippets2 were each processed to strip away HTML formatting. The remaining text was then compared against the regular expression, with resulting words being obtained from these snippets. The Google Web APIs also allowed for advanced search parameters to be set. The only parameter explicitly set was that the language of all results obtained was to be English. For the crossword-solving problem, this was a more than reasonable restriction.

1The name “GoogleSearch” would have been the desired choice for this class's name, to keep class names consistent. However, there already exists such a named class in the Google Web API.
2The current version of the Google Web API dictates there to be a maximum of ten results returned from any search.
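The snippet post-processing might be sketched as below; the tag-stripping regular expression and class name are illustrative, as the report does not specify how the HTML was removed.

import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of snippet post-processing: strip the HTML tags Google puts
// around hit terms, then collect every word in the remaining text that
// fits the answer pattern.
public class SnippetProcessor {
    public static Vector extractCandidates(String snippet, String answerRegex) {
        String text = snippet.replaceAll("<[^>]*>", " "); // remove HTML tags
        Pattern word = Pattern.compile("[A-Za-z]+");
        Pattern answer = Pattern.compile(answerRegex, Pattern.CASE_INSENSITIVE);
        Vector candidates = new Vector();
        Matcher m = word.matcher(text);
        while (m.find()) {
            String w = m.group();
            if (answer.matcher(w).matches()) {
                candidates.add(w.toLowerCase());
            }
        }
        return candidates;
    }
}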


5.4.7 Documentation of Software

The documentation of the classes was created using the Javadoc tool[13], and can be found enclosed

with the software.

5.5 Development Plan

As an evolutionary development methodology was used to create the crossword solver, careful planning of each iteration was required to ensure each set of changes was significant with respect to the previous iteration, whilst allowing for changes that may be required as a result of testing previous iterations. The original development plan in Figure 5.2 outlines the functionality added at each iteration. The most critical aspects of the system (that is, the components of the solving process) were developed first, before being integrated into the system's GUI. (‘GUI’ in the table refers to the CrosswordGridGUI class.)

5.6 The Solving Process

Once the crossword grid and clues have been entered into the system, the process used by the crossword

solver to attempt solving the crossword can be broken down into the following four stages:

1. Categorisation of Clue Types - For each clue, the clue text was taken and categorised, so that the appropriate answer retrieval mechanism was used.

2. Retrieval of Answer Sets - On the first iteration of the solving process, each clue was fed into its search mechanism, from which a set of answers was retrieved.

3. Calculation of Answer Confidence - On every iteration, each unsolved answer set was compared with the current pattern for that answer, calculated from the current status of the grid. Candidate answers that did not fit the pattern were removed from the answer set. The remaining candidate answers were each given a confidence score calculated by the formula (WordFrequency / TotalWordFrequency) * Bias, where WordFrequency was the individual word's frequency and TotalWordFrequency was the total number of word instances returned in the search results. These two variables ensured that results for all clue/answer pairs were normalised. The Bias parameter was used by the WordNetSearch class to give higher scores to candidate answers that overlapped with the WordNet definition of the clue phrase. (A sketch of this scoring, and of the selection step in Stage 4, appears after the list.)


Iteration Functionality Added

0 Command-line driven Moby thesaurus search

1 Command-line driven Google basic web search

2 Command-line driven Moby wordlist search

3 Command-line driven Google missing word search

4 GUI - crossword grid

5 GUI - components to manipulate grid

6 GUI - panels for clues and information

7 GUI - Clue entry panel and internal components added

8 Conversion of search classes from iterations 0 - 3 into the Search hierarchy; integration into GUI

9 GUI - Slider component for multi-worded clues added

10 Support for multiple definition clues

After iteration 10, testing of the system began.

11 Letter restriction feature added

12 Refinements to the solving process

Due to time constraints, iterations 13 and 14 were not completed.

13 Word overlap features added

14 RASPService class added

Figure 5.2: The original evolutionary development plan for the crossword solver.


4. Insertion of Confident Answers - After each answer's highest scoring word was determined, that word's score was compared with the current highest score across the grid. If the high score was exceeded, it was replaced by the current word's score, as the solver had a higher confidence in the current word. Once all answer set scores had been calculated, if there was a high scoring word for the current iteration, this was inserted into the grid. The solving process then moved onto the next iteration, returning to Stage 3. If, at the end of an iteration, no high scoring word was found, either all words had been inserted, or there were no more confident answers to enter into the grid. In either case, the solving process was said to be complete.
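As a rough illustration of Stages 3 and 4 for a single answer set, the scoring and best-candidate selection might look like the following; the names and the zero-frequency guard are illustrative, and the project's actual selection code (shown in Chapter 6) differs in detail.

import java.util.Vector;

// Score each surviving candidate as (frequency / totalFrequency) * bias
// and return the highest scorer for this answer. The solver then compares
// the best candidates of all unsolved answers and inserts only the single
// highest scorer per iteration.
public class ConfidenceScorer {
    public static String bestCandidate(Vector words, Vector frequencies,
                                       Vector biases) {
        int totalFrequency = 0;
        for (int i = 0; i < frequencies.size(); i++) {
            totalFrequency += ((Integer) frequencies.get(i)).intValue();
        }
        if (totalFrequency == 0) return null; // no usable results
        String best = null;
        double highScore = 0.0;
        for (int i = 0; i < words.size(); i++) {
            int freq = ((Integer) frequencies.get(i)).intValue();
            double bias = ((Double) biases.get(i)).doubleValue();
            double score = ((double) freq / totalFrequency) * bias;
            if (score > highScore) {
                highScore = score;
                best = (String) words.get(i);
            }
        }
        return best;
    }
}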


Chapter 6

Development and Testing

6.1 Test Plan

The first test of the software was carried out once all essential features of the software had been incorporated, after iteration 9 of the development plan. Each subsequent test was carried out after significant additions, bug removals, or optimisations of the solving process had been made. These improvements would be based on the data obtained from the previous test results, and on debug data obtained during the solving process. The criteria used to test the crossword solver are as follows:

1. The number of answers attempted in the crossword, as a percentage.

2. The number of correct answers found as percentages, against:

(a) the total number of answers in the grid, and

(b) the total number of answers attempted

3. The average confidence for:

(a) all answers on the grid, and

(b) all answers attempted,

4. The time required to enter a crossword grid and its individual clues into an application,


5. The time required to complete the solving process.

The first criterion will judge how much of a given crossword the software attempted to solve. The second criterion will judge how accurate the software was in attempting to solve the crossword, with respect to the whole grid and to attempted answers respectively. The third will judge how confident the software was in obtaining the answers. This can be seen as a measure of the validity of the mechanisms used to obtain answers to clues.

The fourth and fifth criteria will give a measure both of how quick the software's solving process is and, to a lesser degree, of how user-friendly the software is with regard to entering a crossword grid and its clues. It is important that the total time taken to enter and solve a crossword does not exceed the time needed by a human solver to attempt the crossword, especially if the human solver can complete the crossword with significantly higher accuracy, as this would make the software an obsolete alternative for solving the Times T2 crossword.

Although a 100% success rate was considered unlikely to the point of impossibility, given that large-scale research projects and commercial packages such as [10] and [11] fail to achieve such a figure, the initial target was the average proportion of correct answers attained in the testing of existing solutions - that is, a figure of at least 41%. The three crosswords tested in Chapter 3 were used to test the solver, to allow for direct comparison with the previously tested CBSs.

6.2 Test Results

As Figure 6.1 shows, less than 25% of clues attempted were correct in the initial test phase. As a result

of this, refinements to the solving process were required.

6.2.1 Solving Process Refinements

Changes made to the solving process upon completion of the first test phase include:

• giving preference to words with higher word counts, should two or more words have joint highest

scores for a given iteration.

• adding a letter restriction method to eliminate any letters that cannot occur at individual squares,

by performing a brute-force search using a given pattern of an unsolved answer. By using such a

method, letters restricted in checked squares for one clue will also be restricted for the other clue,


Criteria                 T2 Bk7 #1   T2 Bk7 #2    #3415     Mean
Clues/Answers                28          22          24        -
Attempted                    20          20          23        -
% Attempted               0.714       0.909       0.958    0.860
Solving Time (mm:ss)      02:40       02:04       02:18    02:21
Total Confidence         12.338      11.567      10.893        -
Mean (All)                0.440       0.525       0.454    0.473
Mean (Attempted)          0.617       0.578       0.474    0.556
Correct answers               7           1           7        -
% (All)                   0.250       0.045       0.292    0.196
% (Attempted)             0.350       0.050       0.304    0.235

Figure 6.1: Results from Test Phase 1

which may in turn lead to more letters being restricted for that clue and any other connected to it (the letter restriction idea is sketched below). Initially, letter restriction was performed as an extra initial iteration in the solving process. It was estimated that the use of the letter restriction method would increase solving time by at least 30-45 seconds, given previous observations of the brute-force search method used in the solving process.
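A minimal sketch of the letter restriction idea, assuming the matching words for an unsolved answer have already been retrieved by the brute-force search (class and method names illustrative):

import java.util.Vector;

// Record which letters can actually occur at each square of an unsolved
// answer, given the words that match its current pattern. Crossing
// answers then only need to consider those letters at shared squares.
public class LetterRestriction {
    public static boolean[][] allowedLetters(Vector matchingWords, int length) {
        boolean[][] allowed = new boolean[length][26]; // [position][letter]
        for (int i = 0; i < matchingWords.size(); i++) {
            String w = ((String) matchingWords.get(i)).toLowerCase();
            if (w.length() != length) continue;        // skip malformed entries
            for (int pos = 0; pos < length; pos++) {
                char c = w.charAt(pos);
                if (c >= 'a' && c <= 'z') {
                    allowed[pos][c - 'a'] = true;
                }
            }
        }
        return allowed;
    }
}

A square at which only a few letters are marked true tightens the regular expression of the crossing answer (for example [aeiou] rather than [a-z]), which may in turn restrict further squares.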

As seen in Figure 6.2, solving times increased on average by over 75 seconds, slightly longer than predicted. Improvements made after Test Phase 2 include:

• limiting use of the letter restriction method in the first iteration of the solving process. It was found that, in the initial iteration, letter restriction worked very well for multiple-word answers, restricting many letters at most positions, but made no impact for single-word answers of fewer than eight letters. By performing letter restriction only for certain answer types, it was hoped to decrease solving times, so they were nearer to those found in Test Phase 1.

• executing the letter restriction method for any unsolved clue/answer pairs which failed to return

any candidate answers.

• the instant insertion into the grid of answers that were deemed to be “highly confident” - that is, those with a confidence score of 1.0 and a word count of at least 5. This decision was taken as it was found that clues with these confidence statistics, especially in early iterations, were almost always correct answers.


Criteria                 T2 Bk7 #1   T2 Bk7 #2    #3415     Mean
Clues/Answers                28          22          24        -
Attempted                    24          22          20        -
% Attempted               0.857       1.000       0.833    0.896
Solving Time (mm:ss)      04:02       03:28       03:55    03:28
Total Confidence          8.444       7.692       5.533        -
Mean (All)                0.302       0.347       0.231    0.293
Mean (Attempted)          0.352       0.347       0.277    0.352
Correct answers               6           5           6        -
% (All)                   0.214       0.227       0.250    0.230
% (Attempted)             0.250       0.227       0.300    0.259

Figure 6.2: Results from Test Phase 2


• giving extra bias to clue/answer pairs which already had letters present within their grid slots. The current methods used to solve the grid are such that earlier iterations of the solving process are more likely to give correct answers to clues.

The changes made between Test Phases 2 and 3 not only reduced the average solving time by 14

seconds, but increased the confidence of answers found, and returned a greater proportion of correct

answers.

One final change was made to Stage 4 of the solving process. Previously, the algorithm for deter-

mining whether an answer was more confident than the current most confident answer was given as:

// An answer beats the current best if its score is strictly higher, or
// if the scores tie and it was returned more often in the search results.
if (score > highScore ||
        (score == highScore && wordCount > highWordCount)) {
    highScore = score;
    ...
}


Criteria                 T2 Bk7 #1   T2 Bk7 #2    #3415     Mean
Clues/Answers                28          22          24        -
Attempted                    23          18          15        -
% Attempted               0.821       0.818       0.625    0.757
Solving Time (mm:ss)      03:15       03:20       03:08    03:14
Total Confidence         14.935      11.296       9.152        -
Mean (All)                0.533       0.513       0.381    0.476
Mean (Attempted)          0.649       0.628       0.610    0.629
Correct answers               9           5           7        -
% (All)                   0.321       0.227       0.292    0.280
% (Attempted)             0.391       0.278       0.467    0.379

Figure 6.3: Results from Test Phase 3

The code if (score > highScore...) was felt to give too great a bias to clues with high scores but low word frequencies (stored in the wordCount variable). This line was altered slightly with the use of two additional variables, scoreWordCount and highScoreWordCount:

// Weight the (normalised) score by the word's frequency; squaring the
// score keeps high-confidence words ahead of merely frequent ones.
scoreWordCount = (score * score) * wordCount;
if (scoreWordCount > highScoreWordCount ||
        (score == highScore && wordCount > highWordCount)) {
    highScoreWordCount = scoreWordCount;
    ...
}

By squaring the (normalised) score, the intention was to give a greater preference to words with higher word counts.

The proportion of correct answers to answers attempted increased by 21% as seen in Figure 6.4,

justifying the changes made between Test Phases 3 and 4.


Criteria                 T2 Bk7 #1   T2 Bk7 #2    #3415     Mean
Clues/Answers                28          22          24        -
Attempted                    15          14          14        -
% Attempted               0.536       0.636       0.583    0.585
Solving Time (mm:ss)      03:27       02:41       02:30    02:54
Total Confidence          9.545       6.085      10.448        -
Mean (All)                0.341       0.277       0.435    0.351
Mean (Attempted)          0.636       0.435       0.746    0.606
Correct answers               8           7           9        -
% (All)                   0.286       0.318       0.375    0.326
% (Attempted)             0.600       0.500       0.643    0.581

Figure 6.4: Results from Test Phase 4

6.2.2 Entry of Crossword Data

To test evaluation criteria #5, three users were asked to each enter the data for the three test crosswords

into the crossword solver. As Figure 6.5 shows, the shorter input times were for the crosswords with

fewer clues to enter data for.

Crossword          T2 Bk7 #1   T2 Bk7 #2    #3415
Mean Time Taken      10:37       09:21      09:50

Figure 6.5: Results of data entry testing for three users of the software, using the three test crosswords.


Chapter 7

Evaluation

This chapter will evaluate the project, using the evaluation criteria outlined in Chapter 1.

7.1 Project Aim

The aim of the project was to design a piece of software that could solve the Times T2 crossword.

Through research into existing crossword solvers and technologies which could be used to acquire

crossword answers, the software has been designed using an evolutionary development methodology,

implemented in Java and tested to refine the solving processes used in attempting to solve the cross-

word. The resulting crossword solver is shown in Figure 7.1. As the software only managed to obtain

33% of correct answers by the end of the final testing phase, thus failing to fully solve a complete edition of the Times T2 crossword, the weaknesses of the system are acknowledged in Chapter 8 - Conclusion. If further development of the crossword solver were to take place, these weaknesses could be addressed, which would undoubtedly improve the crossword solver's performance.

7.2 Objectives

The project’s objectives have all been realised as follows:

• Research the history of crosswords, and understand the difficulties faced by human solvers in completing crosswords - The early stages of the project's lifetime were spent researching the evolution of crosswords from simple word puzzles to modern-day crosswords. From this, the Times T2 crossword was looked at specifically, and a statement of the problem was formed.


Figure 7.1: The final crossword solver. This screenshot shows the solver after attempting to solve the

T2 Crossword #3415, from the fourth and final test phase. All attempted answers are correct, apart from

2D, 4D, 17A, 18D and 20A.


• Evaluate existing crossword solvers, noting any features used by such solvers that could be incorporated into a final solution - It was originally expected that there would be a number of readily-available crossword solvers to research and test. However, only the One Across system [20] was specifically designed to answer crossword clues. It was with some regret that the Crossword Maestro software could not be tested, as this software looked the most promising with


regard to solving crosswords.

• Research possible technologies used to acquire answers to clues, which could be incorpo-

rated into a final solution - A lot of time was spent researching technologies which could be

integrated into the crossword solver, even more so once Natural Language Processing (NLP) tech-

nologies were researched. Based on this research, firm choices could be made regarding which

technologies could be most successfully integrated into the system’s solving process.

• Evaluate software development methodologies and programming languages which could be

used to develop the system, giving reasoning for final choices- Choosing the right methodology

for the project was vital to ensure that the software could be developed as effectively as possible.

Evolutionary development allowed for changes to the requirements of the system, which made the

task of exceeding the minimum requirements easier. The choice of language was also important.

In using Java, a conscious choice was made to sacrifice performances gains offered by alternative

languages in return for speedier development due to having a good working knowledge of the Java

language.

• Develop a software solution which incorporates at least all the minimum requirements stated

in this report - This was where the majority of time spent on the project was placed. Development

of the crossword solver began with the coding of the first iterations at the end of December, and

ceased by the beginning of April. In between, however, the extra research into Natural Language Processing was carried out, so for a few weeks development of the software was sidelined.

• Test the developed system using various editions of the Times T2 crossword, against a num-

ber of evaluation criteria - A lot of time was spent testing and refining the crossword solver, as

initial expectations of the solver’s success rate proved to be higher than the results observed.

7.3 Minimum Requirements

The three minimum requirements of the project have been fully realised:

• To develop a system which can return a list of possible answers given a clue and the length of

its answer - The solving process used by the crossword solver returned a set of candidate answers using the most appropriate search method, which was determined by the contents of the clue text.


• To develop a method of inputting a crossword grid into an application- Interaction with the

grid section of the crossword solver’s GUI using the mouse allows users to select the squares

which are to be black.

• To develop a method of inputting a crossword clue into an application, such that it relates

to a specific location on a crossword grid- The crossword solver takes the input grid, and

automatically works out the numbering of the clues and answers. From this, individual clues can

be selected from the GUI, which then allows the clue’s text to be entered into the application.

With regard to extensions to the minimum requirements, the ones listed at the beginning of the

report have all been incorporated into the final system. Crosswords can be loaded and saved from the

File menu of the GUI. The GUI itself was considered a major requirement for software of this nature.

The idea of the information box was derived from ideas demonstrated by existing solvers. Manual entry

of answers into the grid has also been incorporated into the crossword solver.

7.4 Comparison with Existing Systems

From the results obtained in Chapter 3, the One Across system obtained 41% of correct answers. The project's crossword solver obtained 33%. If the shortcomings of the solving process highlighted in Chapter 8 were rectified, there is a strong possibility that the eight-percentage-point discrepancy between the two systems could be made up, or even surpassed.

7.5 Comparison with Human Solvers

To gain an understanding of how the crossword solver compared to unassisted human performance, a small survey was conducted using the three crosswords used to test all systems throughout the course of this project. The raw results of this survey are to be found in Appendix B. The average crossword data entry times from Figure 6.5 have been added to the solving times obtained from the final test phase in Figure 6.4, whose other data is also used here for comparison with the results of the human solvers.

As seen in Figure 7.2, although the human solvers took on average at least five minutes longer to

attempt the crosswords, their success rate for obtaining correct answers was far superior to that for the

crossword solver. In every instance, human solvers obtained approximately twice as many answers as the

crossword solver. For some answers, the crossword solver managed to obtain the correct answer where


Criteria                       T2 Bk7 #1            T2 Bk7 #2              #3415
                           Computer  Humans     Computer  Humans     Computer  Humans
Clues/Answers                 28       28          22       22          24       24
Mean Attempted                15      20.4         14      13.8         14      18.6
% Attempted                0.536     0.729      0.636     0.627      0.583     0.775
Mean Solving Time (mm:ss)  14:04     19:00      12:02     17:00      12:20     19:00
Correct answers (Mean)         8      18.8          7      12.8          9      18.2
% (All)                    0.286     0.671      0.318     0.582      0.375     0.758
% (Attempted)              0.600     0.922      0.500     0.928      0.643     0.978

Figure 7.2: Comparison of unassisted human performance against results from Test Phase 4 (Note: the mean solving times for human solvers are approximate; the computer times include data entry)

few or none of the human solvers could do so. For example, for the clue “Sistine wall subject” (4,8) - LAST JUDGMENT (T2 Bk7 #2: 21A). Some clues were so tough that neither the crossword solver nor any of the human solvers could obtain the answer. Example clues include “One writing notice” (8) - REVIEWER (T2 Bk7 #1: 8A) and “Feudal allegiance” (6) - FEALTY (T2 Bk7 #2: 20A).


Chapter 8

Conclusion

8.1 Failings of the Solving Process

The solving process of the software performed below expectations. However, the weaknesses of the

system have been outlined below, along with possible remedial action to remove these weaknesses from

the system:

• Google search inconsistencies - From observing results obtained using the Google Web API,

it has been noted that the search results are not always consistent. For the majority of clues, the

correct answer appears in all searches, with minimal variance in its frequency or confidence score

with respect to other answers returned in its results set. However, for some clues, the answer

appears, but is no longer the highest scoring word, or in the worst case, does not appear at all.

Possible solutions to this include:

1. Repeating Google searches a number of times to obtain a results set,

2. Preprocessing the clue using Question Answering techniques,

3. Using word overlap techniques on the results set, using WordNet

Of these solutions, the first is the quickest to implement in terms of coding. However, there is

no guarantee that repeated searches will return better results. Also, by repeating searches, there

is an increased possibility that the 1,000-searches-per-day limit imposed by the current build of


the Google Web API will be reached. Using Question Answering as suggested in the second

solution would help to narrow the Google search, by building up a more specific search term

using related synonyms of words from the clue text. The third solution, if correctly implemented,

should be successful in determining the correct answer from a given results set, but only if the

answer appears within the results initially. Of these suggestions, a combination of solutions 2 and 3 would be the best approach to overcoming the Google search inconsistencies.

• Moby Thesaurus not Anglicised - This weakness was only discovered during testing of the crossword solver. It was assumed that, as the Moby Project's main word lists contain British English spellings, the same would be true of the project's thesaurus file. However, the thesaurus has been produced using American English spellings. Consequently, words like LABOUR and HONOUR were not found as synonyms during the test phases, due to the inconsistent spellings between the two forms of English. Solutions could be to manually convert the thesaurus into British English spellings, which would be an arduous task given the sheer size of the thesaurus file, or to seek out an alternative thesaurus source. The author finds the fact that an American English thesaurus was compiled during a research project at a British university all the more curious (not to mention frustrating).

• Word overlap techniques not fully utilised - This feature was only partially implemented in the solving process, as clue definitions were checked for occurrences of candidate answers. However, searching on the answer definitions was not implemented (following the development plan in Figure 5.2, this was to be the next iteration integrated into the software). Due to time constraints, full word overlap functionality could not be incorporated into the solving process.

• RASP system not utilised - This feature was originally planned to be one of the final additions to the software. However, upon the advice of the project's supervisor and assessor, development of this feature (which would have taken the form of the RASPService class in the UML diagram shown in Figure 5.1) was not recommended, again due to time constraints.

8.2 Suggested Improvements

As can be seen from the testing results in Chapter 6, the solver does at best an adequate job of retrieving accurate results. The following are suggested improvements which could be made to the


crossword solver:

• The automatic numbering of answers on the grid. Currently, the only way for users to determine

an answer’s position on the grid is when its clue is selected from the clue input box; it is then

highlighted in a different colour to the rest of the grid.

• During the entire solving process, the GUI appears to be in a frozen state. This is because the methods used to solve a crossword are computationally expensive, time-consuming, and execute in the same thread as the GUI. The solution is therefore to run the solving process in a separate thread (see the sketch after this list).

• Currently, there is no way to remove a single answer from the grid, apart from manually entering another answer in its place, or clearing the grid's entire contents. An extra feature to clear an answer could be implemented within the solver's GUI class.
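A minimal sketch of the threading fix, in the style of 2005-era Java, is given below; here the solving loop and the GUI refresh are passed in as Runnables, an assumption standing in for the solver's actual method names.

import javax.swing.SwingUtilities;

// Run the solving loop in a worker thread so the event-dispatch thread
// stays free to repaint the GUI. Any updates to Swing components are
// pushed back onto the event-dispatch thread via invokeLater().
public class SolverRunner {
    public static void startSolving(final Runnable solvingProcess,
                                    final Runnable guiUpdate) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                solvingProcess.run();                  // long-running solving work
                SwingUtilities.invokeLater(guiUpdate); // safe GUI update afterwards
            }
        });
        worker.start();
    }
}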

8.3 Future Work

The crossword solver, as it stands, has huge potential for future work to be carried out on it. Small

projects may include developing a better Google search facility using elements of Question Answering,

developing a British English thesaurus to replace the current Moby thesaurus, or allowing for different

sizes of grid (perhaps even irregularly shaped grids). More adventurous projects could aim to expand

the capabilities of the current crossword solver, so that, for example, it could solve cryptic clues as

Crossword Maestro[11] does.


Bibliography

[1] E. Briscoe and J. Carroll. Robust Accurate Statistical Annotation of General Text. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC 2002), pages 1499–1504, May 2002. URL: http://www.informatics.susx.ac.uk/research/nlp/rasp/ [17th February 2005].

[2] Richard Browne, editor. The Times T2 Crossword Book 7. Harper Collins, first edition, 2004.

[3] G. Burnage. CELEX - A Guide For Users. Technical report, Max Planck Institute for Psycholinguistics, University of Nijmegen, 1990. URL: http://www.ru.nl/celex/subsecs/sectiondoc.html [26th April 2005].

[4] J. Crinnion. Evolutionary Systems Development. Pitman Publishing, first edition, 1992.

[5] T. Gilb and S. Finzi. Principles of Software Engineering. Addison-Wesley, first edition, 1988.

[6] J. K. Hardy. Jumble and Crossword Solver. URL: http://ull.chemistry.uakron.edu/jumble.html [2nd December 2004], January 2003.

[7] E. Hovy, L. Gerber, U. Hermjakob, M. Junk, and C.-Y. Lin. Question Answering in Webclopedia. In TREC-9 Conference, NIST, 2001.

[8] Peter H. Jesty. Software Project Management - Life Cycles, 2004. URL: http://www.comp.leeds.ac.uk/se22/lectures.html [17th April 2005].

[9] Ed Pegg Jr. The Mathematical Association of America, Math Games: Crossword Rules. URL: https://enterprise.maa.org/editorial/mathgames/mathgames05 10 04.html [24th November 2004], May 2004.


[10] M. L. Littman, G. A. Keim, and N. Shazeer. A probabilistic approach to solving crossword puzzles. Artificial Intelligence, 134:23–55, 2002.

[11] Genius 2000 Ltd. Crossword Maestro for Windows. URL: http://www.crosswordmaestro.com [24th November 2004], 2000.

[12] K. Markert. AI32 Lecture Notes, 2005. URL: http://www.comp.leeds.ac.uk/ai32/lectures/index.html [24th April 2005].

[13] Sun Microsystems. Javadoc Tool Home Page. URL: http://java.sun.com/j2se/javadoc/ [26th April 2005], 1995.

[14] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, 1990 (Revised August 1993).

[15] Editors of The American Heritage Dictionaries. The American Heritage Dictionary of the English Language. Houghton Mifflin, fourth edition, 2000. Online Search: http://www.bartleby.com/61/ [26th April 2005].

[16] Wim Peters. Lexical Resources - Comparison of resources using metadata. URL: http://phobos.cs.unibuc.ro/roric/lexcomparison.html [24th April 2005], 2003.

[17] L. Prechelt. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a search/string-processing program. IEEE Computer, March 2000.

[18] Oxford University Press, editor. The Oxford English Dictionary. Oxford University Press, second edition, 1989. URL: http://dictionary.oed.com/ [19th February 2005].

[19] D. Radev, W. Fan, H. Qi, H. Wu, and A. Grewal. Probabilistic Question Answering on the Web. Journal of the American Society for Information Science and Technology, April 2005.

[20] Various. One Across. URL: http://www.oneacross.com [24th April 2005], 2005.

[21] Grady Ward. Moby Project, 2000. URL: http://www.dcs.shef.ac.uk/research/ilash/Moby/.


Chapter 9

Appendix A

9.1 Project Reflection

Overall, I felt the project to be very successful. The aim, minimum requirements and objectives were all completed to what I felt to be a high standard, and on schedule. However, a few aspects of the project did not run as smoothly as intended. The reasons why they were felt to be problem areas, and how these problems might have been mitigated, are discussed below.

Natural Language Processing

As a student who had not chosen to study Natural Language Processing as part of my degree programme, it was not immediately obvious, if obvious at all, that this field of computing would be most useful in the development of the crossword solver; it was only upon learning that Dr. Markert was to be my assessor that I realised elements of NLP needed to be integrated into my solution. If my research into NLP had been undertaken much earlier in the lifetime of the project, I am very confident that the solver's success rate could have been much higher, as more time would have been given to implementing NLP techniques in the software. Nevertheless, I found the field of NLP to be a very interesting one. My

advice to future finalists is to look at all possible areas of research in the field of computing with regards

to understanding the problem of their intended project, even areas they have not studied or will not study

as part of their degree programme. As well as furthering your computing knowledge, you may find the


learning process an enjoyable one, as I did.

Software Development

Given the lower than expected results for the crossword solver, in hindsight I probably placed too much emphasis on getting the design of the GUI right. It has been said of me that at times I can be a perfectionist, unwilling to stop working at something until I feel it is truly finished. Given that I am a strong programmer, and that the software development phase of the project was delayed by two or three weeks while extra research was carried out, this sentiment was definitely true with regard to the creation of the crossword solver. Given more time, I felt (and still feel) that I could have improved the crossword solver.

However, I had to remind myself that it was the report, and not the software, that ultimately gets marked

and graded. The lessons learnt from this to be passed on to other students are:

a) to undertake early stages of a project as soon as possible to allow for changes in the direction of your

project, and

b) to allow plenty of time for developing software deliverables.

No matter how good a programmer you are in whatever language, programming errors will occur and problems will arise that take valuable time to rectify. This, combined with pressures from regular courseworks, means that the earlier you plan ahead and allow for such instances to occur, the better off you will be come deadline day.


Chapter 10

Appendix B

10.1 Results of Human Solvers

As an extra test to the main testing and evaluation criteria, a comparison of the crossword solver to unassisted human performance was considered, to see how the automated crossword solving system compared. In all, five participants attempted the same three crosswords tested throughout the project. The raw results are detailed overleaf. The “answer found” percentages for the individual answers are shown, along with any incorrect answers and their percentage scores.


Number Clue Answer % Found Other answers given

1A Baghdad its capital IRAQ 0.800 IRAC (0.200)

2D Full of water-plants; of oboe-like tone REEDY 0.400 -

3D Cervantes’s Don QUIXOTE 0.600 QUIOTAE (0.200)

4A Instrument plucked in mouth JEW’S HARP 0.600 -

4D Loose skin on neck JOWL 0.400 CHIN (0.200)

5D All one’s clothes WARDROBE 0.000 -

6D Nettle-rash HIVES 0.600 STING (0.200)

7D Pasta parcels RAVIOLI 1.000 -

8A One writing notice REVIEWER 0.000 NOTIFIER (0.200)

9A Long live!; oral test VIVA 0.800 -

10A George Gordon, poet BYRON 0.600 BROWN (0.200)

10D Public vehicle BUS 1.000 -

11A Shudder of emotion FRISSON 0.200 -

12D Not grown up IMMATURE 1.000 -

13A Steady flow STREAM 1.000 -

14D (Deliberately cause) anguish TORTURE 0.400 -

15A Of Welsh poets BARDIC 0.200 -

16D Small baking dish RAMEKIN 0.200 -

17D Reduce; one’s share CUT 0.800 BIT (0.200)

18A Done in small stages GRADUAL 0.400 -

19D A twelvesome DOZEN 1.000 -

20A Sting with pain; clever SMART 0.800 -

21D Straight-edge; king RULER 0.800 -

22D Out-of-focus sight BLUR 1.000 -

23A Fluffy hair; police (slang) FUZZ 1.000 -

24A Improbable UNLIKELY 1.000 -

25A Santa’s animal REINDEER 0.800 RAINDEER (0.200)

26A Annoy; police informer (slang) NARK 0.400 -

Figure 10.1: Results from Times 2 Crossword Book 7, Crossword #1


Number Clue Answer % Found Other answers given

1D Senior Service shade NAVY BLUE 0.400 -

2D Corrupt; decomposing PUTRID 0.400 -

3D Philatelist’s treasures STAMPS 1.000 -

4D Cask stopper BUNG 1.000 -

5D A fish; sounds like position PLAICE 0.600 -

6A Spontaneously; needlessly GRATUITOUSLY 0.400 -

6D Slight scrape; keep snacking GRAZE 0.800 -

7A Solicitor LAWYER 1.000 -

8A One for sorrow bird MAGPIE 0.800 -

9A Collapsed; moor FELL 0.600 -

10A Catastrophe DISASTER 0.800 -

11D With sawlike edge SERRATED 0.800 -

12A As a company, in conjunction TOGETHER 0.400 -

13D Fugitive OUTLAW 0.600 -

14D Dignity; be faithful to HONOUR 0.400 HONEST (0.200)

15D Place of safety REFUGE 0.600 -

16A Slope (joining two levels) RAMP 0.800 -

17D Sacred choral piece MOTET 0.000 PSALM (0.200)

18A Nicked STOLEN 1.000 -

19D Mediaeval plucked instrument LUTE 0.400 LYRE (0.400)

20A Feudal allegiance FEALTY 0.000 -

21A Sistine wall subject LAST JUDGMENT 0.000 -

Figure 10.2: Results from Times 2 Crossword Book 7, Crossword #2


Number Clue Answer % Found Other answers given

1A Practical application of science TECHNOLOGY 1.000 -

1D Lacking courage TIMID 1.000 -

2D (Of dress) conventional and sober CONSERVATIVE 1.000 -

3D Fine, subtle; pleasant NICE 1.000 -

4D Work hard LABOUR 1.000 -

5D Enterprising person GO-GETTER 1.000 -

6D Cromwell’s fighters NEW MODEL ARMY 0.600 -

7D Nov 30th this saint’s day ANDREW 0.800 -

8A Taxi MINICAB 1.000 -

9A Common; inexperienced GREEN 0.800 -

10A Dragged; finished level DREW 0.400 -

11A Shop user CUSTOMER 1.000 -

12D In the open air ALFRESCO 1.000 -

13A Auctioneer’s hammer GAVEL 1.000 -

13D A man (colloq.) GEEZER 0.800 -

14A Establish (university post) by giving funds ENDOW 1.000 -

15D Tuberous garden plant DAHLIA 0.200 BAMBOO (0.200)

16A Outside EXTERNAL 0.600 EXTERIOR (0.400)

17A Ran away FLED 0.600 -

18D Research intensively (into something) DELVE 0.600 -

19D Sleeping (archaic) ABED 0.200 -

20A Overhanging roof part EAVES 0.800 -

21A Respectful of opinions different from one's own LIBERAL 0.200 -

22A A Study in Scarlet author CONAN DOYLE 0.400 -

Figure 10.3: Results from Times 2 Crossword #3415


Chapter 11

Appendix C - Gantt Chart Schedule
