Solving the Times T2 Crossword
Robert Yaw
BSc Computer Science
2004/2005
The candidate confirms that the work submitted is their own and the appropriate credit has been given
where reference has been made to the work of others.
I understand that failure to attribute material which is obtained from another source may be considered
as plagiarism.
(Signature of student)
Summary
The aim of this project was to create a piece of software that attempted to automatically solve
the Times T2 crossword. The software incorporated web search technologies and elements of Natural
Language Processing in order to obtain answers to clues.
On average, the solver managed to correctly find answers to 33% of clues given to it. Whilst this
compares poorly to existing systems (and especially to unassisted human performance), this report high-
lights in its conclusion some of the failings of the current system, and how these can be overcome in
order to improve the crossword solver’s performance.
Acknowledgements
Thanks go to the following people:
• Tony Jenkins, for his support and guidance throughout the project,
• Katja Markert, for her invaluable guidance given in the Mid-Project Report,
• Nobuo Tamemasa, whose original MThumbSlider class¹ was taken and customised for use in
the crossword solver,
• Everyone who helped in the crossword survey,
• My parents Lynne and Peter, who have supported me throughout the last three years!
¹Downloaded from http://physci.org/codes/tame/index.jsp/
Contents
1 Introduction 1
1.1 The Problem Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Project Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Minimum Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Deliverables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.6 Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6.1 Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6.2 Revised Schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6.3 Revised Milestones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.7 Evaluation Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background Reading 7
2.1 A History of the Crossword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Crossword Grid Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Other Crossword Styles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 The Times T2 Crossword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Clue Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.1 Definition Clues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5.2 Knowledge Clues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5.3 Other Clue Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.4 Clue Type Frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Existing Crossword Solvers 14
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 PROVERB and One Across . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Jumble and Crossword Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Crossword Maestro for Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 Testing of Existing Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4 Answer Acquisition Technologies 20
4.1 Google Web APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Question Answering and RASP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3 Word Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.1 WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.2 Moby Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3.3 CELEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.4 Word List Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Brute Forcing and Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5 Methodology and Design 27
5.1 Development Methodologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.2 Requirements Gathering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.1 Crossword Puzzle Representation . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2.2 GUI Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3 Programming Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.3.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.3.2 Perl and Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.4 UML and Class Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.4.1 CrosswordGrid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.4.2 Clue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.4.3 Answer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.4.4 CrosswordGridGUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.4.5 RASPService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.4.6 Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.4.6.1 MobyListSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.4.6.2 WordNetSearch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.4.6.3 GoogleService . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.4.7 Documentation of Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.5 Development Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.6 The Solving Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6 Development and Testing 37
6.1 Test Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2 Test Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2.1 Solving Process Refinements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.2.2 Entry of Crossword Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7 Evaluation 43
7.1 Project Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.3 Minimum Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.4 Comparison with Existing Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.5 Comparison with Human Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
8 Conclusion 48
8.1 Failings of the Solving Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
8.2 Suggested Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Bibliography 51
9 Appendix A 53
9.1 Project Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
10 Appendix B 55
10.1 Results of Human Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
11 Appendix C - Gantt Chart Schedule 59
Chapter 1
Introduction
1.1 The Problem Domain
The crossword is one of the most popular word games in the world. The aim of the game is simple: to
answer as many of the crossword’s clues as possible, and consequently fill in the crossword grid. The
Times T2 Crossword was first published in 1993 and, as of the start of 2005, has had over 3400 editions
published, appearing six times a week for the last eleven years.
The majority of clues in the T2 crossword are definition clues. These mostly comprise no
more than four words or, in some instances, several single words where the answer fits each word, usually
in a different context. The remainder of the clues are general-knowledge based, and therefore cannot be
solved simply by looking up possible answers in a dictionary. The difficulty of the T2 crossword lies in
the ambiguous nature of these clues. Some clues have more than one possible answer,
given only the answer length. For example, “River of NE England”(4) (3427: 23A) could have been
either TYNE, TEES or WEAR, to name three perfectly acceptable alternatives; until other answers
have been found, the human solver has no indication as to which alternative is the correct answer (the
answer, it turns out, was TYNE). Other answers to clues may not be as immediately obvious, due to
the clue’s minimal context. An example of this is the clue “Common; inexperienced”(5) (3415:
9A). Taking the first part of the clue, the word “common”, in modern English, is used primarily in
everyday language as an adjective; example phrases such as “he is common”, or “that is a common
occurrence these days”, illustrate this point. However, in this particular instance, the word “common” is
being used as a noun, to describe an area of parkland or greenery. Indeed, it is the word GREEN that is
the correct answer. The second part of the clue confirms this, as GREEN is a fitting, yet again rarely
used, synonym for the word “inexperienced”.
Traditionally, human solvers have had to rely on printed dictionaries and thesauruses to find answers
to those clues they could not solve by themselves. In more recent times, Internet search engines such
as Google and Ask have provided additional means of acquiring answers. However, there are still few
freely-available computerised systems that allow users to retrieve answers to specific crossword clues.
Those that do exist are tailored to solve American crosswords or do not allow users to enter clues into
their system.
By utilising developments in computing power and Internet connection speeds, along with research
done in the field of Natural Language Processing, this project aims to develop a system for solving the
Times T2 crossword.
1.2 Project Aim
To develop a piece of software which, given any edition of the Times T2 crossword (comprising its
grid and corresponding clues), can automatically give correct answers to the clues, and thus solve the
crossword.
1.3 Objectives
The objectives of the project are as follows:
• Research the history of crosswords, and understand the difficulties faced by human solvers in completing crosswords,
• Evaluate existing crossword solvers, noting any features used by such solvers that could be incor-
porated into a final solution,
• Research possible technologies used to acquire answers to clues, which could be incorporated into
a final solution,
• Evaluate software development methodologies and programming languages which could be used
to develop the system, giving reasoning for final choices,
• Develop a software solution which incorporates at least all the minimum requirements stated in
this report, and
• Test the developed system using various editions of the Times T2 crossword, against a number of
evaluation criteria.
1.4 Minimum Requirements
The minimum requirements of the project are:
• To develop a system which can return a list of possible answers given a clue and the length of its
answer.
• To develop a method of inputting a crossword grid into an application.
• To develop a method of inputting a crossword clue into an application, such that it relates to a
specific location on a crossword grid.
Possible extensions to the project include:
• To allow for the loading and saving of a crossword grid, with an optional clue set.
• To develop a GUI for the software.
• To provide a means of informing the end user of the solving process for both individual clues and
the grid as a whole.
• To allow users to manually enter answers to clues they have already solved.
1.5 Deliverables
The project deliverables are:
• A piece of software which can attempt to solve the Times T2 crossword.
• A final report.
1.6 Schedule
The original project schedule was drawn up as a Gantt chart, and has been included in Appendix C. The
overview of the original schedule was as follows:
01/11/2004 - 12/11/2004 Research The Times T2 crossword.
01/11/2004 - 19/11/2004 Research existing crossword solvers, and if possible, test them
against the Times T2 crossword.
08/11/2004 - 03/12/2004 Research available Internet resources and available word lists that
could be incorporated into a final solution.
15/11/2004 - 03/12/2004 Look into possible programming languages to be used to code a final
solution.
22/10/2004 - 03/12/2004 Make final choices regarding programming languages and technologies to use.
29/11/2004 - 10/12/2004 Mid Project Report
13/12/2004 - 14/01/2005 HOLIDAY AND EXAM PERIODS
06/12/2004 - 21/01/2005 Complete evaluation criteria
15/12/2004 - 18/03/2005 Software development and testing
21/02/2005 - 11/03/2005 Submission of draft chapter and table of contents
07/03/2005 - 08/04/2005 Evaluation of software product
21/03/2005 - 08/04/2005 Project reflection
1.6.1 Milestones
10/12/04 Mid-Project Report
28/01/05 Research complete
11/03/05 Draft Chapter and Table of Contents
25/03/05 Software development and testing complete
08/04/05 Evaluation complete
27/04/05 Final Report
1.6.2 Revised Schedule
Following feedback from the mid-project report, the schedule was heavily revised due to the need to in-
corporate research from the field of natural language processing into the project. Other critical stages of
the project were also missing from the original schedule and have been included in the revised schedule,
which is detailed below, and commenced from the 24th January 2005:
06/12/2004 - 28/01/2005 Compile evaluation criteria
24/01/2005 - 25/02/2005 Research NLP technologies to be incorporated into the final solution
14/02/2005 - 25/02/2005 Research software development methodologies
31/01/2005 - 01/04/2005 Development of software components
21/02/2005 - 11/03/2005 Submit table of contents and draft chapter
28/02/2005 - 08/04/2005 Software testing and additional improvements
28/03/2005 - 15/04/2005 Software evaluation
04/04/2005 - 22/04/2005 Project evaluation
04/04/2005 - 25/04/2005 Project write-up
1.6.3 Revised Milestones
The milestones of the project were revised to reflect the changes to the schedule, and are listed below.
10/12/04 Mid-Project Report
25/02/05 Research into NLP and methodologies complete
11/03/05 Draft Chapter and Table of Contents
15/04/05 Software development and testing complete
22/04/05 Evaluation complete
27/04/05 Final Report
1.7 Evaluation Criteria
The success of the project will be judged at the end of this report in the evaluation. To ensure this could
be done successfully, a set of evaluation criteria were drawn up. These criteria are as follows:
1. How successfully does the software solve the problem, and thus achieve the project’s aim?
2. Have the objectives of the project been fully realised?
3. Have the minimum requirements been met and exceeded?
4. How does the system compare to existing solutions?
5. How does the system compare to unassisted human performance?
The first criterion was based on results gained in testing the software developed as part of this project.
The second compared the objectives stated in this chapter with the findings in the report. The third
judged how well the project completed its stated minimum requirements, and how these were expanded
upon. The last two criteria were designed to judge the success of the system, relative to other existing
crossword solvers and human performance respectively.
Chapter 2
Background Reading
This chapter will detail how the modern crossword came into being, and the conventions that modern
crosswords follow. The Times T2 crossword will then be defined in terms of these conventions. The remainder
of this chapter will define the clue types found within the T2 crossword.
2.1 A History of the Crossword
The origins of the modern crossword can be traced back to 19th century Britain, where they were found
in children’s books and periodicals. Simple grids, like the one shown in Figure 2.1, were used, with the
answers reading the same horizontally as they did vertically.
C I R C L E
I C A R U S
R A R E S T
C R E A T E
L U S T R E
E S T E E M
Figure 2.1: An example of an early form of crossword, taken from Godey’s Lady’s Book and Magazine
(1862): Note how the answers are identical both across and down.
The first modern crossword was devised by British-born journalist Arthur Wynne, and was published
in the New York World on the 21st December 1913, the grid for which can be seen in Figure 2.2.
Figure 2.2: The grid from Arthur Wynne’s first modern-day crossword.
The popularity of the crossword increased from this point onwards, eventually catching on in Britain
when Pearson’s Magazine published a crossword in February 1922. The first Times cryptic crossword
followed in 1930. Today, crosswords in Britain can be found in all national daily newspapers, as well as
in many magazines and local newspapers.
Typical modern crosswords consist of two parts - the grid, which comprises white answer squares
and (usually) black ‘blank’ squares, and the crossword clues themselves, where each clue’s answer is a
word or number of words that fit into a specific position on the grid, either ‘across’ (going along a row
of the grid) or ‘down’ (going down a column of the grid). The answer squares themselves can be either
‘checked’ or ‘unchecked’ - checked squares are those whose letter intersects with two answers (that is,
an across clue and a down clue), and unchecked squares are those whose letter intersects with only one. As answers
are placed into the grid, some letters of other answers are revealed via the checked squares, helping the
solver to determine the answer to these other clues.
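The checked/unchecked distinction lends itself to a direct implementation. The sketch below assumes a hypothetical grid representation (a plain boolean array of white squares, not the representation used later in this project): a white square is checked when it lies in both a horizontal and a vertical run of white squares.

```java
// Sketch: deciding whether a white square is 'checked' (shared by an
// across and a down answer) or 'unchecked'. The boolean-grid
// representation is hypothetical, for illustration only.
class GridSquares {
    // white[r][c] == true means square (r, c) is a white answer square

    static boolean inAcrossRun(boolean[][] white, int r, int c) {
        // part of an across answer if a horizontal neighbour is white
        boolean left = c > 0 && white[r][c - 1];
        boolean right = c < white[r].length - 1 && white[r][c + 1];
        return left || right;
    }

    static boolean inDownRun(boolean[][] white, int r, int c) {
        // part of a down answer if a vertical neighbour is white
        boolean up = r > 0 && white[r - 1][c];
        boolean down = r < white.length - 1 && white[r + 1][c];
        return up || down;
    }

    static boolean isChecked(boolean[][] white, int r, int c) {
        return white[r][c] && inAcrossRun(white, r, c) && inDownRun(white, r, c);
    }
}
```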
2.2 Crossword Grid Types
In the English-speaking world, there are two main types of crossword grid. North American style grids
are tightly packed, where many, if not all, of the letter squares are checked. Blank squares are used as
infrequently as possible. In these grids, 180° (sometimes even 90°) rotational symmetry is commonplace;
for this, grid sizes are usually an odd number of squares [9]. British (or cryptic) style grids, on the other
hand, look quite different, with more blank squares and fewer checked letters, resulting in relatively fewer
clues. The grid may have rotational symmetry; however, this may depend on what will be referred to
in this report as the “house rules” of a particular crossword. For example, the two “house rules” of the
British-style T2 crossword grid are:
• That the grid is always thirteen squares by thirteen squares in size, and
• That the grid has 180° rotational symmetry.
2.3 Other Crossword Styles
It should be noted that, in the UK at least, there are other word puzzles that are thought of as derivatives of
the standard crossword; a few examples include:
• “barred crosswords” - thick black lines, as opposed to squares, separate the clues, the result being
that every square on the grid is white.
• “arrow words” - the clues are contained within squares on the grid; this crossword variant is
particularly popular in puzzle magazines in the UK.
• “codebreakers” - each white square is given a number between 1 and 26; the number represents
a unique letter of the alphabet; these letters are discovered using knowledge of the frequency of
letters in the English language, and through a process of elimination.
2.4 The Times T2 Crossword
As the focus of this project was on solving the Times T2 crossword, it was important to gain an under-
standing of the clue types which appear within it. The Times T2 grid, like most published in the United
Kingdom, is in the British style and comprises thirteen rows by thirteen columns, with 180° rotational
symmetry. An example of a completed Times T2 crossword grid is shown in Figure 2.3.
Figure 2.3: The grid of Times T2 crossword #3415
The grid in Figure 2.3 will be one of three examples of the Times T2 crossword used throughout
this project to test existing crossword solvers, to judge the performance of human solvers in attempting
the Times T2 crossword, and to test the software developed through the course of this project during
its development. Throughout this report, the clues of the Times T2 crossword will be referred to in the
following format:
“Clue text” (answer-length) - ANSWER (Crossword edition: clue number)
The format above gives the clue text and the number of letters in the clue’s answer, followed by the
actual answer, together with the crossword edition and clue number from which it was taken.
2.5 Clue Types
The Times T2 crossword has two main clue types: definition clues and knowledge clues. It was not
unrealistic to assume that the two clue types would require different forms of processing in order to obtain
possible answers. A human solver trying to obtain the answer to a definition clue
may resort to using a thesaurus to look up synonyms of the clue. However, for a knowledge clue, a
web search may prove to be more useful in finding the correct answer. Therefore, an automatic
crossword solver, given any clue from a T2 crossword, needs to have the ability to determine which type
that clue belongs to.
2.5.1 Definition Clues
The most basic form of definition clues can be categorised as “synonym” clues, as answers to these are
a synonym of the clue word. Two examples of this clue type are “Select”(6) - CHOOSE (3414: 1D), and
“Blissful”(7) - IDYLLIC (3419: 3D). As synonym clues are always one word in length, no consideration
of the word’s type or (to some extent) context would be required by an automated solver, and these clues
could be instantly labelled as such.
Definition clues which could prove harder for a computer system to identify as such are those that
are made up of multiple words. These were identified as “multiple-word definition” clues. Taking as an
example, “Ran away”(4) -FLED (3415: 17A), although a human can easily determine that the clue is
clearly a definition of the answer, a computer system would have to perform some kind of word overlap
technique to check the answer against the clue.
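Such a word-overlap technique can be sketched briefly. Counting the words a clue shares with a candidate answer's dictionary definition is illustrative only; here the definition text is passed in directly, standing in for a lookup in a real lexical resource.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of a word-overlap check for multiple-word definition clues:
// a candidate answer looks plausible when its dictionary definition
// shares words with the clue. The definition string is a stand-in for
// a real dictionary lookup.
class OverlapCheck {
    static int overlap(String clue, String definition) {
        Set<String> clueWords = new HashSet<>(
            Arrays.asList(clue.toLowerCase().split("\\W+")));
        int count = 0;
        for (String w : definition.toLowerCase().split("\\W+")) {
            if (clueWords.contains(w)) count++;   // shared word found
        }
        return count;
    }
}
```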
In some instances, synonym and multiple-word definition clues were made more complex by a context
specified in the clue. Fortunately, context in the Times T2 can be easily identified, as it appears as
italicised text that is parenthesised and placed at the end of the clue (or section of clue). For example,
“Sleeping (archaic)”(4) - ABED (3415: 19D). The number of different contexts used in the T2 crossword
is limited to word classes, such as acronyms and colloquialisms, and a few European languages.
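Because the context marker is parenthesised and sits at the end of the clue (or clue part), it can be separated out with a simple regular expression. The helper below is illustrative only, and plain text of course cannot carry the italicisation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extracting a trailing parenthesised context marker, such as
// "(archaic)" in "Sleeping (archaic)". Returns null when no context
// is present. Illustrative helper, not project code.
class ClueContext {
    private static final Pattern CONTEXT =
        Pattern.compile("\\(([^)]+)\\)\\s*$");

    static String extractContext(String clue) {
        Matcher m = CONTEXT.matcher(clue);
        return m.find() ? m.group(1) : null;
    }

    static String stripContext(String clue) {
        // remove the trailing "(...)" so the clue proper can be searched
        return CONTEXT.matcher(clue).replaceAll("").trim();
    }
}
```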
2.5.2 Knowledge Clues
Although knowledge clues usually have more words in the actual text of the clue than definition-type
clues, they can be much harder to identify as such. For example, “Respectful of opinions different
from one’s own”(7) - LIBERAL (3415: 21A) contains seven words, but the clue itself is essentially a
definition of the answer. Despite this, there are some give-away signs of a knowledge clue:
• Missing words in the clue: for example, “Yehudi -, violinist”(7) - MENUHIN (3410: 14A).
The answer will always be the missing word, and will come immediately after a “target word” in
the clue, that is, the word in the clue immediately preceding the blank, or the word immediately
succeeding the blank, depending on the clue’s structure. By using this “target word”, and ensuring
the potential answer fits the answer pattern, it should be possible to return a highly confident result
for these clue types.
• A proper noun within the clue’s text is usually a strong indicator of a knowledge clue, as
the answer more often than not will relate to that noun. For example, “Mountain where Delphi
is located”(9) - PARNASSUS (3427: 5D). The difficulty in using this approach in an automated
system would be determining which words, if any, are proper nouns. The rules of the English
language state that a proper noun is always capitalised; however, it is not enough simply to look
for a capitalised word in a given T2 clue, as the first word of every Times T2 clue is capitalised.
Therefore, one of the checks a working solution may need to perform is to determine the word
class of the first word of a clue (if necessary).
• Italicised phrases or words in a clue are once again used, though this time to indicate the title of
a work, such as a book or a television programme. The answer will relate to the work mentioned.
For example, “A Study in Scarlet author”(5,5) - CONAN DOYLE (3415: 22A); Arthur Conan
Doyle being the author of the book “A Study in Scarlet”.
In all cases, having knowledge of some of the letters of an answer limits the number of potential
answers, giving higher confidence for any words that still match the answer pattern.
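The narrowing effect of known letters can be illustrated with a pattern filter, where '?' marks an unknown square. The pattern notation and the tiny word list are assumptions made for the example, standing in for a full crossword dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: filtering candidate answers against a pattern of known
// letters, '?' marking an unknown square. Each '?' is translated to
// the regex character class [A-Z] before matching.
class PatternFilter {
    static List<String> matching(List<String> words, String pattern) {
        Pattern p = Pattern.compile(
            pattern.toUpperCase().replace("?", "[A-Z]"));
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w.toUpperCase()).matches()) result.add(w);
        }
        return result;
    }
}
```

With the "River of NE England"(4) example above, an empty pattern admits all of TYNE, TEES and WEAR, while a single checked letter from a crossing answer can already decide between them.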
2.5.3 Other Clue Types
One extension of these basic categories is clues which consist of multiple parts, delimited by semi-colons
(or in some cases, commas). In one sense, an automated solver could have greater confidence in
finding the correct answer here, as potential answers found using one part of the clue can be
cross-checked with those of another. An example of this clue type is “Rod; shaft; European”(4) - POLE
(3414: 8A).
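Cross-checking the parts of such a clue amounts to intersecting the candidate answer sets produced for each part; an answer that survives every intersection gains confidence. A minimal sketch, with hand-supplied candidate sets standing in for real search results:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: cross-checking a multi-part clue ("Rod; shaft; European")
// by intersecting the candidate answer sets found for each part.
class CrossCheck {
    static Set<String> intersect(List<Set<String>> candidateSets) {
        // start from the first part's candidates...
        Set<String> result = new HashSet<>(candidateSets.get(0));
        // ...and keep only answers present in every other part's set
        for (Set<String> s : candidateSets.subList(1, candidateSets.size())) {
            result.retainAll(s);
        }
        return result;
    }
}
```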
A further complication in clue categorisation, though one that occurs very rarely (as shown later in
Figure 2.4), is that the text of a clue may be dependent on another clue’s answer. For example, “East
14 Dn town”(5) (3413: 17A) cannot be solved without first knowing the answer to 14 Down, “English
breed of chicken”(6) - SUSSEX (3413: 14D). Substituting this answer into 17A gives “East SUSSEX
town”(5), and this in turn gives the correct answer of LEWES.
2.5.4 Clue Type Frequencies
Different clue types will undoubtedly require different processes in order to retrieve an answer or set
of potential answers. To help establish which clue types form the majority of clues in the Times T2
crossword, a frequency count of clue types was acquired using Times T2 Crosswords #1 - #10 found in
[2]. For multi-part clues, each part of the clue was categorised independently.
Clue Type                    Times Found   % of Clues
Synonym                               89        30.1%
Multiple-Word Definition             116        39.2%
Context-Dependent                     19         6.4%
ALL DEFINITION CLUES                 224        75.7%
Missing Word                           5         1.7%
Proper Noun answers                   21         7.1%
Other noun answers                    45        15.2%
Answer-dependent clues                 1         0.3%
ALL KNOWLEDGE CLUES                   72        24.3%
Figure 2.4: The frequencies for the clue categories of 296 clues or part-clues, taken from Crosswords
#1-#10 of The Times T2 Crossword Book 7 (2004)
As can be seen from Figure 2.4, the overwhelming majority of clues fit into one of the two main
definition clue categories. These, together with the noun answer clues, make up over 90% of the clues
studied, and are therefore the two clue types for which the final solution needs the most developed
and accurate search methods. The category of “Missing Word”, although making up less than 2% of the
clues studied, is arguably the one category for which the most accurate answers can be retrieved, and
therefore, if such clues are present within a given crossword, they make for the most logical place to
begin solving that crossword; consequently, it is logical also to develop a robust method for accurately
solving clues of this type.
Chapter 3
Existing Crossword Solvers
3.1 Overview
For such a popular pastime, it may come as a surprise that there are few existing applications that can
give out answers to entered crossword clues. The more popular ones that were available are detailed
below. Each was tested using three editions of the Times T2 crossword, with the results compared to a
single human solver to get an estimate of how well these systems performed.
3.2 PROVERB and One Across
The PROVERB system was developed as part of research carried out in 2002 by Littman et al. [10].
The system was designed to automatically solve editions of the New York Times crossword. It uses
a probabilistic model that generates for each clue a “weighted candidate [answer] word list”, generated
from one or more “expert modules”. Over a number of iterations, answers are inserted into the grid, as
other candidate answers are ruled out. The system achieved a success rate of 95.3%.
One Across [20] is an on-line crossword clue solver system that stemmed from the research of
Littman et al. [10]. The site allows a user to input a clue, together with the number of letters in the clue’s
answer. An unknown letter is marked using the ‘?’ character. Known letters can be given one of two
confidence ratings indicated by the letter’s case - lower-case letters for a 60% confidence, and upper-case
letters for the higher 95% confidence. Once submitted, the site returns a list of possible answers; these
are sorted by the system’s own confidence rating, which ranges from **** (very confident) down to *
(not so confident). An example of these results is shown in Figure 3.1.
The system works in a straightforward manner. All the user has to do is enter their clue and answer
pattern, and press “Go!” to retrieve the set of answers. However, in practice, these answers are not
always accurate; sometimes the system returns words which clearly do not fit the user’s input. As the
PROVERB system was designed to solve American-style crossword grids, the implication is that it
has a greater dependency on checked letters being present in answers (indeed, the example grid cited in
[10] has every white square checked). To the system’s credit, although it was developed with American
crosswords in mind, it does return both American and Anglicised spellings when such a difference occurs.
Figure 3.1: The One Across system
3.3 Jumble and Crossword Solver
Similar to One Across in concept, the Jumble and Crossword Solver (JCS) is an on-line answer retrieval
system [6]. Users enter a pattern (‘?’ for blanks, letters where known), select whether the pattern is
an anagram or a standard crossword answer pattern, then submit this information to obtain results. The
system’s main strength is the depth of its word list; it claims to have “over 300,000 words” in its database;
indeed, based on some of the patterns entered, its word list does appear to be more comprehensive than
that of One Across. Unfortunately, there is no way to enter a clue into the system to help narrow the
search further.
3.4 Crossword Maestro for Windows
This is a commercial package that claims to be the “world’s first expert system for solving cryptic and
non-cryptic crosswords”. From the descriptions on the website [11], the system’s most prominent
feature is the way it tells the user, in natural language, the processes it goes through to obtain its answer,
as can be seen in Figure 3.2. Unfortunately, due to its high retail price ($79.99) and the lack of a
freely-available demonstration version, the system could not be tested.
Figure 3.2: Screenshot of the Crossword Maestro system.
3.5 Testing of Existing Solutions
To establish the success rate of the available existing computer-based solvers (CBSs), each was tested
using three editions of the Times T2 crossword. The intention of these tests was to find out how
these automated solvers compare to a human solver. As One Across was the only readily-available
system that could take both a given crossword clue and its answer pattern, it was used as the foundation
for the tests. The basic algorithm for the test was as follows:
1. Set the number of newly inserted answers N = 0.
2. For each unsolved across clue, input the clue and pattern into the One Across system.
3. If one answer A appears with a confidence rating of *** or ****, and all other alternative answers
have a rating of ** or less, insert A into the grid; increment N by 1.
4. If all answers are given scores of ** or less, do not insert anything into the grid.
5. Repeat steps 2 to 4 for all unsolved down clues.
6. If N > 0, go to step 1; else we have either solved the crossword, or the system cannot find any
more answers. (Other solvers only) If N = 0, go to step 7.
7. (Other solvers only) If there exists an incomplete answer Ai = (l1, ..., ln) (where n is the number
of letters in the answer) within the crossword grid G such that at least one li in Ai is a letter, input
Ai into the solver. Let S be the set of possible answers returned by the solver. If |S| = 1 then let
Ai = S1, else leave Ai unchanged.
8. (Other solvers only) Repeat step 7 until no incomplete answers Ai remain, or until, for every
remaining Ai, |S| > 1.
Steps 1-6 were used for the One Across system only. Steps 3 and 4 ensured that only the most
confident, most probably correct answers were inserted into the grid. Across clues were solved first,
then down clues, in each iteration, to maximise the use of checked letters.
For all other solving systems used in conjunction with One Across, steps 7 and 8 were additional.
Step 7 ensures that completely empty answers are not searched, as this would return all words of that
answer’s length; as there would undoubtedly be thousands of alternative matches returned for words
between three and thirteen letters in length, the task of determining the correct answer would be nigh on
impossible. Step 8 ensures that once the crossword has been completed, or the solver cannot find any
more confident solutions, the algorithm terminates.
As well as testing the CBSs, the author attempted to solve each of the three crosswords, to get a
rough estimate of how the CBSs compared to a human solver. To ensure a fair comparison could be
made with the CBSs, the author’s attempts to solve the crossword were done before testing the CBSs to
eliminate the possibility of answers being recalled from memory. No dictionaries, thesauruses or other
sources of help were used during the author’s tests.
Solver Used        T2 Crossword   Total            Correct          % Correct
                                  Clues   Letters  Clues   Letters  Clues   Letters
One Across         3415           24      118      8       40       0.333   0.339
                   T2 Book 7 #1   28      118      12      59       0.428   0.500
                   T2 Book 7 #2   22      108      10      56       0.455   0.519
                   AVERAGE                                          0.405   0.453
One Across & JCS   3415           24      118      8       40       0.333   0.339
                   T2 Book 7 #1   28      118      12      59       0.428   0.500
                   T2 Book 7 #2   22      108      10      56       0.455   0.519
                   AVERAGE                                          0.405   0.453
Human Solver       3415           24      118      16      91       0.667   0.771
                   T2 Book 7 #1   28      118      17      78       0.607   0.661
                   T2 Book 7 #2   22      108      13      68       0.591   0.630
                   AVERAGE                                          0.622   0.687
Figure 3.3: CBS and human solver testing results.
As can be seen from Figure 3.3, the performance of the CBSs was reasonably consistent, though
comparatively poor. One Across, on average, managed to find answers to only 41% of the clues fed into
it, with a range of success rates spanning just over 12 percentage points. Combining this system with the
Jumble and Crossword Solver did nothing to improve these results. The tested systems compared
unfavourably with the human solver, who solved an average of 62.2% of the clues.
One clue/answer pair from the crosswords above and two clue/answer pairs from two other editions
of the T2 crossword have been selected to highlight some of the downfalls of the One Across system:
“Baghdad its capital” (4) - IRAQ (T2 Book 7, #1: 1A)
Searching with any number of checked letters did not bring up the answer IRAQ at all. However,
if the word “Baghdad” was altered to the incorrect spelling “Bagdad”, the answer was returned with
maximum confidence. This was a typographical error within the One Across system.
“Buffet with roasts” (7) - CARVERY (3416: 17D)
Searching for CARVERY in the Oxford English Dictionary (OED) [18] returned the following
definition, which closely fits the clue: “c. A buffet or restaurant where meat is carved from a joint
as required”. However, searching within an American English dictionary such as the on-line implementation
of the American Heritage Dictionary of the English Language [15] gave no result for the word.
A check on One Across reveals that “carvery” is not in its repertoire of words either, rendering this
system alone useless for solving the Times T2 crossword.
“Greek dish of yoghurt with cucumber and garlic” (8) - TZATZIKI (3414: 5D)
Searching for TZATZIKI using the OED did return a result: “Greek dish consisting of yoghurt with
chopped cucumber, garlic, and (sometimes) mint, esp. as an hors d’oeuvre or dip”. Although not a word
of English etymology, the word has been used within the English-speaking world, and therefore finds
itself included in the OED. The AHD did not return any results for TZATZIKI. One Across, however,
did return the word, which may be an indication that the word has appeared in other crosswords before.
These case studies illustrated the fact that a final solution to the problem needed to use highly comprehensive
word lists containing British English spellings from which to look up answers.
3.6 Conclusions
The existing crossword solvers contain a number of features that were considered desirable to have
in this project’s own crossword solver. One Across uses a confidence system to rate its answers to
clues. A similar feature could be used to judge answers in the crossword solver. Crossword Maestro’s
GUI allowed users to see both the grid and the solving process simultaneously. These features were
considered as extensions to the minimum requirements of the project.
Chapter 4
Answer Acquisition Technologies
The clues of the Times T2 crossword were categorised in Chapter 2, where it was also established that
for each type of clue, different processes were required to come up with candidate answers. This chapter
will assess some of the possible answer acquisition technologies that were considered for use in
the crossword solver. The chapter concludes with the final choices made regarding which technologies
will be used in the final solution.
4.1 Google Web APIs
Google, one of the world’s most widely known and widely used Web search engines, developed a set of APIs
whereby software engineers can incorporate the search facilities of the Google website within their own
programs. In the context of knowledge clues, using a search engine was considered to be one of the
more successful ways to find an answer to a clue. Although search engines such as Google cannot be
fed any information regarding answer length, a human solver could quickly detect which word or words
are likely to be the answer. Using the example of “Capital of Queensland” (8) - BRISBANE (3418: 1D),
the following snippets are from the first three results given by Google, with the search term “Capital of
Queensland”:
• “... cootha’s travel reports. Brisbane - Capital of Queensland, Australia. 8 votes. Brisbane is a
beautiful river city, and is the capital city of Queensland. ...”
• “Brisbane Australia. South East Queensland. Brisbane - Capital of Queensland. Brisbane, the
sub-tropical capital of Queensland, is approximately ...”
• “... technology industries. My Government has embarked on three key strategies to attract more
venture capital to Queensland: ? Identifying ...”
From these three snippets alone, if only the eight-letter words were taken, the word “Brisbane”
appears 5 times, and “embarked” only once. Furthermore, the word “Brisbane” is easy to determine
as a proper noun (as its first letter was capitalised on all appearances within the snippets), which gave
greater confidence that this was indeed the correct answer to the clue.
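This word-frequency idea can be sketched in Java. The fragment below is a hypothetical illustration, with plain strings standing in for real Google API result snippets:

```java
import java.util.HashMap;
import java.util.Map;

public class SnippetCounter {
    /**
     * Count how often each word of the required answer length appears
     * across a set of search-result snippets. The most frequent word
     * of the right length is a reasonable candidate answer.
     */
    public static Map<String, Integer> countCandidates(String[] snippets, int length) {
        Map<String, Integer> counts = new HashMap<>();
        for (String snippet : snippets) {
            // Split on anything that is not a letter, then keep only
            // words of exactly the required answer length.
            for (String word : snippet.split("[^A-Za-z]+")) {
                if (word.length() == length) {
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For the Brisbane example, counting eight-letter words across the three snippets would surface “brisbane” as the dominant candidate.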
The Google Web API also allows developers to use all the search modifiers found on the Google
website. One of these modifiers is the ability to pass whole phrases to the search engine, done by literally
quoting sections of the search term using the quotation (“ and ”) characters. By applying this to a clue, it
was possible to narrow the web search with the intention of finding results relevant to the clue. Using as
an example the clue “Russian composer (Prince Igor)” (7) (3432: 6D), Figure 4.1 shows that by passing
the phrase “Prince Igor” to the search query, the correct answer of BORODIN appears with a higher
frequency and proportion.
Russian composer (Prince Igor) (7)          Russian composer (“Prince Igor”) (7)
Word      Times Found   % of Matches        Word      Times Found   % of Matches
include   2             14.29               Borodin   4             40
Elibron   1             7.14                Steppes   1             10
edition   1             7.14                Central   1             10
Borodin   3             21.43               pianist   1             10
Steppes   2             14.29               ancient   1             10
Central   2             14.29               already   1             10
unusual   1             7.14                MEMOIRS   1             10
already   1             7.14
MEMOIRS   1             7.14
Figure 4.1: A comparison of search engine results on the same clue, one without the use of phrasing and
one with.
There are a number of disadvantages to relying on the Google Web API within a piece of software.
The most obvious of these is the need to be connected to the Internet. For users who only have
narrowband connections, the solving process will appear to be much slower, as the retrieval of results
will take more time to complete. Secondly, the current version of the API only returns the first ten results
of a search; the upshot of this is that searches done through the API will need to be as specific as
possible in order to retrieve only the most relevant results. Finally, the current Web API limits searches
per user to one thousand per day.
4.2 Question Answering and the RASP system
Question Answering (QA) is a natural language processing (NLP) technique whose aim is to give an-
swers to questions phrased in natural language. Applied to knowledge clues, the QA problem becomes
one of finding an answer of pre-determined length to the clue given. It needs to be noted that none of the
Times T2 knowledge clues are ever posed as questions - they never ask, for example “who was X?”, or
“what are Y?” or “where is Z?”. However, for a system to find answers to clues, some of the processes
applied to input text strings in QA can be applied here.
For successful question answering, there is usually a reliance on a vast data store from which to
retrieve facts. There has been much research into using the World Wide Web as this data store; this is
usually done by formatting the question into a suitable query for the search engine, performing the
query and retrieving the results, and then using these results to extract the correct answer.
Radev et al[19] use the idea of “Query Modulation” to turn a question into a search term, using supported
operators such as OR and phrase delimiters.
For pre-processing data, the Robust Accurate Statistical Parsing (RASP)[1] system may be a very
useful tool. RASP can process a given body of text using the following steps:
1. Tokenisation - the process of separating words from punctuation.
2. Part-of-Speech Tagging - individual words are categorised by their type. RASP recognises that
some words can have different meanings depending on the context in which they are used, and
accommodates this by assigning probabilities to each type.
3. Lemmatization - words are deconstructed into their individual parts. For example, the word
“unenviable” would be broken down into “un + envy + able”.
4. Parsing - this final step constructs a parse tree based on information from the previous steps.
The generated parse tree will require post-processing to strip away the encoding, leaving the
desired end result of phrase groups. These can then be passed into a Google Web
API search or word list, with the aim of improving the probability of finding the correct answer from the
results. The main RASP program, runall.sh, is a shell script, and can therefore be integrated into most
programming languages easily.
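Integrating the script into a Java program could be done along the following lines. This is a sketch only: the method simply runs an external command and captures its output, and the exact runall.sh invocation and options would need to be taken from the RASP documentation:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RaspRunner {
    /**
     * Run an external command (such as RASP's runall.sh script) and
     * return its standard output as a single string, ready for the
     * post-processing step that extracts phrase groups.
     */
    public static String run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        p.waitFor();
        return output.toString();
    }
}
```

The parser might then be invoked with something like run("/bin/sh", "runall.sh") - the script path and any mechanism for passing the clue text in are placeholders here.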
4.3 Word Lists
4.3.1 WordNet
WordNet is an “on-line lexical reference system whose design is inspired by current psycholinguistic
theories of human lexical memory”[14]. It has been in development since 1985, and was first published
in 1990. The most basic use of WordNet comes through the WordNet browser, which works in a similar
manner to an online dictionary such as that of the OED[18]. However, its power lies in the way it labels
words and groups them together: it uses the idea of synonym sets, or “synsets”, to group words[14].
The WordNet data files themselves are organised by word types, and are plain text files that contain the
words, their definitions and links to related words, such as the word’s synonyms and hyponyms.
WordNet has already been used in the development of other applications. It was used in the PROVERB
system that powers One Across[10]. Hovy et al. developed a system called Webclopedia[7], which
used the WordNet system and their own specially-built search engine and parser to retrieve answers to
manually-entered questions. The WordNet project is at version 2.0 as of the beginning of 2005, and is
still active.
4.3.2 Moby Project Word Lists
The Moby Lexicon Project was carried out by Grady Ward at the Institute for Language Speech and
Hearing at the University of Sheffield. The project comprises various ASCII text files, each of which
holds a different type of word.
With regards to the crossword solving problem, the Moby Project contains three files that can be
regarded as being of great significance. Two of these are pure word list files; one contains a list of
over 350,000 words, “excluding proper names, acronyms, or compound words and phrases”[21]. The
other list contains 250,000 instances of words and phrases excluded from the first list. There are other
word lists available that could prove to be useful, such as those containing male and female names.
However, these are merely subsets of words from the two aforementioned lists, which, when combined,
form (it is claimed) “the largest [English] word list in the world”[21].
The third file is the Moby Thesaurus. This “is the largest and most comprehensive thesaurus data
source in English available for commercial use”[21]. The file contains in excess of 2.5 million synonyms,
spread across 30,260 root words or phrases. The sheer volume of data from this file alone should prove
to be extremely useful in finding synonyms of words, which, as detailed previously, form a considerable
percentage of the clues in the Times T2 crossword. One disadvantage of the thesaurus file,
however, is that it has no formal structure - it is in effect one very large text file, and searching through
the whole file each time will be computationally expensive. As the file essentially contains raw data,
with no word senses attached to the synonyms, some form of word sense disambiguation post-processing
may be required on results returned using the thesaurus[12].
The other problem that affects the thesaurus file, and the two main word list files, is their size -
all are several megabytes in size. This will no doubt result in access times being slower than for other
answer acquisition techniques. However, the huge number of words documented in these word lists has
the potential to provide an invaluable resource of data for use in the final solution.
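A linear scan of the thesaurus file might look like the following Java sketch. The format is assumed here to be one comma-separated entry per line with the root word first - an assumption that would need checking against the Moby documentation:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MobyThesaurus {
    /**
     * Scan the thesaurus line by line for the entry whose root word
     * matches, returning its synonyms. Assumes each line has the form
     * "root,synonym1,synonym2,...". Returns an empty list if the root
     * word is not found.
     */
    public static List<String> synonymsOf(BufferedReader thesaurus, String root)
            throws IOException {
        String line;
        while ((line = thesaurus.readLine()) != null) {
            String[] entry = line.split(",");
            if (entry.length > 1 && entry[0].equalsIgnoreCase(root)) {
                return Arrays.asList(entry).subList(1, entry.length);
            }
        }
        return Collections.emptyList();
    }
}
```

As the text notes, this whole-file scan is expensive; in practice the file could be indexed or loaded into a map once at start-up.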
4.3.3 CELEX
CELEX is similar to WordNet in that it is an electronic database of lexical information. Whereas WordNet
places more of an emphasis on relationships between words, CELEX focuses on the phonetic makeup
of words and word morphology[3]. Databases for CELEX are available in Dutch, German and English,
with the former two languages having the larger data sets. The CELEX project ended at the start of
2001.
4.3.4 Word List Comparisons
As Figure 4.2 shows, the Moby Project Thesaurus contained by far the largest collection of lexical data1.
As it was the only thesaurus file among the various projects researched here, its inclusion in
the solving system was considered essential for solving synonym clues.
1 Estimated figure taken from the Moby Project website.
List                      Number of word strings   Word definitions
WordNet 2.0               152,059                  YES
Moby Project Word Lists   611,756                  NO
Moby Project Thesaurus    over 2.5 million         NO
CELEX                     160,594                  NO
Figure 4.2: Comparison of word list files.
For dictionary-style searches, WordNet was the only resource listed in Figure 4.2 that offered full
definitions of word forms. As CELEX was intended to be used as a tool for morphological and phonological
study rather than semantic analysis (as shown by Peters[16]), its use within the crossword solver was
disregarded. The smaller number of words offered by the WordNet project would not have posed much
of a disadvantage because, unlike CELEX, the WordNet project was (as of April 2005) still
active, so it is highly likely that in a future release, the content offered by WordNet will eclipse that of
CELEX.
The word lists of the Moby Project, given the size of their content2, contain a number of English
words comparable to that of world-authority dictionaries such as the Oxford English
Dictionary. For this reason, these lists were used as the basis for any brute-force searching needed
in the final solution.
4.4 Brute Forcing and Pattern Matching
One of the ways in which a human solver may attempt to find an answer to a given clue is to simply look
up all possibilities which fit the answer pattern. For example, if a human solver has a six-letter answer and
knows the first two letters, but cannot progress further, they may resort to simply looking up every word
which fits the given pattern, and then use the definitions of the words to see which fits the clue best.
A computerised approach to this would be to give a program a pattern in the form of a regular
expression and return a list of all words (or compound words) which match that pattern. Using the
returned list, it may be possible to eliminate letters at certain positions. This can be especially useful
for checked letters, as by reducing the number of possible letters at these positions, it is likely that the
number of possible words which fit the answer in the alternative direction will also decrease.
2 354,984 single words + 256,772 compound words, taken from two files of the Moby Project.
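A minimal Java sketch of this brute-force matching, using the language's built-in regular expression classes (the four-word list here is purely illustrative; the real search would run over the Moby files):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BruteForceSearch {
    /**
     * Return every word in the list matching the answer pattern, where
     * unknown letters are written as dots, e.g. "ca....y" for a
     * seven-letter answer with 'c', 'a' and 'y' as checked letters.
     */
    public static List<String> matches(List<String> wordList, String answerPattern) {
        Pattern p = Pattern.compile(answerPattern, Pattern.CASE_INSENSITIVE);
        List<String> results = new ArrayList<>();
        for (String word : wordList) {
            if (p.matcher(word).matches()) { // matches() anchors to the whole word
                results.add(word);
            }
        }
        return results;
    }
}
```

For instance, "ca....y" against a small list would return both "carvery" and "cannery", showing why unchecked patterns can yield many candidates.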
Ideally, a brute-force search should be used only as a last resort for retrieving answers. To brute-force
effectively, there is the need to consider all possible answers used in the Times T2 crossword. There is
infrequent usage of archaic words, acronyms and colloquialisms, as well as proper nouns, which cover
a wide range of topics from people’s names, to geographical locations and works of film and literature.
In short, brute-forcing will require very comprehensive sources of data from which to match the pattern
with words. This in turn will lead to efficiency and memory issues if it is required to look through a
number of large word list files.
4.5 Conclusions
As single-word synonym clues were the easiest clue types to categorise, these could therefore be fed
directly into a Moby thesaurus or WordNet search (or some combination of the two) in order to retrieve
matching words. Multi-part clues that are made up of synonym clues can be solved in a very similar
manner, with the additional step of cross-checking results for each definition. For multi-word definition
clues, the RASP system could be used to extract a key phrase or word from the clue and then use this
to process the clue similarly to synonym clues.
Knowledge clues could also be processed using RASP, to again determine key phrases and/or words;
these can then be passed into the Google Web API. If this fails to produce an outright correct answer,
then the key phrases or words can be passed to the WordNet dictionary or the Moby word lists.
In the event that no satisfactory possible answers are returned by one answer acquisition process,
the alternative process should be used. In the worst-case scenario that no answers can be found at all, a
brute-force search on the Moby Project word lists, using the answer string as a regular expression, will
be used to retrieve words. These words can themselves be checked using WordNet to determine their
viability as correct answers.
Chapter 5
Methodology and Design
5.1 Development Methodologies
There are a number of development methodologies which were considered for developing the final
solution. These are briefly described below:
• Waterfall Model - The waterfall model is one of the more well-known development methodologies
in software engineering. The model, although described with varying numbers of stages, essentially
keeps the same rigid structure. As Jesty[8] states, the waterfall model is particularly useful for
short projects. For larger projects, its rigid structure is unsuitable, as software requirements are
likely to change over time.
• Prototyping - Prototyping has the advantage of accelerated production of software, and allows
for changes in design. Its main disadvantage is that maintenance of code becomes an issue as
documentation of an evolutionary design becomes more difficult[8].
• Spiral Model - This model splits development into four ordered stages, which can be roughly
labelled as “Determination of Objectives”, “Evaluation of Objectives”, “Development” and “Planning”.
Each stage is continually visited in the same order, allowing for incremental development
of software. As the spiral model is risk-based, rather than product-based[8], it is suited to large
projects.
• Evolutionary Development - this model’s emphasis is on planned incremental delivery[4, 5].
Software functions are added at planned iterations, with existing functionality either removed,
modified or left unchanged as a result of testing previous iterations. The advantage of this approach
is that feedback can be gained from early iterations (which act more as prototypes). However,
careful planning is required, as the model does not consider time constraints[8].
As a consequence of research into Natural Language Processing techniques, the time allocated
for software development decreased. Working with a rigid methodology such as the Waterfall Model
would leave no time to correct any errors found within the software design. The solving process is also
likely to require many refinements before performing at a satisfactory level, leading to direct conflict
with the Waterfall Model, which “assumes everything works perfectly”[8]. The Spiral Model is also
inappropriate for this project, as it is suited to longer periods of development. The other two methodologies
focus on iterative development. However, Evolutionary Development has better provision for the addition
of functionality over time. “Functionality”, in the case of the crossword solver, can be thought of as the
various solving techniques which can be applied to a crossword puzzle - by adding these to the system in
an iterative manner, then evaluating the solver’s success rate, these can be refined to improve the solving
process. Therefore, Evolutionary Development was chosen as the methodology for developing the software.
5.2 Requirements Gathering
To establish the attributes needed to describe the various components of a crossword puzzle, Natural
Language Requirements Gathering was used.
5.2.1 Crossword Puzzle Representation
• A crossword consists of a grid and a set of clues.
• A grid consists of a number of squares; these squares can either be black or white.
• A grid’s black squares are always empty. A grid’s white squares are either empty, or contain a
single alphabetic character.
• Adjacent white squares form answers within the grid. Every answer has a number, a direction and
a fixed length. An answer has only one correct word which fits its squares.
• A set of clues directly corresponds to the answers on the grid. Every clue contains clue text, and
at least one answer word. The total length of these answer words must be exactly equal to the
length of the answer on the grid.
The list above shows that three distinct objects were required to represent a crossword puzzle. A
Grid object will be needed to represent the physical layout of a crossword grid. A Clue object will be
required to hold the attributes of a single crossword clue; a set of Clue objects will hold all clues in the
puzzle. An equal number of Answer objects will also be needed to store all possible candidate answers
which fit the corresponding pattern on the grid.
To allow end users to easily extract information from the crossword grid, some kind of visual repre-
sentation of the crossword grid was required. A number of ways to implement this were considered:
• Via the console, using ASCII characters.
• GUI representation, using graphical components.
• HTML representation, using <table> tags.
ASCII representation, although relatively easy to code, poses two immediate problems: developing
a way of distinguishing between black squares and empty white squares, and developing a suitable
method for inputting the layout of the grid. Using, for example, a co-ordinate input method for setting
black squares is likely to be very time consuming. GUI representation is likely to be the most complex of
these options to implement, but once created will have numerous benefits; users will be able to select
which squares are to be black through direct interaction with the grid using the mouse. The GUI can
also provide the interface for entering clues into the grid, and provide users with feedback on
the solving process as clues are entered into the grid. HTML tables are simple to design, though will
require dynamic components such as CGI to switch squares between black and white, and for the grid
to interact with any automated solving process so it can be updated.
Based on the information above, a GUI is the favoured option of the three, as its design can be
tailored to fit the requirements of the system.
5.2.2 GUI Functional Requirements
The functional requirements of the GUI will take the form of user inputs. The most essential user inputs
that the GUI should be capable of performing are:
1. Selection/deselection of crossword grid squares,
2. Confirmation of the grid’s design,
3. Clue selection and clue text entry,
4. Entry of the number of answer sections in an answer (and, if necessary, selection of word delimiter),
and
5. Starting the solving process.
Extensions to these basic user inputs include:
6. Resetting the design of the grid and clearing the grid of answers,
7. Manual entry of answers into the grid, and
8. Providing rotational symmetry design options.
Additionally, the GUI will need to display across and down clues as two separate groups, as is
conventional for nearly all crossword puzzles.
5.3 Programming Languages
Below are some of the possible languages considered for developing the final solution.
5.3.1 Java
Advantages for development of a system using Java include its direct compatibility with the Google Web
API, meaning this technology could be incorporated straight into an application. Java has strong GUI
support, provided by two class hierarchies: the older, less portable AWT (java.awt.*) and the newer
Swing libraries (javax.swing.*). Java code was designed to be portable, which will help speed up
development, as there will be no restrictions as to choice of development platform. The author’s prior
knowledge of working with the Java language is also a strong advantage in the language’s favour.
Documentation of Java code is highly comprehensive and freely available, which will aid system
development. Finally, support for regular expression pattern matching is available via the classes
Pattern and Matcher.
Disadvantages to using Java are, firstly, its slow runtime speed; tests by Prechelt[17] show that
for basic text manipulation programs, Java runs approximately three times slower than C and C++. Java
is also a verbose language to write code in - the number of lines of code needed to write identical
programs in C++ and Perl is much smaller, as Java coding standards require that accessors and mutators
be declared for individual attributes of a class.
5.3.2 Perl and Python
The biggest advantages to using either Perl or Python as a development language are their built-in regular
expression support. Perl especially excels at text processing, which has been established as forming a
substantial part of the final solution’s solving process. Both languages’ computation speeds are also better
than Java’s: Prechelt[17, p.17] showed that Perl runs approximately eight times faster than Java, with
Python slightly slower at around four times faster.
Disadvantages to using either language are that the author has no prior experience coding in Perl, and
relatively little in Python; the time needed to learn either language would have to be considered as part
of the development stage. Also, neither has native GUI support: Perl does not support the use of GUIs
directly, and Python GUIs rely on the use of third-party or custom-designed modules. This runs
the risk of introducing new problems into the design of the final software solution.
5.3.3 Conclusions
In considering the choice of programming language, three criteria were used to evaluate the three
languages detailed above: prior knowledge, computation speed, and GUI capabilities. As all three
languages support the use of regular expressions, this was not considered as a criterion. Of the three
languages, the author has had most experience using Java, which, although a computationally expensive
language, has in-built GUI capabilities. Perl and Python are both faster computationally, but given that
development time had been shortened, the added time required to learn either language to the degree
that a high-quality solution could be produced meant that these languages had to be disregarded as viable
options. Therefore, Java was the programming language chosen to develop the crossword solver.
5.4 UML and Class Descriptions
In order to visualise the system’s requirements as Java classes, UML was used to outline the interaction
between the components of the crossword solver. Explanations of the classes outlined in Figure 5.1 are
given below:
Figure 5.1: The UML class diagram for the crossword solver.
5.4.1 CrosswordGrid
This class holds all the information needed to describe an actual crossword grid. It contains the actual
grid, and stores the set of clue/answer pairs as Vector objects. Two auxiliary classes (not shown in
Figure 5.1) were used by this class: one which described the actual data contents of the grid, and one
used for rendering the grid within a GUI, to ensure it had the intended look of a printed crossword.
5.4.2 Clue
The Clue class holds all details relating to an individual clue; that is, the clue’s text, its number and
direction, and its starting coordinates on the grid. This class was also where clues were categorised; using
these categories, the solver could select what it thought to be the most appropriate solving technique
for the clue/answer pair.
5.4.3 Answer
The Answer class holds all details related to an individual answer; this includes the current regular
expression of the answer (as derived from the grid) and Vector objects to hold possible answers with
their corresponding frequencies (and, where relevant, the bias). The main concept behind this class was
speed; when a given clue/answer combination was first searched using one of the search methods, it was
anticipated that the process of retrieving answers would be relatively slow; given the size of the word list
and thesaurus files, performing a search was expected to use much processing time (more so when using
Java). Similarly, a Google web search may take a number of seconds to retrieve its data. On the first
iteration of the solving process, results for a given clue/answer are saved into an instance of the Answer
class. Any subsequent searches (that is, once all clue/answer combinations have been searched for the
first time) were then performed using the data stored within the Answer object.
5.4.4 CrosswordGridGUI
The CrosswordGridGUI class extends the CrosswordGrid class, to provide a visual representation
of the attributes stored in its superclass. The decision was taken early to have this class be the one
to interact with the Search hierarchy, as it was felt that by separating the data representation from
the crossword solving techniques, the Search hierarchy could be implemented iteratively
(following the principles of the chosen methodology).
5.4.5 RASPService
The main function of the RASPService class was to run the RASP system using a clue object's text as
input. Once the output from the generated parse tree had been interpreted, the class was intended to return
phrase groups, which would be passed to the appropriate Search subclass.
5.4.6 Search
The Search class was designed as an abstract superclass. It provided the concrete methods required to
retrieve the various aspects of a search result, and the abstract method doSearch(), to perform a
search. All Search subclasses work by calling their implementation of doSearch, which sets three
Vector objects based on the results of the search: words, containing the list of words that
matched the answer pattern; wordFrequencies, containing the frequency of each word in words; and
wordBiases, containing bias information for each word (though this data structure may not be used
by all Search subclasses). The following classes are subclasses of Search, each of which implements
a different search technique.
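The contract of the Search superclass can be sketched as below. The class, method, and Vector names (words, wordFrequencies, wordBiases, doSearch) follow the report, but the bodies, the use of generics, and the FixedListSearch subclass are illustrative assumptions, not the project's actual code.

```java
import java.util.Vector;

// Abstract superclass: each subclass implements doSearch(), which populates
// the three result Vectors described in Section 5.4.6.
abstract class Search {
    protected Vector<String> words = new Vector<>();
    protected Vector<Integer> wordFrequencies = new Vector<>();
    protected Vector<Double> wordBiases = new Vector<>();

    // Each subclass implements its own search technique here.
    abstract void doSearch(String clueText, String answerPattern);

    // Concrete accessors shared by all subclasses.
    Vector<String> getWords() { return words; }
    Vector<Integer> getWordFrequencies() { return wordFrequencies; }
    Vector<Double> getWordBiases() { return wordBiases; }
}

// A trivial subclass, used only to illustrate the contract: it checks a
// fixed word list against the answer pattern.
class FixedListSearch extends Search {
    void doSearch(String clueText, String answerPattern) {
        for (String w : new String[] {"cat", "cot", "dog"}) {
            if (w.matches(answerPattern)) {
                words.add(w);
                wordFrequencies.add(1);
                wordBiases.add(1.0);
            }
        }
    }
}
```

A caller would construct a subclass, invoke doSearch once, and then read candidates back through the concrete accessors, which is how the report describes the Answer class consuming results.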
5.4.6.1 MobyListSearch
The MobyListSearch class provided the crossword solver with brute-force capabilities. Using the
two main Moby Project wordlists, it took an answer pattern and returned all words that matched it.
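The brute-force matching MobyListSearch performs can be sketched as follows. The method name matchPattern and the in-memory word list are assumptions for illustration; the real class reads the Moby wordlist files from disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Brute-force search: return every word in the list matching the answer
// pattern (e.g. "s...e" for a five-letter answer starting with S and
// ending in E, as derived from the grid).
class MobyListSearch {
    static List<String> matchPattern(List<String> wordList, String answerPattern) {
        Pattern p = Pattern.compile(answerPattern, Pattern.CASE_INSENSITIVE);
        List<String> matches = new ArrayList<>();
        for (String word : wordList) {
            // matches() requires the whole word to fit the pattern.
            if (p.matcher(word).matches()) {
                matches.add(word);
            }
        }
        return matches;
    }
}
```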
5.4.6.2 WordNetSearch
This class was used on clues determined to be synonym-type clues. Although the class is named Word-
NetSearch, it in fact uses a combination of the Moby Thesaurus and the raw WordNet data files. First,
the clue string and answer pattern were passed through the Moby Thesaurus file, to return all words
related to the clue string. For synonym clues, these results were then compared with the clue
string's definition(s) (if these exist) in WordNet. If any of the result strings were found in these defi-
nitions, extra bias was given to those words, on the basis that they were more likely to be the correct
answer for the clue.
5.4.6.3 GoogleService
The GoogleService1 class took a clue's text as a string, and a regular expression pattern, and performed
a search using the Google Web API. The class had two main search types: a basic search, which is
this class's implementation of doSearch, with no special parameters of any sort; and a missing word
search, which used the preceding or succeeding word as the "key word" in the clue, matching only those
words found after or before this keyword, and then only if the word in question matched the answer
pattern. In either case, the resulting snippets2 were each processed to strip away HTML formatting. The
remaining text was then compared against the regular expression, with resulting words being obtained
from these snippets. The Google Web APIs also allowed advanced search parameters to be set. The
only parameter explicitly set was that all results obtained were to be in English; for the crossword
solving problem, this was a more than reasonable restriction.
1The name “GoogleSearch” would have been the desired choice for this class’s name, to keep class names consistent.
However, there already exists such a named class in the Google Web API.
2The current version of the Google Web API dictates there to be a maximum of ten results returned from any search.
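The snippet post-processing described above (strip HTML formatting, then collect words matching the answer pattern) can be sketched as below. The Google Web API itself has long since been retired, so only the local processing step is shown; the class and method names, and the naive tag-stripping regex, are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Post-processing of search result snippets: strip HTML formatting, then
// collect any word in the remaining text that matches the answer pattern.
class SnippetProcessor {
    static List<String> extractCandidates(String snippet, String answerPattern) {
        // Naive HTML stripping; adequate for short search snippets.
        String text = snippet.replaceAll("<[^>]+>", " ");
        Pattern p = Pattern.compile(answerPattern, Pattern.CASE_INSENSITIVE);
        List<String> candidates = new ArrayList<>();
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty() && p.matcher(token).matches()) {
                candidates.add(token.toLowerCase());
            }
        }
        return candidates;
    }
}
```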
5.4.7 Documentation of Software
The documentation of the classes was created using the Javadoc tool[13], and can be found enclosed
with the software.
5.5 Development Plan
As an evolutionary development methodology was used to create the crossword solver, careful
planning of each iteration was required to ensure that each set of changes was a significant advance on
the previous iteration, whilst allowing for changes that might be required as a result of testing previous
iterations. The original development plan in Figure 5.2 outlines the functionality to be added at each
iteration. The most critical aspects of the system (that is, the components of the solving process) were
developed first, before being integrated into the system's GUI. ('GUI' in the table refers to the
CrosswordGridGUI class.)
5.6 The Solving Process
Once the crossword grid and clues have been entered into the system, the process used by the crossword
solver to attempt solving the crossword can be broken down into the following four stages:
1. Categorisation of Clue Types - For each clue, the clue text was taken and categorised, so that the
appropriate answer retrieval mechanism was used.
2. Retrieval of Answer Sets - On the first iteration of the solving process, each clue was fed into its
search mechanism, from which a set of answers was retrieved.
3. Calculation of Answer Confidence - On every iteration, each unsolved answer set was compared
with the current pattern for that answer, calculated from the current status of the grid. Candidate
answers that did not fit the pattern were removed from the answer set. The remaining candidate
answers were each given a confidence score calculated by the formula

    Confidence = (WordFrequency / TotalWordFrequency) * Bias

where WordFrequency was the individual word's frequency, and TotalWordFrequency
was the total number of word instances returned in the search results. These two variables en-
sured that results for all clue/answer pairs were normalised. The Bias parameter was used by
Iteration   Functionality Added
0           Command-line driven Moby thesaurus search
1           Command-line driven Google basic web search
2           Command-line driven Moby wordlist search
3           Command-line driven Google missing word search
4           GUI - crossword grid
5           GUI - components to manipulate grid
6           GUI - panels for clues and information
7           GUI - clue entry panel and internal components added
8           Conversion of search classes from iterations 0-3 into Search hierarchy; integration into GUI
9           GUI - slider component for multi-worded clues added
10          Support for multiple definition clues
            (After iteration 10, testing of the system began.)
11          Letter restriction feature added
12          Refinements to the solving process
            (Due to time constraints, iterations 13 and 14 were not completed.)
13          Word overlap features added
14          RASPService class added

Figure 5.2: The original evolutionary development plan for the crossword solver.
the WordNetSearch class to give higher scores to candidate answers that overlapped with the
WordNet definition of the clue phrase.
4. Insertion of Confident Answers - After each answer's highest scoring word was determined,
that word's score was compared with the current highest score. If the high score was
exceeded, it was replaced by the current word's score, as the solver had a higher confidence in the
current word. Once all answer set scores had been calculated, if there was a high scoring word for
the current iteration, it was inserted into the grid. The solving process then moved on to the next
iteration, returning to Step 3. If, at the end of an iteration, no high scoring word was found,
then either all words had been inserted, or there were no more confident answers to enter into the
grid. In either case, the solving process was said to be complete.
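The confidence formula from Step 3 can be expressed directly in code. The class and method names below are hypothetical, and the figures in the example are invented: a word appearing 6 times out of 10 result-word instances, with a bias of 1.5, scores (6/10) * 1.5 = 0.9.

```java
// Confidence for a candidate answer, as defined in Step 3 of the
// solving process:
//     confidence = (wordFrequency / totalWordFrequency) * bias
// Dividing by the total word count normalises scores across clue/answer
// pairs whose searches return different numbers of results.
class Confidence {
    static double score(int wordFrequency, int totalWordFrequency, double bias) {
        return ((double) wordFrequency / totalWordFrequency) * bias;
    }
}
```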
Chapter 6
Development and Testing
6.1 Test Plan
The first test of the software was done once all essential features of the software had been incorporated,
after iteration 9 of the development plan. Each subsequent test was carried out after significant additions,
bug removals, or optimisations of the solving process had been carried out. These improvements would
be based on the data obtained from the previous test results, and on debug data obtained during
the solving process. The criteria used to test the crossword solver are as follows:
1. The number of answers attempted in the crossword, as a percentage.
2. The number of correct answers found as percentages, against:
(a) the total number of answers in the grid, and
(b) the total number of answers attempted
3. The average confidence for:
(a) all answers on the grid, and
(b) all answers attempted,
4. The time required to enter a crossword grid and its individual clues into an application,
5. The time required to complete the solving process.
The first criterion will judge how much of a given crossword the software attempted to solve. The
second criterion will judge how accurate the software was in attempting to solve the crossword, with
respect to the whole grid and to attempted answers respectively. The third will judge how confident the
software was in obtaining the answers. This can be seen as a measure of the validity of the mechanisms
used to obtain answers to clues.
The fourth and fifth criteria will give a measure of how quick the software's solving process is and,
to a lesser degree, how user friendly the software is with regard to entering a crossword grid and its
clues. It is important that the total time taken to enter and solve a crossword does not exceed the time
needed by a human solver to attempt the crossword, especially if the human solver can complete the
crossword with significantly higher accuracy, as this would make the software an obsolete alternative
for solving the Times T2 crossword.
Although a 100% success rate was considered unlikely to the point of impossibility, given that large-
scale research projects and commercial packages such as [10] and [11] fail to achieve such a figure, an
initial target was set at the average proportion of correct answers attained when testing existing
solutions - that is, a figure of at least 41%. The three crosswords tested in Chapter 3 were used to test
the solver, to allow for direct comparison with the previously tested CBSs.
6.2 Test Results
As Figure 6.1 shows, less than 25% of clues attempted were correct in the initial test phase. As a result
of this, refinements to the solving process were required.
6.2.1 Solving Process Refinements
Changes made to the solving process upon completion of the first test phase include:
• giving preference to words with higher word counts, should two or more words have joint highest
scores for a given iteration.
• adding a letter restriction method to eliminate any letters that cannot occur at individual squares,
by performing a brute-force search using a given pattern of an unsolved answer. By using such a
method, letters restricted in checked squares for one clue will also be restricted for the other clue,
Criteria                T2 Bk7 #1   T2 Bk7 #2   #3415    Mean
Clues/Answers           28          22          24       -
Attempted               20          20          23       -
% Attempted             0.714       0.909       0.958    0.860
Solving Time (mm:ss)    02:40       02:04       02:18    02:21
Total Confidence        12.338      11.567      10.893   -
Mean (All)              0.440       0.525       0.454    0.473
Mean (Attempted)        0.617       0.578       0.474    0.556
Correct answers         7           1           7        -
% (All)                 0.250       0.045       0.292    0.196
% (Attempted)           0.350       0.050       0.304    0.235

Figure 6.1: Results from Test Phase 1
which may in turn lead to more letters being restricted for that clue and any others connected to
it. Initially, letter restriction was performed as an extra initial iteration in the solving process. It
was estimated that the use of the letter restriction method would increase solving time by at least 30-45
seconds, given previous observations of the brute-force search method used in the solving process.
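The letter restriction idea can be sketched as follows: brute-force the candidate words for an answer's current pattern, then record which letters can actually occur at each square, so that a crossing answer can exclude anything outside these sets. The class and method names here are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Letter restriction: given the brute-forced candidate words for an
// unsolved answer, collect the set of letters that can occur at each
// square. Checked squares shared with a crossing answer can then be
// restricted to these letters, possibly restricting further squares in turn.
class LetterRestriction {
    static Set<Character>[] allowedLetters(List<String> candidates, int length) {
        @SuppressWarnings("unchecked")
        Set<Character>[] allowed = new HashSet[length];
        for (int i = 0; i < length; i++) allowed[i] = new HashSet<>();
        for (String word : candidates) {
            if (word.length() != length) continue;   // ignore malformed entries
            for (int i = 0; i < length; i++) {
                allowed[i].add(word.charAt(i));
            }
        }
        return allowed;
    }
}
```

For example, if the only candidates for a five-letter slot are "stone" and "shone", the first square is restricted to the letter S, while the second square may be T or H.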
As seen in Figure 6.2, solving times increased on average by over 75 seconds, slightly longer than predicted.
Improvements made after Test Phase 2 include:
• limiting use of the letter restriction method in the first iteration of the solving process. It was
found that, in the initial iteration, letter restriction worked very well for multiple-word answers,
restricting many letters at most positions, but made no impact for single-word answers less than
eight letters in length. By performing letter restriction only for certain answer types, it was hoped
to decrease solving times so that they were nearer to those found in Test Phase 1.
• executing the letter restriction method for any unsolved clue/answer pairs which failed to return
any candidate answers.
• the instant insertion into the grid of answers that were deemed to be “highly confident” - that
is, those with a confidence score of 1.0, and with a word count of at least 5. This decision was
Criteria                T2 Bk7 #1   T2 Bk7 #2   #3415    Mean
Clues/Answers           28          22          24       -
Attempted               24          22          20       -
% Attempted             0.857       1.000       0.833    0.896
Solving Time (mm:ss)    04:02       03:28       03:55    03:48
Total Confidence        8.444       7.692       5.533    -
Mean (All)              0.302       0.347       0.231    0.293
Mean (Attempted)        0.352       0.347       0.277    0.325
Correct answers         6           5           6        -
% (All)                 0.214       0.227       0.250    0.230
% (Attempted)           0.250       0.227       0.300    0.259

Figure 6.2: Results from Test Phase 2
taken as it was found that clues with these confidence statistics, especially in early iterations, were
almost always correct answers.
• giving extra bias to clue/answer pairs which already had letters present within their grid slot,
since the methods used to solve the grid are such that earlier iterations of the solving process are
more likely to give correct answers to clues.
The changes made between Test Phases 2 and 3 not only reduced the average solving time by 14
seconds, but increased the confidence of answers found, and returned a greater proportion of correct
answers.
One final change was made to Stage 4 of the solving process. Previously, the algorithm for deter-
mining whether an answer was more confident than the current most confident answer was given as:
if (score > highScore ||
(score == highScore && wordCount > highWordCount)) {
highScore = score;
...
}
Criteria                T2 Bk7 #1   T2 Bk7 #2   #3415    Mean
Clues/Answers           28          22          24       -
Attempted               23          18          15       -
% Attempted             0.821       0.818       0.625    0.757
Solving Time (mm:ss)    03:15       03:20       03:08    03:14
Total Confidence        14.935      11.296      9.152    -
Mean (All)              0.533       0.513       0.381    0.476
Mean (Attempted)        0.649       0.628       0.610    0.629
Correct answers         9           5           7        -
% (All)                 0.321       0.227       0.292    0.280
% (Attempted)           0.391       0.278       0.467    0.379

Figure 6.3: Results from Test Phase 3
The condition if (score > highScore...) was felt to give too great a bias to clues with high
scores but low word frequencies (stored in the wordCount variable). This line was altered slightly
with the use of two additional variables, scoreWordCount and highScoreWordCount:
scoreWordCount = (score*score) * wordCount;
if (scoreWordCount > highScoreWordCount ||
(score == highScore && wordCount > highWordCount)) {
highScoreWordCount = scoreWordCount;
...
}
By squaring the (normalised) score, the intention was to give greater preference to words with
higher word counts.
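The effect of this change can be illustrated with invented figures. Under the original comparison, a word scoring 0.9 backed by a single result word beats a word scoring 0.6 backed by ten result words; under the revised score^2 * wordCount comparison, the better-supported word wins. A minimal sketch (the class and method names are hypothetical):

```java
// Comparison keys under the two rules. The old rule ranked candidate
// answers on score alone; the revised rule ranks them on
// score^2 * wordCount, so a well-supported answer can beat a
// higher-scoring answer backed by very few result words.
class Scoring {
    static double oldRule(double score, int wordCount) {
        return score;                       // wordCount only broke ties
    }
    static double revisedRule(double score, int wordCount) {
        return score * score * wordCount;   // squaring dampens thin evidence
    }
}
```

With the figures above: oldRule prefers 0.9 over 0.6, while revisedRule compares 0.81 * 1 = 0.81 against 0.36 * 10 = 3.6 and prefers the ten-word candidate.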
The proportion of correct answers to answers attempted increased by 21% as seen in Figure 6.4,
justifying the changes made between Test Phases 3 and 4.
Criteria                T2 Bk7 #1   T2 Bk7 #2   #3415    Mean
Clues/Answers           28          22          24       -
Attempted               15          14          14       -
% Attempted             0.536       0.636       0.583    0.585
Solving Time (mm:ss)    03:27       02:41       02:30    02:54
Total Confidence        9.545       6.085       10.448   -
Mean (All)              0.341       0.277       0.435    0.351
Mean (Attempted)        0.636       0.435       0.746    0.606
Correct answers         8           7           9        -
% (All)                 0.286       0.318       0.375    0.326
% (Attempted)           0.600       0.500       0.643    0.581

Figure 6.4: Results from Test Phase 4
6.2.2 Entry of Crossword Data
To test evaluation criterion 4, three users were asked to each enter the data for the three test crosswords
into the crossword solver. As Figure 6.5 shows, the shorter input times were for the crosswords with
fewer clues to enter data for.
Crossword T2 Bk7 #1 T2 Bk7 #2 #3415
Mean Time Taken 10:37 09:21 09:50
Figure 6.5: Results of data entry testing for three users of the software, using the three test crosswords.
Chapter 7
Evaluation
This chapter will evaluate the project, using the evaluation criteria outlined in Chapter 1.
7.1 Project Aim
The aim of the project was to design a piece of software that could solve the Times T2 crossword.
Through research into existing crossword solvers and technologies which could be used to acquire
crossword answers, the software has been designed using an evolutionary development methodology,
implemented in Java, and tested to refine the solving processes used in attempting to solve the crossword.
The resulting crossword solver is shown in Figure 7.1. As the software only managed to obtain
33% of correct answers by the end of the final testing phase, thus failing to fully solve a complete
edition of the Times T2 crossword, the weaknesses of the system have been acknowledged in Chapter 8
(Conclusion). If further development of the crossword solver were to take place, these weaknesses could
be addressed, which would undoubtedly improve the crossword solver's performance.
7.2 Objectives
The project’s objectives have all been realised as follows:
• Research the history of crosswords, and understand the difficulties faced by human solvers
Figure 7.1: The final crossword solver. This screenshot shows the solver after attempting to solve the
T2 Crossword #3415, from the fourth and final test phase. All attempted answers are correct, apart from
2D, 4D, 17A, 18D and 20A.
in completing crosswords - The early stages of the project's lifetime were spent researching the
evolution of crosswords from simple word puzzles to modern-day crosswords, with the Times T2
crossword then examined specifically. From this initial research, a statement of the problem was
produced.
• Evaluate existing crossword solvers, noting any features used by such solvers that could be
incorporated into a final solution - It was originally expected that there would be a number of
readily-available crossword solvers to research and test. However, only the One Across
system [20] was specifically designed to answer crossword clues. It was with some regret that
the Crossword Maestro software could not be tested, as this software looked most promising with
regard to solving crosswords.
• Research possible technologies used to acquire answers to clues, which could be incorpo-
rated into a final solution - A lot of time was spent researching technologies which could be
integrated into the crossword solver, even more so once Natural Language Processing (NLP) tech-
nologies were researched. Based on this research, firm choices could be made regarding which
technologies could be most successfully integrated into the system’s solving process.
• Evaluate software development methodologies and programming languages which could be
used to develop the system, giving reasoning for final choices- Choosing the right methodology
for the project was vital to ensure that the software could be developed as effectively as possible.
Evolutionary development allowed for changes to the requirements of the system, which made the
task of exceeding the minimum requirements easier. The choice of language was also important.
In using Java, a conscious choice was made to sacrifice the performance gains offered by alternative
languages in return for speedier development, due to the author's good working knowledge of the Java
language.
• Develop a software solution which incorporates at least all the minimum requirements stated
in this report - This was where the majority of the project's time was spent. Development
of the crossword solver began with the coding of the first iterations at the end of December, and
ceased by the beginning of April. In between, however, came the extra research into Natural
Language Processing, so for a few weeks development of the software was sidelined.
• Test the developed system using various editions of the Times T2 crossword, against a num-
ber of evaluation criteria - A lot of time was spent testing and refining the crossword solver, as
initial expectations of the solver’s success rate proved to be higher than the results observed.
7.3 Minimum Requirements
The three minimum requirements of the project have been fully realised:
• To develop a system which can return a list of possible answers given a clue and the length of
its answer- the solving process used by the crossword solver returned a set of candidate answers
using the most appropriate search method, determined by the contents of the clue text.
• To develop a method of inputting a crossword grid into an application- Interaction with the
grid section of the crossword solver’s GUI using the mouse allows users to select the squares
which are to be black.
• To develop a method of inputting a crossword clue into an application, such that it relates
to a specific location on a crossword grid- The crossword solver takes the input grid, and
automatically works out the numbering of the clues and answers. From this, individual clues can
be selected from the GUI, which then allows the clue’s text to be entered into the application.
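The automatic numbering mentioned above follows the standard crossword convention, which can be sketched as below. The class name and the boolean grid representation (true for a white square, false for a black square) are assumptions for illustration.

```java
// Standard crossword numbering: scan the grid left-to-right, top-to-bottom.
// A white square receives the next number if it starts an across answer
// (no white square to its left, at least one to its right) or a down answer
// (no white square above, at least one below).
class GridNumbering {
    static int[][] number(boolean[][] grid) {
        int rows = grid.length, cols = grid[0].length;
        int[][] numbers = new int[rows][cols];   // 0 means "no number"
        int next = 1;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (!grid[r][c]) continue;       // black square
                boolean startsAcross =
                        (c == 0 || !grid[r][c - 1]) && c + 1 < cols && grid[r][c + 1];
                boolean startsDown =
                        (r == 0 || !grid[r - 1][c]) && r + 1 < rows && grid[r + 1][c];
                if (startsAcross || startsDown) numbers[r][c] = next++;
            }
        }
        return numbers;
    }
}
```

On a 3x3 grid with a single black centre square, this numbers the top-left, top-right, and bottom-left corners 1, 2, and 3 respectively, matching how a printed crossword would be numbered.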
With regard to extensions to the minimum requirements, those listed at the beginning of the
report have all been incorporated into the final system. Crosswords can be loaded and saved from the
File menu of the GUI. The GUI itself was considered a major requirement for software of this nature.
The idea of the information box was derived from ideas demonstrated by existing solvers. Manual entry
of answers into the grid has also been incorporated into the crossword solver.
7.4 Comparison with Existing Systems
From results obtained in Chapter 3, the One Across system obtained 41% of correct answers; the
project's crossword solver obtained 33%. If the shortcomings of the solving process highlighted in
Chapter 8 were rectified, there is a strong possibility that the eight-percentage-point discrepancy between
the two systems could be made up, or even surpassed.
7.5 Comparison with Human Solvers
To gain an understanding of how the crossword solver compared to unassisted human performance, a
small survey was conducted using the three crosswords used to test all systems throughout the course
of this project. The raw results of this survey can be found in Appendix B. The average times for
crossword data entry from Figure 6.5 have been added to the solving times obtained from the final test
phase in Figure 6.4, whose other data has been used here for comparison with the results of the human
solvers.
As seen in Figure 7.2, although the human solvers took on average at least five minutes longer to
attempt the crosswords, their success rate for obtaining correct answers was far superior to that for the
crossword solver. In every instance, human solvers obtained approximately twice as many answers as the
crossword solver. For some answers, the crossword solver managed to obtain the correct answer where
Criteria                    T2 Bk7 #1            T2 Bk7 #2            #3415
                            Computer   Humans    Computer   Humans    Computer   Humans
Clues/Answers               28         28        22         22        24         24
Mean Attempted              15         20.4      14         13.8      14         18.6
% Attempted                 0.536      0.729     0.636      0.627     0.583      0.775
Mean Solving Time (mm:ss)   14:04      19:00     12:02      17:00     12:20      19:00
Correct answers (Mean)      8          18.8      7          12.8      9          18.2
% (All)                     0.286      0.671     0.318      0.582     0.375      0.758
% (Attempted)               0.600      0.922     0.500      0.928     0.643      0.978
Figure 7.2: Comparison of unassisted human performance against results from Test Phase 4 (note: the
mean solving times for human solvers are approximate)
few or none of the human solvers could do so; for example, the clue "Sistine wall subject" (4,8) -
LAST JUDGMENT (T2 Bk7 #2: 21A). Some clues were so tough that neither the crossword solver nor
any of the human solvers could obtain the answer. Example clues include "One writing notice" (8) -
REVIEWER (T2 Bk7 #1: 8A) and "Feudal allegiance" (6) - FEALTY (T2 Bk7 #2: 20A).
Chapter 8
Conclusion
8.1 Failings of the Solving Process
The solving process of the software performed below expectations. The weaknesses of the
system are outlined below, along with possible remedial action to remove each weakness from
the system:
• Google search inconsistencies - From observing results obtained using the Google Web API,
it has been noted that the search results are not always consistent. For the majority of clues, the
correct answer appears in all searches, with minimal variance in its frequency or confidence score
with respect to other answers returned in its results set. However, for some clues, the answer
appears, but is no longer the highest scoring word, or in the worst case, does not appear at all.
Possible solutions to this include:
1. Repeating Google searches a number of times to obtain a results set,
2. Preprocessing the clue using Question Answering techniques,
3. Using word overlap techniques on the results set, using WordNet
Of these solutions, the first is the quickest to implement in terms of coding. However, there is
no guarantee that repeated searches will return better results. Also, by repeating searches, there
is an increased possibility that the 1000 searches a day limit imposed by the current build of
the Google Web API will be reached. Using Question Answering as suggested in the second
solution would help to narrow the Google search, by building up a more specific search term
using related synonyms of words from the clue text. The third solution, if correctly implemented,
should be successful in determining the correct answer from a given results set, but only if the
answer appears within the results initially. Of these suggestions, a combination of solutions 2
and 3 would be the best way to overcome the Google search inconsistencies.
• Moby Thesaurus not Anglicised - This weakness was only discovered during testing of the
crossword solver. It was assumed that, as the Moby Project’s main word lists contained British
English spellings, the same would be true of the project’s Thesaurus file. However, the Thesaurus
has been produced using American English spellings. Consequently, words like LABOUR and
HONOUR were not found as synonyms during the test phases, due to the inconsistent spellings
between the two forms of English. Solutions to this could be to manually convert the Thesaurus
into British English spellings, which would be an arduous task given the sheer size of thesaurus
file, or to seek out an alternative thesaurus source. The author finds the fact that an American
English thesaurus was compiled during a research project at a British university all the more
curious (not to mention frustrating).
• Word overlap techniques not fully utilised - This feature was only partially implemented in the
solving process: clue definitions were checked for occurrences of candidate answers, but searching
on the answer definitions was not implemented (following the development plan in Figure 5.2,
this was to be the next iteration integrated into the software). Due to time constraints, full word
overlap functionality could not be incorporated into the solving process.
• RASP system not utilised - This feature was originally planned to be one of the final additions to
the software. However, upon the advice of the project's supervisor and assessor, development of
this feature (which would have taken the form of the RASPService class in the UML diagram
shown in Figure 5.1) was not recommended, again due to time constraints.
8.2 Suggested Improvements
As can be seen from the testing results in Chapter 6, the solver does at best an adequate job of
retrieving accurate results. The following are suggested improvements which could be made to the
crossword solver:
• The automatic numbering of answers on the grid. Currently, the only way for users to determine
an answer’s position on the grid is when its clue is selected from the clue input box; it is then
highlighted in a different colour to the rest of the grid.
• During the entire solving process, the GUI appears to be in a frozen state. This is due to the meth-
ods used to solve a crossword being computationally expensive, time-consuming, and executing
in the same thread as the GUI. The solution is therefore to run the solving process in a separate
thread.
• Currently, there is no way to remove a single answer from the grid, apart from manually entering
another answer in its place, or clearing the grid's entire contents. An extra feature to clear a single
answer could be implemented within the solver's GUI class.
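The threading improvement suggested above is typically done in Swing with a SwingWorker, as in this sketch; solveCrossword() here is a placeholder standing in for the solver's real (long-running) solving loop, not the project's actual code.

```java
import javax.swing.SwingWorker;

// Running the solving process off the Event Dispatch Thread keeps the GUI
// responsive instead of appearing frozen. doInBackground() runs on a
// worker thread; done() runs back on the EDT, where it is safe to update
// the grid display.
class SolveTask extends SwingWorker<Integer, Void> {
    @Override
    protected Integer doInBackground() {
        return solveCrossword();   // long-running work, off the EDT
    }

    @Override
    protected void done() {
        // Safe to update Swing components (e.g. repaint the grid) here.
    }

    private int solveCrossword() {
        return 42;   // placeholder for the real solving process
    }
}
```

The GUI would start the task with new SolveTask().execute() from an action listener, returning control to the event loop immediately.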
8.3 Future Work
The crossword solver, as it stands, has huge potential for future work to be carried out on it. Small
projects may include developing a better Google search facility using elements of Question Answering,
developing a British English thesaurus to replace the current Moby thesaurus, or allowing for different
sizes of grid (perhaps even irregularly shaped grids). More adventurous projects could aim to expand
the capabilities of the current crossword solver, so that, for example, it could solve cryptic clues as
Crossword Maestro[11] does.
Bibliography
[1] E. Briscoe and J. Carroll. Robust Accurate Statistical Annotation of General Text. In Proceedings
of the Third International Conference on Language Resources and Evaluation (LREC 2002), pages
1499–1504, May 2002. URL: http://www.informatics.susx.ac.uk/research/nlp/rasp/ [17th February
2005].
[2] Richard Browne, editor. The Times T2 Crossword Book 7. Harper Collins, first edition, 2004.
[3] G. Burnage. CELEX - A Guide For Users. Technical report, Max Planck Institute for Psycholin-
guistics, University of Nijmegen, 1990. URL: http://www.ru.nl/celex/subsecs/sectiondoc.html
[26th April 2005].
[4] J. Crinnion. Evolutionary Systems Development. Pitman Publishing, first edition, 1992.
[5] T. Gilb and S. Finzi. Principles of Software Engineering. Addison-Wesley, first edition, 1988.
[6] J. K. Hardy. Jumble and Crossword Solver. URL: http://ull.chemistry.uakron.edu/jumble.html [2nd
December 2004], January 2003.
[7] E. Hovy, L. Gerber, U. Hermjakob, M. Junk, and C.-Y. Lin. Question Answering in Webclopedia.
In TREC-9 Conference, NIST, 2001.
[8] Peter H. Jesty. Software Project Management - Life Cycles, 2004. URL:
http://www.comp.leeds.ac.uk/se22/lectures.html [17th April 2005].
[9] Ed Pegg Jr. The Mathematical Association of America, Math Games: Crossword Rules. URL:
https://enterprise.maa.org/editorial/mathgames/mathgames05 10 04.html [24th November 2004],
May 2004.
[10] M. L. Littman, G. A. Keim, and N. Shazeer. A probabilistic approach to solving crossword puzzles.
Artificial Intelligence, 134:23–55, 2002.
[11] Genius 2000 Ltd. Crossword Maestro for Windows. URL: http://www.crosswordmaestro.com
[24th November 2004], 2000.
[12] K. Markert. AI32 Lecture Notes, 2005. URL: http://www.comp.leeds.ac.uk/ai32/lectures/index.html
[24th April 2005].
[13] Sun Microsystems. Javadoc Tool Home Page. URL: http://java.sun.com/j2se/javadoc/ [26th April
2005], 1995.
[14] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Introduction to WordNet: An
On-line Lexical Database. International Journal of Lexicography, 1990 (Revised August 1993).
[15] Editors of The American Heritage Dictionaries. The American Heritage Dictionary of the English
Language. Houghton Mifflin, fourth edition, 2000. Online Search: http://www.bartleby.com/61/
[26th April 2005].
[16] Wim Peters. Lexical Resources - Comparison of resources using metadata. URL:
http://phobos.cs.unibuc.ro/roric/lexcomparison.html [24th April 2005], 2003.
[17] L. Prechelt. An empirical comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a
search/string-processing program. IEEE Computer, March 2000.
[18] Oxford University Press, editor. The Oxford English Dictionary. Oxford University Press, second
edition, 1989. URL: http://dictionary.oed.com/ [19th February 2005].
[19] D. Radev, W. Fan, H. Qi, H. Wu, and A. Grewal. Probabilistic Question Answering on the Web.
Journal of the American Society for Information Science and Technology, April 2005.
[20] Various. One Across. URL: http://www.oneacross.com [24th April 2005], 2005.
[21] Grady Ward. Moby Project, 2000. URL: http://www.dcs.shef.ac.uk/research/ilash/Moby/.
Chapter 9
Appendix A
9.1 Project Reflection
Overall, I felt the project was very successful. The aim, minimum requirements and objectives were all
completed to what I felt to be a high standard, and on schedule. However, a few aspects of the project
did not run as smoothly as intended. The reasons why they were felt to be problem areas, and possible
ways of mitigating them, are discussed below.
Natural Language Processing
As a student who had not chosen to study Natural Language Processing as part of my degree programme,
it was not immediately obvious that this field of computing would be the most useful in the development
of the crossword solver; it was not until learning that Dr. Markert was to be my assessor that I realised
elements of NLP needed to be integrated into my solution. Had my research into NLP been undertaken
much earlier in the lifetime of the project, I am confident that the solver's success rate could have been
much higher, as more time would have been available for implementing NLP techniques in the software.
Nevertheless, I found the field of NLP to be a very interesting one. My advice to future finalists is to look
at all possible areas of research in the field of computing when trying to understand the problem posed
by their intended project, even areas they have not studied or will not study as part of their degree
programme. As well as furthering your computing knowledge, you may find the learning process an
enjoyable one, as I did.
Software Development
Given the lower than expected results for the crossword solver, in hindsight I probably placed too much
emphasis on getting the design of the GUI right. It has been said of me that at times I can be a
perfectionist, unwilling to stop working at something until I feel it is truly finished. Given that I am a
strong programmer, and that the software development phase of the project was delayed by two or three
weeks while extra research was carried out, this sentiment was definitely true of the creation of the
crossword solver. Given more time, I felt (and still feel) that I could have improved the crossword solver.
However, I had to remind myself that it was the report, and not the software, that would ultimately be
marked and graded. The lessons learnt from this, to be passed on to other students, are:
a) to undertake the early stages of a project as soon as possible to allow for changes in the direction of
your project, and
b) to allow plenty of time for developing software deliverables.
No matter how good a programmer you are in whatever language, programming errors will occur
and problems will arise that take valuable time to rectify. This, combined with pressure from regular
coursework, means that the earlier you plan ahead and allow for such instances, the better off you will
be come deadline day.
Chapter 10
Appendix B
10.1 Results of Human Solvers
As an extra test beyond the main testing and evaluation criteria, the crossword solver was compared to
unassisted human performance, to see how the automated crossword solving system fared. In all, five
participants attempted the same three crosswords tested throughout the project. The raw results are
detailed overleaf. The "answer found" percentage for each individual answer is shown, along with any
incorrect answers and their percentage scores.
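The "% Found" values in the tables are simply the fraction of the five participants whose answer matched the published solution (hence the 0.200 increments). A minimal sketch of this scoring, in Java for consistency with the project's implementation language (the class and method names here are hypothetical, not taken from the actual solver):

```java
import java.util.List;

public class SurveyScores {
    // Fraction of participants whose answer matched the published solution.
    // A null entry represents a participant who left the clue blank.
    static double percentFound(String solution, List<String> answers) {
        long correct = answers.stream()
                .filter(a -> a != null && a.equalsIgnoreCase(solution))
                .count();
        return (double) correct / answers.size();
    }

    public static void main(String[] args) {
        // e.g. clue 1A "Baghdad its capital": four of five wrote IRAQ, one wrote IRAC
        List<String> given = List.of("IRAQ", "IRAQ", "IRAC", "IRAQ", "IRAQ");
        System.out.println(percentFound("IRAQ", given)); // prints 0.8
    }
}
```

Incorrect answers such as IRAC (0.200) are tallied the same way, counting matches against the wrong string instead of the solution.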
Number Clue Answer % Found Other answers given
1A Baghdad its capital IRAQ 0.800 IRAC (0.200)
2D Full of water-plants; of oboe-like tone REEDY 0.400 -
3D Cervantes’s Don QUIXOTE 0.600 QUIOTAE (0.200)
4A Instrument plucked in mouth JEW’S HARP 0.600 -
4D Loose skin on neck JOWL 0.400 CHIN (0.200)
5D All one’s clothes WARDROBE 0.000 -
6D Nettle-rash HIVES 0.600 STING (0.200)
7D Pasta parcels RAVIOLI 1.000 -
8A One writing notice REVIEWER 0.000 NOTIFIER (0.200)
9A Long live!; oral test VIVA 0.800 -
10A George Gordon, poet BYRON 0.600 BROWN (0.200)
10D Public vehicle BUS 1.000 -
11A Shudder of emotion FRISSON 0.200 -
12D Not grown up IMMATURE 1.000 -
13A Steady flow STREAM 1.000 -
14D (Deliberately cause) anguish TORTURE 0.400 -
15A Of Welsh poets BARDIC 0.200 -
16D Small baking dish RAMEKIN 0.200 -
17D Reduce; one’s share CUT 0.800 BIT (0.200)
18A Done in small stages GRADUAL 0.400 -
19D A twelvesome DOZEN 1.000 -
20A Sting with pain; clever SMART 0.800 -
21D Straight-edge; king RULER 0.800 -
22D Out-of-focus sight BLUR 1.000 -
23A Fluffy hair; police (slang) FUZZ 1.000 -
24A Improbable UNLIKELY 1.000 -
25A Santa’s animal REINDEER 0.800 RAINDEER (0.200)
26A Annoy; police informer (slang) NARK 0.400 -
Figure 10.1: Results from Times 2 Crossword Book 7, Crossword #1
Number Clue Answer % Found Other answers given
1D Senior Service shade NAVY BLUE 0.400 -
2D Corrupt; decomposing PUTRID 0.400 -
3D Philatelist’s treasures STAMPS 1.000 -
4D Cask stopper BUNG 1.000 -
5D A fish; sounds like position PLAICE 0.600 -
6A Spontaneously; needlessly GRATUITOUSLY 0.400 -
6D Slight scrape; keep snacking GRAZE 0.800 -
7A Solicitor LAWYER 1.000 -
8A One for sorrow bird MAGPIE 0.800 -
9A Collapsed; moor FELL 0.600 -
10A Catastrophe DISASTER 0.800 -
11D With sawlike edge SERRATED 0.800 -
12A As a company, in conjunction TOGETHER 0.400 -
13D Fugitive OUTLAW 0.600 -
14D Dignity; be faithful to HONOUR 0.400 HONEST (0.200)
15D Place of safety REFUGE 0.600 -
16A Slope (joining two levels) RAMP 0.800 -
17D Sacred choral piece MOTET 0.000 PSALM (0.200)
18A Nicked STOLEN 1.000 -
19D Mediaeval plucked instrument LUTE 0.400 LYRE (0.400)
20A Feudal allegiance FEALTY 0.000 -
21A Sistine wall subject LAST JUDGMENT 0.000 -
Figure 10.2: Results from Times 2 Crossword Book 7, Crossword #2
Number Clue Answer % Found Other answers given
1A Practical application of science TECHNOLOGY 1.000 -
1D Lacking courage TIMID 1.000 -
2D (Of dress) conventional and sober CONSERVATIVE 1.000 -
3D Fine, subtle; pleasant NICE 1.000 -
4D Work hard LABOUR 1.000 -
5D Enterprising person GO-GETTER 1.000 -
6D Cromwell’s fighters NEW MODEL ARMY 0.600 -
7D Nov 30th this saint’s day ANDREW 0.800 -
8A Taxi MINICAB 1.000 -
9A Common; inexperienced GREEN 0.800 -
10A Dragged; finished level DREW 0.400 -
11A Shop user CUSTOMER 1.000 -
12D In the open air ALFRESCO 1.000 -
13A Auctioneer’s hammer GAVEL 1.000 -
13D A man (colloq.) GEEZER 0.800 -
14A Establish (university post) by giving funds ENDOW 1.000 -
15D Tuberous garden plant DAHLIA 0.200 BAMBOO (0.200)
16A Outside EXTERNAL 0.600 EXTERIOR (0.400)
17A Ran away FLED 0.600 -
18D Research intensively (into something) DELVE 0.600 -
19D Sleeping (archaic) ABED 0.200 -
20A Overhanging roof part EAVES 0.800 -
21A Respectful of opinions different from one's own LIBERAL 0.200 -
22A A Study in Scarlet author CONAN DOYLE 0.400 -
Figure 10.3: Results from Times 2 Crossword #3415