1 Combining KR and search: Crossword puzzles Next: Logic representations Reading: C. 7.4-7.8.

29
1 Combining KR and search: Combining KR and search: Crossword puzzles Crossword puzzles Next: Logic representations Reading: C. 7.4-7.8
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    229
  • download

    1

Transcript of 1 Combining KR and search: Crossword puzzles Next: Logic representations Reading: C. 7.4-7.8.

1

Combining KR and search: Combining KR and search: Crossword puzzlesCrossword puzzles

Next: Logic representations

Reading: C. 7.4-7.8

2

Changes in HomeworkChanges in Homework

Mar 4th: Hand in written design, planned code for all modules

Mar 9th: midterm Mar 25th: Fully running system due Mar 30th: Tournament begins

3

Changes in HomeworkChanges in Homework

Dictionary Use dictionary provided; do not use your own Start with 300 words only Switch to larger set by time of tournament

Representation of dictionary is important to reducing search time

Using knowledge to generate word candidates could also help

4

Midterm SurveyMidterm Survey

Start after 9AM Friday and finish by Thursday, Mar. 4th

Your answers are important: they will affect remaining class structure

5

Crossword Puzzle SolverCrossword Puzzle Solver

Proverb: Michael Litman, Duke Univ Developed by his AI class

Combines knowledge from multiple sources to solve clues (clue/target)

Uses constraint propogation in combination with probabilities to select best target

6

Algorithm OverviewAlgorithm Overview

Independent programs specialize in different types of clues – knowledge experts

Information retrieval, database search, machine learning

Each expert module generates a candidate list (with probabilities)

Centralized solver Merges the candidates lists for each clue Places candidates on the puzzle grid

7

PerformancePerformance

Averages 95.3% words correct and 98.1% letters correct

Under 15 minutes/puzzle Tested on a sample of 370 NYT puzzles Misses roughly 3 words or 4 letters on a

daily 15X15 puzzle

8

QuestionsQuestions

Is this approach any more intelligent than the chess playing programs?

Does the use of knowledge correspond to intelligence?

Do any of the techniques for generating words apply to Scrabble?

9

10

To begin: research styleTo begin: research style

Study of existing puzzles How hard? What are the clues like? What sources of knowledge might be helpful?

Crossword Puzzle database (CWDB) 350,000 clue-target pairs >250,000 unique pairs = # of puzzles seen over 14 years at rate of

one puzzle/day

11

How novel are crossword How novel are crossword puzzles? puzzles?

Given complete database and a new puzzle, expect to have seen

91% of targets 50% of clues 34% of clue target pairs 96% of individual words in clues

12

13

Categories of cluesCategories of clues

Fill in the blank: 28D: Nothing ____: less

Trailing question mark 4D: The end of Plato?:

Abbreviations 55D: Key abbr: maj

14

Expert CategoriesExpert Categories

Synonyms 40D Meadowsweet: spiraea

Kind-of 27D Kind of coal or coat: pea “pea coal” and “pea coat” standard phrases

Movies 50D Princess in Woolf’s “Orlando”: sasha

Geography 59A North Sea port: aberdeen

Music 2D “Hold Me” country Grammay winner, 1988: oslin

Literature 53A Playwright/novelist Capek: karel

Information retrieval 6D Mountain known locally as Chomolungma: everest

15

16

17

18

Candidate generatorCandidate generator

Farrow of “Peyton Place”: mia

Movie module returns: 0.909091 mia 0.010101 tom 0.010101 kip 0.010101 ben 0.010101 peg 0.010101 ray

19

20

21

Ablation testsAblation tests

Removed each module one at a time, rerunning all training puzzles

No single module changed overall percent correct by more than 1%

Removing all modules that relied on CWDB 94.8% to 27.1% correct

Using only the modules that relied exclusively on CWDB

87.6% correct

22

Word list modulesWord list modules

WordList, WordListBig Ignore their clues and return all words of correct length WordList

655,000 terms WordListBig

WordList plus constructed terms First and last names, adjacent words from clues 2.1 million terms, all weighted equally

5D 10,000 words, perhaps: novelette Wordlist-CWDB

58,000 unique targets Returns all targets of appropriate length Weights with estimates of their “prior” probabilities as targets of

arbitrary clues Examine frequency in crossword puzzles and normalize to account for bias

caused by letters intersecting across and down terms

23

CWDB-specific modulesCWDB-specific modules

Exact Match Returns all targets of the correct length associated with the clue Example error: it returns eeyore for 19A Pal of Pooh: tigger

Transformations Learns transformations to clue-target pairs Single-word substitution, remove one phrase from beginning or end and

add another, depluralizing a word in clue, pluralize word in target Nice X <-> X in France X for short <-> X abbr. X start <-> Prefix with X X city <-> X capital 51D: Bugs chaser: elmer, solved by Bugs pursuer: elmer and the

transformation rule X pursuer <-> X chaser http://www.oneacross.com

24

Information retrieval modulesInformation retrieval modules

Encyclopedia For each query term, compute distribution of terms

“close” to query Counted 10-k times every times it apears at a distance of

k<10 from query term Extremely common terms (as, and) are ignored

Partial match For a clue c, find all clues in CWDB that share words For each such clue, give its target a weight

LSI-Ency, LSI-CWDB Latent semantic indexing (LSI) identifies correlations

between words: synonyms Return closest word for each word in the clue

25

Database ModulesDatabase Modules

Movie www.imdb.com Looks for patterns in the clue and formulates query to database

Quoted titles: 56D “The Thief of Baghdad” role: abu Boolean operations: Cary or Lee: grant

Music, literary, geography Simple pattern matching of clue (keywords “city”, “author”,

“band”, etc) to formulate query 15A “Foundation of Trilogy” author: asimov Geography database: Getty Information Institute

26

SynonymsSynonyms

WordNet Look for root forms of words in the clue Then find variety of related words

49D Chop-chop: apace

Synonyms of synonyms Forms of related words converted to forms of

clue word (number, tense) 18A Stymied: thwarted Is this relevant to Scrabble?

27

Syntactic ModulesSyntactic Modules

Fill-in-the-blanks >5% clues Search databases (music, geography, literary and

quotes) to find clue patterns 36A Yerby’s “A Rose for _ _ _ Maria”: ana

Pattern: for _ _ _ Maria Allow any 3 characters to fill the blanks

Kindof Pattern matching over short phrases 50 clues of this type

“type of” (A type of jacket: nehru) “starter for” (Starter for saxon: anglo) “suffix with” (Suffix with switch or sock: eroo

28

Implicit Distribution ModulesImplicit Distribution Modules

Some targets not included in any database, but more probable than random

Schaeffer vs. srhffeeca Bigram module

Generates all possible letter sequences of the given length by returning a letter bigram distribution over all possible strings, learned from CWDB

Lowest probability clue-target, but higher probability than random sequence of letters

Honolulu wear: hawaiianmuumuu

How could this be used for Scrabble?

29

QuestionsQuestions

Is this approach any more intelligent than the chess playing programs?

Does the use of knowledge correspond to intelligence?

Do any of the techniques for generating words apply to Scrabble?