Descriptive ILP for Mathematical Discovery
Simon Colton
Computational Bioinformatics Lab
Department of Computing
Imperial College, London
Overview
- Inductive logic programming
- Predictive versus descriptive induction
- The HR program
- Applications to mathematics
- All-new application to vision data
- Future work
Logic Programs
- Representation language: a subset of first-order logic
- A conjunction of literals (body) implies a single literal (head)
- E.g., p(X) ← q(Y,X) ∧ r(X)
- Huge amount of theoretical work
- Many knowledge bases
- Even a programming language (Prolog)
Inductive Logic Programming
- Machine learning: improve with respect to a task
- Usually classification/prediction via concept/rule learning
- In light of experience: usually data/background knowledge
- Given: background logic programs B, positive examples E+ and negative examples E-
- Learn: a hypothesis logic program H such that (B ∧ H) ⊨ E+ and (B ∧ H) ⊭ E-
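The acceptance test for H can be illustrated with a toy sketch, using plain Python sets to stand in for logic programs; the relations q/2 and r/1 and all example names here are invented for illustration:

```python
# Toy illustration of the ILP acceptance test: with background B and
# hypothesis H "p(X) <- q(Y,X), r(X)", check that (B and H) cover every
# positive example and no negative one. All facts here are invented.
q_facts = {("a", "x"), ("b", "y")}   # B: q(Y, X) facts
r_facts = {"x", "y"}                 # B: r(X) facts

def derive_p():
    """Apply H once: p(X) holds when some q(Y, X) and r(X) hold."""
    return {x for (_, x) in q_facts if x in r_facts}

positives = {"x", "y"}   # E+: p(x), p(y)
negatives = {"z"}        # E-: p(z)

derived = derive_p()
print(positives <= derived)        # → True  (E+ entailed)
print(not (negatives & derived))   # → True  (E- not entailed)
```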
Predictive Induction
- AI went down the problem-solving path
- Everything shoe-horned into this paradigm
- Set up a problem as follows: given a set of positives and a set of negatives, learn a reason why the positives are positive and the negatives are negative
- Reasons are rulesets/concepts/mathematical formulae
- Positives-only learning: manage the same task without negatives
- Reasons allow for prediction of the class of unseen examples
- Predictive ILP programs: Progol, FOIL, …
Descriptive Induction
- Much broader remit: given the same background and data, find something interesting about the data
- The interesting thing may be: a particular example; a concept which categorises examples; a hypothesis which relates concepts; an explanation of a hypothesis
- Descriptive ILP systems: Claudien, Warmr
Descriptive versus Predictive
Predictive:“You know what you’re looking for, but you don’t know what it looks like”
Descriptive:“You don’t know what you’re looking for”
Future: “You don’t even know you’re looking for something”
- E.g., adding an entry to a database
Example – the Animals
- Classic toy problem for ILP
- Given details of a set of animals, e.g., covering, milk, homeothermic, legs
- Learn reasons for categorisation into: mammals, reptiles, birds, fish
- Descriptive induction finds the same rules, but as conjectures, not as solutions to a problem
- It also finds other things of interest, e.g., “Did you see that the duck-billed platypus is the only mammal which lays eggs?”
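The platypus observation can be mimicked with a hypothetical mini-version of the animals data: rather than solving one fixed classification problem, a descriptive system scans for near-relationships between concepts, and the exceptions it reports are themselves interesting.

```python
# Hypothetical mini-dataset (not the actual ILP benchmark): scan for a
# near-relationship "mammal implies not lays_eggs" and report the
# exceptions -- which surfaces the platypus.
animals = {
    "dog":      {"class": "mammal", "lays_eggs": False},
    "whale":    {"class": "mammal", "lays_eggs": False},
    "platypus": {"class": "mammal", "lays_eggs": True},
    "eagle":    {"class": "bird",   "lays_eggs": True},
    "trout":    {"class": "fish",   "lays_eggs": True},
}

mammals    = {a for a, f in animals.items() if f["class"] == "mammal"}
egg_layers = {a for a, f in animals.items() if f["lays_eggs"]}

# The implication is nearly true; the exceptions are the discovery.
exceptions = mammals & egg_layers
print(sorted(exceptions))  # → ['platypus']
```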
The HR Program
- Performs descriptive induction
- Developed since 1997 at Edinburgh, York and Imperial: PhD (Prolog), postdoc (Java) and beyond (projects)
- Mainly applied to mathematics, but developed as a general-purpose ML program
- Also applied to AI tasks: predictive learning, theorem proving, CSPs
Automated Theory Formation
- A theory consists of at least: examples, concepts, conjectures, proofs
- Given background knowledge (examples, concepts, axioms), e.g., groups, multiplication, the identity axiom
- Theory formation proceeds by a cycle of: concept formation; conjecture making; proof/disproof attempts; assessment of concepts
Concept Formation
- 15 production rules take old concept(s) and produce new ones
- Most PRs are generic: exists, forall, compose, negate, split, size, match, equal, disjunct, linear constraint
- A few are more domain-specific: record, arithmetic, embed graph
Example Construction – odd prime numbers
- Start: [a,b] : b|a (divisor relation)
- Size: [a,n] : n = |{b : b|a}| (number of divisors)
- Split: [a] : 2 = |{b : b|a}| (prime numbers)
- Split: [a] : 2|a (even numbers)
- Negate: [a] : ¬(2|a) (odd numbers)
- Compose: [a] : 2 = |{b : b|a}| ∧ ¬(2|a) (odd primes)
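The construction above can be sketched in a few lines, with each concept represented by its success set over 1..20; the rule names follow the slides, but the set-based representation is a simplification of HR's internals:

```python
# Sketch of HR-style production rules acting on success sets over 1..20.
N = 20
divisor = {(a, b) for a in range(1, N + 1)
           for b in range(1, a + 1) if a % b == 0}        # [a,b] : b|a

def size(pairs):
    """Size PR: [a,n] : n = |{b : b|a}|."""
    return {(a, sum(1 for (x, _) in pairs if x == a)) for (a, _) in pairs}

def split(pairs, value):
    """Split PR: instantiate the second argument to a constant."""
    return {a for (a, v) in pairs if v == value}

def negate(s):
    """Negate PR: complement within 1..N."""
    return set(range(1, N + 1)) - s

primes = split(size(divisor), 2)    # exactly two divisors
odds = negate(split(divisor, 2))    # not divisible by 2
odd_primes = primes & odds          # Compose PR as intersection
print(sorted(odd_primes))  # → [3, 5, 7, 11, 13, 17, 19]
```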
Conjecture Making
- Conjectures are made empirically, at each attempt to make a new concept:
- If the new concept has the same success set as an old one, an equivalence conjecture is made
- If the new concept has an empty success set, a non-existence conjecture is made
- If one success set is a proper subset/superset of the other, an implication conjecture is made
- More succinct results are extracted from these: implications, implicates and prime implicates
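A minimal sketch of this empirical check (our own helper, not HR's code) classifies how a new concept's success set relates to an existing one:

```python
# Classify the relation between two success sets, as in the slides.
def conjecture(new, old):
    if not new:
        return "non-existence"
    if new == old:
        return "equivalence"
    if new < old:
        return "implication: new -> old"
    if new > old:
        return "implication: old -> new"
    return None  # no conjecture; keep the concept as genuinely new

primes = {2, 3, 5, 7, 11, 13}
odd_primes = {3, 5, 7, 11, 13}

print(conjecture(odd_primes, primes))  # → implication: new -> old
print(conjecture(set(), primes))       # → non-existence
```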
Explanation Handling
- User supplies some axioms of the domain, e.g., the three axioms of group theory
- HR appeals to third-party ATP software: it uses Otter to try to prove that each conjecture follows from the axioms; if unsuccessful, it uses Mace to try to find a counterexample
- Any new (counter)example found is added to the theory
- Other reasoning software is also used (MathWeb)
Interestingness
- HR has many types of search: BFS, DFS, random, tiered, reactive, heuristic; usually with a depth (complexity) limit
- Heuristic search: measure the “interestingness” of each concept, and build new concepts from the best old ones
- Intrinsic measures: comprehensibility, applicability, variety, …
- Relative values: novelty, child/parent, interestingness of conjectures
- Utilitarian measures: coverage, highlight, invariance, discrimination, …
Handling Uncertain Data
- Given a genuinely new concept, HR tries to make conjectures opportunistically
- Near-implications: one concept’s success set is nearly a subset of another’s; the user specifies the minimum percentage match, e.g., 20%; e.g., prime numbers are odd (only one exception)
- Near-equivalences: two concepts have nearly the same success sets; often we choose to look at the positives only
- Near-non-existence: the new concept has very few examples; e.g., a number is an even prime if and only if it equals 2
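The near-implication test can be sketched as a threshold on the fraction of matching examples (the helper name and the 1..20 data are ours):

```python
# Near-implication with a user-set minimum percentage match.
def near_implication(left, right, min_match):
    """Does `left` imply `right` for at least min_match of its examples?"""
    return bool(left) and len(left & right) / len(left) >= min_match

primes = {2, 3, 5, 7, 11, 13, 17, 19}
odds = set(range(1, 21, 2))

# "primes are odd" holds for 7 of 8 primes up to 20 (87.5%).
print(near_implication(primes, odds, 0.8))  # → True
print(near_implication(primes, odds, 0.9))  # → False
```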
Mathematical Discovery
- HR has been applied to: number theory, graph theory, and algebraic domains, e.g., group theory and ring theory
- Early approaches were entirely descriptive: interesting concepts and conjectures were looked for in the theories HR formed
- Later approaches were largely driven by particular applications, for mathematical reasons and for AI reasons
Number Theory
- First big success for HR: the Encyclopedia of Integer Sequences
- Coming up to 100,000 sequences, such as primes, squares, odds, evens, Fibonacci
- HR invented 20 new sequences which are now in the encyclopedia, all supplied with reasons why they are interesting
- Some nice examples: odd refactorable numbers are perfect squares; if the sum of divisors is prime, then the number of divisors is prime
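Both conjectures can be checked empirically with a naive divisor count (our own verification script, not HR output): a refactorable number is one whose number of divisors tau(n) divides it, and the second conjecture relates the divisor sum sigma(n) to tau(n).

```python
# Empirical check of the two conjectures up to a bound.
import math

def is_prime(k):
    return k > 1 and all(k % p for p in range(2, math.isqrt(k) + 1))

def counterexamples(limit):
    bad = []
    for n in range(1, limit + 1):
        ds = [d for d in range(1, n + 1) if n % d == 0]
        tau, sigma = len(ds), sum(ds)
        # 1. Odd refactorable numbers are perfect squares.
        if n % 2 == 1 and n % tau == 0 and math.isqrt(n) ** 2 != n:
            bad.append(("odd refactorable, not a square", n))
        # 2. If sigma(n) is prime then tau(n) is prime.
        if is_prime(sigma) and not is_prime(tau):
            bad.append(("sigma prime, tau not prime", n))
    return bad

print(counterexamples(2000))  # → []
```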
Spin-off Systems
- NumbersWithNames (www.machine-creativity.com/programs/nwn): data-mines a subset of the encyclopedia; e.g., perfect numbers are pernicious
- HOMER (available online soon): an HR, Otter and Maple combination
- Maple supplies the background file, HR forms conjectures, and Otter acts as a filter (provable theorems go in the bin)
- Interacts with the user
- Demonstration
Graph Theory
- Siemion Fajtlowicz & Ermelinda Delavina: the Graffiti program and the writing on the wall
- Scores of papers have been written about its conjectures, including by Paul Erdos
- MSc project of Noor Mohamadali: use HR to re-discover Graffiti’s conjectures
- Very successful: it found all the ones we expected it to, and some others which Graffiti hasn’t found
- These are currently being proved/disproved by the AGX program (Pierre Hansen and Gilles Caporossi)
Algebraic Domains
- Otter/Mace are pretty good in these domains, so HR can be completely bootstrapping: it starts from the axioms of the domain alone
- Building of classification trees: HR used in a predictive-induction way to find classifying concepts (which distinguish pairs of algebras)
- Part of a larger system which verifies the results (IJCAR 2004)
Example Classification
Groups of size eight can be classified using their self-inverse elements (x⁻¹ = x)
“They will either have (i) all self inverse elements (ii) an element which squares to give a non-self inverse element (iii) no self-inverse elements which aren't also commutators (iv) a self inverse element which can be expressed as the product of two non-commutative elements or (v) none of these properties.”
A classification tree was also produced for loops of size 6 (109 isomorphism classes)
AI Applications
- Machine learning: used for bioinformatics datasets
- Constraint solving: reformulation of CSPs (implied constraints); e.g., QG3-quasigroups are anti-Abelian
- Automated reasoning: producing theorems for the TPTP library
- 40,000 possible theorems in 10 minutes; around 100 accepted into the library
“Proving” non-theorems
The TM System
- Inspired by Lakatos’ philosophy of mathematics
- PhD project of Alison Pease
- Starts with a non-theorem
- Uses Mace to find counterexamples and supporting examples
- Uses HR to find a concept which is true of a subset of the counterexamples or supporting examples
- Adds appropriate axioms
- Uses Otter to prove the reformulated theorem
Example from TM
- Non-theorem from the TPTP library: in ring theory, the following identity is true
- ∀w ∀x : ((w*w)*x)*(w*w) = id
- Mace found 7 supporting examples and 6 falsifying examples
- HR found a single concept true of 3 positives: ¬(∃b (b*b = b ∧ ¬(b+b = b)))
- “In rings for which, for all b, b*b=b implies that b+b=b, the identity is true”
- Proved by Otter, with a hand proof by Roy McCasland
Vision Application
- Work with Paulo Santos and Derek Magee
- Vision data: learning the rules of a dice game, in the context of autonomous agents
- Learning/adapting from observations alone
- Ask Paulo et al. about the vision side!
- First serious application of HR’s handling of noisy, incomplete data
- First quantitative comparison of HR and Progol (a predictive ILP system)
Progol Setup
- State and successor data transformed to trans/4 predicates, e.g., trans([c1],[c1,c2],t101,t103)
- Mode declarations chosen wisely: an application of a language bias
- Rules extracted from the answer to a positives-only learning problem: find a set of rules to cover the trans/4 predicate
- Also a meta-level application of Progol; see the Santos et al. ECAI paper
HR Setup
- State and successor data transformed to state0/1, state1/2 and state2/2 predicates (HR cannot handle lists)
- Various different setups tried with HR: using the match, compose, exists and split PRs; using near-implications at 50%, 70%, 100%; using different search strategies (BFS, split-first, compose-last)
- Rules extracted from the theory produced, i.e., no explicit problem to solve
Experiments
- 8 different sessions from the vision system, plus one with all the data
- Looking for 18 rules of the game: an empty state becomes a 1-state; a 1-state becomes a 2-state; 2 equal dice faces are both removed; otherwise, the smaller of the two is removed
- There are 15 instantiations of this last rule
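The removal rule can be sketched directly (assuming standard six-sided dice, which is our assumption): two equal faces are both removed, otherwise the smaller face is removed, and the 15 instantiations correspond to the 15 unordered pairs of distinct faces.

```python
# Sketch of the dice-game removal rule and a count of its
# instantiations over six assumed faces (helper names are ours).
from itertools import combinations

def survivors(a, b):
    """Faces left after reducing a 2-state [a, b]."""
    return [] if a == b else [max(a, b)]

distinct_pairs = list(combinations(range(1, 7), 2))
print(len(distinct_pairs))   # → 15
print(survivors(4, 4))       # → []
print(survivors(2, 5))       # → [5]
```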
- Recorded how many rules each system found
- Also recorded the coverage of the 15 rules found in the data
Sensitivity Results
HR Results
- First application of HR to uncertain data; we wanted to assess its performance
- Altering parameters affected sensitivity in entirely expected ways
- Increasing these increases sensitivity: the number of steps; the complexity limit (but not by too much)
- Decreasing the percentage match increases sensitivity
- HR was able to pick up conjectures which are only 65% true
- Altering the search strategy had little effect
Drawbacks to the Approaches
Predictive approach:
- The problem may be solved without finding some game rules, due to over-generalisation with little data
- Need to know the mode declarations in advance
- In general, the rules will require the solution of many different prediction problems
Descriptive approach:
- Can take a long time to find rules (10x slower)
- In general, rules will not be so simple to find
- Can produce a glut of information from which it’s difficult to extract the pertinent rules
Descriptive is best in theory, predictive slightly better in practice (at least for this data set, but we’ll see…)
Proposed Combination
Can we get the best of both worlds? Idea:
- Use HR for an initial theory-formation exploration
- Look at some rules found and extract a general format for them
- Turn the general rules into mode declarations
- Use Progol to solve the individual problems
- In practice, HR may take too long to work, but the process would be entirely automatic
Conclusions
- HR has shown good abilities for discovery in pure mathematics and in AI applications, which are characterised by a lack of noise and concrete axioms
- It is not yet clear that the application to discovery in noisy, incomplete data sets will be as good
Future Work
- Applying for grants: to continue the application to CSP reformulation (with Ian Miguel); to continue the Lakatos approach, possibly with applications to software verification (with Alison Pease)
- My focus is on the integration of learning, constraint solving and automated reasoning
- Application to creative tasks in mathematics, bioinformatics and vision!