Descriptive ILP for Mathematical Discovery
Simon Colton
Computational Bioinformatics Lab
Department of Computing
Imperial College, London
Overview
- Inductive logic programming
- Predictive versus descriptive induction
- The HR program
- Applications to mathematics
- All-new application to vision data
- Future work
Logic Programs
- Representation language: a subset of first-order logic
- A conjunction of literals (body) implies a single literal (head)
- E.g., p(X) ← q(Y,X) ∧ r(X)
- Huge amount of theoretical work
- Many knowledge bases
- Even a programming language (Prolog)
Inductive Logic Programming
- Machine learning: improve with respect to a task
- Usually classification/prediction via concept/rule learning
- In light of experience: usually data/background knowledge
- Given: background logic programs B, positive examples E+ and negative examples E-
- Learn: a hypothesis logic program H such that (B ∧ H) ⊨ E+ and (B ∧ H) ⊭ E-
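The acceptance test for H can be illustrated with a toy sketch, using plain Python sets to stand in for logic programs; the relations q/2 and r/1 and all example names here are invented for illustration:

```python
# Toy illustration of the ILP acceptance test: with background B and
# hypothesis H "p(X) <- q(Y,X), r(X)", check that (B and H) cover every
# positive example and no negative one. All facts here are invented.
q_facts = {("a", "x"), ("b", "y")}   # B: q(Y, X) facts
r_facts = {"x", "y"}                 # B: r(X) facts

def derive_p():
    """Apply H once: p(X) holds when some q(Y, X) and r(X) hold."""
    return {x for (_, x) in q_facts if x in r_facts}

positives = {"x", "y"}   # E+: p(x), p(y)
negatives = {"z"}        # E-: p(z)

derived = derive_p()
print(positives <= derived)        # → True  (E+ entailed)
print(not (negatives & derived))   # → True  (E- not entailed)
```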
Predictive Induction
- AI went down the problem-solving path
- Everything shoe-horned into this paradigm
- Set up a problem as follows: given a set of positives and a set of negatives, learn a reason why the positives are positive and the negatives are negative
- Reasons are rulesets/concepts/mathematical formulae
- Positives-only learning: manage the same task without negatives
- Reasons allow for prediction of the class of unseen examples
- Predictive ILP programs: Progol, FOIL, …
Descriptive Induction
- Much broader remit: given the same background and data, find something interesting about the data
- The interesting thing may be: a particular example; a concept which categorises examples; a hypothesis which relates concepts; an explanation of a hypothesis
- Descriptive ILP systems: Claudien, Warmr
Descriptive versus Predictive
Predictive:“You know what you’re looking for, but you don’t know what it looks like”
Descriptive:“You don’t know what you’re looking for”
Future: “You don’t even know you’re looking for something”
- E.g., adding an entry to a database
Example – the Animals
- Classic toy problem for ILP
- Given details of a set of animals, e.g., covering, milk, homeothermic, legs
- Learn reasons for categorisation into: mammals, reptiles, birds, fish
- Descriptive induction finds the same rules, but as conjectures, not as solutions to a problem
- It also finds other things of interest, e.g., “Did you see that the duck-billed platypus is the only mammal which lays eggs?”
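The platypus observation can be mimicked with a hypothetical mini-version of the animals data: rather than solving one fixed classification problem, a descriptive system scans for near-relationships between concepts, and the exceptions it reports are themselves interesting.

```python
# Hypothetical mini-dataset (not the actual ILP benchmark): scan for a
# near-relationship "mammal implies not lays_eggs" and report the
# exceptions -- which surfaces the platypus.
animals = {
    "dog":      {"class": "mammal", "lays_eggs": False},
    "whale":    {"class": "mammal", "lays_eggs": False},
    "platypus": {"class": "mammal", "lays_eggs": True},
    "eagle":    {"class": "bird",   "lays_eggs": True},
    "trout":    {"class": "fish",   "lays_eggs": True},
}

mammals    = {a for a, f in animals.items() if f["class"] == "mammal"}
egg_layers = {a for a, f in animals.items() if f["lays_eggs"]}

# The implication is nearly true; the exceptions are the discovery.
exceptions = mammals & egg_layers
print(sorted(exceptions))  # → ['platypus']
```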
The HR Program
- Performs descriptive induction
- Developed since 1997 at Edinburgh, York and Imperial: PhD (Prolog), postdoc (Java) and beyond (projects)
- Mainly applied to mathematics, but developed as a general-purpose ML program
- Also applied to AI tasks: predictive learning, theorem proving, CSPs
Automated Theory Formation
- A theory consists of at least: examples, concepts, conjectures, proofs
- Given background knowledge (examples, concepts, axioms), e.g., groups, multiplication, the identity axiom
- Theory formation proceeds by a cycle of: concept formation; conjecture making; proof/disproof attempts; assessment of concepts
Concept Formation
- 15 production rules take old concept(s) and produce new ones
- Most PRs are generic: exists, forall, compose, negate, split, size, match, equal, disjunct, linear constraint
- A few are more domain-specific: record, arithmetic, embed graph
Example Construction – odd prime numbers
- Start: [a,b] : b|a (divisor relation)
- Size: [a,n] : n = |{b : b|a}| (number of divisors)
- Split: [a] : 2 = |{b : b|a}| (prime numbers)
- Split: [a] : 2|a (even numbers)
- Negate: [a] : ¬(2|a) (odd numbers)
- Compose: [a] : 2 = |{b : b|a}| ∧ ¬(2|a) (odd primes)
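The construction above can be sketched in a few lines, with each concept represented by its success set over 1..20; the rule names follow the slides, but the set-based representation is a simplification of HR's internals:

```python
# Sketch of HR-style production rules acting on success sets over 1..20.
N = 20
divisor = {(a, b) for a in range(1, N + 1)
           for b in range(1, a + 1) if a % b == 0}        # [a,b] : b|a

def size(pairs):
    """Size PR: [a,n] : n = |{b : b|a}|."""
    return {(a, sum(1 for (x, _) in pairs if x == a)) for (a, _) in pairs}

def split(pairs, value):
    """Split PR: instantiate the second argument to a constant."""
    return {a for (a, v) in pairs if v == value}

def negate(s):
    """Negate PR: complement within 1..N."""
    return set(range(1, N + 1)) - s

primes = split(size(divisor), 2)    # exactly two divisors
odds = negate(split(divisor, 2))    # not divisible by 2
odd_primes = primes & odds          # Compose PR as intersection
print(sorted(odd_primes))  # → [3, 5, 7, 11, 13, 17, 19]
```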
Conjecture Making
- Conjectures are made empirically, at each attempt to make a new concept:
- If the new concept has the same success set as an old one, an equivalence conjecture is made
- If the new concept has an empty success set, a non-existence conjecture is made
- If one success set is a proper subset/superset of the other, an implication conjecture is made
- More succinct results are extracted from these: implications, implicates and prime implicates
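A minimal sketch of this empirical check (our own helper, not HR's code) classifies how a new concept's success set relates to an existing one:

```python
# Classify the relation between two success sets, as in the slides.
def conjecture(new, old):
    if not new:
        return "non-existence"
    if new == old:
        return "equivalence"
    if new < old:
        return "implication: new -> old"
    if new > old:
        return "implication: old -> new"
    return None  # no conjecture; keep the concept as genuinely new

primes = {2, 3, 5, 7, 11, 13}
odd_primes = {3, 5, 7, 11, 13}

print(conjecture(odd_primes, primes))  # → implication: new -> old
print(conjecture(set(), primes))       # → non-existence
```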
Explanation Handling
- User supplies some axioms of the domain, e.g., the three axioms of group theory
- HR appeals to third-party ATP software: it uses Otter to try to prove that each conjecture follows from the axioms; if unsuccessful, it uses Mace to try to find a counterexample
- Any new (counter)example found is added to the theory
- Other reasoning software is also used (MathWeb)
Interestingness
- HR has many types of search: BFS, DFS, random, tiered, reactive, heuristic; usually with a depth (complexity) limit
- Heuristic search: measure the “interestingness” of each concept, and build new concepts from the best old ones
- Intrinsic measures: comprehensibility, applicability, variety, …
- Relative values: novelty, child/parent, interestingness of conjectures
- Utilitarian measures: coverage, highlight, invariance, discrimination, …
Handling Uncertain Data
- Given a genuinely new concept, HR tries to make conjectures opportunistically
- Near-implications: one concept’s success set is nearly a subset of another’s; the user specifies the minimum percentage match, e.g., 20%; e.g., prime numbers are odd (only one exception)
- Near-equivalences: two concepts have nearly the same success sets; often we choose to look at the positives only
- Near-non-existence: the new concept has very few examples; e.g., a number is an even prime if and only if it equals 2
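The near-implication test can be sketched as a threshold on the fraction of matching examples (the helper name and the 1..20 data are ours):

```python
# Near-implication with a user-set minimum percentage match.
def near_implication(left, right, min_match):
    """Does `left` imply `right` for at least min_match of its examples?"""
    return bool(left) and len(left & right) / len(left) >= min_match

primes = {2, 3, 5, 7, 11, 13, 17, 19}
odds = set(range(1, 21, 2))

# "primes are odd" holds for 7 of 8 primes up to 20 (87.5%).
print(near_implication(primes, odds, 0.8))  # → True
print(near_implication(primes, odds, 0.9))  # → False
```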
Mathematical Discovery
- HR has been applied to: number theory, graph theory, and algebraic domains, e.g., group theory and ring theory
- Early approaches were entirely descriptive: interesting concepts and conjectures were looked for in the theories HR formed
- Later approaches were largely driven by particular applications, for mathematical reasons and for AI reasons
Number Theory
- First big success for HR: the Encyclopedia of Integer Sequences
- Coming up to 100,000 sequences, such as primes, squares, odds, evens, Fibonacci
- HR invented 20 new sequences which are now in the encyclopedia, all supplied with reasons why they are interesting
- Some nice examples: odd refactorable numbers are perfect squares; if the sum of divisors is prime, then the number of divisors is prime
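Both conjectures can be checked empirically with a naive divisor count (our own verification script, not HR output): a refactorable number is one whose number of divisors tau(n) divides it, and the second conjecture relates the divisor sum sigma(n) to tau(n).

```python
# Empirical check of the two conjectures up to a bound.
import math

def is_prime(k):
    return k > 1 and all(k % p for p in range(2, math.isqrt(k) + 1))

def counterexamples(limit):
    bad = []
    for n in range(1, limit + 1):
        ds = [d for d in range(1, n + 1) if n % d == 0]
        tau, sigma = len(ds), sum(ds)
        # 1. Odd refactorable numbers are perfect squares.
        if n % 2 == 1 and n % tau == 0 and math.isqrt(n) ** 2 != n:
            bad.append(("odd refactorable, not a square", n))
        # 2. If sigma(n) is prime then tau(n) is prime.
        if is_prime(sigma) and not is_prime(tau):
            bad.append(("sigma prime, tau not prime", n))
    return bad

print(counterexamples(2000))  # → []
```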
Spin-off Systems
- NumbersWithNames (www.machine-creativity.com/programs/nwn): data-mines a subset of the encyclopedia; e.g., perfect numbers are pernicious
- HOMER (available online soon): an HR, Otter and Maple combination
- Maple supplies the background file, HR forms conjectures, and Otter acts as a filter (provable theorems go in the bin)
- Interacts with the user
- Demonstration
Graph Theory
- Siemion Fajtlowicz & Ermelinda Delavina: the Graffiti program and the writing on the wall
- Scores of papers have been written about its conjectures, including by Paul Erdos
- MSc project of Noor Mohamadali: use HR to re-discover Graffiti’s conjectures
- Very successful: it found all the ones we expected it to, and some others which Graffiti hasn’t found
- These are currently being proved/disproved by the AGX program (Pierre Hansen and Gilles Caporossi)
Algebraic Domains
- Otter/Mace are pretty good in these domains, so HR can be completely bootstrapping: it starts from the axioms of the domain alone
- Building of classification trees: HR used in a predictive-induction way to find classifying concepts (which distinguish pairs of algebras)
- Part of a larger system which verifies the results (IJCAR 2004)
Example Classification
Groups of size eight can be classified using their self-inverse elements (x⁻¹ = x)
“They will either have (i) all self inverse elements (ii) an element which squares to give a non-self inverse element (iii) no self-inverse elements which aren't also commutators (iv) a self inverse element which can be expressed as the product of two non-commutative elements or (v) none of these properties.”
A classification tree was also produced for loops of size 6 (109 isomorphism classes)
AI Applications
- Machine learning: used for bioinformatics datasets
- Constraint solving: reformulation of CSPs (implied constraints); e.g., QG3-quasigroups are anti-Abelian
- Automated reasoning: producing theorems for the TPTP library
- 40,000 possible theorems in 10 minutes; around 100 accepted into the library
“Proving” non-theorems
The TM System
- Inspired by Lakatos’ philosophy of mathematics
- PhD project of Alison Pease
- Starts with a non-theorem
- Uses Mace to find counterexamples and supporting examples
- Uses HR to find a concept which is true of a subset of the counterexamples or supporting examples
- Adds appropriate axioms
- Uses Otter to prove the reformulated theorem
Example from TM
- Non-theorem from the TPTP library: in ring theory, the following identity is true
- ∀w ∀x : ((w*w)*x)*(w*w) = id
- Mace found 7 supporting examples and 6 falsifying examples
- HR found a single concept true of 3 positives: ¬(∃b (b*b = b ∧ ¬(b+b = b)))
- “In rings for which, for all b, b*b=b implies that b+b=b, the identity is true”
- Proved by Otter, with a hand proof by Roy McCasland
Vision Application
- Work with Paulo Santos and Derek Magee
- Vision data: learning the rules of a dice game, in the context of autonomous agents
- Learning/adapting from observations alone
- Ask Paulo et al. about the vision side!
- First serious application of HR’s handling of noisy, incomplete data
- First quantitative comparison of HR and Progol (a predictive ILP system)
Progol Setup
- State and successor data transformed to trans/4 predicates, e.g., trans([c1],[c1,c2],t101,t103)
- Mode declarations chosen wisely: an application of a language bias
- Rules extracted from the answer to a positives-only learning problem: find a set of rules to cover the trans/4 predicate
- Also a meta-level application of Progol; see the Santos et al. ECAI paper
HR Setup
- State and successor data transformed to state0/1, state1/2 and state2/2 predicates (HR cannot handle lists)
- Various different setups tried with HR: using the match, compose, exists and split PRs; using near-implications at 50%, 70%, 100%; using different search strategies (BFS, split-first, compose-last)
- Rules extracted from the theory produced, i.e., no explicit problem to solve
Experiments
- 8 different sessions from the vision system, plus one with all the data
- Looking for 18 rules of the game: an empty state becomes a 1-state; a 1-state becomes a 2-state; 2 equal dice faces are both removed; otherwise, the smaller of the two is removed
- There are 15 instantiations of this last rule
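The removal rule can be sketched directly (assuming standard six-sided dice, which is our assumption): two equal faces are both removed, otherwise the smaller face is removed, and the 15 instantiations correspond to the 15 unordered pairs of distinct faces.

```python
# Sketch of the dice-game removal rule and a count of its
# instantiations over six assumed faces (helper names are ours).
from itertools import combinations

def survivors(a, b):
    """Faces left after reducing a 2-state [a, b]."""
    return [] if a == b else [max(a, b)]

distinct_pairs = list(combinations(range(1, 7), 2))
print(len(distinct_pairs))   # → 15
print(survivors(4, 4))       # → []
print(survivors(2, 5))       # → [5]
```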
- Recorded how many rules each system found
- Also recorded the coverage of the 15 rules found in the data
Sensitivity Results
HR Results
- First application of HR to uncertain data; we wanted to assess its performance
- Altering parameters affected sensitivity in entirely expected ways
- Increasing these increases sensitivity: the number of steps; the complexity limit (but not by too much)
- Decreasing the percentage match increases sensitivity
- HR was able to pick up conjectures which are only 65% true
- Altering the search strategy had little effect
Drawbacks to the Approaches
Predictive approach:
- The problem may be solved without finding some game rules, due to over-generalisation with little data
- Need to know the mode declarations in advance
- In general, the rules will require the solution of many different prediction problems
Descriptive approach:
- Can take a long time to find rules (10x slower)
- In general, rules will not be so simple to find
- Can produce a glut of information from which it’s difficult to extract the pertinent rules
Descriptive is best in theory, predictive slightly better in practice (at least for this data set, but we’ll see…)
Proposed Combination
Can we get the best of both worlds? Idea:
- Use HR for an initial theory-formation exploration
- Look at some rules found and extract a general format for them
- Turn the general rules into mode declarations
- Use Progol to solve the individual problems
- In practice, HR may take too long to work, but the process would be entirely automatic
Conclusions
- HR has shown good abilities for discovery in pure mathematics and in AI applications, which are characterised by a lack of noise and concrete axioms
- It is not yet clear that the application to discovery in noisy, incomplete data sets will be as good
Future Work
- Applying for grants: to continue the application to CSP reformulation (with Ian Miguel); to continue the Lakatos approach, possibly with applications to software verification (with Alison Pease)
- My focus is on the integration of learning, constraint solving and automated reasoning
- Application to creative tasks in mathematics, bioinformatics and vision!