
Identifying Empirical Laws

Martim Maria Mathias Cortez de Lobão

Thesis to obtain the Master of Science Degree in

Mathematics and Applications

Supervisor: Prof. José Félix Gomes da Costa

Examination Committee

Chairperson: Prof. Maria Cristina Sales Viana Serôdio Sernadas

Supervisor: Prof. José Félix Gomes da Costa

Members of the Committee: Prof. André Nuno Carvalho Souto

November 2016

To Carolina

Acknowledgments

I would like to thank my supervisor, Professor José Félix Costa, for his invaluable advice, unfailing availability, and infinite patience. This thesis would not have been possible without his guidance.

To Manel, for finding my motivation when I could not.

To my family, in particular to my parents and my grandmother, for their unconditional support.

And to Carolina, whose love and kindness have been my safe haven.


Resumo

Learning theory makes it possible to study the theoretical limits of the inductive inference of computable functions and sets. It is based on functions called scientists, which receive as input a sequence of observations (either points of the graph of a function or elements of a set) and return conjectures about the object to which those observations belong. A scientist succeeds in identifying the object if there is a point from which it stabilizes on a correct conjecture.

In this dissertation, we give a brief introduction to the fundamental concepts of learning theory, including different types of learning environments (text, fat text, imperfect text, and informant) and different restrictions on the identification power of scientists (noncomputable, computable, and memory-limited scientists). We establish the relation between the identification capacity of several kinds of scientist in different learning environments, and we give examples of classes of functions and sets in each of these categories. New methods were also created to help verify whether or not a given class is identifiable: the Markov scientist for memory-limited identification and limit sets for noncomputable identification, developed on the basis of the work of, and jointly with, Professor José Félix Costa.

Finally, we develop a proposal for the concept of empirical identification, so that the results obtained earlier can be applied to practical situations. This approach is inspired by the computable physical models defined in Szudzik [26]. The concept of discoverable function is also created, using primitive recursive functions.

Keywords: scientist, identification of sets and functions, identification in the limit, inductive inference, recursive function, discoverable function.


Abstract

Learning theory allows the study of the theoretical limits of the inductive inference of computable sets and functions. It is based on functions called scientists, which receive as input a sequence of observations (either points of the graph of a function or elements of a set) and return conjectures about the object to which those observations belong. A scientist is successful in identifying the object if there is a point at which it stabilizes on a correct conjecture.

In this work, we give a brief introduction to learning theory, including different types of learning environments (text, fat text, imperfect text, and informant) and different restrictions on the identificational power of scientists (noncomputable, computable, and memory-limited scientists). We establish the relation between the identification capacity of several kinds of scientist and learning environment, and we give examples of classes of sets and functions in each category. We also create new methods to aid in verifying whether or not a given class is identifiable: Markov scientists for memory-limited identification and limit sets for noncomputable identification, developed based on the work of, and in partnership with, Professor José Félix Costa.

Finally, we propose a concept of empirical identification so that the previously obtained results may be applicable in practice. This approach is inspired by the computable physical models defined in Szudzik [26]. We also create the concept of discoverable function using primitive recursive functions.

Keywords: scientist, set and function identification, identification in the limit, inductive inference, recursive function, discoverable function.


Contents

List of Figures
List of Algorithms
Notation and Abbreviations
Glossary
Introduction
1 Introduction to inductive inference
    1.1 Preliminaries on computability
    1.2 Identification
2 Identification of sets and functions
    2.1 Main results
        2.1.1 Locking sequences
        2.1.2 Angluin's Theorem
        2.1.3 Limit sets
    2.2 Identification by general scientist
        2.2.1 In sets
        2.2.2 In functions
        2.2.3 Comparison of collections of classes for general scientists
    2.3 Identification by computable scientist
        2.3.1 In sets
        2.3.2 In functions
        2.3.3 Comparison of collections of classes for computable scientists
    2.4 Identification by memory-limited scientist
        2.4.1 In sets
        2.4.2 In functions
        2.4.3 Comparison of collections of classes for memory-limited scientists
3 Identifying scientific laws
    3.1 The nature of empirical laws
    3.2 Empirical identification of the class of discoverable functions
    3.3 An example from the history of science: planetary orbits
        3.3.1 Orbits on the complex plane
4 Conclusion
5 Additional proofs
    5.1 An explicit bijection from $\mathbb{N}$ to $\mathbb{Q}$
    5.2 Locking sequence theorems for functions and for other learning environments
    5.3 A nondecidable, recursively enumerable set
    5.4 Properties of set representations
    5.5 Equivalence of Ptolemaic and Copernican orbits
References
Index

List of Figures

1 The cyclical nature of the scientific method
2 The recognition of patterns plays a central role in science
1.1 Obtaining the encoded function $r_g$ from the real function $f$
1.2 Obtaining indexes from sets and functions and vice versa
2.1 Two valid situations for a set $L' \not\subset L$, where $D_L \subseteq L$ and $D_L \subseteq L'$, such that Angluin's condition holds for the class
2.2 The hierarchy of sets and Angluin sets in the class $C^{-} = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : j \leq i\}$
2.3 The hierarchy of sets and Angluin sets in the class $C = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : i \leq j \leq 2i\}$
2.4 Two sets $L$ and $L'$ where $F = L - L'$ is finite and $I = L' - L$ is infinite
2.5 Collections of classes of sets, categorized by general scientists
2.6 Collections of classes of functions, categorized by general scientists
2.7 Collections of classes of sets, categorized by computable scientists
2.8 Collections of classes of functions, categorized by computable scientists
2.9 The memoryless property of memory-limited scientist M
2.10 Collections of classes of sets, categorized by memory-limited scientists
2.11 Collections of classes of functions, categorized by memory-limited scientists
3.1 A primitive real recursive function $\gamma$ provides a way of computing the image $y$ of a discoverable function $f$ to an arbitrary precision, where $d'_y(n) = (-1)^{(d_y(0)+1)} \times \frac{d_y(n)}{10^{n-1}}$
3.2 Measurements of an empirical text
3.3 Path of the planet Mars in apparent retrograde motion between June and November of 2003
3.4 The orbits of the Sun, Mercury, and Venus constructed using a deferent and multiple epicycles
3.5 Diagram of orbits of Copernicus and Ptolemy with one epicycle, with the Sun S, the Earth E and the planet P
3.6 A triangle-like orbit obtained using a deferent and a single epicycle
3.7 An approximation of a drawing of Homer Simpson using 1000 epicycles

List of Algorithms

1.1 Scientist M identifies text T
1.2 Scientist N identifies the set S
1.3 Scientist F identifies the class of sets C
2.1 Construction of prefix $\sigma^{i+1}$ of the text T for a set S
2.2 Scientist M identifies the set $\mathbb{N}$
2.3 Scientist M does not identify the set $\mathbb{N}$
2.4 Construction of prefix $\tau^{i+1}$ of the text T for a set S
2.5 Scientist M identifies the class of sets C using Angluin's condition
2.6 Scientist M identifies the class of sets $N_i = \{\mathbb{N} - \{i\} : i \in \mathbb{N}\}$
2.7 Scientist M identifies the class of sets $C = \mathrm{FIN} \cup \{\mathbb{N}\}$ in informant
2.8 Scientist M identifies the class E of all recursively enumerable sets in informant
2.9 Scientist M identifies an infinite variant class of sets in imperfect text
2.10 Scientist M identifies the class $\mathcal{R}$ of all recursive functions in text
2.11 Scientist M identifies the class POLY of polynomials in imperfect text
2.12 Computable scientist M identifies the class of sets PMULT in imperfect text
2.13 Computable scientist N identifies the class of sets S in text
2.14 Computable scientist M identifies the class of sets AEZ in text
2.15 $f$ is a partial recursive function of type $\mathbb{N} \times \mathbb{N} \to \mathbb{N}$
2.16 Scientist M identifies the class PR of all primitive recursive functions in text
2.17 Computable scientist N identifies the class of functions S in informant
2.18 Function $g$ constructs a prefix $\tau$ of informant from a prefix $\sigma$ of canonical text
2.19 Computable scientist M identifies the class of functions S in canonical text
3.1 Empirical scientist G identifies the class of discoverable functions in empirical text
5.1 Construction of prefix $\tau^{i+1}$ of the text T for a function $f$
5.2 Construction of prefix $\tau^{i+1}$ of the imperfect text T for a set S
5.3 Construction of prefix $\tau^{i+1}$ of the informant T for a set S

Notation and Abbreviations

$\mathbb{N}$: the set of natural numbers
$\emptyset$: the empty set
$\subset$: denotes the relation between two sets in which the former is strictly a subset of and not equal to the latter (synonymous with the more explicit $\subsetneq$)
$\subseteq$: denotes the relation between two sets in which the former is a subset of or equal to the latter
$\lambda x\,.\,y$: the function that assigns the value of $y$ to the variable $x$
$\chi_S$: the characteristic function of the set $S$, where $\chi_S(x) = 1$ if $x \in S$ and $\chi_S(x) = 0$ otherwise
$(x_1, \ldots, x_n)$: the ordered $n$-tuple, a point of $\mathbb{N}^n$
$\langle x_1, \ldots, x_n \rangle$: a non-negative integer which represents the $n$-tuple $(x_1, \ldots, x_n) \in \mathbb{N}^n$, such that $\lambda (x_1, \ldots, x_n)\,.\,\langle x_1, \ldots, x_n \rangle$ is a bijection from $\mathbb{N}^n$ to $\mathbb{N}$
$P$: the set of all primitive recursive numbers
$\mathcal{F}$: the class of all functions of type $\mathbb{N} \to \mathbb{N}$
$\mathcal{P}$: the class of all (possibly) partial recursive functions of type $\mathbb{N} \to \mathbb{N}$
$\mathcal{R}$: the class of all total recursive functions of type $\mathbb{N} \to \mathbb{N}$
DS: the class of all discoverable functions
PR: the class of all primitive recursive functions of type $\mathbb{N} \to \mathbb{N}$
S: the class of subsets of $\mathbb{N}$
D: the class of decidable subsets of $\mathbb{N}$
E: the class of recursively enumerable subsets of $\mathbb{N}$
$\Psi(f)$: the set representation of the function $f$
$W_i$: the domain of the partial recursive function $\varphi_i$
$\diamond$: the concatenation symbol, used to concatenate prefixes
#: the blank symbol
$T(n)$: the $(n+1)$-th element of a text $T$
$T[n]$: the first $n$ elements of a text $T$
text: the set of all texts
fat text: the set of all fat texts
imp. text: the set of all imperfect texts
informant: the set of all informants
emp. text: the set of all empirical texts
SEQ: the set of all prefixes of text for sets, $\{T[n] : T \text{ is a text for a set and } n \in \mathbb{N}\}$
SEG: the set of all prefixes of text for functions, $\{T[n] : T \text{ is a text for a function and } n \in \mathbb{N}\}$
ISEQ: the set of all prefixes of informant for sets
ISEG: the set of all prefixes of informant for functions
ESEG: the set of all prefixes of empirical text
PREF: either the set SEQ or SEG, inferred from context
$|\sigma|$: the length of prefix $\sigma$, i.e. the number of elements in $\sigma$, including blanks and repetitions
$\sigma_n$: the $(n+1)$-th element of prefix $\sigma$
$\sigma_{last}$: the last element of prefix $\sigma$
$\sigma[n]$: the first $n$ elements of prefix $\sigma$, or the last $-n$ elements if $n$ is negative
$\sigma^{-}$: the prefix $\sigma$ with its last element removed
$\hat{\sigma}$: when $\sigma \in$ SEG, the function such that $\hat{\sigma}(x) = y$ if $\langle x, y \rangle \in \mathrm{content}(\sigma)$ and is undefined otherwise
$S$: the school of scientists for sets
$S^{f}$: the school of scientists for functions
$S_{CP}$: the school of computable scientists for sets
$S^{f}_{CP}$: the school of computable scientists for functions
$S_{ML}$: the school of memory-limited scientists for sets
$S^{f}_{ML}$: the school of memory-limited scientists for functions
$S^{e}_{CP}$: the school of empirical scientists
TXTEX: abbreviation for the collection $[S_{CP}, \text{text}]$
EX: abbreviation for the collection $[S^{f}_{CP}, \text{text}]$
$[S', e]$: the collection of all classes that are identifiable by some scientist in $S'$ in learning environment $e$
$d_x$: the primitive recursive defining function for $x \in P$
$\mu$: the memory function of a Markov scientist
$\mathrm{EQ}_i$: the set $\{j \in \mathbb{N} : \varphi_j = \varphi_i\}$
K: a nondecidable, recursively enumerable set
$K^{*}$: the class of sets $\{K \cup \{x\} : x \in \mathbb{N}\}$
FIN: the class of sets of finite size with elements in $\mathbb{N}$
$\mathrm{FIN}^{+}$: the class of sets $\mathrm{FIN} \cup \{\mathbb{N}\}$
COFIN: the class of all recursively enumerable sets whose complement is finite
AEZ: the class of recursive functions that are almost everywhere zero
SD: the class of recursive functions that are self-defining
C, S: classes of sets or functions
$\varphi, \psi$: partial recursive functions
S, L: recursively enumerable sets with elements in $\mathbb{N}$
D: a finite set
M, N: scientists, along with other bold characters
$\sigma, \tau, \rho$: prefixes of learning environments

Glossary

class: a set whose elements are sets or functions
computable scientist: a scientist whose underlying defining function is computable
content: the set of nonblank elements of a text, prefix, or set
decidable set: also called recursive, a subset of $\mathbb{N}$ that has a recursive characteristic function
decoding: a bijection of $\mathbb{N}$ to a set $S$
discoverable function: a function that can be encoded with a primitive real recursive function
distinguishable function: a total real function $f$ such that $f(\mathbb{Q}) \subseteq P$
empirical text: a text for a function $f$ which encapsulates margins of error for the 'dependent' variable of $f$
empirical scientist: a scientist who identifies scientific laws, allowing for errors in measurements
encoded function: a function of type $\mathbb{N} \to \mathbb{N}$ that encodes some total function with a countable domain and range
encoding: a bijection of a set $S$ to $\mathbb{N}$
fat text: a text where each nonblank element in the text appears infinitely often
hypothesis: a number $i \in \mathbb{N}$, typically in the context of the output of a scientist
imperfect text: an imperfect text for a function $f$ or a set $S$ is a text for a finite variant of $f$ or $S$, respectively
indexing: an enumeration of the set of partial recursive functions of type $\mathbb{N} \to \mathbb{N}$
informant: an informant for a function $f$ or a set $S$ is a text for $\chi_{\Psi(f)}$ or $\chi_S$, respectively
memory-limited scientist: a scientist M is said to be memory-limited if for all prefixes $\sigma, \tau, \rho \in$ PREF, if $M(\sigma) = M(\tau)$ then $M(\sigma \diamond \rho) = M(\tau \diamond \rho)$
natural number set: a subset of $\mathbb{N}$
partial recursive function: also called computable function, a function which can be computed by a Turing machine
prefix: a finite initial segment of a text
primitive real recursive function: a primitive recursive function of type $\mathbb{N}^2 \to \mathbb{N}$ that 'concatenates' multiple primitive recursive defining functions
primitive recursive function: a partial recursive function that is total and can be computed by a Turing machine without using unbounded loops
primitive recursive defining function: a function used for computing a primitive recursive number
primitive recursive number: a number whose decimal expansion can be computed using a primitive recursive function
recursive function: a partial recursive function that is total
recursively enumerable set: also called listable, a subset of $\mathbb{N}$ that is the domain of some partial recursive function
representation: the representation of a number $x$ is its image in a given encoding
school: a collection of scientists, possibly satisfying some property
scientist: a (possibly partial) function of type PREF $\to \mathbb{N}$
text: a text for a function $f$ or set $S$ is an ordered, infinite sequence of elements of $\mathbb{N} \cup \{\#\}$, where # is the blank symbol, whose content is exactly $\Psi(f)$ or $S$, respectively
Turing machine: an abstract model of a computing machine that manipulates symbols on a tape according to a fixed set of rules

Introduction

Philosophical framework

Throughout history, scientists have relied on the scientific method to discover natural laws. While there may arguably be exceptions,¹ the advancement of science is usually brought about by the tried and true procedure of data collection, hypothesis formulation, and theory verification. This cycle, illustrated in Figure 1, is what some scientists and philosophers of science (see Kemeny [14]) consider an effective description of the scientific method. In this view, scientific breakthroughs and paradigm shifts are instances in which new data contradict existing theories and thus restart the cycle with the formulation of a new theory.

[Figure 1 diagram: FACTS -(Induction)-> THEORIES -(Deduction)-> PREDICTIONS -(Verification)-> FACTS]

Figure 1: The cyclical nature of the scientific method

The initial step of the scientific method, the inductive step, hinges on the identification of patterns within observable facts. Pattern recognition is the necessary basis without which there could be no formulation of a scientific theory. It was the cyclical nature of the orbits of planets that eventually led to Kepler's laws, and it was Mendel's observation of hereditary traits in pea plants that prompted the foundation of genetics. The realization of the existence of patterns over randomness and order over chaos is what allows scientists to make accurate predictions of our world.

You, the reader, may also be accustomed to observing such patterns and inductively inferring future behaviors. Figure 2 illustrates such a pattern, which should not be exceedingly difficult to identify. Upon observing the first few elements, you will likely notice one or two characteristics in each pair of numbers and formulate a hypothesis based on that observation. Once you have a satisfactory candidate, you can test it against the other elements of data to confirm or disprove it. Hopefully, the repetition of this process will eventually lead you to conjecture the correct hypothesis.

¹ Anecdotal cases of Newton's falling apple and Archimedes' 'Eureka!' moment are possible examples.

9 → 72
7 → 42
5 → 20
6 → 30
8 → 56
4 → ?

Figure 2: The recognition of patterns plays a central role in science. Can you guess the next element in the sequence?

This puzzle allowed the reader to use a sample of points from the graph of a function to identify the totality of the graph. Assuming that the intended function is $\lambda x\,.\,x(x-1)$, the reader was able to achieve the incredible feat of correctly identifying a single function from an infinite number of possibilities, using only a finite (and unordered) amount of information.
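As a minimal sketch of the guess-and-check procedure just described, the following Python fragment tests a short list of candidate laws against the sample of Figure 2 and returns the first one that agrees with every observation. The candidate list and the helper names `consistent` and `first_consistent` are illustrative assumptions of ours; they are not part of the formal development that follows.

```python
# Sample points from Figure 2 (input -> output); their order is irrelevant.
observations = {9: 72, 7: 42, 5: 20, 6: 30, 8: 56}

# A hypothetical, hand-picked list of candidate laws, tried in order.
candidates = [
    ("8x", lambda x: 8 * x),
    ("x^2 - 9", lambda x: x * x - 9),
    ("x(x - 1)", lambda x: x * (x - 1)),
]

def consistent(law, data):
    """Return True if the law reproduces every observed pair."""
    return all(law(x) == y for x, y in data.items())

def first_consistent(laws, data):
    """Return (name, law) for the first candidate that fits the data, or None."""
    for name, law in laws:
        if consistent(law, data):
            return name, law
    return None

name, law = first_consistent(candidates, observations)
print(name, law(4))  # x(x - 1) 12
```

The first two candidates each fit the pair 9 → 72 but fail on 7 → 42, so they are discarded as soon as more data is examined; only the third survives the whole sample.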

The procedure the reader followed is essentially the same one an empiricist follows when attempting to identify a natural law regarding some observed phenomenon. Just like the reader, a scientist only has access to a finite amount of data. This data can be in the form of experiments or observations they have personally made, or in the form of the cumulative human knowledge since the beginning of mankind, but it is finite in any case. The scientist proceeds to observe patterns within the data and then formulates a conjecture in an elegant mathematical form, such as an equation or formula. At some point, despite being supplied with only a comparatively minuscule quantity of information, the scientist is able to identify a single law describing the observed phenomenon from within a sea of countless alternatives.

Obviously, the first conjectured law is not always the right one. In the previous quiz, the hypothesised function is only 'correct' in the sense that it is the one we had chosen beforehand for you to discover. In fact, we could have picked any of the infinite number of other equally reasonable functions, and you could have hypothesised a different one. If it turned out that we had conceived the missing number to be 72 instead of 12, you would have had to formulate a new hypothesis and then test it against the new data.
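To make this point concrete, the sketch below exhibits one rival law that agrees with all five observed pairs and yet predicts 72, rather than 12, for the missing element. The function `rival` is our own construction for illustration only: it is integer-valued, but it takes negative values for larger inputs, so it is not itself a natural number function in the sense used later.

```python
# The five observed pairs of the puzzle.
observations = {9: 72, 7: 42, 5: 20, 6: 30, 8: 56}

def intended(x):
    """The law chosen beforehand: x(x - 1)."""
    return x * (x - 1)

def rival(x):
    """A hypothetical rival law that fits the same five observations.

    The correction term vanishes at x = 5, 6, 7, 8, 9. It is a product of
    five consecutive integers, hence even, so the division by 2 is exact.
    """
    bump = (x - 5) * (x - 6) * (x - 7) * (x - 8) * (x - 9)
    return x * (x - 1) - bump // 2

fits = all(rival(x) == y for x, y in observations.items())
print(fits, intended(4), rival(4))  # True 12 72
```

Both laws are indistinguishable on the data seen so far; only the new observation at 4 separates them.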

A very valid question that one may raise is whether the cycle of the scientific method ever reaches a fixed point. In other words, when (if ever) is there sufficient evidence in support of a given hypothesis such that, no matter how much more data is collected, it will never contradict the existing hypothesis? It is not unreasonable to believe that such a fixed point does not necessarily exist. Just as Newton improved upon the conventional notions of motion held since the ancient Greeks, and as Einstein later improved upon Newton's with the theory of relativity, some future scientist may once again rectify the laws of motion with a better or more complete theory, and so on. In that case, it may be that scientific knowledge is continually improved, but that we never reach an ultimate, exact theory upon which no improvement can be made.

On the other hand, is it possible that in some cases the scientific method may lead us to overshoot and end up with a theory that is too permissive and contains the 'correct' theory as a proper subset? For example, consider that, given a sample of numbers from a set $S \subseteq \mathbb{N}$, a scientist conjectures that $S = \mathbb{N}$. Such a conjecture would be unfalsifiable, as suggested by Karl Popper [23], because even though the set $S$ might only contain the even numbers, it is not possible to show from a sample of its elements that $S$ does not contain the odd numbers as well.

It can be said that the crux of learning theory lies in understanding the limitations imposed by using a finite amount of data to infer an infinite amount of information. As we will show, in many situations this limitation is not as severe as one would first assume. There are in fact several reasonable restrictions we may impose on the nature of the data which guarantee that many patterns (including a wide variety of functions and sets) are identifiable in a finite (but unbounded) amount of time.

In the following chapter, we establish the basic concepts of learning theory and inductive identification. In Chapter 2 we explore conditions and restrictions under which purely mathematical objects such as sets and functions are identifiable, and we introduce new concepts including Markov scientists and limit sets. Finally, in Chapter 3 we adapt and apply these results to the identification of scientific laws, developing original notions of empirical identification, primitive real recursive functions, and discoverable functions.

Many of the results presented in this thesis are based on the notes and work of Professor José Félix Costa, who was an invaluable aid and without whom this work would not have been possible.


Chapter 1

Introduction to inductive inference

1.1 Preliminaries on computability

Inductive identification was first studied in depth by E. Mark Gold [10] in the mid-twentieth century. The problem was not posed from the point of view of a scientist attempting to identify a set or a function, but rather from that of a child who is learning her first language. It is observable that, without prior knowledge of any other language, a child learns to speak her mother tongue (say, English) simply by listening to instances of sentences in English. It is often argued (such as in Marcus [18]) that children do so using only positive data, i.e. using only information about which sentences belong to their language, but not about the ones that do not. And yet, in spite of the apparent limitation of only ever having access to a finite amount of information, a child is able to correctly learn a language in a finite amount of time and even to speak new sentences that she has never heard but that nonetheless abide by English grammar.

Without access to negative information, it is logically possible that a child could learn a grammatically richer and equally valid version of English, such as one which includes sentences like 'Went up the hill Jack and Jill' or 'Had a little lamb Mary'. Unless she is given explicit information that these sentences do not belong to the English language, she would have no way of knowing that what she is speaking is not English. However, this does not seem to be the case in reality: children are indeed able to learn their target language and not merely a language which contains their target language as a proper subset.

Gold studied a very general concept of language learnability, loosely defined as one in which a learner learns a language $L$ from within a class of languages $C$ if it is able to conjecture the correct language after receiving a finite amount of information, and from then on does not conjecture any different hypothesis. In Gold [10], the notion of language learnability in itself is not defined: it is only defined in relation to a specific class of languages. In other words, a learner is permitted to 'pick' from within a set of languages instead of being tasked to learn a language without any additional context, and therefore a language may be learnable in one class of languages but not in another. Evidently, this class can be (and usually is) infinitely large, and may even be thought of as the class of all languages which are humanly learnable.
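The loose definition above can be illustrated with a toy learner. The sketch below assumes, purely for illustration, the class of languages $L_i = \{0, \ldots, i\}$ and a learner that conjectures the index of the largest number seen so far; the names `learner`, `conjectures`, and `BLANK` are hypothetical. A finite run can only illustrate stabilization on a correct conjecture; it does not prove it.

```python
BLANK = None  # stands in for the blank symbol '#'

def learner(prefix):
    """Conjecture the index i of L_i = {0, ..., i} from a finite prefix.

    The conjecture is the largest number seen so far (0 if only blanks
    have been seen).
    """
    seen = [x for x in prefix if x is not BLANK]
    return max(seen, default=0)

def conjectures(text_prefix):
    """Conjectures issued after each successive element of the prefix."""
    return [learner(text_prefix[:n]) for n in range(1, len(text_prefix) + 1)]

# A finite prefix of one possible presentation of L_3 = {0, 1, 2, 3}.
prefix = [1, BLANK, 0, 3, 3, 2, BLANK, 1, 0]
print(conjectures(prefix))  # [1, 1, 1, 3, 3, 3, 3, 3, 3]
```

Once the element 3 has appeared, the learner's conjecture never changes again, because no larger element belongs to $L_3$. The same learner would fail on a class that also contained $\mathbb{N}$, which is why learnability is stated relative to a class.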

Gold proved a necessary condition for language identifiability that we will expand on in Chapter 2. Even though Gold explicitly defines a language as a collection of strings of sentences, Johnson [13] correctly points out that Gold's learnability theorem remains valid if we instead define a language using a formal grammar, sets of integers, or even if we allow sentences to be infinite strings. Gold's theorem is therefore a fundamental theorem in learning theory and is widely used in a variety of contexts.


Sets and functions

In what follows, a 'language' is simply a set whose elements are natural numbers. This is equivalent to Gold's definition of a language: any finite string can be easily encoded to and from an integer such that we may think of integers as sentences and sentences as integers. Our only constraint is that each 'language' (i.e. each set) must be specifiable in some systematic way. As has been done throughout the history of learning theory, these sets must be 'computable.' We begin by defining the notion of computability and giving a brief overview of some fundamental concepts.
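One possible way of encoding finite strings to and from natural numbers is bijective base-$k$ numeration, sketched below. The 27-symbol alphabet and the function names `encode` and `decode` are assumptions made for this example; any other computable bijection would serve equally well.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # hypothetical 27-symbol alphabet
K = len(ALPHABET)

def encode(sentence):
    """Map a finite string over ALPHABET to a natural number.

    Each symbol is read as a digit in 1..K (bijective base K), so every
    natural number is the code of exactly one string.
    """
    n = 0
    for ch in sentence:
        n = n * K + ALPHABET.index(ch) + 1
    return n

def decode(n):
    """Inverse of encode: recover the unique string whose code is n."""
    chars = []
    while n > 0:
        n, r = divmod(n - 1, K)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

code = encode("jack and jill")
print(decode(code) == "jack and jill")  # True
```

Under this scheme the empty string has code 0, and ordinary positional notation is avoided on purpose: with digits 0 to K - 1, strings with leading 'zero' symbols would collide.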

1.1 Definition (Natural number function). A natural number function $f$ is a (possibly partial) function with domain in $\mathbb{N}^m$ and codomain in $\mathbb{N}^n$, for $m, n \in \mathbb{N}$. The class of all natural number functions with domain and codomain in $\mathbb{N}$ is denoted by $\mathcal{F}$.

The class (or set) of natural number functions $\mathcal{F}$ comprises every partial function from natural numbers to natural numbers. The class of 'computable' functions is contained within the class of natural number functions.

1.2 Definition (Partial recursive or computable function). A natural number function $\varphi : \mathbb{N}^m \to \mathbb{N}^n$ is said to be computable or partial recursive if there exists a deterministic Turing machine $M$ with output tape such that, for every input $x \in \mathbb{N}^m$, $M$ eventually halts on $x$ with output $y \in \mathbb{N}^n$ if and only if $\varphi(x) = y$. The class of all partial recursive functions with domain and codomain in $\mathbb{N}$ is denoted by $\mathcal{P}$.¹

Although we have formally defined computable functions using Turing machines, a deep understanding of Turing machines is not required to grasp the notion of a computable function. Intuitively, a computable function is simply one that may be computed by a machine, where 'machine' can be taken to mean a computer, a computer program written in some programming language, a human being using pencil and paper, or any other means of computation effectively equivalent to a Turing machine. Thus, a partial recursive function can simply be thought of as a (possibly partial) natural number function that may be computed using some computer program. Note that the previous definition does not require that a partial recursive function be defined for every input $x$ (hence being a partial function). For example, the function that is undefined for every input is trivially computable by a Turing machine that never halts on any input, or equivalently by a computer program that goes into an infinite loop for any input it receives.
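A small sketch may help fix the idea of partiality. The function `exact_sqrt_search` below is a hypothetical example of ours: it halts exactly on perfect squares and loops forever on every other input, so it computes a partial function. Since non-halting cannot be observed directly, the companion `run_with_budget` simulates the same search for a bounded number of steps.

```python
def exact_sqrt_search(n):
    """Partial function: return k with k * k == n.

    Loops forever when n is not a perfect square, so the function is
    undefined on such inputs.
    """
    k = 0
    while k * k != n:
        k += 1
    return k

def run_with_budget(n, max_steps):
    """Simulate the search above for at most max_steps iterations.

    Returns the result if the search halts within the budget, else None.
    None means 'has not halted yet', not 'is undefined'.
    """
    k = 0
    for _ in range(max_steps):
        if k * k == n:
            return k
        k += 1
    return None

print(exact_sqrt_search(49))      # 7
print(run_with_budget(50, 1000))  # None
```

No finite budget can certify that the search on 50 never halts; that asymmetry between halting and not halting recurs throughout computability theory.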

As is known from basic computability results, not every natural number function is computable. This result is not necessarily related to some calculational difficulty: a simple observation that there are more natural number functions than there are computable functions is enough to conclude this.² A natural number function that is not partial recursive is called a noncomputable function.
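The counting observation rests on the fact that no list of functions can contain every natural number function. The sketch below illustrates the classical diagonal construction behind that fact, which is a related technique rather than the counting argument itself. It assumes an enumeration of total functions; the names `diagonal` and `toy_enumeration` are hypothetical, and the toy enumeration is chosen only so that the code can be run.

```python
def diagonal(enumeration):
    """Build a function that differs from the i-th listed function at i.

    `enumeration(i)` is assumed to return the i-th function of the list,
    and every listed function is assumed to be total.
    """
    return lambda n: enumeration(n)(n) + 1

def toy_enumeration(i):
    """Hypothetical list of total functions: the i-th one maps n to i * n."""
    return lambda n: i * n

d = diagonal(toy_enumeration)

# d disagrees with the i-th listed function at argument i, for every i checked.
print(all(d(i) != toy_enumeration(i)(i) for i in range(1000)))  # True
```

Since the diagonal function differs from every listed function somewhere, it cannot appear in the list, whatever the list is.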

Partial recursive functions that are also total functions are of particular importance.

1.3 Definition (Recursive function). Whenever a partial recursive function $\varphi : \mathbb{N}^m \to \mathbb{N}^n$ is a total function, it is said to be total recursive. In general, and whenever there is no risk of confusion, we will simply use the term recursive function to mean a total recursive function. The class of all recursive functions with domain $\mathbb{N}$ and codomain in $\mathbb{N}$ is denoted by $\mathcal{R}$.

There is a special kind of recursive function, called the primitive recursive function, that is also relevant in the theory of computability. These functions are also total, and they constitute a proper subset of the recursive functions. Informally, they are the class of functions that can be defined using a computer program that does not contain any unbounded loops (such as the while or goto loop) but may contain bounded loops (such as the for or do loop). This is in contrast to the recursive functions, which do allow for unbounded loops. More formally, they are functions that are recursively built from a set of total computable operations so that every function is defined by a finite sequence of operations.

¹ This notation is taken from Jain et al. [12].

² For a sketch of a proof, consider that for a function to be computable, it must be possible to write it in some programming language. The sequence of characters for any program written in some programming language is necessarily finite and can thus be encoded into a natural number. Therefore, there is a countable infinity of computable functions. On the other hand, there are $2^{\mathbb{N}}$ natural number functions, which is uncountable, and so there are more natural number functions than there are computer programs. For a more formal proof, as well as some additional background on computable functions, see Rogers [25, ch. 1.8].

1.4 Definition (Primitive recursive function, Rogers [25]). The class of primitive recursive functions is the smallest class $\mathcal{C}$ (i.e. the intersection of all classes) of functions with domain $\mathbb{N}^m$ and codomain in $\mathbb{N}^n$ such that

(i) all constant functions $\lambda x_1, \ldots, x_m \,.\, c$ are in $\mathcal{C}$, for $m \geq 1$ and $c \geq 0$,

(ii) the successor function $\lambda x \,.\, x + 1$ is in $\mathcal{C}$,

(iii) all projections $\lambda x_1, \ldots, x_m \,.\, x_i$ are in $\mathcal{C}$, for $1 \leq i \leq m$,

(iv) if $f$ is a function of $m$ variables in $\mathcal{C}$, and $g_1, \ldots, g_m$ are each functions of $n$ variables in $\mathcal{C}$, then the composition function $\lambda x_1, \ldots, x_n \,.\, f(g_1(x_1, \ldots, x_n), \ldots, g_m(x_1, \ldots, x_n))$ is in $\mathcal{C}$, for $m, n \geq 1$,

(v) if $f_0$ is a function of $m$ variables in $\mathcal{C}$, and $f_1$ is a function of $m + 2$ variables in $\mathcal{C}$, then the recursion function $h$ of $m + 1$ variables satisfying

\[ h(x_1, \ldots, x_m, 0) = f_0(x_1, \ldots, x_m) \]

\[ h(x_1, \ldots, x_m, k + 1) = f_1(x_1, \ldots, x_m, k, h(x_1, \ldots, x_m, k)) \]

is in $\mathcal{C}$, for $m \geq 0$.

We denote the class of all primitive recursive functions with domain $\mathbb{N}$ and codomain in $\mathbb{N}$ by $\mathcal{PR}$.
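The closure clauses of Definition 1.4 can be sketched in Python. This is only an illustrative sketch: the helper names (`constant`, `projection`, `compose`, `primitive_recursion`) are hypothetical and not part of the source. The point it illustrates is that the recursion scheme of clause (v) unrolls into a bounded `for` loop whose iteration count is fixed before the loop starts.

```python
def successor(x):
    """Successor function (clause ii)."""
    return x + 1

def constant(c):
    """Constant function of any arity (clause i)."""
    return lambda *args: c

def projection(i):
    """Projection onto the i-th argument, counting from 1 (clause iii)."""
    return lambda *args: args[i - 1]

def compose(f, *gs):
    """Composition f(g1(xs), ..., gm(xs)) (clause iv)."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def primitive_recursion(f0, f1):
    """Recursion scheme (clause v), unrolled as a bounded loop."""
    def h(*args):
        *xs, k = args
        acc = f0(*xs)
        for j in range(k):  # exactly k iterations, known before the loop starts
            acc = f1(*xs, j, acc)
        return acc
    return h

# add(x, 0) = x ; add(x, k + 1) = successor(add(x, k))
add = primitive_recursion(projection(1), compose(successor, projection(3)))
# mul(x, 0) = 0 ; mul(x, k + 1) = add(x, mul(x, k))
mul = primitive_recursion(constant(0), compose(add, projection(1), projection(3)))
```

Here `add` and `mul` are built only from the basic functions and the two closure operations, so no `while` loop is ever needed.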

Even though the class of primitive recursive functions may seem relatively poor and incapable of expressing complex phenomena, it turns out that almost all functions that one may conceive of are primitive recursive. Examples of primitive recursive functions include addition, exponentiation, the Fibonacci function, and the function that returns the $n$-th prime for a given $n \in \mathbb{N}^+$. However, there do exist functions that are recursive but not primitive recursive, and we will therefore generally study the class of recursive functions instead of the class of primitive recursive functions.

Aside from functions, we consider a kind of ‘computable set’ based on the notion of computable functions.

1.5 Definition (Natural number set). A natural number set $S$ is a subset of $\mathbb{N}$. The class of all natural number sets (i.e. $2^{\mathbb{N}}$, the power set of $\mathbb{N}$) is denoted by $\mathcal{S}$.

1.6 Definition (Recursively enumerable or listable set). A natural number set $S$ is said to be recursively enumerable or listable if it is the domain of a partial recursive function of type $\mathbb{N} \to \mathbb{N}$. The class of all recursively enumerable sets is denoted by $\mathcal{E}$.

Intuitively, a set is recursively enumerable if there is a program (or any other procedure) that lists out its elements, with repetitions allowed. This is equivalent to being able to positively verify if a given element belongs to a set, but not verifying if it does not. In other words, a set $S$ is recursively enumerable if there is a partial recursive function $\varphi$ such that, for all $x \in \mathbb{N}$, we have $x \in S$ if and only if $\varphi(x)$ is defined.
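The link between "being the domain of a partial recursive function" and "listing out the elements" can be sketched with a dovetailing loop. The sketch below rests on a modelling assumption that is not in the source: a program is a Python generator function that yields once per computation step and returns when it halts. The names `halts_within`, `enumerate_domain` and `even_only` are hypothetical.

```python
from itertools import count, islice

def halts_within(program, x, steps):
    """Run program(x) for at most `steps` steps; report whether it halted."""
    run = program(x)
    for _ in range(steps):
        try:
            next(run)
        except StopIteration:
            return True
    return False

def enumerate_domain(program):
    """List the domain of `program` by dovetailing over inputs and step budgets."""
    listed = set()
    for stage in count():
        for x in range(stage + 1):
            if x not in listed and halts_within(program, x, stage):
                listed.add(x)
                yield x

def even_only(x):
    """Hypothetical program: halts on even x, loops forever on odd x."""
    if x % 2 == 1:
        while True:
            yield
    for _ in range(x):
        yield

first_five = list(islice(enumerate_domain(even_only), 5))
```

No single run is ever waited on indefinitely: each stage gives every input seen so far a slightly larger step budget, so every element of the domain is eventually listed even though the program diverges on the odd numbers.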

Occasionally, it may be easier or more intuitive to specify a recursively enumerable set using a different partial recursive function. Proposition 1.7 states that we may equally use the range of a partial recursive function to define a recursively enumerable set. Whenever it is more practical to do so, we will use the range of some partial recursive function to show that a given set is recursively enumerable.


1.7 Proposition. A natural number set is recursively enumerable if and only if it is the range of a partial recursive function.

Proof. See Rogers [25, p. 61].

If a set $S$ is recursively enumerable and its complement $\overline{S}$ is also recursively enumerable, then there is a partial recursive function for verifying if an element belongs to $S$ and another partial recursive function for verifying if it does not. Then, there is a (total) recursive function for deciding if any element belongs to $S$.

1.8 Definition (Recursive or decidable set). A natural number set $S$ is said to be recursive or decidable if it has a recursive characteristic function $f$ of type $\mathbb{N} \to \mathbb{N}$, i.e. for all $x \in \mathbb{N}$, we have $x \in S \Rightarrow f(x) = 1$ and $x \notin S \Rightarrow f(x) = 0$. The class of all decidable sets is denoted by $\mathcal{D}$.

It follows that a decidable set is also recursively enumerable. Indeed, if a set $S$ is decidable, then there is a Turing machine $M$ that computes its characteristic function $f$. We may then construct a Turing machine $M'$ that for every $x \in \mathbb{N}$ returns 1 if $f(x) = 1$ and does not halt otherwise. Now, $M'$ computes a function whose domain is exactly $S$, and so $S$ is recursively enumerable.
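Both directions of the relationship between decidable and recursively enumerable sets can be sketched in Python. As before, the sketch assumes (this is not in the source) that a semi-decision procedure is modelled as a generator that yields once per step and returns when it halts; `semi_decider_from`, `decide` and `is_square` are hypothetical names.

```python
from math import isqrt

def semi_decider_from(characteristic):
    """The machine M' of the text: halts exactly on the members of S."""
    def program(x):
        if characteristic(x) == 1:
            return 1
        while True:  # x is not in S, so never halt
            yield
    return program

def decide(x, accept, reject):
    """Decide membership by running both semi-deciders in lockstep.

    `accept` halts on members of S and `reject` halts on members of the
    complement, so exactly one of the two runs terminates.
    """
    run_in, run_out = accept(x), reject(x)
    while True:
        try:
            next(run_in)
        except StopIteration:
            return 1
        try:
            next(run_out)
        except StopIteration:
            return 0

def is_square(x):
    """Characteristic function of the (decidable) set of perfect squares."""
    return int(isqrt(x) ** 2 == x)

accept = semi_decider_from(is_square)
reject = semi_decider_from(lambda x: 1 - is_square(x))
```

The interleaving in `decide` is what makes the combined procedure total: neither semi-decider is ever run to completion on its own.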

In practice, many functions and sets that one may be familiar with are computable.

1.9 Example. The following are examples of partial recursive, recursive and noncomputable functions, as well as decidable and recursively enumerable sets:

• The constant function $f = \lambda x \,.\, 0$, which to every natural number $x$ assigns the number 0, is recursive, and its domain $\mathbb{N}$ is a decidable set (and therefore also recursively enumerable). Its range $\{0\}$ is recursively enumerable by Proposition 1.7 and, being finite, is also decidable.

• The function $h = \lambda x \,.\, 2x$, which to every natural number $x$ assigns the value of $2x$, is recursive, and the set $\{2x : x \in \mathbb{N}\}$ is decidable.

• The function whose range is the empty set $\emptyset$, which is undefined for every natural number, is partial recursive, and the empty set $\emptyset$ is decidable.

• The function $p : \mathbb{N} \to \mathbb{N}$ that lists all the prime numbers is recursive, and the set of prime numbers is decidable.

• The set $K^{*}$ presented in Definition 2.62 is an example of a recursively enumerable but nondecidable set, as shown in Lemma 5.4 of the Appendix.

• The Ackermann function that for all $m, n \in \mathbb{N}$ returns

\[
A(m, n) =
\begin{cases}
n + 1, & \text{if } m = 0 \\
A(m - 1, 1), & \text{if } m > 0 \text{ and } n = 0 \\
A(m - 1, A(m, n - 1)), & \text{if } m > 0 \text{ and } n > 0,
\end{cases}
\]

is a famous example of a function that is recursive but not primitive recursive. The diagonal function $\lambda n \,.\, A(n, n)$ is an example of a recursive function of type $\mathbb{N} \to \mathbb{N}$ which is not primitive recursive.

• The halting problem function, which for each program $M$ and input $x$ correctly decides if $M$ eventually halts on $x$, is noncomputable.³,⁴

• Consider the definition of an index for a partial recursive function given in Definition 1.18 and, for each partial recursive function $\varphi_i$, the set $EQ_i = \{j \in \mathbb{N} : \varphi_j = \varphi_i\}$ of indexes $j$ for which $\varphi_j$ and $\varphi_i$ are equal. The set $EQ_i$ is not recursively enumerable, and the function that, for every $j \in \mathbb{N}$, returns 1 if $\varphi_j = \varphi_i$ and 0 otherwise is noncomputable.

• The function that decides whether a partial recursive function $\varphi_i$ is total is not computable, and the set $\{x : \varphi_x \text{ is total}\}$ is not recursively enumerable.

³ For a complete proof, see for example Rogers [25, ch. 1.9].

⁴ Similarly, there is no computable function that, for every partial recursive function, decides if it is total or not.
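The Ackermann function of Example 1.9 is total and computable, as the following Python sketch suggests. It is written with an explicit stack rather than direct recursion (an implementation choice of this sketch, not of the source) so that the nesting depth does not run into the interpreter's recursion limit; the function name `ackermann` is hypothetical.

```python
def ackermann(m, n):
    """Evaluate A(m, n) using an explicit stack of pending first arguments."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1                # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)      # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)      # outer call A(m - 1, ...), resumed later
            stack.append(m)          # inner call A(m, n - 1), evaluated first
            n = n - 1
    return n
```

The `while` loop is unbounded: nothing in the code fixes the number of iterations in advance, which is consistent with the function not being primitive recursive. Values grow extremely quickly, so only small arguments are practical.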

Encodings and encoded functions

The reader may have noticed that until now, we have only considered functions and sets within the natural numbers. This is in contrast to most functions that are usually used in science, economics and mathematics, which are defined over the real numbers. It may not be obvious how we hope to study scientific laws if we do not even consider the domain in which they are defined. While some of these functions are easily projected onto the natural numbers, there are many that are not, including even trivial examples such as the law that relates the radius $r$ of a circle with its perimeter $p = 2\pi r$.

However, a simple solution to this issue is to extend the study of computable functions from the natural numbers $\mathbb{N}$ to the set of rational numbers $\mathbb{Q}$, which is countable and dense in the real numbers. This allows us to approximate functions in the real numbers with rational computable functions to an arbitrary precision, similarly to how it is possible for a computer to calculate irrational numbers such as $\pi$ to any desired decimal place.
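As a rough illustration of approximating a real-valued law by rational computations, the following Python sketch approximates $\pi$, and hence the perimeter $2\pi r$ for rational $r$, to any requested rational tolerance. The use of Machin's formula and the names `arctan_inv`, `pi_approx` and `perimeter_approx` are choices made for this sketch and are not taken from the source.

```python
from fractions import Fraction

def arctan_inv(q, eps):
    """Rational approximation of arctan(1/q), q > 1, with error below eps."""
    total = Fraction(0)
    k = 0
    while True:
        term = Fraction(1, (2 * k + 1) * q ** (2 * k + 1))
        if term < eps:
            return total  # alternating series: error <= first omitted term
        total += term if k % 2 == 0 else -term
        k += 1

def pi_approx(eps):
    """Rational number within eps of pi (Machin's formula)."""
    eps = Fraction(eps)
    # pi = 16 * arctan(1/5) - 4 * arctan(1/239)
    return 16 * arctan_inv(5, eps / 32) - 4 * arctan_inv(239, eps / 8)

def perimeter_approx(r, eps):
    """Rational number within eps of 2 * pi * r, for a rational radius r > 0."""
    r, eps = Fraction(r), Fraction(eps)
    return 2 * r * pi_approx(eps / (2 * r))
```

Every intermediate value is an exact `Fraction`, so the whole computation stays inside $\mathbb{Q}$; only the tolerance controls how close the result is to the real value.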

To clarify the precise nature of how we wish to ‘encode’ the rational numbers into the natural numbers, we take advantage of the Cantor pairing function.

1.10 Definition (Encoding and representation). We say that a function $f : S \to \mathbb{N}$ is an encoding of $S$ if it is a bijection from $S$ to $\mathbb{N}$. The inverse function $f^{-1} : \mathbb{N} \to S$ is called a decoding of $S$. If $f$ is an encoding of $S$ and $f(x) = n$, we say that $n$ represents (or is the representation of) $x$.

1.11 Lemma. If $f : S \to \mathbb{N}$ is an encoding, then the domain $S$ of $f$ is countably infinite.

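1.12 Definition (Cantor pairing function). The Cantor pairing function $\lambda x, y \,.\, \langle x, y \rangle$ from $\mathbb{N}^2$ to $\mathbb{N}$ is an encoding of the set $\mathbb{N}^2$, and is defined by

\[ \langle x, y \rangle = \frac{1}{2}(x + y)(x + y + 1) + x. \]

The inverse function is a decoding of $\mathbb{N}^2$ and is given by

\[
\begin{cases}
x = n - \dfrac{w(w + 1)}{2} \\[4pt]
y = \dfrac{w(w + 3)}{2} - n,
\end{cases}
\]

where $n = \langle x, y \rangle$ and $w = \left\lfloor \dfrac{\sqrt{8n + 1} - 1}{2} \right\rfloor$.

Since the Cantor pairing function is an encoding of $\mathbb{N}^2$, we say that the natural number $\langle x, y \rangle$ represents the ordered pair $(x, y)$.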

It is easy to verify that the Cantor pairing function is a bijection. With the use of the Cantor pairing function, any ordered pair of natural numbers can be uniquely associated to a single natural number, and vice versa. In this way, we can define the explicit encoding of any countable set, which allows the study of computable functions defined beyond the natural numbers.
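The pairing function of Definition 1.12 and its inverse translate directly into integer arithmetic. The sketch below uses the hypothetical names `pair` and `unpair`, and computes $w$ with an integer square root so that no floating-point rounding is involved.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing <x, y> = (x + y)(x + y + 1) / 2 + x."""
    return (x + y) * (x + y + 1) // 2 + x

def unpair(n):
    """Inverse of `pair`: recover (x, y) from n = <x, y>."""
    w = (isqrt(8 * n + 1) - 1) // 2   # w = x + y
    x = n - w * (w + 1) // 2
    y = w * (w + 3) // 2 - n
    return x, y
```

A quick sanity check of bijectivity on an initial segment is to confirm that `pair(*unpair(n)) == n` for every `n` in some range and that `unpair(pair(x, y)) == (x, y)` for a grid of pairs.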

1.13 Example. The following are one-to-one encodings of various sets onto the natural numbers. Several alternative definitions are acceptable, and usually an explicit encoding will not be required in this work. However, whenever it may be suitable to define such an encoding explicitly, we will use the definitions described here.

1. The identity function of type $\mathbb{N} \to \mathbb{N}$ is both an encoding and a decoding of $\mathbb{N}$, as is any other permutation of $\mathbb{N}$.

2. For the set $\mathbb{Z}$ of integers, we define the encoding $z : \mathbb{Z} \to \mathbb{N}$ for an element $n \in \mathbb{Z}$ such that

\[
n \mapsto
\begin{cases}
2n, & \text{if } n \geq 0 \\
2|n| - 1, & \text{if } n < 0.
\end{cases}
\]

3. For the set $\mathbb{N}^n$ of ordered tuples of size $n$ of natural numbers, we extend the notation of the Cantor pairing function to define the encoding $(x_1, \ldots, x_n) \mapsto \langle x_1, x_2, \ldots, x_n \rangle$, where $\langle x_1, x_2, \ldots, x_n \rangle$ is recursively defined as

\[ \langle x_1, x_2, \ldots, x_n \rangle = \langle \langle x_1, x_2, \ldots, x_{n-1} \rangle, x_n \rangle. \]

4. For the set $\mathbb{N}^*$ of ordered tuples of arbitrary finite size of natural numbers, we define the encoding for an element $(x_1, \ldots, x_n) \in \mathbb{N}^*$ as

\[ (x_1, \ldots, x_n) \mapsto \langle \langle x_1, \ldots, x_n \rangle, n \rangle. \]

5. For the set $\mathbb{Q}$ of rational numbers, we can specify the encoding $q$ defined in Section 5.1 of the Appendix.

6. For the set $\{0, 1, \ldots, k - 1\}^*$ of finite sequences of the first $k > 1$ natural numbers, we define the encoding for an element $(x_1, \ldots, x_n)$ as

\[ (x_1, \ldots, x_n) \mapsto \sum_{i=1}^{n} x_i k^{n-i}. \]

This set can be seen as the set of all the words writable using a finite alphabet $\{a_0, a_1, \ldots, a_{k-1}\}$ of $k$ characters, ordered lexicographically.
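Items 2 to 4 of Example 1.13 can be sketched as follows. The function names are hypothetical, and the sketch assumes that a tuple of size 1 is encoded as its single element, a base case the text leaves implicit.

```python
from functools import reduce

def pair(x, y):
    """Cantor pairing <x, y> from Definition 1.12."""
    return (x + y) * (x + y + 1) // 2 + x

def encode_int(n):
    """Item 2: non-negative integers go to evens, negative integers to odds."""
    return 2 * n if n >= 0 else 2 * abs(n) - 1

def decode_int(m):
    """Inverse of `encode_int`."""
    return m // 2 if m % 2 == 0 else -((m + 1) // 2)

def encode_tuple(xs):
    """Item 3: <x1, ..., xn> = <<x1, ..., x(n-1)>, xn>, for n >= 1.

    Assumption of this sketch: a 1-tuple is encoded as its only element.
    """
    return reduce(pair, xs)

def encode_sequence(xs):
    """Item 4: pair the tuple code with the tuple length, for n >= 1."""
    return pair(encode_tuple(xs), len(xs))
```

For instance, `encode_tuple((1, 2, 3))` first pairs 1 with 2 and then pairs the result with 3, exactly following the recursive definition in item 3.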

Using encodings of various countable sets, we are able to use a partial recursive function defined in the natural numbers to encode any other ‘computable’ function whose domain and codomain are countable. Encoded functions are the extension of the concept of encodings from sets into functions. Similarly to encodings for sets, encoded functions allow the interpretation within the natural numbers of functions with a countable domain and codomain.

1.14 Definition (Encoded function). Let $A$ and $B$ be countable sets, and let $f$ and $g$ be encodings of $A$ and $B$ respectively. If $\psi : A \to B$ is a total function and $\tilde{\psi} : \mathbb{N} \to \mathbb{N}$ is a partial recursive function such that, for all $x \in A$, $\tilde{\psi}(f(x)) = g(\psi(x))$, we say that $\tilde{\psi}$ is the encoded function of $\psi$. In this case, we may frequently say that $\psi$ is a (partial) recursive function when we mean that it has a (partial) recursive encoded function $\tilde{\psi}$. Conversely, $\psi$ is the decoded function of $\tilde{\psi}$.

1.15 Example. Consider the real function $f : \mathbb{R} \to \mathbb{R}$ given by $f = \lambda x \,.\, 1/x$. In order for a function to be encodable, it must have a countable domain and codomain. We therefore consider $g : \mathbb{Q} \to \mathbb{Q}$, the projection of $f$ onto the rational numbers, so that $g = \lambda x \,.\, 1/x$.

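The function $q^{-1} : \mathbb{Q} \to \mathbb{N}$, defined in Section 5.1 of the Appendix, assigns a natural number to each rational number. The domain of $\tilde{g}$, the encoded function of $g$, is $q^{-1}(\mathbb{Q} - \{0\}) = \mathbb{N} - \{q^{-1}(0)\} = \mathbb{N}^+$.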

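\[ f : \mathbb{R} \to \mathbb{R} \;\xrightarrow{\text{projection}}\; g : \mathbb{Q} \to \mathbb{Q} \;\xrightarrow{\text{encoded function}}\; \tilde{g} : \mathbb{N} \to \mathbb{N} \]

Figure 1.1: Obtaining the encoded function $\tilde{g}$ from the real function $f$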

Using the standard encoding provided by $q : \mathbb{N} \to \mathbb{Q}$, the first 10 pairs of elements of the graph of $g$ are

\[ (-1, -1),\ (1, 1),\ (-1/2, -2),\ (1/2, 2),\ (-1/3, -3),\ (1/3, 3),\ (-2, -1/2),\ (2, 1/2),\ (-1/5, -5),\ (1/5, 5). \]

The elements of the graph of $\tilde{g}$ are given by $\left( q^{-1}(x), q^{-1}(1/x) \right)$, where $(x, 1/x)$ is an element of the graph of $g$. The corresponding first 10 elements of $\tilde{g}$ are

\[
\begin{aligned}
(-1, -1) &\mapsto (1, 1) \\
(1, 1) &\mapsto (2, 2) \\
(-1/2, -2) &\mapsto (3, 7) \\
(1/2, 2) &\mapsto (4, 8) \\
(-1/3, -3) &\mapsto (5, 17) \\
(1/3, 3) &\mapsto (6, 18) \\
(-2, -1/2) &\mapsto (7, 3) \\
(2, 1/2) &\mapsto (8, 4) \\
(-1/5, -5) &\mapsto (9, 49) \\
(1/5, 5) &\mapsto (10, 50)
\end{aligned}
\]

It is worth pointing out that a function and its encoded function do not exhibit the same behavior in general. Observe that, although $g$ is monotonically decreasing in $\mathbb{Q}^+$, its encoded function is not (and neither is it monotonically increasing). In general, we will not explicitly consider encoded functions, but it is enough to note that the original function and its encoded function do not necessarily exhibit the same properties.
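The construction of an encoded function, and the loss of monotonicity it can cause, can be sketched on a simpler example than $g$. Since the encoding $q$ of the rationals is defined in the Appendix and not reproduced here, the sketch below instead uses the integer encoding $z$ of Example 1.13 and the function $\psi = \lambda n \,.\, -n$ on $\mathbb{Z}$; this substitution, and the names `encoded` and `negate_tilde`, are assumptions of the sketch.

```python
def encode_int(n):
    """Encoding z of the integers from Example 1.13, item 2."""
    return 2 * n if n >= 0 else 2 * abs(n) - 1

def decode_int(m):
    """Decoding of the integers (inverse of `encode_int`)."""
    return m // 2 if m % 2 == 0 else -((m + 1) // 2)

def encoded(psi, decode_domain, encode_codomain):
    """Build the encoded function: decode the input, apply psi, encode the result."""
    return lambda m: encode_codomain(psi(decode_domain(m)))

negate = lambda n: -n                     # strictly decreasing on the integers
negate_tilde = encoded(negate, decode_int, encode_int)

first_values = [negate_tilde(m) for m in range(6)]
```

By construction `negate_tilde(encode_int(n)) == encode_int(negate(n))` for every integer `n`, which is the defining identity of Definition 1.14. The first values of `negate_tilde` go up and down, so the encoded function is neither increasing nor decreasing even though `negate` is strictly decreasing.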

Remark. In the previous example, it was enough to consider the projection of a real function onto the rational numbers in order to obtain an encodable function. However, for many real functions, this simplistic approach is not enough. For example, restricting the domain of the function $f = \lambda x \,.\, \sin(x + \pi)$ to the rational numbers still results in a codomain of $[-1, 1] - \mathbb{Q}$, which is not countable. Nonetheless, the range $f(\mathbb{Q})$ of $f$ is indeed countable, and so it is still possible to devise an encoding for the image of any function whose domain is countable, as stated in Jain et al. [12].⁵ We may also use this method to establish an encoded function for the example $\lambda r \,.\, 2\pi r$ given in the beginning of this section.

Indexes for sets and functions

Consider the encoding in item 6 of Example 1.13. If we allow the set $\{a_0, a_1, \ldots, a_{k-1}\}$ to be the alphabet of symbols used in a specific programming language, we may use this encoding to associate a unique integer to each string of programming commands. In this way, it is possible to computably list out the class of computable functions $\mathcal{P}$ using a programming system.

⁵ One possible encoding may be to pair each irrational number $f(x)$, for $x \in \mathbb{Q}$, with an integer representing the code of a program that computes $f(x)$ to $n \in \mathbb{N}$ decimal places. This concept is closely connected to Definition 1.18 of an index for a function.


1.16 Definition (Programming system, Jain et al. [12]). A programming system for the class of partial recursive functions $\mathcal{P}$ is a partial recursive function $\nu$ such that $\{\lambda x \,.\, \nu(\langle p, x \rangle) : p \in \mathbb{N}\} = \mathcal{P}$. We may denote $\nu(\langle p, x \rangle)$ by $\nu_p(x)$.

A programming system entails the existence of a universal partial recursive function $\nu$ that encodes all other partial recursive functions.⁶ Such a function computes $\varphi(x)$ for all $\varphi \in \mathcal{P}$ and $x \in \mathbb{N}$. Intuitively, a programming system can be thought of as an ‘interpreter’ for some programming language, and $p$ is a natural number that encodes some partial recursive function in the $\nu$ programming system. This programming system may be almost any reasonable formulation of the computable functions, such as assembly code, Python, Turing machines, or Kleene’s formulation of the partial recursive functions. For a fixed programming system, a program $p \in \mathbb{N}$ encodes a sequence of symbols or instructions within the given system that corresponds to some partial recursive function.⁷ Therefore, $\nu_p(x)$ gives the same result, if any, as running program $p$ on input $x$.

Within a specific programming system, the integers $p$ allow for the indexation of the partial recursive functions, since, for each $p$, there exists a partial recursive function $\nu_p$ that may be associated to it. Because of this, the number $p$ is often referred to as a $\nu$-index of $\nu_p$, and the domain of $\nu_p$ is denoted by $W^{\nu}_p$.

1.17 Definition (Acceptable programming system, Jain et al. [12]). An acceptable programming system is a programming system $\nu$ such that, for all other programming systems $\psi$, there is a recursive function $t$ such that, for all $p$, we have $\nu_{t(p)} = \psi_p$.

A programming system is acceptable when, for any other programming system, there is an effective (recursive) function $t$ that translates programs of that system into it. In effect, this means that all acceptable programming systems are equivalent, in the sense that any partial recursive function that is expressible in one acceptable programming system is also expressible in another, and there is a recursive way of switching between both.

Due to the equivalence of acceptable programming systems, we fix one particular acceptable programming system and denote it by $\varphi$. Any acceptable programming system will suffice, and it is implied that we know how to compute $\varphi_p \in \mathcal{P}$ from an index $p$.

1.18 Definition (Index for a function and set). With a fixed acceptable programming system $\varphi$, a $\varphi$-index $p$ of a partial recursive function $\varphi_p$ is simply called an index for $\varphi_p$. The domain of $\varphi_p$ is the recursively enumerable set denoted by $W_p$, and $p$ is an index for it.

Using indexes for (partial recursive) functions and (recursively enumerable) sets, it is possible to ‘list out’ all the elements of $\mathcal{P}$ and $\mathcal{E}$. We call this list an indexing.

1.19 Definition (Indexing). Using a fixed acceptable programming system $\varphi$, the acceptable indexing, or simply indexing, of all the partial recursive functions and the indexing of all the recursively enumerable sets are the lists $\{\varphi_i\}_{i \in \mathbb{N}} = \varphi_0, \varphi_1, \varphi_2, \ldots$ and $\{W_i\}_{i \in \mathbb{N}} = W_0, W_1, W_2, \ldots$, respectively.

It is important to note that indexings are not injective functions of $\mathbb{N}$ into $\mathcal{P}$ or $\mathcal{E}$. Whereas it is true that an integer is an index for a single function or set, a function or set can have more than one index. In fact, for any partial recursive function $\psi$ or recursively enumerable set $S$, there are infinitely many indexes for $\psi$ and $S$.

1.20 Lemma (Osherson et al. [21]). Let $\varphi_0, \varphi_1, \ldots$ be an acceptable indexing of $\mathcal{P}$. Then, for all $i \in \mathbb{N}$, the set $EQ_i = \{j \in \mathbb{N} : \varphi_j = \varphi_i\}$ is infinite.
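The idea of a programming system, and the reason a function has many indexes, can be sketched with a toy interpreter. Everything in this sketch is an illustrative assumption rather than the system fixed in the text: programs are Python expressions in the single variable `x`, an index is the integer obtained from the ASCII bytes of the source text, and an undefined result is modelled by returning `None` instead of by non-termination. The names `index_of`, `source_of` and `nu` are hypothetical.

```python
def index_of(source):
    """Encode a program text as a natural number (its ASCII bytes read in base 256)."""
    return int.from_bytes(source.encode("ascii"), "big")

def source_of(p):
    """Decode a natural number back into a program text."""
    return p.to_bytes((p.bit_length() + 7) // 8, "big").decode("ascii")

def nu(p, x):
    """Toy universal function: run program number p on input x.

    Numbers that do not decode to a valid expression behave like the
    everywhere-undefined function (modelled here by returning None).
    """
    try:
        return eval(source_of(p), {"__builtins__": {}}, {"x": x})
    except Exception:
        return None

square = index_of("x*x")
padded = index_of("x*x+0")   # different text, hence different index, same function
junk = index_of(")(")        # not a valid program
```

Here `square` and `padded` are distinct natural numbers that index the same function, and further padding (`"x*x+0+0"`, and so on) yields as many distinct indexes as desired, which is the intuition behind Lemma 1.20. This toy system evaluates only expressions and so does not capture all of $\mathcal{P}$.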

⁶ The same is not true for the class of recursive functions: there exists no universal recursive function and no recursive indexing of the recursive functions.

⁷ In many cases, ‘programs’ will simply be an arbitrary sequence of symbols that do not even form a valid program, and therefore present a function that is undefined for every input.

The indexing $\varphi$ provides a means of associating a partial recursive function or a recursively enumerable set to any natural number. Conversely, we can also effectively obtain an index for any function or set we can computably define. Figure 1.2 illustrates this process.

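[Figure 1.2, panel (a) An index for a function: diagram relating $i \in \mathbb{N}$, a program, and $\varphi_i \in \mathcal{P}$ through the steps ‘decoding’, ‘compute’, ‘create’, and ‘encoding’. Panel (b) An index for a set: diagram relating $i \in \mathbb{N}$, a program, $\varphi_i \in \mathcal{P}$, and $W_i \in \mathcal{E}$ through the steps ‘decoding’, ‘recognize’, ‘create’, and ‘encoding’.]

Figure 1.2: Obtaining indexes from sets and functions and vice versa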

We begin with an index $i$ for some set or function and we obtain its decoding into a program in the programming language of our indexing. Using this program, it is possible to compute the partial recursive function associated with $i$ such that, for every $x \in W_i$, if $\varphi_i(x) = y$, then we can obtain $y$. To obtain the recursively enumerable set $W_i$, we obtain the domain of $\varphi_i$ by simply recognizing the values of $x$ for which $\varphi_i$ is defined (i.e. the values of $x$ for which the program eventually halts).

Note that to obtain an index from a set or function, the figure presents a ‘create’ step. This reflects the multiplicity of indexes for each set or function, meaning that there is no single program that computes $\varphi_i$. Any program that computes the desired function or whose domain is the desired set will provide a suitable index for that function or set.

Remark. In obtaining an index for a set $S$, we will frequently use the range of a partial recursive function instead of its domain to show that $S$ is recursively enumerable. These definitions are equivalent by Proposition 1.7, and so if we do not need to determine a particular index for $S$ but rather that such an index exists (i.e. that $S$ is recursively enumerable), it will suffice to build a program whose range is $S$.

1.21 Example. We list the process of obtaining indexes for some examples of recursively enumerable sets and recursive functions below. The explicit value of each index depends entirely on the indexing used.

• Let

\[
f(x) =
\begin{cases}
x, & \text{if } x \text{ is even} \\
\text{undefined}, & \text{otherwise.}
\end{cases}
\]

Construct any program that computes $f$, i.e. any program that, when given $x$ as input, returns $x$ if $x$ is even and otherwise does not halt. If $i$ is an encoding for this program, then $i$ is an index for both the partial recursive function $f$ and the recursively enumerable set of even numbers.

• Let $j$ be the encoding of a program that computes $x^2$ for every $x \in \mathbb{N}$. Then, $j$ is an index for the function $\lambda x \,.\, x^2$, and $j$ is also an index for the domain of $\varphi_j$, which is $\mathbb{N}$. Additionally, by Proposition 1.7, the range of $\varphi_j$ is also a recursively enumerable set, and has some other index different from $j$.

• Construct a program that, given $x \in \mathbb{N}$, returns 1 if $x$ is prime and otherwise does not halt. Let $k$ be the encoding of this program. It follows that $k$ is an index for the set of prime numbers, and that this set is recursively enumerable.

• Construct a program that, given $x \in \mathbb{N}$, computes the function $\varphi_x$ with input $x$ and returns 1 if the program eventually halts. Such a program is easy to construct, and so the diagonal halting problem set $K = \{x : x \in W_x\}$ is recursively enumerable.

1.2 Identification

Consider a new guessing game, similar to the one given in the Introduction. Let $\mathcal{C}$ be a class of sets such that $\mathcal{C} = \{S_i : i \in \mathbb{N}\}$, where each $S_i = \mathbb{N} - \{i\}$ is the set of natural numbers with the single element $i$ removed. Suppose we choose some specific set $S_i$ in $\mathcal{C}$ and then task a reader with identifying which set it is. We will inform the reader of elements of $S_i$ one by one in no particular order, with possible repetitions. We only guarantee that if an element $x$ is in $S_i$, then we will eventually inform the reader accordingly. The reader will be required to give out conjectures and will be allowed to change their conjecture any number of times they wish. We will consider that the reader wins our guessing game if there is some point in time at which they give out the correct hypothesis, and from then on never give out a different hypothesis. Using only the information given, is it possible to devise a strategy that will guarantee a win? Does such a strategy even exist?

Let us begin our game. The first element that we inform the reader of is 5. The reader may decide to formulate some conjecture. The next element is 6. The reader may choose to maintain their current conjecture or give out a new one. The next ten elements are 10, 1, 0, 5, 2, 7, 3, 9, 8, and 11. Now, the reader’s new conjecture might be the set of all the natural numbers excluding the number 4. But that conjecture is wrong, because the next element we show the reader is precisely the element 4. The following elements are 12, 15, 6, 18, 1, 17, 16, 19, 9, and 3. Perhaps the reader’s next conjecture is the set $S_{13}$ or $S_{14}$, or possibly even $S_{100}$. However, the reader can never be sure that we will not at some point give out the element 13, 14, or 100.

A clever reader may have adopted the strategy to conjecture the set $S_i$ when $i$ is the smallest element that has not been observed. This approach is indeed one that will lead to success: if $i$ is the smallest element that does not exist in the set, then at some point in time all elements smaller than $i$ will have been observed. From then on, no additional information about elements in the set will ever cause the reader to change their conjecture. In this case, the reader has played the role of an empirical scientist attempting to identify some natural law from within an infinite number of possible laws. This is the basis for the concept of identification: to conjecture hypotheses using a finite sequence of information.
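The winning strategy can be replayed on the sequence of elements used in the game above. In this Python sketch the function name `conjecture` is hypothetical, and a conjecture is represented simply by the number $i$ standing for the set $S_i$.

```python
def conjecture(observed):
    """Conjecture S_i, where i is the smallest natural number not yet observed."""
    seen = set(observed)
    i = 0
    while i in seen:
        i += 1
    return i

# The elements shown to the reader in the game, in order.
text = [5, 6, 10, 1, 0, 5, 2, 7, 3, 9, 8, 11,
        4,
        12, 15, 6, 18, 1, 17, 16, 19, 9, 3]

# The conjecture issued after each new element is revealed.
conjectures = [conjecture(text[:n]) for n in range(1, len(text) + 1)]
```

Replaying the game this way shows the conjecture moving only when the currently conjectured element is contradicted by an observation. Once every element below the missing one has appeared, the conjecture can no longer change, regardless of what is shown afterwards.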

1.22 Definition (Hypothesis). A hypothesis is a natural number i.

A hypothesis may be thought of as the ‘output’ given by a scientist at any given time. The number $i$ is simply the index of some partial recursive function $\varphi_i$. In order for a scientist to produce a hypothesis, it must be given some kind of ‘input.’

Texts and prefixes

Texts are infinite sequences of natural numbers that respect certain properties.⁸ A text is a particular type of learning environment, which serves as the basis for the ‘input’ scientists receive.

1.23 Definition (Learning environment). A learning environment is a mapping of $\mathbb{N}$ into $\mathbb{N} \cup \{\#\}$, where $\#$ is the blank symbol.

⁸ A sequence is an ordered collection of elements in which repetitions are allowed.

The content of any finite or infinite sequence (including texts and other learning environments) is simply the set of all natural numbers it contains.

1.24 Definition (Content). Let $s$ be a (possibly infinite) sequence of elements. The content of $s$, written $\mathrm{content}(s)$, is the set of all natural numbers that are elements of $s$.

Note that by the definition of content, $\#$ is never contained in the content of any learning environment, since it is not a natural number.

Before we formally define a text, we first specify a means of representing a function using a set of natural numbers. In the definition for the Cantor pairing function, we introduced the use of a natural number $\langle x, y \rangle$ to represent the ordered pair $(x, y)$. Extending this notion further, we obtain the concept of the set representation of a function by considering the set of ordered pairs of its graph.

1.25 Definition (Set representation). $S \subseteq \mathbb{N}$ is said to be the set representation of the set $\{(x, y) : \langle x, y \rangle \in S\}$ of ordered pairs. The set representation of the graph of a function $f$ (or simply the set representation of $f$) is the set $\Psi(f) = \{\langle x, f(x) \rangle : x \in \mathrm{domain}(f)\}$.

Texts are the simplest kind of learning environment, and they are defined for sets or functions.

1.26 Definition (Text, Gold [10]). A text T is a learning environment such that

(a) $T$ is a text for a recursively enumerable set $S$ just in case $\mathrm{content}(T) = S$, and

(b) $T$ is a text for a recursive function $f$ just in case $\mathrm{content}(T) = \Psi(f)$.

The set of all texts (for both sets and functions) is denoted by text.

A text for a set $S$ or a function $f$ is then any sequence in which every element of $S$ or every element of $\Psi(f)$ appears at least once, respectively. Note that texts for functions are only defined for recursive functions and not for partial recursive functions. Consequently, texts for functions are only defined for total functions.

1.27 Definition (Notation on texts).

(a) Elements in texts are counted from zero, and the $(n + 1)$-th element of a text $T$ is denoted by $T(n)$, with $n \geq 0$. Therefore, the ‘first’ element of $T$ is $T(0) \in \mathbb{N} \cup \{\#\}$.

(b) The initial sequence of length $n$ of a text $T$ is denoted by $T[n]$, with $n \geq 0$. Note that $T[n]$ does not include $T(n)$, which is the first element after $T[n]$, and so $T[0] = \varepsilon$, the empty sequence. Hence, $T[n] = T(0), T(1), \ldots, T(n - 1)$.

From the point of view of an empirical scientist using observations or experiments in order to identify some natural law, a text can be seen as a potential sequence of all possible observations or experiments that the scientist can witness. As such, repetitions of elements are allowed within texts and we accept that occasionally no element be witnessed, as is the case when a text contains a blank symbol $\#$. Additionally, if $T$ is a text for a set $S$ and $x \in S$, then there is some $n \in \mathbb{N}$ such that $T(n) = x$. This condition requires that all elements of a set appear in the text at least once.

1.28 Example.

• The text $T = 5, 9, \#, 6, 0, \#, 5, 1, 1, \ldots$ is a text for the set of natural numbers $\mathbb{N}$, so long as every element of $\mathbb{N}$ appears in $T$ at least once. In this case, $T(0) = 5$, $T(2) = \#$, $T(4) = 0$, and $T[4] = 5, 9, \#, 6$. The text $U = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, \ldots$ which lists out all the natural numbers in sequential order is also a text for $\mathbb{N}$. Note that nothing is said of the relative order in which elements appear in a text: in general, elements in a text can have any arbitrary order.

• The text $T = 11, 13, 7, 23, 7, \#, 2, \ldots$ is a text for the set of prime numbers, so long as every prime number appears in the text at least once. The text $U = 2, 2, 3, 2, 3, 5, 2, 3, 5, 7, \ldots$ in which every prime appears infinitely many times is also a text for the set of prime numbers.

• The text $T = \#, \#, \#, \#, \#, \ldots$ is the only text for the empty set $\emptyset$. In this case, $T(n) = \#$ for all $n \in \mathbb{N}$, since $\mathrm{content}(T) = \emptyset$.

• The text $T = 2, 8, 4, 10, 3, 12, 6, \ldots$ is not a text for the set of even numbers, since it contains integers that are not even.

• The text $T = \langle 3, 9 \rangle, \langle 2, 4 \rangle, \#, \langle 0, 0 \rangle, \langle 3, 9 \rangle, \langle 7, 49 \rangle, \ldots$ is a text whose content is the set representation of the function $\lambda x \,.\, x^2$. Here, $T(0) = \langle 3, 9 \rangle$, $T(1) = \langle 2, 4 \rangle$, and $T[2] = \langle 3, 9 \rangle, \langle 2, 4 \rangle$.
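The notation $T(n)$, $T[n]$ and $\mathrm{content}$ maps naturally onto list indexing and slicing. The following Python sketch only handles finite initial segments of texts, represents the blank symbol by the string `"#"`, and uses the hypothetical helper names `content` and `consistent_with`; all of these are assumptions of the sketch.

```python
BLANK = "#"

def pair(x, y):
    """Cantor pairing <x, y> from Definition 1.12."""
    return (x + y) * (x + y + 1) // 2 + x

def content(seq):
    """Set of natural numbers occurring in a finite sequence (blanks are dropped)."""
    return {e for e in seq if e != BLANK}

def consistent_with(seq, member):
    """True if every non-blank element of seq satisfies the membership test."""
    return all(member(e) for e in content(seq))

# Initial segment of the first text of the example: T(n) is T[n], and T[n] is T[:n].
T = [5, 9, BLANK, 6, 0, BLANK, 5, 1, 1]

# Initial segment of a text for the set representation of x -> x^2.
squares = [pair(3, 9), pair(2, 4), BLANK, pair(0, 0), pair(3, 9), pair(7, 49)]
graph = {pair(x, x * x) for x in range(10)}

# Initial segment of the sequence that is not a text for the even numbers.
not_even = [2, 8, 4, 10, 3, 12, 6]
```

Note that a finite segment can only ever refute the claim that a sequence is a text for a given set (as `not_even` does for the even numbers); confirming it would require the whole infinite sequence.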

Despite the number of observations an empirical scientist can make being unlimited, it will always be limited to a finite number of observations at any given time. Similarly, a scientist can never ‘work with’ an entire text but only with a finite initial segment of a text, called a prefix.

1.29 Definition (Prefix).

(a) A prefix is a finite initial segment $T[n]$ for some $n \in \mathbb{N}$ and some arbitrary text $T$. The set of all prefixes is $\mathrm{PREF} = \{T[n] : T \text{ is a text and } n \in \mathbb{N}\}$.

(b) If $T$ is a text for a set, then PREF can be written as SEQ: the set of prefixes for sets. Otherwise, if $T$ is a text for a function, then PREF can be written as SEG: the set of prefixes for functions.

(c) If $\sigma$ is a prefix of a text $T$, we write $\sigma \subset T$.

(d) A prefix $\sigma$ is said to be for a set $S$ or function $f$ if there is some text $T$ such that $\sigma$ is a prefix of $T$ and $T$ is a text for $S$ or $f$, respectively.

Remark. Observe that since any text for a recursive function $f$ is technically also a text for a recursively enumerable set (i.e. the set representation of $f$), it immediately follows that $\mathrm{SEG} \subset \mathrm{SEQ}$. The converse is not true because not all recursively enumerable sets are set representations for functions.

1.30 Definition (Notation on prefixes). Let $\sigma, \tau \in \mathrm{PREF}$ be two prefixes. Then

(a) the length of $\sigma$ is the number of elements of $\sigma$, including blanks and repetitions. It is denoted by $|\sigma|$.

(b) the empty prefix $\sigma = \varepsilon$ contains no elements, such that $|\sigma| = 0$.

(c) the $(n + 1)$-th element of $\sigma$ is denoted by $\sigma_n$, for $n \geq 0$. Hence, $\sigma_n = T(n)$ when $\sigma$ is a prefix of $T$ and $n < |\sigma|$. (Note that the first element of $\sigma$ is denoted $\sigma_0$, the second $\sigma_1$, and so on.)

(d) the sequence of the first $n$ elements of $\sigma$ is denoted by $\sigma[n]$, and the sequence of the last $n$ elements of $\sigma$ is denoted by $\sigma[-n]$. If $|\sigma| = n$ then $\sigma[n] = \sigma[-n] = \sigma$.

(e) for $|\sigma| > 0$, the last member of $\sigma$ (that is, $\sigma[-1] = \sigma_{|\sigma| - 1}$) is denoted by $\sigma_{\mathrm{last}}$.

(f) for $|\sigma| > 0$, all of $\sigma$ with its last element removed (that is, the prefix obtained by removing $\sigma_{\mathrm{last}}$ from $\sigma$) is denoted by $\sigma^{-}$. If $|\sigma| = 1$ then $\sigma^{-} = \varepsilon$.

(g) the result of concatenating $\tau$ onto the end of $\sigma$ is denoted by $\sigma \diamond \tau$. Similarly, the result of concatenating a single character $x \in \mathbb{N} \cup \{\#\}$ onto the end of $\sigma$ is written as $\sigma \diamond x$.

(h) if $\sigma$ is an initial segment of $\tau$, then we write $\sigma \subseteq \tau$. If $\sigma$ is a proper initial segment of $\tau$, then we write $\sigma \subset \tau$.


(i) if σ ∈ SEG is a prefix of a text for a function, then, for each element σ_n = ⟨x, y⟩, we denote the projection of the first element by π₁(σ_n) = x and the projection of the second element by π₂(σ_n) = y.

(j) if σ ∈ SEG is a prefix of a text for a function, then the domain of σ is the set domain(σ) = {x = π₁(σ_n) : n < |σ|} and the image of σ is the set image(σ) = {y = π₂(σ_n) : n < |σ|}.

(k) if σ ∈ SEG is a prefix of a text for a function, then ⟨x, y⟩ is said to be compatible with σ if either x ∉ domain(σ) or ⟨x, y⟩ ∈ content(σ).

(l) if σ ∈ SEG is a prefix of a text for a function, then the partial function σ̂ of the content of σ is defined as

\[
\hat{\sigma}(x) =
\begin{cases}
y, & \text{if } \langle x, y \rangle \in \mathrm{content}(\sigma) \\
\text{undefined}, & \text{otherwise}
\end{cases}
\]


Remark. Just as with texts, the definition of content for a prefix σ implies that content(σ) does not include blank symbols. Thus, the number of distinct numbers in σ is denoted by |content(σ)|.

Prefixes, like texts, are sequences of natural numbers. Depending on the given context, these numbers may simply be elements of a set or may represent ordered pairs in a function. In the former case, we say that a prefix belongs to SEQ, and in the latter case, that it belongs to SEG.

1.31 Example.

• The prefix σ = 5, 9, # is a prefix of the text T = 5, 9, #, 6, 0, #, 5, 1, 1, … for the set of natural numbers N, but is not a prefix of the text U = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, …. We have |σ| = 3, σ = σ[3] = T[3], σ⁻ = 5, 9, and σ_last = #. With the implied context that T is a text for a set and not for a function f, we have σ ∈ SEQ.

• The prefix σ = ⟨3, 9⟩, ⟨2, 4⟩ is a prefix of the text T = ⟨3, 9⟩, ⟨2, 4⟩, #, ⟨0, 0⟩, ⟨3, 9⟩, ⟨7, 49⟩, … for the function λx. x². The prefix τ = ⟨3, 9⟩, ⟨2, 4⟩, # is also a prefix of T, and σ ⊂ τ. Additionally, σ ⋄ # = τ and σ̂ = τ̂, where σ̂(2) = 4, σ̂(3) = 9, and σ̂(n) is undefined for n ∈ N − {2, 3}.
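The prefix notation of Definition 1.30 maps almost directly onto tuple operations. The following is a minimal sketch, assuming prefixes are modelled as Python tuples and pairs ⟨x, y⟩ as 2-tuples; all function names (`last`, `minus`, `hat`, `compatible`, `is_prefix`) are hypothetical helpers introduced for illustration only.

```python
BLANK = "#"  # stand-in for the blank symbol


def content(sigma):
    """content(sigma): the non-blank elements of the prefix."""
    return {e for e in sigma if e != BLANK}


def last(sigma):
    """sigma_last; only meaningful when the prefix is nonempty."""
    return sigma[-1]


def minus(sigma):
    """The prefix with its last element removed."""
    return sigma[:-1]


def is_prefix(sigma, tau):
    """True when sigma is an initial segment of tau."""
    return tau[:len(sigma)] == sigma


def hat(sigma):
    """The partial function of the content of sigma, as a dict.

    Assumes sigma is a prefix for a function, so its content is single-valued.
    """
    return {x: y for (x, y) in content(sigma)}


def compatible(pair, sigma):
    """A pair is compatible if its argument is new or the pair already occurs."""
    x, _ = pair
    return x not in hat(sigma) or pair in content(sigma)


# First bullet of the example: a prefix for a set.
sigma = (5, 9, BLANK)
T_head = (5, 9, BLANK, 6, 0, BLANK, 5, 1, 1)
print(len(sigma), minus(sigma), last(sigma), is_prefix(sigma, T_head))

# Second bullet: two prefixes for the squaring function.
s = ((3, 9), (2, 4))
t = s + (BLANK,)          # concatenating a blank
print(hat(s) == hat(t))   # a blank does not change the partial function
```

Arguments outside `hat(sigma)` are simply absent from the dictionary, which plays the role of "undefined".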

Scientists

A scientist is a function that receives a prefix of a text for a recursively enumerable set or a recursive function and returns a hypothesis for the index of the set or function in question. We will allow scientists to be possibly partial functions, as we do not require a guess to be made for every prefix. Additionally, we will permit a scientist to be a possibly noncomputable function.

1.32 Definition (Scientist, Gold [10]).

(a) A scientist for sets M : SEQ → N is a possibly partial and possibly noncomputable function.

(b) A scientist for functions M : SEG → N is a possibly partial and possibly noncomputable function.

A group of scientists is said to be a school of scientists.

1.33 Definition (School). A school of scientists, or simply school, is a set of scientists for sets or for functions, typically respecting a certain condition. If a school is a set of scientists for sets, we say it is a set school, and if it is a school for functions, we say it is a function school. The school of all scientists for sets is denoted by S and the school of all scientists for functions is denoted by Sf.


It is important to note that even though scientists for sets and scientists for functions are essentially the same, they operate on different domains and are therefore not interchangeable. Nonetheless, we may occasionally wish to refer to both kinds of scientists simultaneously, and we may generally speak of a scientist of type PREF → N when we do so.

A scientist M is to receive as input successive prefixes of a text and to return hypotheses as output. The goal of a scientist is to identify an index for a recursively enumerable set or a recursive function using a text for that set or function.

Remark. Recall that a hypothesis can be any index for a partial recursive function, but that texts (and prefixes) are only defined for (total) recursive functions. When identifying functions, scientists are thus permitted to hypothesise on functions that are possibly partial, even though they seek to identify only recursive functions. On the other hand, any hypothesis for a partial recursive function is also a hypothesis for a recursively enumerable set, so all hypotheses for sets are ‘valid.’

For a scientist to identify a text (and, subsequently, a set or class of sets), we must define a notion of convergence for it.

1.34 Definition (Convergence to an index). A scientist M is said to converge to an index i on text T (written M(T)↓ = i) just in case, for all but finitely many n ∈ N, M(T[n]) = i. That is, it converges if, for all but finitely many prefixes σ of T, M(σ) = i.

If a scientist converges to a correct hypothesis for a text, then it is said to identify the text. Note that this convergence is a particular case of the usual notion of convergence in calculus. Whereas in the latter case convergence is limit convergence (in the sense that a sequence may approach a certain point while always being different from it), the former is more akin to finite convergence: the sequence reaches its convergence point in a finite number of steps.

As we will see, convergence is not the same as identification: the constant scientist M that returns the index i = 0 for every prefix trivially converges on every possible text, irrespective of the set or function the text is for.

1.35 Definition (Identification).

(a) A scientist M identifies a text T for a set S or for a function f just in case M converges to an index i and i is an index for S or an index for f, respectively.

(b) A scientist M identifies a set S or a function f just in case M identifies every text T for S or for f, respectively.

(c) A scientist M identifies a class C of sets or functions just in case M identifies every set S ∈ C or every function f ∈ C, respectively.

(d) A school of scientists S′ identifies a class C of sets or functions just in case there is some scientist M in S′ that identifies C.

It is important to note that nothing is said about how a scientist should behave on a text it does not identify. Indeed, a scientist is entirely free to converge to an incorrect index or not converge at all on a text which it does not identify. In this sense, identification allows for false positives but never for false negatives.⁹

Additionally, the previous definition implies that there are several distinct levels of identification. As the following examples will illustrate, parts (a) and (b) of Definition 1.35 alone are not enough to provide a meaningful concept of identification.

⁹ A scientist which converges to an index only if it is a correct index for a text is said to be exact. Exact scientists are beyond the scope of this work.


1.36 Example. Consider the text T = 1, 2, 3, 3, 3, 3, … for the set S = {1, 2, 3}. Note that T is indeed a text for S, since every element in S appears at least once in T, and no other elements besides the elements of S are present in the text. Then, scientist M defined in Algorithm 1.1 correctly identifies T.

Algorithm 1.1: Scientist M identifies text T

1 Scientist M(σ ∈ SEQ) : N
2     return the same index for the set {1, …, σ_last};
3 end

For σ = 1, the initial hypothesis of M for T will be an index for the set {1}, which is incorrect. For σ = 1, 2, the hypothesis will be an index for the set {1, 2}, which is also incorrect. However, as long as the last element of σ is 3, scientist M will always return a correct index for the set {1, 2, 3} for every σ. In fact, it will always return the same index for the set, which is precisely what is meant in line 2 of Algorithm 1.1: whenever a scientist does not wish to change its conjecture (of a set or function), it will continue giving out the same index. Thus, M converges to an index for S and identifies T.

However, it is not true that M identifies every text for S. Consider the text U = 1, 2, 3, 1, 2, 3, 1, 2, … for S. For σ ⊆ U[3], M will return the same hypotheses as for T, since T[3] = U[3]. However, when σ = U[4], M(σ) will be an index for {1}. From then on, M will continually change its mind between hypotheses for sets. Even though it will still output a correct hypothesis infinitely often, it does not converge to a correct hypothesis. Thus, M identifies text T but not the set S.

Consider instead the scientist N, defined in Algorithm 1.2.

Algorithm 1.2: Scientist N identifies the set S

Scientist N(σ ∈ SEQ) : N
    return the same index for the set {1, 2, 3};
end

Now, N will immediately converge to an index for the set S, even after only a single element of σ. N will also identify every other text for S, since no matter what text it receives, it always returns and converges to an index for S. Hence, by definition, N identifies the set S. Obviously, N will also converge to an index for S even when the text is for some other set, since the output of N is entirely independent of σ.

This sort of blind convergence to an index is the reason why set identification in itself is rarely of any practical use. It is always trivial to construct a scientist that identifies any set by simply forcing it to converge to an index for that set. The most essential part of identification is being able to distinguish between different sets or functions.

Finally, let us consider the scientist F defined in Algorithm 1.3.

Algorithm 1.3: Scientist F identifies the class of sets C

Scientist F(σ ∈ SEQ) : N
    return the same index for the set content(σ);
end

In this case, because the content of σ converges in a finite number of steps to the set S, F will also converge to an index for S in a finite number of steps. Indeed, F will always identify any text for S, since by definition content(T) = S for any text T for S. F will also identify every set in the class of all finite sets, as will be shown in the following chapter. Note, however, that F does not identify the set N or other examples of infinite sets.
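The behaviour of the three scientists on the texts T and U can be traced on finite heads of those texts. The sketch below makes one loud simplification: instead of returning an index of a recursively enumerable set, each scientist returns the finite set itself as a `frozenset`, so that "the same index" becomes equality of sets. The helper `hypotheses` and the eight-element heads are illustrative assumptions.

```python
BLANK = "#"  # stand-in for the blank symbol (not used by T or U)


def M(sigma):
    """Algorithm 1.1: conjecture {1, ..., sigma_last} (assumes a numeric last element)."""
    return frozenset(range(1, sigma[-1] + 1))


def N(sigma):
    """Algorithm 1.2: always conjecture {1, 2, 3}."""
    return frozenset({1, 2, 3})


def F(sigma):
    """Algorithm 1.3: conjecture content(sigma)."""
    return frozenset(e for e in sigma if e != BLANK)


def hypotheses(scientist, head):
    """Conjectures on the nonempty prefixes head[1], head[2], ... of a finite head."""
    return [scientist(head[:n]) for n in range(1, len(head) + 1)]


T_head = (1, 2, 3, 3, 3, 3, 3, 3)  # T = 1, 2, 3, 3, 3, ...
U_head = (1, 2, 3, 1, 2, 3, 1, 2)  # U = 1, 2, 3, 1, 2, 3, ...

for name, scientist in [("M", M), ("N", N), ("F", F)]:
    print(name, "on T:", [sorted(h) for h in hypotheses(scientist, T_head)])
    print(name, "on U:", [sorted(h) for h in hypotheses(scientist, U_head)])
```

A finite trace can only illustrate stabilisation, not prove convergence; here it shows M oscillating on U while F settles on {1, 2, 3} once all three elements have appeared.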


Learning environments

In the following chapters, we will consider multiple variations on the notion of text given by Definition 1.26. Each of these is a type of learning environment, and may make it easier or harder for a scientist to identify a given set or function.

A simple variation on the concept of texts is the fat text.

1.37 Definition (Fat text). A fat text T is a learning environment such that

(a) T is a fat text for a recursively enumerable set S just in case content(T) = S and, for all x ∈ S, {n : T(n) = x} is infinite,

(b) T is a fat text for a recursive function f just in case content(T) = Ψ(f) and, for all ⟨x, y⟩ ∈ Ψ(f), {n : T(n) = ⟨x, y⟩} is infinite.

The set of all fat texts is denoted by fat text.

It may appear obvious that a fat text does not provide any more information than a regular text. Indeed, any element that is included in a fat text must also be included in a regular text, and vice versa. However, as we shall see, there are indeed situations in which a scientist is able to identify some class in fat text but is not able to do so in regular text.

1.38 Definition (Imperfect text). An imperfect text T is a learning environment such that

(a) T is an imperfect text for a recursively enumerable set S just in case content(T) = U, where U is a finite variant of S¹⁰ (i.e. if content(T) = S ∪ D − D′, where both D and D′ are finite),

(b) T is an imperfect text for a recursive function f just in case content(T) = Ψ(g), where g is a total recursive function that is a finite variant of f.¹⁰

The set of all imperfect texts is denoted by imp. text.

Unlike fat texts, the ‘quantity’ of information in an imperfect text is less than in a regular text. Any finite number of elements may be added to or removed from a regular text to form an imperfect text. Consequently, this may prevent accurate identification of many classes.

Recall how Gold [10] studied language learning in children. It can be argued that in some cases, a child learns to speak a language not merely from listening to correct sentences, but also by being told which sentences are not correct. This is a scenario where a learner has access not only to positive information, but also to negative information. This is the motivating idea behind informants.

1.39 Definition (Informant). An informant T is a learning environment such that

(a) T is an informant for a recursively enumerable set S just in case T is a text for the characteristic function χ_S of S, i.e. for all x ∈ S, there is n ∈ N such that ⟨x, 1⟩ = T(n) and, for all x ∉ S, there is n ∈ N such that ⟨x, 0⟩ = T(n),

(b) T is an informant for a recursive function f just in case T is a text for the characteristic function χ_{Ψ(f)} of the set representation of f, i.e. for all x, y ∈ N, there is n ∈ N such that ⟨⟨x, y⟩, 1⟩ = T(n) if ⟨x, y⟩ ∈ Ψ(f) and ⟨⟨x, y⟩, 0⟩ = T(n) otherwise.

The set of all informants is denoted by informant. The set of all prefixes of informants for sets is denoted ISEQ and the set of all prefixes of informants for functions is denoted ISEG.

¹⁰ We say that a set A is a finite variant of B if both (A − B) and (B − A) are finite. Similarly, a function f is a finite variant of a function g if the set of points {x ∈ N : f(x) ≠ g(x)} where f and g differ is finite.


Informants clearly provide more information than regular texts, and not simply a repetition of information that was already available, as is the case for fat texts. As such, it can be expected that more classes are identifiable using informants than using texts.

1.40 Example. Let us denote the set of even numbers by E and the identity function by id. The following are examples of different types of learning environments.

• Text T = 0, 0, 2, 0, 2, 4, …, which contains every even number infinitely many times, is a fat text for E, and text U = ⟨0, 0⟩, ⟨0, 0⟩, ⟨1, 1⟩, ⟨0, 0⟩, ⟨1, 1⟩, ⟨2, 2⟩, …, which contains every ⟨n, n⟩ infinitely often, is a fat text for id.

• Texts T = 0, 2, 4, 5, 6, 8, 10, …, which contains every even number and the number 5, and U = 0, 2, 6, 8, 10, …, which contains every even number except 4, are imperfect texts for E, since both E ∪ {5} and E − {4} are finite variants of E.

• No text for N is an imperfect text for E, since N − E is infinite.

• Text T = 0, 0, 0, 0, 0, 0, …, which only contains zeros, is an imperfect text for every finite set.

• Text T = ⟨0, 1⟩, ⟨1, 1⟩, ⟨2, 2⟩, ⟨3, 3⟩, …, which contains ⟨n, n⟩ for every n ∈ N − {0}, is an imperfect text for id, but text U = ⟨1, 1⟩, ⟨1, 0⟩, ⟨2, 2⟩, ⟨3, 3⟩, … is not, since it is not a text for a function.

• Text T = ⟨0, 1⟩, ⟨1, 0⟩, ⟨2, 1⟩, ⟨3, 0⟩, …, which is a text for the characteristic function of E, is an informant for E, but text U = ⟨0, 1⟩, ⟨2, 1⟩, ⟨4, 1⟩, ⟨6, 1⟩, …, which only contains elements of the form ⟨2n, 1⟩, where n ∈ N, is not, since it is not a text for a (total) function: no pair ⟨x, 0⟩ with x odd ever occurs.

• Text T = ⟨⟨0, 0⟩, 1⟩, ⟨⟨1, 0⟩, 0⟩, ⟨⟨1, 1⟩, 1⟩, ⟨⟨2, 1⟩, 0⟩, …, which is a text for χ_{Ψ(id)}, is an informant for id.
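Some of the environments in this example can be generated mechanically. The sketch below is one possible rendering, assuming pairs are modelled as Python tuples; the generator names and the helper `is_single_valued` are hypothetical, and the informant shown lists the arguments in increasing order, which is only one of many admissible orderings.

```python
from itertools import count, islice


def fat_text_for_evens():
    """0, 0, 2, 0, 2, 4, ...: block k lists the first k even numbers."""
    for k in count(1):
        for i in range(k):
            yield 2 * i


def informant_for_evens():
    """A text for the characteristic function of the even numbers."""
    for x in count():
        yield (x, 1 if x % 2 == 0 else 0)


def is_single_valued(pairs):
    """False when some argument is paired with two different values."""
    seen = {}
    for x, y in pairs:
        if seen.setdefault(x, y) != y:
            return False
    return True


print(list(islice(fat_text_for_evens(), 6)))
print(list(islice(informant_for_evens(), 4)))

# In the first 55 elements (blocks 1 to 10) the number 0 occurs once per block.
print(list(islice(fat_text_for_evens(), 55)).count(0))

# The sequence <1,1>, <1,0>, <2,2>, <3,3> cannot be a prefix of a text for a function.
print(is_single_valued([(1, 1), (1, 0), (2, 2), (3, 3)]))
```

Every even number reappears in every later block, so each element occurs infinitely often in the full sequence, as a fat text requires.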

We generalize Definition 1.35 to take into account multiple different categories of learning environments.

1.41 Definition (Identification generalized).

(a) A scientist M identifies a learning environment T for a set S or for a function f just in case M converges to an index i and i is an index for S or an index for f, respectively.

(b) A scientist M identifies a set S or a function f in learning environment e just in case M identifies every learning environment T ∈ e for S or for f, respectively.

(c) A scientist M identifies a class C of sets or functions in learning environment e just in case M identifies every set S ∈ C or every function f ∈ C, respectively, in e.

(d) A school of scientists S′ identifies a class C of sets or functions in learning environment e just in case there is some scientist M in S′ that identifies C in learning environment e.

Whenever we do not specify the kind of learning environment in which identification occurs, the reader can assume that we are referring to identification in text, unless it is clear from context that we mean otherwise.


Collections of Classes

Whereas a single scientist may be able to identify a single class, a set of scientists that respect a certain condition is able to identify a large group of classes of sets or functions.

For example, the school of all scientists for sets contains every possible scientist for sets, including both trivial scientists, such as the one in Algorithm 1.2, and noncomputable scientists. A collection of classes for some school of scientists S′ and some learning environment e is the set of all classes which are identifiable by S′ in e.

1.42 Definition (Collection). Let S′ be a school of scientists for sets or for functions and e be a learning environment. The set of all classes C such that there is some scientist M ∈ S′ that identifies C in the learning environment e is called the collection of classes, or simply collection, for S′ and e, and is denoted by [S′, e].

1.43 Example. The collection of all classes of sets that are identifiable by any possible scientist for sets in text is denoted by [S, text]. The collection of all classes of functions that are identifiable by any possible scientist for functions in informant is denoted by [Sf, informant].

Collections of classes may be compared with each other using standard set notation.

1.44 Definition (Comparison of collections). Let S′ and S″ be two schools and e and f be two learning environments. If S′ and S″ are both for sets or both for functions, and the collection of classes [S′, e] is contained in [S″, f], we write [S′, e] ⊆ [S″, f]. If the former is strictly contained in the latter, we write [S′, e] ⊂ [S″, f].

When one collection is strictly contained in another, this means that the latter school can identify a larger number of classes than the former school, each in its respective environment. This may be because one school is ‘stronger’ than the other, or because one learning environment provides greater power for identification. In the following chapter, we will study several different schools of scientists and compare their identificational power.

1.45 Example. Note that every fat text is also a text, but that not all texts are fat texts. Therefore, every class that can be identified in arbitrary text can also be identified when only fat texts are presented. The converse is not necessarily true: if a class is identifiable in every fat text, it need not be identifiable in some of those texts that are not fat. Therefore, a scientist that is only required to identify some class in fat text is less restricted than one that must identify the same class in regular text, and so may potentially identify more classes. It follows that [S, text] ⊆ [S, fat text] and that [Sf, text] ⊆ [Sf, fat text].


Chapter 2

Identification of sets and functions

2.1 Main results

One of the primary focuses of this chapter is producing many examples of classes of sets and functions that are or are not identifiable by a given school of scientists. Proving that a class is identifiable typically only requires explicitly constructing a scientist and verifying that it does indeed identify the class in question, but showing that a class is not identifiable is usually harder. In this first section, we explore several important results of identification theory that will be very useful in later proofs.

2.1.1 Locking sequences

A locking sequence provides a necessary condition for identification in all types of learning environments for both sets and functions. It relies on a fairly evident observation: if a scientist identifies some set or function, it will necessarily do so after finitely many observations. If, for every text for the given set or function beginning with this prefix σ, the scientist always returns the same output after reading σ, then σ is said to ‘lock’ the scientist into the correct hypothesis.

2.1 Definition (Locking sequence, Blum and Blum [2]).

1. Let S ∈ E be a recursively enumerable set, M be a scientist for sets and σ ∈ SEQ be a prefix of a text for sets. We say that a nonempty prefix σ is a locking sequence for M on S just in case (a) content(σ) ⊆ S, (b) W_{M(σ)} = S, and (c) for all τ ∈ SEQ, if content(τ) ⊆ S, then M(σ ⋄ τ) = M(σ).

2. Let f ∈ R be a recursive function, M be a scientist for functions and σ ∈ SEG be a prefix of a text for functions. We say that σ is a locking sequence for M on f just in case (a) content(σ) ⊆ Ψ(f), (b) φ_{M(σ)} = f, and (c) for all τ ∈ SEG, if content(τ) ⊆ Ψ(f), then M(σ ⋄ τ) = M(σ).

A locking sequence is always defined with respect to a scientist and a set or function, so that a locking sequence for one scientist is in general not a locking sequence for another. Note that this implies that there do not exist locking sequences for particular texts: a sequence must be locking for all texts for a given set or function. Additionally, there is no requirement for a locking sequence to be as small as possible: it need not be a ‘minimal’ locking sequence. Indeed, if a prefix σ ∈ PREF is a locking sequence for a set S or function f, then any larger prefix τ ∈ PREF such that σ ⊂ τ and content(τ) ⊆ S or content(τ) ⊆ Ψ(f) is also a locking sequence for S or f, respectively.

2.2 Lemma.

1. Let σ be a locking sequence for some set S ∈ E. If τ ∈ SEQ is a prefix such that σ ⊂ τ and content(τ) ⊆ S, then τ is a locking sequence for S.


2. Let σ be a locking sequence for some function f ∈ R. If τ ∈ SEG is a prefix such that σ ⊂ τ and content(τ) ⊆ Ψ(f), then τ is a locking sequence for f.

The notion of locking sequence is an important one, and it allows the proof of an impressive result. The following theorem states that, for a given scientist M identifying a certain set or function, if a locking sequence σ is a prefix of a text for that set or function, then M will converge to a correct index after reading only σ.

2.3 Theorem (Blum and Blum [2]). Let M be a scientist for sets that identifies the set S ∈ E. Then there exists a prefix σ ∈ SEQ that is a locking sequence for M on S.

Proof. Let M be a scientist that identifies the set S ∈ E. Without loss of generality, assume that M is defined for all σ ∈ SEQ. Now, suppose that no locking sequence σ exists. This implies that,

    for all σ ∈ SEQ such that content(σ) ⊆ S and W_{M(σ)} = S, there exists some τ ∈ SEQ such that content(τ) ⊆ S and M(σ ⋄ τ) ≠ M(σ).    (2.1)

We will show that this implies the existence of a text T for S which M does not identify.

Let U = u_0, u_1, u_2, … be a text for S. We construct text T using the prefixes σ^i for each i ∈ N. We begin with σ^0 = ε and build each σ^{i+1} using Algorithm 2.1.

Algorithm 2.1: Construction of prefix σ^{i+1} of the text T for a set S

Data: prefix σ^i of text T
Result: prefix σ^{i+1} of text T

if W_{M(σ^i)} ≠ S then
    σ^{i+1} ← σ^i ⋄ u_i;
else
    choose τ ∈ SEQ such that content(τ) ⊆ S and M(σ^i ⋄ τ) ≠ M(σ^i);    // τ must exist by Equation 2.1 and the fact that content(σ^i) ⊆ S
    σ^{i+1} ← σ^i ⋄ τ ⋄ u_i;
end

Observe that every new prefix is built by adding one or more elements to the end of the previous prefix, such that σ^i ⊂ σ^{i+1} for all i ∈ N. Then, let T = ⋃_i σ^i.

Now, content(T) ⊆ S by induction, since content(σ^0) = ∅ ⊆ S and both content(τ) ⊆ S and u_n ∈ S for all n ∈ N, so that content(σ^{i+1}) ⊆ S. Additionally, adding u_n at each step of the construction of T ensures that S ⊆ content(T), and so T is a text for S. However, M does not converge on T to an index for S, since for each prefix σ^{i+1} either W_{M(σ^i)} ≠ S or M(σ^i ⋄ τ) ≠ M(σ^i) for some τ ∈ SEQ with σ^i ⋄ τ ⊆ σ^{i+1}.

Thus, there are infinitely many prefixes of T on which M either returns an incorrect index for S or changes its mind between two different prefixes, and so M does not identify S on text T. This contradicts the assumption that M identifies S.
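The construction used in the proof can be run on a toy instance. The sketch below is only an illustration under strong assumptions: S = {1, 2, 3}, hypotheses are modelled as finite sets rather than indices (so the test W_{M(σ)} = S becomes set equality), the text U is 1, 2, 3, 1, 2, 3, …, and the existential choice of τ is replaced by a bounded brute-force search. The names `find_tau` and `build` are hypothetical.

```python
from itertools import product

S = frozenset({1, 2, 3})


def M(sigma):
    """Scientist of Algorithm 1.1, extended to the empty prefix."""
    return frozenset(range(1, sigma[-1] + 1)) if sigma else frozenset()


def F(sigma):
    """Scientist of Algorithm 1.3: conjecture the content seen so far."""
    return frozenset(sigma)


def u(i):
    """The i-th element of the assumed text U = 1, 2, 3, 1, 2, 3, ... for S."""
    return i % 3 + 1


def find_tau(scientist, sigma, max_len=3):
    """Search for tau over S that forces a mind change (bounded, so not a proof)."""
    for k in range(1, max_len + 1):
        for tau in product(sorted(S), repeat=k):
            if scientist(sigma + tau) != scientist(sigma):
                return tau
    return None


def build(scientist, steps):
    """Stages sigma^0, sigma^1, ... following the two branches of Algorithm 2.1."""
    sigma = ()
    stages = [sigma]
    for i in range(steps):
        if scientist(sigma) != S:
            sigma = sigma + (u(i),)
        else:
            tau = find_tau(scientist, sigma)
            if tau is None:
                break  # no mind change found: sigma behaves like a locking sequence
            sigma = sigma + tau + (u(i),)
        stages.append(sigma)
    return stages


print(build(M, 9)[-1])  # M keeps being pushed off the correct conjecture
print(build(F, 9)[-1])  # the search stalls at a locking sequence for F
```

For M the construction never stalls, mirroring the fact that M does not identify S; for F it stops at 1, 2, 3, after which no extension over S changes the conjecture.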

Although Blum and Blum [2] only defined locking sequences for sets, the definitions and theorem apply equally to functions. Indeed, Theorem 2.3 remains valid for a recursive function f, and the proof is also largely identical. Additionally, since Theorem 2.4 is a particular case of Theorem 2.8, we will skip the proof of the theorem below.

2.4 Theorem. Let M be a scientist for functions that identifies the function f ∈ R. Then there exists a prefix σ ∈ SEG that is a locking sequence for M on f.

These theorems and Lemma 2.2 imply that there is actually an abundance of locking sequences for any given scientist and set or function it identifies. Indeed, for every scientist that successfully identifies some set or function, there must always exist an infinite number of locking sequences. It is tempting to also conclude that if a scientist identifies a given set S or function f, then every text for S or f must contain a prefix which is a locking sequence. The following example illustrates a case in which a scientist successfully identifies every text, but not every text contains a locking sequence.

2.5 Example. This example contradicts the hypothesis that if a scientist identifies some set S or function f, then every text for S or f contains a prefix which is a locking sequence.

Consider the scientist M for sets defined in Algorithm 2.2 which, given a prefix σ of text T, returns an index i for the set N if σ contains exactly one occurrence of the element zero, and returns an index j ≠ i for the set N otherwise. It is clear that M identifies N, since whenever T is a text for N, M will converge on either i or j, which are both indexes for N.¹

Algorithm 2.2: Scientist M identifies the set N

Scientist M(σ ∈ SEQ) : N
    if |{n < |σ| : σ_n = 0}| = 1 then
        return index i for the set N;
    else
        return index j for the set N;
    end
end

Let T be a text for N with exactly one occurrence of zero. M certainly identifies T, converging to the index i for N. However, no prefix of T is a locking sequence for M on N.

Consider σ ⊂ T such that 0 ∈ content(σ), without loss of generality by Lemma 2.2. Let τ ∈ SEQ be such that content(τ) ⊂ N and 0 ∈ content(τ). Then, M(σ ⋄ τ) = j but M(σ) = i. Hence, there is no prefix σ of T which is a locking sequence.
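The argument can be checked on a finite head of such a text. In this sketch the two distinct indices for N are modelled by the labels "i" and "j", which is an assumption made purely for illustration, as are the helper `lock_broken_by_zero` and the choice of the head 0, 1, …, 7.

```python
def M(sigma):
    """Algorithm 2.2, with the two indices for N modelled as labels."""
    return "i" if sigma.count(0) == 1 else "j"


def lock_broken_by_zero(sigma):
    """True when appending a single zero changes the conjecture of M."""
    return M(sigma + (0,)) != M(sigma)


T_head = tuple(range(8))  # head of a text for N with exactly one zero
prefixes = [T_head[:n] for n in range(len(T_head) + 1)]

print([M(p) for p in prefixes])
print(all(lock_broken_by_zero(p) for p in prefixes))
```

Every inspected prefix, including the empty one, changes conjecture when a zero is appended, so condition (c) of Definition 2.1 fails for each of them even though the conjectures along the text itself stabilise on "i".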

Example 2.5 shows that not all texts contain locking sequences, even if they are identifiable. This is a direct consequence of condition (c) in Definition 2.1, which requires a scientist to always return the same index for a set or function, and not merely any index for some set or function.

Another situation that may occur is one in which a locking sequence exists for a scientist on some set or function, but where the scientist is nonetheless unable to achieve identification. This scenario is illustrated in Example 2.6.

2.6 Example. This example contradicts the hypothesis that if there is a locking sequence for a scientist on some set S or function f, then the scientist identifies S or f.

Consider the scientist M for sets defined in Algorithm 2.3 which, given a prefix σ of text T, returns an index i for the set N if σ contains two or more occurrences of the element zero, and returns an index j for the set ∅ otherwise.

Algorithm 2.3: Scientist M does not identify the set N

Scientist M(σ ∈ SEQ) : N
    if |{n < |σ| : σ_n = 0}| ≥ 2 then
        return index i for the set N;
    else
        return index j for the set ∅;
    end
end

¹ Clearly, M will always converge to either i or j, even on texts which are not for N. This is entirely compatible with the definition of convergence in Definition 1.34.


Now, let σ ∈ SEQ be a prefix with exactly two occurrences of the number zero. We observe that (a) content(σ) ⊂ N, (b) M(σ) = i and therefore W_{M(σ)} = N, and (c) for all τ ∈ SEQ, if content(τ) ⊂ N, then M(σ ⋄ τ) = M(σ) = i. Therefore, σ is a locking sequence for M on N. However, M does not identify N, since there are texts for N which M does not identify, namely any text which contains exactly one occurrence of the element zero.
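Both halves of this example can be explored with a bounded check. The sketch below models the two indices by the labels "idx_N" and "idx_empty" and tests condition (c) only for extensions over a small alphabet and up to a small length, so it illustrates the claim rather than proving it; `locking_up_to` and its parameters are hypothetical.

```python
from itertools import product


def M(sigma):
    """Algorithm 2.3, with the indices for N and for the empty set modelled as labels."""
    return "idx_N" if sigma.count(0) >= 2 else "idx_empty"


def locking_up_to(scientist, sigma, alphabet, max_len):
    """Bounded version of condition (c): no short extension changes the conjecture."""
    return all(
        scientist(sigma + tau) == scientist(sigma)
        for k in range(1, max_len + 1)
        for tau in product(alphabet, repeat=k)
    )


sigma = (0, 0)  # a prefix with exactly two zeros
print(M(sigma), locking_up_to(M, sigma, range(4), 3))

# On a text for N with a single zero, M never leaves the conjecture for the empty set.
one_zero_head = tuple(range(10))
print({M(one_zero_head[:n]) for n in range(1, 11)})
```

The prefix 0, 0 survives every tested extension, while along the text 0, 1, 2, … the scientist stays on the wrong conjecture, matching the two observations above.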

Theorems 2.3 and 2.4 prove that if a scientist identifies some set or function, then there is a prefix which is a locking sequence for the scientist on that set or function. In fact, they are simply particular cases of the stronger theorems below, which prove that we may extend any initial segment of a text with another prefix so that we obtain a locking sequence.

2.7 Theorem. Let M be a scientist for sets that identifies the set S and σ ∈ SEQ a prefix for S. Then there exists a nonempty prefix τ with content(τ) ⊆ S such that σ ⋄ τ is a locking sequence for M on S.

Proof. The proof of this theorem is a simple adaptation of the one for Theorem 2.3 and is largely identical to it. As such, the reader is free to skip its details with no detriment to comprehension.

Let M be a scientist that identifies the set S ∈ E. Without loss of generality, assume that M is defined for all σ ∈ SEQ. Now, suppose that there exists σ ∈ SEQ such that, for all τ ∈ SEQ, σ ⋄ τ is not a locking sequence. This implies that

    there exists σ ∈ SEQ such that, for all τ ∈ SEQ such that content(σ ⋄ τ) ⊆ S and W_{M(σ⋄τ)} = S, there exists some ρ ∈ SEQ such that content(ρ) ⊆ S and M(σ ⋄ τ ⋄ ρ) ≠ M(σ ⋄ τ).    (2.2)

We will show that this implies the existence of a text T for S which M does not identify.

Let U = u_0, u_1, u_2, … be a text for S. We construct text T using the prefixes τ^i for each i ∈ N. We begin with τ^0 = σ and build each τ^{i+1} using Algorithm 2.4.

Algorithm 2.4: Construction of prefix τ^{i+1} of the text T for a set S

Data: prefix τ^i of text T
Result: prefix τ^{i+1} of text T

if W_{M(τ^i)} ≠ S then
    τ^{i+1} ← τ^i ⋄ u_i;
else
    choose ρ ∈ SEQ such that content(ρ) ⊆ S and M(τ^i ⋄ ρ) ≠ M(τ^i);    // ρ must exist by Equation 2.2 and the fact that content(τ^i) ⊆ S
    τ^{i+1} ← τ^i ⋄ ρ ⋄ u_i;
end

Observe that every new prefix is built by adding one or more elements to the end of the previous prefix, such that τ^i ⊂ τ^{i+1} for all i ∈ N. Then, let T = ⋃_i τ^i.

Now, content(T) ⊆ S by induction, since content(τ^0) = content(σ) ⊆ S and both content(ρ) ⊆ S and u_n ∈ S for all n ∈ N, so that content(τ^{i+1}) ⊆ S. Additionally, adding u_n at each step of the construction of T ensures that S ⊆ content(T), and so T is a text for S. However, M does not converge on T to an index for S, since for each prefix τ^{i+1} either W_{M(τ^i)} ≠ S or M(τ^i ⋄ ρ) ≠ M(τ^i) for some ρ ∈ SEQ with τ^i ⋄ ρ ⊆ τ^{i+1}.

Thus, there are infinitely many prefixes of T on which M either returns an incorrect index for S or changes its mind between two different prefixes, and so M does not identify S on text T. This contradicts the assumption that M identifies S.

The proof for the case of functions is essentially identical to the one for sets given above, and is obtained by substituting SEG for SEQ and setting S to be the set representation Ψ(f) of a function f. We prove the case for functions in Section 5.2 of the Appendix.


2.8 Theorem. Let M be a scientist for functions that identifies the function f and σ P SEG a prefix for f . Thenthere exists a nonempty prefix content pτq Ď Ψpfq such that σ ˛ τ is a locking sequence for M on f .

Locking sequences additionally exist for other learning environments beside texts, and versions ofTheorems 2.7 and 2.8 (including their particular cases of Theorems 2.3 and 2.4) are also valid for theselearning environments. For fat texts, a locking sequence is defined in the same way as for texts, and The-orems 2.7 and 2.8 are equally applicable. We present the definitions for a locking sequence in imperfecttext and informant.

2.9 Definition (Locking sequence generalized). Let $S \in \mathcal{E}$ be a recursively enumerable set and $M$ be a scientist for sets.

1. We say that $\sigma$ is a locking sequence in imperfect text for $M$ on $S$ if, for every set $D \subset \mathbb{N}$, there exists a nonempty prefix $\sigma \in \mathrm{SEQ}$ such that (a) $\mathrm{content}(\sigma) \subseteq S \cup D$, (b) $W_{M(\sigma)} = S$, and (c) for all $\tau \in \mathrm{SEQ}$, if $\mathrm{content}(\tau) \subseteq S \cup D$, then $M(\sigma \diamond \tau) = M(\sigma)$.

2. We say that a nonempty prefix $\sigma \in \mathrm{ISEQ}$ is a locking sequence in informant for $M$ on $S$ just in case (a) $\mathrm{content}(\sigma) \subseteq \Psi(\chi_S)$, (b) $W_{M(\sigma)} = S$, and (c) for all $\tau \in \mathrm{ISEQ}$, if $\mathrm{content}(\tau) \subseteq \Psi(\chi_S)$, then $M(\sigma \diamond \tau) = M(\sigma)$.

Let $f \in \mathcal{R}$ be a recursive function and $M$ be a scientist for functions.

1. We say that $\sigma$ is a locking sequence in imperfect text for $M$ on $f$ if, for every recursive function $g \in \mathcal{R}$ that is a finite variant of $f$, there exists a nonempty prefix $\sigma \in \mathrm{SEG}$ such that (a) $\mathrm{content}(\sigma) \subseteq \Psi(g)$, (b) $\varphi_{M(\sigma)} = f$, and (c) for all $\tau \in \mathrm{SEG}$, if $\mathrm{content}(\tau) \subseteq \Psi(g)$, then $M(\sigma \diamond \tau) = M(\sigma)$.

2. We say that a nonempty prefix $\sigma \in \mathrm{ISEG}$ is a locking sequence in informant for $M$ on $f$ just in case (a) $\mathrm{content}(\sigma) \subseteq \Psi(\chi_{\Psi(f)})$, (b) $\varphi_{M(\sigma)} = f$, and (c) for all $\tau \in \mathrm{ISEG}$, if $\mathrm{content}(\tau) \subseteq \Psi(\chi_{\Psi(f)})$, then $M(\sigma \diamond \tau) = M(\sigma)$.

The proofs of the following theorems are essentially identical to the ones for locking sequences in text, and are given in Section 5.2 of the Appendix.

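2.10 Theorem. Let $M$ be a scientist for sets that identifies the set $S$ in imperfect text, and let $\sigma \in \mathrm{SEQ}$ be a prefix of imperfect text for $S$. Then, for every $D \subset \mathbb{N}$ where $\mathrm{content}(\sigma) \subseteq D$, there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq S \cup D$ such that $\sigma \diamond \tau$ is a locking sequence in imperfect text for $M$ on $S$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq S \cup D$, (b) $W_{M(\sigma \diamond \tau)} = S$, and (c) for all $\rho \in \mathrm{SEQ}$, if $\mathrm{content}(\rho) \subseteq S \cup D$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.

2.11 Theorem. Let $M$ be a scientist for functions that identifies the function $f$ in imperfect text, and let $\sigma \in \mathrm{SEG}$ be a prefix of imperfect text for $f$. Then, for every recursive function $g$ that is a finite variant of $f$ and where $\mathrm{content}(\sigma) \subseteq \Psi(g)$, there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq \Psi(g)$ such that $\sigma \diamond \tau$ is a locking sequence in imperfect text for $M$ on $f$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(g)$, (b) $\varphi_{M(\sigma \diamond \tau)} = f$, and (c) for all $\rho \in \mathrm{SEG}$, if $\mathrm{content}(\rho) \subseteq \Psi(g)$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.

2.12 Theorem. Let $M$ be a scientist for sets that identifies the set $S$ in informant, and let $\sigma \in \mathrm{ISEQ}$ be a prefix of informant for $S$. Then there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq \Psi(\chi_S)$ such that $\sigma \diamond \tau$ is a locking sequence in informant for $M$ on $S$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(\chi_S)$, (b) $W_{M(\sigma \diamond \tau)} = S$, and (c) for all $\rho \in \mathrm{ISEQ}$, if $\mathrm{content}(\rho) \subseteq \Psi(\chi_S)$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.

2.13 Theorem. Let $M$ be a scientist for functions that identifies the function $f$ in informant, and let $\sigma \in \mathrm{ISEG}$ be a prefix of informant for $f$. Then there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq \Psi(\chi_{\Psi(f)})$ such that $\sigma \diamond \tau$ is a locking sequence in informant for $M$ on $f$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(\chi_{\Psi(f)})$, (b) $\varphi_{M(\sigma \diamond \tau)} = f$, and (c) for all $\rho \in \mathrm{ISEG}$, if $\mathrm{content}(\rho) \subseteq \Psi(\chi_{\Psi(f)})$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.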


2.1.2 Angluin’s Theorem

Locking sequences provide a necessary condition for identification in text by any kind of scientist. In set identification, if we consider the content of a locking sequence, we may expect to obtain some sort of locking set. The precise result was proved by Angluin [1] and actually constitutes a necessary and sufficient condition for set identification in text by a scientist. In effect, it provides the upper bound of classes of sets that can be identified by any sort of scientist working in text.

2.14 Definition (Angluin condition, Angluin [1]). A class of sets $\mathcal{C}$ is said to respect the Angluin condition if, for every set $L \in \mathcal{C}$, there exists a finite set $D_L \subseteq L$ such that, for all $L' \in \mathcal{C}$ where $L' \neq L$, if $D_L \subseteq L'$, then $L' \not\subset L$.

When considering a given class of sets, it is often practical to determine whether there exists a finite set $D_L$ for each set of the class that is in agreement with Angluin's condition. We give this set a special name.

2.15 Definition (Angluin set). Let $\mathcal{C}$ be a class of sets satisfying Angluin's condition. We say that $D_L \subseteq L$ is an Angluin set for $L \in \mathcal{C}$ if $D_L$ is a finite set such that, for all $L' \in \mathcal{C}$ where $L' \neq L$, if $D_L \subseteq L'$, then $L' \not\subset L$.

Angluin's condition is satisfied if each set $L$ in the class contains a finite Angluin set $D_L$. If each $D_L$ is unique to a set $L$, then Angluin's condition is trivially verified. However, in the case that, for a given $D_L \subseteq L$, there is also another set $L'$ in the class such that $D_L \subseteq L'$, then $L'$ cannot be a subset of $L$. There are therefore two possibilities that satisfy the condition that $L' \not\subset L$, illustrated in Figure 2.1. Note that in the case that $D_L \subseteq L'$ and $L \not\subset L'$, then $D_L$ is also an Angluin set for $L'$.

Figure 2.1: Two valid situations for a set $L' \not\subset L$, where $D_L \subseteq L$ and $D_L \subseteq L'$, such that Angluin's condition holds for the class.

Note that Angluin's condition and Angluin sets do not exist outside of the context of a class of sets, and so an Angluin set $D_L$ for some set $L$ must always make reference to some class of sets $\mathcal{C}$. Consequently, Angluin sets for a specific set $L$ will in general differ depending on the class of sets in question.
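The dependence on the class can be checked mechanically when the class is finite and all its sets are finite. The following is a minimal sketch under that assumption; the function name `is_angluin_set` and the two toy classes are hypothetical and chosen only for illustration.

```python
def is_angluin_set(d, target, cls):
    """Return True if the finite set d is an Angluin set for target within cls.

    Assumes cls is a finite collection of finite sets containing target.
    """
    d, target = frozenset(d), frozenset(target)
    if not d <= target:  # an Angluin set for L must be contained in L
        return False
    for other in map(frozenset, cls):
        # A set L' != L with D <= L' and L' a proper subset of L violates the condition.
        if other != target and d <= other and other < target:
            return False
    return True


class_a = [{1, 2}, {1, 2, 3}]  # hypothetical class: {1} is an Angluin set for {1, 2}
class_b = [{1}, {1, 2}]        # hypothetical class: {1} is not, because {1} lies below {1, 2}

print(is_angluin_set({1}, {1, 2}, class_a))  # True
print(is_angluin_set({1}, {1, 2}, class_b))  # False
```

The same pair $D = \{1\}$, $L = \{1, 2\}$ gives opposite answers in the two classes, which is the point of the remark above.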

2.16 Angluin's Theorem (Angluin [1]). A class of sets $\mathcal{C}$ is identifiable by a scientist in text if and only if it respects the Angluin condition, that is, if and only if every set $L \in \mathcal{C}$ has an Angluin set.

Proof. This proof is based on Jain et al. [12]. We first prove the necessary condition for identifiability.

Suppose $\mathcal{C}$ is a class of sets that is identifiable by a scientist $M$ in text. Let $\sigma_L$ be a locking sequence for some $L \in \mathcal{C}$. Then, for all texts $T$ for $L$, if $\sigma_L$ is a prefix of $T$, then, for all $\tau \in \mathrm{SEQ}$ such that $\sigma_L \subseteq \tau$ and $\mathrm{content}(\tau) \subseteq L$, $W_{M(\tau)} = L$.

Let us consider the finite set $D_L = \mathrm{content}(\sigma_L)$ and show that it is of the kind required by Angluin's condition. Suppose there exists some $L' \in \mathcal{C}$ such that $D_L \subseteq L'$ and $L' \neq L$. Now, if $L' \subset L$, then for all texts $T'$ for $L'$ and for all prefixes $\sigma$ of $T'$ such that $\sigma_L \subseteq \sigma$, we have $\mathrm{content}(\sigma) \subset L$. But then $W_{M(\sigma)} = L$, and $M$ does not identify $L'$, a contradiction. Thus, $L' \not\subset L$.

Now, let us show that the Angluin condition is a sufficient condition for identification.


Suppose a class $\mathcal{C}$ respects the Angluin condition. We define in Algorithm 2.5 a scientist $M$ that identifies $\mathcal{C}$: on an input prefix $\sigma$, $M$ searches for the least index $i$ such that $i$ is an index for some $L \in \mathcal{C}$ and there exists a finite set $D_L \subseteq \mathrm{content}(\sigma) \subseteq L$ of the kind required by Angluin's condition.

Algorithm 2.5: Scientist $M$ identifies the class of sets $\mathcal{C}$ using Angluin's condition

    Scientist $M(\sigma \in \mathrm{SEQ}) : \mathbb{N}$
        variable $i \in \mathbb{N}$;
        variable found $\in$ BOOL;
        $i \leftarrow 0$;
        found $\leftarrow$ FALSE;
        while not found do
            if $W_i = L \in \mathcal{C}$ and there exists an Angluin set $D_L \subseteq \mathrm{content}(\sigma) \subseteq L$ then
                found $\leftarrow$ TRUE;
            else
                $i \leftarrow i + 1$;
            end
        end
        return $i$;
    end

Since, by hypothesis, Angluin's condition is true for $\mathcal{C}$, there must be some prefix large enough for the previous conditions to hold, but if no index $i$ exists for a given $\sigma$, then $M$ is to return no output. We prove that $M$ identifies $\mathcal{C}$.

Now, let $i$ be the smallest index of some $L \in \mathcal{C}$ and $T$ be a text for $L$, and let $M$ receive consecutive prefixes of $T$ as input. We must show that $M$ identifies $L$.

If there exists no index $j < i$ such that $W_j = L' \neq L$ for some $L' \in \mathcal{C}$, then $M$ will immediately converge to the correct index $i$ as soon as it receives a prefix large enough to return an output. Suppose then that there exists some $j < i$ such that $W_j = L' \neq L$ for some $L' \in \mathcal{C}$. Then:

(a) if $L' \subset L$, then there exists some $x \in L - L'$ such that for some $\sigma \subset T$, $x \in \mathrm{content}(\sigma)$ and $\mathrm{content}(\sigma) \not\subseteq L'$, so that $M$ cannot converge to $j$.

(b) if $L \subset L'$, then by Angluin's condition, there exists a finite set $D_{L'} \subseteq L'$ such that if $D_{L'} \subseteq L$, then $L \not\subset L'$. Consequently, $D_{L'} \not\subseteq L$ and $M$ cannot return $j$ for any prefix $\sigma$ of $T$.

In both cases, $M$ cannot converge to $j$, and for every index $k$ with $j < k < i$ such that $W_k = L'' \neq L$ for some $L'' \in \mathcal{C}$, either (a) or (b) is observed. Then $M$ must converge to $i$ and thus $\mathcal{C}$ is identifiable.
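The search performed by the scientist in the sufficiency argument can be sketched for a finite, explicitly listed family. This is only an illustration under strong assumptions: the family is a finite Python list of finite sets standing in for the indexed sets $W_i$, and an Angluin set for each member is supplied by hand in `tell_tales`; both names are hypothetical.

```python
def angluin_scientist(sigma, family, tell_tales):
    """Least index i with tell_tales[i] <= content(sigma) <= family[i], or None.

    sigma      -- a finite prefix of a text, given as a list of naturals
    family     -- list of frozensets standing in for the indexed sets W_i
    tell_tales -- tell_tales[i] is an assumed Angluin set for family[i]
    """
    content = frozenset(sigma)
    for i, (language, tell_tale) in enumerate(zip(family, tell_tales)):
        if tell_tale <= content <= language:
            return i
    return None  # no output yet: the prefix is still too short


# Hypothetical family of finite sets; each finite set is its own Angluin set.
family = [frozenset({0, 1, 2}), frozenset({0}), frozenset({0, 1})]
tell_tales = family

text_prefix = [0, 1, 1, 0, 1]  # beginning of a text for {0, 1}
conjectures = [angluin_scientist(text_prefix[:n], family, tell_tales)
               for n in range(1, len(text_prefix) + 1)]
print(conjectures)  # [1, 2, 2, 2, 2]
```

The superset $\{0, 1, 2\}$ at index 0 is never conjectured because its Angluin set is not contained in the content seen, which mirrors case (b) of the proof.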

The class of all finite sets FIN will be used in various examples and propositions in this chapter. We define it below.

2.17 Definition (FIN). The class of sets $\{D \subset \mathbb{N} : D \text{ is finite}\}$ consisting of all finite sets of natural numbers is denoted FIN.

We highlight some properties of Angluin sets in Example 2.18.

2.18 Example.

1. Consider the identifiable class $\mathcal{C}^{-} = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : j \le i\}$ contains the natural numbers smaller than or equal to $i$.

Each $L_i$ is a finite set, so a reasonable candidate for an Angluin set for each $L_i$ is $L_i$ itself. Indeed, given $L_i \in \mathcal{C}^{-}$, if $L_i \subseteq L_j$, then it immediately follows that $L_j \not\subset L_i$, for $i \neq j$. Hence, $L_i$ is an Angluin set for $L_i$. In general, this is true for all finite sets in an identifiable class of sets.

However, there are other sets which are also Angluin sets for elements of $\mathcal{C}^{-}$. Consider the set $D_3 = \{3\}$. $D_3$ cannot be an Angluin set for either $L_1$ or $L_2$, since it is contained in neither. On the other hand, $D_3$ is an Angluin set for $L_3$, since it is contained in $L_3$, and either $D_3 \not\subset L_j$ (for $j < 3$) or $L_j \not\subset L_3$ (for $j > 3$).

Let us consider whether $D_3$ is an Angluin set for any other set in $\mathcal{C}^{-}$. As was just stated, $D_3$ cannot be an Angluin set for any $L_j$ with $j < 3$, since $D_3 \not\subset L_j$. Suppose then that $D_3$ is an Angluin set for some fixed $L_j$ with $j > 3$. But then there exists $L_3 \in \mathcal{C}^{-}$ such that $D_3 \subseteq L_3$ and $L_3 \subset L_j$, so $D_3$ cannot be an Angluin set for any set other than $L_3$. In this particular example, any set is an Angluin set for at most a single set within $\mathcal{C}^{-}$, but this is not true in general, as one of the following examples will show.

Finally, consider some other finite set $D_k$ such that $\max D_k = k$. We have that $D_k \subseteq L_k$ and either $D_k \not\subseteq L_j$ for $j < k$ or $L_j \not\subseteq L_k$ for $j > k$. Then, $D_k$ is an Angluin set for $L_k$. In this example, every finite, non-empty set is an Angluin set for some unique $L_i \in \mathcal{C}^{-}$.

Figure 2.2: The hierarchy of sets and Angluin sets in the class $\mathcal{C}^{-} = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : j \le i\}$.

2. Now consider the identifiable class $\mathcal{C}^{+} = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : j \ge i\}$ contains the natural numbers greater than or equal to $i$.

Note that each $L_i$ is an infinite set, so no $L_i$ can be an Angluin set.

As an example, the set $D_2 = \{2\}$ is an Angluin set for $L_2$, since for $j > 2$ we have $D_2 \not\subset L_j$ and for $j < 2$ we have $L_0 = \mathbb{N} \not\subset L_2$ and $L_1 = \mathbb{N}^{+} \not\subset L_2$. Note that the set $D'_2 = \{2, 5, 14, 80\}$ is also an Angluin set for $L_2$, as is every finite set whose smallest element is 2.

As was the case in the previous example, every finite, non-empty set is an Angluin set for some $L_i \in \mathcal{C}^{+}$.

3. Consider the identifiable class $\mathcal{C} = \{L_i : i \in \mathbb{N}\}$ where $L_i = \{j \in \mathbb{N} : i \le j \le 2i\}$. Since each $L_i$ is finite, it is also an Angluin set for itself, and only for itself.

The set $\{2, 5\}$ is not an Angluin set for any set in $\mathcal{C}$, since it is not contained in any $L_i$. On the other hand, the set $D_{34} = \{3, 4\}$ is an Angluin set for both $L_2 = \{2, 3, 4\}$ and $L_3 = \{3, 4, 5, 6\}$, since it is contained in each and both $L_2 \not\subset L_3$ and $L_3 \not\subset L_2$.

4. Consider the class $\mathrm{FIN} = \{L_i : i \in \mathbb{N}\}$ comprised of every finite set $L_i$. Given any $L \in \mathrm{FIN}$ and any other $L' \in \mathrm{FIN}$, we have that $L \subset L' \Rightarrow L' \not\subset L$. Hence, every $L \in \mathrm{FIN}$ is an Angluin set for itself.

5. Consider the class $\mathcal{C} = \{L_i : i \in \mathbb{N}\}$ where $L_i = \mathbb{N} - \{i\}$. Given any two distinct sets $L_i$ and $L_j$ in $\mathcal{C}$, it is always true that $L_i \not\subset L_j$. Hence, every finite set $D$ is trivially an Angluin set for any set in $\mathcal{C}$ which contains it, and so $\mathcal{C}$ is identifiable since each $L_i$ has an infinite number of Angluin sets. In particular, note that the empty set $\emptyset$ is also an Angluin set for every set in $\mathcal{C}$.

Figure 2.3: The hierarchy of sets and Angluin sets in the class $\mathcal{C} = \{L_i : i \in \mathbb{N}\}$, where $L_i = \{j \in \mathbb{N} : i \le j \le 2i\}$.

As an example, the set $D = \{2, 4, 5\}$ is an Angluin set for every set in $\mathcal{C}$ except $L_2$, $L_4$, and $L_5$.

6. Consider the class $\mathrm{FIN}^{+} = \mathrm{FIN} \cup \{\mathbb{N}\}$. As previously, it remains true that every finite set $L \in \mathrm{FIN}^{+}$ is an Angluin set for itself, since it is also true that $L \subseteq \mathbb{N} \Rightarrow \mathbb{N} \not\subset L$, and so Angluin's condition holds for all finite $L \in \mathrm{FIN}^{+}$. However, there is no Angluin set for $\mathbb{N}$, as for every finite set $L' \in \mathrm{FIN}$ and for any set $D \subseteq L'$, we do not have $L' \not\subset \mathbb{N}$.

By Theorem 2.16, the inexistence of an Angluin set for $\mathbb{N}$ in $\mathrm{FIN}^{+}$ is enough to show that the class $\mathrm{FIN}^{+}$ is not identifiable.
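Items 1 and 3 of the example can be replayed on finite truncations of the two classes. The sketch below is illustrative only: the truncation bound `BOUND` and the helper `angluin_targets` are hypothetical, and the truncation is harmless here because every set of the full class that lies below one of the listed sets is itself listed.

```python
def angluin_targets(d, indexed_class):
    """Indices i such that d is an Angluin set for indexed_class[i]."""
    d = frozenset(d)
    hits = []
    for i, target in indexed_class.items():
        if not d <= target:
            continue
        # d is blocked for target if some other set contains d and sits strictly below target.
        blocked = any(other != target and d <= other and other < target
                      for other in indexed_class.values())
        if not blocked:
            hits.append(i)
    return hits


BOUND = 10  # truncation bound, an assumption of this sketch
c_minus = {i: frozenset(range(i + 1)) for i in range(BOUND)}          # L_i = {0, ..., i}
c_window = {i: frozenset(range(i, 2 * i + 1)) for i in range(BOUND)}  # L_i = {i, ..., 2i}

print(angluin_targets({3}, c_minus))      # [3]
print(angluin_targets({3, 4}, c_window))  # [2, 3]
print(angluin_targets({2, 5}, c_window))  # []
```

The outputs match the claims that $D_3$ serves only $L_3$ in the first class, that $\{3, 4\}$ serves both $L_2$ and $L_3$ in the third class, and that $\{2, 5\}$ serves no set there.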

We may formalize some of the properties in the following lemma.

2.19 Lemma. Let $\mathcal{C}$ be an identifiable class of sets, $L \in \mathcal{C}$ be a set, and $D$ be an Angluin set for $L$. Then the following properties are true.

(a) If $u$ is an element of $L$, then $D' = D \cup \{u\}$ is an Angluin set for $L$.

(b) If $L$ is finite, then $L$ is an Angluin set for itself.

(c) If, for every pair of distinct sets $L$ and $L'$ in $\mathcal{C}$, $L \not\subset L'$ and $L' \not\subset L$, then every finite set $F$ such that $F \subseteq L$ is an Angluin set for $L$.

(d) A set $D$ may be an Angluin set for more than one set in $\mathcal{C}$, including a countable infinity of sets.

Proof. We will only prove property (a).

Let $D$ be an Angluin set for $L$ and $D' = D \cup \{u\}$, where $u \in L$. For all $L' \in \mathcal{C}$ where $L' \neq L$, we have $D' \subseteq L' \Rightarrow D \subseteq L'$. On the other hand, because $D$ is an Angluin set for $L$, we have $D \subseteq L' \Rightarrow L' \not\subset L$. It follows that $D' \subseteq L' \Rightarrow L' \not\subset L$.

2.1.3 Limit sets

Angluin's Theorem, while providing a necessary and sufficient condition for set identifiability, is not always an easy result to apply in practice. In general, it may be unclear whether a given set is an Angluin set. Thus, an alternative condition for identifiability is desirable. A possible condition stems from the topological notion of a limit set.

Candidate for Definition (Limit set). Let $\{S_i\}_{i \in \mathbb{N}}$ be a sequence of sets. We say that $S$ is the limit set of $\{S_i\}_{i \in \mathbb{N}}$ if (a) for all $i \in \mathbb{N}$, $S_i$ is finite, (b) for all $i \in \mathbb{N}$, $S_i \subseteq S$, and (c) for each finite set $D \subseteq S$, there is an $i \in \mathbb{N}$ such that $D \subseteq S_i$.

2.20 Example. Let $\{S_i\}_{i \in \mathbb{N}} = \{\{0, 1\}, \{1, 2\}, \{0, 1, 2, 3\}\}$. It is easy to verify that a limit set $S$ for $\{S_i\}_{i \in \mathbb{N}}$ is $\{0, 1, 2, 3\}$. Note that $\{S_i\}_{i \in \mathbb{N}}$ is not closed under unions, intersections, or differences, but $\bigcup_{i \in \mathbb{N}} S_i = S$.

Remark. For a sequence $\{S_i\}_{i \in \mathbb{N}}$ of sets to have a limit set $S$, neither $\{S_i\}_{i \in \mathbb{N}}$ nor $\{S_i\}_{i \in \mathbb{N}} \cup \{S\}$ requires closure under unions, intersections or differences. A trivial justification is $\{S_i\}_{i \in \mathbb{N}} = \{\{0, 1\}, \{1, 2\}\} \cup \{\{n \in \mathbb{N} : n \le k\} : k \ge 3\}$ and $S = \mathbb{N}$. The only necessary (but not sufficient) condition is that $\{S_i\}_{i \in \mathbb{N}}$ be a covering of $S$.²

The idea behind this notion of limit set comes from known examples of classes which are not identifiable, such as the class $\mathrm{FIN}^{+} = \mathrm{FIN} \cup \{\mathbb{N}\}$. Indeed, $\mathrm{FIN}^{+}$ is not identifiable because the set $\mathbb{N}$ does not have any Angluin set in $\mathrm{FIN}^{+}$. We can provide a necessary condition for identifiability by showing that an identifiable class of sets cannot contain any infinite limit sets.

2.21 Proposition. If a class C is identifiable, then it contains no infinite limit sets.

Proof. We prove that if a class contains an infinite limit set then it is not identifiable. Let $S \in \mathcal{C}$ be an infinite limit set of a sequence of sets $\{S_i\}_{i \in \mathbb{N}} \subseteq \mathcal{C}$ and take a finite $D \subseteq S$. By the proposed definition of limit set, there exists a finite $S_i \in \mathcal{C}$ such that $D \subseteq S_i$ and $S_i \subseteq S$, a contradiction with the necessary condition $S_i \not\subset S$ for identifiability of Angluin's Theorem. Hence, $\mathcal{C}$ is not identifiable.

This definition allows us to easily prove that certain classes are not identifiable, such as the class $\mathrm{FIN}^{+}$. However, this condition is not sufficient for identifiability. Indeed, the class $\{\mathbb{N} - \{x\} : x \in \mathbb{N}\} \cup \{\mathbb{N}\}$ is not identifiable, but this cannot be shown with Proposition 2.21. We therefore construct a new definition of limit set and use it to build a necessary and sufficient condition for identifiability.

2.22 Definition (Limit set). Let $\{S_i\}_{i \in \mathbb{N}}$ be a sequence of sets. We say that $S$ is the limit set of $\{S_i\}_{i \in \mathbb{N}}$ (or that $\{S_i\}_{i \in \mathbb{N}}$ converges to $S$), and we write $\{S_i\}_{i \in \mathbb{N}} \to S$, if (a) for all $i \in \mathbb{N}$, $S_i \subsetneq S$, and (b) for each finite set $D \subseteq S$, there is an $i \in \mathbb{N}$ such that $D \subseteq S_i$.

This new definition differs from the previous definition in two aspects. First, sets in the sequence are no longer required to be finite. Second, each set in the sequence must be strictly contained in the limit set. In particular, this means that a sequence of identical sets $S$ does not converge to $S$.
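As a worked check of the two conditions of Definition 2.22, consider the sequence of initial segments of $\mathbb{N}$ (the sets of the class $\mathcal{C}^{-}$ of Example 2.18, read as a sequence). The verification below is an illustrative sketch rather than part of the original argument.

```latex
\begin{align*}
S_i &= \{\, j \in \mathbb{N} : j \le i \,\}, \qquad S = \mathbb{N}, \\
\text{(a)}\quad & S_i \subsetneq \mathbb{N} \ \text{ for all } i \in \mathbb{N},
   \ \text{ since } i + 1 \in \mathbb{N} \setminus S_i, \\
\text{(b)}\quad & D \subseteq \mathbb{N} \text{ finite and nonempty}
   \ \Longrightarrow\ D \subseteq S_{\max D},
   \qquad \emptyset \subseteq S_0 .
\end{align*}
```

Both conditions hold, so $\{S_i\}_{i \in \mathbb{N}} \to \mathbb{N}$. By contrast, the constant sequence $S_i = \mathbb{N}$ satisfies (b) but fails (a).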

Lemmas 2.23 and 2.24 illustrate two properties of limit sets.

2.23 Lemma. If a sequence $\{S_i\}_{i \in \mathbb{N}}$ of sets has a limit set $S$, then $S$ is unique and $S = \bigcup_{i} S_i$.

Proof. Let $\{S_i\}_{i \in \mathbb{N}}$ be a sequence of sets and let $S$ be a limit set of the sequence. By (a) of Definition 2.22, we have $\bigcup_{i} S_i \subseteq S$.

Conversely, let $\bigcup_{i} D_i$ be the union of every finite subset $D_i \subseteq S$. For every element $x \in S$ there is some $D_i \subseteq S$ such that $x \in D_i$. Then $S \subseteq \bigcup_{i} D_i$. By part (b) of the definition of limit set, for every finite $D_i$ there is an $S_i$ such that $D_i \subseteq S_i$. Thus $\bigcup_{i} D_i \subseteq \bigcup_{i} S_i$ and so $S \subseteq \bigcup_{i} S_i$.

Then $S = \bigcup_{i} S_i$ and $S$ is unique.

Recall the sequence of sets $\{S_i\}_{i \in \mathbb{N}} = \{\{0, 1\}, \{1, 2\}, \{0, 1, 2, 3\}\}$ given in Example 2.20. Unlike the definition of limit set that had been initially suggested, Definition 2.22 implies that the sequence does not have a limit set, since if $S = \bigcup_{i} S_i$ as required by Lemma 2.23, then, given condition (a), the finite set $D = S$ could not satisfy condition (b) of Definition 2.22.

The following lemma is a result of the previous observation.

2.24 Lemma. If a sequence of sets has a limit set S, then S is infinite.

Proof. Let $\{S_i\}_{i \in \mathbb{N}}$ be a sequence of sets that converges to a set $S$. Suppose $S$ is finite. Now, by part (b) of Definition 2.22 of limit set and letting the finite set $D = S$, there exists an $S_i$ such that $S \subseteq S_i$. But by part (a) we have $S_i \subsetneq S$, a contradiction. Therefore, $S$ is infinite.

² A collection $\mathcal{C}$ of subsets of a set $S$ is said to cover $S$, or to be a covering of $S$, if the union of the elements of $\mathcal{C}$ is equal to $S$ (see Munkres [20, p. 164]).


Note that Definition 2.22 implies our previous candidate for definition that had been initially proposed, but is not implied by it. Consequently, any result that may be demonstrated using Proposition 2.21 can also be verified using Proposition 2.25 below. Thus, we will henceforth only use the definition of limit sets given by Definition 2.22 and will no longer refer to our initial candidate for a definition.

We can show that Definition 2.22 is a necessary and sufficient condition for identifiability.

2.25 Proposition. Let $S$ be the limit set of a sequence $\{S_i\}_{i \in \mathbb{N}}$ of sets. Then a class $\mathcal{C}$ is identifiable if and only if it does not contain $\{S_i\}_{i \in \mathbb{N}}$ and $S$.

Proof. Necessary condition for identifiability. Suppose an identifiable class $\mathcal{C}$ contains a sequence $\{S_i\}_{i \in \mathbb{N}}$ of sets and its limit $S$. Let $D_S \subset S$ be an Angluin set for $S$. By Definition 2.22 of limit sets, there exists some $i \in \mathbb{N}$ such that $D_S \subseteq S_i$. But then there exists $S_i \neq S$ such that $D_S \subseteq S_i$ and $S_i \subset S$, and so no Angluin set $D_S$ for $S$ can exist. Thus, $\mathcal{C}$ is not identifiable.

Sufficient condition for identifiability. Conversely, suppose $\mathcal{C}$ is not identifiable. Then by Angluin's Theorem, there exists some $S \in \mathcal{C}$ that does not have an Angluin set. In other words, for every finite set $D \subseteq S$, there is some other $S' \in \mathcal{C}$ such that $D \subseteq S'$ is true and $S' \not\subset S$ is false.

Now, $S$ must be a countably infinite set, since otherwise, for a finite $D = S$ and any $S' \neq S$ such that $D \subset S'$, it would always be true that $S' \not\subset S$. Then, there is an infinite number of finite sets $D \subset S$.

For every finite set $D_i \subset S$, let $S_i \in \mathcal{C}$, where $S_i \neq S$, be the set which contradicts Angluin's condition for $D_i$, i.e. the corresponding set such that $D_i \subseteq S_i$ is true and $S_i \not\subset S$ is false. Consider some enumeration of finite sets $D_i$ and the corresponding sequence $\{S_i\}_{i \in \mathbb{N}}$ of sets (with possible repetitions of each $S_i$ for different values of $i$). Then, $\{S_i\}_{i \in \mathbb{N}}$ is a sequence of sets such that for all $i \in \mathbb{N}$, $S_i \not\subset S$ is false, and for every finite set $D_i \subset S$ there is an $S_i \in \mathcal{C}$ such that $D_i \subseteq S_i$. We conclude that $S$ is the limit set of the sequence $\{S_i\}_{i \in \mathbb{N}}$.

2.26 Example. Let $\mathrm{COFIN} = \{\mathbb{N} - S : S \in \mathrm{FIN}\}$ be the class of all recursively enumerable sets whose complement is finite. If $L_i = \mathbb{N} - \{i\}$, then each $L_i$ belongs to COFIN and $\mathbb{N} \in \mathrm{COFIN}$ is the limit set of the sequence of sets $\{L_i\}_{i \in \mathbb{N}}$, and so COFIN is not identifiable.

It could be conjectured that removing $\mathbb{N}$ from COFIN makes the class identifiable, as is the case of the class $\mathrm{FIN} \cup \{\mathbb{N}\}$. In fact, this is not true for $\mathrm{COFIN} - \{\mathbb{N}\}$, which still contains the sequence $\{L^{+}_i = \mathbb{N}^{+} - \{i\}\}_{i \in \mathbb{N}^{+}}$ and its limit set $\mathbb{N}^{+}$. Indeed, COFIN contains an infinite number of sequences of sets and their respective limit sets.

One may wonder if a similar result is valid for functions, or even merely characteristic functions in particular. The following proposition shows that such a general result is not valid for functions.

2.27 Proposition. The class of functions $\mathcal{C}_0$ containing the characteristic function of the set of natural numbers and the characteristic functions of the sets in FIN is identifiable.

Proof. Consider a scientist $M$ that, given a prefix $\sigma$, returns the same index for $\mathbb{N}$ if $\langle k, 0 \rangle \notin \mathrm{content}(\sigma)$ for all $k \in \mathrm{domain}(\sigma)$ and returns the same index for $\{k : \langle k, 1 \rangle \in \mathrm{content}(\sigma)\}$ otherwise. Now, if $\sigma$ is a prefix of a text for $\mathbb{N}$, then $M$ will always output the same index for $\mathbb{N}$, thus identifying $\mathbb{N}$.

Otherwise, suppose $\sigma$ is a prefix of a text for some set $S$ in FIN. Then there is an $s \notin S$ such that $\langle s, 0 \rangle$ will be in the text. For a sufficiently large prefix of the text, $\langle s, 0 \rangle$ will be in the prefix and $M$ will output an index for a set $S' \in \mathrm{FIN}$ such that $S' \subseteq S$. Finally, for a sufficiently large prefix $\sigma \diamond \tau$, $\langle k, 1 \rangle \in \mathrm{content}(\sigma \diamond \tau)$ if and only if $k \in S$, and so $M$ identifies $S$. Then, $\mathcal{C}_0$ is identifiable.


2.2 Identification by general scientist

There exist countless constraints and requirements one may conceivably place on scientists. A 'general' scientist (or simply a 'scientist') is a scientist without any such restrictions, and therefore the collection of classes which are identified by the school of general scientists represents the upper limit of what may be identifiable in a given learning environment. As we shall see, even scientists on which no restrictions are applied may still have limited identification power. This limitation therefore stems exclusively from the nature of inductive inference.

2.2.1 In sets

The collection of all classes of sets that are identifiable in text by the school of all scientists for sets, that is, $[\mathbf{S}, \text{text}]$, contains several different classes. For example, the scientist in Algorithm 1.3 identifies FIN (the class of all finite sets) in text, as is proved by the following proposition.

2.28 Proposition. $\mathrm{FIN} \in [\mathbf{S}, \text{text}]$

Proof. As shown in item 4 of Example 2.18, each set in FIN has an Angluin set, and therefore FIN is identifiable by a general scientist in text by Angluin's Theorem. The scientist defined in Algorithm 1.3 is a concrete example of a scientist in $\mathbf{S}$ which identifies FIN.

Even classes where each set has an infinite number of elements may be identifiable, such as the class of the sets of natural numbers with a single element removed.

2.29 Definition ($N_i$). The class of sets $\{\mathbb{N} - \{i\} : i \in \mathbb{N}\}$ consisting of the natural numbers with a single element removed is denoted $N_i$.

2.30 Proposition. Let $N_i = \{\mathbb{N} - \{i\} : i \in \mathbb{N}\}$ be a class of sets. Then $N_i \in [\mathbf{S}, \text{text}]$

Proof. As stated in item 5 of Example 2.18, for any two distinct sets $L$ and $L'$ in $N_i = \{\mathbb{N} - \{i\} : i \in \mathbb{N}\}$, we always have that $L \not\subset L'$ and $L' \not\subset L$. Thus, every set in the class has an Angluin set, and the class is therefore identifiable by Angluin's Theorem.

Alternatively, let us verify that the scientist M defined in Algorithm 2.6 identifies Ni.

Algorithm 2.6: Scientist $M$ identifies the class of sets $N_i = \{\mathbb{N} - \{i\} : i \in \mathbb{N}\}$

    Scientist $M(\sigma \in \mathrm{SEQ}) : \mathbb{N}$
        variable $i \in \mathbb{N}$;
        $i \leftarrow$ the smallest $n \in \mathbb{N}$ such that $n \notin \mathrm{content}(\sigma)$;
        return the same index for the set $\mathbb{N} - \{i\}$;
    end

Let $T$ be a text for $L_i = \mathbb{N} - \{i\}$ in the class. We will show that $M$ converges to an index for $L_i$ on the text $T$.

Suppose $\sigma$ is a prefix of $T$ such that for all $j < i$, $j \in \mathrm{content}(\sigma)$. This prefix must necessarily exist, since the definition of text for a set requires that $\mathrm{content}(T) = L_i$. The output of $M$ on $\sigma$ is then an index for $L_i$. For every prefix $\tau$ such that $\sigma \subset \tau \subset T$, we have $M(\sigma) = M(\tau)$, since the smallest element not contained in the content of $\tau$ is $i$, and $M$ will always return the same index for the set $\mathbb{N} - \{i\}$.

Thus, M converges to a correct index for Li, and hence identifies Ni.
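The behaviour of this scientist on a concrete text can be sketched directly. In the sketch below, which is illustrative only, a prefix is a Python list of naturals and the conjecture is represented by the missing number $n$ itself, standing for the set $\mathbb{N} - \{n\}$, rather than by an index; the function name is hypothetical.

```python
def scientist_missing_one(sigma):
    """Return the smallest natural not in content(sigma).

    The returned n stands for the conjecture N - {n}; a real scientist would
    return a fixed index for that set instead (assumption of this sketch).
    """
    seen = set(sigma)
    n = 0
    while n in seen:
        n += 1
    return n


text_prefix = [0, 5, 1, 2, 4, 7, 6]  # beginning of a text for N - {3}
conjectures = [scientist_missing_one(text_prefix[:k])
               for k in range(1, len(text_prefix) + 1)]
print(conjectures)  # [1, 1, 2, 3, 3, 3, 3]
```

Once 0, 1 and 2 have all appeared, the conjecture settles on 3 and no later element of the text can change it, as in the proof above.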

It is often the case that adding even a single set to a class makes it unidentifiable. In both of the previous examples, adding the set of natural numbers to the class renders it unidentifiable by any scientist in text.


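2.31 Proposition (Gold [10]). $\mathrm{FIN} \cup \{\mathbb{N}\} \notin [\mathbf{S}, \text{text}]$

Proof. The set $\mathbb{N}$ is the limit set of a sequence of sets in FIN, for instance the sequence of initial segments $\{j \in \mathbb{N} : j \le i\}$. Then by Proposition 2.25, the class $\mathrm{FIN} \cup \{\mathbb{N}\}$ is not identifiable.

2.32 Proposition. $N_i \cup \{\mathbb{N}\} \notin [\mathbf{S}, \text{text}]$

Proof. The set $\mathbb{N}$ is the limit set of the sequence $\{\mathbb{N} - \{i\}\}_{i \in \mathbb{N}}$ of the sets in $N_i$. Again, by Proposition 2.25, the class $N_i \cup \{\mathbb{N}\}$ is not identifiable.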

Note that while FIN is identifiable and the class consisting of only the set $\mathbb{N}$ is also (trivially) identifiable (e.g. by the scientist in Algorithm 2.2), the union of both classes is not identifiable. This implies that the collection of classes $[\mathbf{S}, \text{text}]$ is not closed under unions. In general, given an arbitrary school of scientists and an arbitrary learning environment, it is not true that the collection of classes identifiable by that school in that environment is closed under unions.

2.33 Theorem. The collection $[\mathbf{S}, \text{text}]$ is not closed under unions.

Proof. We have $\mathrm{FIN} \in [\mathbf{S}, \text{text}]$ and $\{\mathbb{N}\} \in [\mathbf{S}, \text{text}]$ but $\mathrm{FIN} \cup \{\mathbb{N}\} \notin [\mathbf{S}, \text{text}]$, by Proposition 2.31. Thus, $[\mathbf{S}, \text{text}]$ is not closed under unions.

Whereas the class $\mathrm{FIN} \cup \{\mathbb{N}\}$ of sets is not identifiable in text, it is identifiable in another learning environment.

2.34 Proposition. $\mathrm{FIN} \cup \{\mathbb{N}\} \in [\mathbf{S}, \text{informant}]$

Proof. We define the scientist M that identifies the class C “ FIN Y tNu in informant in Algorithm 2.7.

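Algorithm 2.7: Scientist $M$ identifies the class of sets $\mathcal{C} = \mathrm{FIN} \cup \{\mathbb{N}\}$ in informant

    Scientist $M(\sigma \in \mathrm{ISEQ}) : \mathbb{N}$
        variable $S$ : set of $\mathbb{N}$;
        if there exists $n \in \mathbb{N}$ such that $\langle n, 0 \rangle \in \mathrm{content}(\sigma)$ then
            $S \leftarrow \{n \in \mathbb{N} : \langle n, 1 \rangle \in \mathrm{content}(\sigma)\}$;
            return the same index for the set $S$;
        else
            return the same index for $\mathbb{N}$;
        end
    end

Let $T$ be an informant for $\mathbb{N} \in \mathcal{C}$ and $\sigma$ be a prefix of $T$. $M$ is defined in such a way that if there is no $n \in \mathbb{N}$ such that $\langle n, 0 \rangle \in \mathrm{content}(\sigma)$, then $M(\sigma)$ will always be the same index for $\mathbb{N}$. Thus, if $T$ is an informant for $\mathbb{N}$, then $M$ identifies $\mathbb{N}$.

Otherwise, let $T$ be an informant for some finite set $S \in \mathrm{FIN}$. Since $S \neq \mathbb{N}$, there is some $n \notin S$ with $\langle n, 0 \rangle \in \mathrm{content}(T)$. Let $\sigma$ be a prefix of $T$ such that $\langle n, 0 \rangle \in \mathrm{content}(\sigma)$ for some $n \notin S$ and $\langle m, 1 \rangle \in \mathrm{content}(\sigma)$ for all $m \in S$. Then $M(\sigma)$ will be an index for the set $S$, and for every prefix $\tau$ such that $\sigma \subset \tau \subset T$, we have $M(\sigma) = M(\tau)$ and so $M$ converges to an index for $S$.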

Thus, M identifies C.
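A scientist of this kind for $\mathrm{FIN} \cup \{\mathbb{N}\}$ can be sketched on concrete informant prefixes. The sketch assumes that a prefix is a list of pairs `(n, b)` with `b = 1` when `n` belongs to the set and `b = 0` otherwise, and that a conjecture is returned as a description of the set (the string `"N"` or a `frozenset`) rather than as an index; these representations and the function name are assumptions made for illustration.

```python
def informant_scientist(sigma):
    """Conjecture for FIN ∪ {N} from a finite informant prefix.

    A single negative datum rules out N, so the conjecture becomes the finite
    set of positively labelled numbers seen so far; otherwise it stays "N".
    """
    if any(b == 0 for _, b in sigma):
        return frozenset(n for n, b in sigma if b == 1)
    return "N"


prefix = [(1, 1), (0, 0), (3, 1), (2, 0), (4, 0)]  # beginning of an informant for {1, 3}
conjectures = [informant_scientist(prefix[:k]) for k in range(1, len(prefix) + 1)]
print(conjectures[0])   # N
print(conjectures[-1])  # frozenset({1, 3})
```

After the prefix has shown one negative pair and every element of the finite set, later pairs can only be negative or repeated positive data, so the conjecture no longer changes.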

Informants provide a much more robust learning environment for scientists, enabling the identification of a much larger class of sets. Indeed, the additional power imparted by informants is enough to enable the identification of the entire class of recursively enumerable sets by a general scientist.


2.35 Proposition. $\mathcal{E} \in [\mathbf{S}, \text{informant}]$

Proof. We define the scientist $M$ that identifies the class $\mathcal{E}$ of all recursively enumerable sets in informant in Algorithm 2.8.

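Algorithm 2.8: Scientist $M$ identifies the class $\mathcal{E}$ of all recursively enumerable sets in informant

    Scientist $M(\sigma \in \mathrm{ISEQ}) : \mathbb{N}$
        variable $i \in \mathbb{N}$;
        variable $I, J$ : sets of $\mathbb{N}$;
        $i \leftarrow 0$;
        $I \leftarrow \{n \in \mathbb{N} : \langle n, 1 \rangle \in \mathrm{content}(\sigma)\}$;
        $J \leftarrow \{n \in \mathbb{N} : \langle n, 0 \rangle \in \mathrm{content}(\sigma)\}$;
        while $I \not\subseteq W_i$ or $J \cap W_i \neq \emptyset$ do
            $i \leftarrow i + 1$;
        end
        return $i$;
    end

Note that unlike every other example given until now, the scientist defined in Algorithm 2.8 is non-computable, since in the loop condition it must decide whether a given number belongs to the domain of a partial recursive function.

Let $L \in \mathcal{E}$ be some recursively enumerable set and $j$ be the smallest index for $L$, i.e. the smallest index such that $W_j = L$. Take an informant $T$ for $L$ and a prefix $\sigma$ of $T$. Now, $M(\sigma) = i$ will always be the smallest index such that $\{n \in \mathbb{N} : \langle n, 1 \rangle \in \mathrm{content}(\sigma)\} \subseteq W_i$ and $\{n \in \mathbb{N} : \langle n, 0 \rangle \in \mathrm{content}(\sigma)\} \cap W_i = \emptyset$. Thus, for every prefix $\sigma$ of $T$, $M(\sigma) \le j$. Additionally, for all $\tau$ such that $\sigma \subset \tau \subset T$, we have $\{n \in \mathbb{N} : \langle n, 1 \rangle \in \mathrm{content}(\tau)\} \subseteq W_j$ and $\{n \in \mathbb{N} : \langle n, 0 \rangle \in \mathrm{content}(\tau)\} \cap W_j = \emptyset$, and so if $M(\sigma) = j$, $M$ will converge to $j$.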

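Suppose $M(\sigma) = i < j$, where $W_i \neq W_j$. Then there is some $\tau$ such that $\sigma \subset \tau \subset T$ and either

(a) $W_i \subset W_j$ and there exists some $x \in W_j - W_i$ such that $\langle x, 1 \rangle \in \mathrm{content}(\tau)$, or

(b) $W_i \not\subset W_j$ and there exists some $x \in W_i - W_j$ such that $\langle x, 0 \rangle \in \mathrm{content}(\tau)$,

and so $M(\tau) > M(\sigma)$. Therefore, $M$ cannot converge to any index $i < j$.

It follows that $M$ must converge to $j$ for every recursively enumerable set $L = W_j$, and so $M$ identifies $\mathcal{E}$.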
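The search strategy used here, taking the least index consistent with all positive and negative data seen so far, can be illustrated on a toy family. The sketch below is not the non-computable scientist of the proof: it replaces the enumeration $W_0, W_1, \ldots$ by a short, hypothetical list of decidable membership predicates, which is exactly what makes it runnable.

```python
def enumeration_scientist(sigma, family):
    """Least index i whose set agrees with every labelled pair in sigma.

    sigma  -- finite informant prefix: list of (n, b), b = 1 iff n is in the set
    family -- list of membership predicates standing in for W_0, W_1, ...
              (decidable here by assumption; the W_i of the text are only r.e.)
    """
    positives = {n for n, b in sigma if b == 1}
    negatives = {n for n, b in sigma if b == 0}
    for i, member in enumerate(family):
        if all(member(n) for n in positives) and not any(member(n) for n in negatives):
            return i
    return None


# Hypothetical family: N, the even numbers, the multiples of 3.
family = [lambda n: True, lambda n: n % 2 == 0, lambda n: n % 3 == 0]

prefix = [(0, 1), (3, 1), (2, 0), (6, 1), (4, 0)]  # informant prefix for the multiples of 3
conjectures = [enumeration_scientist(prefix[:k], family)
               for k in range(1, len(prefix) + 1)]
print(conjectures)  # [0, 0, 2, 2, 2]
```

The conjecture only moves upward, from index 0 to index 2, and each move is forced by a pair that refutes the smaller index, as in cases (a) and (b) above.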

The class of all recursively enumerable sets $\mathcal{E}$ is an example of a class which is not identifiable by a general scientist in text, but is identifiable in informant. This shows that the collection of classes of sets for scientists and texts is strictly contained in the collection for scientists and informants.

2.36 Proposition. $[\mathbf{S}, \text{text}] \subset [\mathbf{S}, \text{informant}]$

Proof. Every class of sets is a subset of $\mathcal{E}$, so $[\mathbf{S}, \text{text}] \subseteq [\mathbf{S}, \text{informant}] = 2^{\mathcal{E}}$. By Propositions 2.31 and 2.35, it immediately follows that $[\mathbf{S}, \text{text}] \subset [\mathbf{S}, \text{informant}]$.

Note that Proposition 2.35 implies that in fact every class of sets is identifiable in informant. This is true in general: if a class is an element of a collection, then any class that is a proper subset of the first class is also an element of the same collection.

2.37 Theorem. Let $\mathbf{S}'$ be a school of scientists for sets or for functions and $\mathit{env}$ some learning environment. If $\mathcal{C}$ is a class such that $\mathcal{C} \in [\mathbf{S}', \mathit{env}]$ and $\mathcal{S}$ is a class such that $\mathcal{S} \subset \mathcal{C}$, then $\mathcal{S} \in [\mathbf{S}', \mathit{env}]$


Proof. Let $\mathcal{C}$ be a class that is identifiable by some school of scientists in some learning environment, and let $M$ be a scientist that identifies $\mathcal{C}$. Given a class $\mathcal{S} \subset \mathcal{C}$ of sets (or functions) and any set $S$ (or function $f$) in $\mathcal{S}$, $M$ identifies $S$ (or $f$, respectively). Then, $M$ identifies every element of $\mathcal{S}$, and thus also identifies $\mathcal{S}$.

Unlike the case of identification in informant, in some situations, a different learning environment provides no advantage in identifying power. For general scientists for sets, the collection of classes that is identifiable in fat text is equal to the collection identifiable in text.

2.38 Proposition (Osherson et al. [21, p. 111]). $[\mathbf{S}, \text{fat text}] = [\mathbf{S}, \text{text}]$

Proof. It is immediate that $[\mathbf{S}, \text{text}] \subseteq [\mathbf{S}, \text{fat text}]$, since if a class is identifiable by a scientist in any text for the class, it is also identifiable in the subset of texts that are fat. To see that fat text does not increase a scientist's identifying power, consider a class $\mathcal{C} \in [\mathbf{S}, \text{fat text}]$ and let $M$ be a scientist that identifies $\mathcal{C}$ in fat text. Now, let $L$ be some set in $\mathcal{C}$ and $N$ be a scientist that, for every prefix $\sigma$ of size $n$ of a text for $L$, constructs the prefix of fat text $\sigma' = \sigma[1] \diamond \sigma[2] \diamond \sigma[3] \diamond \cdots \diamond \sigma[n-1] \diamond \sigma$ and simulates $M$ on $\sigma'$. Since $\sigma'$ is a prefix of fat text for $L$, the scientist $M$ converges to a correct index for $L$. Then $N$ identifies every set in $\mathcal{C}$ and thus also identifies $\mathcal{C}$.
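The construction of $\sigma'$ can be sketched as follows, reading $\sigma[k]$ as the prefix of $\sigma$ of length $k$; that reading is an assumption of this sketch, as are the function names. Concatenating all prefixes makes every element of the text recur in arbitrarily long prefixes, which is what fatness requires in the limit.

```python
def fatten(sigma):
    """Concatenate the prefixes sigma[1], sigma[2], ..., sigma[n] of sigma.

    Assumes sigma[k] denotes the length-k prefix, so the last block is sigma itself.
    """
    fat = []
    for k in range(1, len(sigma) + 1):
        fat.extend(sigma[:k])
    return fat


def simulate_on_fat(scientist, sigma):
    """The scientist N of the proof: run a fat-text scientist on the fattened prefix."""
    return scientist(fatten(sigma))


print(fatten([7, 8, 9]))  # [7, 7, 8, 7, 8, 9]
```

Since `fatten(sigma[:k])` is always a prefix of `fatten(sigma)`, the fattened prefixes fit together into a single infinite sequence on which the simulated scientist is run.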

For imperfect text, a general scientist for sets can identify a strictly smaller collection of classes. The following lemma provides a necessary condition for identifiability in imperfect text, and allows us to give examples of several classes that are not identifiable in imperfect text.

2.39 Lemma. Let $\mathcal{C}$ be a class of sets such that $L \in \mathcal{C}$ and $L' \in \mathcal{C}$ are distinct finite variants. Then $\mathcal{C} \notin [\mathbf{S}, \text{imp. text}]$.

Proof. Let $\mathcal{C}$ be a class of sets such that $L \in \mathcal{C}$ and $L' \in \mathcal{C}$ are distinct finite variants, and $T$ be a text (and hence an imperfect text as well) for $L$. On the other hand, $T$ is also an imperfect text for $L'$, since $\mathrm{content}(T) = L$ is a finite variant of $L'$. Now, for any scientist $M$, either $M$ does not converge on $T$, or $M$ converges to an index for $L$, or to an index for $L'$, or to an index for neither. In any case, $M$ cannot identify both $L$ and $L'$ on $T$, and therefore no scientist can identify $\mathcal{C}$ in imperfect text.

Corollaries 2.40 and 2.41 follow immediately from Lemma 2.39.

2.40 Corollary. FIN R rS, imp. texts

2.41 Corollary. Ni R rS, imp. texts

A stronger version of Lemma 2.39 can be demonstrated, which not only requires that L and L1 befinite variants, but also that both L ´ L1 and L1 ´ L be infinite. This implies that for any class of sets Cwhere L P C and L1 P C such that L Ă L1 (and hence L1 ´ L “ ∅), C is not identifiable in imperfect text.

2.42 Definition (Infinite variant class). We say that a class of sets $\mathcal{C}$ is an infinite variant class if for every pair of distinct sets $L \in \mathcal{C}$ and $L' \in \mathcal{C}$, both $L - L'$ and $L' - L$ are infinite.

We now show that if a class of sets is identifiable in imperfect text, then it is an infinite variant class.

2.43 Proposition (Osherson et al. [21, p. 100]). Let $\mathcal{C}$ be a class of sets that is identifiable in imperfect text. Then $\mathcal{C}$ is an infinite variant class.

Proof. Let $\mathcal{C}$ be a class of sets with distinct $L \in \mathcal{C}$ and $L' \in \mathcal{C}$. If both $L - L'$ and $L' - L$ are finite, then $L$ and $L'$ are finite variants, and Lemma 2.39 shows that $\mathcal{C}$ is not identifiable in imperfect text. Thus, suppose by contradiction that $\mathcal{C}$ is identifiable and, without loss of generality, that $F = L - L'$ is finite and $I = L' - L$ is infinite.

Figure 2.4: Two sets $L$ and $L'$ where $F = L - L'$ is finite and $I = L' - L$ is infinite

By supposition, $L'$ is identifiable in imperfect text by some scientist $M$. By Theorem 2.10, for every finite set $F$ there is a locking sequence in imperfect text for $L'$. Let $\sigma$ be a prefix such that (a) $\mathrm{content}(\sigma) \subseteq L' \cup F$, (b) $W_{M(\sigma)} = L'$, and (c) for all $\tau \in \mathrm{SEQ}$, if $\mathrm{content}(\tau) \subseteq L' \cup F$, then $M(\sigma \diamond \tau) = M(\sigma)$.

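Now, consider some (not imperfect) text $U$ for $L$, so that $\mathrm{content}(U) = L$, and let $T = \sigma \diamond U$. Then $T$ is an imperfect text for $L$, since $\mathrm{content}(T) = L \cup \mathrm{content}(\sigma)$ and $\mathrm{content}(\sigma) - L$ is finite. But for all $\tau \subset U$, we have $\mathrm{content}(\tau) \subseteq L \subseteq L' \cup F$, and so $M(\sigma \diamond \tau) = M(\sigma)$. Then $M$ converges to an index for $L'$ on text $T$, and hence does not identify $L$ on $T$. Thus, $M$ cannot identify $\mathcal{C}$.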

Proposition 2.43 allows us to give examples of classes that are not identifiable in imperfect text, but for which Lemma 2.39 alone does not suffice to show this.

2.44 Proposition. Let $\mathrm{MULT} = \{M_i : i \in \mathbb{N}\}$ be the class of sets of multiples of natural numbers, where $M_i = \{i \times k : k \in \mathbb{N}\}$. Then $\mathrm{MULT} \notin [\mathbf{S}, \text{imp. text}]$.

Proof. We have, for example, $M_4 \subset M_2$. Then $M_4 - M_2$ is empty (and hence finite), and so by Proposition 2.43, MULT is not identifiable in imperfect text.

These results show that imperfect text implies a loss of identifying power for the school of general scientists for sets.

2.45 Proposition. $[\mathbf{S}, \text{imp. text}] \subset [\mathbf{S}, \text{text}]$

The condition in Proposition 2.43 is not only necessary but also sufficient for identifiability in imperfect text.

2.46 Proposition (Osherson et al. [21, p. 101]). Let $\mathcal{C}$ be an infinite variant class of sets. Then $\mathcal{C}$ is identifiable in imperfect text.

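Proof. Let $\mathcal{C}$ be an infinite variant class of sets and $L_0, L_1, \ldots$ be an indexing of the sets of $\mathcal{C}$ such that every set appears infinitely often. We define the scientist $M$ in Algorithm 2.9 using the function $f : \mathrm{SEQ} \to \mathbb{N}$ such that $f(\varepsilon) = 0$ and

$$f(\sigma) = \begin{cases} f(\sigma^-), & \text{if } \mathrm{content}(\sigma) - \mathrm{content}(\sigma^-) \subseteq L_{f(\sigma^-)} \\ f(\sigma^-) + 1, & \text{otherwise.} \end{cases} \tag{2.3}$$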

Algorithm 2.9: Scientist M identifies an infinite variant class of sets in imperfect text

    Scientist $M(\sigma \in \mathrm{SEQ}) : \mathbb{N}$
        return the smallest index for the set $L_{f(\sigma)}$;   // $f$ is defined in Equation 2.3
    end

Let $T$ be an imperfect text for $L \in \mathcal{C}$ and $j$ be the smallest index for $L$. Then there exists a sufficiently large prefix $\sigma \subset T$ such that $\mathrm{content}(T) - \mathrm{content}(\sigma) \subseteq L$. Thus, if there exists some prefix $\tau$ such that $\sigma \subseteq \tau \subset T$ and $M(\tau) = j$, then $M$ will converge to $j$ for $L$.


It remains to be shown that $M$ will eventually conjecture the index $j$ for some prefix $\tau$ such that $\sigma \subseteq \tau \subset T$. Since $L$ appears infinitely often in the indexing $L_0, L_1, \ldots$, $M$ will eventually conjecture an index for $L$ on some prefix $\tau$ unless there exist some $i$ and $\rho$, where $L_i \neq L$ and $\sigma \subseteq \rho \subset T$, such that $f(\rho) = f(\tau) = i$ for all $\tau$ where $\rho \subseteq \tau \subset T$. But $L - L_i$ is infinite, and so there must exist some prefix $\tau$ such that $\mathrm{content}(\tau) - \mathrm{content}(\rho) \not\subseteq L_i$. This implies that $f(\rho) \neq f(\tau)$, a contradiction, and so no such $i$ and $\rho$ can exist. Thus, $M$ converges to $j$.
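The behaviour of the function $f$ from Equation 2.3 can be illustrated with a small Python sketch. It rests on simplifying assumptions that are not in the thesis: the class is a finite family of sets given as membership predicates, the indexing in which every set appears infinitely often is taken to be $L_i = \texttt{family}[i \bmod m]$, and the scientist returns a position in `family` rather than an index of a recursively enumerable set. The names `f`, `scientist` and `PRIME_MULTIPLES` are illustrative only.

```python
from typing import Callable, List, Sequence

Membership = Callable[[int], bool]


def f(sigma: Sequence[int], family: List[Membership]) -> int:
    """Equation 2.3 under the cyclic indexing L_i = family[i % len(family)]."""
    index = 0
    seen = set()
    for x in sigma:
        is_new = x not in seen      # content(sigma) - content(sigma^-) is {x} or empty
        seen.add(x)
        if is_new and not family[index % len(family)](x):
            index += 1              # the new element refutes the current conjecture
    return index


def scientist(sigma: Sequence[int], family: List[Membership]) -> int:
    """Return the position in `family` of the currently conjectured set."""
    return f(sigma, family) % len(family)


# Multiples of 2, 3 and 5: pairwise differences are infinite in both directions.
PRIME_MULTIPLES: List[Membership] = [
    lambda x: x % 2 == 0,
    lambda x: x % 3 == 0,
    lambda x: x % 5 == 0,
]
```

On the noisy prefix `[4, 7, 0, 3, 6, 9, 12, 15, 18]` of a text for the multiples of 3, the conjecture moves away from the multiples of 2 when 7 appears and then stays on the multiples of 3, since every later element is consistent with it.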

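2.47 Corollary. Let $\mathcal{C}$ be a class of sets. Then $\mathcal{C} \in [\mathbf{S}, \text{imp. text}]$ if and only if $\mathcal{C}$ is an infinite variant class.

2.48 Proposition. Let $\mathrm{PMULT} = \{M_i \in \mathrm{MULT} : i \text{ is prime}\}$ be the subclass of MULT made up of the sets of multiples of prime numbers. Then $\mathrm{PMULT} \in [\mathbf{S}, \text{imp. text}]$.

Proof. Let $p$ and $q$ be any two distinct primes. The sets $\{p \times p^k : k \in \mathbb{N}\}$ and $\{q \times q^k : k \in \mathbb{N}\}$ are both infinite, and they are subsets of $M_p - M_q$ and $M_q - M_p$, respectively. Thus PMULT is an infinite variant class, and by Proposition 2.46 it is identifiable in imperfect text.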

2.2.2 In functions

Function identification by scientists in text is powerful enough to identify the entire set $\mathcal{R}$ of recursive functions.

2.49 Theorem (Jain et al. [12, p. 52]). $\mathcal{R} \in [\mathbf{S}_f, \text{text}]$

Proof. Let $M$ be the scientist defined in Algorithm 2.10 that, for each prefix $\sigma$, outputs the least index $i$ such that $\varphi_i$ is a total function and $\mathrm{content}(\sigma) \subset \Psi(\varphi_i)$.

Algorithm 2.10: Scientist M identifies the class R of all recursive functions in text

    Scientist $M(\sigma \in \mathrm{SEG}) : \mathbb{N}$
        return the least index $i$ such that $\varphi_i$ is total and $\mathrm{content}(\sigma) \subset \Psi(\varphi_i)$;
    end

Suppose $i$ is the smallest index for some recursive function $\varphi_i$ and $T$ is a text for $\varphi_i$. If there is a recursive function $\varphi_j \neq \varphi_i$ with index $j$ and some prefix $\sigma \subset T$ such that $M(\sigma) = j < i$, then there is a prefix $\tau$ where $\sigma \subset \tau \subset T$ and some $x, y, z \in \mathbb{N}$ such that $\langle x, y \rangle \in \mathrm{content}(\tau)$, $\langle x, z \rangle \in \Psi(\varphi_j)$, and $y \neq z$. Then $M$ cannot converge to any index $j < i$, and does converge to index $i$ for $\varphi_i$.

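By the previous theorem and Theorem 2.37 we have $[\mathbf{S}_f, \text{text}] = 2^{\mathcal{R}}$. This also implies that given any two classes in $[\mathbf{S}_f, \text{text}]$, their union is also in $[\mathbf{S}_f, \text{text}]$.

2.50 Corollary. The collection $[\mathbf{S}_f, \text{text}]$ is closed under unions.

Since $[\mathbf{S}_f, \text{text}] \subseteq [\mathbf{S}_f, \text{fat text}]$ and $\mathcal{R} \in [\mathbf{S}_f, \text{text}]$, it follows that $[\mathbf{S}_f, \text{text}] = [\mathbf{S}_f, \text{fat text}]$.

2.51 Corollary. $[\mathbf{S}_f, \text{text}] = [\mathbf{S}_f, \text{fat text}] = 2^{\mathcal{R}}$

Similarly, since every informant can be used to computably generate a text, it also follows that $[\mathbf{S}_f, \text{text}] \subseteq [\mathbf{S}_f, \text{informant}]$. Again, this shows that $[\mathbf{S}_f, \text{text}] = [\mathbf{S}_f, \text{informant}]$.

2.52 Corollary. $[\mathbf{S}_f, \text{text}] = [\mathbf{S}_f, \text{informant}] = 2^{\mathcal{R}}$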

As with sets, function identification in imperfect text also implies a loss of identifying power relative to identification in text. An example of a class of functions which is not identifiable in imperfect text is any class which contains a pair of functions which are finite variants of one another.


2.53 Proposition. Let $\mathcal{C}$ be a class of functions such that there are two distinct functions $f \in \mathcal{C}$ and $g \in \mathcal{C}$ which are finite variants of each other. Then $\mathcal{C}$ is not identifiable in imperfect text.

Proof. Let $\mathcal{C}$ be a class of functions and consider two distinct functions $f \in \mathcal{C}$ and $g \in \mathcal{C}$ that are finite variants of each other. Thus, the set $\{x \in \mathbb{N} : f(x) \neq g(x)\}$ is finite. Then any text $T$ for $f$ is also an imperfect text for $g$. But $T$ is also an imperfect text for $f$, and so no scientist can converge on $T$ to an index that is correct for both $f$ and $g$. Therefore, no scientist can identify $\mathcal{C}$ in imperfect text.

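2.54 Corollary. Let $\mathcal{C}$ be a class of sets such that there are two different sets $L \in \mathcal{C}$ and $L' \in \mathcal{C}$ that are finite variants, and $\chi_{\mathcal{C}}$ be the class of characteristic functions of sets of $\mathcal{C}$. Then $\chi_{\mathcal{C}} \notin [\mathbf{S}_f, \text{imp. text}]$.³

2.55 Corollary. $\chi_{\mathrm{FIN}} \notin [\mathbf{S}_f, \text{imp. text}]$

2.56 Corollary. $\chi_{N_i} \notin [\mathbf{S}_f, \text{imp. text}]$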

An example of a class of functions which is identifiable in imperfect text is the class of polynomial functions with integer variable and coefficients. We denote the class of recursive functions that represent these polynomials by POLY.

2.57 Definition (POLY). Let $p$ be a polynomial function of type $p : \mathbb{Z} \to \mathbb{Z}$ such that $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ for all $x \in \mathbb{Z}$, where $n \in \mathbb{N}$ and $a_n, \ldots, a_0 \in \mathbb{Z}$. The class of functions POLY is the set of all recursive functions that encode some polynomial $p$. Thus, $\varphi_i \in \mathrm{POLY}$ just in case there is a polynomial function $p$ with integer coefficients such that $z^{-1}(\varphi_i(m)) = p(z^{-1}(m))$ for all $m \in \mathbb{N}$, where $z^{-1} : \mathbb{N} \to \mathbb{Z}$ is a decoding of $\mathbb{Z}$.

To simplify notation, it will be convenient to refer to elements of POLY with some degree of ambiguity. For example, if $\varphi_i \in \mathrm{POLY}$ encodes the polynomial $p$, we may say that $i$ is an index for $p$, even though $p$ is not a partial recursive function. Additionally, we may say that $\varphi_i$ is a polynomial function of degree $n$ to mean that $\varphi_i$ is a recursive function which encodes the polynomial $p$ of degree $n$ according to Definition 2.57, or that an element of a text for $\varphi_i$ is a point $(x, p(x))$ of the graph of the polynomial instead of saying that $\langle z(x), \varphi_i(z(x)) \rangle$ represents the point $(x, p(x))$ of polynomial $p$. Despite this, texts and prefixes for POLY still officially consist solely of natural numbers, and the reader may verify that any abuse of notation can easily be made precise without any loss of rigour.

2.58 Proposition. $\mathrm{POLY} \in [\mathbf{S}_f, \text{imp. text}]$

Proof. Let $M$ be the partial scientist defined in Algorithm 2.11 that finds the interpolating polynomial for the last half of $\sigma$.

Algorithm 2.11: Scientist M identifies the class POLY of polynomials in imperfect text

    Scientist $M(\sigma \in \mathrm{SEG}) : \mathbb{N}$
        variable $n, i \in \mathbb{N}$;
        $n \leftarrow \lfloor |\sigma| / 2 \rfloor$;
        if there exists some $f \in \mathrm{POLY}$ that is the interpolating polynomial of the set of points
                $\{(x, y) : \langle x, y \rangle \in \mathrm{content}(\sigma[-n]) - \mathrm{content}(\sigma[n])\}$ then
            return the same index $i$ such that $\varphi_i = f$;
        end
    end

³ It is worth noting that in some cases, there is no index $i$ such that $\varphi_i$ is a recursive function which is the characteristic function of a set. An example of such a set is $K$ in Definition 2.62, which is not a decidable set and so, by Definition 1.8, does not have a recursive characteristic function. Thus, the class of characteristic functions of sets of $K^*$ (defined in Proposition 2.63) is not only not identifiable in imperfect text, it is also not identifiable in informant, since $\chi_{K^*} \not\subseteq \mathcal{R}$.


Let $T$ be an imperfect text for a polynomial $p \in \mathrm{POLY}$ of degree $k$. For a sufficiently large prefix $\sigma \subset T$, every ‘incorrect point’ in $T$ will be in $\sigma[n]$, where $n$ is half of $|\sigma|$. Then $\{\langle x, y \rangle \in \mathrm{content}(T) : \langle x, y \rangle \notin \Psi(p)\} \subseteq \mathrm{content}(\sigma[n])$. For such a prefix $\sigma$, the set $\mathrm{content}(\sigma[-n]) - \mathrm{content}(\sigma[n])$ only contains points which are in the graph of $p$. When this set contains $k + 1$ distinct points, $M$ will find a polynomial function with integer coefficients and return an index for it. For every point added to the set of points, $M$ can verify that the interpolating polynomial remains the same, and therefore does not change its hypothesis. Thus, $M$ converges to a correct index for $p$ and identifies POLY.
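The interpolation step of Algorithm 2.11 can be sketched as follows. This is an illustration under assumptions that go beyond the thesis: points are given directly as integer pairs $(x, y)$ rather than as coded natural numbers, $\sigma[n]$ and $\sigma[-n]$ are read as the first and last $n$ elements of the prefix, the scientist returns the integer coefficient list (lowest degree first) in place of an index, and `None` stands for "undefined". Lagrange interpolation over exact fractions is one possible way to compute the interpolant; the thesis does not prescribe a method. The names `interpolate` and `poly_scientist` are hypothetical.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def interpolate(points: Sequence[Point]) -> List[Fraction]:
    """Lagrange interpolation; coefficients are returned lowest degree first."""
    coeffs = [Fraction(0)] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis = [Fraction(1)]   # running product of (x - xj) for j != i
        denom = Fraction(1)     # running product of (xi - xj) for j != i
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            basis = [Fraction(0)] + basis        # multiply by x
            for k in range(len(basis) - 1):      # then subtract xj times the old product
                basis[k] -= xj * basis[k + 1]
            denom *= xi - xj
        for k, b in enumerate(basis):
            coeffs[k] += yi * b / denom
    return coeffs


def poly_scientist(sigma: Sequence[Point]) -> Optional[List[int]]:
    """Conjecture from the points of the last half of sigma not seen in the first half."""
    n = len(sigma) // 2
    if n == 0:
        return None
    points = sorted(set(sigma[len(sigma) - n:]) - set(sigma[:n]))
    xs = [x for x, _ in points]
    if not points or len(set(xs)) != len(xs):
        return None     # no usable points, or two different values for one x
    coeffs = interpolate(points)
    if any(c.denominator != 1 for c in coeffs):
        return None     # the interpolant does not have integer coefficients
    ints = [int(c) for c in coeffs]
    while len(ints) > 1 and ints[-1] == 0:
        ints.pop()      # drop trailing zero coefficients
    return ints
```

With the noisy prefix `[(0, 5), (1, 7), (0, 1), (1, 2), (2, 5), (3, 10), (-1, 2), (4, 17)]` of a text for $p(x) = x^2 + 1$, the two incorrect points fall in the first half and are ignored, and the sketch returns the coefficient list `[1, 0, 1]`.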

2.2.3 Comparison of collections of classes for general scientists

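[Figure 2.5: Collections of classes of sets, categorized by general scientists. Regions of the diagram: $[\mathbf{S}, \text{informant}] = 2^{\mathcal{E}}$; $[\mathbf{S}, \text{text}] = [\mathbf{S}, \text{fat text}]$; $[\mathbf{S}, \text{imp. text}]$. Classes marked in the diagram: $\mathcal{E}$, $\mathrm{FIN} \cup \{\mathbb{N}\}$, FIN, $N_i$, MULT, $N_i \cup \{\mathbb{N}\}$, PMULT.]

[Figure 2.6: Collections of classes of functions, categorized by general scientists. Regions of the diagram: $[\mathbf{S}_f, \text{text}] = [\mathbf{S}_f, \text{fat text}] = [\mathbf{S}_f, \text{informant}] = 2^{\mathcal{R}}$; $[\mathbf{S}_f, \text{imp. text}]$. Classes marked in the diagram: $\mathcal{R}$, $\chi_{\mathrm{FIN}}$, $\chi_{N_i}$, POLY.]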

2.3 Identification by computable scientist

The results put forth in the previous section present a very promising picture for function and set identification by scientists. In particular, as was shown in Theorem 2.49, there is a scientist that is able to identify the class $\mathcal{R}$ of all recursive functions in text. We must note, however, that the scientist defined in Algorithm 2.10 is not computable, and therefore cannot be simulated by a Turing machine.

Attempts made at developing artificial ‘scientists’ involve using computers or similar machines to automatically learn or identify scientific laws (see, for example, Langley et al. [16]). Even machines such as quantum computers cannot compute any function that is not also computable by a Turing machine. Thus, if we wish to analyze these machines, it is natural to impose a computability restriction on our study of scientists.

On the other hand, it is also probable that even the inductive learning process performed by human scientists is computable. Even if the entirety of the human mind could not be replicated by a computer, the scientific method performed by human beings is in all likelihood computable, since there is no evidence to suggest that humans can uniformly solve the halting problem through any sort of reasoning.

It is therefore clear that computable scientists are a subject of great interest, and that computability is at least a very desirable property of our learning functions. We thus define a computable scientist as follows.

2.59 Definition (Computable scientist). A scientist is said to be computable if it can be simulated by a deterministic Turing machine. We denote the school of computable scientists for sets by $\mathbf{S}^{\mathrm{CP}}$ and the school of computable scientists for functions by $\mathbf{S}^{\mathrm{CP}}_f$.


The frequent study of computable scientists has led to a more abbreviated notation, commonly used in the literature, for the collections of classes of sets and functions that are identifiable by computable scientists.⁴ For consistency, we will favor the bracketed notation we have used so far, but may use both notations interchangeably.

2.60 Definition (TXTEX and EX).

1. The collection of classes of sets that are identifiable by a computable scientist in text is TXTEX. Hence, $\mathrm{TXTEX} = [\mathbf{S}^{\mathrm{CP}}, \text{text}]$. If a set or a class of sets is identifiable by a computable scientist in text, we say that it is TXTEX-identifiable.

2. The collection of classes of functions that are identifiable by a computable scientist in text is EX. Hence, $\mathrm{EX} = [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$. If a function or a class of functions is identifiable by a computable scientist in text, we say that it is EX-identifiable.

The definitions given in the previous chapter do not require that a scientist be defined on every prefix. However, we may limit our study exclusively to scientists who are defined for every input without any loss of generality. Such scientists, which are represented by total functions, are called total scientists.

2.61 Proposition. Let $M_0, M_1, \ldots, M_i, \ldots$ be an enumeration of all computable scientists. Then there exists a (total) recursive function $f : \mathbb{N} \to \mathbb{N}$ such that, for all $i \in \mathbb{N}$: (a) $M_{f(i)}$ is a total scientist and (b) if $M_i$ identifies the (total) recursive function $\varphi$, then $M_{f(i)}$ also identifies $\varphi$.

Proof. We consider any scientist $M_i$ (either total or partial), assuming that it identifies a given function $\varphi$, and we algorithmically specify a new total scientist $M_{f(i)}$ that identifies $\varphi$.

Let $T$ be any text for $\varphi$. On input $T[t]$, the new scientist $M_{f(i)}$ calls the scientist $M_i$ to run for $t$ successive steps on all the prefixes of $T[t]$. Then the scientist $M_{f(i)}$ takes the longest prefix $\tilde{\sigma}$ of $T[t]$ for which $M_i$ outputs a hypothesis within $t$ steps. If no hypothesis has been produced in $t$ steps for any prefix, the new scientist $M_{f(i)}$ outputs $0$; otherwise it outputs $M_i(\tilde{\sigma})$.

Let us now suppose that $T$ is a text for a function $\varphi$. Then, there exists an order $p \in \mathbb{N}$ such that scientist $M_i$ on input $T[p]$ converges to the final hypothesis, and there exists an order $q \geq p$ such that, for $j \geq q$, $j$ steps are enough for $M_i$ to converge on $T[p]$. Thus, after order $q$, $M_{f(i)}$ outputs the final hypothesis on all prefixes of text $T$ of size greater than or equal to $q$.
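The step-bounded simulation used in this proof can be sketched in Python. The modelling choices here are assumptions made only for illustration: a possibly partial scientist is a generator function that yields once per computation step and returns its hypothesis when it halts, `None` marks "no hypothesis yet", and the names `run_bounded`, `totalize` and `slow` are hypothetical.

```python
from typing import Callable, Generator, Optional, Sequence

# A (possibly partial) scientist: yields once per step, returns a hypothesis if it halts.
Partial = Callable[[Sequence[int]], Generator[None, None, int]]


def run_bounded(scientist: Partial, prefix: Sequence[int], steps: int) -> Optional[int]:
    """Run `scientist` on `prefix` for at most `steps` steps."""
    computation = scientist(prefix)
    for _ in range(steps):
        try:
            next(computation)
        except StopIteration as halt:
            return halt.value
    return None


def totalize(scientist: Partial) -> Callable[[Sequence[int]], int]:
    """Build a total scientist from a possibly partial one, as in the proof above."""
    def total(prefix: Sequence[int]) -> int:
        t = len(prefix)
        for length in range(t, -1, -1):            # longest prefix first
            guess = run_bounded(scientist, prefix[:length], t)
            if guess is not None:
                return guess
        return 0                                   # default hypothesis
    return total


def slow(prefix: Sequence[int]) -> Generator[None, None, int]:
    """Toy partial scientist: diverges on prefixes shorter than 2."""
    if len(prefix) < 2:
        while True:
            yield
    for _ in prefix:
        yield
    return max(prefix)
```

In this toy run, `totalize(slow)` answers `0` on the prefixes `[]`, `[4]` and `[4, 9]`, where no simulated computation halts within the step budget, and `9` on `[4, 9, 1]`, where the simulation on the shorter prefix `[4, 9]` has had time to halt.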

2.3.1 In sets

Many classes of sets that are identifiable by a general scientist are also identifiable by a computable scientist. For example, the scientist in Algorithm 1.3 that identifies FIN is clearly computable, and so $\mathrm{FIN} \in [\mathbf{S}^{\mathrm{CP}}, \text{text}]$. However, there do exist classes of sets which are identifiable by a general scientist but are not computably identifiable. We present such a class of sets in Proposition 2.63.

2.62 Definition (Set $K$ and $K^*$). The set $\{i \in \mathbb{N} : i \in W_i\}$ is denoted $K$.

The set $K$ is a well-known example of a nondecidable, recursively enumerable set, as is demonstrated in Section 5.3 of the Appendix.

Observe that even though the set $K$ is not decidable, it is nonetheless trivially identifiable by a computable scientist who, for any given input, returns the same index for $K$. Indeed, any class containing a single set or function is always identifiable by a constant scientist that returns the same index for the set or function. However, if we consider the class of sets $K^*$ such that for all $x \in \mathbb{N}$, $K \cup \{x\} \in K^*$, then there is no computable scientist that can distinguish between every set in $K^*$.

⁴ This terminology is due to Case and Lynes [4] and is used extensively in Jain et al. [12].


2.63 Proposition (Osherson et al. [21, p. 48]). Let $K^* = \{K \cup \{x\} : x \in \mathbb{N}\}$. Then $K^* \in [\mathbf{S}, \text{text}]$ but $K^* \notin [\mathbf{S}^{\mathrm{CP}}, \text{text}]$, i.e. $K^*$ is identifiable in text by a noncomputable scientist but not by a computable scientist.

Proof. Let $k$ be an index for the set $K$ and, for each $K \cup \{x\} \neq K$, let $k_x$ be an index for $K \cup \{x\}$. The class of sets $K^* = \{K \cup \{x\} : x \in \mathbb{N}\}$ is trivially identified by a noncomputable scientist $M$ which, given a prefix $\sigma$ of a text $T$ for a set in $K^*$, is defined such that

$$M(\sigma) = \begin{cases} k, & \text{if } \mathrm{content}(\sigma) \subset K \\ k_x, & \text{otherwise, where } x \text{ is the smallest element of } \mathrm{content}(\sigma) - K. \end{cases}$$

Since there is at most one $x \in \mathrm{content}(T) - K$, the scientist $M$ correctly identifies $K^*$.

Now suppose there exists a computable scientist $N$ that identifies $K^*$, and let $\sigma$ be a locking sequence for $N$ on $K$. Let $k_0, k_1, k_2, \ldots$ be a fixed, recursive enumeration of the elements of $K$. For every $x \in \mathbb{N}$, consider the text $T^x = \sigma \diamond x \diamond k_0 \diamond k_1 \diamond k_2 \diamond \cdots$. Then

for all $x \in K$, $T^x$ is a text for $K$, and

for all $x \in \overline{K}$, $T^x$ is a text for $K \cup \{x\}$, and $K \cup \{x\} \in K^*$.

Suppose that $x \in K$. Then $N(T^x[n]) = N(\sigma)$ for all $n > |\sigma|$. On the other hand, if $x \in \overline{K}$, then there is some $n > |\sigma|$ such that $N(T^x[n]) \neq N(\sigma)$. Thus, we have that $x \in \overline{K}$ if and only if there is $n > |\sigma|$ such that $N(T^x[n]) \neq N(\sigma)$.

Using $N$, we can construct the function $\psi$ such that

$$\psi(x) = \text{least } n > |\sigma| \text{ such that } N(T^x[n]) \neq N(\sigma).$$

Since $k_0, k_1, k_2, \ldots$ is a recursive enumeration, it follows that $\psi$ is partial recursive. Now $\psi$ has domain $\overline{K}$ and so $\overline{K}$ is recursively enumerable. But $K$ is also recursively enumerable, and consequently decidable, a contradiction with Lemma 5.4 in the Appendix. Thus, $K^*$ is not identifiable by any computable scientist.

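2.64 Proposition. $[\mathbf{S}^{\mathrm{CP}}, \text{text}] \subset [\mathbf{S}, \text{text}]$

2.65 Corollary. $\mathrm{FIN} \cup \{\mathbb{N}\} \notin [\mathbf{S}^{\mathrm{CP}}, \text{text}]$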

Note that while $K^*$ and $\mathrm{FIN} \cup \{\mathbb{N}\}$ are both unidentifiable by a computable scientist in text, the reason for each case is distinct. In the former class, $K^*$ presents a problem that is too computationally complex for a computable scientist to solve, whereas the computational nature of the class $\mathrm{FIN} \cup \{\mathbb{N}\}$ poses no issue for identification. Instead, it is the informational complexity of $\mathrm{FIN} \cup \{\mathbb{N}\}$ that renders it unidentifiable, since there is no finite prefix that can distinguish between a finite set and the set $\mathbb{N}$ of all natural numbers.

This computational barrier persists even for computable scientists operating on informant.

2.66 Proposition (Osherson et al. [21, p. 48]). $\mathcal{E} \notin [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$

Proof. Suppose there is a computable scientist $M \in \mathbf{S}^{\mathrm{CP}}$ that identifies the class $\mathcal{E}$ of all recursively enumerable sets in informant. We will show that there is a set $L \in \mathcal{E}$ and an informant $T$ for $L$ on which $M$ does not converge, and which $M$ therefore cannot identify. For the construction of $T$, we rely on the following claim:

for every prefix of informant $\sigma = \langle 0, b_0 \rangle, \ldots, \langle n, b_n \rangle$, where $b_i \in \{0, 1\}$, there are integers $j$ and $k$ such that if $\tau = \sigma \diamond \langle n+1, 0 \rangle \diamond \cdots \diamond \langle n+j, 0 \rangle$ and $\tau' = \tau \diamond \langle n+j+1, 1 \rangle \diamond \cdots \diamond \langle n+j+k, 1 \rangle$, then $M(\tau) \neq M(\tau')$.   (2.4)

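To verify that this claim is true, for every $\sigma \in \mathrm{ISEQ}$, let $L_0$ be the finite, recursively enumerable set $L_0 = \{m \in \mathbb{N} : \langle m, 1 \rangle \in \mathrm{content}(\sigma)\}$. Now, $M$ identifies $L_0$, and so it converges to an index for $L_0$ on the informant $\sigma \diamond \langle n+1, 0 \rangle \diamond \cdots \diamond \langle n+j, 0 \rangle \diamond \cdots$ for $L_0$. Then, there is some $j \in \mathbb{N}$ such that if $\tau = \sigma \diamond \langle n+1, 0 \rangle \diamond \cdots \diamond \langle n+j, 0 \rangle$, $M(\tau)$ is an index for $L_0$. On the other hand, $\tau \diamond \langle n+j+1, 1 \rangle \diamond \cdots \diamond \langle n+j+k, 1 \rangle \diamond \cdots$ is an informant for the recursively enumerable set $L_1 = L_0 \cup \{m \in \mathbb{N} : m \geq n+j+1\}$. Thus, since $M$ identifies both $L_0$ and $L_1$, there is some integer $k$ such that if $\tau' = \tau \diamond \langle n+j+1, 1 \rangle \diamond \cdots \diamond \langle n+j+k, 1 \rangle$, then $M(\tau')$ is an index for $L_1$ and $M(\tau) \neq M(\tau')$.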

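We construct successive prefixes $\sigma_i$ of an informant $T$. Let $\sigma_0 = \langle 0, 1 \rangle$, and for each $i \in \mathbb{N}$, let $\sigma_{i+1}$ be the prefix $\tau'$ obtained by applying the claim in Equation 2.4 to $\sigma = \sigma_i$. Now, $T = \bigcup_i \sigma_i$ is an informant for some $L \in \mathcal{E}$, but $M$ cannot converge on $T$, since it changes its hypothesis at least once between $\sigma_i$ and $\sigma_{i+1}$, for every $i$.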

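Since by Proposition 2.35 we have $\mathcal{E} \in [\mathbf{S}, \text{informant}]$, it follows from the previous proposition that $[\mathbf{S}^{\mathrm{CP}}, \text{informant}] \subset [\mathbf{S}, \text{informant}]$.

2.67 Corollary. $[\mathbf{S}^{\mathrm{CP}}, \text{informant}] \subset [\mathbf{S}, \text{informant}]$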

On the other hand, informants do allow for computable scientists to identify classes which were not identifiable by computable scientists in text.

2.68 Proposition. $\mathrm{FIN} \cup \{\mathbb{N}\} \in [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$

Proof. The reader may verify that the scientist that identifies $\mathrm{FIN} \cup \{\mathbb{N}\}$ in informant which was specified in Algorithm 2.7 is computable.

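2.69 Corollary. $[\mathbf{S}^{\mathrm{CP}}, \text{text}] \subset [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$

Proof. If a class of sets is identifiable by a computable scientist in text, then it is also identifiable by a computable scientist in informant, since a text for a set can be computably extracted from any informant for it. Therefore, $[\mathbf{S}^{\mathrm{CP}}, \text{text}] \subseteq [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$. By Proposition 2.68 we have $\mathrm{FIN} \cup \{\mathbb{N}\} \in [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$, but by Corollary 2.65 we have $\mathrm{FIN} \cup \{\mathbb{N}\} \notin [\mathbf{S}^{\mathrm{CP}}, \text{text}]$. Thus $[\mathbf{S}^{\mathrm{CP}}, \text{text}] \subset [\mathbf{S}^{\mathrm{CP}}, \text{informant}]$.

2.70 Corollary. $[\mathbf{S}^{\mathrm{CP}}, \text{informant}] \not\subseteq [\mathbf{S}, \text{text}]$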

As was the case with general scientists, fat text does not provide a greater identifying power for computable scientists for sets. The reader may easily verify that the steps described in Proposition 2.38 are all computable and that the case for computable scientists is a simple adaptation of that proof.

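2.71 Proposition. $[\mathbf{S}^{\mathrm{CP}}, \text{fat text}] = [\mathbf{S}^{\mathrm{CP}}, \text{text}]$

As with general scientists, imperfect text also strictly reduces the collection of classes of sets a computable scientist may identify.

2.72 Proposition. $\mathrm{FIN} \notin [\mathbf{S}^{\mathrm{CP}}, \text{imp. text}]$

Proof. It is clear that $[\mathbf{S}^{\mathrm{CP}}, \text{imp. text}] \subseteq [\mathbf{S}, \text{imp. text}]$. By Corollary 2.40, we have $\mathrm{FIN} \notin [\mathbf{S}, \text{imp. text}]$ and therefore $\mathrm{FIN} \notin [\mathbf{S}^{\mathrm{CP}}, \text{imp. text}]$.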


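2.73 Corollary. $[\mathbf{S}^{\mathrm{CP}}, \text{imp. text}] \subset [\mathbf{S}^{\mathrm{CP}}, \text{text}]$

Proof. It is clear that $[\mathbf{S}^{\mathrm{CP}}, \text{imp. text}] \subseteq [\mathbf{S}^{\mathrm{CP}}, \text{text}]$. Now, the reader may easily verify that the scientist specified in Algorithm 1.3 is computable and identifies FIN in text. By Proposition 2.72, it follows that $[\mathbf{S}^{\mathrm{CP}}, \text{imp. text}] \subset [\mathbf{S}^{\mathrm{CP}}, \text{text}]$.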

An example of a class of sets which is identifiable by a computable scientist in imperfect text is PMULT.

2.74 Proposition. $\mathrm{PMULT} \in [\mathbf{S}^{\mathrm{CP}}, \text{imp. text}]$

Proof. We specify the computable scientist $M$ in Algorithm 2.12 and show that it identifies PMULT in imperfect text.

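Algorithm 2.12: Computable scientist M identifies the class of sets PMULT in imperfect text

    Scientist $M(\sigma \in \mathrm{SEQ}) : \mathbb{N}$
        variable $x, p \in \mathbb{N}$;
        $x \leftarrow \max\{y \in \mathbb{N} : y \in \mathrm{content}(\sigma) \text{ and } y = q^k \text{ where } q \text{ is prime and } k \in \mathbb{N}^+\}$;
        $p \leftarrow$ the prime such that $x = p^k$ for some $k \in \mathbb{N}^+$;
        return the same index for the set $M_p$;
    end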

Let $T$ be an imperfect text for the set $M_p \in \mathrm{PMULT}$. Then there are an infinite number of elements $p^i$ in $M_p$ for $i \in \mathbb{N}^+$, and all but finitely many of them occur in $T$. For a sufficiently large prefix $\sigma$ of $T$, the largest power of a prime in $\mathrm{content}(\sigma)$ will always be an element of $M_p$. Because $p$ is prime and $i \in \mathbb{N}^+$, $p^i$ is an element of exactly one set in PMULT, and so $M$ converges to a correct index for the set $M_p$.
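The idea behind Algorithm 2.12 can be sketched as follows, assuming that the prefix is given as a plain sequence of natural numbers and that the scientist returns the prime $p$ itself in place of an index for $M_p$ (with `None` when no prime power has been seen yet). The helper names `prime_base` and `pmult_scientist` are hypothetical.

```python
from typing import Optional, Sequence


def prime_base(x: int) -> Optional[int]:
    """Return p if x = p**k for a prime p and some k >= 1, otherwise None."""
    if x < 2:
        return None
    p = 2
    while p * p <= x:
        if x % p == 0:
            while x % p == 0:       # strip every factor p
                x //= p
            return p if x == 1 else None
        p += 1
    return x                        # no divisor up to sqrt(x): x itself is prime


def pmult_scientist(sigma: Sequence[int]) -> Optional[int]:
    """Conjecture the base of the largest prime power seen so far."""
    powers = [x for x in sigma if prime_base(x) is not None]
    if not powers:
        return None
    return prime_base(max(powers))
```

On the prefix `[25, 0, 6, 9]` of an imperfect text for $M_3$ (with the incorrect element 25 and the element 3 missing) the sketch conjectures 5, but once 27 appears, as in `[25, 0, 6, 9, 12, 27]`, it conjectures 3 and larger powers of 3 can only confirm this.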

2.3.2 In functions

Computable identification of classes of functions is one of the most relevant topics for the goal of developing automatic learning machines. In Section 2.2, it was shown that the class $\mathcal{R}$ of all recursive functions is identifiable in text by a single general scientist of simple construction. However, the scientist defined in Algorithm 2.10 is not computable since there is no mechanical way for deciding whether a given number is an index for a (total) recursive function. One may wonder if some other scientist is able to identify $\mathcal{R}$ and still be computably constrained. The results in this section give a negative answer to this question.

Function identification by computable scientists can be facilitated by restricting our analysis to texts of a particular kind.

2.75 Definition (Canonical text, Jain et al. [12, p. 69]). Let $T$ be a text for a recursive function $f$. We say that $T$ is a canonical text just in case $T(n) = \langle n, f(n) \rangle$ for every $n \in \mathbb{N}$.

Canonical text is intuitively an ‘ordered text’ where every element of the graph of a function is given in order and in which each element of the text appears a single time. The following proposition shows that it is enough to only consider canonical texts in the computable identification of functions.

2.76 Proposition (Jain et al. [12, p. 70]). Let $\mathcal{S}$ be a class of recursive functions. Then $\mathcal{S} \in [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$ if and only if there is a computable scientist that identifies any function in $\mathcal{S}$ in canonical text.

Proof. The necessary condition for $\mathcal{S} \in [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$ is immediate, since if there is a computable scientist which identifies $\mathcal{S}$ in text, it a fortiori identifies $\mathcal{S}$ in only those texts which are canonical.


Algorithm 2.13: Computable scientist N identifies the class of functions S in text

    Scientist $N(\sigma \in \mathrm{SEG}) : \mathbb{N}$
        variable $\tau \in \mathrm{SEG}$;
        $\tau \leftarrow$ the longest prefix of canonical text such that for all $i < |\tau|$, $\tau_i = \langle i, j \rangle \in \mathrm{content}(\sigma)$;
        return the index returned from simulating $M$ on prefix $\tau$;   // Computable scientist $M$ identifies $\mathcal{S}$ in canonical text
    end

To see that it is enough for a computable scientist to consider only those texts which are canonical, suppose there is a computable scientist $M$ that identifies $\mathcal{S}$ in canonical text. Let $N$ be the computable scientist defined in Algorithm 2.13.

Let $T$ be a text for a recursive function $f \in \mathcal{S}$. Since $f$ is a total function, it follows that for every $i \in \mathbb{N}$, there is always some prefix $\sigma \subset T$ such that $\langle i, f(i) \rangle \in \mathrm{content}(\sigma)$, and so the prefixes $\tau$ built by $N$ grow into the canonical text for $f$. All the steps performed by $N$ are computable (including the simulation of $M$), and since $M$ converges to a correct index for every function in $\mathcal{S}$ in canonical text, $N$ also does so in text.

Two simple classes of functions which are identifiable by a computable scientist in text are SD and AEZ.

2.77 Proposition. Let $\mathrm{SD} = \{\varphi_i \in \mathcal{R} : \varphi_i(0) = i\}$ be the class of functions that are self-defining. Then $\mathrm{SD} \in [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$.

Proof. By Proposition 2.76, it is enough to only consider canonical texts in order to show that SD is identifiable by a computable scientist in text. Then, let $M$ be the scientist that on input $\sigma$ returns the index $i$, where $i$ is such that $\sigma_0 = \langle 0, i \rangle$. Then $M$ identifies SD.

2.78 Proposition. Let $\mathrm{AEZ} = \{\varphi_i \in \mathcal{R} : \text{there exists a finite set } D \subset \mathbb{N} \text{ such that } \varphi_i(x) = 0 \text{ for all } x \notin D\}$ be the class of functions that are almost everywhere zero. Then $\mathrm{AEZ} \in [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$.

Proof. Let $M$ be defined in Algorithm 2.14.

Algorithm 2.14: Computable scientist M identifies the class of functions AEZ in text

    1  Scientist $M(\sigma \in \mathrm{SEG}) : \mathbb{N}$
    2      variable $D$ : set of $\mathbb{N}$;
    3      $D \leftarrow \{\langle x_i, y_i \rangle \in \mathrm{content}(\sigma) : x_i \in \mathbb{N} \text{ and } y_i \neq 0\}$;
    4      return the same index for the function $f$ such that $D \subseteq \Psi(f)$ and $f(x) = 0$ for every $x \in \mathbb{N}$ such that $\langle x, y \rangle \notin D$ for all $y > 0$;
    5  end

Let $f$ be a function in AEZ and $T$ be a text for $f$. Because $f$ is different from zero in a finite set of points, the set $D_f = \{\langle x, f(x) \rangle \in \mathrm{content}(T) : x \in \mathbb{N} \text{ and } f(x) \neq 0\}$ is also finite. Then, for a sufficiently large prefix, the set $D$ defined in line 3 of Algorithm 2.14 will be equal to $D_f$, and thus $M$ will always return the same index for the function $f$.

Although both SD and AEZ are EX-identifiable (i.e. identifiable in text by a computable scientist for functions), their union is not. This result is called the Nonunion Theorem.

2.79 Theorem (The Nonunion Theorem, Blum and Blum [2]). $\mathrm{SD} \cup \mathrm{AEZ} \notin [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$.

Proof. Suppose that the computable scientist $M$ identifies AEZ. By Proposition 2.61, we may assume without loss of generality that $M$ is total. We show that a function $\varphi_e \in \mathcal{R}$ exists such that $\varphi_e \in \mathrm{SD}$ (and so $\varphi_e(0) = e$), but which $M$ fails to identify.


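We begin by constructing an auxiliary partial recursive function $f$ in Algorithm 2.15 which receives as input a pair of natural numbers.

Algorithm 2.15: f is a partial recursive function of type $\mathbb{N} \times \mathbb{N} \to \mathbb{N}$

    1  Function $f(e \in \mathbb{N}, n \in \mathbb{N}) : \mathbb{N}$
    2      variable $\sigma, \tau \in \mathrm{SEG}$;
    3      $\sigma \leftarrow \langle 0, e \rangle$;
    4      while $|\sigma| \leq n$ do
    5          $\tau \leftarrow$ the least prefix of canonical text such that $\sigma \subset \tau$ and $M(\tau) \neq M(\sigma)$;
    6          $\sigma \leftarrow \tau$;
    7      end
    8      return $\hat{\sigma}(n)$;
    9  end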

Note that for any given $\sigma$, we may always effectively perform an exhaustive search for some $\tau$ that satisfies the condition in line 5 of Algorithm 2.15, and such a $\tau$ must exist. To check this, suppose by contradiction that, for all prefixes $\tau$ of canonical text such that $\sigma \subset \tau$, $M(\tau) = M(\sigma)$. But then $M$ could identify at most one function whose canonical text begins with $\sigma$, and therefore could not identify AEZ, which contains infinitely many such functions.

Now, by Kleene’s Recursion Theorem,⁵ there must exist some $e$ such that, for all $n$, $\varphi_e(n) = f(e, n)$, and so $\varphi_e \in \mathrm{SD}$. But then by the construction of $f$, $M$ changes its hypothesis infinitely often on the canonical text for $\varphi_e$ and so cannot converge on it. Hence $M$ cannot identify $\mathrm{SD} \cup \mathrm{AEZ}$.

2.80 Corollary. $\mathcal{R} \notin [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$

2.81 Corollary. $[\mathbf{S}^{\mathrm{CP}}_f, \text{text}] \subset [\mathbf{S}_f, \text{text}]$

This result shows that enforcing a computability constraint on scientists for functions in text implies that a strictly smaller collection of classes is identifiable. Nonetheless, many complex classes of functions are still identifiable by a computable scientist in text. An example of such a class is PR, the class of all primitive recursive functions with domain and range in $\mathbb{N}$.

2.82 Proposition. $\mathrm{PR} \in [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$

Proof. Let $\phi_0, \phi_1, \ldots$ be a recursive indexing of PR (see Meyer and Ritchie [19]). Recall that a primitive recursive function is a total function that can be constructed through the composition of simple operations, as described in Definition 1.4. These operations are all computable, and we may therefore easily (and computably) find a recursive function $\varphi_i$ with index $i$ for each primitive recursive function $\phi_j$ such that $\varphi_i = \phi_j$. Then, let $M$ be the scientist defined in Algorithm 2.16.

Let $\sigma$ be a prefix of a text $T$ for some primitive recursive function $\phi$. Given that primitive recursive functions are total, it is clear from the specification of $M$ that $M(\sigma) = i$ if and only if $i$ is an index for the first primitive recursive function $\phi_k$ in the indexing of PR such that $\mathrm{content}(\sigma)$ is in the graph of $\phi_k$. Thus, if $M$ ever conjectures an index for $\phi$, it will converge to the index $i$ for $\phi$. On the other hand, it cannot converge to any index for some primitive recursive function $\phi_k \neq \phi$ since there is some prefix $\sigma \subset T$ and an integer $n < |\sigma|$ such that $\phi_k(\pi_1(\sigma_n)) \neq \pi_2(\sigma_n)$. Then $M$ must converge to an index for $\phi$.
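The strategy of Algorithm 2.16 is identification by enumeration, which can be sketched as below. As a simplifying assumption, a finite Python list of total functions stands in for the recursive indexing $\phi_0, \phi_1, \ldots$ of PR, the scientist returns a position in that list rather than an index $i$ with $\varphi_i = \phi_j$, and `None` is returned when the finite list is exhausted (which cannot happen with the full indexing). The names `enumeration_scientist` and `HYPOTHESES` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def enumeration_scientist(
    sigma: Sequence[Point], hypotheses: List[Callable[[int], int]]
) -> Optional[int]:
    """Return the position of the first hypothesis consistent with every point of sigma."""
    for j, h in enumerate(hypotheses):
        if all(h(x) == y for x, y in sigma):
            return j
    return None


# A toy stand-in for the indexing of PR.
HYPOTHESES: List[Callable[[int], int]] = [
    lambda x: 0,
    lambda x: x,
    lambda x: x * x,
    lambda x: x + 1,
]
```

On successive prefixes of a text for $x \mapsto x^2$, the conjecture moves from position 0 on `[(0, 0)]` to position 1 on `[(0, 0), (1, 1)]` and settles on position 2 once `(2, 4)` appears, mirroring the way $j$ only ever increases in Algorithm 2.16.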

One may ask whether a different learning environment such as fat text or informant enables computable scientists to identify a greater collection of classes and perhaps make up for their loss of identifying power. The following propositions show that this question has a negative answer.

⁵ Kleene’s Recursion Theorem [12, p. 21] states that, for each partial recursive function $\psi$, there is a partial recursive function $\varphi_e$ such that, for all $x$, $\varphi_e(x) = \psi(e, x)$, and that there is a uniform, effective procedure for finding $e$ (that is, there is a recursive function $r$ such that, for each $p$, $\varphi_{r(p)} = \lambda x \,.\, \varphi_p(r(p), x)$).


Algorithm 2.16: Scientist M identifies the class PR of all primitive recursive functions in text

    Scientist $M(\sigma \in \mathrm{SEG}) : \mathbb{N}$
        variable $j, n \in \mathbb{N}$;
        $j \leftarrow 0$;
        $n \leftarrow 0$;
        while $n < |\sigma|$ do
            if $\phi_j(\pi_1(\sigma_n)) = \pi_2(\sigma_n)$ then
                $n \leftarrow n + 1$;
            else
                $n \leftarrow 0$;
                $j \leftarrow j + 1$;
            end
        end
        return the same index $i$ for $\phi_j$;
    end

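2.83 Proposition. $[\mathbf{S}^{\mathrm{CP}}_f, \text{fat text}] = [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$

Proof. It is immediate that $[\mathbf{S}^{\mathrm{CP}}_f, \text{text}] \subseteq [\mathbf{S}^{\mathrm{CP}}_f, \text{fat text}]$. On the other hand, it is clear that $[\mathbf{S}^{\mathrm{CP}}_f, \text{fat text}] \subseteq [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$, since fat text provides no additional advantage over canonical text by Proposition 2.76.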

2.84 Proposition. $[\mathbf{S}^{\mathrm{CP}}_f, \text{informant}] = [\mathbf{S}^{\mathrm{CP}}_f, \text{text}]$

Proof. Let $\mathcal{S}$ be a class of functions and $M$ be a computable scientist that identifies $\mathcal{S}$ in canonical text. Consider the computable scientist $N$ defined in Algorithm 2.17.

Algorithm 2.17: Computable scientist N identifies the class of functions S in informant

    Scientist $N(\sigma \in \mathrm{ISEG}) : \mathbb{N}$
        variable $\tau \in \mathrm{SEG}$;
        $\tau \leftarrow$ the longest prefix such that for all $i < |\tau|$, $\tau_i = \langle i, j \rangle$ and $\langle \langle i, j \rangle, 1 \rangle \in \mathrm{content}(\sigma)$;
        return the index returned from simulating $M$ on prefix $\tau$;   // Computable scientist $M$ identifies $\mathcal{S}$ in canonical text
    end

Let $T$ be an informant for a function $f \in \mathcal{S}$. Since $f$ is a total function, it follows that for every $i \in \mathbb{N}$, there is always some prefix $\sigma \subset T$ such that $\langle \langle i, f(i) \rangle, 1 \rangle \in \mathrm{content}(\sigma)$. All the steps performed by $N$ are computable (including the simulation of $M$), and since $M$ converges to a correct index for every function in $\mathcal{S}$ in canonical text, $N$ also does so in informant. Since it is enough to only consider canonical text for identification in text, we have $[\mathbf{S}^{\mathrm{CP}}_f, \text{text}] \subseteq [\mathbf{S}^{\mathrm{CP}}_f, \text{informant}]$.

On the other hand, let S be a class of functions and N be a computable scientist that identifies Sin informant. For a fixed prefix σ of canonical text, let τ be a prefix of informant obtained from thecomputable function g defined in Algorithm 2.18.

Observe that for a prefix σ of canonical text of length n, gpσq is a prefix of informant containingn2 elements such that for all i ă |σ| and j ă |σ|, we have xxi, jy, 1y P content pgpσqq if and only ifxi, jy P content pσq, and xxi, jy, 0y P content pgpσqq if and only if xi, jy R content pσq.

We define the scientist M in Algorithm 2.19 that uses N to identify S in canonical text.Let σ be a prefix of a canonical text T for a function f P S. The scientist M computably generates a

prefix ρi P ISEG such that ρi “ gpσr0sq ˛ ¨ ¨ ¨ ˛ gpσrnsq, where n is the length of σ. This prefix is indeed a


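Algorithm 2.18: Function g constructs a prefix τ of informant from a prefix σ of canonical text
Data: prefix $\sigma$ of a canonical text
Result: prefix $\tau$ of an informant

Function $g(\sigma \in \mathrm{SEG}) : \mathrm{ISEG}$
    variable $\tau \in \mathrm{ISEG}$;
    variable $n, m \in \mathbb{N}$;
    $n \leftarrow 0$;
    while $n < |\sigma|$ do
        $m \leftarrow 0$;
        while $m < |\sigma|$ do
            if $m \neq \pi_2(\sigma_n)$ then
                $\tau \leftarrow \tau \diamond \langle\langle n, m\rangle, 0\rangle$;
            else
                $\tau \leftarrow \tau \diamond \langle\langle n, m\rangle, 1\rangle$;
            end
            $m \leftarrow m + 1$;
        end
        $n \leftarrow n + 1$;
    end
    return $\tau$;
end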
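As an illustration, the function $g$ of Algorithm 2.18 can be written in Python. This is a sketch under two assumptions: a canonical prefix is a list of pairs `(n, f(n))`, and $\tau$ starts out empty (the pseudocode leaves the initialisation implicit).

```python
def g(sigma):
    """Build an informant prefix with len(sigma)**2 entries from a canonical prefix."""
    tau = []                      # assumed initial value of tau: the empty prefix
    size = len(sigma)
    for n in range(size):
        _, value = sigma[n]       # sigma[n] == (n, f(n)) in canonical text
        for m in range(size):
            # label the pair (n, m) with 1 exactly when m == f(n)
            tau.append(((n, m), 1 if m == value else 0))
    return tau


print(g([(0, 1), (1, 0)]))
```

Values of $f$ that exceed the prefix length produce only negative entries at this stage; they are recorded positively once a longer prefix is processed.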

Algorithm 2.19: Computable scientist M identifies the class of functions S in canonical text

Scientist $M(\sigma \in \mathrm{SEG}) : \mathbb{N}$
    variable $\rho \in \mathrm{ISEG}$;
    variable $n \in \mathbb{N}$;
    $\rho \leftarrow \varepsilon$;
    $n \leftarrow 0$;
    while $n < |\sigma|$ do
        $\rho \leftarrow \rho \diamond g(\sigma[n])$; // Function g is defined in Algorithm 2.18
        $n \leftarrow n + 1$;
    end
    return the index returned from simulating $N$ on prefix $\rho$; // Computable scientist N identifies S in informant
end

prefix of an informant $U$ for $f$, since $\rho_i \subset \rho_{i+1} \subset U$ for all $i \in \mathbb{N}$,$^6$ and

$\langle n, m\rangle \in \Psi(f)$ if and only if there is some prefix $\rho$ such that $\langle\langle n, m\rangle, 1\rangle \in \mathrm{content}(\rho)$, and

$\langle n, m\rangle \notin \Psi(f)$ if and only if there is some prefix $\rho$ such that $\langle\langle n, m\rangle, 0\rangle \in \mathrm{content}(\rho)$.

Thus, $U = \bigcup_{i \in \mathbb{N}} \rho_i$ is an informant for $f$ that is computably generated by $M$, and since $M(\sigma) = N(\rho)$ and $N$ computably identifies $f$ in informant, $M$ computably identifies $f$ in canonical text. Thus $[\mathbb{S}^{\mathrm{CP}}_f, \text{informants}] \subseteq [\mathbb{S}^{\mathrm{CP}}_f, \text{texts}]$ and finally $[\mathbb{S}^{\mathrm{CP}}_f, \text{informants}] = [\mathbb{S}^{\mathrm{CP}}_f, \text{texts}]$.

As with general scientists, imperfect text also represents a loss of identifying power for computable scientists.

2.85 Proposition. $[\mathbb{S}^{\mathrm{CP}}_f, \text{imp. texts}] \subset [\mathbb{S}^{\mathrm{CP}}_f, \text{texts}]$

Proof. It is enough to consider the class AEZ of functions that are almost everywhere zero. By Proposition 2.78 we have $\mathrm{AEZ} \in [\mathbb{S}^{\mathrm{CP}}_f, \text{texts}]$. However, for any two functions $f$ and $g$ in AEZ, the text $T$ for the function

6 Note that it is not true that $g(\sigma[i]) \subset g(\sigma[i+1])$, and therefore each $g(\sigma[i])$ is a prefix of a different informant for $f$.


$\lambda x\,.\,0$ is an imperfect text for both $f$ and $g$, since they are all finite variants. Therefore, there can be no scientist which identifies AEZ in imperfect text.

The class POLY is an example of a class which is identifiable in imperfect text, as the reader can check by verifying that the scientist defined in Algorithm 2.11 is computable.

2.86 Proposition. $\mathrm{POLY} \in [\mathbb{S}^{\mathrm{CP}}_f, \text{imp. texts}]$

2.3.3 Comparison of collections of classes for computable scientists


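[Figure 2.7: Collections of classes of sets, categorized by computable scientists. Region labels: $[\mathbb{S}^{\mathrm{CP}}, \text{informants}]$; $[\mathbb{S}^{\mathrm{CP}}, \text{texts}] = [\mathbb{S}^{\mathrm{CP}}, \text{fat texts}]$; $[\mathbb{S}^{\mathrm{CP}}, \text{imp. texts}]$. Class labels: E, FIN ∪ {N}, FIN, N_i, PMULT.]

[Figure 2.8: Collections of classes of functions, categorized by computable scientists. Region labels: $[\mathbb{S}_f, \text{texts}] = 2^{R}$; $[\mathbb{S}^{\mathrm{CP}}_f, \text{texts}] = [\mathbb{S}^{\mathrm{CP}}_f, \text{fat texts}] = [\mathbb{S}^{\mathrm{CP}}_f, \text{informants}]$; $[\mathbb{S}^{\mathrm{CP}}_f, \text{imp. texts}]$. Class labels: R, SD ∪ AEZ, PR, SD, AEZ, POLY.]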

2.4 Identification by memory-limited scientist

It seems clear that a scientist, whether it be of human or of mechanical nature, must have a limited memory for the list of observations or experiments with which it is presented. As soon as each observation has finished being taken into consideration and possibly triggered a change in hypothesis, it is forgotten and erased from the scientist's memory. In this section, we consider the possible loss in identifying power imposed by such a limitation.

2.87 Definition (Memory-limited scientist, Wexler and Culicover [27]). A scientist $M$ is said to be memory-limited just in case for all nonempty prefixes $\sigma, \tau \in \mathrm{PREF}$, if $M(\sigma^{-}) = M(\tau^{-})$ and $\sigma_{\mathrm{last}} = \tau_{\mathrm{last}}$, then $M(\sigma) = M(\tau)$. We denote the school of memory-limited scientists for sets by $\mathbb{S}^{\mathrm{ML}}$ and the school of memory-limited scientists for functions by $\mathbb{S}^{\mathrm{ML}}_f$.

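[Figure 2.9: The memoryless property of memory-limited scientist M. The diagram shows prefixes $\sigma$ and $\tau$ with $M(\sigma) = M(\tau)$, both extended by $\rho$, so that $M(\sigma \diamond \rho) = M(\tau \diamond \rho)$.]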

The concept of a memory-limited scientist is akin to the one of memorylessness commonly used in statistics and probability theory. Intuitively, a memory-limited scientist is one whose future hypotheses


depend only on future observations and on the scientist's present state. Consequently, given prefixes $\sigma$, $\tau$, and $\rho$ for a set $S$ or function $f$, if a memory-limited scientist $M$ produces the same hypothesis for $\sigma$ and $\tau$, then $M$ will also produce the same hypothesis for $\sigma \diamond \rho$ and $\tau \diamond \rho$, as illustrated in Figure 2.9.

A useful way of specifying a scientist which guarantees that it is memory-limited is to use a Markov scientist.

2.88 Definition (Markov scientist and memory function). Let $\omega : \mathbb{N}^2 \to \mathbb{N}$ be a recursive, one-to-one function such that, for all $i, m \in \mathbb{N}$, $\omega(i, m)$ is an index for $W_i$ (or $\varphi_i$, depending on context).$^7$ Additionally, let $\omega_1 : \mathbb{N} \to \mathbb{N}$ and $\omega_2 : \mathbb{N} \to \mathbb{N}$ be the inverse projections of $\omega$, such that $\omega_1(k) = i$ and $\omega_2(k) = m$ if and only if $\omega(i, m) = k$.

We say that a scientist $M$ is a Markov scientist just in case there is a function $\mu : (\mathbb{N} \cup \{\#\}) \times \mathbb{N}^2 \to \mathbb{N}^2$ (called the memory function of $M$) such that, for all $\sigma \in \mathrm{PREF}$,

(a) $M(\sigma) = \omega(\mu_0)$, where $\mu_0 = (i_0, m_0)$ and $i_0, m_0 \in \mathbb{N}$, if $\sigma = \varepsilon$, and

(b) $M(\sigma) = \omega(\mu(\sigma_{\mathrm{last}}, i, m))$, where $i = \omega_1(M(\sigma^{-}))$ and $m = \omega_2(M(\sigma^{-}))$, if $\sigma \neq \varepsilon$.
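Definition 2.88 amounts to folding a memory function over the text. The sketch below makes this concrete in Python under a simplification: the state is kept as the pair `(i, m)` itself, so the packing function $\omega$ and its projections are omitted. The names `run_markov_scientist` and `largest_seen` are hypothetical.

```python
def run_markov_scientist(mu, mu0, text):
    """Return the pair (i, m) held after each prefix of text, empty prefix first."""
    state = mu0
    states = [state]
    for observation in text:
        i, m = state
        # the next state depends only on the newest observation and the old pair
        state = mu(observation, i, m)
        states.append(state)
    return states


def largest_seen(observation, i, m):
    """Toy memory function: the conjecture is the largest observation so far."""
    return max(i, observation), m


print(run_markov_scientist(largest_seen, (0, 0), [3, 1, 5, 5]))
```

Because `mu` never sees the earlier observations, any scientist written this way satisfies the condition of Definition 2.87 by construction.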

The memory function $\mu$ of a Markov scientist $M$ may be thought of as the process through which a memory-limited scientist obtains a new hypothesis. It takes three inputs, $\sigma_{\mathrm{last}}$, $i$, and $m$, where $i$ represents the conjecture for the set or function that $M$ gives on $\sigma^{-}$ and $m$ acts as the 'memory' used by $M$. Intuitively, a Markov scientist $M$ only converges if both its conjecture $i$ and its memory $m$ also converge to an integer. Thus, $M$ identifies a set $L$ or a function $f$ if and only if it converges to $\omega(i, m)$, where $W_i = L$ or $\varphi_i = f$, respectively.

A necessary (but not sufficient) condition for the convergence of a Markov scientist is the existence of a 'fixed point' of its memory function.

2.89 Definition (Fixed point). Let $\mu$ be the memory function of a Markov scientist. We say that $(i, m)$ is a fixed point of $\mu$ just in case there is some $n \in \mathbb{N}$ such that $\mu(n, i, m) = (i, m)$.

2.90 Lemma. If a Markov scientist $M$ identifies some set or function, then its memory function $\mu$ has at least one fixed point.

Proof. Suppose, by contradiction, that a Markov scientist has a memory function $\mu$ with no fixed points. Then for all $n \in \mathbb{N}$, we have $\mu(n, i, m) \neq (i, m)$. Then the hypothesis $\omega(i, m)$ returned by $M$ is always different and so it cannot converge.

The following theorem shows that the definition of a Markov scientist is equivalent to the one of a memory-limited scientist.

2.91 Theorem. Let $\mathcal{C}$ be a class of sets or functions. Then there is a memory-limited scientist $M$ that identifies $\mathcal{C}$ in text if and only if there is a Markov scientist $K$ that identifies $\mathcal{C}$ in text.

Proof. The right-to-left implication is easy to verify, as any Markov scientist $K$ is also memory-limited. To check this, let $\sigma$ and $\tau$ be any two nonempty prefixes such that $K(\sigma^{-}) = K(\tau^{-})$ and $\sigma_{\mathrm{last}} = \tau_{\mathrm{last}}$. Thus, we have

$$\begin{aligned}
K(\sigma) &= \omega(\mu(\sigma_{\mathrm{last}}, \omega_1(K(\sigma^{-})), \omega_2(K(\sigma^{-})))) \\
&= \omega(\mu(\tau_{\mathrm{last}}, \omega_1(K(\tau^{-})), \omega_2(K(\tau^{-})))) \\
&= K(\tau),
\end{aligned}$$

and therefore $K$ is a memory-limited scientist.

7Such an ω must exist by the s-m-n Theorem.


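Now, suppose a class $\mathcal{C}$ is identifiable in text by some memory-limited scientist $M$. Without loss of generality, let $M$ be a total scientist. Let $g : \mathrm{PREF} \to \mathbb{N}$ be a recursive bijection that encodes each prefix $\sigma \in \mathrm{PREF}$ into a unique natural number.$^8$ We define the memory function $\mu$ of the Markov scientist $K$ such that $\mu_0 = (M(\varepsilon), g(\varepsilon))$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (i, m), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) = i, \text{ where } \tau = g^{-1}(m) \\ (j, g(\tau \diamond \sigma_{\mathrm{last}})), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) = j \neq i, \text{ where } \tau = g^{-1}(m). \end{cases} \tag{2.5}$$

Let us now consider some set or function in $\mathcal{C}$ and let $T$ be a text for it. We must show that $K$ identifies $T$.

Let $\sigma = T[n]$ be a prefix of $T$. We will prove by induction on the length $n$ of $\sigma$ that there exists some $m \in \mathbb{N}$ such that $K(\sigma) = \omega(M(\sigma), m)$ and $M(\sigma) = M(g^{-1}(m))$.

For $n = 0$, we have $\sigma = \varepsilon$. Then, taking $m = g(\varepsilon)$ we get $K(\varepsilon) = \omega(\mu_0) = \omega(M(\varepsilon), g(\varepsilon)) = \omega(M(\varepsilon), m)$ and $M(\varepsilon) = M(g^{-1}(g(\varepsilon))) = M(g^{-1}(m))$. Thus, by the definition of $\omega$, $K(\varepsilon)$ and $M(\varepsilon)$ are indexes for the same set or function.

Now, suppose that for some $\sigma$, there is some $m \in \mathbb{N}$ such that $K(\sigma^{-}) = \omega(M(\sigma^{-}), m)$ and $M(\sigma^{-}) = M(\tau)$, where $\tau = g^{-1}(m)$. Then, we have

$$\begin{aligned}
K(\sigma) &= \omega(\mu(\sigma_{\mathrm{last}}, \omega_1(K(\sigma^{-})), \omega_2(K(\sigma^{-})))) \\
&= \omega(\mu(\sigma_{\mathrm{last}}, M(\sigma^{-}), m)) && \text{(by the induction hypothesis)} \\
&= \omega(\mu(\sigma_{\mathrm{last}}, M(\tau), m)) && \text{(by the induction hypothesis)} \\
&= \begin{cases} \omega(M(\tau), m), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) = M(\tau) \\ \omega(M(\tau \diamond \sigma_{\mathrm{last}}), g(\tau \diamond \sigma_{\mathrm{last}})), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) \neq M(\tau) \end{cases} && \text{(by Equation 2.5)}
\end{aligned} \tag{2.6}$$

Suppose $M(\tau \diamond \sigma_{\mathrm{last}}) = M(\tau)$. By the induction hypothesis, $M(\tau \diamond \sigma_{\mathrm{last}}) = M(\tau) = M(\sigma^{-})$. Since $M$ is memory-limited, it follows that $M(\tau \diamond \sigma_{\mathrm{last}}) = M(\sigma^{-} \diamond \sigma_{\mathrm{last}}) = M(\sigma)$, and consequently $M(\tau) = M(\sigma)$. Thus, if $M(\tau \diamond \sigma_{\mathrm{last}}) = M(\tau)$, then $K(\sigma) = \omega(M(\sigma), m)$.

On the other hand, we have $M(\tau) = M(\sigma^{-})$ by the induction hypothesis, and since $M$ is memory-limited, it follows that $M(\tau \diamond \sigma_{\mathrm{last}}) = M(\sigma^{-} \diamond \sigma_{\mathrm{last}}) = M(\sigma)$. Thus, $\omega(M(\tau \diamond \sigma_{\mathrm{last}}), g(\tau \diamond \sigma_{\mathrm{last}})) = \omega(M(\sigma), g(\tau \diamond \sigma_{\mathrm{last}}))$, and the new memory $m' = g(\tau \diamond \sigma_{\mathrm{last}})$ satisfies $M(\sigma) = M(g^{-1}(m'))$.

Substituting in Equation 2.6, we obtain

$$K(\sigma) = \begin{cases} \omega(M(\sigma), m), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) = M(\tau) \\ \omega(M(\sigma), g(\tau \diamond \sigma_{\mathrm{last}})), & \text{if } M(\tau \diamond \sigma_{\mathrm{last}}) \neq M(\tau). \end{cases}$$

This proves that for every prefix $\sigma$ of $T$, both $K(\sigma)$ and $M(\sigma)$ produce an index for the same set or function. Finally, we must verify that $K$ converges to a single index and does not continually return different hypotheses for the same set or function. Indeed, it is clear from Equation 2.5 that if $M$ identifies the text $T$ for a set or function in $\mathcal{C}$, then $M$ converges to some index $i$, and therefore $K$ converges to the index $\omega(i, m)$ for the same set or function.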

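8 For instance, let $g$ be such that $g(\varepsilon) = 0$, and for all nonempty prefixes $\sigma \in \mathrm{PREF}$ of length $n + 1$,

$$g(\sigma) = 2^{\sigma_0 + 1} \times 3^{\sigma_1 + 1} \times 5^{\sigma_2 + 1} \times \cdots \times p_n^{\sigma_n + 1} - 1,$$

where $p_n$ is the $n$-th prime.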
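The prime-power encoding of footnote 8 can be checked numerically. The Python sketch below assumes that each element $\sigma_k$ is already a natural number (for instance, a coded pair); the names `primes`, `encode_prefix` and `decode_prefix` are hypothetical, and decoding stands in for the $g^{-1}$ used in Equation 2.5.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for small examples)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def encode_prefix(sigma):
    """g(sigma): product of p_k ** (sigma[k] + 1), minus 1; the empty prefix maps to 0."""
    code = 1
    for prime, element in zip(primes(), sigma):
        code *= prime ** (element + 1)
    return code - 1


def decode_prefix(code):
    """Recover sigma from a code produced by encode_prefix."""
    value = code + 1
    sigma = []
    for prime in primes():
        if value == 1:
            return sigma
        exponent = 0
        while value % prime == 0:
            value //= prime
            exponent += 1
        if exponent == 0:
            raise ValueError("not the code of a prefix")
        sigma.append(exponent - 1)


print(encode_prefix([1, 0]), decode_prefix(11))
```

The `+ 1` in each exponent is what lets the length of the prefix be recovered: every position contributes at least one factor of its prime.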


2.4.1 In sets

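The restriction of memory limitation on scientists for sets in text provokes a real loss of identifying power when compared to general scientists for sets in text. The following proposition gives an example of a class of sets that is not identifiable by any memory-limited scientist.

2.92 Proposition. Let the set $L = \{\langle 0, x\rangle : x \in \mathbb{N}\}$ and, for each $n \in \mathbb{N}$, the sets $L_n = L \cup \{\langle 1, n\rangle\}$ and $L'_n = L_n - \{\langle 0, n\rangle\}$. Consider the class of sets $\mathrm{LL} = \{L\} \cup \{L_n : n \in \mathbb{N}\} \cup \{L'_n : n \in \mathbb{N}\}$. Then $\mathrm{LL} \in [\mathbb{S}, \text{texts}]$ but $\mathrm{LL} \notin [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$.

Proof. Let $M$ be the scientist for sets such that, given a prefix $\sigma$ of text, $M$ returns

$$M(\sigma) = \begin{cases} \text{the same index for } L, & \text{if } \mathrm{content}(\sigma) \subset L \\ \text{the same index for } L_n, & \text{if } n \text{ is the smallest integer such that } \langle 1, n\rangle \in \mathrm{content}(\sigma) \text{ and } \langle 0, n\rangle \in \mathrm{content}(\sigma) \\ \text{the same index for } L'_n, & \text{if } n \text{ is the smallest integer such that } \langle 1, n\rangle \in \mathrm{content}(\sigma) \text{ and } \langle 0, n\rangle \notin \mathrm{content}(\sigma). \end{cases}$$

Clearly, $M$ identifies LL.

On the other hand, suppose that $N$ is a memory-limited scientist that identifies the class LL in text. Then, let $\sigma$ be a locking sequence for $N$ on $L$. Take $n_0 \in \mathbb{N}$ such that $\langle 0, n_0\rangle \notin \mathrm{content}(\sigma)$.

Now, since $\langle 0, n_0\rangle \in L$, we have $N(\sigma) = N(\sigma \diamond \langle 0, n_0\rangle)$. Adding the element $\langle 1, n_0\rangle$ to both $\sigma$ and $\sigma \diamond \langle 0, n_0\rangle$, we have

$$N(\sigma \diamond \langle 1, n_0\rangle) = N(\sigma \diamond \langle 0, n_0\rangle \diamond \langle 1, n_0\rangle),$$

since $N$ is memory-limited.

Finally, let $U$ be obtained from a text for $L$ by removing every instance of the element $\langle 0, n_0\rangle$. Consider the following texts $T$ and $T'$ such that

$T = \sigma \diamond \langle 1, n_0\rangle \diamond U$, and

$T' = \sigma \diamond \langle 0, n_0\rangle \diamond \langle 1, n_0\rangle \diamond U$.

Now, $T$ is a text for $L'_{n_0}$ and $T'$ is a text for $L_{n_0}$. However, since $N(\sigma \diamond \langle 1, n_0\rangle) = N(\sigma \diamond \langle 0, n_0\rangle \diamond \langle 1, n_0\rangle)$ and $N$ is memory-limited, $N$ must converge to the same index for both $T$ and $T'$ and therefore cannot identify both. Hence, $N$ cannot identify LL.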
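The unrestricted scientist $M$ from the first half of the proof can be sketched in Python to show why it separates the two texts: it inspects the whole content of the prefix, which a memory-limited scientist cannot do. Elements are modelled as pairs `(tag, n)`, and symbolic labels stand in for indexes; both choices, and the name `ll_conjecture`, are assumptions made for illustration.

```python
def ll_conjecture(sigma):
    """Conjecture of the general scientist for LL on the prefix sigma."""
    content = set(sigma)
    flagged = [n for (tag, n) in content if tag == 1]
    if not flagged:
        return ("L",)                 # content(sigma) is a subset of L
    n = min(flagged)
    if (0, n) in content:
        return ("L_n", n)             # both <1, n> and <0, n> were observed
    return ("L'_n", n)                # <1, n> observed, <0, n> not (yet) observed


sigma = [(0, 0), (0, 1)]
t_prefix = sigma + [(1, 2), (0, 3)]                 # start of T, with n0 = 2
t_prime_prefix = sigma + [(0, 2), (1, 2), (0, 3)]   # start of T'
print(ll_conjecture(t_prefix), ll_conjecture(t_prime_prefix))
```

The two prefixes end with the same observations after the element $\langle 1, n_0\rangle$, yet the conjectures differ because the earlier element $\langle 0, n_0\rangle$ is still visible in the content.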

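2.93 Corollary. $[\mathbb{S}^{\mathrm{ML}}, \text{texts}] \subset [\mathbb{S}, \text{texts}]$

The previous proposition clearly shows that memory limitation reduces the identifying power of scientists for sets in text. However, the following proposition states that this loss can be entirely compensated by working instead in fat text.

2.94 Proposition. $[\mathbb{S}^{\mathrm{ML}}, \text{fat texts}] = [\mathbb{S}, \text{texts}]$

Proof. It is immediate that $[\mathbb{S}^{\mathrm{ML}}, \text{fat texts}] \subseteq [\mathbb{S}, \text{fat texts}]$. Since by Proposition 2.38 we have $[\mathbb{S}, \text{fat texts}] = [\mathbb{S}, \text{texts}]$, it follows that $[\mathbb{S}^{\mathrm{ML}}, \text{fat texts}] \subseteq [\mathbb{S}, \text{texts}]$.

Conversely, let us show that $[\mathbb{S}, \text{texts}] \subseteq [\mathbb{S}^{\mathrm{ML}}, \text{fat texts}]$. Consider any class of sets $\mathcal{C} \in [\mathbb{S}, \text{texts}]$. Let $g : \mathbb{N} \times \mathrm{FIN} \to \mathbb{N}$ be a recursive, one-to-one function that encodes each pair consisting of an integer $n$ and a finite set $D \subset \mathbb{N}$ into the integer $g(n, D)$.$^9$ For each set $L \in \mathcal{C}$, let $D_L$ be an Angluin set for $L$.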

9 For instance, take $g(n, D) = 2^{n} \times 3^{b_0} \times 5^{b_1} \times \cdots \times p_{k+2}^{b_k} - 1$, where $k = \max D$, $p_i$ is the $i$-th prime, and $b_i$ is 1 if $i \in D$ and 0 otherwise.


We define the Markov scientist $M$ with memory function $\mu$ such that $\mu_0 = (i_\emptyset, g(0, \emptyset))$, where $i_\emptyset$ is the least index for the set $\emptyset$. If $W_i = L \in \mathcal{C}$ and $D_L \subseteq D \cup \{\sigma_{\mathrm{last}}\} \subseteq L$ then

$$\mu(\sigma_{\mathrm{last}}, i, g(n, D)) = \begin{cases} (i, g(n, D)), & \text{if } n < \sigma_{\mathrm{last}} \\ (i, g(n, D \cup \{\sigma_{\mathrm{last}}\})), & \text{if } n \geq \sigma_{\mathrm{last}}, \end{cases} \tag{2.7}$$

and if $W_i \notin \mathcal{C}$ or $D \cup \{\sigma_{\mathrm{last}}\} \nsubseteq W_i$ then

$$\mu(\sigma_{\mathrm{last}}, i, g(n, D)) = \begin{cases} (j, g(n + 1, D \cup \{\sigma_{\mathrm{last}}\})), & \text{if there is } L' \in \mathcal{C} \text{ and } D_{L'} \subseteq D \cup \{\sigma_{\mathrm{last}}\} \subseteq L' \\ (i_\emptyset, g(n + 1, D \cup \{\sigma_{\mathrm{last}}\})), & \text{otherwise,} \end{cases} \tag{2.8}$$

where $j$ is the least index for $L' \in \mathcal{C}$.

Since $M$ is a Markov scientist, it is memory-limited by Theorem 2.91. Let us verify that $M$ identifies $\mathcal{C}$ in fat text.

Let $T$ be a fat text for $L \in \mathcal{C}$. From inspection of Equations 2.7 and 2.8, it is clear that the memory function $\mu$ has a single fixed point. If $M$ converges on fat text, then for all $\sigma_{\mathrm{last}} \in L$, we have $\mu(\sigma_{\mathrm{last}}, i, g(n, D)) = (i, g(n, D))$, where $i$ is an index for some $W_i \in \mathcal{C}$, $D$ is such that $D_{W_i} \subseteq D \cup \{\sigma_{\mathrm{last}}\} \subseteq W_i$, and $D_{W_i}$ is an Angluin set for $W_i$. Since $T$ is a fat text for $L$, this implies that $D_{W_i} \subseteq L \subseteq W_i$. But by the Angluin condition, if $D_{W_i} \subseteq W_i$ is an Angluin set for $W_i$, then there cannot exist $L \in \mathcal{C}$ where $L \neq W_i$ such that $D_{W_i} \subseteq L \in \mathcal{C}$ and $L \subset W_i$. Thus, $W_i = L$.

Now, suppose $M$ does not converge on fat text $T$ for $L \in \mathcal{C}$. This implies that the conditions of Equation 2.8 will occur infinitely often and that $n$ will increase to be arbitrarily large. Then, let us consider some prefix $\tau$ of $T$ such that $\tau_{\mathrm{last}} = x \in L$ and $M(\tau^{-}) = \omega(i, g(n, D))$, where $n \geq x$. Such a prefix must exist, since $T$ is a fat text for $L$. Then, $\mu(\tau_{\mathrm{last}}, i, g(n, D)) = (i', g(n', D \cup \{x\}))$, for some $i'$ and $n'$. Thus, every element of $L$ is eventually added to the set $D$ in the memory function of $M$, and so there is some prefix $\rho$ such that the set $D$ contains an Angluin set $D_L$ for $L$.

Then, for any prefix that contains $\rho$, $M$ will always return some $\omega(j, g(n, D))$, where $j$ is an index for some set $L' \in \mathcal{C}$ and $j$ is less than the least index for $L$. But by supposition, $M$ cannot converge to any $L'$, and so $M$ will eventually conjecture an index for $L$. Thus, $M$ will converge on an index for $L$.

2.95 Proposition. The scientist defined in Algorithm 2.5 of the proof of Angluin's Theorem is not memory-limited.

Proof. Consider $\mathcal{C} = \{A, B, C\}$ where $A = \{1, 2\}$, $B = \{1, 3\}$, and $C = \{2, 3\}$. Assume that the smallest index for $A$ is smaller than the smallest index for $B$. Then for $\sigma = 1$ and $\tau = 2$ we have $M(\sigma) = A$ and $M(\tau) = A$ but $M(\sigma \diamond 3) = B$ and $M(\tau \diamond 3) = C$. Thus, $M$ is not memory-limited.

An example of a class which is identifiable by a memory-limited scientist in text is the class of finite sets.

2.96 Proposition. $\mathrm{FIN} \in [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$

Proof. For each finite set $D$, let $i_D$ be an index for $D$ and let $h : \mathrm{FIN} \to \mathbb{N}$ be an encoding of the finite sets. Consider the Markov scientist $M$ with memory function $\mu$ such that $\mu_0 = (i_\emptyset, h(\emptyset))$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (i_D, h(D)), & \text{if } \sigma_{\mathrm{last}} \notin h^{-1}(m), \text{ where } D = h^{-1}(m) \cup \{\sigma_{\mathrm{last}}\} \\ (i, m), & \text{if } \sigma_{\mathrm{last}} \in h^{-1}(m). \end{cases}$$

Clearly, given any finite set $D \in \mathrm{FIN}$, $M$ converges to an index for $D$ as soon as all of its elements have been observed.
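A minimal Python sketch of this memory function is given below. It makes three assumptions that are not in the text: the memory $m$ is held directly as a `frozenset` instead of its code $h(D)$, the index $i_D$ is represented by the sorted tuple of elements of $D$, and the pause symbol `"#"` leaves the state unchanged. The name `fin_memory_function` is hypothetical.

```python
def fin_memory_function(observation, conjecture, memory):
    """mu for FIN: grow the remembered finite set whenever a new element appears."""
    if observation == "#" or observation in memory:
        return conjecture, memory              # nothing new: keep (i, m)
    grown = memory | {observation}             # D = h^{-1}(m) united with {sigma_last}
    return tuple(sorted(grown)), grown         # stand-ins for (i_D, h(D))


state = ((), frozenset())                      # mu_0: conjecture and memory for the empty set
for observation in [2, 5, 2, "#", 5, 7]:
    state = fin_memory_function(observation, *state)
print(state[0])
```

After the last new element has been seen, every further observation falls into the first branch, so the state is a fixed point in the sense of Definition 2.89.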


One might wonder what effect a composition of constraints will have on the identifying power of a scientist. For example, it may seem reasonable to assume that if a particular class is identifiable by both a computable scientist and by a memory-limited scientist, then there must also exist some computable, memory-limited scientist which identifies the same class. However, this is not the case, as is shown by the following proposition.

2.97 Proposition. Consider the class LA made up of the set $L = \{\langle 0, i\rangle : i \in A\}$, where $A$ is a nondecidable, recursively enumerable set, the sets $L_n = \{\langle 1, n\rangle\} \cup L$ and the sets $L'_n = \{\langle 1, n\rangle, \langle 0, n\rangle\} \cup L$. Then

(a) $\mathrm{LA} \in [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$

(b) $\mathrm{LA} \in [\mathbb{S}^{\mathrm{CP}}, \text{texts}]$

(c) $\mathrm{LA} \notin [\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}]$

Proof. Let $i_L$ be an index for $L$ and, for each $n \in \mathbb{N}$, let $i_{L_n}$ and $i_{L'_n}$ be indexes for $L_n$ and $L'_n$, respectively.

We first prove (a). Consider the Markov scientist with memory function $\mu$ such that $\mu_0 = (i_L, 0)$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (i_{L'_n}, m), & \text{if } \sigma_{\mathrm{last}} = \langle 0, n\rangle \text{ and } n \notin A \\ (i_{L_n}, m), & \text{if } \sigma_{\mathrm{last}} = \langle 1, n\rangle \text{ and } i \neq i_{L'_n} \\ (i, m), & \text{otherwise.} \end{cases}$$

The scientist will correctly converge to an index for $L$ in texts which contain neither an element $\langle 1, n\rangle$ nor an element $\langle 0, n\rangle$ with $n \notin A$. On the other hand, it will converge to an index for $L_n$ in texts which contain $\langle 1, n\rangle$ but not $\langle 0, n\rangle$ with $n \notin A$, and converge to an index for $L'_n$ in texts which contain both elements.

Observe that the memory function does not modify the 'memory component' $m$ of $\mu$: every new conjecture for a set depends only on the previous conjecture $i$ and the newest observation $\sigma_{\mathrm{last}}$.

To prove (b), we construct a computable scientist $M$. Let $T$ be a Turing machine that computes the partial characteristic function of $A$.

Given a prefix $\sigma$, let $M$ return $i_L$ if and only if $\langle 1, n\rangle \notin \mathrm{content}(\sigma)$ for all $n \in \mathbb{N}$. In the case that $\langle 1, n\rangle \in \mathrm{content}(\sigma)$ but $\langle 0, n\rangle \notin \mathrm{content}(\sigma)$, let $M$ return $i_{L_n}$. Finally, if $\{\langle 1, n\rangle, \langle 0, n\rangle\} \subseteq \mathrm{content}(\sigma)$, $M$ returns $i_{L_n}$ if $T$ halts for $n$ in at most $|\sigma|$ steps, and returns $i_{L'_n}$ otherwise. Since $A$ is recursively enumerable, $T$ will eventually halt for $n$ if and only if $n \in A$, in which case $M$ correctly identifies the correct set.

Finally, we prove (c). Suppose there exists a computable memory-limited scientist $N$ that identifies LA. Let $\sigma$ be a locking sequence for $N$ on $L$. Then, for every $n \in A$, $N(\sigma \diamond \langle 0, n\rangle) = N(\sigma)$. Therefore, there must be some $m \notin A$ such that $N(\sigma \diamond \langle 0, m\rangle) = N(\sigma)$, or otherwise the complement of $A$ would be recursively enumerable, and $A$ would be decidable.

Let us fix such an $m$ and let $U$ be a text for $L$. We define texts $T$ and $T'$ such that

$T = \sigma \diamond \langle 1, m\rangle \diamond U$, and

$T' = \sigma \diamond \langle 0, m\rangle \diamond \langle 1, m\rangle \diamond U$.

Now $T$ is a text for $L_m$ and $T'$ is a text for $L'_m$, where $L_m \neq L'_m$. But by the choice of $m$ and memory-limitation of $N$, we have $N(\sigma \diamond \langle 1, m\rangle) = N(\sigma \diamond \langle 0, m\rangle \diamond \langle 1, m\rangle)$, and so $N$ will converge to the same index for both $T$ and $T'$. Thus, $N$ cannot identify both $T$ and $T'$.


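2.98 Corollary. $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}] \subset [\mathbb{S}^{\mathrm{CP}}, \text{texts}] \cap [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$

Proof. It is immediate that $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}] \subseteq [\mathbb{S}^{\mathrm{CP}}, \text{texts}]$ and $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}] \subseteq [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$, and therefore $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}] \subseteq [\mathbb{S}^{\mathrm{CP}}, \text{texts}] \cap [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$. Proposition 2.97 provides an example of a class of sets that belongs to $[\mathbb{S}^{\mathrm{CP}}, \text{texts}] \cap [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$ but not to $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}]$.

In some cases, even the constraint of memory limitation still allows the identification of classes of sets which are not computably identifiable in text.

2.99 Proposition. $K^{*} \in [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$

Proof. Let $k$ be an index for the set $K$ and, for each $K \cup \{x\} \neq K$, let $k_x$ be an index for $K \cup \{x\}$. The class of sets $K^{*} = \{K \cup \{x\} : x \in \mathbb{N}\}$ is trivially identified by the Markov scientist $M$ with memory function $\mu$ such that $\mu_0 = (k, 0)$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} k, & \text{if } \sigma_{\mathrm{last}} \in K \\ k_x, & \text{otherwise, where } x = \sigma_{\mathrm{last}}. \end{cases}$$

Since there is at most one $x \in \mathrm{content}(T) - K$, the Markov scientist $M$ correctly identifies $K^{*}$.

2.100 Corollary. The collections $[\mathbb{S}^{\mathrm{ML}}, \text{texts}]$ and $[\mathbb{S}^{\mathrm{CP}}, \text{texts}]$ are not comparable.

Proof. Consider the class $\mathrm{LL} = \{L\} \cup \{L_n : n \in \mathbb{N}\} \cup \{L'_n : n \in \mathbb{N}\}$ defined in Proposition 2.92. It was shown that $\mathrm{LL} \notin [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$. Additionally, the reader can verify that the scientist which identifies LL in text is computable, and so LL is identifiable by a computable scientist in text. Hence, $\mathrm{LL} \in [\mathbb{S}^{\mathrm{CP}}, \text{texts}] - [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$.

On the other hand, it follows by Propositions 2.63 and 2.99 that $K^{*} \in [\mathbb{S}^{\mathrm{ML}}, \text{texts}] - [\mathbb{S}^{\mathrm{CP}}, \text{texts}]$.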

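2.101 Proposition. Let $\mathrm{LL} = \{L, L_n, L'_n : n \in \mathbb{N}\}$ be the class of sets such that $L = \{\langle 0, x\rangle : x \in \mathbb{N}\}$, $L_n = L \cup \{\langle 1, n\rangle\}$ and $L'_n = L_n - \{\langle 0, n\rangle\}$. Then LL is identifiable by a memory-limited scientist in informant but is not identifiable by a memory-limited scientist in text.

Proof. It was shown in Proposition 2.92 that LL is not identifiable by a memory-limited scientist in text. However, LL is easily identifiable by a memory-limited scientist in informant.

Let $i_L$ be an index for $L$ and, for each $n \in \mathbb{N}$, let $i_{L_n}$ and $i_{L'_n}$ be indexes for $L_n$ and $L'_n$, respectively. Consider the Markov scientist $M$ with memory function $\mu$ such that $\mu_0 = (i_L, 0)$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (i_{L'_n}, m), & \text{if } \sigma_{\mathrm{last}} = \langle\langle 0, n\rangle, 0\rangle \\ (i_{L_n}, m), & \text{if } \sigma_{\mathrm{last}} = \langle\langle 1, n\rangle, 1\rangle \text{ and } i \neq i_{L'_n} \\ (i, m), & \text{otherwise.} \end{cases}$$

Clearly, $M$ always returns an index for $L$ until either $\langle\langle 0, n\rangle, 0\rangle$ or $\langle\langle 1, n\rangle, 1\rangle$ is found in the prefix. In that case, the scientist will return $i_{L'_n}$ or $i_{L_n}$, respectively. Then, $M$ will converge to $i_{L_n}$ if the element $\langle\langle 0, n\rangle, 0\rangle$ is never found in the prefix and $M$ will converge to $i_{L'_n}$ if it is. Thus, $M$ identifies LL.

2.102 Corollary. $[\mathbb{S}^{\mathrm{ML}}, \text{texts}] \subset [\mathbb{S}^{\mathrm{ML}}, \text{informants}]$

2.103 Corollary. $[\mathbb{S}^{\mathrm{ML}} \cap \mathbb{S}^{\mathrm{CP}}, \text{texts}] \subset [\mathbb{S}^{\mathrm{ML}} \cap \mathbb{S}^{\mathrm{CP}}, \text{informants}]$

2.104 Corollary. $[\mathbb{S}^{\mathrm{ML}} \cap \mathbb{S}^{\mathrm{CP}}, \text{texts}] \subset [\mathbb{S}^{\mathrm{CP}}, \text{informants}]$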


2.105 Proposition (Osherson et al. [21, p. 117]). Let $\mathcal{C} = \mathrm{FIN} \cup \{\mathbb{N}\}$. Then $\mathcal{C}$ is identifiable by a computable scientist in informant but is not identifiable by a memory-limited scientist in informant.

Proof. By Proposition 2.68 we have $\mathrm{FIN} \cup \{\mathbb{N}\} \in [\mathbb{S}^{\mathrm{CP}}, \text{informants}]$. Suppose there exists a memory-limited scientist $M$ that identifies $\mathcal{C}$ in informant. Then by the locking sequence theorem for informant in Theorem 2.12, for all $S \in \mathcal{C}$, there is a locking sequence for $M$ on $S$ in informant.

Let $\sigma$ be a locking sequence for $\mathbb{N}$, $D = \{x : \langle x, 1\rangle \in \mathrm{content}(\sigma)\}$ be a finite set, and $\tau = \sigma \diamond \sigma'$ be a locking sequence for $D$.$^{10}$ Now, let $n$ be the smallest integer such that $n \notin \{x : \langle x, i\rangle \in \mathrm{content}(\tau) \text{ for some } i\}$. Thus, we have $M(\sigma) = M(\sigma \diamond \langle n, 1\rangle)$ because $\sigma$ is a locking sequence for $\mathbb{N}$, and $M(\sigma \diamond \sigma') = M(\sigma \diamond \langle n, 1\rangle \diamond \sigma')$ is an index for $D$ by memory-limitation of $M$. Now, if we extend this prefix by concatenating to it every element of the set $\{\langle x, 0\rangle : x \notin D \cup \{n\}\}$, then we obtain an informant

$$T = \sigma \diamond \langle n, 1\rangle \diamond \sigma' \diamond \langle x_0, 0\rangle \diamond \langle x_1, 0\rangle \diamond \cdots$$

for the finite set $D \cup \{n\}$, where each $\langle x_i, 0\rangle \in \{\langle x, 0\rangle : x \notin D \cup \{n\}\}$. However, $M$ converges to an index for $D$ on $T$ because of memory-limitation, and so $M$ cannot identify $D \cup \{n\}$.

2.106 Corollary. $[\mathbb{S}^{\mathrm{CP}}, \text{informants}] \nsubseteq [\mathbb{S}^{\mathrm{ML}}, \text{informants}]$

2.107 Corollary. $[\mathbb{S}^{\mathrm{ML}}, \text{informants}] \subset [\mathbb{S}, \text{informants}]$

As with the case for computable scientists, imperfect text implies a loss of identifying power for memory-limited scientists.

2.108 Proposition. $[\mathbb{S}^{\mathrm{ML}}, \text{imp. texts}] \subset [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$

Proof. An example of a class of sets that is identifiable by a memory-limited scientist in text but not in imperfect text is FIN. By Proposition 2.96, we have $\mathrm{FIN} \in [\mathbb{S}^{\mathrm{ML}}, \text{texts}]$. However, no memory-limited scientist identifies FIN in imperfect text, since an imperfect text for any finite set is also an imperfect text for every other finite set.

2.4.2 In functions

As with sets, memory-limitation implies a strict loss of identifying power in functions.

2.109 Proposition. $[\mathbb{S}^{\mathrm{ML}}_f, \text{texts}] \subset [\mathbb{S}_f, \text{texts}]$

Proof. This proof is a simple adaptation of Proposition 2.105 for memory-limited set identification in informant.

Let $\mathcal{C}$ be the class made up of all the characteristic functions of finite sets and the constant function $\chi_{\mathbb{N}} = \lambda n\,.\,1$. By Theorem 2.49, $\mathcal{C}$ is identifiable in text. Suppose there exists a memory-limited scientist $M$ that identifies $\mathcal{C}$ in informant. Then by the locking sequence theorem for informant in Theorem 2.13, for all $f \in \mathcal{C}$, there is a locking sequence for $M$ on $f$ in informant.

Let $\sigma$ be a locking sequence for $\chi_{\mathbb{N}}$, $D = \{x : \langle x, 1\rangle \in \mathrm{content}(\sigma)\}$ be a finite set, $\chi_D$ be the characteristic function of $D$, and $\tau = \sigma \diamond \sigma'$ be a locking sequence for $\chi_D$.$^{11}$ Now, let $n$ be the smallest integer such that $n \notin \{x : \langle x, i\rangle \in \mathrm{content}(\tau) \text{ for some } i\}$. Thus, we have $M(\sigma) = M(\sigma \diamond \langle n, 1\rangle)$ because $\sigma$ is a locking sequence for $\chi_{\mathbb{N}}$, and $M(\sigma \diamond \sigma') = M(\sigma \diamond \langle n, 1\rangle \diamond \sigma')$ is an index for $\chi_D$ by memory-limitation of $M$. Now, if we extend this prefix by concatenating to it every element of the set $\{\langle x, 0\rangle : x \notin D \cup \{n\}\}$, then we obtain an informant

$$T = \sigma \diamond \langle n, 1\rangle \diamond \sigma' \diamond \langle x_0, 0\rangle \diamond \langle x_1, 0\rangle \diamond \cdots$$

10 Such a prefix must exist by Theorem 2.12.

11 Such a prefix must exist by Theorem 2.13.


for the characteristic function of the finite set $D \cup \{n\}$, where each $\langle x_i, 0\rangle \in \{\langle x, 0\rangle : x \notin D \cup \{n\}\}$. However, $M$ converges to an index for $\chi_D$ on $T$ because of memory-limitation, and so $M$ cannot identify the characteristic function of $D \cup \{n\}$.

The classes AEZ and SD are examples of classes of functions that are identifiable by a memory-limited scientist in text.

2.110 Proposition. $\mathrm{AEZ} \in [\mathbb{S}^{\mathrm{ML}}_f, \text{texts}]$

Proof. Let $h : \mathrm{FIN} \to \mathbb{N}$ be an encoding of the finite sets and for each finite set $D$ of points of a function, let $i_D$ be an index for the function $f$ such that $f(x) = y$ if $\langle x, y\rangle \in D$ and $f(x) = 0$ otherwise. We build a Markov scientist $M$ whose memory function receives the current hypothesis $i$ and the set of points previously observed to be different from zero. Let the memory function of $M$ be such that $\mu_0 = (i_\emptyset, h(\emptyset))$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (i_D, h(D)), & \text{if } \pi_2(\sigma_{\mathrm{last}}) \neq 0 \text{ and } \sigma_{\mathrm{last}} \notin h^{-1}(m), \text{ where } D = h^{-1}(m) \cup \{\sigma_{\mathrm{last}}\} \\ (i, m), & \text{otherwise.} \end{cases}$$

Clearly, given any function $f \in \mathrm{AEZ}$, $M$ converges to an index for $f$ as soon as all of the points of its graph which are different from zero have been observed.
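The same construction can be sketched in Python. As assumptions for illustration only, the memory is held as a `frozenset` of graph points rather than its code $h(D)$, the index $i_D$ is represented by the sorted tuple of those points, and the helper `function_from_conjecture` shows which function such a conjecture denotes. All names are hypothetical.

```python
def aez_memory_function(observation, conjecture, memory):
    """mu for AEZ: remember only the observed points whose value is nonzero."""
    _, y = observation
    if y != 0 and observation not in memory:
        grown = memory | {observation}
        return tuple(sorted(grown)), grown     # stand-ins for (i_D, h(D))
    return conjecture, memory                  # zero-valued or repeated point


def function_from_conjecture(conjecture):
    """The almost-everywhere-zero function described by a conjecture."""
    table = dict(conjecture)
    return lambda x: table.get(x, 0)


state = ((), frozenset())
for point in [(0, 0), (3, 4), (1, 0), (7, 2), (3, 4)]:
    state = aez_memory_function(point, *state)
f = function_from_conjecture(state[0])
print(state[0], f(3), f(5))
```

Points with value zero never change the state, which is why the scientist needs no record of how much of the domain it has already seen.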

2.111 Proposition. $\mathrm{SD} \in [\mathbb{S}^{\mathrm{ML}}_f, \text{texts}]$

Proof. Consider the Markov scientist $M$ with memory function $\mu$ such that $\mu_0 = (0, 0)$ and

$$\mu(\sigma_{\mathrm{last}}, i, m) = \begin{cases} (\pi_2(\sigma_{\mathrm{last}}), m), & \text{if } \sigma_{\mathrm{last}} = \langle 0, k\rangle \\ (i, m), & \text{otherwise.} \end{cases}$$

Clearly, $M$ identifies SD in text.

2.112 Proposition (Jain et al. [12]). The class of functions $\mathcal{C}_0$ containing the characteristic function of the set of natural numbers and the characteristic functions of all its finite subsets is not identifiable by any memory-limited scientist.

Proof. We show that a memory-limited scientist that identifies the characteristic function of $\mathbb{N}$ cannot identify the characteristic function of every finite set.

Let $M$ be a memory-limited scientist, $h$ be the characteristic function of $\mathbb{N}$ and suppose $M$ identifies $\mathcal{C}_0$. Then, there is a locking sequence $\sigma$ such that $\mathrm{content}(\sigma) \subset h$ and

$$\varphi_{M(\sigma)} = h.$$

Let $D = \{n : \langle n, 1\rangle \in \mathrm{content}(\sigma)\}$ and $g$ be the characteristic function of the set $D$. Since $D$ is a finite set, $M$ must identify $g$. Therefore, there is some $\tau$ such that $\sigma \diamond \tau$ is a locking sequence for $g$ and

$$\varphi_{M(\sigma \diamond \tau)} = g.$$

Let $D' = D \cup \{n\}$ such that neither $\langle n, 0\rangle$ nor $\langle n, 1\rangle$ is in $\mathrm{content}(\sigma) \cup \mathrm{content}(\tau)$. It is clear that $n$ must exist since both $\sigma$ and $\tau$ are finite. Let $g'$ be the characteristic function of $D'$. Since $D'$ is also a finite set, $M$ must also identify $g'$. Notice that $n$ is the only point where $g$ and $g'$ differ, such that $\langle n, 0\rangle \in g$ and $\langle n, 1\rangle \in g'$.

Let $T$ be a text for $g$ and let $T\!\upharpoonright_{\langle n, 0\rangle}$ be the text $T$ in which all occurrences of $\langle n, 0\rangle$ have been removed. Let $\rho$ be a prefix of $T\!\upharpoonright_{\langle n, 0\rangle}$.


Finally, we show that $M$ cannot identify $h$, $g$ and $g'$.

(a) $M(\sigma) = M(\sigma \diamond \langle n, 1\rangle)$ since $\sigma$ is a locking sequence for $h$ and $\langle n, 1\rangle \in h$.

(b) $M(\sigma \diamond \tau) = M(\sigma \diamond \langle n, 1\rangle \diamond \tau)$ because $M$ is memory-limited and by (a).

(c) $M(\sigma \diamond \tau) = M(\sigma \diamond \tau \diamond \langle n, 0\rangle)$ since $\sigma \diamond \tau$ is a locking sequence for $g$ and $\langle n, 0\rangle \in g$.

(d) $M(\sigma \diamond \langle n, 1\rangle \diamond \tau) = M(\sigma \diamond \tau \diamond \langle n, 0\rangle)$ by (b) and (c).

(e) $M(\sigma \diamond \langle n, 1\rangle \diamond \tau \diamond \rho) = M(\sigma \diamond \tau \diamond \langle n, 0\rangle \diamond \rho)$ because $M$ is memory-limited and by (d).

Now, $\sigma \diamond \tau$ is a locking sequence for $g$, so for every prefix $\rho$, $\varphi_{M(\sigma \diamond \tau \diamond \langle n, 0\rangle \diamond \rho)} = g$ because $\langle n, 1\rangle \notin \rho$ and $\mathrm{content}(\rho) \subset g$. For a sufficiently large prefix $\rho$, $\sigma \diamond \langle n, 1\rangle \diamond \tau \diamond \rho$ is also a locking sequence for $g'$, and so $\varphi_{M(\sigma \diamond \langle n, 1\rangle \diamond \tau \diamond \rho)} = g'$. Then it is possible to continue both prefixes in (d) with the same $\rho$, so that $\sigma \diamond \langle n, 1\rangle \diamond \tau \diamond \rho$ will always be a prefix of a text for $g'$ and $\sigma \diamond \tau \diamond \langle n, 0\rangle \diamond \rho$ will always be a prefix of a text for $g$. But by (e), the scientist must return the same conjecture in both cases, so that it cannot identify both texts. Therefore, there can be no memory-limited scientist which identifies $\mathcal{C}_0$.

2.113 Corollary. The collection $[\mathbb{S}^{\mathrm{ML}}_f, \text{texts}]$ of classes of functions identifiable by memory-limited scientists in text is not closed under unions.

Proof. The class consisting of the single function $\chi_{\mathbb{N}} = \lambda x\,.\,1$ is trivially identifiable by the constant scientist that always returns the same index for $\chi_{\mathbb{N}}$. On the other hand, the class consisting of the characteristic functions of FIN is also identifiable by a memory-limited scientist: since $\chi_{\mathrm{FIN}}$ is a subset of the class AEZ, the Markov scientist defined in Proposition 2.110 also identifies $\chi_{\mathrm{FIN}}$. However, by Proposition 2.112, $\mathcal{C}_0 = \chi_{\mathbb{N}} \cup \chi_{\mathrm{FIN}}$ is not identifiable by any memory-limited scientist.

As with every other previous case, imperfect text presents a strict loss of identifying power for memory-limited scientists.

2.114 Proposition. $[\mathbb{S}^{\mathrm{ML}}_f, \text{imp. texts}] \subset [\mathbb{S}^{\mathrm{ML}}_f, \text{texts}]$

Proof. It is enough to consider the class AEZ. By Proposition 2.110, AEZ is identifiable in text by a memory-limited scientist. However, for any two functions $f$ and $g$ in AEZ, the text $T$ for the function $\lambda x\,.\,0$ is an imperfect text for both $f$ and $g$, since they are all finite variants. Therefore, there can be no memory-limited scientist which identifies AEZ in imperfect text.

2.4.3 Comparison of collections of classes for memory-limited scientists


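[Figure 2.10: Collections of classes of sets, categorized by memory-limited scientists. Region labels: $[\mathbb{S}^{\mathrm{ML}}, \text{fat texts}]$; $[\mathbb{S}^{\mathrm{CP}}, \text{texts}]$; $[\mathbb{S}^{\mathrm{ML}}, \text{texts}]$; $[\mathbb{S}^{\mathrm{CP}} \cap \mathbb{S}^{\mathrm{ML}}, \text{texts}]$. Class labels: E, $K^{*}$, LL, LA, FIN.]

[Figure 2.11: Collections of classes of functions, categorized by memory-limited scientists. Region labels: $[\mathbb{S}_f, \text{texts}] = 2^{R}$; $[\mathbb{S}^{\mathrm{ML}}_f, \text{texts}]$; $[\mathbb{S}^{\mathrm{ML}}_f, \text{imp. texts}]$. Class labels: R, $\mathcal{C}_0$, AEZ.]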


Chapter 3

Identifying scientific laws

3.1 The nature of empirical laws

Until now, we have only considered function and set identification under the premise that everyelement of a text was exact and did not include any measurement errors. In reality, the nature of thephysical universe implies that experimental observations are not as well-defined as points on a functionor elements of a set. Purely mathematical objects such as the ones studied in Chapter 2 are in many waysdissimilar to the experimental data collected by real-world scientists, and so it is convenient to consideran identification paradigm which takes into account these kinds of errors.

In what follows, a scientific law is loosely defined as a mathematical function f that relates a depen-dent variable y to an independent variable x. Under the reasonable assumption that multiple indepen-dent variables can be isolated and treated separately, this notion is arguably an adequate description ofcountless laws of nature, such as Newton’s law of universal gravitation or Boyle’s ideal gas law. The jobof an empirical scientist is therefore to inductively infer scientific laws using a finite set of experimentswhich are inevitably prone to all sorts of random and systematic errors.

To an extent, the previously studied learning environment of imperfect text allows for some degreeof error consideration. However, an argument can be made that it does not truly reflect how errors occurin scientific discovery. On the one hand, imperfect text only contains a finite number of ‘errors’ fromwithin all possible instances in a text, whereas an empirical scientist observing instances of an empiricallaw will likely encounter an infinite number of errors. On the other hand, imperfect text may also be toopessimistic for modelling scientific discovery: errors tend to ‘cancel out’ in empirical observations, butin imperfect text, every error is repeated infinitely and is never ‘corrected.’

To account for an infinite number of errors, Jain et al. [12] suggest an alternative concept of approximate identification. Here, a scientist identifies some function $f$ approximately if it converges to an index of a function $g$ which differs from $f$ in an infinite number of points with a density greater than 0. This model does indeed allow us to consider infinitely more errors than with imperfect text, but it does not address the issue that text for $f$ must still be exact in the points where it is correct. In other words, while the identification of the function itself may be approximate, the text for the function is still required to be exactly correct in an infinite number of points. Other approaches such as the Probably Approximately Correct (PAC) learning model presented by Kononenko and Kukar [15] also face similar problems or are more suited for functions that have a discrete range, such as categorization functions.

On the other end of the spectrum, Langley et al. [16] use a more practical method of function identification with very positive results. Using a computer program aptly named BACON, they are able to identify several different scientific laws with nothing more than data on physical observations and experiments. Examples of scientific laws that were successfully identified by BACON include Kepler’s law, Ohm’s law, and even more complex properties such as Snell’s law of refraction or the law of conservation of momentum. While this paradigm has clearly proved to be successful, it is almost entirely based on the statistical properties of the program and is therefore somewhat unrelated to the concept of a learner we have presented thus far.

We propose a new kind of empirical text and empirical scientist, inspired by the aforementioned concepts, and closely related to the computable physical models presented in Szudzik [26].

The largest obstacle we face in specifying a suitable model of empirical identification is a means of encoding the real numbers into the natural numbers. Such an objective is evidently impossible, as there is an uncountable infinity of real numbers but only a countable infinity of natural numbers. A reasonable alternative would be to study functions whose domain and codomain lie in $\mathbb{Q}$, which is indeed a countable set. Unfortunately, such a model would be poorly suited for identifying even trivial laws, such as the function $\lambda r .\ 2\pi r$ for the perimeter of a circle of radius $r$.

A naive attempt to solve this issue is to consider a function to be acceptably identified if the ‘real’ function (e.g. $\lambda r .\ 2\pi r$) and the conjectured function (in this case, a linear function of type $\lambda r .\ ar + b$, with $a, b \in \mathbb{Q}$) always differ by an error less than some fixed absolute error. However, a brief consideration of this model makes it evident that the only linear functions which satisfy this condition are those where $a = 2\pi$, which no rational $a$ can satisfy. Similarly, if we were to use a relative error instead of an absolute one, a function such as $\lambda x .\ \pi x + \pi$ could not be suitably approximated with a bounded relative error by any linear function for which $b \ne \pi$.

A much richer set of numbers which is still countable is the set of computable numbers: the set of all real numbers which may be calculated to any desired precision by a Turing machine. Constructing a paradigm for empirical identification with the use of computable numbers appears to solve the issues which one faces with the rational numbers: indeed, in addition to all the numbers in $\mathbb{Q}$, numbers such as $\pi$, $e$, $\sqrt{5}$, and $\log 2$ are all computable. Unfortunately, despite being a countable set, the computable numbers are not computably enumerable. This poses several difficulties, namely that it implies that no surjective computable function from the natural numbers to the computable real numbers exists. As such, no computable decoding of the computable numbers exists either, which would force us to consider noncomputable empirical scientists.

We therefore choose as our basis a subset of the computable numbers, the primitive recursive numbers, which are both countable and computably enumerable, and use them as the foundation of our model of empirical identification.

3.1 Definition (Primitive recursive number and primitive recursive defining function). Let $x \in \mathbb{R}$. We say that $x$ is a primitive recursive number if there exists a primitive recursive function $\gamma : \mathbb{N} \to \mathbb{N}$ such that

(a) $\gamma(0) = \max\{0, \operatorname{sgn}(x)\} = \begin{cases} 0, & \text{if } x \le 0 \\ 1, & \text{if } x > 0, \end{cases}$

(b) $\gamma(n) = \lfloor 10^{n-1} \times |x| \rfloor$ for $n > 0$,

where $\operatorname{sgn}$ is the signum function. We say that $\gamma$ is the primitive recursive defining function for $x$, and in general we denote $\gamma$ by $d_x$. The set of all primitive recursive numbers is denoted by $\mathbb{P}$.

Thus, a real number $x$ is said to be primitive recursive if there is some primitive recursive function $\gamma$ such that $\gamma(0)$ encodes the sign of $x$, and, for all $n > 0$, $\gamma(n)$ encodes $x$ up to $n - 1$ decimal places.
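As a minimal sketch of Definition 3.1, the defining function of a rational number can be computed exactly with integer arithmetic. The helper name `defining_function` and the use of Python's `Fraction` type are illustrative assumptions and do not appear in the original text.

```python
from fractions import Fraction
from math import floor


def defining_function(x: Fraction):
    """Return d_x from Definition 3.1 for a rational x (illustrative helper)."""
    def d(n: int) -> int:
        if n == 0:
            return 1 if x > 0 else 0           # gamma(0) = max{0, sgn(x)}
        return floor(abs(x) * 10 ** (n - 1))   # gamma(n) = floor(10^(n-1) * |x|)
    return d


d_third = defining_function(Fraction(1, 3))
print([d_third(n) for n in range(5)])  # [1, 0, 3, 33, 333]
```

Note that the sign is stored only in the value at 0, while every later value is computed from the absolute value, so a number and its negative differ only at index 0.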


3.2 Example.

• The number $\frac{1}{3}$ is primitive recursive, since the function

\[ d_{\frac{1}{3}} = \lambda n .\ \begin{cases} 1, & \text{if } n = 0 \\ 0, & \text{if } n = 1 \\ \underbrace{33 \ldots 33}_{n-1 \text{ times}}, & \text{if } n \ge 2 \end{cases} \]

is clearly primitive recursive.

• The rational numbers are all primitive recursive, since the operations of multiplication and division are primitive recursive and are enough to generate the decimal expansion of any rational number using bounded loops.

• The number $\pi$ is primitive recursive, as we can use Machin’s formula to compute $\pi$ to an arbitrary decimal precision.¹ The first 5 terms of the primitive recursive defining function for $\pi$ are

\[ d_\pi(0) = 1, \quad d_\pi(1) = 3, \quad d_\pi(2) = 31, \quad d_\pi(3) = 314, \quad d_\pi(4) = 3141. \]

• The algebraic numbers are the numbers which are roots of some nonzero polynomial with integer coefficients. They are primitive recursive.²

• Given a rational number $x$, the image of $\pi x$ under the basic trigonometric functions (e.g. the numbers $\sin(\pi x)$, $\cos(\pi x)$, and, where defined, $\tan(\pi x)$) is algebraic, and therefore primitive recursive.
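The claim about $\pi$ in the example above can be illustrated with a short integer-only sketch of Machin's formula. The function names and the `guard` parameter (extra working digits that absorb truncation error) are assumptions of this sketch; it is not a proof of primitive recursiveness, and the fixed number of guard digits is only adequate for moderate values of $n$.

```python
def arctan_inverse(x: int, unity: int) -> int:
    """Approximate unity * arctan(1/x) with the alternating Taylor series, in integers."""
    total, power, k = 0, unity // x, 0
    while power:
        term = power // (2 * k + 1)
        total += term if k % 2 == 0 else -term
        power //= x * x
        k += 1
    return total


def d_pi(n: int, guard: int = 10) -> int:
    """d_pi(n) in the sense of Definition 3.1, computed with `guard` extra digits."""
    if n == 0:
        return 1  # pi > 0
    unity = 10 ** (n - 1 + guard)
    # Machin: pi = 16 * arctan(1/5) - 4 * arctan(1/239)
    pi_scaled = 16 * arctan_inverse(5, unity) - 4 * arctan_inverse(239, unity)
    return pi_scaled // 10 ** guard


print([d_pi(n) for n in range(5)])  # [1, 3, 31, 314, 3141]
```

Every loop here runs a number of times bounded in advance by the number of requested digits, which is the informal reason such a computation fits the bounded-loop style of primitive recursion.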

Clearly, since $\mathbb{P}$ is a countable set, it follows that almost all irrational numbers are not primitive recursive, as there are uncountably many of them. However, the primitive recursive numbers, while undeniably forming a proper subset of the computable numbers, nonetheless serve to construct a rich model of empirical identification.

We define an indexable set of primitive real recursive functions based on primitive recursive functions. These functions will form the basis of empirical identification.

3.3 Definition (Primitive real recursive function). Let $\gamma : \mathbb{N}^2 \to \mathbb{N}$ be a primitive recursive function. We say that $\gamma$ is a primitive real recursive function if, for each fixed $m \in \mathbb{N}$ and for all $n \in \mathbb{N}$, we have $\gamma(m,n) = d_y(n)$ for some primitive recursive number $y$. In other words, for each fixed $m \in \mathbb{N}$, $\lambda n .\ \gamma(m,n)$ is the primitive recursive defining function for some $y \in \mathbb{P}$.

Essentially, a primitive real recursive function will enable us to encode a range of primitive recursive defining functions into a single function of type $\mathbb{N}^2 \to \mathbb{N}$. Informally speaking, $m$ will represent the ‘independent variable’ of the phenomenon that the scientist will attempt to identify, and $\lambda n .\ \gamma(m,n)$ will represent the decimal expansion of its ‘dependent variable.’

¹ Machin’s formula is given by $\pi = 16 \arctan\left(\frac{1}{5}\right) - 4 \arctan\left(\frac{1}{239}\right)$. Using Taylor’s expansion for $\arctan$, we may compute $\pi$ to any arbitrary precision. Conveniently, Machin’s formula computes an additional decimal place for each additional term computed in the Taylor series.

² To show that an algebraic number $\alpha$ is primitive recursive, it is enough to show that there exists some $m \in \mathbb{N}$ and some primitive recursive function $f$ that computes $\lfloor 10^{n} \times |\alpha| \rfloor$ for all $n > m$.

Let $p$ be a nonzero polynomial of minimal degree with integer coefficients such that $p(\alpha) = 0$. Then $\alpha$ is a simple root of $p$, i.e. $p'(\alpha) \ne 0$, where $p'$ is the derivative of $p$. Without loss of generality, assume that $\alpha > 0$, $p'(\alpha) > 0$, and $\alpha$ is irrational. Let $a$ and $b$ be rational numbers such that $0 < a < \alpha < b$ and $p(x)$ is negative for $a < x < \alpha$ and positive for $\alpha < x < b$. Let $m$ be such that, for all $n > m$, $f(n)$ is the least $k$ such that

\[ a < \frac{k}{10^{n}} < b \quad \text{and} \quad p\left(\frac{k+1}{10^{n}}\right) > 0. \]

Since we are applying the bounded least number operator, $f$ is primitive recursive and is as intended.
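The bounded search described in this footnote can be sketched for one concrete case, $\alpha = \sqrt{2}$ with $p(x) = x^2 - 2$, $a = 1$ and $b = 2$. The function name `floor_scaled_root` and the restriction to integer bounds $a$ and $b$ are simplifying assumptions of this sketch (the footnote allows rational bounds).

```python
from fractions import Fraction


def floor_scaled_root(p, a: int, b: int, n: int) -> int:
    """Least k with a < k/10^n < b and p((k+1)/10^n) > 0, found by bounded search."""
    scale = 10 ** n
    for k in range(a * scale + 1, b * scale):   # enforces a < k/10^n < b
        if p(Fraction(k + 1, scale)) > 0:
            return k
    raise ValueError("no witness in the search window; n may be too small")


def p(x):
    return x * x - 2        # minimal polynomial of sqrt(2)


print(floor_scaled_root(p, 1, 2, 3))  # 1414
```

The search window is fixed before the loop starts, which mirrors the role of the bounded least number operator in the argument.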


3.4 Example. Table 3.1 shows values of the primitive real recursive function $\gamma$, where each line with a fixed $m$ contains successive values of the primitive recursive defining function $d_y = \lambda n .\ \gamma(m,n)$ for $y = m\pi$.

  γ(m,n)    n = 0    n = 1    n = 2    n = 3    n = 4
  m = 0         0        0        0        0        0
  m = 1         1        3       31      314     3141
  m = 2         1        6       62      628     6283
  m = 3         1        9       94      942     9424
  m = 4         1       12      125     1256    12566

Table 3.1: Values for a primitive real recursive function $\gamma(m,n)$, where for each fixed $m$, $d_y = \lambda n .\ \gamma(m,n)$ is the primitive recursive defining function of $y = m\pi$
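The entries of Table 3.1 can be reproduced with a few lines of exact rational arithmetic. This is only a sketch: `PI_APPROX` is a rational stand-in for $\pi$ (correct to 20 decimal places), which is an assumption that suffices for the small values of $m$ and $n$ tabulated here but does not define $\gamma$ in general.

```python
from fractions import Fraction
from math import floor

# Rational stand-in for pi, correct to 20 decimal places (an assumption of this sketch).
PI_APPROX = Fraction("3.14159265358979323846")


def gamma(m: int, n: int) -> int:
    """gamma(m, n) = d_y(n) with y = m * pi, valid for the small m, n tabulated here."""
    y = m * PI_APPROX
    if n == 0:
        return 1 if y > 0 else 0
    return floor(abs(y) * 10 ** (n - 1))


table = [[gamma(m, n) for n in range(5)] for m in range(5)]
for m, row in enumerate(table):
    print(m, row)
```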

We may use our indexation of the partial recursive functions to obtain an index for each primitive real recursive function.

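3.5 Definition (Index for a primitive real recursive function). Let $\gamma : \mathbb{N}^2 \to \mathbb{N}$ be a primitive real recursive function and $\varphi : \mathbb{N} \to \mathbb{N}$ be the recursive function such that, for all $m, n \in \mathbb{N}$, $\gamma(m,n) = \varphi(\langle m,n \rangle)$. If $i$ is an index for $\varphi$, i.e. $\varphi = \varphi_i$, then we say that $i$ is also an index for $\gamma$.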

3.6 Corollary. There exists a recursive enumeration of the primitive real recursive functions.

Using primitive real recursive functions, we may study the identification of functions of primitive recursive numbers. We call these discoverable functions, in the sense that one may ‘discover’ each one by resorting to its index. Discoverable functions are based on a particular type of function which we call a distinguishable function.

3.7 Definition (Distinguishable function). Let $f : \mathbb{R} \to \mathbb{R}$ be a total real function. We say that $f$ is a distinguishable function just in case $f(\mathbb{Q}) \subseteq \mathbb{P}$, i.e. if, for all $x \in \mathbb{Q}$, $f(x)$ is a primitive recursive number.

3.8 Definition (Discoverable function). Let $F : \mathbb{R} \to \mathbb{R}$ be a distinguishable function and $f : \mathbb{Q} \to \mathbb{P}$ be the restriction of $F$ to $\mathbb{Q}$. We say that $f$ is a discoverable function if there exists a primitive real recursive function $\gamma : \mathbb{N}^2 \to \mathbb{N}$ such that, for all $m, n \in \mathbb{N}$, $\gamma(m,n) = d_y(n)$, where $y = f(x) \in \mathbb{P}$ and $x = q^{-1}(m) \in \mathbb{Q}$. In this case, we say that $\gamma$ is the primitive real recursive function of $f$. The class of all discoverable functions is denoted $\mathrm{DS}$.

In other words, a function $f : \mathbb{Q} \to \mathbb{P}$ is discoverable if there is a primitive real recursive function $\gamma$ such that $\lambda n .\ \gamma(m,n)$ is the primitive recursive defining function for each $f(x)$, with $x \in \mathbb{Q}$ and $q(x) = m$. Note that discoverable functions (which are of type $\mathbb{Q} \to \mathbb{P}$) are not a subset of distinguishable functions (which are of type $\mathbb{R} \to \mathbb{R}$), but rather that the latter is associated with the former on a many-to-one basis.

Note that not every distinguishable function has a discoverable function associated with it, as there are uncountably many functions of type $\mathbb{Q} \to \mathbb{P}$ and only countably many discoverable functions (since there are only countably many primitive recursive functions). However, this has the advantage of allowing the set of discoverable functions to be recursively enumerable: it is enough to see that there is a recursive enumeration of the primitive recursive functions of type $\mathbb{N}^2 \to \mathbb{N}$ and that we may recursively obtain the primitive real recursive function for each discoverable function. Figure 3.1 illustrates the relationship between primitive real recursive functions and discoverable functions.

As before, we may use the indexation of the partial recursive functions to obtain an index for each discoverable function.


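[Figure 3.1: diagram with nodes $x \in \mathbb{Q}$, $y \in \mathbb{P}$, $m \in \mathbb{N}$ and $d_y(n) \in \mathbb{N}$, and arrows labelled $f(x)$, $\gamma(m,n)$, $q(x)$ and $\lim_{n \to \infty} d'_y(n)$.]

Figure 3.1: A primitive real recursive function $\gamma$ provides a way of computing the image $y$ of a discoverable function $f$ to an arbitrary precision, where $d'_y(n) = (-1)^{(d_y(0)+1)} \times \frac{d_y(n)}{10^{n-1}}$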
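The decoding map $d'_y$ from the caption of Figure 3.1 can be sketched as follows. The names `decode` and `d_minus_third` are illustrative assumptions; the second is the defining function of $y = -\frac{1}{3}$ written out by hand, so that the sign handling is visible.

```python
from fractions import Fraction


def decode(d, n: int) -> Fraction:
    """d'_y(n) = (-1)^(d_y(0)+1) * d_y(n) / 10^(n-1), for n > 0."""
    sign = (-1) ** (d(0) + 1)
    return sign * Fraction(d(n), 10 ** (n - 1))


def d_minus_third(n: int) -> int:
    """Defining function of y = -1/3, written out by hand for illustration."""
    return 0 if n == 0 else (10 ** (n - 1)) // 3


print([decode(d_minus_third, n) for n in range(1, 5)])
```

The successive values are rational truncations of $y$, and their limit as $n$ grows is $y$ itself, which is the role of the arrow labelled $\lim_{n \to \infty} d'_y(n)$ in the figure.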

3.9 Definition (Index for a discoverable function). Let $f$ be a discoverable function and $\gamma$ be its primitive real recursive function. If $i$ is an index for $\gamma$ then $i$ is also an index for $f$.

3.10 Corollary. There exists a recursive enumeration of the discoverable functions.

Note that the partial recursive function $\varphi_i$ effectively represents the primitive real recursive function $\gamma$ of $f$ (and consequently, a primitive recursive function of type $\mathbb{N}^2 \to \mathbb{N}$). In this sense, we may on occasion abuse terminology and say that $\varphi_i$ is also the primitive real recursive function of $f$.

Discoverable functions (using their respective primitive real recursive functions) are what our concept of empirical scientist will attempt to identify. This obviously implies the underlying assumption that scientific laws are distinguishable functions, and that a discoverable function is enough to ‘mostly’³ model them. It logically follows that if these assumptions are not true, then scientific laws are not identifiable in the sense described here.

The nature of empirical measurements

We must now consider what we wish to mean by ‘empirical text.’ In real-world experiments, scientists do not make exact measurements: there always exists some error or uncertainty associated with each one. These errors may be classified under two separate categories. The first kind is the one due to the limitations of the measuring instruments: even in the absence of random noise or external factors, measurements have a finite precision (for example, in attempting to determine the length of an object, a scientist can be no more exact than the smallest division of their ruler permits). The second kind of error typically involves what is referred to as random and systematic errors. Examples of these would be a misalignment of the ruler with the edge of the object, or an expansion or contraction of the same object due to thermal variations in the environment.

The nature of these errors suggests that empirical measurements are not point values but rather intervals. Indeed, the use of error bars is a common way to graphically represent the amount of uncertainty in each measured value. In this way, we may think of each measurement as an interval bounded by two rational numbers.

On the other hand, we note that each measurement must correspond to some observation or experiment. These experiments may be discrete (e.g. measuring the lengths of four different metal bars) or continuous (e.g. measuring the length of a single metal bar as a function of temperature). In the former case, we do not generally ascribe each experiment with an error margin, since each case is clearly distinguishable from the other. However, in the latter case, there exists an error margin not only associated with the dependent variable (in this example, the length of the metal bar), but also with the independent variable (the temperature of the bar).

³ We use this term in a rough sense: the reader may take it to mean that the behavior of a scientific law for irrational values in the domain of a distinguishable function is either irrelevant in practice or interpolatable using rational measurements. In any case, it may be reasonably argued that all empirical measurements are necessarily of rational nature (since typically it is not possible to carry out a measurement with infinite precision), and that therefore it is only possible to observe the behavior of scientific laws in $\mathbb{Q}$.

It would seem that this kind of double error especially associated with ‘continuous’ experiments would require that each empirical measurement be a pair of rational intervals. This introduces a large amount of undesirable complexity that would later make identification much more difficult. To avoid this, we assume that each independent variable is always exact, and that the error in this assumption propagates into the error of the dependent variable. Note that since the error in any measurement is always finite, there always exists some sufficiently large error margin such that the ‘true’ value lies within the margin. Of course, a scientist must always take care in selecting a suitably large error interval which is guaranteed to contain the real value.⁴

One final assumption is that it is possible to make an arbitrarily precise measurement, and that given enough ‘time’ (where time can be taken to mean the number of experiments carried out by some scientist, technological advancements that enable the collection of more accurate data, or even literal time), such a measurement will eventually be made. Admittedly, this is a rather significant assumption, and it may be argued that no process exists for the collection of measurements with an unbounded precision; Heisenberg’s uncertainty principle suggests as much.⁵ On the other hand, one can also make the reasonable claim that if it is impossible to distinguish a theory from a ‘true’ scientific law by any amount of experimentation or observation, then the two are equivalent for all practical purposes.

Using this model, we may define an empirical text for a total function $f$ as a sequence of encoded triplets in $\mathbb{Q}^3$ that meet certain specific conditions.

3.11 Definition (Empirical text). Let $f : \mathbb{R} \to \mathbb{R}$ be a total function and $q$ be the encoding of $\mathbb{Q}$ defined in Section 5.1 of the Appendix. We say that a learning environment $T$ is an empirical text for $f$ if:

(a) for all elements $n \in T$, $n$ is of the form $\langle q(x), \langle q(y_1), q(y_2) \rangle \rangle$, where $x, y_1, y_2 \in \mathbb{Q}$ and $y_1 \le y_2$. We write $\langle x, y_1, y_2 \rangle$ as shorthand for $\langle q(x), \langle q(y_1), q(y_2) \rangle \rangle$.⁶

(b) if $\langle x, y_1, y_2 \rangle \in T$ and $\langle x, y'_1, y'_2 \rangle \in T$ then $[y_1, y_2] \cap [y'_1, y'_2] \ne \emptyset$.

(c) for all $x \in \mathbb{Q}$ and for all $\varepsilon > 0$, there is $\langle x, y_1, y_2 \rangle \in T$ such that $f(x) - \varepsilon < y_1 \le f(x) \le y_2 < f(x) + \varepsilon$.

We denote the set of all empirical texts by emp. text and the set of all prefixes for empirical text by ESEG.

Elements of empirical texts can be thought of as the error bars shown in Figure 3.2. Here, we think of the $x$-variable as being an exact, rational measurement, and the pair $(y_1, y_2)$ of elements from the measured variable $y$ as encompassing the entirety of inaccuracies in each experiment or observation. In reality, since the $x$ variable always has some level of error associated with it, this uncertainty is passed along to the measured variable with a suitable increase of each $\Delta y$. Put another way, an element $\langle x, y_1, y_2 \rangle$ in an empirical text for a function $f$ can simply be read as ‘it is absolutely certain that $f(x)$ is in the interval $[y_1, y_2]$.’⁷

Conditions (b) and (c) of Definition 3.11 require empirical texts to contain arbitrarily precise and non-conflicting measurements: given enough time and with the appropriate instruments, a real-world scientist should be able to make increasingly more accurate observations and, crucially, create more demanding trials against which to test his or her hypothesis.
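To make conditions (a) and (b) concrete, the following sketch checks them on a finite prefix of decoded measurements. The function name `is_consistent_prefix` is hypothetical, and the triples are taken as already-decoded rationals, so the encoding $q$ is left out. Condition (c) is a property of the whole infinite text and cannot be checked on any finite prefix, so it is deliberately not tested here.

```python
from fractions import Fraction
from itertools import combinations


def is_consistent_prefix(prefix) -> bool:
    """Check conditions (a) and (b) of Definition 3.11 on a finite prefix.

    `prefix` is a list of decoded triples (x, y1, y2) of Fractions.
    """
    if any(y1 > y2 for _, y1, y2 in prefix):                  # condition (a)
        return False
    for (xa, a1, a2), (xb, b1, b2) in combinations(prefix, 2):
        if xa == xb and max(a1, b1) > min(a2, b2):             # condition (b)
            return False
    return True


F = Fraction
print(is_consistent_prefix([(F(1), F(3), F(4)), (F(1), F(7, 2), F(5))]))  # True
print(is_consistent_prefix([(F(1), F(3), F(4)), (F(1), F(9, 2), F(5))]))  # False
```

In the second call the two intervals measured at $x = 1$ are disjoint, so no single value of $f(1)$ could lie in both.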

⁴ This holds true even for laws which are given by discontinuous functions: as the independent value approaches the point of discontinuity, the error margin increases accordingly.

⁵ Roughly speaking, Heisenberg’s uncertainty principle describes a property of quantum mechanics which states that there exists a fundamental limit to the precision with which the position and momentum of a particle can be known simultaneously.

⁶ Some care should be taken as to not confuse the term $\langle x, y_1, y_2 \rangle$ with the extended Cantor pairing function for $\langle x_1, x_2, x_3 \rangle = \langle \langle x_1, x_2 \rangle, x_3 \rangle$. However, in general this risk is not significant, as $(x, y_1, y_2)$ are triplets in $\mathbb{Q}^3$ and $(x_1, x_2, x_3)$ are triplets in $\mathbb{N}^3$.

⁷ If we wished to do so, it would be possible to slightly modify Definition 3.11 to allow $y_1 \in (\mathbb{Q} \cup \{-\infty\})$ and $y_2 \in (\mathbb{Q} \cup \{+\infty\})$ by simply encoding the countable set $\mathbb{Q} \cup \{-\infty, +\infty\}$ into the natural numbers. This has the advantage of enabling measurements with an infinite error margin. However, this would unnecessarily complicate our model, and it is not clear that this kind of measurement would provide any actual benefits for empirical texts.


[Figure 3.2: plot with horizontal axis $x$ and error bars of height $\Delta y$.]

Figure 3.2: Measurements of an empirical text. The ‘true’ scientific law is the function shown in blue. For each experiment performed with a rational independent variable $x$, a $\Delta y$ interval of rationals is measured.

Empirical identification

With these concepts in mind, all that remains is to define an empirical scientist and to suitably adapt the previous definitions of identification to account for empirical identification. Given the more applicable nature of this topic, we require empirical scientists to be computable functions. Additionally, due to the type of scientific laws which we wish to study, empirical scientists are defined exclusively for functions and not for sets.

3.12 Definition (Empirical scientist). An empirical scientist $M : \mathrm{ESEG} \to \mathbb{N}$ for functions is a computable and possibly partial function. The school of empirical scientists is denoted $\mathrm{SCP}_{e}$.

3.13 Definition (Empirical identification). We say that an empirical scientist $M$ empirically identifies, or simply identifies, a class of discoverable functions $C$ if, for all $f \in C$ and for all empirical texts $T$ for $f$, $M$ converges to an index for $f$, that is, if for all but finitely many prefixes $\sigma$ of $T$, $M(\sigma)$ is an index for $f$. The collection of all classes of functions which are identifiable by a computable empirical scientist in empirical text is $[\mathrm{SCP}_{e}, \text{emp. text}]$.

Empirical identification is therefore conceptually identical to regular identification, with the main distinction being the text used for identification.

3.2 Empirical identification of the class of discoverable functions

The most significant result of empirical identification is that the entirety of the class of discoverable functions is identifiable. These functions appear to at least constitute the majority of the scientific laws that have ever been discovered throughout history. They include Boyle’s Law, Einstein’s equations for General Relativity, Maxwell’s equations, and many others. In fact, we are not aware of any functional scientific law which cannot be expressed or computed using discoverable functions. It is therefore very promising that these functions are identifiable in the context of empirical identification by a computable scientist.

3.14 Theorem. $\mathrm{DS} \in [\mathrm{SCP}_{e}, \text{emp. text}]$

Proof. Let $g \in \mathrm{DS}$ be some discoverable function and let $\gamma_0, \gamma_1, \gamma_2, \ldots$ be a fixed recursive enumeration of the primitive real recursive functions. We define function $f$ in Equation 3.1, which tests if, for a given $\gamma_i$ and a fixed $m$, the function $\lambda n .\ \gamma_i(m,n)$ agrees up to $n - 1$ decimal places with the primitive recursive defining function of some number $y$ in an interval $[y_1, y_2]$. For an integer $n$ and for some $i, m \in \mathbb{N}$, $f$ returns 1 if $\lambda n .\ \gamma_i(m,n)$ is in accordance with Definition 3.1, and 0 otherwise.

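\[ f(i, m, n, y_1, y_2) = \begin{cases} 1, & \text{if } n = 0,\ \gamma_i(m,0) = 0 \text{ and } y_1 \le 0 \\ 1, & \text{if } n = 0,\ \gamma_i(m,0) = 1 \text{ and } y_2 > 0 \\ 1, & \text{if } n > 0 \text{ and } y_1 \le \frac{\gamma_i(m,n)}{10^{n-1}} \le y_2 \\ 0, & \text{otherwise.} \end{cases} \tag{3.1} \]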

Hence, for some fixed $i, m \in \mathbb{N}$, $\lambda n .\ \gamma_i(m,n)$ is the primitive recursive defining function of some $y \in \mathbb{P}$ if and only if $f(i, m, n, y_1, y_2) = 1$ for all $n \in \mathbb{N}$ and for all $y_1, y_2$ where $y_1 \le y \le y_2$. For convenience, we also define function $F$, the ‘cumulative’ version of $f$, such that

\[ F(i, m, n, y_1, y_2) = \begin{cases} 1, & \text{if } f(i, m, k, y_1, y_2) = 1 \text{ for all } k \le n \\ 0, & \text{otherwise.} \end{cases} \tag{3.2} \]

Now, let $\sigma$ be a prefix of an empirical text $T$ for some discoverable function $g$. The objective is to construct an empirical scientist which searches for the first primitive real recursive function $\gamma$ such that, for every $x \in \mathbb{Q}$, $\lambda n .\ \gamma(m,n)$ is the primitive recursive defining function for $g(x)$, where $m = q(x)$. This step is done by checking that, for some $i \in \mathbb{N}$ and for each element $\sigma_j = \langle x_j, y_{1j}, y_{2j} \rangle$ of the prefix $\sigma$, $F(i, m_j, n, y_{1j}, y_{2j}) = 1$ for increasing values of $n$, where $m_j = q(x_j)$.

Then, let $G$ be the empirical scientist defined in Algorithm 3.1. We will show that $G$ identifies $\mathrm{DS}$.

Algorithm 3.1: Empirical scientist G identifies the class of discoverable functions in empirical text

    Scientist G(σ ∈ ESEG) : N
        variable i, j ∈ N;
        i ← 0;
        j ← 0;   // j ranges through the elements of σ, where σ_j = ⟨x_j, y_1j, y_2j⟩
        while j < |σ| do
            if F(i, m_j, |σ|, y_1j, y_2j) = 1 then
                j ← j + 1;
            else
                i ← i + 1;
                j ← 0;
            end
        end
        return the same index for γ_i
    end
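The following is a minimal sketch of Equations 3.1 and 3.2 and of the search loop of Algorithm 3.1, under several loudly stated assumptions: the enumeration $\gamma_0, \gamma_1, \ldots$ is replaced by a hypothetical finite list `CANDIDATES` of toy laws $y = c \cdot m$; the natural number $m$ is used directly as the input instead of an encoded rational $q(x)$; and the toy laws take positive integer values, so that their truncated expansions compare cleanly against the measured intervals. The function returns a position in the toy list rather than an index of a partial recursive function, and returns `None` when the list is exhausted, whereas the scientist in the text keeps searching.

```python
from fractions import Fraction


def make_gamma(c: int):
    """Toy candidate gamma for the law y = c * m (hypothetical enumeration entry)."""
    def gamma(m: int, n: int) -> int:
        y = c * m
        if n == 0:
            return 1 if y > 0 else 0
        return abs(y) * 10 ** (n - 1)
    return gamma


CANDIDATES = [make_gamma(c) for c in (1, 2, 3)]   # stand-in for gamma_0, gamma_1, ...


def f(i, m, n, y1, y2) -> int:
    """Equation 3.1, transcribed case by case."""
    g = CANDIDATES[i](m, n)
    if n == 0:
        if g == 0 and y1 <= 0:
            return 1
        if g == 1 and y2 > 0:
            return 1
        return 0
    return 1 if y1 <= Fraction(g, 10 ** (n - 1)) <= y2 else 0


def F(i, m, n, y1, y2) -> int:
    """Equation 3.2: f must hold for every k <= n."""
    return 1 if all(f(i, m, k, y1, y2) == 1 for k in range(n + 1)) else 0


def G(sigma):
    """Search loop of Algorithm 3.1 over a prefix of triples (m, y1, y2)."""
    i = j = 0
    while j < len(sigma):
        if i >= len(CANDIDATES):
            return None            # toy list exhausted (not part of the algorithm)
        m, y1, y2 = sigma[j]
        if F(i, m, len(sigma), y1, y2) == 1:
            j += 1
        else:
            i, j = i + 1, 0        # try the next candidate from the start of sigma
    return i
```

For instance, with measurements of the law $y = 2m$, a coarse first measurement such as `(1, Fraction(0), Fraction(5))` is still compatible with the first candidate, while a longer and tighter prefix forces the search on to the second one.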

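$G$ searches for the first $\gamma_i$ such that, for all $\sigma_j$ in $\sigma$, $F(i, m_j, |\sigma|, y_{1j}, y_{2j}) = 1$. In other words, $G$ finds the first primitive recursive function $\gamma_i$ of type $\mathbb{N}^2 \to \mathbb{N}$ that ‘fits’ the prefix $\sigma$ of empirical text, such that, for all $\sigma_j = \langle x_j, y_{1j}, y_{2j} \rangle \in \mathrm{content}(\sigma)$ and for all $k \le |\sigma|$,

\[ \begin{cases} \gamma_i(q(x_j), 0) = 0, & \text{if } k = 0 \text{ and } y_{1j} \le 0 \\ \gamma_i(q(x_j), 0) = 1, & \text{if } k = 0 \text{ and } y_{2j} > 0 \\ y_{1j} \le \frac{\gamma_i(q(x_j), k)}{10^{k-1}} \le y_{2j}, & \text{if } k > 0. \end{cases} \]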

Clearly, if $G$ converges, then it converges to an index for a primitive real recursive function for the discoverable function $g$, and consequently identifies $g$.


Suppose then that $G$ does not converge. Since $G$ is defined for every input, it follows that $G$ must diverge. Now $g$ is a discoverable function, and so by definition there exists a primitive real recursive function $\gamma$ for $g$, such that, for all $m, n \in \mathbb{N}$, $\gamma(m,n) = d_y(n)$, where $y = g(x) \in \mathbb{P}$ and $x = q^{-1}(m) \in \mathbb{Q}$. Thus, $\gamma$ belongs to the recursive enumeration of the primitive real recursive functions.

Now, since $G$ sequentially tests all the primitive real recursive functions in the enumeration, it will eventually test $\gamma$. But $\gamma$ is a primitive real recursive function for $g$, and so $G$ will output an index for $\gamma$ and never change its hypothesis for any larger prefix for $g$. Thus, $G$ must converge and identify $g$.

While this result may itself be rather trivial given the framework we have established, it nonetheless shows the non-obvious fact that such a wide range of functions describing scientific laws is empirically identifiable. Indeed, the scientist $G$ defined in Algorithm 3.1 merely performs the elementary task of searching for a particular function within a given list of possibilities, which is quite unlike the approach real-world scientists take when attempting to discover new scientific laws.

3.3 An example from the history of science: planetary orbits

Before the discoveries of Kepler, the scientific consensus held that planets moved along orbits comprised of a composition of uniform circular paths. The reasoning behind this theory was simple: circles were clearly the most perfect geometric shape, and so it must follow that the planets and stars move about in perfect, circular paths. Alas, it was apparent to even the ancient Greeks that celestial bodies did not in fact move in perfect circles, a fact made plain by the apparent retrograde motion of Mars across the night sky, shown in Figure 3.3. The solution to this problem was to place each planet on the edge of a rotating circle whose center rotated about another circle. Using this method, scientists from Antiquity were able to more closely match their orbital model with the perceived motion of each planet.

Figure 3.3: Path of the planet Mars in apparent retrograde motion between June and November of 2003. Image modified from the original by Eugene Alvin Villar.

Each of these circles is called an epicycle, with the exception of the centermost circle which is given the name of deferent. Informally, the deferent and each epicycle form a geometric model comprised of a point which rotates about another point with a constant angular velocity. In the case of the deferent, the central point is fixed, whereas the central point of each epicycle rotates about the deferent or another epicycle. The conjunction of multiple epicycles resulted in astronomical models similar to the one shown in Figure 3.4, which were able to predict the motion of planets to quite an impressive degree.


Figure 3.4: The orbits of the Sun, Mercury, and Venus constructed using a deferent and multiple epicycles. In the geocentric model, the center of the deferent coincides with the position of Earth. Image taken from the Astronomy article in the first edition of Encyclopædia Britannica, from 1771.

In addition to using deferents and epicycles, both the geocentric model of Ptolemy and the heliocentric model of Copernicus made use of another tool called the eccentric circle. In practice, the eccentric circle can be simulated with a pair of epicycles, so we will ignore its use.⁸ As such, the fundamental differences between both orbital models boiled down to little more than a shift in the frame of reference (see Figure 3.5).

[Figure 3.5, two panels: (a) An orbital system using Ptolemy’s model, with points labelled E and P; (b) An orbital system using Copernicus’ model, with points labelled S, P and E.]

Figure 3.5: Diagram of orbits of Copernicus and Ptolemy with one epicycle, with the Sun S, the Earth E and the planet P

3.3.1 Orbits on the complex plane

Since an epicycle or a deferent is simply a representation of a uniform circular motion, we can model the position of a point $z$ on the complex plane using Euler’s formula as a function of time $t$ as $z(t) = r e^{2\pi i((f+1)t+p)}$, where $r \in \mathbb{R}^{+}$ is the radius of the circle, $f \in \mathbb{R}$ is the angular frequency of the point about the center,⁹ and $p \in \mathbb{R}$ is the initial position of the point on the circle. Then, assuming the orbit of a planet lies within a plane and since the orbit $z_P$ of any planet in the Ptolemaic model is a sum of a deferent and various epicycles, it can be modeled on the complex plane using a sum of $z_k(t)$ such that

\[ z_P(t) = \sum_{k=0}^{m} z_k(t), \]

where $z_k(t) = r_k e^{2\pi i((f_k+1)t+p_k)}$. For simplicity, we define the deferent $z_0$ as

\[ z_0(t) = e^{2\pi i t}, \]

obtained by setting $r_0 = 1$, $f_0 = 0$, and $p_0 = 0$.

⁸ Ptolemy’s model also resorted to another geometric tool called the equant. Though the equant cannot be replicated with the composition of a finite number of epicycles, its use is not relevant to the topics of this chapter.

⁹ We merely follow historical tradition and the example of Crowe [7] by setting $f = 0$ for the unitary angular frequency, as there is no mathematically significant reason for defining $z(t)$ as $r e^{2\pi i((f+1)t+p)}$ instead of $r e^{2\pi i(ft+p)}$.

3.15 Definition (Ptolemaic orbit). A Ptolemaic orbit is any function $z_P$ of $t \in [t_0, +\infty[$, with $t_0 \in \mathbb{R}$, on the complex plane such that

\[ z_P(t) = \sum_{k=0}^{m} z_k(t) \]

for some $m \in \mathbb{N}$, where $z_0(t) = e^{2\pi i t}$ and $z_k(t) = r_k e^{2\pi i((f_k+1)t+p_k)}$ if $k > 0$, for $r_k \in \mathbb{R}^{+}$ and $f_k, p_k \in \mathbb{R}$.
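Definition 3.15 translates almost directly into code. The sketch below evaluates a Ptolemaic orbit at a given time using floating-point complex numbers; the function names and the sample epicycle parameters are made up for illustration, and floating-point evaluation is of course only an approximation of the real-valued definition.

```python
import cmath


def epicycle(r: float, f: float, p: float, t: float) -> complex:
    """z(t) = r * exp(2*pi*i*((f + 1)*t + p))."""
    return r * cmath.exp(2j * cmath.pi * ((f + 1) * t + p))


def ptolemaic_orbit(epicycles, t: float) -> complex:
    """z_P(t): the deferent z_0(t) = exp(2*pi*i*t) plus the listed epicycles (r, f, p)."""
    return epicycle(1.0, 0.0, 0.0, t) + sum(epicycle(r, f, p, t) for r, f, p in epicycles)


# One epicycle of radius 0.25 turning three times as fast as the deferent (made-up values).
for t in (0.0, 0.125, 0.25):
    print(t, ptolemaic_orbit([(0.25, 2.0, 0.0)], t))
```

With an empty list of epicycles the orbit reduces to the deferent alone, i.e. the unit circle traversed once per unit of time.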

Since a Ptolemaic model is essentially a system where one body orbits another, we may think of Copernicus’ model as the composition of two Ptolemaic models centered on the Sun: one model $z^{e}$ for the Earth and another $z^{p}$ for the observed planet. The position of the planet with respect to the Earth is therefore given by $z^{p} - z^{e}$, where

\[ z^{p}(t) = \sum_{k=0}^{m} z^{p}_{k}(t) \quad \text{and} \quad z^{e}(t) = \sum_{k=0}^{n} z^{e}_{k}(t). \]

As before, we may define the deferent $z^{e}_{0}$ for the Earth as

\[ z^{e}_{0}(t) = e^{2\pi i t}. \]

3.16 Definition (Copernican orbit). A Copernican orbit is any function $z_C$ of $t \in [t_0, +\infty[$, with $t_0 \in \mathbb{R}$, on the complex plane such that

\[ z_C(t) = z^{p}(t) - z^{e}(t) = \sum_{k=0}^{m} z^{p}_{k}(t) - \sum_{j=0}^{n} z^{e}_{j}(t) \]

for some $m, n \in \mathbb{N}$, where $z^{e}_{0}(t) = e^{2\pi i t}$, $z^{p}_{k}(t) = r_k e^{2\pi i((f_k+1)t+p_k)}$ for $k \ge 0$ and $z^{e}_{j}(t) = r_j e^{2\pi i((f_j+1)t+p_j)}$ for $j > 0$, where $r_k, r_j \in \mathbb{R}^{+}$ and $f_k, f_j, p_k, p_j \in \mathbb{R}$.

Despite what common conceptions of the Copernican revolution might imply, Ptolemaic and Copernican orbits defined in this way are actually equivalent. Indeed, given an appropriate change in variable, both kinds of orbits can be written as a sum of epicycles.

3.17 Lemma. A function is a Ptolemaic orbit if and only if it is a Copernican orbit.

Proof. The proof of this lemma is given in Section 5.5 of the Appendix.
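The Appendix proof is not reproduced here. As a hedged illustration of one ingredient that such an argument can plausibly use, the sketch below checks numerically that a subtracted epicycle equals an added epicycle whose phase is shifted by $\frac{1}{2}$, since $e^{2\pi i(\theta + 1/2)} = -e^{2\pi i\theta}$. The function names and parameter values are made up, and the sketch does not address the normalisation of the deferent required by Definition 3.15, so it is not a proof of the lemma.

```python
import cmath


def epicycle(r, f, p, t):
    """z(t) = r * exp(2*pi*i*((f + 1)*t + p))."""
    return r * cmath.exp(2j * cmath.pi * ((f + 1) * t + p))


def copernican_orbit(planet, earth, t):
    """z_C(t) = z^p(t) - z^e(t); the Earth's deferent exp(2*pi*i*t) is added implicitly."""
    z_p = sum(epicycle(r, f, p, t) for r, f, p in planet)
    z_e = epicycle(1.0, 0.0, 0.0, t) + sum(epicycle(r, f, p, t) for r, f, p in earth)
    return z_p - z_e


def as_sum_of_epicycles(planet, earth):
    """Rewrite the same z_C as a plain sum by shifting each subtracted phase by 1/2."""
    shifted = [(r, f, p + 0.5) for r, f, p in [(1.0, 0.0, 0.0)] + list(earth)]
    return list(planet) + shifted


planet = [(5.2, -0.9, 0.1)]     # made-up (r, f, p) triples
earth = [(0.03, 1.0, 0.2)]
for t in (0.0, 0.3, 1.7):
    direct = copernican_orbit(planet, earth, t)
    summed = sum(epicycle(r, f, p, t) for r, f, p in as_sum_of_epicycles(planet, earth))
    print(t, abs(direct - summed) < 1e-9)
```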

Given a suitable parametrization of Copernican orbits (for example, considering both the real and imaginary part of $z_C(t)$ as separate functions), it is not difficult to show that these are discoverable functions. It is therefore not unreasonable to expect that the empirical scientist $G$ in Algorithm 3.1, upon receiving the position of a planet as an empirical text, could conjecture a hypothesis for a Copernican or Ptolemaic orbit before converging to the correct theory, mimicking the process taken by astronomers throughout history.


It is certainly surprising to note that, despite the scientific breakthrough that was the Copernican Revolution, a scientist that identifies only Copernican orbits can account for no more phenomena than one that identifies only Ptolemaic orbits. Indeed, Copernicus’ contributions were more significant in paving the way for the discoveries of Johannes Kepler than in producing more precise predictions: Copernicus’ model was actually often worse than Ptolemy’s, since it was based on incorrect measurements.

The strength of the use of epicycles in describing the real-world orbits of celestial bodies is that the composition of multiple epicycles can closely approximate almost any closed path on the plane to an arbitrary degree. In a sense, Copernican and Ptolemaic orbits can be used as a sort of Fourier series for planetary orbits, which is why they were such an effective predictive tool.

Figure 3.6: A triangle-like orbit obtained using a deferent and a single epicycle. A closer approximation is made possible through the addition of more epicycles. For an interactive, online demonstration of Ptolemaic orbits that can be built using up to three epicycles, see Lobão [17].

The American philosopher of science Norwood Russell Hanson showed that these planetary models can approximate any "reasonable" orbit on a plane to an arbitrary precision with a finite number of epicycles. Indeed, Hanson [11] showed that it is possible to construct an orbit that resembles a triangle (see Figure 3.6), a square, or even one that closely matches the elliptic orbit of a body in accordance with Kepler's laws using a finite sum of epicycles.

3.18 Theorem (Hanson [11]). For any continuous, bounded, and periodic function $f(t)$ on the complex plane, and for any $\varepsilon > 0$, there is a finite sum $S_m(t) = \sum_{k=0}^{m} z_k(t)$ of epicycles such that, for all $t \in \mathbb{R}$, $|f(t) - S_m(t)| < \varepsilon$.
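The approximation property can be observed numerically on a square-shaped path. The sketch below is not Hanson's construction: it simply estimates the Fourier coefficients of the path with a Riemann sum and treats each Fourier term $c_k e^{2\pi i k t}$ as an epicycle, ignoring the normalisation $z_0(t) = e^{2\pi i t}$ of the deferent. The names `square`, `fourier_epicycles`, `epicycle_sum` and `max_error`, and the choice of 512 samples, are assumptions made for this example.

```python
import cmath


def square(t):
    """Constant-speed walk around the square with corners (+-1, +-1), period 1."""
    s = (t % 1.0) * 4.0
    side, u = int(s), s - int(s)  # which side we are on, and how far along it
    corners = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    a, b = corners[side], corners[(side + 1) % 4]
    return a + (b - a) * u


def fourier_epicycles(path, max_freq, samples=512):
    """Estimate the coefficients c_k for |k| <= max_freq with a Riemann sum."""
    points = [path(n / samples) for n in range(samples)]
    return {
        k: sum(
            z * cmath.exp(-2j * cmath.pi * k * n / samples)
            for n, z in enumerate(points)
        ) / samples
        for k in range(-max_freq, max_freq + 1)
    }


def epicycle_sum(coeffs, t):
    """Evaluate the finite sum of epicycles c_k * exp(2*pi*i*k*t)."""
    return sum(c * cmath.exp(2j * cmath.pi * k * t) for k, c in coeffs.items())


def max_error(path, max_freq, samples=512):
    """Largest distance between the path and its epicycle sum on the sample grid."""
    coeffs = fourier_epicycles(path, max_freq, samples)
    return max(
        abs(path(n / samples) - epicycle_sum(coeffs, n / samples))
        for n in range(samples)
    )


error_few = max_error(square, 5)    # frequencies -5, ..., 5
error_many = max_error(square, 40)  # frequencies -40, ..., 40
```

Under these assumptions `error_many` is smaller than `error_few`, in line with the theorem: allowing more epicycles brings the sum uniformly closer to the target path.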

A paper by Ginnobili and Carman [9] expanded on these examples, constructing an exceedingly intricate orbit comprised of one thousand epicycles, which traces out the outline of the fictional character Homer Simpson, shown in Figure 3.7.¹⁰,¹¹

However, despite being able to model such complex orbits as the one in Figure 3.7, neither Copernican nor Ptolemaic orbits ever exactly match the orbit of a body moving according to Kepler's laws (aside from the trivial example of a constant circular motion). Indeed, for a Copernican orbit to do so, we would require an infinite number of epicycles.

The distinction is an important one, since if a Keplerian orbit (here taken to mean a path on the complex plane which obeys Kepler's laws of motion) is not also a Copernican orbit, then a scientist who only identifies Copernican orbits in empirical texts cannot identify a Keplerian orbit.¹² Indeed, such a scientist will never converge on a single hypothesis and will instead continue to change its mind forever, outputting indexes for Copernican orbits that are successively closer to a Keplerian orbit.

Figure 3.7: An approximation of a drawing of Homer Simpson using 1000 epicycles, constructed by Christián Carman and Ramiro Serra [9].

¹⁰ A video animation of this orbit can be found at https://www.youtube.com/watch?v=QVuU2YCwHjw.

¹¹ Allowing for orbits which were never experimentally observed, such as triangular orbits, was a common criticism of the planetary models of Ptolemy and Copernicus.

This situation illustrates how identification in the limit may in some cases be an unnecessarily strong condition in practice. In this case, a scientist which only identifies Copernican orbits can provide hypotheses which are 'good enough' and which only diverge from a Keplerian orbit by an arbitrarily small amount, but which nonetheless cannot be said to identify it since it will never converge to a single function.

¹² This is not the case with scientist G defined in Algorithm 3.1, which identifies all discoverable functions, including Keplerian orbits.


Chapter 4

Conclusion

The results in Chapter 2 give a concise overview of the classes of sets and functions which can be identified 'in the limit.' For a scientist for functions without any restrictions such as memory limitation or computability, it is possible to identify any recursive function in finite time. If we additionally require that scientists be computable, then the collection of identifiable classes of functions and of sets is strictly smaller.

However, these results are merely the theoretical limits or upper bounds to scientific identification. Indeed, aside from imperfect text, no consideration is given to any kind of error (whether systematic or random) inherent to scientific observations. Chapter 3 therefore presents an adaptation of the concept of scientist in order to deal with errors of the sort usually encountered in experiments. It is worth pointing out that, while the concepts of empirical texts and primitive real recursive functions were studied as the relation between an independent and a dependent variable, the results are also applicable to any other function or scientific law where variables can be isolated from one another.

The applicability of the results of empirical identification presented in Chapter 3 is also heavily dependent on several assumptions which arguably do not reflect real-world situations. Empirical text in particular can be the object of many valid criticisms. For example, it may not be the case that errors in measurements necessarily converge to zero, either because of practical issues such as systematic errors (thereby offsetting the convergence to a non-zero number), or because of theoretical issues such as the impossibility of repeating an experiment an unlimited number of times (which might arguably impede convergence altogether). It can also be argued that it is in general not possible to specify a discrete error bound for measurements, requiring that errors be considered as a random variable instead.

The validity of these proofs in practice is therefore entirely dependent on the validity of the assumptions made about the process of scientific discovery. However, in the case that these assumptions are sound and limiting our scope of analysis to scientific laws that can be given by discoverable functions, we may conclude that the empirical identification of scientific laws is possible in the limit.

Certain topics were also only mentioned in passing in this thesis and might be worth exploring in future work. For example, it is not clear how, or even if, we can extend the concept of empirical scientists to areas of science that conjecture theories which are not expressed mathematically, such as the theory of genetics or evolution. On the other hand, it is possible to continue improving the concept of empirical identification. One suggestion would be to construct a kind of empirical text that better reflects the randomness in error measurements, and another would be to devise a suitable definition for discoverable functions based on computable numbers instead of the primitive recursive numbers, thereby enriching the universe of functions that may be studied by empirical scientists.


Chapter 5

Additional proofs

5.1 An explicit bijection from N to Q

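Let $z^{-1} : \mathbb{N} \to \mathbb{Z}$ be the decoding of $\mathbb{Z}$ such that

$$z^{-1}(n) = \begin{cases} 0 & \text{if } n = 0 \\ (-1)^n \left\lfloor \frac{n+1}{2} \right\rfloor & \text{if } n > 0. \end{cases}$$

For successive values of $n$, the sequence is $0, -1, 1, -2, 2, \ldots$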

For a positive integer $n$, let $n = p_1^{a_1} \cdot p_2^{a_2} \cdots p_k^{a_k}$ be the prime factorization of $n$, where $p_i$ are primes and $a_i$ are positive integers. We now specify a function $\rho$ defined for $\mathbb{N}^+$ such that

$$\rho(n) = \begin{cases} 1 & \text{if } n = 1 \\ p_1^{z^{-1}(a_1)} \cdot p_2^{z^{-1}(a_2)} \cdots p_k^{z^{-1}(a_k)} & \text{if } n > 1. \end{cases}$$

For successive values of $n$, $\rho$ takes the values

1, 1/2, 1/3, 2, 1/5, 1/6, 1/7, 1/4, 3, 1/10, 1/11, 2/3, 1/13, 1/14, 1/15, 4, 1/17, 3/2, 1/19, 2/5, 1/21, 1/22, 1/23, 1/12, 5, 1/26, 1/9, ...
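The definitions of the decoding of the integers and of $\rho$ translate directly into code, which makes it easy to reproduce the listed values. This is a minimal sketch: the names `z_inv`, `prime_factorization` and `rho` are chosen for this example, and trial division is used only for simplicity.

```python
from fractions import Fraction


def z_inv(n):
    """Decoding of the integers: 0, -1, 1, -2, 2, ... for n = 0, 1, 2, ..."""
    return 0 if n == 0 else (-1) ** n * ((n + 1) // 2)


def prime_factorization(n):
    """Return [(p, a), ...] with n equal to the product of p**a (trial division)."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            factors.append((p, a))
        p += 1
    if n > 1:
        factors.append((n, 1))
    return factors


def rho(n):
    """Map a positive integer to a positive rational by decoding each exponent."""
    result = Fraction(1)
    for p, a in prime_factorization(n):
        result *= Fraction(p) ** z_inv(a)
    return result


first_values = [rho(n) for n in range(1, 13)]
```

For instance, $12 = 2^2 \cdot 3^1$ is sent to $2^{z^{-1}(2)} \cdot 3^{z^{-1}(1)} = 2^{1} \cdot 3^{-1} = 2/3$, the twelfth value in the list.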

Now, $\rho$ is a bijection from $\mathbb{N}^+$ to $\mathbb{Q}^+$. To show this, consider any positive rational $r = \frac{s}{t}$, where $s$ and $t$ are coprime. Now, either $s = 1$ or $t = 1$ (or both), or the prime factorizations of $s$ and $t$ are given by some $s = p_1^{a_1} \cdot p_2^{a_2} \cdots p_j^{a_j}$ and $t = p_{j+1}^{a_{j+1}} \cdot p_{j+2}^{a_{j+2}} \cdots p_k^{a_k}$ where $p_i$ are all different primes (since $s$ and $t$ are coprime) and $a_i$ are positive integers. Then,

$$r = p_1^{a_1} \cdot p_2^{a_2} \cdots p_j^{a_j} \cdot p_{j+1}^{-a_{j+1}} \cdot p_{j+2}^{-a_{j+2}} \cdots p_k^{-a_k},$$

with possibly $j = 0$ if $s = 1$ and $k = j$ if $t = 1$.

If $r = 1$ then the inverse image of $r$ in $\rho$ is also 1. Otherwise, the inverse image of $r$ is

$$\rho^{-1}(r) = p_1^{z(a_1)} \cdot p_2^{z(a_2)} \cdots p_j^{z(a_j)} \cdot p_{j+1}^{z(-a_{j+1})} \cdot p_{j+2}^{z(-a_{j+2})} \cdots p_k^{z(-a_k)},$$

where $z$ is the inverse of $z^{-1}$. Since $z(n)$ is a non-negative integer for any integer $n$, $\rho^{-1}(r)$ is a positive integer.


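Finally, we define a decoding $q^{-1} : \mathbb{N} \to \mathbb{Q}$ of $\mathbb{Q}$ based on $\rho$ such that

$$q^{-1}(n) = \begin{cases} 0 & \text{if } n = 0 \\ (-1)^n \rho\left(\left\lfloor \frac{n+1}{2} \right\rfloor\right) & \text{if } n > 0. \end{cases}$$

For successive values of $n$, $q^{-1}$ takes the values

0, -1, 1, -1/2, 1/2, -1/3, 1/3, -2, 2, -1/5, 1/5, -1/6, 1/6, -1/7, 1/7, -1/4, 1/4, -3, 3, -1/10, 1/10, -1/11, 1/11, -2/3, ...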

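The encoding of $\mathbb{Q}$ is $q : \mathbb{Q} \to \mathbb{N}$ such that

$$q(n) = \begin{cases} 0 & \text{if } n = 0 \\ 2\rho^{-1}(n) & \text{if } n > 0 \\ 2\rho^{-1}(-n) - 1 & \text{if } n < 0. \end{cases}$$

Here $\rho^{-1}$ is applied to $-n$ when $n < 0$, since $\rho^{-1}$ is defined only on $\mathbb{Q}^+$.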
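A quick way to gain confidence in the encoding is to check that it inverts the decoding on an initial segment of the naturals. The sketch below re-implements the maps in a self-contained way; all function names are chosen for this example. For a negative rational it applies $\rho^{-1}$ to the absolute value, which is an interpretation made here because $\rho^{-1}$ is only defined on positive rationals.

```python
from fractions import Fraction


def z_inv(n):
    """Decoding of the integers: 0, -1, 1, -2, 2, ..."""
    return 0 if n == 0 else (-1) ** n * ((n + 1) // 2)


def z(k):
    """Inverse of z_inv: non-negative k goes to 2k, negative k to -2k - 1."""
    return 2 * k if k >= 0 else -2 * k - 1


def factorize(n):
    """Return {prime: exponent} for a positive integer n (trial division)."""
    factors, p = {}, 2
    while n > 1:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    return factors


def rho(n):
    """Positive integer -> positive rational."""
    out = Fraction(1)
    for p, a in factorize(n).items():
        out *= Fraction(p) ** z_inv(a)
    return out


def rho_inv(r):
    """Positive rational -> positive integer (Fraction keeps r in lowest terms)."""
    n = 1
    for p, a in factorize(r.numerator).items():
        n *= p ** z(a)
    for p, a in factorize(r.denominator).items():
        n *= p ** z(-a)
    return n


def q_inv(n):
    """Decoding of the rationals."""
    return Fraction(0) if n == 0 else (-1) ** n * rho((n + 1) // 2)


def q(r):
    """Encoding of the rationals; rho_inv is applied to |r| (assumption, see above)."""
    if r == 0:
        return 0
    return 2 * rho_inv(r) if r > 0 else 2 * rho_inv(-r) - 1


round_trip_ok = all(q(q_inv(n)) == n for n in range(200))
```

The round trip only tests finitely many codes, so it is a sanity check rather than a proof of bijectivity.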

5.2 Locking sequence theorems for functions and for other learning environments

We give the proof of Theorem 2.8 below, which is just a slight variation of Theorem 2.7.

5.1 Theorem. Let $M$ be a scientist for functions that identifies the function $f$, and let $\sigma \in \mathrm{SEG}$ be a prefix for $f$. Then there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq \Psi(f)$ such that $\sigma \diamond \tau$ is a locking sequence for $M$ on $f$.

Proof. Let $M$ be a scientist that identifies the function $f \in \mathcal{R}$. Without loss of generality, assume that $M$ is defined for all $\sigma \in \mathrm{SEG}$. Now, suppose that there exists $\sigma \in \mathrm{SEG}$ such that, for all $\tau \in \mathrm{SEG}$, no locking sequence $\sigma \diamond \tau$ exists. This implies that

there exists $\sigma \in \mathrm{SEG}$ such that, for all $\tau \in \mathrm{SEG}$ such that $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(f)$ and $\varphi_{M(\sigma \diamond \tau)} = f$, there exists some $\rho \in \mathrm{SEG}$ such that $\mathrm{content}(\rho) \subseteq \Psi(f)$ and $M(\sigma \diamond \tau \diamond \rho) \neq M(\sigma \diamond \tau)$. (5.1)

We will show that this implies the existence of a text $T$ for $f$ which $M$ does not identify.

Let $U = u_0, u_1, u_2, \ldots$ be a text for $f$, where each $u_i = \langle x_i, y_i \rangle$. We construct text $T$ using the prefixes $\tau^i$ for each $i \in \mathbb{N}$. We begin with $\tau^0 = \sigma$ and build each $\tau^{i+1}$ using Algorithm 5.1.

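Algorithm 5.1: Construction of prefix $\tau^{i+1}$ of the text $T$ for a function $f$

Data: prefix $\tau^i$ of text $T$
Result: prefix $\tau^{i+1}$ of text $T$

if $\varphi_{M(\tau^i)} \neq f$ then
  $\tau^{i+1} \leftarrow \tau^i \diamond u_i$;
else
  choose $\rho \in \mathrm{SEG}$ such that $\mathrm{content}(\rho) \subseteq \Psi(f)$ and $M(\tau^i \diamond \rho) \neq M(\tau^i)$; // $\rho$ must exist by (5.1), since $\mathrm{content}(\tau^i) \subseteq \Psi(f)$
  $\tau^{i+1} \leftarrow \tau^i \diamond \rho \diamond u_i$;
end

Observe that every new prefix is built by adding one or more elements to the end of the previous prefix, such that $\tau^i \subset \tau^{i+1}$ for all $i \in \mathbb{N}$. Then, let $T = \bigcup_i \tau^i$.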

Now, $\mathrm{content}(T) \subseteq \Psi(f)$ by induction, since $\mathrm{content}(\tau^0) = \mathrm{content}(\sigma) \subseteq \Psi(f)$ and both $\mathrm{content}(\rho) \subseteq \Psi(f)$ and $u_n \in \Psi(f)$ for all $n \in \mathbb{N}$, so that $\mathrm{content}(\tau^{i+1}) \subseteq \Psi(f)$. Additionally, appending $u_i$ at each step of the construction of $T$ ensures that $\Psi(f) \subseteq \mathrm{content}(T)$, and so $T$ is a text for $\Psi(f)$. However, $M$ does not converge on $T$ to an index for $\Psi(f)$, since for each prefix $\tau^{i+1}$ either $\varphi_{M(\tau^i)} \neq f$ or $M(\tau^i \diamond \rho) \neq M(\tau^i)$ for some $\rho \in \mathrm{SEG}$ with $\tau^i \diamond \rho \subseteq \tau^{i+1}$.

Thus, there are infinitely many prefixes at which either $M$ returns an incorrect index for $f$ or $M$ changes its mind with respect to an earlier prefix, and so $M$ does not identify $f$ in text $T$.
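To make the notion of a locking sequence more concrete, the following toy example may help. It is a deliberately simplified sketch and not part of the thesis: it works with sets rather than functions, the class being learned is the family of initial segments $\{0, \ldots, n\}$, a conjecture $n$ is read as an index for $\{0, \ldots, n\}$, and the names `M` and `looks_locking` are hypothetical. The check is bounded (only extensions up to a fixed length are tried), so it can refute but never prove that a prefix is a locking sequence.

```python
from itertools import product


def M(sigma):
    """Toy scientist: conjecture n, read as an index for the set {0, ..., n}."""
    return max(sigma) if sigma else 0


def looks_locking(sigma, target_max, max_len=3):
    """Bounded check of the locking-sequence conditions for S = {0, ..., target_max}.

    sigma is a tuple of natural numbers. The three conditions tested are:
    the content of sigma lies in S, the current conjecture is correct, and
    no extension by elements of S (up to length max_len) changes the conjecture.
    """
    S = range(target_max + 1)
    if not set(sigma) <= set(S):
        return False
    if M(sigma) != target_max:
        return False
    for length in range(1, max_len + 1):
        for rho in product(S, repeat=length):
            if M(sigma + rho) != M(sigma):
                return False
    return True


locked = looks_locking((0, 3), 3)        # the maximum 3 has already been seen
not_locked = looks_locking((0, 1), 3)    # the conjecture is still wrong
```

In this toy setting any prefix of a text for $\{0, \ldots, 3\}$ that already contains 3 passes the check, while a prefix such as `(0, 1)` fails because its conjecture is still incorrect.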

The proof for the locking sequence theorem for sets in imperfect text is given below. The case of the theorem for functions is obtained by simple adjustments.

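5.2 Theorem. Let $M$ be a scientist for sets that identifies the set $S$ in imperfect text, and let $\sigma \in \mathrm{SEQ}$ be a prefix of imperfect text for $S$. Then, for every finite $D \subset \mathbb{N}$ where $\mathrm{content}(\sigma) \subseteq D$, there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq S \cup D$ such that $\sigma \diamond \tau$ is a locking sequence in imperfect text for $M$ on $S$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq S \cup D$, (b) $W_{M(\sigma \diamond \tau)} = S$, and (c) for all $\rho \in \mathrm{SEQ}$, if $\mathrm{content}(\rho) \subseteq S \cup D$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.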

Proof. Let $M$ be a scientist that identifies the set $S \in \mathcal{E}$ in imperfect text. Without loss of generality, assume that $M$ is defined for all $\sigma \in \mathrm{SEQ}$. Now, suppose that no locking sequence $\sigma$ in imperfect text exists. This implies that

there exists some $D \subset \mathbb{N}$ and $\sigma \in \mathrm{SEQ}$ such that, for all $\tau \in \mathrm{SEQ}$ such that $\mathrm{content}(\sigma \diamond \tau) \subseteq S \cup D$ and $W_{M(\sigma \diamond \tau)} = S$, there exists some $\rho \in \mathrm{SEQ}$ such that $\mathrm{content}(\rho) \subseteq S \cup D$ and $M(\sigma \diamond \tau \diamond \rho) \neq M(\sigma \diamond \tau)$. (5.2)

We will show that this implies the existence of an imperfect text $T$ for $S$ which $M$ does not identify.

Let $D$ be a set in the conditions of Equation 5.2 and $U = u_0, u_1, u_2, \ldots$ be an imperfect text for $S$ such that $\mathrm{content}(U) = S \cup D$. We construct the imperfect text $T$ using the prefixes $\tau^i$ for each $i \in \mathbb{N}$. We begin with $\tau^0 = \sigma$ and build each $\tau^{i+1}$ using Algorithm 5.2.

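Algorithm 5.2: Construction of prefix $\tau^{i+1}$ of the imperfect text $T$ for a set $S$

Data: prefix $\tau^i$ of imperfect text $T$
Result: prefix $\tau^{i+1}$ of imperfect text $T$

if $W_{M(\tau^i)} \neq S$ then
  $\tau^{i+1} \leftarrow \tau^i \diamond u_i$;
else
  choose $\rho \in \mathrm{SEQ}$ such that $\mathrm{content}(\rho) \subseteq S \cup D$ and $M(\tau^i \diamond \rho) \neq M(\tau^i)$; // $\rho$ must exist by (5.2), since $\mathrm{content}(\tau^i) \subseteq S \cup D$
  $\tau^{i+1} \leftarrow \tau^i \diamond \rho \diamond u_i$;
end

Observe that every new prefix is built by adding one or more elements to the end of the previous prefix, such that $\tau^i \subset \tau^{i+1}$ for all $i \in \mathbb{N}$. Then, let $T = \bigcup_i \tau^i$.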

Now, $\mathrm{content}(T) \subseteq S \cup D$ by induction, since $\mathrm{content}(\tau^0) = \mathrm{content}(\sigma) \subseteq S \cup D$ and both $\mathrm{content}(\rho) \subseteq S \cup D$ and $u_n \in S \cup D$ for all $n \in \mathbb{N}$, so that $\mathrm{content}(\tau^{i+1}) \subseteq S \cup D$. Additionally, appending $u_i$ at each step of the construction of $T$ ensures that $S \cup D \subseteq \mathrm{content}(T)$, and so $T$ is an imperfect text for $S$. However, $M$ does not converge on $T$ to an index for $S$, since for each prefix $\tau^{i+1}$ either $W_{M(\tau^i)} \neq S$ or $M(\tau^i \diamond \rho) \neq M(\tau^i)$ for some $\rho \in \mathrm{SEQ}$ with $\tau^i \diamond \rho \subseteq \tau^{i+1}$.

Thus, there are infinitely many prefixes at which either $M$ returns an incorrect index for $S$ or $M$ changes its mind with respect to an earlier prefix, and so $M$ does not identify $S$ in imperfect text $T$.


The demonstration of the theorem for informants is also similar, with the proof for sets given below and the proof for functions being identical up to minor modifications.

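5.3 Theorem. Let $M$ be a scientist for sets that identifies the set $S$ in informant, and let $\sigma \in \mathrm{ISEQ}$ be a prefix of informant for $S$. Then there exists a nonempty prefix $\tau$ where $\mathrm{content}(\tau) \subseteq \Psi(\chi_S)$ such that $\sigma \diamond \tau$ is a locking sequence in informant for $M$ on $S$, i.e. (a) $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(\chi_S)$, (b) $W_{M(\sigma \diamond \tau)} = S$, and (c) for all $\rho \in \mathrm{ISEQ}$, if $\mathrm{content}(\rho) \subseteq \Psi(\chi_S)$, then $M(\sigma \diamond \tau \diamond \rho) = M(\sigma \diamond \tau)$.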

Proof. Let $M$ be a scientist that identifies the set $S \in \mathcal{E}$ in informant. Without loss of generality, assume that $M$ is defined for all $\sigma \in \mathrm{ISEQ}$. Now, suppose that there exists $\sigma \in \mathrm{ISEQ}$ such that, for all $\tau \in \mathrm{ISEQ}$, no locking sequence $\sigma \diamond \tau$ exists. This implies that

there exists $\sigma \in \mathrm{ISEQ}$ such that, for all $\tau \in \mathrm{ISEQ}$ such that $\mathrm{content}(\sigma \diamond \tau) \subseteq \Psi(\chi_S)$ and $W_{M(\sigma \diamond \tau)} = S$, there exists some $\rho \in \mathrm{ISEQ}$ such that $\mathrm{content}(\rho) \subseteq \Psi(\chi_S)$ and $M(\sigma \diamond \tau \diamond \rho) \neq M(\sigma \diamond \tau)$. (5.3)

We will show that this implies the existence of an informant $T$ for $S$ which $M$ does not identify.

Let $U = u_0, u_1, u_2, \ldots$ be an informant for $S$, where each $u_i = \langle \langle x_i, y_i \rangle, b \rangle$ and $b \in \{0, 1\}$. We construct informant $T$ using the prefixes $\tau^i$ for each $i \in \mathbb{N}$. We begin with $\tau^0 = \sigma$ and build each $\tau^{i+1}$ using Algorithm 5.3.

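Algorithm 5.3: Construction of prefix $\tau^{i+1}$ of the informant $T$ for a set $S$

Data: prefix $\tau^i$ of informant $T$
Result: prefix $\tau^{i+1}$ of informant $T$

if $W_{M(\tau^i)} \neq S$ then
  $\tau^{i+1} \leftarrow \tau^i \diamond u_i$;
else
  choose $\rho \in \mathrm{ISEQ}$ such that $\mathrm{content}(\rho) \subseteq \Psi(\chi_S)$ and $M(\tau^i \diamond \rho) \neq M(\tau^i)$; // $\rho$ must exist by (5.3), since $\mathrm{content}(\tau^i) \subseteq \Psi(\chi_S)$
  $\tau^{i+1} \leftarrow \tau^i \diamond \rho \diamond u_i$;
end

Observe that every new prefix is built by adding one or more elements to the end of the previous prefix, such that $\tau^i \subset \tau^{i+1}$ for all $i \in \mathbb{N}$. Then, let $T = \bigcup_i \tau^i$.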

Now, $\mathrm{content}(T) \subseteq \Psi(\chi_S)$ by induction, since $\mathrm{content}(\tau^0) = \mathrm{content}(\sigma) \subseteq \Psi(\chi_S)$ and both $\mathrm{content}(\rho) \subseteq \Psi(\chi_S)$ and $u_n \in \Psi(\chi_S)$ for all $n \in \mathbb{N}$, so that $\mathrm{content}(\tau^{i+1}) \subseteq \Psi(\chi_S)$. Additionally, appending $u_i$ at each step of the construction of $T$ ensures that $\Psi(\chi_S) \subseteq \mathrm{content}(T)$, and so $T$ is an informant for $S$. However, $M$ does not converge on $T$ to an index for $S$, since for each prefix $\tau^{i+1}$ either $W_{M(\tau^i)} \neq S$ or $M(\tau^i \diamond \rho) \neq M(\tau^i)$ for some $\rho \in \mathrm{ISEQ}$ with $\tau^i \diamond \rho \subseteq \tau^{i+1}$.

Thus, there are infinitely many prefixes at which either $M$ returns an incorrect index for $S$ or $M$ changes its mind with respect to an earlier prefix, and so $M$ does not identify $S$ in informant $T$.

5.3 A nondecidable, recursively enumerable set

Recall definition 2.62 of the set K.

5.4 Lemma (Rogers [25, p. 62]). K is a recursively enumerable set but not a decidable set.


Proof. Consider the function $\psi$ such that

$$\psi(x) = \begin{cases} 1, & \text{if } \varphi_x(x) \text{ is defined} \\ \text{undefined}, & \text{otherwise.} \end{cases}$$

The function $\psi$ is partial recursive and $K$ is its domain. Hence, $K$ is a recursively enumerable set.

Now, assume that $K$ is also a decidable set. Then its complement $\overline{K}$ is recursively enumerable and so there exists a partial recursive function $\varphi_m$ with index $m$ such that the domain of $\varphi_m$ is $\overline{K} = W_m$.

Then, $m \in K \Leftrightarrow m \in W_m$ by the definition of $K$, but $m \in \overline{K} \Leftrightarrow m \in W_m$ by choice of $m$. This is a contradiction, and so $K$ cannot be decidable.

5.4 Properties of set representations

The set representation of a set of ordered pairs is the set of natural numbers $\langle x, y \rangle$ that represent the ordered pairs $(x, y)$, obtained through the Cantor pairing function described in Definition 1.12. If a set is the set representation of the graph of a function, then it is single-valued. Additionally, if it represents a total function, then it is a total set.

5.5 Definition (Single-valued set). If $S \subseteq \mathbb{N}$ is the set representation of the graph of a function $f$, then $S$ is single-valued, and we say that $S$ is the set representation of $f$. That is to say, $S$ is single-valued if, for all $\langle x, y \rangle \in S$, if $\langle x, z \rangle \in S$, then $y = z$.

5.6 Definition (Total set). If $S \subseteq \mathbb{N}$ is a single-valued set that is the set representation of a total function, then $S$ is total. That is, $S$ is total if, for all $x \in \mathbb{N}$, there is some $y \in \mathbb{N}$ such that $\langle x, y \rangle \in S$.
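The two definitions can be tried out on finite sets of codes. The sketch below assumes the standard form of the Cantor pairing function, $\langle x, y \rangle = \frac{(x+y)(x+y+1)}{2} + y$; Definition 1.12 is not reproduced in this section, so this form is an assumption. The names `pair`, `unpair`, `is_single_valued` and `is_total_on` are chosen for this example, and since totality over all of $\mathbb{N}$ cannot be checked on a finite set, `is_total_on` only tests arguments below a given bound.

```python
import math


def pair(x, y):
    """Cantor pairing of (x, y), in its standard form (assumed here)."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(n):
    """Inverse of pair: recover (x, y) from the code n."""
    w = (math.isqrt(8 * n + 1) - 1) // 2
    y = n - w * (w + 1) // 2
    return w - y, y


def is_single_valued(codes):
    """True if no first component x is paired with two different values y."""
    seen = {}
    for n in codes:
        x, y = unpair(n)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_total_on(codes, bound):
    """Finite stand-in for totality: every x < bound has some <x, y> in codes."""
    xs = {unpair(n)[0] for n in codes}
    return all(x in xs for x in range(bound))


# Set representation of the graph of x -> x*x restricted to {0, ..., 4}.
squares = {pair(x, x * x) for x in range(5)}
```

Here `squares` is single-valued and covers every argument below 5, whereas adding the code `pair(2, 5)` would break single-valuedness because 2 would then be paired with both 4 and 5.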

When S Ď N is the set representation of a partial recursive function then it has certain properties.

5.7 Proposition (Rogers [25, chapter 5, theorem IX]). (i) Let $f$ be a total function and $\Psi(f)$ its set representation. Then

$$f \in \mathcal{R} \Leftrightarrow \Psi(f) \in \mathcal{D} \Leftrightarrow \Psi(f) \in \mathcal{E}.$$

(ii) Let $\varphi$ be a partial function and $\Psi(\varphi)$ its set representation. Then

$$\varphi \in \mathcal{P} \Leftrightarrow \Psi(\varphi) \in \mathcal{E}.$$

5.8 Corollary. If $f$ is a recursive function, then its set representation $\Psi(f)$ is total and recursive.

5.5 Equivalence of Ptolemaic and Copernican orbits

We show that a function is a Ptolemaic orbit if and only if it is a Copernican orbit.

Let $z_P$ be a Ptolemaic orbit such that

$$z_P(t) = \sum_{k=0}^{m} z_k(t)$$

for some $m \in \mathbb{N}$, where $z_0(t) = e^{2\pi i t}$ and $z_k(t) = r_k e^{2\pi i((f_k+1)t+p_k)}$ if $k > 0$, for $r_k \in \mathbb{R}^+$, $f_k, p_k \in \mathbb{R}$. Now, $z_0(t) = e^{2\pi i t} = -e^{\pi i(2t+1)}$. Performing a change of variable $t = s + \frac{1}{2}$, we obtain

$$z_0(t) = -e^{\pi i(2(s+\frac{1}{2})+1)} = -e^{\pi i(2s+2)} = -e^{2\pi i s} = -z_0(s).$$


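Similarly, for each $z_k(t)$ where $k > 0$ we have

$$\begin{aligned}
z_k(t) &= r_k e^{2\pi i((f_k+1)t+p_k)} \\
&= -r_k e^{2\pi i((f_k+1)t+p_k+\frac{1}{2})} \\
&= -r_k e^{2\pi i((f_k+1)(s+\frac{1}{2})+p_k+\frac{1}{2})} \\
&= -r_k e^{2\pi i((f_k+1)s+p_k+\frac{f_k}{2}+1)} \\
&= -r_k e^{2\pi i((f_k+1)s+p_k+\frac{f_k}{2})} \\
&= -r_{k'} e^{2\pi i((f_{k'}+1)s+p_{k'})} \\
&= -z_{k'}(s)
\end{aligned}$$

where $r_{k'} = r_k$, $f_{k'} = f_k$, and $p_{k'} = p_k + \frac{f_k}{2}$.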

Then, $z_P$ is of the form

$$\sum_{k'=0}^{m'} z^p_{k'}(s) - \sum_{j'=0}^{n'} z^e_{j'}(s),$$

where $m' = 0$, $n' = m$, $s_0 = t_0 - \frac{1}{2}$, $r_{j'} = r_k$, $f_{j'} = f_k$, and $p_{j'} = p_k + \frac{f_k}{2}$.

Thus, every Ptolemaic orbit is a Copernican orbit.

Conversely, let us consider a Copernican orbit $z_C$ such that

$$z_C(t) = \sum_{k=0}^{m} z^p_k(t) - \sum_{j=0}^{n} z^e_j(t)$$

for some $m, n \in \mathbb{N}$, where $z^e_0(t) = e^{2\pi i t}$, $z^p_k(t) = r_k e^{2\pi i((f_k+1)t+p_k)}$ for $k \geq 0$ and $z^e_j(t) = r_j e^{2\pi i((f_j+1)t+p_j)}$ for $j > 0$, where $r_k, r_j \in \mathbb{R}^+$ and $f_k, f_j, p_k, p_j \in \mathbb{R}$.

Now, $-z^e_0(t) = -e^{2\pi i t} = e^{\pi i(2t+1)}$. Performing a change of variable $t = s + \frac{1}{2}$, we obtain

$$-z^e_0(t) = e^{\pi i(2(s+\frac{1}{2})+1)} = e^{\pi i(2s+2)} = e^{2\pi i s} = z_0(s).$$

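Again, for each $z^e_j(t)$ where $j > 0$ we have

$$\begin{aligned}
-z^e_j(t) &= -r_j e^{2\pi i((f_j+1)t+p_j)} \\
&= r_j e^{2\pi i((f_j+1)t+p_j+\frac{1}{2})} \\
&= r_j e^{2\pi i((f_j+1)(s+\frac{1}{2})+p_j+\frac{1}{2})} \\
&= r_j e^{2\pi i((f_j+1)s+p_j+\frac{f_j}{2}+1)} \\
&= r_j e^{2\pi i((f_j+1)s+p_j+\frac{f_j}{2})} \\
&= r_{j'} e^{2\pi i((f_{j'}+1)s+p_{j'})} \\
&= z_{j'}(s)
\end{aligned}$$

where $r_{j'} = r_j$, $f_{j'} = f_j$, and $p_{j'} = p_j + \frac{f_j}{2}$.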

Then, $z_C$ is of the form

$$\sum_{j'=0}^{n} z_{j'}(s) + \sum_{k'=0}^{m} z_{k'}(s),$$

where $s_0 = t_0 - \frac{1}{2}$, $r_{j'} = r_j$, $f_{j'} = f_j$, $p_{j'} = p_j + \frac{f_j}{2}$, $r_{k'} = r_k$, $f_{k'} = f_k$, and $p_{k'} = p_k$.

Thus, every Copernican orbit is a Ptolemaic orbit.
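The key identity used in the first half of the argument, namely that each term satisfies $z_k(s + \frac{1}{2}) = -z_{k'}(s)$ with $p_{k'} = p_k + \frac{f_k}{2}$, can be checked numerically. The sketch below is only a sanity check on randomly chosen parameters, not a substitute for the proof; the function names and the parameter ranges are assumptions made for this example.

```python
import cmath
import random


def term(r, f, p, t):
    """One term r * exp(2*pi*i*((f + 1)*t + p))."""
    return r * cmath.exp(2j * cmath.pi * ((f + 1) * t + p))


def ptolemaic(epicycles, t):
    """Deferent exp(2*pi*i*t) plus the epicycles, each given as (r, f, p)."""
    return term(1.0, 0.0, 0.0, t) + sum(term(r, f, p, t) for r, f, p in epicycles)


def shifted_sum(epicycles, s):
    """Sum of the terms z_{k'}(s), where p_{k'} = p_k + f_k / 2 (deferent included)."""
    terms = [(1.0, 0.0, 0.0)] + list(epicycles)
    return sum(term(r, f, p + f / 2, s) for r, f, p in terms)


random.seed(0)
epicycles = [
    (random.uniform(0.1, 2.0), random.uniform(-3.0, 3.0), random.uniform(0.0, 1.0))
    for _ in range(4)
]

# Largest deviation from z_P(s + 1/2) = -sum_k z_{k'}(s) over a few sample times.
worst = max(
    abs(ptolemaic(epicycles, s + 0.5) + shifted_sum(epicycles, s))
    for s in [0.0, 0.3, 1.7, 4.2, 9.9]
)
```

The deviation `worst` should be at the level of floating-point rounding, which is consistent with the Ptolemaic orbit being the negative of a sum of Earth-type terms in the shifted variable $s$.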


Bibliography

[1] Angluin, Dana. Inductive inference of formal languages from positive data. Information and Control, 45(2):117–135, 1980.

[2] Blum, Lenore and Blum, Manuel. Toward a mathematical theory of inductive inference. Information and Control, 28(2):125–155, 1975.

[3] Case, John. Algorithmic Scientific Inference: Within Our Computable Expected Reality. International Journal of Unconventional Computing, 8(3):192–206, 2012.

[4] Case, John and Lynes, Christopher. Machine inductive inference and language identification. In Proceedings of the 9th International Colloquium on Automata, Languages and Programming, volume 140, pages 107–115. Springer Berlin Heidelberg, 1982.

[5] Case, John and Smith, Carl. Anomaly hierarchies of mechanized inductive inference. In Proceedings of the tenth annual ACM symposium on Theory of computing, pages 314–319. ACM, 1978.

[6] Case, John and Smith, Carl. Comparison of identification criteria for machine inductive inference. Theoretical Computer Science, 25(2):193–220, 1983.

[7] Crowe, Michael J. Theories of the World from Antiquity to the Copernican Revolution. Dover Publications, Mineola, New York, second edition, 2001. First edition published in 1990.

[8] Cutland, Nigel J. Computability: An introduction to recursive function theory. Cambridge University Press, 1980.

[9] Ginnobili, Santiago and Carman, Christián C. Deferentes, epiciclos y adaptaciones. In Martins, Roberto de Andrade, Silva, Cibelle Celestino, Ferreira, Juliana Mesquita Hidalgo, and Martins, Lilian Al-Chueyr Pereira, editors, Filosofia e História da Ciência no Cone Sul: Seleção de Trabalhos do 5º Encontro, pages 399–408. Campinas: Associação de Filosofia e História da Ciência do Cone Sul (AFHIC), 2008.

[10] Gold, E. Mark. Language identification in the limit. Information and Control, 10:447–474, 1967.

[11] Hanson, Norwood Russell. Constellations and Conjectures. D. Reidel Publishing Company, 1973.

[12] Jain, Sanjay, Osherson, Daniel, Royer, James S., and Sharma, Arun. Systems That Learn: An Introduction to Learning Theory. MIT Press, Cambridge, Massachusetts, second edition, 1999.

[13] Johnson, Kent. Gold's Theorem and Cognitive Science. Philosophy of Science, 71:571–592, 2004.

[14] Kemeny, John G. A Philosopher Looks at Science. D. Van Nostrand Company, 1959.

[15] Kononenko, Igor and Kukar, Matjaž. Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, 2007.

[16] Langley, Pat, Simon, Herbert A., Bradshaw, Gary L., and Zytkow, Jan M. Scientific Discovery: Computational Explorations of the Creative Process. MIT Press, Cambridge, Massachusetts, 1987.

[17] Lobão, Martim Cortez de. Orbits with epicycles on a deferent, Published May 6, 2014. http://demonstrations.wolfram.com/OrbitsWithEpicyclesOnADeferent/.

[18] Marcus, Gary F. Negative evidence in language acquisition. Cognition, 46(1):53–85, 1993.

[19] Meyer, Albert R. and Ritchie, Dennis M. The complexity of loop programs. In Proceedings of the 22nd national ACM conference, pages 465–469. ACM, 1967.

[20] Munkres, James. Topology. Prentice Hall, second edition, 2000. First edition published in 1975.

[21] Osherson, Daniel N., Stob, Michael, and Weinstein, Scott. Systems That Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists. MIT Press, Cambridge, Massachusetts, 1986.

[22] Péter, Rózsa. Recursive Functions in Computer Theory. Number 9 in Computers and Their Applications. Ellis Horwood Publishers, 1981.

[23] Popper, Karl R. The Logic of Scientific Discovery. Routledge Classics, 2002. First edition published in 1935 by Verlag von Julius Springer.

[24] Rogers, Hartley, Jr. Gödel Numberings of Partial Recursive Functions. The Journal of Symbolic Logic, 23(3):331–341, 1958.

[25] Rogers, Hartley, Jr. Theory of Recursive Functions and Effective Computability. MIT Press, third edition, 1987. First edition published in 1967 by McGraw Hill.

[26] Szudzik, Matthew P. The Computable Universe Hypothesis. A Computable Universe, pages 479–523, 2013. arXiv:1003.5831v6.

[27] Wexler, Kenneth and Culicover, Peter W. Formal Principles of Language Acquisition. MIT Press, Cambridge, Massachusetts, 1980.


