Lifelong Machine Learning and Reasoning
Daniel L. Silver
Acadia University,
Wolfville, NS, Canada
Talk Outline
- Position and Motivation
- Lifelong Machine Learning
- Deep Learning Architectures
- Neural-Symbolic Integration
- Learning to Reason
- Summary and Recommendations
Team Silver
Position
It is now appropriate to seriously consider the nature of systems that learn and reason over a lifetime.

We advocate a systems approach in the context of an agent that can:
- Acquire new knowledge through learning
- Retain and consolidate that knowledge
- Use it in future learning, reasoning, and other aspects of AI
Moving Beyond Learning Algorithms - Rationale
1. Strong foundation in prior work
2. Inductive bias is essential to learning (Mitchell, Utgoff 1983; Wolpert 1996)
   - Learning systems should retain and use prior knowledge as a source for shifting inductive bias
   - Many real-world problems are non-stationary and exhibit drift
Moving Beyond Learning Algorithms - Rationale
3. Practical agents and robots require LML
   - Advances in autonomous robotics and in intelligent agents that run on the web or on mobile devices present opportunities for employing LML systems.
   - The ability to retain and use learned knowledge is very attractive to the researchers designing these systems.
Moving Beyond Learning Algorithms - Rationale
4. Increasing capacity of computers
   - Advances in modern computers provide the computational power for implementing and testing practical LML systems.
   - IBM's Watson (2011): 90 IBM Power-7 servers, each with four 8-core processors, and 15 TB (220M text pages) of RAM; tasks were divided into thousands of stand-alone jobs distributed across roughly 80 teraflops of compute (a teraflop is one trillion operations per second).
Moving Beyond Learning Algorithms - Rationale
5. Theoretical advances in AI: ML and KR
   - "The acquisition, representation and transfer of domain knowledge are the key scientific concerns that arise in lifelong learning." (Thrun 1997)
   - KR plays an important role in LML: the interaction between knowledge retention and transfer
   - LML has the potential to make advances on the learning of common background knowledge
   - This leads to questions about learning to reason
Lifelong Machine Learning
[Photos, 1994 and 2013: my first biological learning system]
Lifelong Machine Learning
- Considers systems that can learn many tasks over a lifetime, from one or more domains
- Concerned with methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning
- We investigate systems that must learn:
  - from impoverished training sets
  - for diverse domains of tasks
  - where practice of the same task happens
- Applications: agents, robotics, data mining, user modeling
Lifelong Machine Learning Framework
[Figure: the LML framework. An inductive learning system (short-term memory) draws training and testing examples (x_i, y = f(x_i)) from an instance space X and induces a model/classifier h, used for prediction/action h(x). Learned models are retained in a long-term store of universal/domain knowledge; knowledge selection and knowledge transfer feed an inductive bias B_D back to the short-term learner.]
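To make the flow concrete, here is a minimal Python sketch of the framework loop; the class and its helpers (select_bias, fit, and the consolidation step) are hypothetical stand-ins for exposition, not the lab's implementation.

```python
# Minimal sketch of the LML framework loop; the learner and its
# helpers are hypothetical stand-ins, not an actual LML system.

class LifelongLearner:
    def __init__(self):
        self.domain_knowledge = {}              # long-term retention

    def select_bias(self, task_id):
        """Knowledge selection: pick prior knowledge relevant to the task."""
        return self.domain_knowledge.get(task_id)   # naive exact-task lookup

    def fit(self, examples, bias):
        """Stand-in inductive learner: majority label, seeded by the bias."""
        labels = [y for _, y in examples] + ([bias] if bias is not None else [])
        majority = max(set(labels), key=labels.count)
        return lambda x: majority               # model h: prediction = h(x)

    def learn(self, task_id, examples):
        """Short-term learning under transferred bias, then retention."""
        h = self.fit(examples, self.select_bias(task_id))
        self.domain_knowledge[task_id] = h(None)    # consolidate into long-term store
        return h

learner = LifelongLearner()
h = learner.learn("task-1", [((0, 1), 1), ((1, 0), 1), ((1, 1), 0)])
print(h((0, 0)))    # -> 1 (majority label retained as domain knowledge)
```

The loop mirrors the diagram: short-term learning consults long-term domain knowledge for its inductive bias and consolidates its result back.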
Essential Ingredients of LML
The retention (or consolidation) of learned task knowledge (the Knowledge Representation perspective). Effective and efficient retention:
- Resists the accumulation of erroneous knowledge
- Maintains or improves model performance
- Mitigates redundant representation
- Allows the practice of tasks
Essential Ingredients of LML
The selective transfer of prior knowledge when learning new tasks (the Machine Learning perspective). More effective and efficient learning:
- More rapidly produces models that perform better
- Selects an appropriate inductive bias to guide search
Essential Ingredients of LML
A systems approach:
- Ensures the effective and efficient interaction of the retention and transfer components
- Much can be learned from the writings of early cognitive scientists, AI researchers, and neuroscientists such as Albus, Holland, Newell, Langley, Johnson-Laird, and Minsky
Overview of LML Work
- Supervised learning
- Unsupervised learning
- Hybrids (semi-supervised, self-taught, co-training, etc.)
- Reinforcement learning (Mark Ring, Rich Sutton, Tanaka and Yamamura)
Supervised LML
- Michalski (1980s): constructive inductive learning
- Utgoff and Mitchell (1983): the importance of inductive bias to learning; systems should be able to search for an appropriate inductive bias using prior knowledge
- Solomonoff (1989): incremental learning
- Thrun and Mitchell (1990s): explanation-based neural networks (EBNN); lifelong learning
LML via csMTL (context-sensitive multiple task learning)

[Figure: the csMTL architecture. Task-context inputs c1..ck and standard inputs x1..xn feed a short-term learning network with a single output f'(c, x) for all tasks. Representational transfer from the long-term consolidated domain knowledge (CDK) network f1(c, x) enables rapid learning; task rehearsal (functional transfer via virtual examples) supports slow consolidation into the CDK network.]

Silver, Poirier, Currie (also Tu, Fowler). Inductive transfer with context-sensitive neural networks. Machine Learning (2008) 73:313–336.

A toy sketch of the input encoding follows.
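Here is a toy NumPy sketch of that encoding, offered as an illustration rather than the networks used in the 2008 paper: a one-hot task context c is concatenated with the standard inputs x, a hidden layer is shared across tasks, and one sigmoid output f(c, x) serves every task.

```python
import numpy as np

# Toy csMTL sketch (illustrative, not the 2008 paper's code): the
# one-hot task context c is concatenated with the standard inputs x,
# and a single sigmoid output f(c, x) serves every task.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_tasks, n_inputs, n_hidden = 3, 5, 8
W1 = rng.normal(0, 0.1, (n_tasks + n_inputs, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, 1))

def forward(c, x):
    h = np.tanh(np.concatenate([c, x]) @ W1)    # shared hidden layer
    return sigmoid(h @ W2)[0], h

def sgd_step(c, x, y, lr=0.1):
    """One stochastic gradient descent step on squared error."""
    global W1, W2
    p, h = forward(c, x)
    err = p - y
    dW2 = np.outer(h, err * p * (1 - p))
    dh = (err * p * (1 - p)) * W2[:, 0] * (1 - h ** 2)
    dW1 = np.outer(np.concatenate([c, x]), dh)
    W1 -= lr * dW1
    W2 -= lr * dW2

# Training example for task 0: the context one-hot selects the task.
c = np.eye(n_tasks)[0]
x = rng.normal(size=n_inputs)
sgd_step(c, x, 1.0)
print(forward(c, x)[0])   # prediction for task 0 after one update
```

Because tasks are distinguished by context inputs rather than separate output nodes, examples from every task train one shared representation, which is what permits transfer between them.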
An Environmental Example
Stream flow rate prediction [Lisa Gaudette, 2006]
x = weather data
f(x) = flow rate
[Plot: mean absolute error, MAE (m^3/s), versus years of data transferred (0 to 6), comparing no transfer with transfer from the Wilmot, Sharpe, Sharpe & Wilmot, and Shubenacadie rivers.]
csMTL and Tasks with Multiple Outputs
Liangliang Tu (2010), image morphing:
- Inductive transfer between tasks that have multiple outputs
- Transforms 30 x 30 grey-scale images using inductive transfer
- Three mapping tasks (labelled NA, NH, NS in the figure)
csMTL and Tasks with Multiple Outputs: Demo
Two more Morphed Images
[Images: a passport photo morphed to angry and to sad, each with a filtered version]
Unsupervised LML
Deep Learning Architectures
Consider the problem of trying to classify these hand-written digits.
Hinton, G. E., Osindero, S. and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation 18, pp. 1527-1554.
Layered networks of unsupervised auto-encoders efficiently develop hierarchies of features that capture regularities in their respective inputs
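To illustrate the layer-wise idea, here is a toy tied-weight autoencoder stack in Python; it is a sketch of greedy unsupervised feature learning, not Hinton et al.'s RBM-based algorithm, and the layer sizes are simply borrowed from the digit example on the next slide.

```python
import numpy as np

# Toy greedy layer-wise autoencoder stack (illustrative sketch, not
# Hinton et al.'s RBM training): each layer learns to reconstruct the
# previous layer's codes, building a hierarchy of features.

rng = np.random.default_rng(0)

def train_autoencoder(data, n_hidden, lr=0.1, epochs=20):
    """Tied-weight autoencoder trained by gradient descent on MSE."""
    n_vis = data.shape[1]
    W = rng.normal(0, 0.1, (n_vis, n_hidden))
    for _ in range(epochs):
        h = np.tanh(data @ W)            # encode
        recon = h @ W.T                  # decode with tied weights
        err = recon - data
        # Gradient of reconstruction error through both weight uses
        dW = data.T @ ((err @ W) * (1 - h ** 2)) + err.T @ h
        W -= lr * dW / len(data)
    return W

def build_stack(data, layer_sizes):
    """Greedy layer-wise training: feed each layer's codes to the next."""
    weights, codes = [], data
    for n_hidden in layer_sizes:
        W = train_autoencoder(codes, n_hidden)
        weights.append(W)
        codes = np.tanh(codes @ W)       # codes become next layer's input
    return weights

X = rng.normal(size=(100, 28 * 28))      # stand-in for digit images
stack = build_stack(X, [500, 500, 2000]) # sizes from the DBN slide
```

Hinton et al. instead train restricted Boltzmann machines with contrastive divergence; the greedy layer-by-layer structure is the shared idea.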
Deep Learning Architectures
[Figure: a deep belief network for digit recognition. Inputs are images of the digits 0-9 (28 x 28 pixels); two hidden layers of 500 neurons capture low-level and then higher-level features; 2000 top-level artificial neurons connect to the ten digit labels.]

DLA neural network:
- Unsupervised training, followed by back-fitting
- 40,000 training examples
- Learns to recognize digits using labels, and to reconstruct digits given a label
- Stochastic in nature
Deep Learning Architectures
[Image courtesy of http://youqianhaozhe.com/research.htm]
Develop common features from unlabelled examples using unsupervised algorithms
Deep Learning Architectures
Andrew Ng’s work on deep learning networks (ICML 2012):
- Problem: learn to recognize human faces, cats, etc. from unlabeled data
- Dataset of 10 million images; each image has 200x200 pixels
- 9-layer locally connected neural network (1B connections)
- Parallel algorithm; 1,000 machines (16,000 cores) for three days
Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng. Building high-level features using large scale unsupervised learning. ICML 2012: 29th International Conference on Machine Learning, Edinburgh, Scotland, June 2012.
Deep Learning Architectures
Results:
- A face detector that is 81.7% accurate
- Robust to translation, scaling, and rotation

Further results:
- 15.8% accuracy in recognizing 20,000 object categories from ImageNet
- A 70% relative improvement over the previous state of the art
Deep Learning Architectures
- Stimulates new ideas about how knowledge of the world is learned, consolidated, and then used for future learning and reasoning
- Learning and representation of common background knowledge
- Important to Big AI problem solving
LMLR: Learning to Reason
ML and KR … a very interesting area
- Knowledge consolidation provides insights into how best to represent common knowledge for use in learning and reasoning
- A survey of learning and reasoning paradigms has identified two additional promising bodies of work:
  - NSI: Neural-Symbolic Integration
  - L2R: Learning to Reason
Neural-Symbolic Integration
Considers hybrid systems that integrate neural networks and symbolic logic. Takes advantage of:
- the learning capacity of connectionist networks
- the transparency and reasoning capacity of logic

[Garcez09, Lamb08]

A minimal rule-encoding sketch follows.
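As one concrete flavour of such integration, a propositional rule can be compiled into a neural unit and later refined by learning. The sketch below assumes a KBANN-style encoding, which is one classic NSI technique and not necessarily the approach of the frameworks cited above.

```python
import math

# KBANN-style rule-to-network encoding (illustrative assumption, not
# the cited frameworks): the rule "C <- A AND B" becomes a sigmoid
# unit whose weights implement the conjunction, so the encoded logic
# can later be refined by gradient-based learning.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W = 5.0                      # strength of each antecedent link

def rule_unit(a, b):
    """Neuron encoding A AND B: fires only when both antecedents hold."""
    bias = -(2 - 0.5) * W    # threshold between 1 and 2 true antecedents
    return sigmoid(W * a + W * b + bias)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(rule_unit(a, b)))   # ~1 only for a = b = 1
```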
Neural-Symbolic Integration

[Figure: an integrated framework for NSI and LML, adapted from Bader and Hitzler (2005)]
Open Questions
Choice of machine learning to use:
- Which choice of ML works best in the context of knowledge for reasoning?
- Unsupervised learning is taking a more central role
- Others feel that reinforcement learning is the only true predictive modeling
- Hybrid methods are a challenge for knowledge consolidation
Open Questions
Training examples versus prior knowledge:
- Both NSI and LML systems must weigh the accuracy and relevance of retained knowledge
- Theories of how to selectively transfer common knowledge are needed
- Measures of relatedness are needed
[Photo: a small Nova Scotia trout!]
Open Questions
Effective and efficient knowledge retention:
- Refinement and consolidation are key to NSI and LML
- Stability-plasticity: no loss of prior knowledge; increase accuracy/resolution if possible
- The approach should allow NSI/LML to efficiently select knowledge for use
- Has the potential to make serious advances on the learning of common background knowledge
Open Questions
Effective and efficient knowledge transfer:
- Transfer learning should quickly develop accurate models
- Model accuracy should never degrade
- Functional transfer yields more accurate models (e.g., rehearsal of examples from prior tasks)
- Representational transfer yields more rapid learning (e.g., priming with the weights of prior models)

Both transfer styles are contrasted in the sketch below.
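Here is a toy, one-weight contrast of the two styles; the train helper and the numbers are illustrative assumptions, not any particular LML system.

```python
import random

# Contrast of the two transfer styles on a toy 1-D learner
# (illustrative sketch, not a specific LML system).

def train(examples, init_w=0.0, lr=0.01, epochs=100):
    """Fit y = w * x by gradient descent, starting from init_w."""
    w = init_w
    for _ in range(epochs):
        for x, y in examples:
            w -= lr * (w * x - y) * x
    return w

prior_w = 2.0                                        # weight learned on a prior task
new_data = [(x, 2.1 * x) for x in (1.0, 2.0, 3.0)]   # impoverished related new task

# Functional transfer: rehearse virtual examples generated by the
# prior model alongside the new training set.
virtual = [(x, prior_w * x) for x in (random.uniform(-1, 1) for _ in range(20))]
w_func = train(new_data + virtual)

# Representational transfer: prime the search with the prior weights.
w_repr = train(new_data, init_w=prior_w)

print(w_func, w_repr)
```

Rehearsal lets knowledge of the prior task shape what the new model fits (functional transfer), while priming only shapes where the search starts (representational transfer).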
Open Questions
Practice makes perfect!
- An LML system must be capable of learning from examples of tasks over a lifetime
- Practice should increase model accuracy and overall domain knowledge
- How can this be done?
- This research is important to AI, psychology, and education
Open Questions
Scalability:
- For NSI, symbolic extraction is demanding
- For LML, retention and transfer add complexity
- Both must scale to large numbers of inputs and outputs, training examples, and tasks over a lifetime
- Big Data means big scaling problems
Learning to Reason (L2R)
- Takes a probabilistic perspective on learning and reasoning [Khardon and Roth 97]
- An agent need not answer all possible knowledge queries, only those that are relevant to the environment of the learner in a probably approximately correct (PAC) sense, i.e., with respect to some probability distribution [Valiant 08; Juba 12 & 13]
- Assertions can be learned to a desired level of accuracy and confidence using training examples of the assertions
Learning to Reason (L2R)
We are working on an LMLR approach that:
- Uses multiple task learning, primed by unsupervised deep learning
- PAC-learns multiple logical assertions expressed as binary examples of Boolean functions
- Reasons by querying the trained network with similar Boolean examples and looking for sufficient agreement on true/false

It uses a combination of:
- A DLA to create hierarchies of abstract DNF-like features
- Consolidation to integrate new assertions with prior knowledge and to share abstract features across a domain knowledge model
Learning to Reason (L2R)

Example: to learn the assertions (A ∧ B) ∨ C = True and (A ∨ C) ∧ D = True, the L2R system would be provided with examples of the Boolean functions equivalent to each assertion, subject to a distribution D over the examples (* marks a variable the assertion does not constrain):

(A ∧ B) ∨ C          (A ∨ C) ∧ D
a b c d  T           a b c d  T
0 0 0 *  0           0 * 0 0  0
0 0 1 *  1           0 * 0 1  0
0 1 0 *  0           0 * 1 0  0
0 1 1 *  1           0 * 1 1  1
1 0 0 *  0           1 * 0 0  0
1 0 1 *  1           1 * 0 1  1
1 1 0 *  1           1 * 1 0  0
1 1 1 *  1           1 * 1 1  1

To query the L2R system with an assertion such as A ∨ ~C = True, examples of this function would be used to test the system to see whether it agrees.
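A toy rendering of this query protocol follows, using a nearest-neighbour stand-in for the trained network (an assumption for brevity; the actual approach uses consolidated csMTL/DLA networks).

```python
from itertools import product

# Toy L2R query protocol (illustrative; a 1-nearest-neighbour model
# stands in for the trained and consolidated network).

def examples(f):
    """All (a, b, c, d) -> truth-value examples of a Boolean function."""
    return [(bits, f(*bits)) for bits in product([0, 1], repeat=4)]

# Retained knowledge: examples of the two learned assertions.
known = examples(lambda a, b, c, d: int((a and b) or c)) + \
        examples(lambda a, b, c, d: int((a or c) and d))

def predict(x):
    """Stand-in learner: label of the nearest retained example."""
    return min(known, key=lambda r: sum(u != v for u, v in zip(r[0], x)))[1]

# Query: present examples of the assertion A v ~C = True and measure
# how often the system's predictions agree with their labels.
query = examples(lambda a, b, c, d: int(a or not c))
agreement = sum(predict(x) == t for x, t in query) / len(query)
print(f"agreement with (A v ~C) = True: {agreement:.0%}")
```

If agreement exceeds a chosen threshold, the system is taken to entail the queried assertion in the PAC sense described above.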
Summary
- Propose that the AI community move to systems that are capable of learning, retaining and using knowledge over a lifetime
- Opportunities for advances in AI lie at the intersection of machine learning and knowledge representation
- Consider the acquisition of knowledge in a form that can be used for more general AI, such as Learning to Reason (L2R)
- Methods of knowledge consolidation will provide insights into how best to represent common knowledge, fundamental to intelligent systems
Recommendations
Researchers should:
- Exploit common ground
- Explore differences
- Find low-hanging fruit
- Make new discoveries

Encourage the pursuit of AI systems that are able to learn the knowledge that they use for reasoning.
Thank You!
QUESTIONS?
[email protected] http://tinyurl/dsilver http://ml3.acadiau.ca