Lecture 23


Page 1: Lecture23

Introduction to Machine Learning

Lecture 23: Learning Classifier Systems

Albert Orriols i Puig
http://www.albertorriols.net

[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture23

Recap of Lectures 21-22: Value Functions

Vπ(s): long-term reward estimate obtained from state s when following policy π

Qπ(s,a): long-term reward estimate obtained from state s when executing action a and then following policy π

The long-term reward is a recency-weighted average of the received rewards

[Diagram: trajectory of states, actions, and rewards: st, at, rt → st+1, at+1, rt+1 → st+2, at+2, rt+2 → st+3, at+3, rt+3 → …]


Page 3: Lecture23

Recap of Lectures 21-22: Q-learning
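As a reminder of how the update works, here is a minimal sketch of the tabular Q-learning rule recapped above; the function name q_update and the parameter values for the learning rate alpha and discount gamma are illustrative choices, not part of the lecture material.

```python
from collections import defaultdict

# Tabular Q-learning update:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Widrow-Hoff-style step toward the one-step Q-learning target."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Toy usage: a defaultdict gives unseen (state, action) pairs a value of 0.0
Q = defaultdict(float)
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```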


Page 4: Lecture23

Today’s Agenda

The Origins of LCSs

Michigan-style LCSs

Pittsburgh-style LCSs


Page 5: Lecture23

Original Idea of LCS: Holland's Vision of Cognitive Systems

Create true artificial intelligence

True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978)

Holland's vision, going back to the late 50s and early 60s, of roving bands of computer programs.

Holland's notion of genetic search as program searching (1962):

"The free generation procedure . . . requires the generators (and combinations of generators) to 'shift' and 'connect' at random in the computer … two or more generators occupying adjacent modules ('in contact') may become connected. Such connected sets of generators are to shift as a unit."


From stimulus-response to internal states and modifiable detectors and effectors


Page 6: Lecture23

First LCS Implementation

CS-1 (Holland & Reitman, 1978)

Post-production system

General memory containing classifiers

Process:

Code the situation and find in memory the actions that are appropriate to both the CS-1 goal and the situation

Store in memory the consequences of these actions (learning)

Generate new, good productions (classifiers) to endure.

Population of classifiers: the current system knowledge

Performance component: short-term behavior of the system


Rule discovery component: gets new promising rules

Page 7: Lecture23

Meanwhile, at the University of Pittsburgh: Smith's Interpretation of Holland's GA Vision

Smith's notion of learning as adaptive search (1980, 1983)

LS-1: "Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task"

Great success! LS-1 took Waterman’s poker player to the cleaners (not bluffing)


Page 8: Lecture23

Two Models

Here, two lines of work started: Michigan-style vs. Pittsburgh-style LCSs

Michigan-style LCSs: cognitive system; individual = rule; solution = the whole population; apportionment of credit; reinforcement learning.

Pittsburgh-style LCSs: straight GA; individual = set of rules; solution = the best individual; usually offline systems.


We focus on Michigan-style LCSs.

Page 9: Lecture23

Michigan-style LCSs: General Schema

[Diagram: the classifier system interacts with the environment, receiving a sensorial state and a reward and returning an action; internally it holds a population of classifiers (Classifier 1, Classifier 2, …, Classifier n) and a genetic algorithm.]

Online rule evaluation: in XCS, Q-learning (Sutton & Barto, 1998), using the Widrow-Hoff delta rule.

Any representation: production rules, genetic programs, perceptrons, SVMs.

Rule evolution: typically, a GA (Holland, 75; Goldberg, 89) applied on the population.


Page 10: Lecture23

Knowledge Representation

The knowledge representation consists of:

A population of classifiers, usually independent of each other

Each classifier has a condition part C, an action part A, and a prediction part P, interpreted as:

If condition C is satisfied and action A is executed, then P is expected to be true

Solution for a new problem:

Get the classifiers that match the sensorial state


Decide which action should be used among the actions of the selected classifiers
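To make the slide concrete, here is a minimal sketch of how such a classifier might be represented as a data structure; the class and field names are illustrative, not taken from any particular LCS implementation.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """One rule: if `condition` matches the state and `action` is executed,
    `prediction` (a scalar reward estimate) is expected."""
    condition: str     # e.g. a ternary string such as "01#1"
    action: int        # A: the advocated action
    prediction: float  # P: expected payoff
    error: float       # epsilon: error of the prediction
    fitness: float     # F: accuracy-based fitness
```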


Page 11: Lecture23

Condition Structures

The condition structure depends on the types of attributes

Binary attributes: ternary encoding {0, 1, #}

If v1 is '0' and v2 is '1' and v3 is '#' … and vn is '0', then action_i

Continuous attributes: interval-based encoding

If v1 in [l1, u1] and v2 in [l2, u2] … and vn in [ln, un], then action_i

Hyperellipsoid encoding
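A small sketch of how these two encodings could be matched against a sensorial state; the helper names matches_ternary and matches_intervals are illustrative assumptions, not standard API names.

```python
def matches_ternary(condition, state):
    """Ternary condition {0, 1, #}: '#' matches any bit of a binary state."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matches_intervals(intervals, state):
    """Interval condition: every attribute v_i must fall within [l_i, u_i]."""
    return all(l <= v <= u for (l, u), v in zip(intervals, state))

# Toy usage
assert matches_ternary("01#1", "0111")
assert matches_intervals([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.6])
```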


Page 12: Lecture23

Condition Structures

The condition structure depends on the types of attributes

Many other representations:

Partial matching (Booker, 1985)

Default hierarchies (Holland et al., 1986)

Fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009)

Neural-network-based encodings (Bull & O’Hara, 2002)

GP tree encodings with S-expressions (Lanzi, 1999)


Page 13: Lecture23

Prediction

Prediction can be:

Scalar number

Line

Polynomial

Neural network

We will consider the initial idea: prediction is a scalar number.


Page 14: Lecture23

Learning Interaction in XCS

[Diagram: the XCS learning interaction.]

The environment supplies a problem instance. The classifiers in the population [P] whose conditions match it form the match set [M]. Each classifier carries a condition C, an action A, a prediction P, a prediction error ε, a fitness F, a numerosity num, an action-set size estimate as, a time stamp ts, and an experience counter exp.

From [M], a prediction array is built and an action is selected; the environment returns a reward. The classifiers in [M] that advocate the selected action form the action set [A], whose parameters are updated with the Widrow-Hoff rule; in multi-step problems the delayed reward is used to update the previous action set [A]-1.

The genetic algorithm (selection, reproduction, and mutation) is applied within the action set, with competition in the niche and fitness sharing; deletion removes classifiers when the population is full.
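A simplified sketch of the performance cycle described in the diagram, assuming the illustrative Classifier fields and matching helper introduced on the previous slides; real XCS also handles covering, exploration control, and multi-step credit assignment.

```python
def performance_cycle(population, state, matches, select_action):
    """One simplified XCS-like step: build [M], the prediction array, and [A].
    `matches` and `select_action` are assumed, problem-specific helpers."""
    # Match set [M]: classifiers whose condition matches the current state
    M = [cl for cl in population if matches(cl.condition, state)]

    # Prediction array: fitness-weighted prediction for each advocated action
    prediction_array = {}
    for a in {cl.action for cl in M}:
        advocates = [cl for cl in M if cl.action == a]
        num = sum(cl.fitness * cl.prediction for cl in advocates)
        den = sum(cl.fitness for cl in advocates)
        prediction_array[a] = num / den if den > 0 else 0.0

    action = select_action(prediction_array)        # e.g. epsilon-greedy
    A = [cl for cl in M if cl.action == action]     # action set [A]
    return action, A
```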

Page 15: Lecture23

Estimate Classifier Prediction

Three key parameters:

Prediction: What I will get if I select the action

Error: the error of that prediction

Does it sound familiar? Q-learning!

Fitness: how good my classifier is


These parameters are estimated online.
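A hedged sketch of how these three parameters might be updated online with the Widrow-Hoff rule; the symbols beta, epsilon0, alpha, and nu follow the usual XCS notation, but the code simplifies the full update (for instance, it omits the MAM phase and numerosity weighting).

```python
def update_action_set(A, payoff, beta=0.2, epsilon0=0.01, alpha=0.1, nu=5):
    """Widrow-Hoff-style online updates for every classifier in the action set.
    Simplified sketch of the XCS parameter update, not a full implementation."""
    for cl in A:
        cl.prediction += beta * (payoff - cl.prediction)
        cl.error += beta * (abs(payoff - cl.prediction) - cl.error)

    # Accuracy: 1 if the error is below epsilon0, otherwise a decreasing power law
    accuracies = [1.0 if cl.error < epsilon0
                  else alpha * (cl.error / epsilon0) ** (-nu)
                  for cl in A]
    total = sum(accuracies)
    for cl, k in zip(A, accuracies):
        cl.fitness += beta * (k / total - cl.fitness)  # fitness sharing in the niche
```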

Page 16: Lecture23

Evolutionary Search

A GA is applied from time to time to the action set [A]:

Select two parents

Cross them

Mutate them

Introduce the two new offspring into the population

If the population is full, remove poor classifiers
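A minimal sketch of this GA cycle; the crossover, mutation, and copy helpers are assumed to be problem-specific functions passed in by the caller, and fitness-proportionate selection plus worst-fitness deletion are illustrative choices rather than the exact XCS operators.

```python
import random

def ga_step(population, action_set, max_pop_size, crossover, mutate, copy_cl):
    """Sketch of the niche GA: select two parents from [A] by fitness,
    cross and mutate them, insert the offspring, delete if [P] overflows."""
    parents = random.choices(action_set,
                             weights=[cl.fitness for cl in action_set], k=2)
    child1, child2 = copy_cl(parents[0]), copy_cl(parents[1])
    crossover(child1, child2)
    mutate(child1)
    mutate(child2)
    population.extend([child1, child2])

    # Keep the population bounded: remove the lowest-fitness classifiers
    while len(population) > max_pop_size:
        population.remove(min(population, key=lambda cl: cl.fitness))
```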


Page 17: Lecture23

LCS Learning Pressures

Parameter updates identify the most accurate classifiers

Different pressures caused by the GA:

Set pressure toward generality

Fitness pressure toward highly fit classifiers

Mutation pressure toward diversification

Subsumption pressure toward the deletion of accurate, over-specialized classifiers


Page 18: Lecture23

Next Class

Applications of LCSs


Page 19: Lecture23

Introduction to Machine Learning

Lecture 23: Learning Classifier Systems

Albert Orriols i Puig
http://www.albertorriols.net

[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull