Lecture 23


Page 1: Lecture23

Introduction to Machine Learning

Lecture 23: Learning Classifier Systems

Albert Orriols i Puig
http://www.albertorriols.net

[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture23

Recap of Lectures 21-22: Value Functions

Vπ(s): long-term reward estimate obtained from state s when following policy π

Qπ(s,a): long-term reward estimate obtained from state s when executing action a and then following policy π

The long-term reward is a recency-weighted average of the received rewards

[Diagram: trajectory of states, actions, and rewards: st, at, rt → st+1, at+1, rt+1 → st+2, at+2, rt+2 → st+3, at+3, rt+3 → …]


Page 3: Lecture23

Recap of Lectures 21-22: Q-learning
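As a reminder of how the update works, here is a minimal sketch of the tabular Q-learning rule recapped above; the function name q_update and the parameter values for the learning rate alpha and discount gamma are illustrative choices, not part of the lecture material.

```python
from collections import defaultdict

# Tabular Q-learning update:
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Widrow-Hoff-style step toward the one-step Q-learning target."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Toy usage: a defaultdict gives unseen (state, action) pairs a value of 0.0
Q = defaultdict(float)
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
```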


Page 4: Lecture23

Today’s Agenda

The Origins of LCSs

Michigan-style LCSs

Pittsburgh-style LCSs


Page 5: Lecture23

Original Idea of LCS: Holland's Vision of Cognitive Systems

Create true artificial intelligence

True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978)

Holland's vision, going back to the late 50s and early 60s, of roving bands of computer programs.

Holland's notion of genetic search as program searching (1962):

"The free generation procedure . . . requires the generators (and combinations of generators) to 'shift' and 'connect' at random in the computer … two or more generators occupying adjacent modules ('in contact') may become connected. Such connected sets of generators are to shift as a unit."


From stimulus-response to internal states and modifiable detectors and effectors


Page 6: Lecture23

First LCS Implementation

CS-1 (Holland & Reitman, 1978)

Post-production system

General memory containing classifiers

Process:

Code the situation and find in memory the actions that are appropriate to both the CS-1 goal and the situation

Store in memory the consequences of these actions (learning)

Generate new, good productions (classifiers) to endure.

Population of classifiers: the current system knowledge

Performance component: short-term behavior of the system


Rule discovery component: gets new promising rules

Page 7: Lecture23

Meanwhile, at the University of Pittsburgh: Smith's Interpretation of Holland's GA Vision

Smith's notion of learning as adaptive search (1980, 1983)

LS-1: "Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task"

Great success! LS-1 took Waterman’s poker player to the cleaners (not bluffing)


Page 8: Lecture23

Two Models

Here, two lines of work started: Michigan-style vs. Pittsburgh-style LCSs

Michigan-style LCSs: cognitive system; individual = rule; solution = the whole population; apportionment of credit; reinforcement learning.

Pittsburgh-style LCSs: straight GA; individual = set of rules; solution = the best individual; usually offline systems.


We focus on Michigan-style LCSs.

Page 9: Lecture23

Michigan-style LCSs: General Schema

[Diagram: the classifier system interacts with the environment, receiving a sensorial state and a reward and returning an action; internally it holds a population of classifiers (Classifier 1, Classifier 2, …, Classifier n) and a genetic algorithm.]

Online rule evaluation: in XCS, Q-learning (Sutton & Barto, 1998), using the Widrow-Hoff delta rule.

Any representation: production rules, genetic programs, perceptrons, SVMs.

Rule evolution: typically, a GA (Holland, 75; Goldberg, 89) applied on the population.


Page 10: Lecture23

Knowledge Representation

The knowledge representation consists of:

A population of classifiers, usually independent of each other

Each classifier has a condition part C, an action part A, and a prediction part P, interpreted as:

If condition C is satisfied and action A is executed, then P is expected to be true

Solution for a new problem:

Get the classifiers that match the sensorial state


Decide which action should be used among the actions of the selected classifiers
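To make the slide concrete, here is a minimal sketch of how such a classifier might be represented as a data structure; the class and field names are illustrative, not taken from any particular LCS implementation.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    """One rule: if `condition` matches the state and `action` is executed,
    `prediction` (a scalar reward estimate) is expected."""
    condition: str     # e.g. a ternary string such as "01#1"
    action: int        # A: the advocated action
    prediction: float  # P: expected payoff
    error: float       # epsilon: error of the prediction
    fitness: float     # F: accuracy-based fitness
```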


Page 11: Lecture23

Condition Structures

The condition structure depends on the types of attributes

Binary attributes: ternary encoding {0, 1, #}

If v1 is '0' and v2 is '1' and v3 is '#' … and vn is '0', then action_i

Continuous attributes: interval-based encoding

If v1 in [l1, u1] and v2 in [l2, u2] … and vn in [ln, un], then action_i

Hyperellipsoid encoding
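A small sketch of how these two encodings could be matched against a sensorial state; the helper names matches_ternary and matches_intervals are illustrative assumptions, not standard API names.

```python
def matches_ternary(condition, state):
    """Ternary condition {0, 1, #}: '#' matches any bit of a binary state."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matches_intervals(intervals, state):
    """Interval condition: every attribute v_i must fall within [l_i, u_i]."""
    return all(l <= v <= u for (l, u), v in zip(intervals, state))

# Toy usage
assert matches_ternary("01#1", "0111")
assert matches_intervals([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.6])
```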


Page 12: Lecture23

Condition Structures

The condition structure depends on the types of attributes

Many other representations:

Partial matching (Booker, 1985)

Default hierarchies (Holland et al., 1986)

Fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009)

Neural-network-based encodings (Bull & O’Hara, 2002)

GP tree encodings with S-expressions (Lanzi, 1999)


Page 13: Lecture23

Prediction

Prediction can be:

Scalar number

Line

Polynomial

Neural network

We will consider the initial idea: prediction is a scalar number.


Page 14: Lecture23

Learning Interaction in XCS

[Diagram: the XCS learning interaction.]

The environment supplies a problem instance. The classifiers in the population [P] whose conditions match it form the match set [M]. Each classifier carries a condition C, an action A, a prediction P, a prediction error ε, a fitness F, a numerosity num, an action-set size estimate as, a time stamp ts, and an experience counter exp.

From [M], a prediction array is built and an action is selected; the environment returns a reward. The classifiers in [M] that advocate the selected action form the action set [A], whose parameters are updated with the Widrow-Hoff rule; in multi-step problems the delayed reward is used to update the previous action set [A]-1.

The genetic algorithm (selection, reproduction, and mutation) is applied within the action set, with competition in the niche and fitness sharing; deletion removes classifiers when the population is full.
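A simplified sketch of the performance cycle described in the diagram, assuming the illustrative Classifier fields and matching helper introduced on the previous slides; real XCS also handles covering, exploration control, and multi-step credit assignment.

```python
def performance_cycle(population, state, matches, select_action):
    """One simplified XCS-like step: build [M], the prediction array, and [A].
    `matches` and `select_action` are assumed, problem-specific helpers."""
    # Match set [M]: classifiers whose condition matches the current state
    M = [cl for cl in population if matches(cl.condition, state)]

    # Prediction array: fitness-weighted prediction for each advocated action
    prediction_array = {}
    for a in {cl.action for cl in M}:
        advocates = [cl for cl in M if cl.action == a]
        num = sum(cl.fitness * cl.prediction for cl in advocates)
        den = sum(cl.fitness for cl in advocates)
        prediction_array[a] = num / den if den > 0 else 0.0

    action = select_action(prediction_array)        # e.g. epsilon-greedy
    A = [cl for cl in M if cl.action == action]     # action set [A]
    return action, A
```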

Page 15: Lecture23

Estimate Classifier Prediction

Three key parameters:

Prediction: What I will get if I select the action

Error: the error of that prediction

Does it sound familiar? Q-learning!

Fitness: how good my classifier is


These parameters are estimated online.
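A hedged sketch of how these three parameters might be updated online with the Widrow-Hoff rule; the symbols beta, epsilon0, alpha, and nu follow the usual XCS notation, but the code simplifies the full update (for instance, it omits the MAM phase and numerosity weighting).

```python
def update_action_set(A, payoff, beta=0.2, epsilon0=0.01, alpha=0.1, nu=5):
    """Widrow-Hoff-style online updates for every classifier in the action set.
    Simplified sketch of the XCS parameter update, not a full implementation."""
    for cl in A:
        cl.prediction += beta * (payoff - cl.prediction)
        cl.error += beta * (abs(payoff - cl.prediction) - cl.error)

    # Accuracy: 1 if the error is below epsilon0, otherwise a decreasing power law
    accuracies = [1.0 if cl.error < epsilon0
                  else alpha * (cl.error / epsilon0) ** (-nu)
                  for cl in A]
    total = sum(accuracies)
    for cl, k in zip(A, accuracies):
        cl.fitness += beta * (k / total - cl.fitness)  # fitness sharing in the niche
```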

Page 16: Lecture23

Evolutionary Search

A GA is applied from time to time to the action set [A]:

Select two parents

Cross them

Mutate them

Introduce the two new offspring into the population

If the population is full, remove poor classifiers
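A minimal sketch of this GA cycle; the crossover, mutation, and copy helpers are assumed to be problem-specific functions passed in by the caller, and fitness-proportionate selection plus worst-fitness deletion are illustrative choices rather than the exact XCS operators.

```python
import random

def ga_step(population, action_set, max_pop_size, crossover, mutate, copy_cl):
    """Sketch of the niche GA: select two parents from [A] by fitness,
    cross and mutate them, insert the offspring, delete if [P] overflows."""
    parents = random.choices(action_set,
                             weights=[cl.fitness for cl in action_set], k=2)
    child1, child2 = copy_cl(parents[0]), copy_cl(parents[1])
    crossover(child1, child2)
    mutate(child1)
    mutate(child2)
    population.extend([child1, child2])

    # Keep the population bounded: remove the lowest-fitness classifiers
    while len(population) > max_pop_size:
        population.remove(min(population, key=lambda cl: cl.fitness))
```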


Page 17: Lecture23

LCS Learning Pressures

Parameter updates identify the most accurate classifiers

Different pressures caused by the GA:

Set pressure toward generality

Fitness pressure toward highly fit classifiers

Mutation pressure toward diversification

Subsumption pressure toward the deletion of accurate, over-specialized classifiers


Page 18: Lecture23

Next Class

Applications of LCSs


Page 19: Lecture23

Introduction to Machine Learning

Lecture 23: Learning Classifier Systems

Albert Orriols i Puig
http://www.albertorriols.net

[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull