Lecture23
Introduction to Machine Learning
Lecture 23: Learning Classifier Systems
Albert Orriols i Puig
http://www.albertorriols.net
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Recap of Lectures 21-22
Value functions
  Vπ(s): Long-term reward estimation from state s following policy π
  Qπ(s,a): Long-term reward estimation from state s executing action a and then following policy π
The long-term reward is a recency-weighted average of the received rewards
[Figure: trajectory of states s_t, s_t+1, s_t+2, s_t+3, ... linked by actions a_t, a_t+1, a_t+2, a_t+3, ... and rewards r_t, r_t+1, r_t+2, r_t+3, ...]

Recap of Lectures 21-22
Q-learning

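The content of this recap slide did not survive extraction, so the following is only a minimal sketch of the standard tabular Q-learning update, not a reconstruction of the slide. The function name, the dictionary-based table, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) a fraction alpha toward r + gamma * max_a' Q(s', a').
    alpha and gamma are illustrative defaults, not values from the lecture.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs start at 0.
q = defaultdict(float)
q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

Because the step size `alpha` is constant, recent rewards weigh more than old ones, which is the recency-weighted average mentioned in the value-function recap.
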
Today's Agenda
The Origins of LCSs
Michigan-style LCSs
Pittsburgh-style LCSs
Michigan-style LCSs

Original Idea of LCS
Holland's vision: Cognitive Systems
  Create true artificial intelligence itself
  True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978)
  Holland's vision, going back to the late 50s and early 60s, of roving bands of computer programs
Holland's notion of genetic search as program searching (1962)
  "The free generation procedure... requires the generators (and combinations of generators) to "shift" and "connect" at random in the computer... two or more generators occupying adjacent modules ("in contact") may become connected. Such connected sets of generators are to shift as a unit."
From stimulus-response to internal states and modifiable detectors and effectors

First LCS Implementation
CS-1 (Holland & Reitman, 1978)
  Post-production system
  General memory containing classifiers
  Process:
    Code the situation and find in memory the actions that are appropriate to both CS-1 goal and situation
    Store in memory the consequences of these actions (learning)
    Generate new good productions (classifiers) to endure
Population of classifiers: Current system knowledge
Performance component: Short-term behavior of the system
Rule discovery component: Get new promising rules

Meanwhile, at the University of Pittsburgh
Smith's interpretation of Holland's GA vision
  Smith's notion of learning as adaptive search (1980, 1983)
  LS-1: "Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task"
  Great success! LS-1 took Waterman's poker player to the cleaners (not bluffing)

Two Models
And here, two ways started: Michigan vs Pittsburgh LCSs

Michigan-style LCSs
  Cognitive system
  Individual = rule
  Solution: all the population
  Apportionment of credit
  Reinforcement learning

Pittsburgh-style LCSs
  Straight GA
  Individual = set of rules
  Solution: best individual
  Usually offline systems

We focus on Michigan-style LCS

Michigan-style LCSs
General schema
[Figure: the environment sends the sensorial state and the reward to the learning classifier system, which holds Classifier 1, Classifier 2, ..., Classifier n and returns an action.]
Any representation: production rules, genetic programs, perceptrons, SVMs
Online rule evaluator:
• XCS: Q-Learning (Sutton & Barto, 1998)
• Uses Widrow-Hoff delta rule
Rule evolution: Typically, a GA (Holland, 75; Goldberg, 89) applied on the population.

Knowledge Representation
The knowledge representation consists of
  Population of classifiers
    Usually independent of each other
  Each classifier has
    Condition part C
    Action part A
    Prediction part P
  Interpreted as:
    If condition C is satisfied and action A is executed, then P is expected to be true
Solution for a new problem
  Get the classifiers that match the sensorial state
  Decide which action should be used among the actions of the selected classifiers

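The C/A/P structure and the two-step solution procedure can be sketched as follows. This is only an illustration under stated assumptions: the `Classifier` class, the `decide` function, the use of a plain predicate as the condition, and the rule "pick the action with the highest mean prediction" are all hypothetical simplifications, not the method of any particular LCS.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Classifier:
    condition: Callable[[Sequence[int]], bool]  # C: does the rule apply to this state?
    action: int                                 # A: action the rule advocates
    prediction: float                           # P: payoff expected if A is executed


def decide(population, state):
    """Return the action with the highest mean prediction among matching rules."""
    # Step 1: get the classifiers that match the sensorial state.
    matching = [cl for cl in population if cl.condition(state)]
    if not matching:
        return None  # no rule applies; a real system would need a fallback
    # Step 2: group predictions by advocated action and pick the best action.
    by_action = {}
    for cl in matching:
        by_action.setdefault(cl.action, []).append(cl.prediction)
    return max(by_action, key=lambda a: sum(by_action[a]) / len(by_action[a]))


# Hypothetical population over two binary inputs.
population = [
    Classifier(lambda s: s[0] == 1, action=0, prediction=200.0),
    Classifier(lambda s: s[1] == 0, action=1, prediction=900.0),
    Classifier(lambda s: s[0] == 0, action=0, prediction=1000.0),
]
chosen = decide(population, [1, 0])
```

For the state `[1, 0]` only the first two classifiers match, so the third rule's high prediction plays no role in the decision.
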
Condition Structures
Condition structure depends on the types of attributes
Binary
  Ternary encoding {0, 1, #}
    If v1 is '0' and v2 is '1' and v3 is '#' … and vn is '0' then action_i
Continuous
  Interval-based encoding
    If v1 in [l1,u1] and v2 in [l2,u2] … and vn in [ln,un] then action_i
  Hyperellipsoids

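The two encodings above differ only in how a condition is tested against an input. A minimal sketch of both tests follows; the function names and the string/tuple representations are assumptions made for illustration, with '#' read as "don't care" and intervals taken as closed.

```python
def matches_ternary(condition: str, state: str) -> bool:
    """True if every position is the wildcard '#' or equals the input bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


def matches_intervals(intervals, values) -> bool:
    """True if each value v_i lies in its closed interval [l_i, u_i]."""
    return len(intervals) == len(values) and all(
        lo <= v <= hi for (lo, hi), v in zip(intervals, values)
    )


# Hypothetical rules: "01#0" over four binary attributes,
# and two intervals over two continuous attributes.
binary_hit = matches_ternary("01#0", "0110")
continuous_hit = matches_intervals([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.9])
```

A condition with more '#' symbols, or with wider intervals, matches more inputs and is therefore more general.
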
Condition Structures
Condition structure depends on the types of attributes
Many other representations
  Partial matching (Booker, 1985)
  Default hierarchies (Holland et al., 1986)
  Fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009)
  Neural-network-based encodings (Bull & O'Hara, 2002)
  GP tree encodings with S-expressions (Lanzi, 1999)

Prediction
Prediction can be:
  Scalar number
  Line
  Polynomial
  Neural network
  …
We will consider the initial idea: prediction is a scalar number

Learning Interaction in XCS
[Figure: XCS learning cycle. The environment provides a problem instance. Match set generation selects from the population [P] the classifiers that match it, forming the match set [M]. A prediction array is built from [M] and an action is selected. The classifiers in [M] that advocate the selected action form the action set [A]. The reward from the environment drives the classifier parameters update (Widrow-Hoff rule) on [A], or on the previous action set [A]-1 when the reward is delayed. The genetic algorithm (selection, reproduction, and mutation; deletion) works with competition in the niche and fitness sharing.]
Each classifier is stored as: C A P ε F num as ts exp (condition, action, prediction, prediction error, fitness, numerosity, action set size estimate, time stamp, experience)

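The performance part of this cycle, from [P] to [M] to the prediction array to [A], can be sketched as below. The fitness-weighted average used for the prediction array follows common descriptions of XCS and is stated here as an assumption, since the slide does not give the formula; the `Rule` class, the function names, and the greedy action choice are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str     # ternary string over {0, 1, #}
    action: int
    prediction: float  # P
    fitness: float     # F


def match_set(population, state):
    """[M]: classifiers whose condition matches the problem instance."""
    return [r for r in population
            if all(c in ("#", s) for c, s in zip(r.condition, state))]


def prediction_array(m):
    """Fitness-weighted mean prediction for each action advocated in [M]."""
    num, den = {}, {}
    for r in m:
        num[r.action] = num.get(r.action, 0.0) + r.prediction * r.fitness
        den[r.action] = den.get(r.action, 0.0) + r.fitness
    return {a: num[a] / den[a] for a in num if den[a] > 0}


def action_set(m, action):
    """[A]: members of [M] that advocate the selected action."""
    return [r for r in m if r.action == action]


# Hypothetical population [P] over two binary inputs.
P = [
    Rule("1#", 0, 100.0, 0.5),
    Rule("#0", 0, 500.0, 0.5),
    Rule("10", 1, 800.0, 0.9),
    Rule("0#", 1, 1000.0, 0.9),
]
M = match_set(P, "10")
PA = prediction_array(M)
best = max(PA, key=PA.get)  # greedy (exploit) choice; exploration is omitted
A = action_set(M, best)
```

Only the classifiers in `A` would then receive the parameter update once the reward arrives.
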
Estimate Classifier Prediction
Three key parameters
  Prediction: What I will get if I select the action
  Error: Error on that prediction
  Fitness: How good is my classifier
Does it sound familiar? Q-learning!
These parameters are estimated on-line

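A rough sketch of how the three parameters might be estimated on-line with the Widrow-Hoff rule is given below. It follows common descriptions of XCS rather than anything stated on the slide: the update order, the accuracy function, and the constants `beta`, `eps0`, `alpha`, and `nu` are assumptions, and numerosity is ignored to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Params:
    prediction: float = 10.0
    error: float = 0.0
    fitness: float = 0.01


def update_action_set(action_set, payoff, beta=0.2, eps0=10.0, alpha=0.1, nu=5.0):
    """Update prediction, error, and fitness of every classifier in [A]."""
    # Widrow-Hoff updates: move each estimate a fraction beta toward its target.
    for cl in action_set:
        cl.prediction += beta * (payoff - cl.prediction)
        cl.error += beta * (abs(payoff - cl.prediction) - cl.error)
    # Accuracy: 1 below the error threshold eps0, sharply decreasing above it.
    kappas = [1.0 if cl.error < eps0 else alpha * (cl.error / eps0) ** -nu
              for cl in action_set]
    total = sum(kappas)
    # Fitness tracks accuracy relative to the rest of [A] (fitness sharing).
    for cl, k in zip(action_set, kappas):
        cl.fitness += beta * (k / total - cl.fitness)


# Hypothetical action set: one inaccurate and one accurate classifier.
a = Params(prediction=0.0)
b = Params(prediction=1000.0)
update_action_set([a, b], payoff=1000.0)
```

After one update the accurate classifier `b` holds almost all of the shared fitness, which is what lets the GA favor it later.
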
Evolutionary Search
GA applied from time to time to [A]
  Select two parents
  Cross them
  Mutate them
  Introduce the two new offspring into the population
  If the population is full, remove poor classifiers

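The five steps above can be sketched as a single function. This is a simplified illustration, not the exact XCS procedure: fitness-proportionate selection, one-point crossover, free mutation over {0, 1, #}, offspring inheriting the parent's fitness, and deleting the lowest-fitness older classifier are all assumptions chosen to keep the example short.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str
    action: int
    fitness: float


def ga_step(population, action_set, max_size, rng, p_mut=0.05):
    """Run one GA invocation on [A] and return the two offspring."""
    # 1. Select two parents from [A] with probability proportional to fitness.
    weights = [r.fitness for r in action_set]
    mom, dad = rng.choices(action_set, weights=weights, k=2)
    # 2. Cross them: one-point crossover on the condition strings.
    cut = rng.randint(0, len(mom.condition))
    c1 = mom.condition[:cut] + dad.condition[cut:]
    c2 = dad.condition[:cut] + mom.condition[cut:]

    # 3. Mutate them: each symbol is redrawn with probability p_mut.
    def mutate(cond):
        return "".join(rng.choice("01#") if rng.random() < p_mut else ch
                       for ch in cond)

    kids = [Rule(mutate(c1), mom.action, mom.fitness),
            Rule(mutate(c2), dad.action, dad.fitness)]
    # 4. Introduce the two new offspring into the population.
    population.extend(kids)
    # 5. If the population is full, remove poor (lowest-fitness) older rules.
    while len(population) > max_size:
        old = [r for r in population if all(r is not k for k in kids)]
        if not old:
            break
        population.remove(min(old, key=lambda r: r.fitness))
    return kids


# Hypothetical population; the first three rules form the action set [A].
rng = random.Random(0)
pop = [Rule("1#0", 1, 0.9), Rule("10#", 1, 0.6),
       Rule("##0", 1, 0.2), Rule("0#1", 0, 0.1)]
kids = ga_step(pop, pop[:3], max_size=5, rng=rng)
```

Selecting parents inside [A] rather than in the whole population is what makes classifiers compete within a niche.
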
LCS Learning Pressures
Parameter updates identify the most accurate classifiers
Different pressures caused by the GA:
  Set pressure toward generality
  Fitness pressure toward highly fit classifiers
  Mutation pressure toward diversification
  Subsumption pressure toward the deletion of accurate, over-specialized classifiers

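Subsumption can be illustrated for ternary conditions with the sketch below. The generality test and the requirement that the subsumer be accurate and experienced follow common descriptions of XCS; the function names and the thresholds `eps0` and `theta_sub` are assumptions, not values from the lecture.

```python
def is_more_general(general: str, specific: str) -> bool:
    """True if `general` matches everything `specific` matches, and more."""
    if len(general) != len(specific):
        return False
    if general.count("#") <= specific.count("#"):
        return False
    return all(g == "#" or g == s for g, s in zip(general, specific))


def could_subsume(error: float, experience: int,
                  eps0: float = 10.0, theta_sub: int = 20) -> bool:
    """A subsumer must be accurate (low error) and sufficiently experienced."""
    return error < eps0 and experience > theta_sub


def subsumes(general_cond, general_action, error, experience,
             specific_cond, specific_action) -> bool:
    """True if the first classifier may absorb the second one."""
    return (general_action == specific_action
            and could_subsume(error, experience)
            and is_more_general(general_cond, specific_cond))


# Hypothetical pair: "1##" is accurate, experienced, and covers "10#".
absorbed = subsumes("1##", 1, 2.0, 50, "10#", 1)
```

When the check succeeds, the specialized classifier is dropped in favor of the general one, which pushes the population toward accurate and maximally general rules.
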
Next Class
Applications of LCS