CS364 Artificial Intelligence Machine Learning
Matthew Casey
portal.surrey.ac.uk/computing/resources/l3/cs364
Learning Outcomes
• Describe methods for acquiring human knowledge
  – Through experience
• Evaluate which of the acquisition methods would be most appropriate in a given situation
  – Limited data available through examples
Learning Outcomes
• Describe techniques for representing acquired knowledge in a way that facilitates automated reasoning over the knowledge
  – Generalise experience to novel situations
• Categorise and evaluate AI techniques according to different criteria such as applicability and ease of use, and intelligently participate in the selection of the appropriate techniques and tools, to solve simple problems
  – Strategies to overcome the ‘knowledge engineering bottleneck’
What is Learning?
• ‘The action of receiving instruction or acquiring knowledge’
• ‘A process which leads to the modification of behaviour or the acquisition of new abilities or responses, and which is additional to natural development by growth or maturation’
Source: Oxford English Dictionary Online: http://www.oed.com/, accessed October 2003
Machine Learning
• Negnevitsky:
  – ‘In general, machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy’ (2005:165)
• Callan:
  – ‘A machine or software tool would not be viewed as intelligent if it could not adapt to changes in its environment’ (2003:225)
• Luger:
  – ‘Intelligent agents must be able to change through the course of their interactions with the world’ (2002:351)
Types of Learning
• Inductive learning
  – Learning from examples
• Evolutionary/genetic learning
  – Shaping a population of individual solutions through survival of the fittest
• Emergent learning
  – Learning through social interaction: game of life
Inductive Learning
• Supervised learning
  – Training examples with a known classification from a teacher
• Unsupervised learning
  – No pre-classification of training examples
  – Competitive learning: learning through competition on training examples
Key Concepts
• Learn from experience
  – Through examples, analogy or discovery
• To adapt
  – Changes in response to interaction
• Generalisation
  – To use experience to form a response to novel situations
Generalisation
Machine Learning
• Techniques and algorithms that adapt through experience
• Used for:
  – Interpretation / visualisation: summarise data
  – Prediction: time series / stock data
  – Classification: malignant or benign tumours
  – Regression: curve fitting
  – Discovery: data mining / pattern discovery
Why?
• Complexity of task / amount of data
  – Other techniques fail or are computationally expensive
• Problems that cannot be defined
  – Discovery of patterns / data mining
• Knowledge engineering bottleneck
  – ‘Cost and difficulty of building expert systems using traditional […] techniques’ (Luger 2002:351)
Common Techniques
• Least squares
• Decision trees
• Support vector machines
• Boosting
• Neural networks
• K-means
• Genetic algorithms
Hastie, T., Tibshirani, R. & Friedman, J.H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer-Verlag.
Neural Networks
• Developed from models of the biology of behaviour
  – Human brain contains of the order of 10^10 neurons, each connecting to 10^4 others
  – Parallel processing
  – Inhibitory and excitatory
• Supervised:
  – Perceptron, radial basis function
• Unsupervised:
  – Self-organising map
http://cwabacon.pearsoned.com/bookbind/pubbooks/pinel_ab/chapter3/deluxe.html
Neural Networks
• Output is a complex (often non-linear) combination of the inputs
[Figure: model of a single neuron. Input signals $x_1, \ldots, x_m$ are multiplied by synaptic weights $w_{k1}, \ldots, w_{km}$ and combined, together with a bias $w_{k0}$, at a summing junction; the result passes through an activation function $\varphi(\cdot)$ to give the output $y_k$:]

  $y_k = \varphi\left(\sum_{j=1}^{m} w_{kj} x_j + w_{k0}\right)$
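As a minimal sketch of this computation in Python (the tanh activation and all numeric values below are illustrative assumptions, not from the lecture):

import math

def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Single neuron: weighted sum of inputs plus bias, through an activation."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias  # summing junction
    return activation(v)                                    # Phi(v)

# Illustrative values only:
y_k = neuron_output(inputs=[0.5, -1.0, 0.25], weights=[0.8, 0.2, -0.5], bias=0.1)
print(y_k)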
Decision Trees
• A map of the reasoning process, good at solving classification problems (Negnevitsky, 2005)
• A decision tree represents a number of different attributes and values
  – Nodes represent attributes
  – Branches represent values of the attributes
• A path through the tree represents a decision
• A tree can be associated with rules
Example 1
• Consider one rule for an ice-cream seller (Callan 2003:241)

IF Outlook = Sunny
AND Temperature = Hot
THEN Sell
Example 1
[Figure: decision tree for the ice-cream seller. The root node is Outlook, with branches Sunny and Overcast. The Sunny branch leads to a Temperature node: Hot → Sell; Mild → a Holiday Season node (Yes → Sell, No → Don’t Sell); Cold → Don’t Sell. The Overcast branch leads to a Holiday Season node: No → Don’t Sell; Yes → a further Temperature node with Hot/Mild/Cold branches ending in Sell/Don’t Sell leaves. Annotations identify the root node, branches, internal nodes and leaves.]
Construction
• Concept learning:
  – Inducing concepts from examples
• We can intuitively construct a decision tree for a small set of examples
• Different algorithms are used to construct a tree based upon the examples
  – Most popular: ID3 (Quinlan, 1986)
Example 2
Draw a tree based upon the following data:

Item  X      Y      Class
1     False  False  +
2     True   False  +
3     False  True   -
4     True   True   -
Example 2: Tree 1
Y
• True → {3,4}: Negative
• False → {1,2}: Positive
Example 2: Tree 2
X
• True → {2,4}: Y
  – True → {4}: Negative
  – False → {2}: Positive
• False → {1,3}: Y
  – True → {3}: Negative
  – False → {1}: Positive
Which Tree?
• Different trees can be constructed from the same set of examples
• Which tree is the best?
  – Based upon the choice of attributes at each node in the tree
  – A split in the tree (branches) should correspond to the predictor with the maximum separating power (see the sketch below)
• Examples can be contradictory
  – Real life is noisy
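To make ‘separating power’ concrete, the sketch below scores the two candidate attributes from Example 2 using the entropy and information-gain measures defined on the following slides; the helper names and data encoding are my own:

import math

examples = [                 # Example 2 data: (X, Y, class)
    (False, False, '+'),     # item 1
    (True,  False, '+'),     # item 2
    (False, True,  '-'),     # item 3
    (True,  True,  '-'),     # item 4
]

def set_entropy(items):
    """Entropy of the class labels in a list of (X, Y, class) rows."""
    n = len(items)
    result = 0.0
    for label in {c for *_, c in items}:
        p = sum(1 for *_, c in items if c == label) / n
        result -= p * math.log2(p)
    return result

def split_gain(items, attr):
    """Information gain of splitting on column attr (0 for X, 1 for Y)."""
    g = set_entropy(items)
    for v in {row[attr] for row in items}:
        subset = [row for row in items if row[attr] == v]
        g -= len(subset) / len(items) * set_entropy(subset)
    return g

print(split_gain(examples, 0))   # X: 0.0 -- no separating power
print(split_gain(examples, 1))   # Y: 1.0 -- perfect split, as in Tree 1

Y has the maximum separating power, so a learner choosing attributes by gain would pick it as the root, giving Tree 1.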
Example 3
• Callan (2003:242-247)
  – Locating a new bar
Choosing Attributes
• Entropy:
  – Measure of disorder (high is bad)
• For c classification categories
• Attribute a that has value v
• Probability of v being in category i is $p_i$
• Entropy E is:

  $E(a = v) = -\sum_{i=1}^{c} p_i \log_2 p_i$
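A direct Python translation of this formula (a minimal sketch; the helper name is an assumption):

import math

def entropy(probs):
    """E(a=v) = -sum_i p_i * log2(p_i); zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)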
Entropy Example
• Choice of attributes:
  – City/Town, University, Housing Estate, Industrial Estate, Transport and Schools
• City/Town: is either Y or N
• For Y: 7 positive examples, 3 negative
• For N: 4 positive examples, 6 negative
Entropy Example
• City/Town as root node:
  – For c=2 (positive and negative) classification categories
  – Attribute a=City/Town that has value v=Y
  – Probability of v=Y being in category positive: $p_{\text{positive}} = 7/10$
  – Probability of v=Y being in category negative: $p_{\text{negative}} = 3/10$
Entropy Example
• City/Town as root node:
  – For c=2 (positive and negative) classification categories
  – Attribute a=City/Town that has value v=Y
  – Entropy E is:

  $E(\text{City/Town} = Y) = -\frac{7}{10}\log_2\frac{7}{10} - \frac{3}{10}\log_2\frac{3}{10} = 0.881$
Entropy Example
• City/Town as root node:
  – For c=2 (positive and negative) classification categories
  – Attribute a=City/Town that has value v=N
  – Probability of v=N being in category positive: $p_{\text{positive}} = 4/10$
  – Probability of v=N being in category negative: $p_{\text{negative}} = 6/10$
Entropy Example
• City/Town as root node:
  – For c=2 (positive and negative) classification categories
  – Attribute a=City/Town that has value v=N
  – Entropy E is:

  $E(\text{City/Town} = N) = -\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10} = 0.971$
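Both entropy values can be checked with the entropy helper sketched earlier:

print(round(entropy([7/10, 3/10]), 3))   # 0.881  (City/Town = Y)
print(round(entropy([4/10, 6/10]), 3))   # 0.971  (City/Town = N)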
Choosing Attributes
• Information gain:
  – Expected reduction in entropy (high is good)
• Entropy of the whole example set T is E(T)
• Examples with a=v, where v is the jth value, are $T_j$
• Entropy $E(a=v) = E(T_j)$
• Gain is:

  $Gain(T, a) = E(T) - \sum_{j=1}^{V} \frac{|T_j|}{|T|} E(T_j)$
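Again as a sketch, the gain formula in Python, taking each subset $T_j$ as a (size, entropy) pair:

def gain(entropy_T, subsets):
    """Gain(T, a) = E(T) - sum_j (|Tj| / |T|) * E(Tj).

    subsets: one (size, entropy) pair per value v of attribute a.
    """
    total = sum(size for size, _ in subsets)
    return entropy_T - sum((size / total) * e for size, e in subsets)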
Information Gain Example
• For the root of the tree there are 20 examples:
  – For c=2 (positive and negative) classification categories
  – Probability of being positive, with 11 examples: $p_{\text{positive}} = 11/20$
  – Probability of being negative, with 9 examples: $p_{\text{negative}} = 9/20$
Information Gain Example
• For the root of the tree there are 20 examples:
  – For c=2 (positive and negative) classification categories
  – $|T| = 20$
  – Entropy of all training examples E(T) is:

  $E(T) = -\frac{11}{20}\log_2\frac{11}{20} - \frac{9}{20}\log_2\frac{9}{20} = 0.993$
Information Gain Example
• City/Town as root node:
  – 10 examples for a=City/Town and value v=Y: $|T_{j=Y}| = 10$, $E(T_{j=Y}) = 0.881$
  – 10 examples for a=City/Town and value v=N: $|T_{j=N}| = 10$, $E(T_{j=N}) = 0.971$

  $Gain(T, \text{City/Town}) = 0.993 - \left(\tfrac{10}{20} \times 0.881 + \tfrac{10}{20} \times 0.971\right) = 0.067$
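Checking this calculation with the helpers sketched above:

E_T = entropy([11/20, 9/20])                 # 0.993
g = gain(E_T, [(10, 0.881), (10, 0.971)])    # City/Town subsets for v=Y and v=N
print(round(g, 3))                           # 0.067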
Example 4
• Calculate the information gain for the Transport attribute
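One way to check your answer, reusing the helpers above. Note the class counts per Transport value below are read off the final tree on a later slide, since the raw example table is not reproduced in this transcript:

E_good    = entropy([5/5])        # Good {7,12,16,19,20}: 5 positive, 0 negative
E_average = entropy([3/7, 4/7])   # Average {1,3,6,8,11,15,17}: 3 positive, 4 negative
E_poor    = entropy([3/8, 5/8])   # Poor {2,4,5,9,10,13,14,18}: 3 positive, 5 negative
g = gain(entropy([11/20, 9/20]),
         [(5, E_good), (7, E_average), (8, E_poor)])
print(round(g, 3))                # 0.266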
Choosing Attributes
• Choose the root node as the attribute that gives the highest information gain
  – In this case, attribute Transport with 0.266
• Branches from the root node then become the values associated with the attribute
  – Recursive calculation of attributes/nodes
  – Filter examples by attribute value
Recursive Example
• With Transport as the root node:
  – Select examples where Transport is Average: (1, 3, 6, 8, 11, 15, 17)
  – Use only these examples to construct this branch of the tree
  – Repeat for each attribute value (Poor, Good)
Final Tree
Transport
• Good → {7,12,16,19,20}: Positive
• Poor → {2,4,5,9,10,13,14,18}: Industrial Estate
  – Yes → {5,9,14}: Positive
  – No → {2,4,10,13,18}: Negative
• Average → {1,3,6,8,11,15,17}: Housing Estate
  – Large → {11,17}: Industrial Estate
    • Yes → {11}: Positive
    • No → {17}: Negative
  – Medium → {1,3,15}: University
    • Yes → {1,3}: Positive
    • No → {15}: Negative
  – Small → {6}: Negative
  – None → {8}: Negative

(Callan 2003:243)
ID3
• Procedure Extend(Tree d, Examples T):
  – Choose the best attribute a for the root of d
    • Calculate E(a=v) and Gain(T, a) for each attribute
    • The attribute with the highest Gain(T, a) is selected as best
  – Assign the best attribute a to the root of d
  – For each value v of attribute a:
    • Create a branch for a=v, resulting in sub-tree dj
    • Assign to Tj the training examples from T where a=v
    • Recurse into the sub-tree with Extend(dj, Tj)
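A minimal Python sketch of this procedure; the (attribute, branches-dict) tree representation and the helper names are my own choices, not from the lecture:

import math
from collections import Counter

def _entropy(examples):
    """Entropy of a list of (attribute_dict, class_label) pairs."""
    counts = Counter(c for _, c in examples)
    n = len(examples)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def _gain(examples, a):
    """Information gain of splitting the examples on attribute a."""
    g = _entropy(examples)
    for v in {ex[a] for ex, _ in examples}:
        sub = [(ex, c) for ex, c in examples if ex[a] == v]
        g -= len(sub) / len(examples) * _entropy(sub)
    return g

def extend(examples, attributes):
    """ID3: return a class label (leaf) or (attribute, {value: sub-tree})."""
    classes = {c for _, c in examples}
    if len(classes) == 1:                      # pure node: make a leaf
        return classes.pop()
    if not attributes:                         # nothing left to split on
        return Counter(c for _, c in examples).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(examples, a))
    branches = {}
    for v in {ex[best] for ex, _ in examples}:
        sub = [(ex, c) for ex, c in examples if ex[best] == v]
        branches[v] = extend(sub, [a for a in attributes if a != best])
    return (best, branches)

# On the Example 2 data:
# extend([({'X': False, 'Y': False}, '+'), ({'X': True, 'Y': False}, '+'),
#         ({'X': False, 'Y': True}, '-'), ({'X': True, 'Y': True}, '-')],
#        ['X', 'Y'])
# -> ('Y', {True: '-', False: '+'}), i.e. Tree 1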
Issues
• Use prior knowledge where available
• Not all the examples may be needed to construct a tree
  – Test generalisation of the tree during training and stop when desired performance is achieved
  – Prune the tree once constructed
• Examples may be noisy
• Examples may contain irrelevant attributes
Extracting Rules
• We can extract rules from decision trees
  – Create one rule for each root-to-leaf path
  – Simplify by combining rules
• Other techniques are not so transparent:
  – Neural networks are often described as ‘black boxes’: it is difficult to understand what the network is doing
  – Extraction of rules from trees can help us to understand the decision process
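Continuing the ID3 sketch above (same assumed tuple-and-dict tree representation), one rule per root-to-leaf path can be collected recursively:

def extract_rules(tree, conditions=()):
    """One rule per root-to-leaf path of a tree built by extend() above."""
    if not isinstance(tree, tuple):            # leaf: emit the accumulated rule
        conds = " AND ".join(f"{a} is {v}" for a, v in conditions)
        return [f"IF {conds} THEN {tree}" if conds else f"ALWAYS {tree}"]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attribute, value),))
    return rules

# e.g. extract_rules(('Y', {True: '-', False: '+'}))
# -> ['IF Y is True THEN -', 'IF Y is False THEN +']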
Rules Example
[Figure: the final decision tree from the previous slides, repeated to highlight the root-to-leaf paths used for rule extraction (Callan 2003:243).]
Rules Example
• IF Transport is Average
  AND Housing Estate is Large
  AND Industrial Estate is Yes
  THEN Positive
• …
• IF Transport is Good
  THEN Positive
Summary
• What are the benefits/drawbacks of machine learning?
  – Are the techniques simple?
  – Are they simple to implement?
  – Are they computationally cheap?
  – Do they learn from experience?
  – Do they generalise well?
  – Can we understand how knowledge is represented?
  – Do they provide perfect solutions?
Source Texts
• Negnevitsky, M. (2005). Artificial Intelligence: A Guide to Intelligent Systems. 2nd Edition. Essex, UK: Pearson Education Limited.
  – Chapter 6, pp. 165-168; chapter 9, pp. 349-360.
• Callan, R. (2003). Artificial Intelligence. Basingstoke, UK: Palgrave Macmillan.
  – Part 5, chapters 11-17, pp. 225-346.
• Luger, G.F. (2002). Artificial Intelligence: Structures & Strategies for Complex Problem Solving. 4th Edition. London, UK: Addison-Wesley.
  – Part IV, chapters 9-11, pp. 349-506.
Journals
• Artificial Intelligence
  – http://www.elsevier.com/locate/issn/00043702
  – http://www.sciencedirect.com/science/journal/00043702
Articles
• Quinlan, J.R. (1986). Induction of Decision Trees. Machine Learning, vol. 1, pp. 81-106.
• Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers.
Websites
• UCI Machine Learning Repository
  – Example data sets for benchmarking
  – http://www.ics.uci.edu/~mlearn/MLRepository.html
• Genetic Algorithms Archive
  – Programs, data, bibliographies, etc.
  – http://www.aic.nrl.navy.mil/galist/
• Wonders of Math: Game of Life
  – Game of Life applet and details
  – http://www.math.com/students/wonders/life/life.html