Machine Learning CMPT 726 Simon Fraser University CHAPTER 1: INTRODUCTION.
Machine Learning, CMPT 726, Simon Fraser University
CHAPTER 1: INTRODUCTION
Outline
• Comments on general approach.
• Probability Theory.
  • Joint, conditional and marginal probabilities.
  • Random variables.
  • Functions of random variables.
• Bernoulli Distribution (coin tosses).
  • Maximum likelihood estimation.
  • Bayesian learning with conjugate prior.
• The Gaussian Distribution.
  • Maximum likelihood estimation.
  • Bayesian learning with conjugate prior.
• More probability theory.
  • Entropy.
  • KL divergence.
Our Approach
• The course generally follows statistics; very interdisciplinary.
• Emphasis on predictive models: guess the value(s) of target variable(s). “Pattern Recognition.”
• Generally a Bayesian approach, as in the text.
• Compared to standard Bayesian statistics:
  • more complex models (neural nets, Bayes nets)
  • more discrete variables
  • more emphasis on algorithms and efficiency
Things Not Covered
• Within statistics:
  • Hypothesis testing.
  • Frequentist theory, learning theory.
• Other types of data (not random samples):
  • Relational data.
  • Scientific data (automated scientific discovery).
• Action + learning = reinforcement learning.

Could be optional – what do you think?
Probability Theory
Apples and Oranges
Probability Theory
Marginal Probability
Conditional Probability
Joint Probability
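The equations on this slide were images in the original deck. As a small numerical sketch of the three quantities (the joint table below is illustrative, in the spirit of the text’s box-and-fruit example; the numbers are not from the slides):

```python
# Joint distribution P(Box, Fruit) for a toy two-box example
# (probabilities are illustrative, not from the slides).
joint = {
    ("red", "apple"): 0.2,  ("red", "orange"): 0.3,
    ("blue", "apple"): 0.45, ("blue", "orange"): 0.05,
}

# Marginal: P(Fruit = apple) = sum over boxes of P(Box, Fruit = apple).
p_apple = sum(p for (box, fruit), p in joint.items() if fruit == "apple")

# Conditional: P(Fruit = apple | Box = red) = P(red, apple) / P(red).
p_red = sum(p for (box, _), p in joint.items() if box == "red")
p_apple_given_red = joint[("red", "apple")] / p_red
```

The marginal sums out the box; the conditional renormalizes the red-box row of the joint table.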
Probability Theory
Sum Rule
Product Rule
The Rules of Probability
Sum Rule
Product Rule
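The rules themselves appeared as images in the original slides; their standard forms, as in the text, are:

```latex
\[
\text{sum rule:}\quad p(X) = \sum_{Y} p(X, Y),
\qquad
\text{product rule:}\quad p(X, Y) = p(Y \mid X)\, p(X).
\]
```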
Bayes’ Theorem
posterior ∝ likelihood × prior
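The theorem itself was an image on this slide; the standard statement (using the sum and product rules) is:

```latex
\[
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y).
\]
```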
Bayes’ Theorem: Model Version
• Let M be the model, E be the evidence.
• P(M | E) ∝ P(M) × P(E | M).

Intuition
• prior = how plausible is the event (model, theory) a priori, before seeing any evidence?
• likelihood = how well does the model explain the data?
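As a hypothetical numerical illustration of P(M | E) ∝ P(M) × P(E | M) (the models, priors, and data below are made up for this sketch, anticipating the Bernoulli section of the outline):

```python
# Posterior over two candidate coin models after observing
# E = 8 heads in 10 tosses (all numbers illustrative).
from math import comb

models = {"fair": 0.5, "biased": 0.8}   # P(heads) under each model M
prior = {"fair": 0.7, "biased": 0.3}    # prior plausibility P(M)

heads, tosses = 8, 10
likelihood = {m: comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)
              for m, p in models.items()}          # P(E | M)

unnorm = {m: prior[m] * likelihood[m] for m in models}
evidence = sum(unnorm.values())                    # P(E), the normalizer
posterior = {m: unnorm[m] / evidence for m in models}
```

Despite the prior favouring the fair coin, the biased model explains 8/10 heads far better, so the posterior shifts toward it.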
Probability Densities
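The content of this slide was an image; the standard definitions from the text are presumably:

```latex
\[
p(x \in (a, b)) = \int_a^b p(x)\,\mathrm{d}x,
\qquad
p(x) \ge 0,
\qquad
\int_{-\infty}^{\infty} p(x)\,\mathrm{d}x = 1.
\]
```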
Transformed Densities
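This slide’s equations were also an image; the standard change-of-variables formula for densities, with x = g(y), is presumably:

```latex
\[
p_y(y) = p_x(x)\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right|
       = p_x\big(g(y)\big)\,\big|g'(y)\big|.
\]
```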
Expectations
Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
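The approximate expectation is the Monte Carlo estimate E[f] ≈ (1/N) Σₙ f(xₙ), with the xₙ drawn from p(x). A minimal sketch (the choice of f(x) = x² under a standard normal is mine, chosen because the true value is 1):

```python
# Monte Carlo approximation of E[x^2] under a standard normal.
# True value: var + mean^2 = 1.
import random

random.seed(0)
N = 100_000
samples = (random.gauss(0.0, 1.0) for _ in range(N))
estimate = sum(x * x for x in samples) / N
```

The estimate converges to the true expectation at a rate of O(1/√N), independent of dimension.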
Expectations are Linear
• Let aX + bY + c be a linear combination of two random variables (itself a random variable).
• Then E[aX + bY + c] = aE[X] + bE[Y] + c.
• This holds whether or not X and Y are independent.
• Good exercise to prove it.
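The claim can be checked exactly on a small joint distribution where X and Y are deliberately dependent (the table and coefficients below are illustrative):

```python
# Verify E[aX + bY + c] = a*E[X] + b*E[Y] + c on a dependent joint P(X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
a, b, c = 2.0, -3.0, 5.0

# Left side: expectation of the combined random variable.
lhs = sum(p * (a * x + b * y + c) for (x, y), p in joint.items())

# Right side: combine the marginal expectations.
ex = sum(p * x for (x, _), p in joint.items())
ey = sum(p * y for (_, y), p in joint.items())
rhs = a * ex + b * ey + c
```

Here X and Y are strongly correlated (they agree with probability 0.8), yet both sides still match, as the slide asserts.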
Variances and Covariances

Think about this difference:
1. Everybody gets a B.
2. 10 students get a C, 10 get an A.

The average is the same – how to quantify the difference?

Prove this. Hint: use the linearity of expectation.
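The equation to prove was an image on the slide; it is presumably the standard variance identity, which follows directly from the linearity of expectation:

```latex
\[
\operatorname{var}[X]
= \mathbb{E}\big[(X - \mathbb{E}[X])^2\big]
= \mathbb{E}[X^2] - \mathbb{E}[X]^2.
\]
% Sketch: let \mu = E[X]. Expand (X - \mu)^2 = X^2 - 2\mu X + \mu^2,
% then by linearity E[(X-\mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.
```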