Lecture 3: Math Primer II


Transcript of Lecture 3: Math Primer II

Page 1: Lecture 3: Math Primer II

Machine Learning
Andrew Rosenberg

Lecture 3: Math Primer II

Page 2: Lecture 3: Math Primer II

2

Today

• Wrap up of probability
• Vectors, Matrices.
• Calculus
• Derivation with respect to a vector.

Page 3: Lecture 3: Math Primer II

3

Properties of probability density functions

Sum Rule

Product Rule
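
The slide's equations are not captured in this transcript; in standard notation, for a joint density p(X, Y), these rules are:

Sum rule: p(X) = \sum_{Y} p(X, Y) \quad \text{(or } p(X) = \int p(X, Y)\, dY \text{ for continuous variables)}

Product rule: p(X, Y) = p(Y \mid X)\, p(X)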

Page 4: Lecture 3: Math Primer II

4

Expected Values

• Given a random variable, with a distribution p(X), what is the expected value of X?
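
In standard notation (the slide's equation is not captured here):

E[X] = \sum_{x} x\, p(x) \quad \text{(discrete)} \qquad E[X] = \int x\, p(x)\, dx \quad \text{(continuous)}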

Page 5: Lecture 3: Math Primer II

5

Multinomial Distribution

• If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution.

• The probability of x being in state k is μ_k.
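
A common way to write this (assuming the 1-of-K coding described above, where x is a binary vector with exactly one component x_k = 1):

p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}, \qquad \mu_k \ge 0, \quad \sum_{k=1}^{K} \mu_k = 1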

Page 6: Lecture 3: Math Primer II

6

Expected Value of a Multinomial

• The expected value is the vector of mean values.
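
Written out (assuming the 1-of-K coding above, which the transcript does not show explicitly):

E[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu})\, \mathbf{x} = (\mu_1, \ldots, \mu_K)^T = \boldsymbol{\mu}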

Page 7: Lecture 3: Math Primer II

7

Gaussian Distribution

• One Dimension

• D-Dimensions
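
The standard densities (the slide's equations are not in the transcript):

One dimension: \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

D dimensions: \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)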

Page 8: Lecture 3: Math Primer II

8

Gaussians

Page 9: Lecture 3: Math Primer II

9

How machine learning uses statistical modeling

• Expectation
  – The expected value of a function is the hypothesis.

• Variance
  – The variance is the confidence in that hypothesis.

Page 10: Lecture 3: Math Primer II

10

Variance

• The variance of a random variable describes how much variability there is around the expected value.
• Calculated as the expected squared error.
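
In standard notation:

\operatorname{Var}[X] = E\!\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2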

Page 11: Lecture 3: Math Primer II

11

Covariance

• The covariance of two random variables expresses how they vary together.

• If two variables are independent, their covariance equals zero.
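
In standard notation:

\operatorname{Cov}[X, Y] = E\!\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]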

Page 12: Lecture 3: Math Primer II

12

Linear Algebra

• Vectors
  – A one dimensional array.
  – If not specified, assume x is a column vector.

• Matrices
  – A higher dimensional array.
  – Typically denoted with capital letters.
  – n rows by m columns.

Page 13: Lecture 3: Math Primer II

13

Transposition

• Transposing a matrix swaps columns and rows.

Page 14: Lecture 3: Math Primer II

14

Transposition

• Transposing a matrix swaps columns and rows.
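
In index notation, with a small example not taken from the slide:

(A^T)_{ij} = A_{ji}, \qquad \text{e.g. } \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}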

Page 15: Lecture 3: Math Primer II

15

Addition

• Two matrices can be added together iff they have the same dimensions.
  – A and B are both n-by-m matrices.
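
Addition is element-wise:

(A + B)_{ij} = A_{ij} + B_{ij}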

Page 16: Lecture 3: Math Primer II

16

Multiplication

• To multiply two matrices, the inner dimensions must be the same.
  – An n-by-m matrix can be multiplied by an m-by-k matrix.
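
The product of an n-by-m matrix A and an m-by-k matrix B is the n-by-k matrix with entries

(AB)_{ij} = \sum_{l=1}^{m} A_{il}\, B_{lj}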

Page 17: Lecture 3: Math Primer II

17

Inversion

• The inverse of an n-by-n (square) matrix A is denoted A^{-1}, and has the following property.

• Here I, the identity matrix, is an n-by-n matrix with ones along the diagonal.
  – I_ij = 1 iff i = j, 0 otherwise.
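
That property, in symbols:

A A^{-1} = A^{-1} A = I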

Page 18: Lecture 3: Math Primer II

18

Identity Matrix

• Matrices are invariant under multiplication by the identity matrix.
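
In symbols, for an n-by-m matrix A:

I_n A = A I_m = A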

Page 19: Lecture 3: Math Primer II

19

Helpful matrix inversion properties
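
The specific identities on this slide are not captured in the transcript; commonly listed ones include:

(A^{-1})^{-1} = A, \qquad (A^T)^{-1} = (A^{-1})^T, \qquad (AB)^{-1} = B^{-1} A^{-1}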

Page 20: Lecture 3: Math Primer II

20

Norm

• The norm of a vector x represents the Euclidean length of the vector.
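
In symbols:

\|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}} = \sqrt{\textstyle\sum_i x_i^2}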

Page 21: Lecture 3: Math Primer II

21

Positive Definiteness

• Quadratic form
  – Scalar
  – Vector

• Positive definite matrix M

• Positive semi-definite
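
The slide's equations are not in the transcript; the standard forms are:

Scalar quadratic form: f(x) = a x^2 \qquad Vector quadratic form: f(\mathbf{x}) = \mathbf{x}^T M \mathbf{x}

M is positive definite iff \mathbf{x}^T M \mathbf{x} > 0 for all \mathbf{x} \ne \mathbf{0}, and positive semi-definite iff \mathbf{x}^T M \mathbf{x} \ge 0 for all \mathbf{x}.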

Page 22: Lecture 3: Math Primer II

22

Calculus

• Derivatives and Integrals
• Optimization

Page 23: Lecture 3: Math Primer II

23

Derivatives

• The derivative of a function gives the slope of the function at a point x.
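
In standard notation:

f'(x) = \frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}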

Page 24: Lecture 3: Math Primer II

24

Derivative Example
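
The slide's worked example is not captured in the transcript; a typical example of the kind shown:

f(x) = x^2 + 3x \;\Rightarrow\; f'(x) = 2x + 3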

Page 25: Lecture 3: Math Primer II

25

Integrals

• Integration is the inverse operation of differentiation (up to an additive constant).

• Graphically, an integral can be considered the area under the curve defined by f(x)
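
In symbols:

\int f'(x)\, dx = f(x) + C, \qquad \int_a^b f(x)\, dx = \text{area under } f \text{ between } a \text{ and } b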

Page 26: Lecture 3: Math Primer II

26

Integration Example
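
The slide's worked example is not captured in the transcript; a typical example:

\int_0^1 x^2\, dx = \left[\tfrac{x^3}{3}\right]_0^1 = \tfrac{1}{3}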

Page 27: Lecture 3: Math Primer II

27

Vector Calculus

• Derivation with respect to a matrix or vector

• Gradient
• Change of Variables with a Vector

Page 28: Lecture 3: Math Primer II

28

Derivative w.r.t. a vector

• Given a vector x, and a function f(x), how can we find f’(x)?

Page 29: Lecture 3: Math Primer II

29

Derivative w.r.t. a vector

• Given a vector x, and a function f(x), how can we find f’(x)?
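
A standard way to write this: for \mathbf{x} = (x_1, \ldots, x_n)^T and scalar-valued f, the derivative is the vector of partial derivatives, the gradient:

\frac{\partial f}{\partial \mathbf{x}} = \nabla_{\mathbf{x}} f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^{T}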

Page 30: Lecture 3: Math Primer II

30

Example Derivation

Page 31: Lecture 3: Math Primer II

31

Example Derivation

Also referred to as the gradient of a function.
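
The slide's specific derivation is not in the transcript; standard examples of such gradients include:

\nabla_{\mathbf{x}} (\mathbf{a}^T \mathbf{x}) = \mathbf{a}, \qquad \nabla_{\mathbf{x}} (\mathbf{x}^T A \mathbf{x}) = (A + A^T)\,\mathbf{x}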

Page 32: Lecture 3: Math Primer II

32

Useful Vector Calculus identities

• Scalar Multiplication

• Product Rule
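
The slide's equations are not captured here; for scalar-valued functions f and g of a vector x and a constant c, the standard forms are:

\nabla_{\mathbf{x}} \big(c\, f(\mathbf{x})\big) = c\, \nabla_{\mathbf{x}} f(\mathbf{x}), \qquad \nabla_{\mathbf{x}} \big(f(\mathbf{x})\, g(\mathbf{x})\big) = g(\mathbf{x})\, \nabla_{\mathbf{x}} f(\mathbf{x}) + f(\mathbf{x})\, \nabla_{\mathbf{x}} g(\mathbf{x})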

Page 33: Lecture 3: Math Primer II

33

Useful Vector Calculus identities

• Derivative of an inverse

• Change of Variable
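
The slide's equations are not captured here; the standard identities are (with A a matrix depending on a scalar \alpha, and \mathbf{x} = g(\mathbf{u}) an invertible change of variable):

\frac{\partial A^{-1}}{\partial \alpha} = -A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}, \qquad \int f(\mathbf{x})\, d\mathbf{x} = \int f(g(\mathbf{u})) \left| \det \frac{\partial g}{\partial \mathbf{u}} \right| d\mathbf{u}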

Page 34: Lecture 3: Math Primer II

34

Optimization

• Have an objective function that we’d like to maximize or minimize, f(x)

• Set the first derivative to zero.
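
In symbols, stationary points satisfy

f'(x) = 0 \qquad \text{or, for vector arguments, } \nabla_{\mathbf{x}} f(\mathbf{x}) = \mathbf{0}

with the sign of the second derivative distinguishing maxima from minima.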

Page 35: Lecture 3: Math Primer II

35

Optimization with constraints

• What if we want to constrain the parameters of the model?
  – The mean is less than 10.

• Find the best likelihood, subject to a constraint.

• Two functions:
  – An objective function to maximize
  – An inequality that must be satisfied

Page 36: Lecture 3: Math Primer II

36

Lagrange Multipliers

• Find maxima of f(x,y) subject to a constraint.

Page 37: Lecture 3: Math Primer II

37

General form

• Maximizing:

• Subject to:

• Introduce a new variable, and find a maximum.
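
In standard form (the slide's equations are not in the transcript): to maximize f(\mathbf{x}) subject to g(\mathbf{x}) = 0, introduce the Lagrange multiplier \lambda and form the Lagrangian

\Lambda(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda\, g(\mathbf{x}),

then solve \nabla_{\mathbf{x}} \Lambda = \mathbf{0} together with \partial \Lambda / \partial \lambda = g(\mathbf{x}) = 0.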

Page 38: Lecture 3: Math Primer II

38

Example

• Maximizing:

• Subject to:

• Introduce a new variable, and find a maximum.

Page 39: Lecture 3: Math Primer II

39

Example

Now have 3 equations with 3 unknowns.

Page 40: Lecture 3: Math Primer II

40

Example: Eliminate Lambda, Substitute, and Solve
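
The slide's specific example is not captured in the transcript; a typical worked example of this procedure: maximize f(x, y) = 1 - x^2 - y^2 subject to x + y = 1.

\Lambda(x, y, \lambda) = 1 - x^2 - y^2 + \lambda (x + y - 1)

\frac{\partial \Lambda}{\partial x} = -2x + \lambda = 0, \qquad \frac{\partial \Lambda}{\partial y} = -2y + \lambda = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = x + y - 1 = 0

Eliminating \lambda gives x = y; substituting into the constraint gives x = y = 1/2 (with \lambda = 1).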

Page 41: Lecture 3: Math Primer II

41

Why does Machine Learning need these tools?

• Calculus
  – We need to identify the maximum likelihood or minimum risk: optimization.
  – Integration allows the marginalization of continuous probability density functions.

• Linear Algebra
  – Many features lead to high dimensional spaces.
  – Vectors and matrices allow us to compactly describe and manipulate high dimensional feature spaces.

Page 42: Lecture 3: Math Primer II

42

Why does Machine Learning need these tools?

• Vector Calculus
  – All of the optimization needs to be performed in high dimensional spaces.
  – Optimization of multiple variables simultaneously.
  – Gradient Descent.
  – Want to take a marginal over high dimensional distributions like Gaussians.

Page 43: Lecture 3: Math Primer II

43

Next Time

• Linear Regression
  – Then Regularization