Data Mining - Mathematics - univ-angers.frricher/dm/data_mining_0_maths.pdf · 2018. 5. 25. · Dr....


Data Mining - Mathematics

Dr. Jean-Michel RICHER


Dr. Jean-Michel RICHER Data Mining - Mathematics 1 / 102


Outline

1. Introduction

2. Matrices and vectors

3. Derivative

4. Gradient

5. Lagrangian

6. Exercises


1. Introduction


What we will cover

Mathematical background needed to understand Machine Learning techniques:

- matrix and vector operations
- derivative
- gradient
- Lagrangian


Difficulty of mathematics

- use of symbols that represent expressions
- some symbols have different meanings depending on the context
- the syntax is sometimes strict and sometimes not


Difficulty of mathematics

Example of a prime number

n is a prime number if it has exactly two divisors, 1 and itself (with the restriction that n ≠ 1)

- implies that we deal with integers
- implies the notion of divisibility:

$$\forall\, n, p, q, r \in \mathbb{N}, \quad n = p \times q + r$$

where r is the remainder of the division of n by q (0 ≤ r < q)

n is divisible by q if (and only if) r = 0 and p ≥ 1
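The divisibility test above can be sketched in code. This is a minimal illustration, not part of the original slides: the function names `is_divisible` and `is_prime` are hypothetical, and trial division is just one simple way to apply the definition.

```cpp
#include <cassert>

// n is divisible by q when the remainder r in n = p * q + r is zero.
bool is_divisible(unsigned n, unsigned q) {
    assert(q != 0);   // division by zero is undefined
    return n % q == 0;
}

// Trial division: n is prime if no q with 2 <= q <= sqrt(n) divides it.
bool is_prime(unsigned n) {
    if (n < 2) return false;   // 0 and 1 are not prime
    // "q <= n / q" is the same test as "q * q <= n" without overflow.
    for (unsigned q = 2; q <= n / q; ++q) {
        if (is_divisible(n, q)) return false;
    }
    return true;
}
```

For instance, `is_prime(13)` holds while `is_prime(15)` does not, since 15 = 5 × 3 + 0.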


2. Matrices and vectors


Vector

- a series of values that can be identified by their index in an array of length p

$$x(p) \text{ or simply } x \in \mathbb{R}^p$$

- for computer scientists, a 1D array of length p

$$x(p) = (x_1, x_2, \ldots, x_p) = [x_1, x_2, \ldots, x_p], \quad x_i \in \mathbb{R}$$

- can also be represented vertically (for mathematicians):

$$x(p) = x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}, \quad x^T = [x_1, x_2, \ldots, x_p]$$

in this case $x^T$ (T for Transposition) is the horizontal representation


Operations on vectors

- vectors must have the same length
- +, −, ×, /
- the dot (or scalar) product of two vectors:

$$xy = x \cdot y = x^T y = \sum_{i=1}^{p} x_i \times y_i$$

- norm (or length) of a vector: $\|x\| = \sqrt{x \cdot x}$
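The dot product and the norm can be sketched as follows. This is an illustrative sketch only: the names `dot` and `norm` and the use of `std::vector<double>` are assumptions, not something the slides prescribe.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dot product: sum of the element-wise products x_i * y_i.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());   // vectors must have the same length
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Norm (length): square root of the dot product of x with itself.
double norm(const std::vector<double>& x) {
    return std::sqrt(dot(x, x));
}
```

With x = [1, −2, 3] and y = [−1, 4, −7], `dot(x, y)` evaluates (1 × −1) + (−2 × 4) + (3 × −7).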


Example of operations on vectors

$$x = [1, -2, 3], \quad y = [-1, 4, -7]$$

$$x + y = [1 + (-1),\; -2 + 4,\; 3 + (-7)] = [0, 2, -4]$$

$$x \times y = [1 \times -1,\; -2 \times 4,\; 3 \times -7] = [-1, -8, -21]$$

$$xy = (1 \times -1) + (-2 \times 4) + (3 \times -7) = -30$$

$$x^T y = [1, -2, 3] \cdot \begin{bmatrix} -1 \\ 4 \\ -7 \end{bmatrix} = -30$$


Examples of operations on vectors

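Norm of a vector in 2D, for x = [2, 3]

$$x \cdot x = (2 \times 2) + (3 \times 3) = 4 + 9 = 13$$

$$\|x\| = \sqrt{13} = 3.6055\ldots$$

[Figure: the vector x drawn in the plane, with its components 2 and 3 marked on the axes]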


Matrix

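- $X(p) = (x_1, x_2, \ldots, x_p)$ where $x_i \in \mathbb{R}^n$, a notation that can lead to confusion
- for computer scientists, a 2D array defined by:
  - n rows
  - and p columns
- can be seen as an array of vectors or a vector of vectors

$$X(n,p) =
\begin{bmatrix}
x_1^1 & x_1^2 & \cdots & x_1^p \\
x_2^1 & x_2^2 & \cdots & x_2^p \\
\vdots & \vdots & \ddots & \vdots \\
x_n^1 & x_n^2 & \cdots & x_n^p
\end{bmatrix}
=
\begin{bmatrix}
\begin{bmatrix} x_1^1 \\ x_2^1 \\ \vdots \\ x_n^1 \end{bmatrix} &
\begin{bmatrix} x_1^2 \\ x_2^2 \\ \vdots \\ x_n^2 \end{bmatrix} &
\cdots &
\begin{bmatrix} x_1^p \\ x_2^p \\ \vdots \\ x_n^p \end{bmatrix}
\end{bmatrix}$$

$x_i^j$ is the element in row i and column j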


Properties of matrices

Square matrices

- a square matrix is such that n = p
  - the diagonal is a separation line
  - called lower triangular if all the entries above the main diagonal are zero
  - called upper triangular if all the entries under the main diagonal are zero
- $I$ or $I_n$ is the identity matrix

$$I_n =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \ddots & \vdots \\
\vdots & \ddots & 1 & 0 \\
0 & \cdots & 0 & 1
\end{bmatrix}$$


Operations on matrices

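Matrix sum: matrices must have the same dimensions

$$X(n,p) + Y(n,p) = Z(n,p)$$

$$\begin{bmatrix}
x_1^1 & x_1^2 & \cdots & x_1^p \\
x_2^1 & x_2^2 & \cdots & x_2^p \\
\vdots & \vdots & \ddots & \vdots \\
x_n^1 & x_n^2 & \cdots & x_n^p
\end{bmatrix}
+
\begin{bmatrix}
y_1^1 & y_1^2 & \cdots & y_1^p \\
y_2^1 & y_2^2 & \cdots & y_2^p \\
\vdots & \vdots & \ddots & \vdots \\
y_n^1 & y_n^2 & \cdots & y_n^p
\end{bmatrix}
=
\begin{bmatrix}
x_1^1 + y_1^1 & x_1^2 + y_1^2 & \cdots & x_1^p + y_1^p \\
x_2^1 + y_2^1 & x_2^2 + y_2^2 & \cdots & x_2^p + y_2^p \\
\vdots & \vdots & \ddots & \vdots \\
x_n^1 + y_n^1 & x_n^2 + y_n^2 & \cdots & x_n^p + y_n^p
\end{bmatrix}$$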


Operations on matrices

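Matrix product

$$X(n,p) \times Y(p,q) = Z(n,q)$$

the number of columns of X must equal the number of rows of Y

$$z_i^j = \sum_{k=1}^{p} x_i^k \times y_k^j$$

    // x is n-by-p, y is p-by-q, z is n-by-q (0-based 2D arrays of double)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < q; ++j) {
            double sum = 0;
            // dot product of row i of x with column j of y
            for (int k = 0; k < p; ++k) {
                sum += x[i][k] * y[k][j];
            }
            z[i][j] = sum;
        }
    }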


Operations on matrices

Matrix product

Note that generally:

$$AB \neq BA$$


Operations on matrices

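Example of matrix product

$$X(3,2) = \begin{bmatrix} 11 & 12 \\ 21 & 22 \\ 31 & 32 \end{bmatrix}
\times
Y(2,3) = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

$$Z(3,3) =
\begin{bmatrix}
11 \times 1 + 12 \times 4 & 11 \times 2 + 12 \times 5 & 11 \times 3 + 12 \times 6 \\
21 \times 1 + 22 \times 4 & 21 \times 2 + 22 \times 5 & 21 \times 3 + 22 \times 6 \\
31 \times 1 + 32 \times 4 & 31 \times 2 + 32 \times 5 & 31 \times 3 + 32 \times 6
\end{bmatrix}
=
\begin{bmatrix}
59 & 82 & 105 \\
109 & 152 & 195 \\
159 & 222 & 285
\end{bmatrix}$$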


Operations on matrices

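Example of matrix and vector product

$$X(3,2) = \begin{bmatrix} 11 & 12 \\ 21 & 22 \\ 31 & 32 \end{bmatrix}
\times
y(2) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

$$z(3) =
\begin{bmatrix}
11 \times 1 + 12 \times 2 \\
21 \times 1 + 22 \times 2 \\
31 \times 1 + 32 \times 2
\end{bmatrix}
=
\begin{bmatrix} 35 \\ 65 \\ 95 \end{bmatrix}$$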
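The matrix and vector product can be written as a small function. This is a sketch under assumptions: the aliases `Matrix` and `Vector` and the name `mat_vec` are hypothetical, and the matrix is stored as a vector of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // vector of rows
using Vector = std::vector<double>;

// z = X * y, where X has n rows and p columns and y has length p.
Vector mat_vec(const Matrix& X, const Vector& y) {
    Vector z(X.size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == y.size());   // columns of X = length of y
        for (std::size_t k = 0; k < y.size(); ++k) {
            z[i] += X[i][k] * y[k];        // row i of X times y
        }
    }
    return z;
}
```

Each entry of the result is the dot product of one row of X with y, so the result has as many entries as X has rows.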


Operations on matrices

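Matrix transpose

to make it simple: exchange of values from both sides of the diagonal of the matrix

$$X(n,p) =
\begin{bmatrix}
x_1^1 & x_1^2 & \cdots & x_1^p \\
x_2^1 & x_2^2 & \cdots & x_2^p \\
\vdots & \vdots & \ddots & \vdots \\
x_n^1 & x_n^2 & \cdots & x_n^p
\end{bmatrix}
\qquad
X^T(p,n) =
\begin{bmatrix}
x_1^1 & x_2^1 & \cdots & x_n^1 \\
x_1^2 & x_2^2 & \cdots & x_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
x_1^p & x_2^p & \cdots & x_n^p
\end{bmatrix}$$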


Operations on matrices

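Matrix transpose

$$X(3,3) = \begin{bmatrix} 11 & 12 & 13 \\ 21 & 22 & 23 \\ 31 & 32 & 33 \end{bmatrix}
\qquad
X^T(3,3) = \begin{bmatrix} 11 & 21 & 31 \\ 12 & 22 & 32 \\ 13 & 23 & 33 \end{bmatrix}$$

$$X(3,5) = \begin{bmatrix} 11 & 12 & 13 & 14 & 15 \\ 21 & 22 & 23 & 24 & 25 \\ 31 & 32 & 33 & 34 & 35 \end{bmatrix}
\qquad
X^T(5,3) = \begin{bmatrix} 11 & 21 & 31 \\ 12 & 22 & 32 \\ 13 & 23 & 33 \\ 14 & 24 & 34 \\ 15 & 25 & 35 \end{bmatrix}$$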
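The transposition can be sketched as a function that swaps row and column indices. The alias `Matrix` and the name `transpose` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // vector of rows

// Returns the p-by-n transpose of an n-by-p matrix: t[j][i] = x[i][j].
Matrix transpose(const Matrix& x) {
    const std::size_t n = x.size();
    const std::size_t p = (n == 0) ? 0 : x[0].size();
    Matrix t(p, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i) {
        assert(x[i].size() == p);   // every row must have p columns
        for (std::size_t j = 0; j < p; ++j) {
            t[j][i] = x[i][j];
        }
    }
    return t;
}
```

Transposing twice gives back the original matrix, which is a convenient sanity check.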


Operations on matrices

Inverse of a matrix

The inverse matrix $A^{-1}$ of a matrix $A$ is such that

$$A^{-1} \times A = I$$

and is used, for example, to solve systems of linear equations:

$$A \times x = b \quad \text{then} \quad x = A^{-1} \times b$$

where x and b are vectors


Operations on matrices

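Inverse of a matrix

$$A(3,3) = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 2 & -3 \\ -7 & 6 & -5 \end{bmatrix}
\qquad
A^{-1}(3,3) = \begin{bmatrix} 0.125 & 0.4375 & -0.1875 \\ 0.25 & 0.25 & 0 \\ 0.125 & -0.3125 & 0.0625 \end{bmatrix}$$

$$A \times \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = b = \begin{bmatrix} 14 \\ -6 \\ -10 \end{bmatrix}
\qquad
A^{-1} b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$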
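The use of the inverse to solve A × x = b can be checked numerically. This sketch assumes the inverse is already known (it does not compute it); the names `Vec3`, `Mat3`, `mul` and `solve_with_inverse` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;   // array of rows

// Product of a 3-by-3 matrix with a vector of length 3.
Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 r{};   // zero-initialised
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t k = 0; k < 3; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

// Computes x = a_inv * b, then checks that a * x gives back b.
Vec3 solve_with_inverse(const Mat3& a, const Mat3& a_inv, const Vec3& b) {
    const Vec3 x = mul(a_inv, b);
    const Vec3 check = mul(a, x);
    for (std::size_t i = 0; i < 3; ++i)
        assert(std::fabs(check[i] - b[i]) < 1e-9);   // a_inv must invert a
    return x;
}
```

In practice, linear systems are usually solved by elimination or factorisation rather than by forming the inverse explicitly; the sketch only mirrors the formula x = A⁻¹ × b.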


3. Derivative


Derivative

Derivative of f(x)

- the slope of the tangent to a curve at a given point
- if positive: the curve will increase
- if negative: the curve will decrease
- if zero: the curve won't increase or decrease


Derivative

Derivative of f(x)

[Figure: a curve with the points x and x+h marked on the horizontal axis and f(x) and f(x+h) on the vertical axis]


Derivative

More formally the derivative can be defined as

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{(x+h) - x} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

Notations:

$$f'(x) \quad \text{or} \quad \frac{df(x)}{dx}$$

$\frac{df(x)}{dx}$ means the variation of f(x) if we increase x by a small value
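The limit above can be approximated numerically by evaluating the quotient for a small, fixed h. This is a sketch only: the name `difference_quotient` and the default step `1e-6` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Difference quotient (f(x + h) - f(x)) / h for a small, non-zero step h.
double difference_quotient(const std::function<double(double)>& f,
                           double x, double h = 1e-6) {
    assert(h != 0.0);   // the quotient is undefined for h = 0
    return (f(x + h) - f(x)) / h;
}
```

For f(x) = x², the quotient at x = 3 comes out close to 6; the result is only an approximation because h is small but not zero.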


Properties of the derivative

Property of addition and product of the derivative

(f + g)′ = f ′ + g′

(f × g)′ = f ′ × g + f × g′

As an exercise you could try to prove the result of (f × g)′


Properties of the derivative

Property of the composition

(g ◦ f )′ = (g′ ◦ f )× f ′

in other words:

g(f (x))′ = g′(f (x))× f ′(x) (DerivComp)


Properties of the derivative

Property of the composition

Find the derivative of

$$h(x) = \sin(3x^2 + 2)$$

let $g(x) = \sin(x)$, then $g'(x) = \cos(x)$

let $f(x) = 3x^2 + 2$, then $f'(x) = 6x$

So

$$h'(x) = g(f(x))' = g'(f(x)) \times f'(x)$$

$$h'(x) = \cos(3x^2 + 2) \times 6x$$
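The result can be cross-checked against a numerical derivative. This is a sketch under assumptions: `h_prime_numeric` uses a central difference, a technique that is not in the slides, and the step `1e-5` is an arbitrary choice.

```cpp
#include <cassert>
#include <cmath>

// h(x) = sin(3x^2 + 2), written as the composition g(f(x)).
double f(double x) { return 3.0 * x * x + 2.0; }
double g(double u) { return std::sin(u); }
double h(double x) { return g(f(x)); }

// Composition rule: h'(x) = g'(f(x)) * f'(x) = cos(3x^2 + 2) * 6x.
double h_prime(double x) { return std::cos(f(x)) * 6.0 * x; }

// Central difference approximation of h'(x), used only as a cross-check.
double h_prime_numeric(double x, double step = 1e-5) {
    assert(step > 0.0);
    return (h(x + step) - h(x - step)) / (2.0 * step);
}
```

The two values should agree to several decimal places at any point x, for example at x = 0.7.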


Properties of the derivative

Property of the inverse function

Let $f^{-1}(x)$ be the inverse function of $f(x)$, i.e. $f^{-1}(f(x)) = x$

$$[f^{-1}(f(x))]' = (x)' = 1$$

$$(f^{-1})'(f(x)) \times f'(x) = 1 \quad \text{from (DerivComp)}$$

$$(f^{-1})'(f(x)) = \frac{1}{f'(x)} \quad \text{(DerivInv)}$$


Derivative of $x^n$

$$f(x) = x^n$$

The derivative should have the following behaviour, with n even (2k) or odd (2k + 1):

    x          | −∞ | 0 | +∞
    x^{2k}     | −  | 0 | +
    x^{2k+1}   | +  | 0 | +


Derivative of $x^n$

$(x + a)^n$

Remember that $x^0 = 1$

$$(x + a)^2 = x^2 + 2ax + a^2$$

$$(x + a)^3 = x^3 + 3ax^2 + 3a^2x + a^3$$

$$(x + a)^4 = x^4 + 4ax^3 + 6a^2x^2 + 4a^3x + a^4$$

$$\vdots$$

$$(x + a)^n = \alpha_{0,n}\, a^0 x^n + \cdots + \alpha_{i,j}\, a^i x^j + \cdots + \alpha_{n,0}\, a^n x^0$$

$$(x + a)^n = \sum_{i=0,\; j=n-i}^{i=n} \alpha_{i,j}\, a^i x^j$$

the coefficients $\alpha_{i,j}$ of $a^i x^j$ are given by Pascal's triangle


A bit of history

Blaise Pascal [fr] (1623-1662)

- was a French mathematician, physicist, inventor, writer and Catholic theologian
- wrote a significant treatise on the subject of projective geometry at the age of 16
- worked on the principles of hydraulic fluids (hydraulic press and the syringe)
- theological work, referred to posthumously as the Pensées (Thoughts)


Derivative of $x^n$

Pascal's triangle

    n | x^n | a x^{n-1} | ...
    0 |  1  |  0
    1 |  1  |  1  |  0
    2 |  1  |  2  |  1  |  0
    3 |  1  |  3  |  3  |  1  |  0
    4 |  1  |  4  |  6  |  4  |  1

Obviously, the coefficient $\alpha_{1,n-1}$ of $a x^{n-1}$ is n
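The rows of Pascal's triangle can be generated with the usual rule that each coefficient is the sum of the two above it. This is a sketch only; the name `pascal_row` and the bound on n are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row n of Pascal's triangle: the coefficients of (x + a)^n,
// ordered from the x^n term down to the a^n term.
std::vector<unsigned long long> pascal_row(std::size_t n) {
    assert(n <= 60);   // conservative bound: every coefficient fits in 64 bits
    std::vector<unsigned long long> row(n + 1, 1);
    // Build the row in place: pass k turns row k-1 into row k.
    for (std::size_t k = 1; k <= n; ++k) {
        for (std::size_t i = k - 1; i >= 1; --i) {
            row[i] += row[i - 1];
        }
    }
    return row;
}
```

The second entry of `pascal_row(n)` is the coefficient of $a x^{n-1}$, which is n.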


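Derivative of $x^n$

$$\begin{aligned}
f(x+h) - f(x) &= (x+h)^n - x^n \\
&= (x^n + n h x^{n-1} + \alpha_{2,n-2}\, h^2 x^{n-2} + \ldots) - x^n \\
&= n h x^{n-1} + \alpha_{2,n-2}\, h^2 x^{n-2} + \ldots
\end{aligned}$$

$$\begin{aligned}
\frac{f(x+h) - f(x)}{h} &= \frac{n h x^{n-1} + \alpha_{2,n-2}\, h^2 x^{n-2} + \ldots}{h} \\
&= n x^{n-1} + \underbrace{\alpha_{2,n-2}\, h x^{n-2} + \ldots}_{\to\, 0}
\end{aligned}$$

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = n x^{n-1}$$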


Function 1/x


Derivative of 1/x

$$f(x) = 1/x$$

The derivative should have the following behaviour

    x     | −∞ | 0  | +∞
    1/x   | −  | NA | −


Derivative of 1/x

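$$f(x+h) - f(x) = \frac{1}{x+h} - \frac{1}{x} = \frac{x - (x+h)}{x(x+h)} = \frac{-h}{x^2 + hx}$$

$$\frac{f(x+h) - f(x)}{h} = \frac{\;\frac{-h}{x^2 + hx}\;}{h} = \frac{-h}{h(x^2 + hx)}$$

$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} = -\frac{1}{x^2}$$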


Function log(x)

ln(x) or log(x)

- the natural logarithm of x is the power to which e = 2.718281... would have to be raised to equal x
- for example ln(7.5) = 2.0149..., because $e^{2.0149\ldots} = 7.5$
- used to replace products by sums
- other functions:

$$\log_n(x) = \frac{\ln(x)}{\ln(n)}$$


Derivative of log(x)

$$f(x) = \log(x)$$

The derivative should have the following behaviour

    x        | −∞ | 0  | +∞
    log(x)   | NA | −∞ | +


Properties of the function log(x)

- $\log(1) = 0$
- $\log(e) = 1$
- $\log(x \times y) = \log(x) + \log(y)$
- $\log(x / y) = \log(x) - \log(y)$
- $\log(x^n) = n \times \log(x)$


Derivative of log(x)

By definition

$$\log(a) = \int_1^a \frac{1}{x}\, dx$$

so the derivative of log(x) is $\frac{1}{x}$


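Derivative of log(x)

$$\begin{aligned}
f(x+h) - f(x) &= \log(x+h) - \log(x) \\
&= \log\left(\frac{x+h}{x}\right) \\
&= \log\left(1 + \frac{h}{x}\right)
\end{aligned}$$

$$\begin{aligned}
\frac{f(x+h) - f(x)}{h} &= \frac{1}{h} \times \log\left(1 + \frac{h}{x}\right) \\
&= \log\left(\left(1 + \frac{h}{x}\right)^{\frac{1}{h}}\right)
\end{aligned}$$

$$\begin{aligned}
\lim_{h \to 0} \frac{f(x+h) - f(x)}{h} &= \lim_{h \to 0} \log\left(\left(1 + \frac{h}{x}\right)^{\frac{1}{h}}\right) \\
&= \log\left(\lim_{h \to 0} \left(1 + \frac{h}{x}\right)^{\frac{1}{h}}\right) = \log\left(e^{\frac{1}{x}}\right) = \frac{1}{x}
\end{aligned}$$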


Function exp(x)

Function exp(x) (Jakob Bernoulli [ch], 1654-1705)

- the exponential function, aka the antilogarithm
- $\exp(x) = e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$
- $e^1 = 2.718281\ldots$
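The limit definition can be tried numerically with a large but finite n. This is a sketch only: the name `exp_limit` and the default n = 10^6 are assumptions, and the comparison with `std::exp` is just a sanity check.

```cpp
#include <cassert>
#include <cmath>

// Approximates exp(x) by (1 + x/n)^n for a large but finite n.
double exp_limit(double x, double n = 1e6) {
    assert(n > 0.0);
    return std::pow(1.0 + x / n, n);
}
```

With n = 10^6, `exp_limit(1.0)` agrees with e = 2.718281... to about five decimal places; larger n gets closer, up to the limits of floating-point precision.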


A bit of history

Jakob Bernoulli [ch] (1654-1705)

- family, of Belgian origin, were refugees fleeing from persecution by the Spanish rulers of the Netherlands
- Swiss mathematician and astronomer
- theory of permutations and combinations (Bernoulli numbers), by which he derived the exponential series
- Law of large numbers, in statistics, 1713


Derivative of exp(x)

$$f(x) = \exp(x)$$

The derivative should have the following behaviour

    x        | −∞ | 0 | +∞
    exp(x)   | +  | 1 | +


Properties of the function $e^x$

- $e^0 = 1$
- $e^x > 0 \quad \forall x$
- $e^{-x} = \frac{1}{e^x}$
- $e^{x+y} = e^x \times e^y$
- $e^{x-y} = \frac{e^x}{e^y}$
- $e^{x \times y} = (e^x)^y$


Derivative of $e^x$

By definition $\log(e^x) = x$. So we compute the derivative of this last expression:

$$\log(e^x) = x$$

$$(\log(e^x))' = (x)'$$

$$\frac{1}{e^x} \times (e^x)' = 1 \quad \text{by (DerivInv)}$$

$$(e^x)' = e^x$$


To sum up

Derivatives

$$(x^n)' = n\, x^{n-1}$$

$$\left(\frac{1}{x}\right)' = -\frac{1}{x^2}$$

$$(\log(x))' = \frac{1}{x}$$

$$(e^x)' = e^x$$


4. Gradient


Gradient

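Definition of the gradient

Given a function of several variables f(x, y, z), the gradient is the vector of the partial derivatives of f

$$\nabla f(x, y, z) = \nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right]$$

The partial derivative $\frac{\partial f}{\partial x}$ is the derivative of f(x, y, z) when y and z are considered as constants:

$$\frac{\partial f}{\partial x} = \frac{\partial f(x, y, z)}{\partial x} = \frac{d f(x, y, z)|_{y,z}}{dx}$$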


Gradient

Property of the gradient

- the gradient ∇f gives the direction toward which you can increase the value of the function
- conversely, −∇f gives the direction toward which you can decrease the value of the function


Use of the gradient

Finding the minimum of a function

the gradient can be used to find the minimum of a function by progressively moving the coordinates, subtracting a fraction of the value of the gradient at each step


Gradient

A convex quadratic function

Consider the following function:

$$f(x, y) = (x^2 + 8 \times x - 4) + (y^2 + 6 \times y - 3)$$

[Figure: surface plot of (x**2+8*x-4)+(y**2+6*y-3) for x and y between −10 and 10]


Gradient

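A convex quadratic function
The gradient of the function is:

$$\frac{\partial f}{\partial x} = 2 \times x + 8 \qquad \frac{\partial f}{\partial y} = 2 \times y + 6$$

The minimum is found for $\frac{\partial f}{\partial x} = 0$ and $\frac{\partial f}{\partial y} = 0$:

$$\frac{\partial f}{\partial x} = 0 \Rightarrow x = -4$$

$$\frac{\partial f}{\partial y} = 0 \Rightarrow y = -3$$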
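As a cross-check that this stationary point is indeed a minimum, one can complete the squares (a short worked step added for illustration):

```latex
\begin{aligned}
f(x, y) &= (x^2 + 8x - 4) + (y^2 + 6y - 3) \\
        &= (x + 4)^2 - 16 - 4 + (y + 3)^2 - 9 - 3 \\
        &= (x + 4)^2 + (y + 3)^2 - 32
\end{aligned}
```

Both squares are non-negative and vanish only at $x = -4$, $y = -3$, so the minimum value is $f(-4, -3) = -32$.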


Gradient

Gradient descent algorithm

Data: f(x)
Result: x*: the minimum of the function
initialise vector x;
while not terminate_condition do
    compute gradient ∇f;
    x = x − α × ∇f;
end
Algorithm 1: A very simple descent algorithm

- note that the terminate condition can be defined in different ways (improvement, number of iterations)
- α = 0.1 for example; if α is too big, the algorithm overshoots and may not converge to the solution
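Algorithm 1 can be sketched in Python as follows, applied to the convex quadratic function of the previous slides. The function name `gradient_descent`, the tolerance `tol` and the iteration cap `max_iter` are assumptions made for this illustration; the terminate condition used here is "the last step is negligible or the iteration budget is exhausted".

```python
import numpy as np

def f(p):
    x, y = p
    return (x**2 + 8*x - 4) + (y**2 + 6*y - 3)

def grad_f(p):
    # Gradient computed by hand: (2x + 8, 2y + 6)
    x, y = p
    return np.array([2*x + 8, 2*y + 6])

def gradient_descent(grad, start, alpha=0.1, tol=1e-8, max_iter=10_000):
    p = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad(p)
        p = p - step                      # x = x - alpha * grad f
        if np.linalg.norm(step) < tol:    # terminate condition: tiny step
            break
    return p

p_star = gradient_descent(grad_f, [3, 5])
print(p_star, f(p_star))
```

Starting from (3, 5) with α = 0.1, the iterates approach (-4, -3), where the gradient vanishes.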


Gradient

Descent for a convex quadratic function
For the previous convex function we obtain this:

x0= 3, y0= 5, alpha=0.1

gradient=(14, 16)

x1= 1.5999, y1= 3.4

gradient=(11.2, 12.8)

x2= 0.48, y2 = 2.1199

gradient=(8.96, 10.2399)

...

x48 = -3.99984389478361, y48 = -2.999821594038412

gradient=(0.0003122104327797359, 0.000356811923175826)

x49 = -3.999875115826888, y49 = -2.9998572752307298
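The trace above can be explained in closed form. The following short derivation is added for illustration and uses only the gradient $(2x + 8, 2y + 6)$ and the update rule $x \leftarrow x - \alpha \nabla f$:

```latex
\begin{aligned}
x_{k+1} + 4 &= x_k - \alpha (2 x_k + 8) + 4 = (1 - 2\alpha)(x_k + 4) \\
y_{k+1} + 3 &= y_k - \alpha (2 y_k + 6) + 3 = (1 - 2\alpha)(y_k + 3) \\
\text{with } \alpha = 0.1: \quad x_k &= -4 + 7 \times 0.8^k, \qquad y_k = -3 + 8 \times 0.8^k
\end{aligned}
```

The distance to the minimum shrinks by a factor 0.8 at each iteration, which gives $x_{48} \approx -4 + 1.56 \times 10^{-4}$, in agreement with the trace. For this function the iteration converges only if $|1 - 2\alpha| < 1$, that is $0 < \alpha < 1$.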


Gradient

Difficulty of finding the minimum
It becomes more difficult to find the minimum if
- the function is not convex
- the function has many minima (Rastrigin or Himmelblau functions)
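To illustrate the second point, here is a sketch that runs a plain gradient descent on the Himmelblau function from two different starting points. The starting points, the step size `alpha` and the iteration count are arbitrary choices made for this example.

```python
import numpy as np

def himmelblau(p):
    x, y = p
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad_himmelblau(p):
    x, y = p
    a = x**2 + y - 11
    b = x + y**2 - 7
    return np.array([4*x*a + 2*b, 2*a + 4*y*b])

def descend(start, alpha=0.01, n_iter=5000):
    # Fixed number of plain gradient steps
    p = np.array(start, dtype=float)
    for _ in range(n_iter):
        p = p - alpha * grad_himmelblau(p)
    return p

p1 = descend((4.0, 4.0))
p2 = descend((-4.0, -4.0))
print(p1, himmelblau(p1))
print(p2, himmelblau(p2))
```

Both runs end at a point where the function is essentially 0, but not at the same point: which minimum is found depends on where the descent starts.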


Gradient

In Python

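import numpy as np
from scipy import optimize

# Function to minimise, with x = x[0] and y = x[1]
def f(x):
    return x[0]**2 + 8*x[0] - 4 + x[1]**2 + 6*x[1] - 3

# Gradient of f
def fprime(x):
    return np.array([2*x[0] + 8, 2*x[1] + 6])

# BFGS minimisation starting from (3, 5)
z = optimize.fmin_bfgs(f, [3, 5], fprime=fprime)
print(z)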

Optimization terminated successfully.

Current function value: -32.000000

Iterations: 2

Function evaluations: 4

Gradient evaluations: 4

[-4. -3.]


5. Lagrangian


A bit of history

Joseph-Louis Lagrange
- born Giuseppe Lodovico Lagrangia (1736 - 1813), he was a Franco-Italian mathematician and astronomer
- made significant contributions to the fields of analysis, number theory, and both classical and celestial mechanics
- in 1787, at age 51, moved from Berlin to Paris and became a member of the French Academy of Sciences
- remained in France until the end of his life


Lagrangian multipliers

Principle
- you want to minimize or maximize $f(x)$ subject to $g(x) = 0$
- under certain conditions:
  - $f(x)$ is a quadratic function
  - $g(x)$ are linear constraints
- define the function

$$L(x, \alpha) = f(x) + \alpha g(x)$$

where $\alpha \in \mathbb{R}$ is called the Lagrange multiplier (for an equality constraint its sign is unrestricted)


Lagrangian multipliers

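Resolution
- a solution is a stationary point of $L(x, \alpha)$, that is a point where its gradient is 0
- so compute and solve

$$\frac{\partial L(x, \alpha)}{\partial x} = 0$$

$$\frac{\partial L(x, \alpha)}{\partial \alpha} = g(x) = 0$$

- or substitute the result back into $L(x, \alpha)$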


Method of Lagrange - example

Statement of the example
Suppose you want to put a fence around a field that has the shape of a rectangle with sides $x$ and $y$, and you want to maximize the area knowing that you have $P$ meters of fence:

$$\begin{cases} \text{Max } x \times y \\ \text{such that } P = 2x + 2y \end{cases}$$

then $f(x, y) = xy$ and $g(x, y) = P - 2x - 2y = 0$:

$$\begin{cases} \text{Max } xy \\ \text{such that } P - 2x - 2y = 0 \end{cases}$$


Method of Lagrange - example

Lagrange formulation

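$$L(x, y, \alpha) = xy + \alpha (P - 2x - 2y)$$

the derivatives give us

$$\frac{\partial L(x, y, \alpha)}{\partial x} = y - 2\alpha = 0$$

$$\frac{\partial L(x, y, \alpha)}{\partial y} = x - 2\alpha = 0$$

$$\frac{\partial L(x, y, \alpha)}{\partial \alpha} = P - 2x - 2y = 0$$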


Method of Lagrange - example

Resolution
- the first two equations give us $y = 2\alpha = x$, so $x = y$: in other words, the field is a square
- the last one gives $P = 4x = 4y$, so $x = y = P/4$
- consequently $\alpha = P/8$ because $\alpha = x/2 = y/2$
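The same resolution can be reproduced symbolically. This is a minimal sketch assuming sympy is available; the variable names are chosen to match the slides.

```python
import sympy as sp

x, y, alpha, P = sp.symbols("x y alpha P")

# Lagrangian of the fence problem
L = x*y + alpha*(P - 2*x - 2*y)

# Stationarity: every partial derivative of L must vanish
equations = [sp.diff(L, v) for v in (x, y, alpha)]
solution = sp.solve(equations, (x, y, alpha), dict=True)[0]

print(solution)
print("area:", sp.simplify((x*y).subs(solution)))
```

The solver returns $x = y = P/4$ and $\alpha = P/8$, so the largest area obtained with $P$ meters of fence is $P^2/16$.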


6. Exercises


Matrix - Matrix multiplication

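Matrix - Matrix multiplication
Consider the following matrices:

$$A = \begin{bmatrix} 1 & 0 & -2 \\ 4 & 5 & -3 \\ 6 & -3 & -2 \end{bmatrix} \qquad B = \begin{bmatrix} 0.5 & 7 & 3 \\ 8 & -6 & 2 \\ 1 & -2 & 3 \end{bmatrix}$$

Perform the products by hand and compare:

$$A \times B \stackrel{?}{=} B \times A$$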


Matrix - Matrix multiplication

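Matrix - Vector multiplication
Consider the following matrix and vector:

$$A = \begin{bmatrix} 1 & 0 & -2 \\ 4 & 5 & -3 \\ 6 & -3 & -2 \end{bmatrix} \qquad x = \begin{bmatrix} 0.5 \\ 8 \\ 1 \end{bmatrix}$$

Perform the products by hand and compare:

$$A \times x \stackrel{?}{=} x^T \times A$$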


Matrix - Matrix multiplication

Vector - Vector multiplication
Consider the following vectors:

$$x = \begin{bmatrix} 1 \\ 4 \\ 6 \end{bmatrix} \qquad y = \begin{bmatrix} -3 \\ 8 \\ 1 \end{bmatrix}$$

Perform the following operations:

x × y and x � y


Check results in python

Check results in Python
Write a program in Python to check the results of the different matrix and vector products
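One possible starting point is sketched below with numpy. It covers only the matrix-matrix and matrix-vector exercises; the variable names `AB`, `BA`, `Ax` and `xTA` are choices made for this illustration.

```python
import numpy as np

A = np.array([[1, 0, -2], [4, 5, -3], [6, -3, -2]])
B = np.array([[0.5, 7, 3], [8, -6, 2], [1, -2, 3]])
x = np.array([0.5, 8, 1])

# Matrix - matrix products: the order matters
AB = A @ B
BA = B @ A
print(AB)
print(BA)
print("A x B == B x A ?", np.allclose(AB, BA))

# Matrix - vector products
Ax = A @ x    # A times x, with x seen as a column vector
xTA = x @ A   # x transposed times A, with x seen as a row vector
print(Ax)
print(xTA)
```

With a one-dimensional numpy array, `A @ x` and `x @ A` pick the column or row interpretation automatically, so no explicit transpose is needed.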


Derivative

Derivative rules
Remember to use the following rules:

$$(f + g)' = f' + g' \quad \text{(dSum)}$$

$$(f \times g)' = f' \times g + f \times g' \quad \text{(dProd)}$$

$$(f(g(x)))' = f'(g(x)) \times g'(x) \quad \text{(dComp)}$$


Derivative

Derivatives
Write a Python program to draw the following functions (use matplotlib) and compute their derivatives by hand:

$$f_1(x) = \frac{1 + \frac{1}{x}}{x - 3}$$

$$f_2(x) = \frac{1}{x^2 + e^x}$$

$$f_3(x) = \frac{x \times e^x}{1 + e^x}$$


Derivative

Derivatives
The results are the following

$$f_1'(x) = -\frac{x^2 + 2x - 3}{(x - 3)^2 x^2}$$

$$f_2'(x) = -\frac{e^x + 2x}{(e^x + x^2)^2}$$

$$f_3'(x) = \frac{e^x (e^x + x + 1)}{(e^x + 1)^2}$$


Derivative of f1(x)

Derivative of f1(x)

$$f_1(x) = \left(1 + \frac{1}{x}\right) \times \frac{1}{x - 3}$$

$$f_1(x) = F(x) \times G(x)$$

So we need to apply the formula (dProd)


Derivative of f1(x)

Derivative of f1(x)

$$F(x) = 1 + \frac{1}{x} \qquad F'(x) = -\frac{1}{x^2}$$

$$G(x) = \frac{1}{x - 3} = H(K(x))$$

with

$$H(z) = \frac{1}{z} \qquad K(z) = z - 3$$


Derivative of f1(x)

Derivative of f1(x)

$$G'(x) = H'(K(x)) \times K'(x)$$

$$G'(x) = -\frac{1}{(x - 3)^2} \times 1$$


Derivative of f1(x)

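Derivative of f1(x)
Finally

$$f_1'(x) = -\frac{1}{x^2} \times \frac{1}{x - 3} + \left(1 + \frac{1}{x}\right) \times \left(-\frac{1}{(x - 3)^2}\right)$$

$$f_1'(x) = -\frac{x - 3}{x^2 (x - 3)^2} - \frac{(x + 1) x}{x^2 (x - 3)^2}$$

$$f_1'(x) = -\frac{(x - 3) + (x + 1) x}{x^2 (x - 3)^2}$$

$$f_1'(x) = -\frac{x^2 + 2x - 3}{x^2 (x - 3)^2}$$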


Derivative of f2(x)

Derivative of f2(x)

$$f_2(x) = \frac{1}{x^2 + e^x}$$

$$f_2(x) = F(G(x))$$

So we need to apply the formula (dComp) with

$$F(z) = \frac{1}{z} \qquad G(z) = z^2 + e^z$$


Derivative of f2(x)

Derivative of f2(x)

$$F'(z) = -\frac{1}{z^2}$$

$$G'(z) = 2z + e^z \quad \text{(dSum)}$$


Derivative of f2(x)

Derivative of f2(x)
Finally

$$f_2'(x) = -\frac{1}{(x^2 + e^x)^2} \times (2x + e^x)$$

$$f_2'(x) = -\frac{2x + e^x}{(x^2 + e^x)^2}$$


Derivative

Derivatives
- check the results with the derivative calculator
- write a program in Python using sympy to compute the derivatives of the functions
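A possible sympy program for this exercise is sketched below. It assumes the three functions are the ones defined in the exercise statement and simply prints a simplified form of each derivative.

```python
import sympy as sp

x = sp.symbols("x")

f1 = (1 + 1/x) / (x - 3)
f2 = 1 / (x**2 + sp.exp(x))
f3 = x * sp.exp(x) / (1 + sp.exp(x))

# Differentiate each function with respect to x and simplify the result
derivatives = {name: sp.simplify(sp.diff(f, x))
               for name, f in [("f1", f1), ("f2", f2), ("f3", f3)]}

for name, d in derivatives.items():
    print(name + "'(x) =", d)
```

The printed expressions may be arranged differently from the hand-computed results; subtracting one from the other and calling `sp.simplify` is a reliable way to compare them.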


Gradient

Gradient
- determine where $f_1(x)$ is minimum
- using the gradient method, try to determine where $f_2(x)$ and $f_3(x)$ are maximum or minimum


Gradient of f1

Gradient/derivative of f1

The derivative of $f_1$ is:

$$f_1'(x) = -\frac{x^2 + 2x - 3}{x^2 (x - 3)^2}$$

There is an extremum (maximum or minimum) where $f_1'(x) = 0$.
The function is not defined where the denominator is equal to 0, that is $x^2 (x - 3)^2 = 0$:

$$\begin{cases} x^2 = 0 \Rightarrow x = 0 \\ (x - 3)^2 = 0 \Rightarrow x = 3 \end{cases}$$


Gradient/derivative of f1

Gradient/derivative of f1

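The derivative is equal to 0 if:

$$x^2 + 2x - 3 = 0$$

$$\Delta = b^2 - 4ac = 2^2 - 4 \times 1 \times (-3) = 16$$

$$x_1 = \frac{-b - \sqrt{\Delta}}{2a} = \frac{-2 - 4}{2 \times 1} = -3$$

$$x_2 = \frac{-b + \sqrt{\Delta}}{2a} = \frac{-2 + 4}{2 \times 1} = +1$$

then

$$x^2 + 2x - 3 = (x - 1)(x + 3)$$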
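To decide which of the two roots is a minimum, one option is to look at the sign of the second derivative. The sketch below does this with sympy; using the second-derivative test is a choice made for this illustration and is not a step taken on the slides.

```python
import sympy as sp

x = sp.symbols("x", real=True)
f1 = (1 + 1/x) / (x - 3)

# Points where the first derivative vanishes
critical_points = sp.solve(sp.diff(f1, x), x)
f1_second = sp.diff(f1, x, 2)

for c in critical_points:
    curvature = f1_second.subs(x, c)
    kind = "local minimum" if curvature > 0 else "local maximum"
    print("x =", c, ":", kind, ", f1 =", f1.subs(x, c))
```

The root $x = -3$ is a local minimum with $f_1(-3) = -1/9$ and the root $x = 1$ is a local maximum with $f_1(1) = -1$. These are only local extrema, since $f_1$ is unbounded near $x = 0$ and $x = 3$, where it is not defined.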


Gradient/derivative of f2

Gradient/derivative of f2

The derivative of $f_2$ is:

$$f_2'(x) = -\frac{2x + e^x}{(x^2 + e^x)^2}$$

The denominator $(x^2 + e^x)^2$ is always positive, so we need to solve:

$$2x + e^x = 0$$

This equation has no solution in terms of elementary functions, so we need to find the root with a numerical approximation method.
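One simple approximation method is bisection. The sketch below is an illustration only: the helper names `g` and `bisect`, the bracketing interval [-1, 0] and the tolerance are assumptions made for this example.

```python
import math

def g(x):
    # Numerator of -f2'(x): its root is the stationary point of f2
    return 2 * x + math.exp(x)

def bisect(func, lo, hi, tol=1e-12):
    # Assumes func(lo) and func(hi) have opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

root = bisect(g, -1.0, 0.0)
print(root, g(root))
```

Since $g(-1) < 0 < g(0)$ and $g$ is strictly increasing, the root is unique. The derivative $f_2'$ is positive before it and negative after it, so $f_2$ reaches a maximum there.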


Gradient/derivative of f2

Gradient/derivative of f2 (1/2)

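import math
import numpy as np

def gradient_ascent(x, df):
    # Climb along the derivative df until the step becomes negligible
    delta = 0.0000001  # stopping threshold on |x - x_n|
    alpha = 0.1        # step size
    while True:
        x_n = x + alpha * df(x)
        if math.fabs(x - x_n) < delta:
            break
        x = x_n
    return x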


Gradient/derivative of f2

Gradient/derivative of f2 (2/2)

def f2(x):
    return 1 / (x * x + np.exp(x))

def df2(x):
    # Derivative of f2: -(2x + e^x) / (x^2 + e^x)^2
    return (-2 * x - np.exp(x)) / (x**2 + np.exp(x))**2

# Start the ascent from x = -0.5
x2_star = gradient_ascent(-0.5, df2)
print("f2: ", x2_star, " => ", f2(x2_star))


Lagrangian

Lagrangian
Consider the following problem

$$\text{Min } x^2 + y^2 + z^2$$

such that

$$\begin{cases} x + 2y + z = 1 \\ 2x - y - 3z = 4 \end{cases}$$

Use the method of Lagrange to determine $x$, $y$ and $z$.


Lagrangian resolution

Lagrangian resolution
We define

$$L(x, y, z, \alpha_1, \alpha_2) = f(x, y, z) + \alpha_1 (x + 2y + z - 1) + \alpha_2 (2x - y - 3z - 4)$$

with $f(x, y, z) = x^2 + y^2 + z^2$


Lagrangian resolution

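Lagrangian resolution
We need to compute the partial derivatives of $L$ for each variable:

$$\frac{\partial L(x, y, z, \alpha_1, \alpha_2)}{\partial x} = 2x + \alpha_1 + 2\alpha_2 = 0 \quad (1)$$

$$\frac{\partial L(x, y, z, \alpha_1, \alpha_2)}{\partial y} = 2y + 2\alpha_1 - \alpha_2 = 0 \quad (2)$$

$$\frac{\partial L(x, y, z, \alpha_1, \alpha_2)}{\partial z} = 2z + \alpha_1 - 3\alpha_2 = 0 \quad (3)$$

$$\frac{\partial L(x, y, z, \alpha_1, \alpha_2)}{\partial \alpha_1} = x + 2y + z - 1 = 0 \quad (4)$$

$$\frac{\partial L(x, y, z, \alpha_1, \alpha_2)}{\partial \alpha_2} = 2x - y - 3z - 4 = 0 \quad (5)$$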


Lagrangian resolution

Lagrangian resolution
Express α1 in terms of x and α2 using (1)

−2x − 2α2 = α1 (1)

2y + 2α1 − α2 = 0 (2)

2z + α1 − 3α2 = 0 (3)

x + 2y + z − 1 = 0 (4)

2x − y − 3z − 4 = 0 (5)


Lagrangian resolution

Lagrangian resolution
Then substitute α1 into (2) and (3)

−2x − 2α2 = α1 (1)

2y + 2(−2x − 2α2)− α2 = 0 ⇒ 2y − 4x − 5α2 = 0 (2)

2z + (−2x − 2α2)− 3α2 = 0 ⇒ 2z − 2x − 5α2 = 0 (3)

x + 2y + z − 1 = 0 (4)

2x − y − 3z − 4 = 0 (5)


Lagrangian resolution

Lagrangian resolution
Subtracting (3) from (2) gives $-2x + 2y - 2z = 0$, that is $x - y + z = 0$. Finally we have a system of 3 equations with 3 variables:

x − y + z = 0 (6) = (2)− (3)

x + 2y + z − 1 = 0 (4)

2x − y − 3z − 4 = 0 (5)


Lagrangian resolution

Lagrangian resolution

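Then we compute (6) − (4) and obtain $y = \frac{1}{3}$.

The rest of the resolution is straightforward and gives

$$x = \frac{16}{15} \qquad y = \frac{1}{3} \qquad z = -\frac{11}{15} \qquad \alpha_1 = -\frac{52}{75} \qquad \alpha_2 = -\frac{54}{75}$$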
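For reference, the remaining steps can be written out as follows. This back-substitution is added for illustration and uses equations (4) and (5), then the substituted form of (2), $2y - 4x - 5\alpha_2 = 0$, and finally (1):

```latex
\begin{aligned}
(4):\quad & x + z = 1 - 2y = \tfrac{1}{3} \\
(5):\quad & 2x - 3z = 4 + y = \tfrac{13}{3} \\
& 2\left(\tfrac{1}{3} - z\right) - 3z = \tfrac{13}{3}
  \;\Rightarrow\; z = -\tfrac{11}{15}, \quad x = \tfrac{1}{3} - z = \tfrac{16}{15} \\
(2):\quad & 5\alpha_2 = 2y - 4x = \tfrac{2}{3} - \tfrac{64}{15} = -\tfrac{54}{15}
  \;\Rightarrow\; \alpha_2 = -\tfrac{54}{75} \\
(1):\quad & \alpha_1 = -2x - 2\alpha_2 = -\tfrac{32}{15} + \tfrac{108}{75} = -\tfrac{52}{75}
\end{aligned}
```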


Lagrangian resolution

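Lagrangian resolution
In Python:

import numpy as np

# Unknowns are ordered (x, y, z, alpha1, alpha2); rows are equations (1) to (5)
A = np.asarray([[2, 0, 0, 1, 2], [0, 2, 0, 2, -1],
                [0, 0, 2, 1, -3], [1, 2, 1, 0, 0], [2, -1, -3, 0, 0]])
# Right-hand side: constants moved to the other side of each equation
b = np.asarray([0, 0, 0, 1, 4])
x = np.linalg.solve(A, b)
print(x)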


6. End


University of Angers - Faculty of Sciences

UA - Angers
2 Boulevard Lavoisier
49045 Angers Cedex 01
Tel: (+33) (0)2-41-73-50-72
