
System of Linear Equations

A system of linear equations is a set of linear equations involving the same set of variables.

For example,

\[
\begin{aligned}
3x + 2y - z &= 1,\\
x - y + 2z &= -1,\\
-2x + y - 2z &= 0.
\end{aligned}
\]

Solving the system gives x = 1, y = −2, and z = −2.

Zheng-Liang Lu 339 / 539

A general system of m linear equations with n unknowns can be written as

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

where x_1, ..., x_n are the unknowns, a_11, ..., a_mn are the coefficients of the system, and b_1, ..., b_m are the constant terms.

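So, we can rewrite the system of linear equations as a matrix equation, given by

Ax = b,

where

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix},
\quad
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix},
\quad
b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}.
\]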

General System of Linear Equations

Let x be the column vector of n unknowns, subject to m constraints.[1]

If m = n,[2] then there exists a unique solution x′.

If m > n, then there is in general no exact solution, but there exists a least-squares solution x′ such that ‖Ax′ − b‖² is minimal.

- Let x′ be the least-squares solution.
- Then the error is ε = Ax′ − b, usually not zero.
- So, ‖Ax′ − b‖² = (Ax′ − b)ᵀ(Ax′ − b) = εᵀε is minimal.

If m < n, then there are infinitely many solutions.

[1] Assume that these equations are linearly independent.
[2] Equivalently, rank(A) = rank([A b]) = n. Or, see Cramer's rule.

Quick Glance at Method of Least Squares

The method of least squares is a standard approach to the approximate solution of overdetermined systems.

Matrix Division in MATLAB

Consider Ax = b for a system of linear equations.

A\b is the matrix division of A into b, which is roughly the same as inv(A)*b, except that it is computed in a different way (no explicit inverse is formed).

Left matrix divide (\), or mldivide, computes

\[
x = A^{-1} b = \texttt{inv(A)*b}.
\]
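As a quick illustration of the statement above, the following sketch solves the 3-by-3 example both ways and compares the results. The variable names x_div, x_inv and gap are chosen here for illustration only.

```matlab
% Compare left division with the explicit inverse on the 3-by-3 example.
A = [3 2 -1; 1 -1 2; -2 1 -2];
b = [1; -1; 0];
x_div = A \ b;         % solves the system without forming inv(A)
x_inv = inv(A) * b;    % forms the inverse explicitly, then multiplies
gap = norm(x_div - x_inv);
fprintf('difference between the two results: %g\n', gap);
```

For a small, well-conditioned matrix the two answers agree to rounding error; the backslash form is generally preferred because it avoids the extra work of computing the inverse.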

Unique Solution (m = n)

For example,

\[
\begin{aligned}
3x + 2y - z &= 1,\\
x - y + 2z &= -1,\\
-2x + y - 2z &= 0.
\end{aligned}
\]

    >> A = [3 2 -1; 1 -1 2; -2 1 -2];
    >> b = [1; -1; 0];
    >> x = A \ b

         1
        -2
        -2

Overdetermined System (m > n)

For example,

\[
\begin{aligned}
2x - y &= 2,\\
x - 2y &= -2,\\
x + y &= 1.
\end{aligned}
\]

    >> A = [2 -1; 1 -2; 1 1];
    >> b = [2; -2; 1];
    >> x = A \ b

         1
         1

Underdetermined System (m < n)

For example,

\[
\begin{aligned}
x + 2y + 3z &= 7,\\
4x + 5y + 6z &= 8.
\end{aligned}
\]

    >> A = [1 2 3; 4 5 6];
    >> b = [7; 8];
    >> x = A \ b

        -3
         0
         3.333

    % (Why?)

Note that this is only one of infinitely many solutions.

How do we find the direction vector of the solution set?
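One possible answer to the question above, sketched under the assumption that the built-in null function is acceptable here: every solution has the form x0 + t*d, where x0 is any particular solution and d spans the null space of A. The parameter value t = 2.5 is arbitrary and used only for illustration.

```matlab
A = [1 2 3; 4 5 6];
b = [7; 8];
x0 = A \ b;            % one particular solution
d  = null(A);          % orthonormal basis of the null space (one column here)
t  = 2.5;              % arbitrary parameter, for illustration only
x1 = x0 + t * d;       % another point on the solution line
res0 = norm(A * x0 - b);
res1 = norm(A * x1 - b);
fprintf('residuals: %g and %g\n', res0, res1);
```

Both residuals vanish up to rounding, and d is proportional to [1; -2; 1].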

Gauss Elimination

Recall Gaussian elimination from high school.

\[
\begin{aligned}
3x + 2y - z &= 1,\\
x - y + 2z &= -1,\\
-2x + y - 2z &= 0.
\end{aligned}
\]

It is known that x = 1, y = −2, and z = −2.

If det(A) = 0, the program terminates; otherwise, the program continues.

Loop to form the upper triangular matrix, which looks like

\[
\begin{bmatrix}
1 & a_{12} & \cdots & a_{1n}\\
0 & 1 & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix},
\]

where the a_ij and b_i are the values resulting from the elimination.

Use backward substitution to determine the solution vector x by

\[
x_n = b_n, \qquad x_i = b_i - \sum_{j=i+1}^{n} a_{ij} x_j,
\]

where i ∈ {1, ..., n − 1}.

Solution

    clear; clc;
    % main
    A = [3 2 -1; 1 -1 2; -2 1 -2];
    b = [1; -1; 0];
    x = zeros(3, 1);
    if det(A) ~= 0
        % forward elimination (assumes every A(j, i) used as a divisor is nonzero)
        for i = 1 : 3
            for j = i : 3
                % cannot be interchanged: b(j) must be scaled
                % before A(j, i) is overwritten
                b(j) = b(j) / A(j, i);
                A(j, :) = A(j, :) / A(j, i);
            end
            for j = i + 1 : 3
                A(j, :) = A(j, :) - A(i, :);
                b(j) = b(j) - b(i);
            end
        end
        % backward substitution
        for i = 3 : -1 : 1
            x(i) = b(i);
            for j = i + 1 : 1 : 3
                x(i) = x(i) - A(i, j) * x(j);
            end
        end
    else
        disp('No unique solution.');
    end

Extension

How can this algorithm be extended to any system of simultaneous equations in n variables?

    clear; clc

    n = randi(10, 1);
    A = randi(100, n, n);
    b = randi(100, n, 1);
    x = zeros(n, 1);   % preallocate the solution vector
    if det(A) ~= 0
        % forward elimination (assumes every A(j, i) used as a divisor is nonzero)
        for i = 1 : n
            for j = i : n
                b(j) = b(j) / A(j, i);
                A(j, :) = A(j, :) / A(j, i);
            end
            for j = i + 1 : n
                A(j, :) = A(j, :) - A(i, :);
                b(j) = b(j) - b(i);
            end
        end
        % backward substitution
        for i = n : -1 : 1
            x(i) = b(i);
            for j = i + 1 : 1 : n
                x(i) = x(i) - A(i, j) * x(j);
            end
        end
    else
        disp('No unique solution.');
    end
    x'
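The elimination above divides by A(j, i) and therefore fails when that entry is zero. A common remedy, which is not part of the original slides, is partial pivoting. The following is a minimal sketch on a made-up matrix whose first entry is zero; the augmented matrix M and the index p are names introduced here.

```matlab
% Gaussian elimination with partial pivoting (sketch), checked against A \ b.
A = [0 2 1; 1 -1 2; -2 1 -2];    % A(1,1) = 0, so a row swap is needed
b = [1; -1; 0];
n = size(A, 1);
M = [A b];                        % augmented matrix
for i = 1 : n
    [~, p] = max(abs(M(i:n, i))); % largest pivot candidate in column i
    p = p + i - 1;
    M([i p], :) = M([p i], :);    % swap rows i and p
    M(i, :) = M(i, :) / M(i, i);  % scale so that M(i, i) = 1
    for j = i + 1 : n
        M(j, :) = M(j, :) - M(j, i) * M(i, :);  % eliminate below the pivot
    end
end
x = zeros(n, 1);
for i = n : -1 : 1                % backward substitution
    x(i) = M(i, n + 1) - M(i, i + 1 : n) * x(i + 1 : n);
end
err = norm(x - A \ b);
fprintf('difference from A \\ b: %g\n', err);
```

Unlike the version above, only the pivot row is scaled, and each lower row has a multiple of it subtracted.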

Exercise

rank is used to check whether rank(A) = rank([A, b]).

rref is used to reduce the augmented matrix [A, b].

Use built-in functions to implement a program that solves a general system of linear equations.

Solution

    function x = linearSolver(A, b)

    x = [];   % stays empty unless there is exactly one solution
    if rank(A) == rank([A b])   % augmented matrix
        if rank(A) == size(A, 2)
            disp('Exactly one solution.')
            x = A \ b;
        else
            disp('Infinitely many solutions.')
            disp(rref([A b]))
        end
    else
        disp('There is no solution. (Only least-squares solutions.)')
    end

Can you replace rref with your Gaussian elimination algorithm?

Method of Least Squares

The first clear and concise exposition of the method of least squares was published by Legendre in 1805.

In 1809, Gauss published his method of calculating the orbits of celestial bodies.

The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns.

To obtain the coefficient estimates, the least-squares method minimizes the summed square of residuals.

More specifically...

Let {y_i}, i = 1, ..., n, be the observed response values and {ŷ_i}, i = 1, ..., n, be the fitted response values.

Define the error, or residual, ε_i = y_i − ŷ_i for i = 1, ..., n.

Then the sum of squared errors associated with the data is given by

\[
S = \sum_{i=1}^{n} \varepsilon_i^2. \tag{1}
\]

Linear Least Squares

A linear model is defined as an equation that is linear in the coefficients.

Suppose that you have n data points that can be modeled by a 1st-order polynomial, given by

y = ax + b.

By (1), ε_i = y_i − (a x_i + b).

Now

\[
S = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2.
\]

The least-squares fitting process minimizes the summed square of the residuals.

The coefficients a and b are determined by differentiating S with respect to each parameter and setting the result equal to zero. (Why?)

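Hence,

\[
\begin{aligned}
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - (a x_i + b)\bigr) = 0,\\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr) = 0.
\end{aligned}
\]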

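The normal equations are defined as

\[
\begin{aligned}
a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i,\\
a \sum_{i=1}^{n} x_i + n b &= \sum_{i=1}^{n} y_i.
\end{aligned}
\]

In fact,

\[
\begin{bmatrix}
\sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i\\
\sum_{i=1}^{n} x_i & n
\end{bmatrix}
\begin{bmatrix} a\\ b \end{bmatrix}
=
\begin{bmatrix}
\sum_{i=1}^{n} x_i y_i\\
\sum_{i=1}^{n} y_i
\end{bmatrix}.
\]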

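Solving for a, b, we have

\[
a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \bigl(\sum_{i=1}^{n} x_i\bigr)^2} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)},
\]

where Cov(X, Y) refers to the covariance between X and Y.

Also, we have

\[
b = \frac{1}{n} \Bigl( \sum_{i=1}^{n} y_i - a \sum_{i=1}^{n} x_i \Bigr).
\]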
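The closed-form expressions for a and b can be checked numerically. The sketch below uses made-up data scattered around y = 2x + 1 (an assumption for illustration) and compares the formulas with polyfit and with the backslash operator.

```matlab
x = (0 : 9)';
y = 2 * x + 1 + 0.1 * (-1) .^ x;   % made-up data near y = 2x + 1
n = numel(x);
% slope and intercept from the normal equations
a = (n * sum(x .* y) - sum(x) * sum(y)) / (n * sum(x .^ 2) - sum(x) ^ 2);
b = (sum(y) - a * sum(x)) / n;
p  = polyfit(x, y, 1);             % p(1) is the slope, p(2) the intercept
ab = [x ones(n, 1)] \ y;           % same fit through the backslash operator
fprintf('a = %.4f, b = %.4f\n', a, b);
```

All three routes solve the same least-squares problem, so they agree up to rounding.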

Example: Drag Coefficients

Let v be the velocity of a moving object and k be a positive constant.

The drag force due to air resistance is proportional to the square of the velocity, that is, d = kv².

In a wind tunnel experiment, the velocity v can be varied by setting the speed of the fan, and the drag can be measured directly.

The following sequence of commands replicates the data one might receive from a wind tunnel:

    clear; clc;
    % main
    v = [0 : 1 : 60]';
    d = 0.1234 * v .^ 2;                   % column vector, like v
    dn = d + 0.4 * v .* randn(size(v));    % noisy measurements
    figure(1), plot(v, dn, '*', v, d, 'r-'); grid on;
    legend('Data', 'Analytic');

The unknown coefficient k is to be determined by the method of least squares.

The formulation could be

\[
\begin{aligned}
v_1^2\, k &= dn_1\\
v_2^2\, k &= dn_2\\
&\;\;\vdots\\
v_{61}^2\, k &= dn_{61}
\end{aligned}
\]

Recall that for any matrix A and vector b with Ax = b, x = A\b returns the least-squares best fit when the system is overdetermined.

    >> k = v .^ 2 \ dn   % note that v and dn must be column vectors

    k =

        0.1239
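With a single unknown, the normal equation collapses to one scalar formula, k = Σ v_i² dn_i / Σ v_i⁴. The sketch below checks this against the backslash result; the fixed seed via rng(0) is an addition here so that the run is repeatable, so the printed value need not match 0.1239.

```matlab
rng(0);                                  % fix the seed so the run is repeatable
v  = (0 : 60)';
dn = 0.1234 * v .^ 2 + 0.4 * v .* randn(size(v));
k_backslash = (v .^ 2) \ dn;             % least squares via mldivide
k_formula   = sum(v .^ 2 .* dn) / sum(v .^ 4);   % scalar normal equation
fprintf('k = %.4f (backslash), %.4f (formula)\n', k_backslash, k_formula);
```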

Polynomials in MATLAB

In fact, all polynomials of degree at most n, with addition and scalar multiplication, form a vector space, denoted by P_n.

In general, f(x) is said to be a polynomial of order n if f(x) is given by

\[
f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0,
\]

where a_n ≠ 0.

It is convenient to express a polynomial by a coefficient vector (a_n, a_{n−1}, ..., a_0).

- Note that the elements are the coefficients of the polynomial in descending powers.

Arithmetic Operations

P1 + P2 returns the sum of two polynomials (the coefficient vectors must have the same length).

P1 − P2 returns the difference of two polynomials.

conv(P1, P2) returns the resulting coefficient vector for the multiplication of the two polynomials P1 and P2.[3]

[Q, R] = deconv(B, A) deconvolves vector A out of vector B.

- The result is returned in vector Q and the remainder in vector R such that B = conv(A, Q) + R.
- This is the so-called "Euclidean division algorithm."

polyval(P, X) returns the values of a polynomial P evaluated at each x ∈ X.

[3] See Convolution.

    clear; clc;
    % main
    p1 = [1 -2 -7 4];
    p2 = [2 -1 0 6];
    x = -1 : 0.1 : 1;
    addition = p1 + p2          % addition
    sub = p1 - p2               % subtraction
    mul = conv(p1, p2)          % multiplication
    [q, r] = deconv(p1, p2)     % division: q is quotient and r is remainder.
    plot(x, polyval(p1, x), 'o', x, polyval(p2, x), '*', ...
         x, polyval(mul, x), 'd');
    grid on; axis tight;
    legend('p1', 'p2', 'conv(p1, p2)');

[Figure: p1, p2, and conv(p1, p2) evaluated on −1 ≤ x ≤ 1.]
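The identity B = conv(A, Q) + R can be confirmed directly. The sketch below also shows one way to add polynomials of different degree by left-padding the shorter coefficient vector with zeros; p3 is a made-up polynomial used only for this purpose.

```matlab
p1 = [1 -2 -7 4];
p2 = [2 -1 0 6];
[q, r] = deconv(p1, p2);         % p1 = conv(p2, q) + r
rebuilt = conv(p2, q) + r;

p3 = [1 2];                      % x + 2, a shorter coefficient vector
pad = zeros(1, numel(p1) - numel(p3));
total = p1 + [pad p3];           % align the powers before adding
disp(rebuilt); disp(total);
```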

Roots Finding

roots(P) returns a column vector whose elements are the roots of the polynomial P.

For example,

    clear; clc;
    % main
    p = [1, 3, 1, 5, -1];
    r = roots(p)   % To find all roots
    x = -4 : 0.1 : 1;
    plot(x, polyval(p, x), '--'); hold on; grid on;
    for i = 1 : length(r)
        if isreal(r(i)) == 1
            plot(r(i), polyval(p, r(i)), 'ro');   % mark each real root
        end
    end
    polyval(p, r)   % To verify the roots

    r =

       -3.2051
        0.0082 + 1.2862i
        0.0082 - 1.2862i
        0.1886

    ans =

       1.0e-013 *

        0.4041
       -0.0133 + 0.0529i
       -0.0133 - 0.0529i
        0

Why not exactly zero?

[Figure: the polynomial plotted on −4 ≤ x ≤ 1 with its real roots marked.]
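A hint for the question above: the roots are computed in floating-point arithmetic, so evaluating the polynomial at them leaves a residual of the order of rounding error. The sketch below compares each residual with the size of the terms being summed; the names scale and relative are introduced here.

```matlab
p = [1 3 1 5 -1];
r = roots(p);
residual = abs(polyval(p, r));          % how far p(r) is from zero
scale    = polyval(abs(p), abs(r));     % magnitude of the summed terms
relative = residual ./ scale;           % compare with eps, about 2.2e-16
disp([residual relative]);
```

Relative to the size of the terms, the residuals are comparable to machine precision, which is as close to zero as double precision allows.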

Integral and Derivative of Polynomials

polyder(P) returns the derivative of the polynomial whose coefficients are the elements of vector P in descending powers.

polyint(P, K) returns a polynomial representing the integral of polynomial P, using a scalar constant of integration K.

    clear; clc
    % main
    p = [4 3 2 1];
    p_der = polyder(p)
    p_int = polyint(p, 0)   % assume K = 0

Exercise

Let f(x) = 4x³ + 3x² + 2x + 1 for x ∈ R.

Then determine the coefficients of its derivative f′ and of its integral

\[
F(x) = \int_0^x f(t)\, dt.
\]

Do not use the built-in functions.

- Try to manipulate array indexing using a for loop.

    clear; clc
    p = [4 3 2 1]
    K = 0;   % constant of integration
    q1 = zeros(1, length(p));   % derivative, with a leading zero
    for i = 2 : length(p)
        q1(i) = p(i - 1) * (length(p) - (i - 1));
    end
    q2 = zeros(1, length(p) + 1);   % integral
    q2(length(q2)) = K;
    for i = 1 : length(p)
        q2(i) = 1 / (length(p) - i + 1) * p(i);
    end
    q1
    q2

Compare your result to the answer provided by polyder and polyint.
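For the comparison, a vectorized variant of the same index arithmetic can be placed next to the built-in answers. This is only a sketch for checking; the exercise itself asks for a loop.

```matlab
p = [4 3 2 1];
n = length(p);
q1 = p(1 : n - 1) .* (n - 1 : -1 : 1);   % derivative: multiply by the powers
q2 = [p ./ (n : -1 : 1), 0];             % integral with K = 0
same_der = isequal(q1, polyder(p));
same_int = norm(q2 - polyint(p, 0)) < 1e-12;
disp([same_der same_int]);
```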

Lecture 6 -- User-Controlled Input and Output

Contents

High-Level File I/O

Low-Level File I/O

Access to Internet

Everything is encoded in binary codes.

American Standard Code for Information Interchange,[4] aka ASCII, is a character-encoding scheme originally based on the English alphabet that encodes 128 specified characters into 7-bit binary integers:

- the numbers 0, 1, ..., 9,
- the letters a-z and A-Z,
- some basic punctuation symbols,
- some control codes that originated with teletype machines,
- and a blank space.

[4] See ASCII.

ASCII was first published in 1963 and substantially revised in 1967; the current version was issued in 1986.

Unicode[5] became a standard for modern systems from 2007.

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets.

[5] See here.
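In MATLAB the numeric codes are easy to inspect: double converts characters to their codes and char converts codes back. A small sketch with arbitrarily chosen characters:

```matlab
codes = double('Az09 ');       % codes of 'A', 'z', '0', '9' and a blank
back  = char([72 105 33]);     % codes 72, 105, 33 as characters
disp(codes);
disp(back);
```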

importdata

importdata() can recognize the common file extensions.

One can load data from the specified file into array A by A = importdata(filename).

- Note that the file name must be single-quoted.
- If the extension is not recognized, importdata interprets the file as a delimited ASCII file by default.
- importdata('-pastespecial') loads data from the system clipboard rather than from a file.

The following file types are supported:

- MAT-files
- ASCII files
- Spreadsheets
- Images[6]
- Audio files[7]

[6] Check imread.
[7] Check audioread.

Example

    >> A = importdata('ngc6543a.jpg');
    >> image(A);

[Figure: the image ngc6543a.jpg displayed by image(A).]

Example

Using a text editor, create a space-delimited ASCII file with column headers called myfile01.txt.

    Day1 Day2 Day3 Day4 Day5 Day6 Day7
    35.627 48.483 35.94 41.978 42.941 48.429 37.958
    37.976 45.544 54.247 53.332 54.411 45.959 53.038
    45.23 47.361 54.34 51.759 44.33 40.981 51.937
    46.924 36.816 42.832 41.372 38.775 45.613 40.885
    45.632 36.362 40.214 51.419 49.265 44.252 44.048

    clear; clc;
    % main
    A = importdata('myfile01.txt', ' ', 1);
    for k = [3, 5]
        disp(A.colheaders{1, k})   % headers of columns
        disp(A.data(:, k))         % numeric data
    end

Note that A is a structure array. (Why?)

    Day3
       35.94
       54.247
       54.34
       42.832
       40.214

    Day5
       42.941
       54.411
       44.33
       38.775
       49.265

Access to Delimited Text Files

dlmread(filename, delimiter) reads an ASCII-delimited file of numeric data.

dlmwrite(filename, M, delimiter) writes the array M to the file using the specified delimiter to separate array elements.

- The default delimiter is the comma (,).
- dlmwrite(filename, M, '-append') appends the data to the end of the existing file.

dlmread(filename, delimiter, R, C) reads data whose upper left corner is at row R and column C in the file.

- R and C start from 0. (R, C) = (0, 0) specifies the first value in the file.

Example

    >> M = gallery('integerdata', 100, [5 8], 0);
    >> dlmwrite('myfile.txt', M, 'delimiter', '\t');

    >> dlmread('myfile.txt', '\t')
    >> dlmread('myfile.txt', '\t', 2, 3)
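The zero-based offsets R and C are easy to get wrong, so here is a small round trip on a temporary file. The file name comes from tempname and the matrix magic(4) is chosen only for illustration.

```matlab
M = magic(4);
fname = [tempname '.txt'];               % temporary file, removed below
dlmwrite(fname, M, 'delimiter', '\t');
sub = dlmread(fname, '\t', 2, 1);        % start at 3rd row, 2nd column
delete(fname);
disp(sub);
```

With R = 2 and C = 1 the result is M(3:end, 2:end), since both offsets count from zero.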

textread

textread is useful for reading text files with a known format.

[A, B, C, ...] = textread(filename, format, N) reads data from the file filename into the variables A, B, C, and so on, using the specified format, for N lines.

- format determines the number and types of return arguments. (See next page.)
- If you drop N, then textread reads until the end of file.

The common conversions are as follows:

- %d: as signed integer values.
- %f: as floating-point values.
- %s: as a white-space or delimiter-separated string.

Example

Create a mydata.dat with the following content:

    Sally Level1 12.34 45 Yes
    Arthur Level2 19.85 29 No

    >> [names, types, x, y, answer] = textread('mydata.dat', ...
           '%s %s %f %d %s')   % normal usage

    >> [names, types, x, answer] = textread('mydata.dat', ...
           '%s Level%d %f %*d %s', 1)   % check the difference!

In %*d, the * tells textread to skip the matching field instead of returning it.

Access to Excel Files

xlsread(filename, sheet, xlRange) reads from the specified sheet and range.

- sheet can be the sheet name[8] or a sheet number in the Excel file.
- xlRange is optional and specifies the rectangular portion of the worksheet to read.
- For example, xlRange = 'B:B' is used to import column B.
- To read a single value, use xlRange = 'B1:B1'.[9]

xlswrite(filename, A, sheet, xlRange) writes the array A to the specified range of the sheet.

[8] The default sheet name is "工作表1" (the Traditional Chinese name for "Sheet1").
[9] Contribution by Mr. Tsung-Yu Hsieh (MAT24409) on August 27, 2014.

Example

    >> values = {1, 2, 3; 4, 5, 'x'; 7, 8, 9};
    >> headers = {'First', 'Second', 'Third'};
    >> xlswrite('myExample.xlsx', [headers; values]);   % write

    >> subsetA = xlsread('myExample.xlsx', 1, 'B2:C3')   % read

    subsetA =

         2     3
         5   NaN

Low-Level File I/O

Low-level file I/O functions allow the most control over reading or writing data to a file.

However, these functions require that you specify more detailed information about your file than the easier-to-use high-level functions, such as importdata.

If the high-level functions cannot import your data, you may consider using low-level file I/O.

The normal procedure looks like:

1. Open a file.
2. Read or write data into the file.
3. Close the file.

Open Files

fid = fopen(filename, permission) opens the file filename with the access type given by permission (the default is binary read access) and returns an integer fid equal to or greater than 3.

- fid refers to the file identifier.
- permission: file access type, specified as a string.
- MATLAB reserves fids 1 and 2 for standard output on the screen and standard error, respectively. (You will see later.)
- If fopen fails to open the file, then fid is −1.

You can fopen a new file with permission 'w'.

- Be aware that you can repeatedly open a new file with the same file name.

fopen – Permission Codes

'r' opens a file for reading.

- 'r+' additionally for writing.

'w' opens or creates a new file for writing.[10]

- 'w+' additionally for reading.

'a' opens or creates a new file for writing, and appends data to the end of the file.

- 'a+' additionally for reading.

[10] Be aware that it will discard existing contents, if any.

Common Commands in Low-Level File I/O

feof(fid), which refers to "end-of-file", returns 1 if a previous operation set the end-of-file indicator for the specified file.

fgetl(fid) returns the next line of the specified file, removing the newline characters.

- If the line contains only the end-of-file marker, then the return value is −1.

Example

Write a script to show the content of a text file, say, "fgetl.m".

    clear; clc

    f = fopen('fgetl.m', 'r');
    while ~feof(f)
        disp(fgetl(f));
    end
    fclose(f);
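A slightly more defensive variant of the loop above, sketched under the assumption that the lines should be collected rather than displayed: it checks the identifier returned by fopen and stops when fgetl returns −1 instead of a character array. The temporary file and its two lines are made up for the example.

```matlab
fname = [tempname '.txt'];                 % temporary file for the demo
fid = fopen(fname, 'w');
fprintf(fid, 'first\nsecond\n');
fclose(fid);

fid = fopen(fname, 'r');
if fid == -1
    error('Cannot open %s.', fname);       % fopen signals failure with -1
end
lines = {};
while true
    t = fgetl(fid);
    if ~ischar(t), break; end              % fgetl returns -1 at end of file
    lines{end + 1} = t;                    %#ok<AGROW>
end
fclose(fid);
delete(fname);
disp(lines);
```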

Close Files

fclose(fid) closes an opened file.

fclose('all') closes all opened files.

fclose returns a status of 0 when the close operation is successful.

- Otherwise, it returns −1.

fprintf

Recall that fprintf(format, A1, ..., An) formats data and displays the results on the screen.[11]

fprintf(fid, format, A1, ..., An) applies the format to all elements of arrays A1, ..., An in column order, and writes the data to a text file.

- format: to specify a format for the output fields; it is a string.
- A1, ..., An: arrays for the output fields.

MATLAB reserves the file identifier numbers 1 and 2 for standard output on the screen and standard error, respectively.

- fprintf(1, 'This is standard output!\n');
- fprintf(2, 'This is standard error!\n');

sprintf(format, A1, ..., An) is similar to fprintf but returns the results as a string.

[11] Recall disp.

Example

fprintf can print multiple numeric values and literal text to the screen.

    >> A = [9.9 8.8 7.7; 9900 8800 7700];
    >> format = 'X is %4.2f meters or %8.3f mm.\n';
    >> fprintf(format, A)   % print on the screen

    X is 9.90 meters or 9900.000 mm.
    X is 8.80 meters or 8800.000 mm.
    X is 7.70 meters or 7700.000 mm.

%4.2f specifies that the first value in each line of output is a floating-point number with a minimum field width of four characters (including the decimal point) and two digits after the decimal point.

- Can you explain %8.3f?
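The effect of width and precision is easiest to see with sprintf, which returns the formatted text instead of printing it. The values below are arbitrary and chosen to show padding, an exact fit, and a value wider than the field.

```matlab
s1 = sprintf('%4.2f', 9.9);         % exactly 4 characters
s2 = sprintf('%8.3f', pi);          % padded on the left to 8 characters
s3 = sprintf('%8.3f', 9900);        % fills the field of 8 exactly
s4 = sprintf('%4.2f', 12345.678);   % wider than 4: the field grows
fprintf('[%s] [%s] [%s] [%s]\n', s1, s2, s3, s4);
```

The width is a minimum: a value that needs more characters is printed in full rather than truncated.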

Escape Characters

%%: Percent character

\\: Backslash

\b: Backspace

\n: New line

\t: Horizontal tab

format

Integer, signed: %d or %i

Integer, unsigned

- %u: Base 10
- %o: Base 8
- %x: Base 16

Floating-point number[12]

- %f: Fixed-point notation
- %e: Exponential notation, such as 3.141593e+00

Characters

- %c: Single character
- %s: String

Note that the % can be followed by an optional field width to handle fixed-width fields.

[12] See IEEE 754.

Example: grep in UNIX/Linux-like Systems

findstr(S1, S2) returns the starting indices of any occurrences of the shorter of the two strings in the longer.

    function grep(f, pattern)
    f = fopen(f, 'r');
    cnt = 0;   % line counter
    while ~feof(f)
        t = fgetl(f);
        cnt = cnt + 1;
        w = findstr(t, pattern);
        if ~isempty(w)
            fprintf('%d: %s\n', cnt, t);   % print matching lines with their numbers
        end
    end
    fclose(f);
    end
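The MATLAB documentation marks findstr as not recommended and points to strfind instead. A sketch of the matching step with strfind follows; note that the order of arguments matters there (text first, pattern second). The sample line is made up.

```matlab
t = 'fid = fopen(name); fclose(fid);';    % a made-up line of text
idx = strfind(t, 'fid');                  % starting indices of the pattern
none = strfind(t, 'fread');               % empty when there is no match
fprintf('found at: %s\n', mat2str(idx));
```

Inside grep, the test ~isempty(strfind(t, pattern)) would play the same role as the findstr call.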

Exercise

Write a program which writes a multiplication table to a text file.

    1  2  3  4  5  6  7  8  9
    2  4  6  8 10 12 14 16 18
    3  6  9 12 15 18 21 24 27
    4  8 12 16 20 24 28 32 36
    5 10 15 20 25 30 35 40 45
    6 12 18 24 30 36 42 48 54
    7 14 21 28 35 42 49 56 63
    8 16 24 32 40 48 56 64 72
    9 18 27 36 45 54 63 72 81

    clear; clc;

    f = fopen('multiplicationTable.txt', 'w');
    for i = 1 : 9
        for j = 1 : 9
            fprintf(f, '%3d', i * j);
        end
        fprintf(f, '\n');
    end
    fclose(f);

fscanf

A = fscanf(fid, format, size) reads data from the file specified by file identifier fid, converts it according to the specified format string, and returns it in matrix A.

- fscanf populates A in column order.

fscanf can be used to skip specific characters in a sample file and return only numeric data.

Example

    clear all;
    clc
    str = {
        ['78' char(176) 'C'];
        ['72' char(176) 'C'];
        ['64' char(176) 'C'];
        ['66' char(176) 'C'];
        ['49' char(176) 'C']};
    % char(176) is the degree symbol
    fid = fopen('temperature.txt', 'w');
    for i = 1 : length(str)
        fprintf(fid, '%s\n', str{i});
    end
    fclose(fid);
    % main
    fid = fopen('temperature.txt', 'r');
    [A, count] = fscanf(fid, ['%d' char(176) 'C'])
    fclose(fid);

    A =

        78
        72
        64
        66
        49

    count =

         5

Binary Files

fread(fid, size, precision) interprets values in the file according to the form and size described by precision.

fwrite(fid, A, precision) translates the values of A according to the form and size described by precision.

Valid entries for size are:

- N: read N elements into a column vector.
- inf: read to the end of the file.
- [M, N]: read elements to fill an M-by-N matrix, in column order.

Valid entries for precision are:

- 'uchar': unsigned integer, 8 bits.
- 'int64': integer, 64 bits.
- 'uint64': unsigned integer, 64 bits.
- 'float64': floating point, 64 bits.

Note that "64" can be replaced by 8, 16, and 32 for the integer types.

Example

Create a binary file containing a 3-by-3 magic square, whose elements are stored as 4-byte integers.

    clear all;
    clc
    % main
    A = magic(3)
    fid = fopen('magic3.txt', 'w');
    fwrite(fid, A, 'int32');
    fclose(fid);
    fid = fopen('magic3.txt', 'r');
    B = fread(fid, [3 3], 'int32')   % try [3 1]?
    fclose(fid);
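To see what the size argument does, the sketch below writes the same magic square to a temporary file, checks the file size (9 values of 4 bytes each), and then reads the first column followed by the remainder. The file name is generated by tempname and is an assumption of this example.

```matlab
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, magic(3), 'int32');           % stored column by column
fclose(fid);
info = dir(fname);                        % info.bytes is the file size

fid = fopen(fname, 'r');
col  = fread(fid, [3 1], 'int32');        % first three values only
rest = fread(fid, inf, 'int32');          % everything that is left
fclose(fid);
delete(fname);
fprintf('%d bytes, first column: %s\n', info.bytes, mat2str(col'));
```

A second fread continues where the first one stopped, because the file position advances with every read.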

Access to Internet

urlread(URL, Name, Value) returns the contents of a URL as a string.

    contents = urlread('http://www.csie.ntu.edu.tw/~d00922011/matlab.html');
    f = fopen('matlab.html', 'w');
    fprintf(f, '%s', contents);
    fclose all;
    dos('start matlab.html');   % opens the file on Windows

Try sendmail, ftp.

Yahoo Finance API

Current market and historical data from the Yahoo! data server

Blog: "研究雅虎股票API" (A study of the Yahoo Finance stock API)

Google: yahoo-finance-managed

Historical Stock Data downloader by Josiah Renfree (2008)

Lecture 7 -- Optimization

"In my opinion, no single design is apt to be optimal for everyone."

– Donald Norman (1935–)

Contents

Introduction

Optimization Problem in Standard Form

Linear Programming Problems

Quadratic Programming Problems

Unconstrained Nonlinear Programming

Introduction

Mathematical optimization is the selection of the best element, with regard to some criterion, from a set of feasible solutions.

In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function.

The generalization of optimization theory and techniques to other formulations comprises a large area of applications.

- EE: circuit layout, fabrication parameters of transistors...
- CS: model parameters in machine learning...
- Fin: optimal portfolio...
- Economics: tax, wage rate...
- ...

Optimization Problem in Standard Form (1/3)

An optimization problem can be represented in the following way:

- Given a function f : M → R.
- (Minimization) Find x0 ∈ M such that f(x0) ≤ f(x) for all x ∈ M.
- (Maximization) Find x0 ∈ M such that f(x0) ≥ f(x) for all x ∈ M.

Many real-world and theoretical problems may be modeled in this general framework.

Typically, M is some subset of the Euclidean space R^n, often specified by a set of constraints, equalities or inequalities that the members of M have to satisfy.

The domain M of f is called the search space or the choice set, while the elements of M are called feasible solutions.

The function f is called an objective function.

- Aka loss function, cost function, utility function, and fitness function.

A feasible solution that minimizes (or maximizes, if that is the goal) the objective function is called an optimal solution.

By convention, the standard form of an optimization problem is stated in terms of minimization.

Convex optimization, a subfield of optimization, studies the problem of minimizing convex functions over convex sets. (You will see later.)

With recent improvements in computing and in optimization theory, convex minimization is nearly as straightforward as linear programming.[13]

Many optimization problems can be reformulated as convex optimization problems.

For example, the problem of maximizing a concave function f can be re-formulated equivalently as a problem of minimizing the function −f, which is convex.

[13] Aka 線性規劃 (linear programming; taught in the first semester of the second year of senior high school).
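The reformulation "maximize f by minimizing −f" can be tried on a one-dimensional toy problem. The concave function and the interval [0, 5] below are made up for illustration, and fminbnd is used because it needs no toolbox.

```matlab
f = @(x) -(x - 2) .^ 2 + 3;                  % concave, maximum 3 at x = 2
[xmax, negval] = fminbnd(@(x) -f(x), 0, 5);  % minimize -f on [0, 5]
fmax = -negval;                              % undo the sign change
fprintf('maximum %.4f at x = %.4f\n', fmax, xmax);
```

Only the objective value changes sign; the maximizer of f and the minimizer of −f are the same point.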

Classification of Optimization Problems

Finite- vs. infinite-dimensional problems

Unconstrained vs. constrained problems

Convex vs. non-convex problems

Linear vs. non-linear problems

Continuous vs. discrete problems

Deterministic vs. stochastic problems

Example

Consider f(x) = x⁴ − 10.5x³ + 39x² − 59.5x + 30.

    >> g = @(x) polyval([1 -10.5 39 -59.5 30], x);
    >> x = 1 : 0.05 : 4;
    >> plot(x, g(x)); grid on;
    >> [s, fval] = fminunc(g, 0)   % unconstrained minimization of g

    s =

        1.4878

    fval =

       -1.8757

Try: [s, fval] = fminunc(g, 5)

The minimizer returned depends on the initial guess.
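Why the initial guess matters can be seen without an optimizer: the polynomial has more than one local minimum. The sketch below, which swaps in roots and polyder rather than fminunc, lists the stationary points and keeps those with positive second derivative.

```matlab
p = [1 -10.5 39 -59.5 30];
c = roots(polyder(p));                      % stationary points: f'(x) = 0
c = sort(real(c(abs(imag(c)) < 1e-9)));     % keep the real ones
curv = polyval(polyder(polyder(p)), c);     % second derivative at each point
minima = c(curv > 0);                       % local minima
vals = polyval(p, minima);
disp([minima vals]);
```

A local method started near one of these minima typically converges to that one, which is why fminunc(g, 0) and fminunc(g, 5) can return different points.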

Example: Utility Maximization

A consumer has a budget of w = 10 and faces prices p1 = 1, p2 = 2 for Product 1 and 2, respectively.

Let x1 and x2 be the weights of the two products, and u be the utility function over the two goods, given by u = x1^0.8 + x2^0.8.

Then, what is the optimal (x1, x2)?

We can observe the behavior of u first.

- ∂u/∂x_i = 0.8 x_i^(−0.2) > 0 for all x_i > 0.
- ∂²u/∂x_i² = −0.16 x_i^(−1.2) < 0 for all x_i > 0.
- So, u increases with a gradual slowdown as x1 and x2 increase.

Formulate the problem into a standard form of optimization:

\[
\max_{x_1, x_2} u \quad \text{subject to} \quad x_1 p_1 + x_2 p_2 = 10.
\]

Let A_eq = [p1 p2] = [1 2], x = [x1; x2], and b_eq = 10. Then the constraint can be written as A_eq · x = b_eq.

mesh (Recap)

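    [X, Y] = meshgrid(0 : .5 : 10);
    u = (X .^ 0.8 + Y .^ 0.8);
    LX = 0 : .5 : 10;
    LY = -0.5 * LX + 5;            % budget line x + 2y = 10
    uu = LX .^ 0.8 + LY .^ 0.8;    % utility along the budget line
    mesh(X, Y, u); grid on; hold on;
    plot3(LX, LY, uu);

[Figure: mesh of u = x^0.8 + y^0.8 with the utility along the budget line.]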

    >> f = @(x) -(x(1) ^ 0.8 + x(2) ^ 0.8);
    >> Aeq = [1 2];
    >> beq = 10;
    >> x_0 = [8 1];   % initial guess
    >> [xx, fval] = fmincon(f, x_0, [], [], Aeq, beq)

    xx =

        9.4118    0.2941

    fval =

       -6.3865

The maximization problem is equivalent to minimizing −u.
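The numerical answer can be cross-checked by hand. Setting the marginal utilities proportional to the prices (the Lagrange condition 0.8 x_i^(−0.2) = λ p_i) gives x1/x2 = (p2/p1)^(1/(1−0.8)), and the budget constraint then fixes both quantities. A sketch of that calculation, with alpha and ratio as names introduced here:

```matlab
p1 = 1; p2 = 2; w = 10; alpha = 0.8;
ratio = (p2 / p1) ^ (1 / (1 - alpha));   % x1 / x2 at the optimum
x2 = w / (p1 * ratio + p2);              % from p1*x1 + p2*x2 = w
x1 = ratio * x2;
u  = x1 ^ alpha + x2 ^ alpha;
fprintf('x1 = %.4f, x2 = %.4f, u = %.4f\n', x1, x2, u);
```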

Linear Programming Problems

If the objective function f and the defining functions of M are linear, then the problem you are concerned with is a linear optimization problem.

A general form of a linear programming problem is given by

\[
\min_{x}\; c^T x \quad \text{subject to} \quad Ax \le a,\quad Bx = b,\quad lb \le x \le ub.
\]

That is, f(x) = cᵀx and M = {x ∈ R^n | Ax ≤ a, Bx = b, lb ≤ x ≤ ub}.

Once you have defined the matrices A, B, and the vectors c, a, b, lb, and ub, you can call linprog to solve the problem:

    [x, fval, exitflag, output, lambda] = linprog(c, A, a, B, b, lb, ub, x0, options),

where

- c: coefficient vector of the objective
- A: matrix of inequality constraints
- a: right hand side of the inequality constraints
- B or []: matrix of equality constraints, or no constraints
- b or []: right hand side of the equality constraints, or no constraints
- lb, ub or []: lower/upper bounds for x, or no lower/upper bounds
- x0: initial vector for the algorithm if known; otherwise [].
- options: options are set using the optimset function, which determines the details of the algorithm.

(Continued)

- x: optimal solution
- fval: optimal value of the objective function
- exitflag: tells whether the algorithm converged or not (exitflag > 0 means convergence.)
- output: a struct for number of iterations, algorithm used and PCG iterations (when LargeScale = on)
- lambda: a struct containing Lagrange multipliers corresponding to the constraints

About Optimset

The input argument options is a structure, which contains several parameters that you can use with a given MATLAB optimization routine. (Try optimset('linprog')!!)

For example,

    >> options = optimset('ParameterName1', value1, ...
                          'ParameterName2', value2, ...)

The following are parameters and their corresponding values which are frequently used with linprog:

- 'LargeScale': 'on', 'off'
- 'Simplex': 'on', 'off'
- 'Display': 'iter', 'final', 'off'
- 'MaxIter': maximum number of iterations
- 'TolFun': termination tolerance for the objective function
- 'TolX': termination tolerance for the iterates
- 'Diagnostics': 'on' or 'off'

Example 1

Solve the following linear optimization problem using linprog. As encoded in the script, the problem is to minimize −2x1 − 3x2 subject to x1 + 2x2 ≤ 8, 2x1 + x2 ≤ 10, and x2 ≤ 3.

    c = [-2, -3]';
    A = [1, 2; 2, 1; 0, 1];
    a = [8, 10, 3]';
    options = optimset('LargeScale', 'off');
    xsol = linprog(c, A, a, [], [], [], [], [], options);

    Optimization terminated.

    xsol =

        4.0000
        2.0000
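The reported solution can be checked without calling the solver: it should satisfy every inequality, and a neighbouring corner of the feasible region should not give a smaller objective. In this sketch the point [2; 3] is the corner where the first and third constraints are active; slack, obj and obj_alt are names introduced here.

```matlab
c = [-2; -3];
A = [1 2; 2 1; 0 1];
a = [8; 10; 3];
xsol  = [4; 2];                 % solution reported above
slack = a - A * xsol;           % nonnegative entries: constraints hold
obj   = c' * xsol;
xalt  = [2; 3];                 % neighbouring corner of the feasible region
obj_alt = c' * xalt;
fprintf('objective %g at xsol, %g at the other corner\n', obj, obj_alt);
```

Two slacks are zero, so the solution sits at the intersection of the first two constraints.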

Example 2

Solve the following LP using linprog:

    clear; clc;

    A = [1, 1, 1, 1, 1, 1; 5, 0, -3, 0, 1, 0];
    a = [10, 15]';
    B1 = [1, 2, 3, 0, 0, 0;
          0, 1, 2, 3, 0, 0;
          0, 0, 1, 2, 3, 0;
          0, 0, 0, 1, 2, 3];
    b1 = [5, 7, 8, 8]';
    D = [3, 0, 0, 0, -2, 1; 0, 4, 0, -2, 0, 3];
    d = [5, 7]';
    lb = [-2, 0, -1, -1, -5, 1]';
    ub = [7, 2, 2, 3, 4, 10]';
    c = [1, -2, 3, -4, 5, -6]';
    B = [-B1; D]; b = [-b1; d];

    [xsol, fval, exitflag, output] = linprog(c, A, a, B, b, lb, ub)
    fprintf('%s %s \n', 'Algorithm Used: ', output.algorithm);
    disp('============================');

    options = optimset('linprog');
    options = optimset(options, 'LargeScale', 'off', ...
                       'Simplex', 'on', 'Display', 'iter');
    % request output again so that the algorithm name refers to this run
    [xsol, fval, exitflag, output] = linprog(c, A, a, B, b, lb, ub, ...
                                             [], options)
    fprintf('%s %s \n', 'Algorithm Used: ', output.algorithm);
    fprintf('%s', 'Reason for termination:')
    if exitflag > 0   % positive values mean convergence
        fprintf('%s \n', ' Convergence.');
    else
        fprintf('%s \n', ' No convergence.');
    end

Example 3: Approximation of discrete Data by a Curve

Suppose the measurements of a real process over a 24-hour period are given by the following table with 14 data values:

    t_i:  0  3  7  8  9  10  12  14  16  18  19  20  21  23
    u_i:  3  5  5  4  3   6   7   6   6  11  11  10   8   6

The values t_i represent time and the u_i are measurements.

Assuming there is a mathematical connection between the variables t and u, we would like to determine the coefficients a, b, c, d, e ∈ R of the function

\[
u(t) = a t^4 + b t^3 + c t^2 + d t + e,
\]

so that the value of the function u(t_i) best approximates the discrete value u_i at t_i, i = 1, ..., 14, in the Chebyshev sense.[14]

[14] http://en.wikipedia.org/wiki/Approximation_theory#Chebyshev_approximation

Hence, we need to solve the Chebyshev approximation problem, which is written as

\[
\min_{a, b, c, d, e}\; \max_{i = 1, \ldots, 14} \bigl| u_i - (a t_i^4 + b t_i^3 + c t_i^2 + d t_i + e) \bigr|.
\]

Reformulate it into a linear programming problem:

- Objective function?
- Constraints?

Solution to Chebyshev Approximation Problem

Define the additional variable

\[
f := \max_{i = 1, \ldots, 14} \bigl| u_i - (a t_i^4 + b t_i^3 + c t_i^2 + d t_i + e) \bigr|.
\]

Then the problem can be equivalently written as

\[
\begin{aligned}
\min\;& f\\
\text{s.t.}\;& -(a t_i^4 + b t_i^3 + c t_i^2 + d t_i + e) - f \le -u_i,\\
& (a t_i^4 + b t_i^3 + c t_i^2 + d t_i + e) - f \le u_i,
\end{aligned}
\]

where i ∈ {1, ..., 14}. More specifically, [A]_{28×6} [x]_{6×1} ≤ [u]_{28×1}.

- Note that [x]_{6×1} = [a, b, c, d, e, f]′.

    clear all;
    clc

    t = [0, 3, 7, 8, 9, 10, 12, 14, 16, 18, 19, 20, 21, 23]';
    u = [3, 5, 5, 4, 3, 6, 7, 6, 6, 11, 11, 10, 8, 6]';
    A1 = [-t.^4, -t.^3, -t.^2, -t, -ones(14, 1), -ones(14, 1)];
    A2 = [t.^4, t.^3, t.^2, t, ones(14, 1), -ones(14, 1)];
    c = zeros(6, 1);
    c(6) = 1;       % objective function coefficient (why?)
    A = [A1; A2];   % inequality constraint matrix
    a = [-u; u];    % right hand side vector of the inequality constraints
    [xsol, fval, exitflag] = linprog(c, A, a);

    plot(t, u, 'r*'); hold on; grid on;
    tt = 0 : 0.5 : 25;
    ut = xsol(1) * (tt.^4) + xsol(2) * (tt.^3) + xsol(3) * (tt.^2) + ...
         xsol(4) * tt + xsol(5);
    plot(tt, ut, '-k', 'LineWidth', 2)

[Figure: the data points (t_i, u_i) and the fitted fourth-order polynomial on 0 ≤ t ≤ 25.]
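The same reformulation can be tested on a problem small enough to solve by hand. The sketch below fits a straight line u = at + b to three made-up points in the Chebyshev sense; by the equioscillation argument the optimum is a = 0, b = 0.5 with maximal error 0.5. It assumes the Optimization Toolbox (linprog) is available.

```matlab
t = [0; 1; 2];
u = [0; 1; 0];                       % made-up data
n = numel(t);
A1 = [-t, -ones(n, 1), -ones(n, 1)]; % -(a t_i + b) - f <= -u_i
A2 = [ t,  ones(n, 1), -ones(n, 1)]; %  (a t_i + b) - f <=  u_i
A   = [A1; A2];
rhs = [-u; u];
c   = [0; 0; 1];                     % minimize f, the last unknown
x   = linprog(c, A, rhs);            % x = [a; b; f]
max_res = max(abs(u - (x(1) * t + x(2))));
fprintf('a = %.3f, b = %.3f, f = %.3f\n', x(1), x(2), x(3));
```

At the solution, the auxiliary variable f coincides with the largest absolute residual, which is what the reformulation is meant to achieve.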

Exercise

randi([range], m, n) generates an m-by-n random matrix of integer values drawn uniformly from range.

Use randi to generate input pairs for the program in the Chebyshev Approximation Problem.

- Let t be a simple sequence like 0 : 1 : m.
- Let u be a sequence generated by randi.

See the fitting result.

Integer Programming

An integer programming problem is a mathematical optimization problem in which some or all of the variables are restricted to be integers.

- Sometimes called integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear.

Note that integer programming is much harder than linear programming in general. (Why?)

Quadratic Programming Problems

Quadratic programming is a special type of mathematical optimization problem, which optimizes (minimizing or maximizing) a quadratic function of several variables subject to linear constraints on these variables.

Let Q ∈ R^{n×n}, q ∈ R^n, A ∈ R^{m×n}, B ∈ R^{l×n}, a ∈ R^m, and b ∈ R^l. Then a general form of a quadratic programming problem is given by

\[
\min_{x}\; \tfrac{1}{2} x^T Q x + q^T x \quad \text{subject to} \quad Ax \le a,\quad Bx = b,\quad lb \le x \le ub.
\]

The general form for calling quadprog for the problem is given by

    [xsol, fval, exitflag, output, lambda] = quadprog(Q, q, A, a, B, b, lb, ub, x0, options),

where

- Q: Hessian of the objective function
- q: coefficient vector of the linear part of the objective function
- A or []: matrix of inequality constraints, or no inequality constraints
- a or []: right hand side of the inequality constraints, or no inequality constraints
- B or []: matrix of equality constraints
- b or []: right hand side of the equality constraints
- lb, ub or []: lower/upper bounds for x, or no lower/upper bounds
- x0: initial vector for the algorithm if known; otherwise [].
- options: options are set using the optimset function, which determines the details of the algorithm.

(Continued)

- x: optimal solution
- fval: optimal value of the objective function
- exitflag: tells whether the algorithm converged or not (exitflag > 0 means convergence.)
- output: a struct for number of iterations, algorithm used and PCG iterations (when LargeScale = on)
- lambda: a struct containing Lagrange multipliers corresponding to the constraints

Example 4

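Solve the following quadratic optimization problem using quadprog. In the standard form used by quadprog, the data in the program correspond to

$$\min_{x, y} \; x^2 + 2y^2 + 2x + 3y \quad \text{subject to} \quad x + 2y \le 8, \quad 2x + y \le 10, \quad y \le 3, \quad x, y \ge 0.$$

Try to re-formulate it.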

clear all;
clc

Q=[2,0;0,4];
q=[2,3]';
A=[1,2;2,1;0,1];
a=[8,10,3]';
lb=[0,0]';
ub=[inf,inf]';

options=optimset('quadprog'); % default options for quadprog
options=optimset(options,'LargeScale','off'); % modify the defaults instead of overwriting them
[xsol,fval,exitflag,output]=...
    quadprog(Q,q,A,a,[],[],lb,ub,[],options)

fprintf('Convergence ');
if exitflag > 0
    fprintf('succeeded.\n');
    xsol
else
    fprintf('failed.\n');
end
fprintf('Algorithm used: %s \n',output.algorithm);

x=-3:0.1:3;
y=-4:0.1:4;
[X,Y]=meshgrid(x,y);
Z=X.^2+2*Y.^2+2*X+3*Y;
meshc(X,Y,Z); hold on;
plot(xsol(1),xsol(2),'r*');

[Figure: mesh and contour plot of the objective function z over the (x, y) grid, with the solution marked.]

Example 5

Solve the following quadratic program using quadprog (Q is nonzero, so this is a QP rather than an LP):

clear; clc;
% Initialize
Q = [2, 1, 0; 1, 4, 2; 0, 2, 4];
q = [4; 6; 12];
A = [-1, -1, -1; 1, 2, -2];
a = [-6; -2];
lb = [0; 0; 0];
ub = [inf; inf; inf];

options = optimset('quadprog'); % default options for quadratic programming
options = optimset(options, 'LargeScale', 'off'); % modify the defaults instead of overwriting them

[xsol, fval, exitflag, output] = quadprog(Q, q, A, a, ...
    [], [], lb, ub, [], options);

fprintf('Convergence ');
if exitflag > 0
    fprintf('succeeded.\n');
    xsol
else
    fprintf('failed.\n');
end
fprintf('Algorithm used: %s \n', output.algorithm);

Optimization terminated.
Convergence succeeded.

xsol =

    3.3333
         0
    2.6667

Algorithm used: medium-scale: active-set

Curve Fitting

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints.

Curve fitting requires a parametric model that relates the response data to the predictor data with one or more coefficients.

The result of the fitting process is an estimate of the model coefficients.

Common Techniques

Polynomial interpolation
1. Newton form
2. Lagrange form
3. Polynomial splines

Method of least squares

You can find more details on curve fitting at the link: http://www.mathcs.emory.edu/~haber/math315/chap4.pdf.

Example: Chebyshev Approximation Problem (Revisited)

Suppose the measurements of a real process over a 24-hour period are given by the following table with 14 data values (the same values as the vectors t and u in the program):

t_i:  0  3  7  8  9  10  12  14  16  18  19  20  21  23
u_i:  3  5  5  4  3   6   7   6   6  11  11  10   8   6

The values $t_i$ represent time and the $u_i$ are measurements.

Consider the polynomial $u(t) = a t^4 + b t^3 + c t^2 + d t + e$.

Determine the coefficients a, b, c, d, and e so that the value of the function $u(t_i)$ best approximates the discrete value $u_i$ at $t_i$, $i = 1, \dots, 14$, in the least-squares sense.

clear; clc;
% main
t = [0, 3, 7, 8, 9, 10, 12, 14, 16, 18, 19, 20, 21, 23]';
u = [3, 5, 5, 4, 3, 6, 7, 6, 6, 11, 11, 10, 8, 6]';
plot(t, u, 'r*'); hold on; grid on;
% least squares
X = [t .^ 4, t .^ 3, t .^ 2, t .^ 1, ones(length(u), 1)];
b = X \ u;
tt = 0 : 0.5 : 25;
ut1 = b(1) * tt .^ 4 + b(2) * tt .^ 3 + b(3) * tt .^ 2 + ...
    b(4) * tt + b(5);
plot(tt, ut1, '-g', 'LineWidth', 2);
% chebyshev
A1 = [-t .^ 4, -t .^ 3, -t .^ 2, -t, -ones(14, 1), ...
    -ones(14, 1)];
A2 = [t .^ 4, t .^ 3, t .^ 2, t, ones(14, 1), -ones(14, 1)];
c = zeros(6, 1);
c(6) = 1; % objective function coefficient (why?)
A = [A1; A2];
a = [-u; u];
[xsol, fval, exitflag] = linprog(c, A, a);
ut2 = xsol(1) * tt .^ 4 + xsol(2) * tt .^ 3 + xsol(3) * ...
    tt .^ 2 + xsol(4) * tt + xsol(5);
plot(tt, ut2, '-k', 'LineWidth', 2)
% sum of squared errors
% xsol(6) is the bound f, so only xsol(1:5) are polynomial coefficients;
% passing all of xsol to polyval would evaluate a degree-5 polynomial
sum_square_error = [sum((polyval(b, t') - u') .^ 2) ...
    sum((polyval(xsol(1:5), t') - u') .^ 2)]

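sum_square_error =

    17.596    1.9151e+005

The second value shown here is inflated: it is what results from passing all six entries of xsol (the five coefficients and the bound f) to polyval, which then evaluates a degree-5 polynomial. Evaluating polyval(xsol(1:5), t') gives the error of the actual Chebyshev fit.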

Linear Least Squares in MATLAB

[x, resnorm, residual, exitflag, output] = lsqlin(C, d, A, b, Aeq, beq, lb, ub, x0, options) minimizes $\tfrac{1}{2}\|Cx - d\|_2^2$ subject to $Ax \le b$, $Aeq\,x = beq$, and $lb \le x \le ub$, and returns a structure output that contains information about the optimization.

- C: matrix
- d: vector
- A: matrix for the linear inequality constraints
- b: vector for the linear inequality constraints
- Aeq: matrix for the linear equality constraints
- beq: vector for the linear equality constraints
- lb, ub: vectors of lower/upper bounds
- x0: initial point for x
- options: options set with the optimset function, which determine the details of the algorithm

Example

clear; clc; % clear first, so that C and d defined below are not wiped
C = [
    0.9501 0.7620 0.6153 0.4057
    0.2311 0.4564 0.7919 0.9354
    0.6068 0.0185 0.9218 0.9169
    0.4859 0.8214 0.7382 0.4102
    0.8912 0.4447 0.1762 0.8936];
d = [
    0.0578
    0.3528
    0.8131
    0.0098
    0.1388];
A = [
    0.2027 0.2721 0.7467 0.4659
    0.1987 0.1988 0.4450 0.4186
    0.6037 0.0152 0.9318 0.8462];
b = [
    0.5251
    0.2026
    0.6721];
lb = -0.1 * ones(4, 1);
ub = 2 * ones(4, 1);
[x, resnorm, residual, exitflag, output] = lsqlin(C, d, ...
    A, b, [ ], [ ], lb, ub);

Summary: Built-in Functions (1/2)

Linear and quadratic minimization problems:
- linprog
- quadprog

Nonlinear zero finding (equation solving):
- fzero
- fsolve

Linear least squares (of matrix problems):
- lsqlin
- lsqnonneg

Summary: Built-in Functions (2/2)

Nonlinear minimization of functions:
- fminbnd
- fmincon
- fminsearch
- fminunc
- fseminf

Nonlinear least squares of functions:
- lsqcurvefit
- lsqnonlin

Nonlinear minimization of multi-objective functions:
- fgoalattain
- fminimax

>> Lecture 8
>>
>> -- Monte Carlo Simulation
>>

Contents

Fundamental Concepts in Statistics

Simple Random Sampling

Monte Carlo Simulation

Random Variables

A random variable is a function from a sample space Ω into the real numbers R.

There are lots of examples where random variables are used.
- In the experiment of tossing two dice once, a random variable X can be the sum of the numbers.
- In the experiment of tossing a coin 25 times, a random variable X can be the number of heads in 25 tosses.

Distribution Functions

With every random variable X, we associate a function called the cumulative distribution function of X.

- Often we call it the cdf of X, denoted by $F_X(x) = P_X(X \le x)$ for all x.

The function $F_X(x)$ is a cdf if and only if the following three conditions hold:

1. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to \infty} F_X(x) = 1$.
2. $F_X(x)$ is a nondecreasing function of x.
3. $F_X(x)$ is right-continuous; that is, for every number $x_0$, $\lim_{x \to x_0^+} F_X(x) = F_X(x_0)$.

Random Variable (Revisited)

A random variable X is continuous if $F_X(x)$ is a continuous function of x.

- The probability density function (pdf) of X is defined by $p_X(x) = \frac{\partial F_X(x)}{\partial x}$.

A random variable X is discrete if $F_X(x)$ is a step function of x.

- The probability mass function (pmf) of X is a similar idea to a pdf, but in the discrete sense.

Mixtures of both types also exist.

Pseudo Random Numbers in MATLAB

rand(N) returns an N-by-N matrix containing pseudo-random values drawn from the continuous uniform distribution on the open interval (0, 1).[15]

randi(imax, N) returns an N-by-N matrix containing pseudo-random integers drawn from the discrete uniform distribution on 1, 2, ..., imax.

randn(N) returns an N-by-N matrix containing pseudo-random values drawn from the standard normal distribution with zero mean and unit variance.[16]

randperm(n, k) returns a row vector containing k unique integers selected randomly from 1 to n.

[15] Usually denoted by X ∼ u(0, 1).
[16] Usually denoted by X ∼ n(0, 1).
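A quick way to get a feel for these four generators is to draw a small sample from each and inspect the sizes and ranges. The sketch below is only illustrative; the values N = 4 and imax = 6 are arbitrary choices, not taken from the slides.

```matlab
N = 4;        % matrix size (arbitrary, for illustration)
imax = 6;     % largest integer returned by randi (arbitrary)

U = rand(N);            % N-by-N, continuous uniform on (0, 1)
K = randi(imax, N);     % N-by-N, integers drawn uniformly from 1..imax
Z = randn(N);           % N-by-N, standard normal
p = randperm(10, 3);    % 3 distinct integers from 1..10, as a row vector

disp(size(U));                  % dimensions of U
disp([min(K(:)), max(K(:))]);   % smallest and largest integer drawn
disp(p);                        % the 3 selected integers
```

Each call returns different values on every run, so only the shapes and ranges are predictable.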

Exercise

1. Generate n values from a continuous uniform distribution on the interval [a, b] by extending rand.

2. Generate n values from a normal distribution with mean mu and standard deviation sig by extending randn.

function rv=rand_general(a,b,n) % [a,b] with n sampling points

rv=a+(b-a)*rand(n,1);

function rv=randn_general(mu,sig,n) % renamed so the two functions do not share a name

rv = mu + sig*randn(n,1);

Exercise

Let ξi be independent and uniformly distributed over (0, 1).

A simple method to generate an approximately standard normal variable is to calculate

$$\left(\sum_{i=1}^{12} \xi_i\right) - 6.$$

Generate 10,000 values by this method and by randn.

Compare them by plotting histograms (hist) of your results.

clear all;
clc
% main
N=1e4;
x=zeros(1,N); % preallocate
y=zeros(1,N);
for i=1:N
    x(i)=sum(rand(12,1))-6;
    y(i)=randn(1);
end
figure(1);hist(x);
figure(2);hist(y);
mu_x=mean(x)
mu_y=mean(y)
std_x=std(x)
std_y=std(y)

Figure: histogram of $(\sum_{i=1}^{12} \xi_i) - 6$. Figure: histogram of randn.

mu_x =
   -0.0178
mu_y =
   -0.0059
std_x =
    1.0027
std_y =
    1.0009
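The sample statistics above are close to 0 and 1 for both methods. For the sum of uniforms this is not a coincidence; the short derivation below (added here as a worked check, using only the standard moments of the uniform distribution) shows that the mean and variance are exactly 0 and 1.

```latex
% For \xi_i \sim u(0,1): E\xi_i = 1/2 and Var\,\xi_i = 1/12.
\begin{align*}
E\Big[\sum_{i=1}^{12} \xi_i - 6\Big] &= 12 \cdot \tfrac{1}{2} - 6 = 0, \\
\mathrm{Var}\Big[\sum_{i=1}^{12} \xi_i - 6\Big] &= \sum_{i=1}^{12} \mathrm{Var}\,\xi_i
  = 12 \cdot \tfrac{1}{12} = 1 \quad \text{(by independence)}.
\end{align*}
```

The distribution is only approximately normal, though: the sum can never leave the interval [-6, 6], whereas a true normal variable is unbounded.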

Expected Values

Let X, Y be random variables with pmf $p_X(x)$ and $p_Y(y)$, respectively.

The mean measures the average value of a random variable, defined by $EX = \sum_x x\,p_X(x)$.[17]

The variance is defined by $\mathrm{Var}X = \sum_x (x - EX)^2\,p_X(x)$.
- Doing simple algebra, $\mathrm{Var}X = EX^2 - (EX)^2$.

The standard deviation shows how much variation or dispersion from the average exists; it is defined as the square root of VarX and denoted by $\sigma_X$.

[17] Also denoted by $\mu_X$.
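As a small worked example of these definitions, the sketch below computes the mean, variance, and standard deviation of the sum of two fair dice directly from its pmf, and checks the shortcut formula for the variance. The choice of this particular random variable is only for illustration.

```matlab
% pmf of X = sum of two fair dice (support 2..12)
x = 2:12;
p = [1 2 3 4 5 6 5 4 3 2 1] / 36;

EX       = sum(x .* p);              % mean, from the definition
VarX     = sum((x - EX).^2 .* p);    % variance, from the definition
VarX_alt = sum(x.^2 .* p) - EX^2;    % shortcut: E X^2 - (E X)^2
sigmaX   = sqrt(VarX);               % standard deviation

fprintf('EX = %.4f, VarX = %.4f, shortcut = %.4f, sigmaX = %.4f\n', ...
    EX, VarX, VarX_alt, sigmaX);
```

Both variance formulas give 35/6, and the mean is 7.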

Independence and Correlation

X and Y are said to be independent if and only if their joint pmf or pdf equals the product of their marginal pmfs or pdfs, that is,

$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y) \quad \text{for all } x, y.$$

- In terms of events, this is the familiar rule P(A ∩ B) = P(A)P(B).
- Denoted by X ⊥ Y.

Covariance is a measure of how much two random variables change together, defined by

$$\mathrm{Cov}(X, Y) = E[(X - EX)(Y - EY)].$$

- Doing simple algebra, $\mathrm{Cov}(X, Y) = E(XY) - EX\,EY$.
- You can double-check it by simply replacing Y by X, which recovers VarX.

A random variable can be standardized by

$$Z_X = \frac{X - EX}{\sigma_X}.$$

Correlation can refer to any departure of two or more random variables from independence; the correlation coefficient is defined by $\rho_{XY} = E[Z_X Z_Y]$.

- $-1 \le \rho_{XY} \le 1$.
- You can prove it by applying the Cauchy-Schwarz inequality.

X ⊥ Y if and only if $\rho_{XY} = 0$?
- (Necessity) If X ⊥ Y, then $\rho_{XY} = 0$. (Why?)
- (Sufficiency) If $\rho_{XY} = 0$, then X ⊥ Y? False.

Counterexample for Sufficiency

$\rho_{XY}$ only measures linear dependence, but the dependence between X and Y may be nonlinear.

Further details can be found in any textbook of statistics.[18]

[18] Casella and Berger, Statistical Inference, 2nd ed.
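One concrete counterexample, chosen here for illustration (it is not taken from the slides), is X uniform on {-1, 0, 1} with Y = X^2. Y is completely determined by X, yet the covariance is exactly zero. The sketch below verifies both facts from the pmf.

```matlab
x  = [-1 0 1];          % support of X
px = [1 1 1] / 3;       % X is uniform on {-1, 0, 1}
y  = x.^2;              % Y = X^2, a nonlinear function of X

EX    = sum(x .* px);          % E X
EY    = sum(y .* px);          % E Y
EXY   = sum(x .* y .* px);     % E[XY] = E[X^3]
covXY = EXY - EX * EY;         % covariance, hence rho_XY = 0 as well

% Independence would require P(X = 1, Y = 1) = P(X = 1) * P(Y = 1).
P_joint = 1/3;                 % X = 1 forces Y = 1
P_prod  = (1/3) * (2/3);       % P(X = 1) * P(Y = 1)

fprintf('Cov(X,Y) = %.4f, P_joint = %.4f, P_prod = %.4f\n', ...
    covXY, P_joint, P_prod);
```

The covariance vanishes because E[X^3] = 0 and EX = 0, while the joint probability differs from the product of the marginals, so X and Y are uncorrelated but not independent.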

Exercise

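Use rand and randn to generate two random sequences of N values each, with N = logspace(1, 5, 20).

Write a program that computes the mean, var, cov, and corr of these two data sets.

Check whether the computed results agree with the theoretical values.
- If X follows the standard uniform distribution, $\mu_X = \frac{1}{2}$, $\sigma_X^2 = \frac{1}{12}$.
- If X follows the standard normal distribution, $\mu_X = 0$, $\sigma_X^2 = 1$.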

clear all;
clc
% main
n_=ceil(logspace(1,5,20));
for i=1:length(n_)
    n=n_(i);
    x=rand(n,1);  % uniform sample
    y=randn(n,1); % normal sample
    mu(i,:)=[mean(x) mean(y)];
    sigma(i,:)=[std(x) std(y)];
    temp=cov(x,y);
    cov_xy(i,:)=temp(:);
    rho(i)=corr(x,y);
end
n_=log10(n_);
figure(1); plot(n_,mu(:,1),'r*:',n_,mu(:,2),'g*:'); grid on;
figure(2); plot(n_,sigma(:,1),'r*:',n_,sigma(:,2),'g*:'); grid on;
figure(3); plot(n_,cov_xy(:,1),'r*:',...
    n_,cov_xy(:,2),'g*:',n_,cov_xy(:,4),'b*:'); grid on;
figure(4); plot(n_,rho,'r*:'); grid on;

[Figures: sample mean, standard deviation, covariance (Uniform, Cov(X,Y), Normal), and correlation versus Sample Size (log scale).]

Random Sampling

The random variables $X_1, \dots, X_n$ are called a random sample of size n from the population f(x) if $X_1, \dots, X_n$ are mutually independent random variables and the marginal pdf or pmf of each $X_i$ is the same function f(x).

Alternatively, $X_1, \dots, X_n$ are called independent and identically distributed (iid) random variables with pdf or pmf f(x).

- For example, $X_1, \dots, X_n \overset{iid}{\sim} n(\mu, \sigma)$.

In the definition here, the random sampling model is also called sampling from an infinite population.

Simple Random Sampling

When sampling from a finite population, the difference between sampling with replacement and sampling without replacement arises.

It is worth noting that observations sampled from a finite population without replacement are identically distributed, because all observations $X_i$ have the same marginal pdf or pmf. (Why?)

Sampling from a finite population without replacement is called simple random sampling.

- As the population size N grows large compared to the sample size n, $X_1, \dots, X_n$ are nearly independent.
- Equivalently, the conditional probability of $X_i$ is approximately equal to the marginal probability of $X_i$.

Simple random sampling is an unbiased surveying technique.

Statistical Inference

Statistical inference is the process of drawing conclusions from data that are subject to random variation.

To be more specific, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling.[19]

- Sample data → population
- Partial information from sampling → real information of population

An estimator is a statistic, a function of the sample data, used to infer the value of an unknown parameter of a population.

[19] To eliminate bias.

Statistics

Let $X_1, \dots, X_n$ be a random sample of size n from a population and let $T(x_1, \dots, x_n)$ be a real-valued or vector-valued function whose domain includes the sample space of $(X_1, \dots, X_n)$.

Then the random variable or random vector $Y = T(X_1, \dots, X_n)$ is called a statistic.

- Note that a statistic cannot be a function of a parameter.

The probability distribution of a statistic Y is called the sampling distribution of Y.

Three statistics that are often used and provide good summaries of the sample are defined as follows:

- The sample mean is the arithmetic average of the values in a random sample. It is usually denoted by

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

- The sample variance is the statistic defined by

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

- The sample standard deviation is the statistic defined by $S = \sqrt{S^2}$.
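The three definitions can be checked against MATLAB's built-in mean, var, and std, which use the same n - 1 normalization by default. The data vector below is made up for illustration.

```matlab
X = [2 4 4 4 5 5 7 9];     % a small made-up sample
n = numel(X);

Xbar = sum(X) / n;                      % sample mean
S2   = sum((X - Xbar).^2) / (n - 1);    % sample variance (divides by n - 1)
S    = sqrt(S2);                        % sample standard deviation

fprintf('manual:   %.4f  %.4f  %.4f\n', Xbar, S2, S);
fprintf('built-in: %.4f  %.4f  %.4f\n', mean(X), var(X), std(X));
```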

Unbiased Statistics

Let $X_1, \dots, X_n$ be a random sample from a population with mean µ and variance $\sigma^2 < \infty$. Then:

1. $E\bar{X} = \mu$,
2. $ES^2 = \sigma^2$.

The statistic $\bar{X}$ is an unbiased estimator of µ, and $S^2$ is an unbiased estimator of $\sigma^2$.

Note that $\mathrm{Var}\,\bar{X} = \frac{\sigma^2}{n}$ depends on the sample size n: the sample mean becomes less variable as n grows.

Convergence in Probability

A sequence of random variables, $X_1, X_2, \dots$, converges in probability to a random variable X if, for every ε > 0,

$$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1.$$

Weak Law of Large Numbers, WLLN

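Let $X_1, X_2, \dots$ be iid random variables with $EX_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$. Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Then, for every ε > 0,

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| < \varepsilon) = 1,$$

that is, $\bar{X}_n$ converges in probability to µ.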

Almost Sure Convergence

A sequence of random variables, $X_1, X_2, \dots$, converges almost surely to a random variable X if, for every ε > 0,

$$P\left(\lim_{n \to \infty} |X_n - X| < \varepsilon\right) = 1.$$

Strong Law of Large Numbers, SLLN

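Let $X_1, X_2, \dots$ be iid random variables with $EX_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 < \infty$.

Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Then, for every ε > 0,

$$P\left(\lim_{n \to \infty} |\bar{X}_n - \mu| < \varepsilon\right) = 1,$$

that is, $\bar{X}_n$ converges almost surely to µ.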
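The laws of large numbers can be observed numerically by tracking the running mean of iid draws. The sketch below uses u(0, 1) samples, for which µ = 1/2; the sample size and the checkpoints are arbitrary choices made for illustration.

```matlab
n = 1e6;
x = rand(n, 1);                        % iid u(0,1) draws, mu = 1/2
running_mean = cumsum(x) ./ (1:n)';    % mean of the first k draws, k = 1..n

checkpoints = [1e2 1e4 1e6];
for k = checkpoints
    fprintf('n = %7d: mean = %.5f, |error| = %.5f\n', ...
        k, running_mean(k), abs(running_mean(k) - 0.5));
end
```

The printed errors differ from run to run, but they typically shrink as n grows.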

Convergence in Distribution

A sequence of random variables, $X_1, X_2, \dots$, converges in distribution to a random variable X if

$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$$

at all points x where $F_X(x)$ is continuous.

Central Limit Theorem, CLT

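Let $X_1, X_2, \dots$ be a sequence of iid random variables whose mgfs exist in a neighborhood of 0.

Let $EX_i = \mu$ and $\mathrm{Var}X_i = \sigma^2 > 0$.

Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Let $G_n(x)$ denote the cdf of $\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$.

Then, for any x, $-\infty < x < \infty$,

$$\lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy;$$

that is, $\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$ has a limiting standard normal distribution.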
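A simple numerical check of the theorem is to standardize many sample means and compare them with the standard normal distribution. The sketch below does this for u(0, 1) draws (µ = 1/2, σ² = 1/12); the values n = 30 and M = 1e5 are arbitrary choices, not from the slides.

```matlab
n = 30;                 % size of each sample
M = 1e5;                % number of independent samples
mu = 0.5;               % mean of u(0,1)
sigma = sqrt(1/12);     % standard deviation of u(0,1)

Xbar = mean(rand(n, M), 1);               % 1-by-M vector of sample means
Z = (Xbar - mu) / (sigma / sqrt(n));      % standardized sample means

frac = mean(abs(Z) <= 1.96);   % about 0.95 if Z is close to standard normal
fprintf('mean(Z) = %.4f, std(Z) = %.4f, P(|Z| <= 1.96) ~ %.4f\n', ...
    mean(Z), std(Z), frac);
hist(Z, 50);                   % histogram should look bell-shaped
```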

Monte Carlo Simulation: Idea

Monte Carlo methods are used in mathematics to solve various problems by generating suitable random numbers and observing the fraction of the numbers that obeys some property or properties.

The method is useful for obtaining numerical solutions to problems too complicated to solve analytically.

Monte Carlo methods vary, but tend to follow a particular pattern:
1. Define a domain of possible inputs.
2. Generate inputs randomly from a probability distribution over the domain.
3. Perform a deterministic computation on the inputs.
4. Aggregate the results.

Monte Carlo Simulation: Framework

Suppose the quantity to estimate is $\theta = E\,h(X)$, where $X = (X_1, X_2, \dots, X_n) \in \mathbb{R}^n$, $h(\cdot)$ is a function from $\mathbb{R}^n$ to $\mathbb{R}$, and $E|h(X)| < \infty$.

For example, $\hat{\theta}$ is an estimator for the mean of the population if $h(X) = \frac{1}{n} \sum_{k=1}^{n} X_k$. Then we can estimate θ by the following algorithm:
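A common form of this estimator is the plain sample average of h over independent draws. The sketch below is one such version, given under stated assumptions rather than as the slide's own algorithm: the function h, the distribution of X, and the sample size n are hypothetical choices for illustration (here θ = E[X²] with X ∼ u(0, 1), so θ = 1/3).

```matlab
h = @(x) x.^2;       % hypothetical h; theta = E[h(X)] = 1/3 for X ~ u(0,1)
n = 1e5;             % number of independent draws (arbitrary)

X  = rand(n, 1);     % step 1: generate X_1, ..., X_n
hX = h(X);           % step 2: evaluate h at each draw
theta_hat = mean(hX);            % step 3: average the results
se = std(hX) / sqrt(n);          % estimated standard error of theta_hat

fprintf('theta_hat = %.5f (standard error about %.5f)\n', theta_hat, se);
```

The standard error line is an extra: it gives a rough idea of how far theta_hat may be from θ for the chosen n.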

Why is $\hat{\theta}$ a good estimator?

There are two reasons:
1. $\hat{\theta}$ is unbiased.
2. $\hat{\theta}$ is consistent, since $\hat{\theta}_n \to \theta$ with probability 1 as $n \to \infty$. (SLLN)

Monte Carlo Simulation: History

In the 1930s, Enrico Fermi first experimented with the Monte Carlo method while studying neutron diffusion, but did not publish anything on it.

The name "Monte Carlo" was coined by Metropolis.[20]

Uses of Monte Carlo methods require large amounts of random numbers, and it was their use that spurred the development of pseudorandom number generators, which were far quicker to use than the tables of random numbers that had been previously used for statistical sampling.

[20] Nicholas Constantine Metropolis (1915–1999)

Monte Carlo Simulation vs. Bootstrapping

The tie between the bootstrap and Monte Carlo simulation of a statistic is obvious:

- Both are based on repetitive sampling and then direct examination of the results.

A big difference between the methods is that bootstrapping uses the original, initial sample as the population from which to resample, whereas Monte Carlo simulation is based on setting up a data generation process (with known values of the parameters).

While Monte Carlo simulation is used to test estimators, bootstrap methods are used to estimate the variability of a statistic and the shape of its sampling distribution.
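To make the contrast concrete, the sketch below bootstraps the standard error of a sample mean. It is a minimal illustration, not part of the original slides: the sample x is simulated with randn only as a stand-in for observed data, and n = 200 and B = 2000 are arbitrary choices.

```matlab
n = 200;                 % size of the observed sample
B = 2000;                % number of bootstrap resamples
x = randn(n, 1);         % stand-in for the observed data

boot_means = zeros(B, 1);
for b = 1:B
    idx = randi(n, n, 1);            % draw n indices with replacement
    boot_means(b) = mean(x(idx));    % statistic on the resampled data
end

se_boot    = std(boot_means);        % bootstrap estimate of the standard error
se_formula = std(x) / sqrt(n);       % textbook formula, for comparison

fprintf('bootstrap SE = %.4f, formula SE = %.4f\n', se_boot, se_formula);
```

Note that the resampling step only ever touches x itself; no distribution or parameter values are assumed, which is exactly the difference from a Monte Carlo study.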

Example: Monte Carlo Method for π

A simple Monte Carlo simulation to approximate the value of π could involve randomly selecting points $\{(X_i, Y_i)\}_{i=1}^{n}$ in the unit square and determining the ratio $\rho = \frac{m}{n} \approx \frac{\pi}{4}$, where m is the number of points that satisfy $x_i^2 + y_i^2 \le 1$.

clear all;
clc
% main
n=1e5;
x=rand(1,n);
y=rand(1,n);
z=x.^2+y.^2;

inside=(z<=1); % true for points inside the quarter circle
cir=sum(inside); % number of points inside

plot(x(inside),y(inside),'r*');
hold on; grid on; axis equal;
plot(x(~inside),y(~inside),'b*');
4*cir/n

ans =

    3.1320 % n=1e4

Figure : n=1e3

Figure : n=1e5

Observation and Exercise

Every time a Monte Carlo simulation is run with the same sample size n, it comes up with a slightly different value.

- The error of the estimate decreases at the rate $O(n^{-1/2})$.
- Monte Carlo methods tend to outperform deterministic, grid-based methods when the number of dimensions increases significantly.

We call one round of Monte Carlo simulation for π a test.

Calculate the sample mean $\bar{X}$ and the sample variance VarX of m tests with n random points in each test.

- Inputs: m, n
- Outputs: $\bar{X}$, VarX

function mc_pi_fn(n,m)

pi_=zeros(m,1); % one estimate of pi per test
for i=1:m
    x=rand(1,n);
    y=rand(1,n);
    z=x.^2+y.^2;
    cir=sum(z<=1); % number of points inside the quarter circle
    pi_(i)=4*cir/n;
end
sample_mean=mean(pi_)
sample_err=std(pi_)

>> mc_pi_fn(1e3,1e3)

sample_mean =

    3.1412

sample_err =

    0.0526
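The reported spread of about 0.0526 can be compared with theory. The derivation below is an added check, assuming only that each point falls inside the quarter circle independently with probability p = π/4, so that the count m is binomial.

```latex
% m ~ Binomial(n, p) with p = \pi/4, and the estimate is \hat{\pi} = 4m/n.
\begin{align*}
\mathrm{Var}(\hat{\pi}) &= \frac{16\,p(1-p)}{n}, \\
\sqrt{\mathrm{Var}(\hat{\pi})}\,\Big|_{n = 10^3}
  &= 4\sqrt{\frac{0.7854 \times 0.2146}{1000}} \approx 0.0519.
\end{align*}
```

This agrees well with the sample value above, and the factor $1/\sqrt{n}$ is the $O(n^{-1/2})$ convergence rate: quadrupling n only halves the error.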

Example: Approximation of Integration

Our aim is to approximate the integral F which is given by

$$F = \int_0^1 f(x)\,dx, \qquad (2)$$

where f(x) is any continuous real function on [0, 1].

We can apply the Monte Carlo approach and rewrite the integration problem in statistical terms as follows:

$$\int_0^1 f(x)\,dx = \int_{-\infty}^{\infty} f(x)\,I_{[0,1]}(x)\,dx = E(f(X)), \qquad (3)$$

where $I_{[0,1]}$ is an indicator function that equals 1 if x ∈ [0, 1] and 0 otherwise, and X follows a standard uniform distribution.

Solution

% Uniform Random Number
% Monte Carlo method as an approximated integration technique
% integrate f(x) on the [0,1] interval
% solution: 1/2, 1/3, and 0
clear all;
clc;

n=200;
x=rand(n,1);

gav=zeros(n,3);
gavvar=zeros(n,3);
gav(1,1)=x(1,1);
gav(1,2)=x(1,1)^2;
gav(1,3)=cos(pi*x(1,1));
for i=2:n
    gav(i,1)=sum(x(1:i))/i;
    gav(i,2)=sum(x(1:i).^2)/i;
    gav(i,3)=sum(cos(pi*x(1:i)))/i;
    gavvar(i,1)=var(x(1:i));
    gavvar(i,2)=var(x(1:i).^2);
    gavvar(i,3)=var(cos(pi*x(1:i)));
end

%%%%%%%%% Graphics (mean) %%%%%%%%%%
figure(1); title('Mean: E(f(x))');
subplot(3,1,1);
plot(gav(:,1));
line((1:n),ones(n,1)/2,'color','red');
title('f(x)=x');
%
subplot(3,1,2);
plot(gav(:,2));
line((1:n),ones(n,1)/3,'color','red');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gav(:,3));
line((1:n),ones(n,1)*0,'color','red');
title('f(x)=cos(\pi x)');

%%%%%%%%% Export a picture %%%%%%%%%%%%%
print(gcf,'-depsc2','test_mc');

%%%%%%%%% Graphics (variance) %%%%%%%%%%
figure(2); title('Variance: Var(f(x))')
subplot(3,1,1);
plot(gavvar(:,1));
line((1:n),ones(n,1)/12,'color','red');
title('f(x)=x');
%
subplot(3,1,2);
plot(gavvar(:,2));
line((1:n),ones(n,1)*4/45,'color','red');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gavvar(:,3));
line((1:n),ones(n,1)*1/2,'color','red');
title('f(x)=cos(\pi x)');
print(gcf,'-depsc2','test_mc2');

[Figure: running Monte Carlo estimates of the mean for f(x)=x, f(x)=x^2, and f(x)=cos(π x), for sample sizes 1 to 200.]

[Figure: Variance. Running estimates of the variance for f(x)=x, f(x)=x^2, and f(x)=cos(π x), for sample sizes 1 to 200.]

Example: Portfolio Evaluation

Consider two stocks A and B.

Let $\mu_A$, $\mu_B$ be the expected return rates, $\sigma_A$, $\sigma_B$ be the annualized volatilities, T be the holding period in years, and $S_A(t)$ and $S_B(t)$ be the prices of A and B at time t, respectively.

For simplicity, assume that $Z_A \sim n(0, 1)$, $Z_B \sim n(0, 1)$, and $Z_A \perp Z_B$.

Then the prices[21] at time T are given by

$$S_A(T) = S_A(0)\, e^{(\mu_A - \sigma_A^2/2)T + \sigma_A \sqrt{T}\, Z_A},$$

$$S_B(T) = S_B(0)\, e^{(\mu_B - \sigma_B^2/2)T + \sigma_B \sqrt{T}\, Z_B}.$$

[21] Black and Scholes (1973)

At t = 0, the investor buys $n_A$ units of A and $n_B$ units of B, so the initial wealth is $W(0) = n_A S_A(0) + n_B S_B(0)$.

Assume that the investor holds the portfolio during the period [0, T].

At time T, the terminal wealth W(T) is given by

$$W(T) = n_A S_A(T) + n_B S_B(T).$$

Now, we would like to estimate the probability that the value of the portfolio drops by more than 10%, that is,

$$\Pr\left(\frac{W(T)}{W(0)} \le 0.9\right).$$

Assume the following parameter values:
- T = 0.5
- $\mu_A$ = 0.15
- $\mu_B$ = 0.12
- $\sigma_A$ = 0.2
- $\sigma_B$ = 0.18
- $S_A(0)$ = 1
- $S_B(0)$ = 1
- $n_A$ = 0.3
- $n_B$ = 0.7

clear all;
clc
% main
% parameters
M=1e5;
T=0.5;
mua=.15;
mub=.12;
siga=.2;
sigb=.18;
na=0.3;
nb=0.7;
Sa0=1;
Sb0=1;
W0=na*Sa0+nb*Sb0;

BT=sqrt(T)*randn(2,M);
STa=Sa0*exp((mua-(siga^2)/2)*T+siga*BT(1,:));
STb=Sb0*exp((mub-(sigb^2)/2)*T+sigb*BT(2,:));
WT=na*STa + nb*STb;

cnt=0; % number of paths with W(T)/W(0) <= 0.9 (avoids shadowing the built-in sum)
for i=1:M
    if WT(i)/W0 <= 0.9
        cnt=cnt+1;
    end
end
theta_M=cnt/M

theta_M =

    0.0466

Exercise

Assume that $\mu_A = 0.10$, $\mu_B = 0.01$, $\sigma_A = 0.5$, and $\sigma_B = 0.1$.

Based on the program in the example, show how $\Pr\left(\frac{W(T)}{W(0)} \le 1.2\right)$ changes with the number of holding years T = 1, 2, ..., 10.

function portfolio(exp_return)

M=1e5;
mua=.10;
mub=.01;
siga=.5;
sigb=.1;
na=0.3;
nb=0.7;
Sa0=1;
Sb0=1;
W0=na*Sa0+nb*Sb0;
T=linspace(1,10,10);
WT_mean=zeros(length(T),1);
WT_pstd=zeros(length(T),1);
WT_nstd=zeros(length(T),1);
theta_M=zeros(length(T),1);

for j=1:length(T)
    cnt=0; % counter (avoids shadowing the built-in sum)
    BT=sqrt(T(j))*randn(2,M);
    STa=Sa0*exp((mua-(siga^2)/2)*T(j)+siga*BT(1,:));
    STb=Sb0*exp((mub-(sigb^2)/2)*T(j)+sigb*BT(2,:));
    WT=na*STa + nb*STb;

    for i=1:M
        if (WT(i)/W0<=(1+exp_return))
            cnt=cnt+1;
        end
    end

    WT_mean(j)=mean(WT);
    WT_pstd(j)=WT_mean(j)+std(WT);
    WT_nstd(j)=min(WT)-std(WT);
    theta_M(j)=cnt/M;
end

[Figures: two plots of the simulation results against Holding Years.]

Example: European Call Price

1 clear; clc;2

3 M = 1e5; % number of random path4 sig = 0.35; % annualized volatility5 S0 = 1; % spot price6 X = 1.2; % strike price7 r = 0.01; % annual risk-free interest rate8 T = 1; % time to maturity in years9 c = 0; % call price at time 0

11 W = sqrt(T)*randn(M,1);12 ST = S0 * exp((r - (sig ˆ 2) / 2) * T + sig * sqrt(T) * W);13

14 for i = 1 : M15 if (ST(i) - X > 0)16 c = c + (ST(i) - X);17 end

18 end19

20 c = c / M * exp(-r * T) % discounted expected payoff

For European put price, replace ST − X by X − ST in line 15 and 16.

Can you speed up the program by using vectorization?
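One possible answer to the vectorization question is sketched below: the loop and the if test collapse into a single max over the whole vector of terminal prices. The parameter values repeat those of the example; the names Z, c_loop, c_vec, and p_vec are introduced here for illustration, and the loop version is kept only to show that both give the same number on the same draws.

```matlab
M = 1e5; sig = 0.35; S0 = 1; X = 1.2; r = 0.01; T = 1;

Z  = randn(M, 1);                                          % standard normal draws
ST = S0 * exp((r - sig^2 / 2) * T + sig * sqrt(T) * Z);    % terminal prices

% Loop version, as in the example
c_loop = 0;
for i = 1:M
    if ST(i) - X > 0
        c_loop = c_loop + (ST(i) - X);
    end
end
c_loop = c_loop / M * exp(-r * T);

% Vectorized version: max(ST - X, 0) is the call payoff on every path
c_vec = exp(-r * T) * mean(max(ST - X, 0));

% The put price follows the same pattern with the payoff reversed
p_vec = exp(-r * T) * mean(max(X - ST, 0));

fprintf('call (loop) = %.6f, call (vectorized) = %.6f, put = %.6f\n', ...
    c_loop, c_vec, p_vec);
```

The vectorized form avoids the interpreted loop entirely, which is usually faster in MATLAB and also shorter to read.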
