Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate...

32
Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 1 / 13

Transcript of Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate...

Page 1: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Multivariate Statistical Analysis (Math347)

Instructor: Kani Chen

An Overview

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 1 / 13

Page 2: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Math347 vs. Math341:

Blue collar vs. white collar.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Page 3: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Page 4: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Page 5: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Messy vs. clean.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Page 6: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Math347 vs. Math341:

Blue collar vs. white collar.

Construction work vs. architecture.

Action vs. Drama.

Messy vs. clean.

Expect lots of sweat throughout this course.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 2 / 13

Page 7: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Structure of the course:

Part 1: Conventional streamline of elementary statistics:about means;linear regression.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 3 / 13

Page 8: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Structure of the course:

Part 1: Conventional streamline of elementary statistics:about means;linear regression.

Part 2: (Special) methodologies.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 3 / 13

Page 9: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part I. Conventional problems.One sample problem:

X1, ..., Xn iid ∼ MN(µ,Σ),

where Xi and µ are p-dimension, and Σ is p × p.Concerning with µ. The objective is to estimate µ with accuracyjustification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 4 / 13

Page 10: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part I. Conventional problems.One sample problem:

X1, ..., Xn iid ∼ MN(µ,Σ),

where Xi and µ are p-dimension, and Σ is p × p.Concerning with µ. The objective is to estimate µ with accuracyjustification.

Xi is the (IQ, EQ) of the i-th randomly selected student from ouruniversity. p = 2.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 4 / 13

Page 11: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Paired sample problem:

(still one sample problem):(

X1

Y1

)

...

(

Xn

Yn

)

iid ∼ MN(

(

µ1

µ2

)

,Σ),

where Xi , Yi , µ1 and µ2 are p-dimension, and Σ is (2p)× (2p)matrix.Concerning with the difference of µ1 and µ2, and objective:Estimating µ1 − µ2 with accuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 5 / 13

Page 12: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Paired sample problem:

(still one sample problem):(

X1

Y1

)

...

(

Xn

Yn

)

iid ∼ MN(

(

µ1

µ2

)

,Σ),

where Xi , Yi , µ1 and µ2 are p-dimension, and Σ is (2p)× (2p)matrix.Concerning with the difference of µ1 and µ2, and objective:Estimating µ1 − µ2 with accuracy justification.

(Xi , Yi) are the (IQ, EQ) of the i-th randomly selected student fromour university before and after a training program. (p = 2.) Doesthe training program make a difference?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 5 / 13

Page 13: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Repeated measurement.

(Exactly one sample, but with a special care.)

X1, ..., Xn iid ∼ MN(µ,Σ),

Concerning the differences of the components of µ.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 6 / 13

Page 14: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Repeated measurement.

(Exactly one sample, but with a special care.)

X1, ..., Xn iid ∼ MN(µ,Σ),

Concerning the differences of the components of µ.

Example: Xi is the returns of year 2006, 2007, 2008 of the i-thrandomly selected stock in HKEX. (p = 3). Is there a differencebetween the three years in terms of stock returns?The components are of the same nature and there are numericallycomparable.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 6 / 13

Page 15: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Two sample problem:

X1, ..., Xn iid ∼ MN(µ1,Σ1),

Y1, ..., Ym iid ∼ MN(µ2,Σ2),

and {Xi} are independent of {Yj}.Concerning the difference of µ1 and µ2, and the objective is toestimate µ1 − µ2 with accuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 7 / 13

Page 16: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Two sample problem:

X1, ..., Xn iid ∼ MN(µ1,Σ1),

Y1, ..., Ym iid ∼ MN(µ2,Σ2),

and {Xi} are independent of {Yj}.Concerning the difference of µ1 and µ2, and the objective is toestimate µ1 − µ2 with accuracy justification.

Example: Xi is the monthly (income, spending) of the i-thrandomly selected household in Hong Kong. Yi is that inShanghai. (p = 2). Any difference between HK and SH in termsincome and spending?Special interest: Σ1 = Σ2.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 7 / 13

Page 17: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Several/Multiple sample problem:

X11, ..., X1n1 iid ∼ MN(µ1,Σ),

· · ·

Xg1, ..., Xgng iid ∼ MN(µg,Σ),

Concerning the differences between µ1, µ2, ..., µg , and theobjective answer the question whether they any different withaccuracy justification.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 8 / 13

Page 18: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Several/Multiple sample problem:

X11, ..., X1n1 iid ∼ MN(µ1,Σ),

· · ·

Xg1, ..., Xgng iid ∼ MN(µg,Σ),

Concerning the differences between µ1, µ2, ..., µg , and theobjective answer the question whether they any different withaccuracy justification.

Example: monthly (income, spending) for randomly selectedhouseholds in HK, SH and BJ. (p = 2 and g = 3).Any difference between HK, SH and BJ in terms of income andspending?MANOVA

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 8 / 13

Page 19: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Regression

Ordinary/univariate linear regression:

Y = β′X + ǫ.

where Y is of 1 dimension (uni);ǫ: 1 dimension;β and X are r + 1 dimension.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 9 / 13

Page 20: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Regression

Ordinary/univariate linear regression:

Y = β′X + ǫ.

where Y is of 1 dimension (uni);ǫ: 1 dimension;β and X are r + 1 dimension.

Example: Yi is math exam score and Xi is (1, time spent) onstudying for the i-th randomly selected student.Here 1 is for the intercept component.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 9 / 13

Page 21: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Regression

Multivariate linear regression:

Y = β′X + ǫ.

where Y as well as ǫ are of m dimension (uni);β: (r + 1) × m matrix;m univariate linear models putting together.Connected by the dependence of the components of ǫ.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 10 / 13

Page 22: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Regression

Multivariate linear regression:

Y = β′X + ǫ.

where Y as well as ǫ are of m dimension (uni);β: (r + 1) × m matrix;m univariate linear models putting together.Connected by the dependence of the components of ǫ.

Example: Yi is (math, physics) exam scores and Xi same asabove.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 10 / 13

Page 23: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Page 24: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Page 25: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Example: We have the exam scores of history, English, Chinese,math, physics and chemistry for a number of randomly selectedstudents. What are the most important factors that contribute tothe exam performance?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Page 26: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

Part II:1. Principle component analysis and factor analysis.

Linearly combine the variables in study to produce the mostinfluential (hidden) variables.

Factor analysis, a refined method.

Example: We have the exam scores of history, English, Chinese,math, physics and chemistry for a number of randomly selectedstudents. What are the most important factors that contribute tothe exam performance?

Example: With the observations of the daily returns of PetroChina,Sinopec, CNOOC, Tencent, Kingsoft and Alibaba in the past year,can we find out what are the underlying driving forces that affectthe stock prices?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 11 / 13

Page 27: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

2. Canonical Analysis.

Find out and sort out the connection between two sets of variablesusing linear combinations.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 12 / 13

Page 28: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

2. Canonical Analysis.

Find out and sort out the connection between two sets of variablesusing linear combinations.

Example: With the observations of the daily returns of PetroChina,Sinopec, CNOOC, Tencent, Kingsoft and Alibaba in the past year,can we find out and explain the relation, if any, between the stockperformances the oil industry and the internet industry?

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 12 / 13

Page 29: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

3. Classification/Separation/Discrimination.

Given multivariate observations from two populations.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 13 / 13

Page 30: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

3. Classification/Separation/Discrimination.

Given multivariate observations from two populations.

Come up with a proper criterion to classify a new observation intoone of the two populations, in other words, to determine properlywhich population this observation comes from.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 13 / 13

Page 31: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

3. Classification/Separation/Discrimination.

Given multivariate observations from two populations.

Come up with a proper criterion to classify a new observation intoone of the two populations, in other words, to determine properlywhich population this observation comes from.

Hypothetical example: initial judgment of whether patient isinfected with, say, H1N1 is based on fever (degree of temperature)and number of white cells.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 13 / 13

Page 32: Multivariate Statistical Analysis (Math347)makchen/MATH4424/Chap0overview.pdf · Multivariate Statistical Analysis (Math347) Instructor: Kani Chen An Overview Instructor: Kani Chen

3. Classification/Separation/Discrimination.

Given multivariate observations from two populations.

Come up with a proper criterion to classify a new observation intoone of the two populations, in other words, to determine properlywhich population this observation comes from.

Hypothetical example: initial judgment of whether patient isinfected with, say, H1N1 is based on fever (degree of temperature)and number of white cells.

Minimize mistaken classifications.Minimize the cost of misclassifications.

Instructor: Kani Chen (HKUST) Multivariate Statistical Analysis (Math347) 13 / 13