Analysis of multivariate transformations. Transformation of the response in regression The...

Analysis of multivariate transformations

Transformation of the response in regression

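• The normalized power transformation is:

$$ z(\lambda) = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}}, & \lambda \neq 0 \\ \dot{y}\,\log y, & \lambda = 0 \end{cases} $$

where $\dot{y}$ is the geometric mean of the observations.

The purpose is to find an estimate of λ for which the errors in z(λ) are approximately normally distributed with constant variance.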
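A minimal sketch of the normalized power transformation in Python, assuming NumPy and strictly positive observations. The function name `normalized_power_transform` is illustrative and not part of the slides.

```python
import numpy as np

def normalized_power_transform(y, lam):
    """Normalized power transformation z(lambda) of positive data y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the power transformation needs strictly positive y")
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of the observations
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

y = np.array([1.0, 2.0, 4.0, 8.0])
z_one = normalized_power_transform(y, 1.0)   # lambda = 1: y shifted by one
z_log = normalized_power_transform(y, 0.0)   # lambda = 0: scaled logarithm
```

Dividing by the power of the geometric mean keeps z(λ) on a comparable scale for different λ, so residual sums of squares can be compared directly across transformations.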

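Score test for transformation

Taylor expansion of z(λ) about λ0:

$$ z(\lambda) \doteq z(\lambda_0) + (\lambda - \lambda_0) \left. \frac{\partial z(\lambda)}{\partial \lambda} \right|_{\lambda = \lambda_0} $$

$$ z(\lambda) \doteq z(\lambda_0) + (\lambda - \lambda_0)\, w(\lambda_0) $$

Regression model:

$$ z(\lambda) = x^T \beta + \varepsilon $$

$$ z(\lambda_0) = x^T \beta - (\lambda - \lambda_0)\, w(\lambda_0) + \varepsilon $$

The score test $T_{sc}(\lambda = \lambda_0)$ is the t-statistic on the constructed variable $w(\lambda_0)$.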
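The constructed-variable regression can be sketched in Python as follows, assuming NumPy. The constructed variable is obtained by differentiating the normalized transformation with respect to λ with the geometric mean held fixed. The function names and the simulated data are illustrative assumptions. The sign of the statistic depends on whether w(λ0) or its negative is used as the regressor.

```python
import numpy as np

def normalized_transform(y, lam):
    gm = np.exp(np.mean(np.log(y)))            # geometric mean
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

def constructed_variable(y, lam0):
    """w(lam0): derivative of z(lambda) with respect to lambda at lam0."""
    log_y = np.log(y)
    log_gm = np.mean(log_y)
    gm = np.exp(log_gm)
    if lam0 == 0:
        return gm * log_y * (log_y / 2.0 - log_gm)   # limit as lambda -> 0
    scale = lam0 * gm ** (lam0 - 1.0)
    return (y**lam0 * log_y - (y**lam0 - 1.0) * (1.0 / lam0 + log_gm)) / scale

def score_test(y, X, lam0):
    """t-statistic for w(lam0) when z(lam0) is regressed on [X, w(lam0)]."""
    z = normalized_transform(y, lam0)
    w = constructed_variable(y, lam0)
    A = np.column_stack([X, w])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    resid = z - A @ coef
    dof = A.shape[0] - A.shape[1]
    s2 = resid @ resid / dof                     # residual mean square
    cov = s2 * np.linalg.inv(A.T @ A)
    return coef[-1] / np.sqrt(cov[-1, -1])

# Simulated example (an assumption): log y is linear in x with normal errors.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(1.0 + 2.0 * x + 0.3 * rng.standard_normal(100))
t_no_transform = score_test(y, X, 1.0)   # tests H0: lambda = 1
t_log = score_test(y, X, 0.0)            # tests H0: lambda = 0
```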

Multivariate transformations

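• In this case yi is a v × 1 vector of responses at observation i, with yij the observation on response j. The normalized transformation of yij is given by:

$$ z_{ij}(\lambda_j) = \begin{cases} \dfrac{y_{ij}^{\lambda_j} - 1}{\lambda_j\,\dot{y}_j^{\lambda_j - 1}}, & \lambda_j \neq 0 \\ \dot{y}_j\,\log y_{ij}, & \lambda_j = 0 \end{cases} $$

where $\dot{y}_j$ is the geometric mean of the jth response.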

Multivariate transformations

• We assume a multivariate linear regression model of the form

Mult. transformations to normality

• If the transformed obs. are normally distributed with mean μi and cov. matrix Σ the max. loglikelihood is given by

Mult. transformations to normality

• If the explanatory variables are the same

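• The max. lik. estimator of Σ is given by

$$ \hat{\Sigma}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} e_i(\lambda)\, e_i(\lambda)^T $$

where ei(λ) is a v × 1 vector of residuals for observation i for some value of λ, and n is the number of observations.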

The profile loglikelihood (i.e. maximized over μ and Σ) is
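A sketch of the standard form of these quantities, assuming the normalized transformation (whose scaling by the geometric mean absorbs the Jacobian), n observations, and the symbols $\hat{\mu}_i$ and $\hat{\Sigma}(\lambda)$ for the maximum likelihood estimates. This is the usual multivariate normal result and may differ in notation from the original slide.

```latex
% loglikelihood for a given lambda, up to an additive constant
2 L(\lambda) = -n \log |\Sigma|
  - \sum_{i=1}^{n} \{ z_i(\lambda) - \mu_i \}^T \Sigma^{-1} \{ z_i(\lambda) - \mu_i \}

% profile loglikelihood after substituting \hat{\mu}_i and \hat{\Sigma}(\lambda)
2 L_{\max}(\lambda) = \mathrm{const} - n \log |\hat{\Sigma}(\lambda)|
```

At the maximum the quadratic form sums to the constant nv, which is why only the determinant of the estimated covariance matrix remains.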

Multivariate likelihood ratio test

• The multivariate generalization of TSC is given by:

This statistic must be compared with a χ² distribution with v degrees of freedom.
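A hedged sketch of how such a likelihood ratio test might be computed in Python with NumPy and SciPy. It assumes the usual form n log(|Σ̂(λ0)| / |Σ̂(λ̂)|) for the statistic, a model with no explanatory variables (residuals taken about the column means), and a Nelder-Mead search for the maximum likelihood estimate. All function names and the simulated data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def normalized_transform(Y, lam):
    """Column-wise normalized power transformation of an n x v array."""
    Y = np.asarray(Y, dtype=float)
    Z = np.empty_like(Y)
    for j, lam_j in enumerate(lam):
        log_y = np.log(Y[:, j])
        gm = np.exp(log_y.mean())              # geometric mean of response j
        if abs(lam_j) < 1e-8:
            Z[:, j] = gm * log_y
        else:
            Z[:, j] = (Y[:, j] ** lam_j - 1.0) / (lam_j * gm ** (lam_j - 1.0))
    return Z

def log_det_sigma_hat(Y, lam):
    """log |Sigma_hat(lambda)| with residuals taken about the column means."""
    Z = normalized_transform(Y, lam)
    E = Z - Z.mean(axis=0)                     # residuals e_i(lambda)
    sigma_hat = E.T @ E / Y.shape[0]           # ML estimate, divisor n
    _, logdet = np.linalg.slogdet(sigma_hat)
    return logdet

def likelihood_ratio_test(Y, lam0):
    """Return the statistic, its chi-squared p-value on v df, and lambda_hat."""
    n, v = Y.shape
    lam0 = np.asarray(lam0, dtype=float)
    fit = minimize(lambda lam: log_det_sigma_hat(Y, lam), lam0,
                   method="Nelder-Mead")
    t_lr = n * (log_det_sigma_hat(Y, lam0) - fit.fun)
    return t_lr, chi2.sf(t_lr, df=v), fit.x

# Simulated lognormal data (an assumption): the log transformation is correct.
rng = np.random.default_rng(1)
Y = np.exp(rng.normal(loc=2.0, scale=0.5, size=(200, 2)))
t_lr, p_value, lam_hat = likelihood_ratio_test(Y, lam0=[1.0, 1.0])
```

Starting the optimizer at λ0 keeps the statistic non-negative, because the search never returns a point worse than its starting value.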

Swiss heads: monitoring lik. ratio test for transf. H0: λ=1

The last two units (104 and 111) to enter provide all the evidence for a transformation

Boxplot of 6 var. with univariate outliers labelled

Swiss heads

• The marginal distribution of y4 had the two outliers (units 104 and 111).

• We want to test whether all the evidence for a transformation is due to y4.

• We recalculate the likelihood ratio but now testing whether λ4 is equal to 1.

Forward plot of the lik. ratio test H0: λ4=1

The last two units to enter provide all the evidence for a transformation

Mussels data

82 observations on horse mussels (Italian: cozze) from New Zealand. Five variables:

Purpose: to see whether multivariate normality can be obtained by joint transformation of all 5 variables

Mussels data: spm

Forward lik. ratio for H0: λ=1

Finding a multivariate transformation with the forward search

• With just one variable for transformation it is extremely easy to use the fan plot from the forward search to find satisfactory transformations and observations which are influential

• With v variables there are 5^v combinations of the 5 values of λ = (-1, -0.5, 0, 0.5, 1)
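To make the size of this grid concrete, the following Python sketch enumerates every combination of the five standard values. The names `STANDARD_VALUES` and `lambda_grid` are illustrative.

```python
from itertools import product

STANDARD_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)

def lambda_grid(v):
    """All 5**v vectors of standard transformation parameters for v responses."""
    return list(product(STANDARD_VALUES, repeat=v))

grid = lambda_grid(5)
n_combinations = len(grid)   # 5**5 vectors for five response variables
```

With five responses an exhaustive comparison would need one forward search per vector in `grid`, which motivates a stepwise procedure instead.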

Suggested procedure for finding multivariate transformations

• Run the FS through untransformed data, ordering the observations at each m by MD calculated from untransformed observations.

• Estimate λ at each step.

• Select a preliminary set of transformation parameters
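The ordering step of the procedure above can be sketched in Python with NumPy. This is a simplified illustration, not the authors' implementation: the starting subset (the m0 units closest to the coordinate-wise median) is a hypothetical choice made here for brevity, and the estimation of λ at each step is omitted.

```python
import numpy as np

def mahalanobis_sq(Y, subset):
    """Squared Mahalanobis distances of all rows of Y from the fit to `subset`."""
    center = Y[subset].mean(axis=0)
    cov = np.cov(Y[subset], rowvar=False)
    diff = Y - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)

def forward_search(Y, m0):
    """Return the list of subsets S(m) for m = m0, ..., n."""
    n = Y.shape[0]
    # Hypothetical starting rule: the m0 units closest to the coordinate-wise median.
    scaled = (Y - np.median(Y, axis=0)) / Y.std(axis=0)
    subset = np.argsort((scaled**2).sum(axis=1))[:m0]
    subsets = [np.sort(subset)]
    for m in range(m0, n):
        d2 = mahalanobis_sq(Y, subset)
        subset = np.argsort(d2)[: m + 1]       # the m + 1 units with smallest MD
        subsets.append(np.sort(subset))
    return subsets

# Simulated data (an assumption) with one planted outlier, unit 7.
rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 2))
Y[7] = [100.0, 100.0]
subsets = forward_search(Y, m0=10)
last_to_enter = np.setdiff1d(subsets[-1], subsets[-2])
```

Because the distances are recomputed from the current subset at every step, outlying units tend to enter at the end of the search, which is where forward plots of the estimates and tests are most informative.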

Monitoring of MLE of λ, H0: λ=1

[Figure: forward plot of the MLEs of λ1 to λ5 (la1 to la5) against subset size m]

H0: λ=(0.5, 0, 0.5, 0, 0)

Monitoring of MLE of λ, H0: λ=(0.5, 0, 0.5, 0, 0)

[Figure: forward plot of the MLE of λ (la1 to la5) against subset size m]

Forward lik. ratio for H0: λ=(0.5,0,0.5,0,0)

Validation of the transformation

• In univariate analysis the likelihood ratio test is

• Asymptotically the null distribution of TLR is chi-squared on one degree of freedom.

Signed square root of TLR

• Asymptotically this statistic has a N(0,1) distribution under the null hypothesis

• Including the sign of the difference between the estimate of λ and the hypothesised value λ0 gives an indication of the direction of any departure from the hypothesised value
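A sketch of the usual univariate forms, assuming $R(\lambda)$ denotes the residual sum of squares of $z(\lambda)$, $\hat{\lambda}$ the maximum likelihood estimate and $T_S$ the signed statistic. These symbols and the sign convention are assumptions and may differ from the original slides.

```latex
% likelihood ratio test of H_0: \lambda = \lambda_0
T_{LR} = 2 \{ L_{\max}(\hat{\lambda}) - L_{\max}(\lambda_0) \}
       = n \log \{ R(\lambda_0) / R(\hat{\lambda}) \}

% signed square root, asymptotically N(0,1) under H_0
T_{S}(\lambda_0) = \operatorname{sign}(\hat{\lambda} - \lambda_0) \sqrt{T_{LR}}
```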

Multivariate version of the signed sqrt lik. ratio

• We test just one component of λ when all others are kept at some specified value

• We calculate a set of tests by varying each component of λ about λ0

Example: mussels data validation of λ0=(0.5,0,0.5,0,0)

• Purpose: to validate in a multivariate way λ1=0.5 for the first variable

• To form the likelihood ratio test we need an estimator of λ = (λ1, …, λv) found by maximization only over λ1.

• The other parameters keep their values in λ0. (In this example 0, 0.5, 0, 0)

λ1 takes the 5 standard values of λ (-1, -0.5, 0, 0.5, 1)

Example: validation of λ1

• We perform 5 independent FS with
  λ0 = (-1, 0, 0.5, 0, 0)
  λ0 = (-0.5, 0, 0.5, 0, 0)
  λ0 = (0, 0, 0.5, 0, 0)
  λ0 = (0.5, 0, 0.5, 0, 0)
  λ0 = (1, 0, 0.5, 0, 0)

• We monitor for each search the signed square root likelihood ratio test
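The parameter vectors for these searches can be generated mechanically. The Python sketch below varies one component over the five standard values while the others keep their values in λ0; the function name `validation_vectors` is illustrative.

```python
STANDARD_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)

def validation_vectors(lam0, j):
    """Vectors in which component j takes each standard value, the rest stay at lam0."""
    vectors = []
    for value in STANDARD_VALUES:
        lam = list(lam0)
        lam[j] = value
        vectors.append(tuple(lam))
    return vectors

lam0 = (0.5, 0.0, 0.5, 0.0, 0.0)
searches_for_lambda1 = validation_vectors(lam0, j=0)   # zero-based index for lambda_1
all_searches = {j + 1: validation_vectors(lam0, j) for j in range(len(lam0))}
```

Repeating this for every component gives 5v searches in total (25 for the five mussels variables), far fewer than the 5^v of an exhaustive grid.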

Version for multivariate data of the signed sqrt LR test

λj is the parameter under test

λS is one of the 5 standard values of λ

λ0j is the vector of parameter values in which λj takes one of the 5 standard values λS while the other parameters keep their value in λ0

• One plot for each j, j = 1, …, v
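By analogy with the univariate case, one plausible form of the statistic is sketched below. It assumes that $\hat{\lambda}_{0j}$ denotes the vector in which component j is replaced by its maximum likelihood estimate $\hat{\lambda}_j$ while the other components stay at their values in λ0; this notation and the sign convention are assumptions and may differ from the original slide.

```latex
T_{S}(\lambda_{0j}) = \operatorname{sign}(\hat{\lambda}_j - \lambda_S)
  \sqrt{ n \log \left\{ |\hat{\Sigma}(\lambda_{0j})| \, / \, |\hat{\Sigma}(\hat{\lambda}_{0j})| \right\} }
```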

Mussels data: validation of λ0=(0.5,0,0.5,0,0)

Forward lik. ratio for H0: λ=(1/3,1/3,1/3,0,0)

Mussels data: spm (transf. obs.)

Monitoring MD before transforming

Monitoring MD after transforming

Minimum MD before and after transforming

The transformation has separated the outliers from the bulk of the data.

Gap before and after transforming

Conclusions

• This was an example of our approach to finding a mult. transformation in the presence of potential influential obs. and outliers.

• Procedure: start the search with untransformed data to suggest a transformation and repeat the analysis until you find an acceptable transformation.

• In this example only 3 searches were necessary to find a transformation which is stable throughout the search, with any changes occurring only at the end.

Exercises

Exercise 1

• The next slide gives two sets of bivariate data. Which of the two has to be transformed to achieve bivariate normality?

• Consider a forward search in which you monitor the likelihood ratio test for the hypothesis of no transformation. Describe the plot you would expect to get for each of the two sets of data.

Two sets of simulated bivariate data