Seminar 6: PCA and Factor Analysis, January 19, 2011
Marko Pahor
PRINCIPAL COMPONENT ANALYSIS
The main idea of this method is to form, from a set of existing variables, a new variable
(or new variables, but as few as possible) that contain as much variability of the original
data as possible. This is a method of data reduction; we reduce the number of variables
in order to handle data more easily.
In most cases we wish to get only one dimension (variable) that contains most of the
variability of the original data. This variable then represents some sort of index of a
certain property that is measured by the original variables. For example:
- We are measuring the development of a region. We measure the differences with
several variables (e.g. GDP/pc, infant mortality, ...). With the help of principal
component analysis we can construct an index of development.
- A controller in a factory has several indicators of quality. With principal
component analysis we can construct a quality index.
PRINCIPAL COMPONENT ANALYSIS WITH SPSS PROCEDURE FACTOR ANALYSIS
SPSS can perform principal component analysis, but the procedure for doing so is
hidden within the procedure for factor analysis. The procedure can perform the analysis with
standardized or original (non-standardized) data. With this procedure we can:
- compute descriptive statistics for all variables
- make the correlation matrix
- compute communalities
- compute the share of variance of original data, explained by each and all components
- plot the scree-plot
COMPUTATION OF THE PARAMETERS OF PRINCIPAL COMPONENTS ANALYSIS
1. Enter or load the data
2. Select Analyze | Dimension Reduction | Factor; we get the menu Factor Analysis
(Figure 1)
Figure 1: Dialog window Factor Analysis
3. In the left box we select the variables that we want to enter into the principal
components analysis and transfer them into the right box.
4. Click Extraction...; we get the menu Factor Analysis: Extraction (Figure 2). The
option for performing principal components analysis is Principal Components in the
field Method. Other options in this field are for factor analysis.
5. We click OK, the window Factor Analysis closes and the results of the analysis
appear in the Viewer window.
Figure 2: Dialog window Factor Analysis: Extraction
In the box Analyze we can set whether the analysis will be performed on original
(non-standardized) data (Covariance matrix) or on standardized data (Correlation matrix).
When choosing the analysis on original data, the importance of a variable is determined
by the relative size of its variance: higher variance means higher importance of that
variable. If we don't want the variability of a variable to determine its importance, we
standardize the data and so use the correlation matrix.
The decision which one to use depends on the nature of the problem. If we think the
variables are more or less equally important, we opt for standardization; if the
variability of the variables matters, we use the covariance matrix in the analysis.
When variables are measured in very different units (e.g. infant mortality in % against
GDP/pc in $), standardization is usually the only sensible choice.
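The effect of the measurement units can be illustrated with a small sketch. The data below are hypothetical (they are not from the seminar database), and the helper functions are written out only for illustration. With original data almost all of the total variance belongs to the variable with the largest units, while after standardization each variable contributes a variance of exactly 1.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    # Sample covariance (denominator n - 1)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data for five regions: GDP per capita in $ and infant mortality in %
gdp_pc = [12000.0, 18000.0, 25000.0, 31000.0, 40000.0]
infant_mortality = [1.9, 1.4, 0.9, 0.6, 0.4]

# Covariance matrix: the total variance is the sum of the diagonal elements
var_gdp = covariance(gdp_pc, gdp_pc)
var_mort = covariance(infant_mortality, infant_mortality)
share_gdp = var_gdp / (var_gdp + var_mort)

# Correlation matrix: each standardized variable has variance 1
r = correlation(gdp_pc, infant_mortality)

print("share of total variance due to GDP/pc (covariance matrix):", share_gdp)
print("share of total variance due to GDP/pc (correlation matrix):", 0.5)
print("correlation between the two variables:", r)
```

With the covariance matrix the first component would essentially reproduce GDP/pc alone, which is why standardization is the sensible choice here.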
Field Display offers the possibility of printing the unrotated solution (the only one in
principal component analysis). The solution can contain only some components; the
number of components is set by the rules in the field Extract.
Field Display also controls the display of the scree plot. The scree plot is useful in determining the
number of components needed.
In field Extract we set how many components we want to be displayed. We can set the
number of components directly or set the cut-off eigenvalue. The default cut-off is 1 in the
case of standardized data or the average eigenvalue in the case of original data.
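The cut-off rule can be sketched in a few lines. The function name n_components is hypothetical. The first list holds the eigenvalues printed in the SPSS output of the example in this seminar (correlation matrix); the second list is made up to show the covariance-matrix case, where the average eigenvalue serves as the cut-off.

```python
def n_components(eigenvalues, cutoff=None):
    """Count the components whose eigenvalue exceeds the cut-off.

    If no cut-off is given, the average eigenvalue is used.
    """
    if cutoff is None:
        cutoff = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for ev in eigenvalues if ev > cutoff)

# Eigenvalues from the example output (standardized data, cut-off 1)
correlation_eigenvalues = [2.796, 0.898, 0.180, 0.125]
kept_standardized = n_components(correlation_eigenvalues, cutoff=1.0)

# Hypothetical eigenvalues of a covariance matrix (cut-off = average eigenvalue)
covariance_eigenvalues = [5.2, 2.4, 0.3, 0.1]
kept_original = n_components(covariance_eigenvalues)

print(kept_standardized, kept_original)
```

For standardized data the eigenvalues sum to the number of variables, so the average eigenvalue is 1 and both rules coincide.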
DESCRIPTIVE STATISTICS AND CORRELATION MATRICES
Click Descriptives, which opens the dialog window Factor Analysis: Descriptives (Figure
3). In this dialog we set:
- in field Statistics, the display of descriptive statistics and the initial solution (all
components)
- in field Correlation Matrix, the display of the correlation matrix, significances, ...
KMO, the Kaiser-Meyer-Olkin measure of sampling adequacy, shows the strength
of connection between variables; it lies between 0 and 1, and values closer to 1 are
more desirable. Bartlett's test of sphericity tests the hypothesis that the correlation
matrix is an identity matrix (variables are not correlated). If this hypothesis cannot
be rejected, principal component analysis cannot sensibly be performed.
Figure 3: Dialog window Factor Analysis: Descriptives
EXAMPLE
FACTOR
  /VARIABLES total_liters value_sum transactions share_olive_oil
  /MISSING LISTWISE
  /ANALYSIS total_liters value_sum transactions share_olive_oil
  /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PC
  /ROTATION NOROTATE
  /METHOD=CORRELATION.
Factor Analysis

Descriptive Statistics
                   Mean      Std. Deviation   Analysis N
total_liters       1.5709    1.49828          504
value_sum          10.1272   9.69014          504
transactions       1.90      1.597            504
share_olive_oil    8.6048    11.73409         504

Correlation Matrix
                   total_liters   value_sum   transactions   share_olive_oil
total_liters       1.000          .824        .842           .249
value_sum          .824           1.000       .867           .299
transactions       .842           .867        1.000          .210
share_olive_oil    .249           .299        .210           1.000

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .767
Bartlett's Test of Sphericity     Approx. Chi-Square     1436.940
                                  df                     6
                                  Sig.                   .000
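
The Bartlett statistic can be checked by hand from the correlation matrix. The sketch below assumes the usual textbook formula, chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom; the function names are hypothetical and the determinant is computed by plain Gaussian elimination. Because the printed correlations are rounded to three decimals, the result should land close to, but not exactly on, the printed 1436.940.

```python
from math import log

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom of Bartlett's test."""
    p = len(corr)
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi_square, df

# Correlation matrix from the SPSS output above (rounded to three decimals)
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

chi_square, df = bartlett_sphericity(R, n_obs=504)
print(chi_square, df)
```

A determinant close to 0 (strongly correlated variables) gives a large chi-square, while an identity matrix has determinant 1 and a chi-square of 0.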

Communalities
                   Initial   Extraction
total_liters       1.000     .863
value_sum          1.000     .894
transactions       1.000     .881
share_olive_oil    1.000     .157
Extraction Method: Principal Component Analysis.

Total Variance Explained
            Initial Eigenvalues                       Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %      Total   % of Variance   Cumulative %
1           2.796   69.898          69.898            2.796   69.898          69.898
2           .898    22.461          92.359
3           .180    4.511           96.870
4           .125    3.130           100.000
Extraction Method: Principal Component Analysis.
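
The percentage columns follow directly from the eigenvalues. As a minimal sketch: with standardized data the total variance equals the number of variables (here 4), so each percentage is the eigenvalue divided by 4. The eigenvalues below are the rounded values printed in the table, so the results agree with the table only up to rounding.

```python
# Eigenvalues as printed in the table "Total Variance Explained"
eigenvalues = [2.796, 0.898, 0.180, 0.125]

# Correlation matrix: total variance equals the number of variables
total_variance = float(len(eigenvalues))

percent = [100 * ev / total_variance for ev in eigenvalues]

cumulative = []
running = 0.0
for pct in percent:
    running += pct
    cumulative.append(running)

for component, (pct, cum) in enumerate(zip(percent, cumulative), start=1):
    print(component, round(pct, 2), round(cum, 2))
```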

Component Matrix (a)
                   Component
                   1
total_liters       .929
value_sum          .946
transactions       .939
share_olive_oil    .396
Extraction Method: Principal Component Analysis.
a. 1 components extracted.
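
The first component can be recomputed from the correlation matrix. The sketch below uses power iteration, which is a simple stand-in and not necessarily the algorithm SPSS uses; the function name first_component is hypothetical. The loadings are the elements of the first eigenvector multiplied by the square root of the first eigenvalue, and the communalities of a one-component solution are the squared loadings.

```python
from math import sqrt

# Correlation matrix from the SPSS output (rounded to three decimals)
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

def first_component(corr, iterations=200):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    n = len(corr)
    v = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in w))  # length of R*v for unit v
        v = [x / eigenvalue for x in w]
    return eigenvalue, v

eigenvalue, vector = first_component(R)

# Loadings: correlations between the variables and the component
loadings = [sqrt(eigenvalue) * x for x in vector]

# Communalities of the one-component solution: squared loadings
communalities = [l * l for l in loadings]

print("eigenvalue:", round(eigenvalue, 3))      # printed in SPSS: 2.796
print("loadings:", [round(l, 3) for l in loadings])
print("communalities:", [round(h, 3) for h in communalities])
```

The communalities sum to the eigenvalue, which is why the first row of "Extraction Sums of Squared Loadings" repeats the first initial eigenvalue.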
FACTOR ANALYSIS
With principal component analysis we tried to explain as much variance of the original
data as possible by forming new, synthetic variables. In factor analysis we try to find
dimensions (traits) that cannot be measured directly, but that affect certain variables
that can be measured.
An example is measuring intelligence. We cannot measure intelligence directly, but we can
measure certain capabilities of an individual (mathematical, logical, ...) that are affected by
intelligence.
FACTOR ANALYSIS WITH SPSS: DIFFERENCES FROM PRINCIPAL COMPONENTS ANALYSIS
Although the logic of the two differs, both principal components and factor analysis are
supported in the same SPSS procedure. In factor analysis the following methods of
extraction are used:
1. Principal factors
- This method differs from principal components only in logic and explanation.
The initial solution is always based on this method.
- The method creates factors that are mutually uncorrelated linear
combinations of the initial variables.
2. Principal axes
- The method creates factors from a modified correlation matrix, whose diagonal
values are less than 1. This is an iterative method; in the first step the diagonal values
are the communalities of the initial (principal factors) solution. In the following steps,
communalities from the previous step are used until the solution converges.
3. Alpha factoring
- The method treats the variables in the analysis as a sample from the universe of
potential variables and maximizes the alpha reliability of the factors.
4. Image factoring
- This is actually the first step of the principal axes method; a modified correlation matrix
with multiple determination coefficients on the diagonal is used.
5. Ordinary least squares
- Minimizes the differences between the actual and the estimated correlation matrix,
not taking account of the diagonal values.
6. Generalized least squares
- Minimizes the differences between the actual and the estimated correlation matrix,
not taking account of the diagonal values; variables are weighted by the inverse
of their uniqueness.
The most commonly used method is principal axes. Principal factors is less
appropriate because it doesn't take into account the existence of specific factors that
influence the variables; their existence is shown by communalities less than 1. It is only
used when other methods don't converge.
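The iteration behind the principal axes method can be sketched for the simplest case of a single factor. This is an illustration under stated assumptions, not the SPSS implementation: the function names are hypothetical, the leading eigenvector is found by power iteration, the starting communalities are taken as the largest absolute correlation of each variable (an assumed, commonly used starting value), and the correlation matrix is borrowed from the principal components example above.

```python
from math import sqrt

# Correlation matrix borrowed from the principal components example
R = [
    [1.000, 0.824, 0.842, 0.249],
    [0.824, 1.000, 0.867, 0.299],
    [0.842, 0.867, 1.000, 0.210],
    [0.249, 0.299, 0.210, 1.000],
]

def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    n = len(matrix)
    v = [1.0 / sqrt(n)] * n
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        value = sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value, v

def principal_axis_one_factor(corr, max_steps=100, tol=1e-6):
    n = len(corr)
    # Assumed starting communalities: largest absolute off-diagonal correlation
    h2 = [max(abs(corr[i][j]) for j in range(n) if j != i) for i in range(n)]
    for step in range(1, max_steps + 1):
        # Modified correlation matrix: communalities replace the 1s on the diagonal
        reduced = [[h2[i] if i == j else corr[i][j] for j in range(n)]
                   for i in range(n)]
        value, vector = leading_eigenpair(reduced)
        loadings = [sqrt(value) * x for x in vector]
        new_h2 = [l * l for l in loadings]
        change = max(abs(a - b) for a, b in zip(new_h2, h2))
        h2 = new_h2
        if change < tol:  # communalities no longer change: converged
            break
    return loadings, h2, step

loadings, communalities, steps = principal_axis_one_factor(R)
print("steps:", steps)
print("loadings:", [round(l, 3) for l in loadings])
print("communalities:", [round(h, 3) for h in communalities])
```

Unlike in principal components, the communalities stay below 1, leaving room for the specific part of each variable.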
Rotation is used to improve the solution, that is, to get a clearer picture. We distinguish
orthogonal and oblique (non-orthogonal) rotations.
Rotations in SPSS:
1. Varimax
- orthogonal rotation that minimizes the number of variables that have high
loadings on each factor; it simplifies the interpretation of factors
2. Quartimax
- orthogonal rotation that minimizes the number of factors needed to explain each
variable; it simplifies the interpretation of the observed variables
3. Equamax
- orthogonal rotation, a combination of varimax and quartimax
4. Oblimin
- oblique rotation; non-orthogonal rotations are used when orthogonal rotations
don't give an interpretable solution. Delta determines the obliqueness, 0 meaning
the most oblique rotation
5. Promax
- oblique rotation
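What an orthogonal rotation does to the loadings can be shown with a small sketch. The loadings below are hypothetical, and the rotation angle of 40 degrees is picked by hand; this is a plain rotation of two factors, not the varimax algorithm, which would choose the angle by optimizing its criterion. The point is that the communalities do not change, while each variable ends up loading mainly on one factor.

```python
from math import cos, sin, radians

# Hypothetical unrotated loadings of four variables on two factors
loadings = [
    [0.70, 0.50],
    [0.65, 0.55],
    [0.60, -0.45],
    [0.55, -0.50],
]

def rotate(loadings, degrees):
    """Multiply the loading matrix by an orthogonal 2x2 rotation matrix."""
    t = radians(degrees)
    return [[a * cos(t) + b * sin(t), -a * sin(t) + b * cos(t)]
            for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings of each variable over the factors."""
    return [sum(l * l for l in row) for row in loadings]

rotated = rotate(loadings, 40)  # angle chosen by hand for illustration

for before, after in zip(loadings, rotated):
    print(before, "->", [round(x, 3) for x in after])
print(communalities(loadings))
print(communalities(rotated))
```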
Difference between pattern and structure loadings
- structure loadings are correlation coefficients between a variable and a factor
- pattern loadings are regression coefficients of a variable on the factors
- with uncorrelated (orthogonal) factors the two coincide, and summing the products of
two variables' loadings over the factors gives the reproduced correlation between these two
variables
- structure loadings are the ones commonly interpreted
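The relation between the two kinds of loadings can be written as: structure matrix = pattern matrix times factor correlation matrix. The sketch below uses hypothetical pattern loadings and a hypothetical factor correlation of 0.3; the helper functions are written out so that the example needs no libraries.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Hypothetical pattern loadings: three variables, two factors
pattern = [
    [0.80, 0.10],
    [0.70, 0.00],
    [0.05, 0.75],
]

# Hypothetical factor correlation matrix of an oblique solution
phi = [
    [1.0, 0.3],
    [0.3, 1.0],
]

# Structure loadings: correlations between variables and factors
structure = matmul(pattern, phi)

# Reproduced correlations between variables (off-diagonal elements)
reproduced = matmul(structure, transpose(pattern))

# With uncorrelated factors (identity matrix) structure equals pattern
identity = [[1.0, 0.0], [0.0, 1.0]]
structure_orthogonal = matmul(pattern, identity)

print(structure)
print(reproduced[0][1])
```

The larger the factor correlations, the more the structure loadings differ from the pattern loadings.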
EXAMPLE
Factor Analysis
This example uses the personality questions in the database.
We do the factor analysis following the same steps as with principal component analysis.
FACTOR
  /VARIABLES Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
    Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
  /MISSING LISTWISE
  /ANALYSIS Q17.1 Q17.2 Q17.3 Q17.4 Q17.5 Q17.6 Q17.7 Q17.8 Q17.9 Q17.10
    Q17.11 Q17.12 Q17.13 Q17.14 Q17.15 Q17.16 Q17.17 Q17.18 Q17.19 Q17.20
  /PRINT UNIVARIATE INITIAL CORRELATION KMO EXTRACTION ROTATION
  /PLOT EIGEN
  /CRITERIA MINEIGEN(1) ITERATE(25)
  /EXTRACTION PAF
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.
Correlation matrix
ADEQUACY OF DATA
From the correlation matrix we can see that most correlations are not high, but some
are, and many more are statistically significant.
Bartlett's test is significant, so the hypothesis of uncorrelated variables is rejected, and the
KMO measure of 0.738 shows that the data are appropriate for this type of analysis.
STANDARDIZED OR ORIGINAL DATA?
As all questions are measured on the same scale, one could use the covariance matrix
(non-standardized data) for the analysis. However, use of standardized data is still correct.
Because of a simpler output and because it is much more common in practice, the correlation
matrix is used in the example.
NUMBER OF FACTORS
Based on the scree plot one would use four factors, although the Kaiser rule (eigenvalues
greater than 1) suggests five factors.
INTERPRETATION OF FACTORS
Factors are interpreted based on structure loadings. We can interpret the non-rotated solution or
use one of the rotations.
In the example, we used varimax rotation. We have four factors that can be interpreted as
follows:
- optimism and self-esteem
- sociability
- desperation and indecisiveness
- artistic inclination
When an orthogonal rotation doesn't give a sensible interpretation, we use an oblique rotation.
In our case there aren't many differences between the orthogonal and the oblique rotation.
The factor correlation matrix shows the obliqueness: the higher the correlations, the more oblique
the rotation.