Categorical Principal Components Analysis (CATPCA)
SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software. No material describing such software may be produced or distributed without the written permission of the owners of the trademark and license rights in the software and the copyrights in the published materials. The SOFTWARE and documentation are provided with RESTRICTED RIGHTS. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subdivision (c)(1)(ii) of The Rights in Technical Data and Computer Software clause at 52.227-7013. Contractor/manufacturer is SPSS Inc., 233 South Wacker Drive, 11th Floor, Chicago, IL 60606-6307. General notice: Other product names mentioned herein are used for identification purposes only and may be trademarks of their respective companies. TableLook is a trademark of SPSS Inc. Windows is a registered trademark of Microsoft Corporation. ImageStream Graphics & Presentation Filters, copyright 1991-1999 by INSO Corporation. All Rights Reserved. ImageStream Graphics Filters is a registered trademark and ImageStream is a trademark of INSO Corporation. DataDirect, INTERSOLV, and SequeLink are registered trademarks of MERANT Solutions Inc. DataDirect Connect is a trademark of MERANT Solutions Inc. Copyright 1999 by SPSS Inc. All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
This procedure simultaneously quantifies categorical variables while reducing the dimensionality of the data. Categorical principal components analysis is also known by the acronym CATPCA, for CATegorical Principal Components Analysis.

The goal of principal components analysis is to reduce an original set of variables to a smaller set of uncorrelated components that represent most of the information found in the original variables. The technique is most useful when a large number of variables prohibits effective interpretation of the relationships between objects (subjects and units). By reducing the dimensionality, you interpret a few components rather than a large number of variables.

Standard principal components analysis assumes linear relationships between numerical variables. The optimal-scaling approach, on the other hand, allows variables to be scaled at different levels. Categorical variables are optimally quantified in the specified dimensionality. As a result, nonlinear relationships between variables can be modeled.

Example. Categorical principal components analysis could be used to graphically display the relationship between job category, job division, region, amount of travel (high, medium, and low), and job satisfaction. You might find that two dimensions account for a large amount of variance. The first dimension might separate job category from region, whereas the second dimension might separate job division from amount of travel. You also might find that high job satisfaction is related to a medium amount of travel.

Statistics and plots. Frequencies; missing values; optimal scaling level; mode; variance accounted for by centroid coordinates, vector coordinates, total per variable and per dimension; component loadings for vector-quantified variables; category quantifications and coordinates; iteration history; correlations of the transformed variables and eigenvalues of the correlation matrix; correlations of the original variables and eigenvalues of the correlation matrix; object scores; category plots; joint category plots; transformation plots; residual plots; projected centroid plots; object plots; biplots; triplots; and component loadings plots.

Data. String variable values are always converted into positive integers by ascending alphanumeric order. User-defined missing values, system-missing values, and values less than 1 are considered missing; you can recode variables with values less than 1, or add a constant to them, to make them nonmissing.

Assumptions. The data must contain at least three valid cases. The analysis is based on positive integer data. The discretization option will automatically categorize a fractional-valued variable by grouping its values into categories with a close-to-normal distribution and will automatically convert values of string variables into positive integers. You can specify other discretization schemes.

Related procedures. Scaling all variables at the numerical level corresponds to standard principal components analysis. Alternate plotting features are available by using the transformed variables in a standard linear principal components analysis. If all variables have multiple nominal scaling levels, categorical principal components analysis is identical to homogeneity analysis. If sets of variables are of interest, categorical (nonlinear) canonical correlation analysis should be used.
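The core idea of reducing many correlated variables to a few uncorrelated components can be sketched in a few lines of Python. This is an illustration of standard, linear principal components analysis on the correlation matrix (the special case that CATPCA corresponds to when every variable is scaled at the numerical level), not of the optimal-scaling algorithm itself:

```python
import numpy as np

def pca_components(data, n_components=2):
    """Reduce numeric variables to a few uncorrelated components via an
    eigendecomposition of the correlation matrix (standard linear PCA)."""
    # Standardize each variable so the analysis uses correlations.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]
    scores = z @ loadings          # object scores on the components
    return eigvals[order], scores

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
# Five variables that all reflect one underlying dimension plus noise,
# so one component captures most of the variance.
data = np.hstack([x + 0.1 * rng.normal(size=(100, 1)) for _ in range(5)])
eigvals, scores = pca_components(data)
```

The eigenvalues play the same role as the "eigenvalues of the correlation matrix" reported by the procedure: they show how much of the total variance each uncorrelated component accounts for.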
To Obtain a Categorical Principal Components Analysis

From the menus, choose:
Analyze > Data Reduction > Optimal Scaling

Figure 1 Optimal Scaling dialog box
Select Some variable(s) not multiple nominal.
Select One set.
Click Define.

Figure 2 Categorical Principal Components dialog box

Select at least two variables in the Analysis Variables list and specify the number of Dimensions in solution.
Click OK.
You may optionally specify supplementary variables, which are fitted into the solution found, or labeling variables for the plots.
Define Scale and Weight in CATPCA

You can set the optimal scaling level for analysis variables and supplementary variables. By default, they are scaled as second-degree monotonic splines (ordinal) with two interior knots. Additionally, you can set the weight for analysis variables.

Variable weight. You can choose to define a weight for each variable. The value specified must be a positive integer. The default value is 1.

Optimal scaling level. You can also select the scaling level to be used to quantify each variable.

- Spline ordinal. The order of the categories of the observed variable is preserved in the optimally scaled variable. Category points will be on a straight line (vector) through the origin. The resulting transformation is a smooth monotonic piecewise polynomial of the chosen Degree. The pieces are specified by the user-specified number and the procedure-determined placement of the Interior knots.
- Spline nominal. The only information in the observed variable that is preserved in the optimally scaled variable is the grouping of objects in categories. The order of the categories of the observed variable is not preserved. Category points will be on a straight line (vector) through the origin. The resulting transformation is a smooth, possibly non-monotonic, piecewise polynomial of the chosen Degree. The pieces are specified by the user-specified number and the procedure-determined placement of the Interior knots.
- Multiple nominal. The only information in the observed variable that is preserved in the optimally scaled variable is the grouping of objects in categories. The order of the categories of the observed variable is not preserved. Category points will be in the centroid of the objects in the particular categories. Multiple indicates that different sets of quantifications are obtained for each dimension.
- Ordinal. The order of the categories of the observed variable is preserved in the optimally scaled variable. Category points will be on a straight line (vector) through the origin. The resulting transformation fits better than the spline ordinal transformation but is less smooth.
- Nominal. The only information in the observed variable that is preserved in the optimally scaled variable is the grouping of objects in categories. The order of the categories of the observed variable is not preserved. Category points will be on a straight line (vector) through the origin. The resulting transformation fits better than the spline nominal transformation but is less smooth.
- Numerical. Categories are treated as ordered and equally spaced (interval level). The order of the categories and the equal distances between category numbers of the observed variable are preserved in the optimally scaled variable. Category points will be on a straight line (vector) through the origin. When all variables are at the numerical level, the analysis is analogous to standard principal components analysis.
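What distinguishes the ordinal levels from the nominal levels is that ordinal quantifications must preserve category order. A minimal Python sketch (a standard pool-adjacent-violators step, not the actual CATPCA algorithm) shows one common way such monotonicity is enforced, by pooling adjacent categories whose raw quantifications violate the order:

```python
def ordinal_quantify(raw_quants):
    """Make a sequence of category quantifications order-preserving by
    pooling adjacent categories that violate monotonicity
    (pool-adjacent-violators; a simplified sketch of ordinal scaling)."""
    # Each block holds [sum, count] so pooled categories share one value.
    blocks = []
    for q in raw_quants:
        blocks.append([q, 1])
        # Merge backwards while the running block means decrease.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    result = []
    for s, c in blocks:
        result.extend([s / c] * c)
    return result

# Category 3's raw value breaks the order, so it is pooled with category 2.
out = ordinal_quantify([-1.0, 0.75, 0.25, 1.5])   # [-1.0, 0.5, 0.5, 1.5]
```

Under a nominal level the third category would simply keep its own quantification; under an ordinal level it is tied with its neighbor so the category order survives in the optimally scaled variable.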
To Define the Scale and Weight in CATPCA

Select a variable in the Analysis Variables list in the Principal Components with Optimal Scaling dialog box.
Click Define Scale and Weight.

Figure 3 Categorical Principal Components: Define Scale and Weight dialog box

Enter the weight value for the variable.
Select the optimal scaling level to be used in the analysis. If you choose a spline transformation, you must also specify the degree of the polynomial and the number of interior knots.
Click Continue.
Alternatively, you can define the scaling level for supplementary variables by selecting them from the list and clicking Define Scale.
Categorical Principal Components Discretization

The Discretization dialog box allows you to select a method of recoding your variables. Unless you specify otherwise, fractional-valued variables are grouped into seven categories (or into the number of distinct values of the variable, if this number is less than seven) with an approximately normal distribution. String variables are always converted into positive integers by assigning category indicators according to ascending alphanumeric order.
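The idea of grouping a fractional-valued variable into categories whose frequencies follow an approximately normal (bell-shaped) distribution can be sketched as follows. This is a simplified illustration, not the procedure's actual grouping algorithm: equal-width bins on the standardized scale give the middle categories more cases when the data are roughly symmetric.

```python
import numpy as np

def discretize_normal(values, n_cats=7):
    """Recode a fractional-valued variable into positive-integer categories
    whose frequencies roughly follow a bell shape (a sketch of grouping
    with an approximately normal distribution; the actual procedure may
    place its boundaries differently)."""
    z = (values - values.mean()) / values.std()
    # Equal-width cut points between -2 and +2 standard deviations.
    edges = np.linspace(-2, 2, n_cats - 1)
    return np.digitize(z, edges) + 1      # categories 1..n_cats

rng = np.random.default_rng(1)
cats = discretize_normal(rng.normal(size=1000))
```

The resulting codes are positive integers, as the analysis requires, with the most cases falling in the central category.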