Iris - eio.usc.eseio.usc.es/eipc1/BASE/BASEMASTER/FORMULARIOS-PHP... · Resumen de las funciones...

Discriminant factor analysis with SPSS

Iris.sav

• We run a discriminant factor analysis to investigate whether the species of a flower can be reasonably explained by the length and width of its petals and sepals.

• The response is Species = Setosa (1), Versicolor (2), Virginica (3).

• The classifying variables are sepal.l, sepal.w, petal.l and petal.w.

We will run the discriminant analysis with the following settings:

1. Enter all the variables together in the model.

2. Descriptive statistics:

a. Means.

b. Fisher's function coefficients.

c. Within-groups covariances and correlations.

d. Separate-groups covariances.

3. Classification options:

a. Prior probabilities computed from the group sizes (Bayes classification rule).

b. Use the separate-groups covariance matrices.

4. All available plots.

Menu path: Analyze… Classify… Discriminant…

The following table shows the means, standard deviations and sample sizes of each variable for each group, as well as for the whole sample.

Group statistics

                                               Valid N (listwise)
SPECIES      Variable   Mean    Std. dev.   Unweighted   Weighted
Setosa       SEPAL.L    5.006   .3525       50           50.000
             SEPAL.W    3.428   .3791       50           50.000
             PETAL.L    1.462   .1737       50           50.000
             PETAL.W    .246    .1054       50           50.000
Versicolor   SEPAL.L    5.936   .5162       50           50.000
             SEPAL.W    2.770   .3138       50           50.000
             PETAL.L    4.260   .4699       50           50.000
             PETAL.W    1.326   .1978       50           50.000
Virginica    SEPAL.L    6.588   .6359       50           50.000
             SEPAL.W    2.974   .3225       50           50.000
             PETAL.L    5.552   .5519       50           50.000
             PETAL.W    2.026   .2747       50           50.000
Total        SEPAL.L    5.843   .8281       150          150.000
             SEPAL.W    3.057   .4359       150          150.000
             PETAL.L    3.758   1.7653      150          150.000
             PETAL.W    1.199   .7622       150          150.000

Tests of equality of group means

Variable   Wilks' lambda   F          df1   df2   Sig.
SEPAL.L    .381            119.265    2     147   .000
SEPAL.W    .599            49.160     2     147   .000
PETAL.L    .059            1180.161   2     147   .000
PETAL.W    .071            960.007    2     147   .000
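For a single variable, Wilks' lambda and the one-way ANOVA F statistic carry the same information: lambda is the within-groups share of the total sum of squares, so lambda = 1 / (1 + F * df1 / df2). The Python sketch below recovers the lambda column of the table from its F column as a consistency check. The function name `wilks_from_f` is ours for illustration and is not part of SPSS.

```python
# F statistics copied from the "tests of equality of group means" table.
F_STATS = {
    "SEPAL.L": 119.265,
    "SEPAL.W": 49.160,
    "PETAL.L": 1180.161,
    "PETAL.W": 960.007,
}
DF1, DF2 = 2, 147  # (groups - 1) and (cases - groups)


def wilks_from_f(f_stat, df1, df2):
    """Univariate Wilks' lambda implied by a one-way ANOVA F statistic.

    F = (SSB / df1) / (SSW / df2), so SSB / SSW = F * df1 / df2 and
    lambda = SSW / (SSW + SSB) = 1 / (1 + F * df1 / df2).
    """
    return 1.0 / (1.0 + f_stat * df1 / df2)


lambdas = {name: wilks_from_f(f, DF1, DF2) for name, f in F_STATS.items()}
for name, lam in lambdas.items():
    print(f"{name}: Wilks' lambda = {lam:.3f}")
```

The smaller the lambda, the better the variable separates the groups on its own, which is why the two petal measurements stand out.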

The following results give the pooled estimate of the within-groups variance-covariance matrix (W) and the corresponding correlation matrix.

Pooled within-groups matrices (a)

                        SEPAL.L   SEPAL.W   PETAL.L   PETAL.W
Covariance    SEPAL.L   .265      .093      .168      .038
              SEPAL.W   .093      .115      .055      .033
              PETAL.L   .168      .055      .185      .043
              PETAL.W   .038      .033      .043      .042
Correlation   SEPAL.L   1.000     .530      .756      .365
              SEPAL.W   .530      1.000     .378      .471
              PETAL.L   .756      .378      1.000     .484
              PETAL.W   .365      .471      .484      1.000

a. The covariance matrix has 147 degrees of freedom.

The estimates of the variance-covariance matrix of each group separately are shown in the following table.

Covariance matrices (a)

SPECIES      Variable   SEPAL.L   SEPAL.W   PETAL.L   PETAL.W
Setosa       SEPAL.L    .124      .099      .016      .010
             SEPAL.W    .099      .144      .012      .009
             PETAL.L    .016      .012      .030      .006
             PETAL.W    .010      .009      .006      .011
Versicolor   SEPAL.L    .266      .085      .183      .056
             SEPAL.W    .085      .098      .083      .041
             PETAL.L    .183      .083      .221      .073
             PETAL.W    .056      .041      .073      .039
Virginica    SEPAL.L    .404      .094      .303      .049
             SEPAL.W    .094      .104      .071      .048
             PETAL.L    .303      .071      .305      .049
             PETAL.W    .049      .048      .049      .075
Total        SEPAL.L    .686      -.042     1.274     .516
             SEPAL.W    -.042     .190      -.330     -.122
             PETAL.L    1.274     -.330     3.116     1.296
             PETAL.W    .516      -.122     1.296     .581

a. The total covariance matrix has 149 degrees of freedom.
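The pooled within-groups covariance matrix is the average of the separate group covariance matrices weighted by their degrees of freedom, W = sum((n_g - 1) * S_g) / (N - k). A minimal Python sketch of that calculation, using the three group matrices printed above and assuming 50 cases per species; the helper name `pooled_covariance` is ours. Because the inputs are rounded to three decimals, the result matches the pooled table only to about 0.001.

```python
# Separate-groups covariance matrices, variable order:
# SEPAL.L, SEPAL.W, PETAL.L, PETAL.W (values from the SPSS table).
GROUP_COV = {
    "Setosa": [
        [0.124, 0.099, 0.016, 0.010],
        [0.099, 0.144, 0.012, 0.009],
        [0.016, 0.012, 0.030, 0.006],
        [0.010, 0.009, 0.006, 0.011],
    ],
    "Versicolor": [
        [0.266, 0.085, 0.183, 0.056],
        [0.085, 0.098, 0.083, 0.041],
        [0.183, 0.083, 0.221, 0.073],
        [0.056, 0.041, 0.073, 0.039],
    ],
    "Virginica": [
        [0.404, 0.094, 0.303, 0.049],
        [0.094, 0.104, 0.071, 0.048],
        [0.303, 0.071, 0.305, 0.049],
        [0.049, 0.048, 0.049, 0.075],
    ],
}
GROUP_SIZE = {"Setosa": 50, "Versicolor": 50, "Virginica": 50}


def pooled_covariance(group_cov, group_size):
    """Degrees-of-freedom weighted average of the group covariance matrices."""
    dof_total = sum(n - 1 for n in group_size.values())  # N - k = 147
    p = len(next(iter(group_cov.values())))
    pooled = [[0.0] * p for _ in range(p)]
    for name, cov in group_cov.items():
        weight = (group_size[name] - 1) / dof_total
        for i in range(p):
            for j in range(p):
                pooled[i][j] += weight * cov[i][j]
    return pooled


pooled = pooled_covariance(GROUP_COV, GROUP_SIZE)
for row in pooled:
    print("  ".join(f"{value:.3f}" for value in row))
```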

Box's test of equality of covariance matrices

Log determinants

SPECIES                Rank   Log determinant
Setosa                 4      -13.067
Versicolor             4      -10.874
Virginica              4      -8.927
Pooled within-groups   4      -9.959

The ranks and natural logarithms of the determinants printed are those of the group covariance matrices.

Test results

Box's M          146.663
F    Approx.     7.045
     df1         20
     df2         77566.75
     Sig.        .000

Tests the null hypothesis of equal population covariance matrices.

In this case, since the p-value is approximately 0.000, we reject the null hypothesis: the covariance structure differs between groups, that is, not all groups share the same covariance matrix.
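Box's M can be checked from the log determinants printed above, using the usual definition M = (N - k) * ln|S_pooled| - sum((n_g - 1) * ln|S_g|). The sketch below is a rough check only: it assumes 50 cases per species, the function name `box_m` is ours, and the log determinants are rounded to three decimals, so the result (about 146.56) agrees with the reported 146.663 only approximately.

```python
# Natural logs of the determinants of the group covariance matrices
# and of the pooled within-groups matrix (from the SPSS table).
LOG_DET_GROUPS = {"Setosa": -13.067, "Versicolor": -10.874, "Virginica": -8.927}
LOG_DET_POOLED = -9.959
GROUP_SIZE = {"Setosa": 50, "Versicolor": 50, "Virginica": 50}


def box_m(log_det_groups, log_det_pooled, group_size):
    """Box's M statistic computed from log determinants."""
    dof = {name: n - 1 for name, n in group_size.items()}
    dof_pooled = sum(dof.values())  # N - k
    weighted_logs = sum(dof[name] * log_det_groups[name] for name in dof)
    return dof_pooled * log_det_pooled - weighted_logs


m_stat = box_m(LOG_DET_GROUPS, LOG_DET_POOLED, GROUP_SIZE)
print(f"Box's M (from rounded log determinants) = {m_stat:.2f}")
```

M is zero when all group matrices coincide with the pooled one and grows as they diverge; here the Setosa determinant is much smaller than the others, which drives the large value.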

Summary of canonical discriminant functions

Eigenvalues

Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          32.192 (a)   99.1            99.1           .985
2          .285 (a)     .9              100.0          .471

a. The first 2 canonical discriminant functions were used in the analysis.

Wilks' lambda

Test of function(s)   Wilks' lambda   Chi-square   df   Sig.
1 through 2           .023            546.115      8    .000
2                     .778            36.530       3    .000

• The Wilks' lambda test shows that both the two discriminant factors taken jointly and the second factor on its own have p-values of approximately zero.

• This implies that there are significant differences between the three species, both with respect to the two discriminant factors simultaneously and with respect to the second discriminant factor alone. In other words, both factors contribute statistically significant discrimination.
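The canonical correlations, the Wilks' lambda values and the chi-square statistics in the two tables above all follow from the eigenvalues. The sketch below uses the standard relations: canonical correlation = sqrt(e / (1 + e)), lambda = product of 1 / (1 + e_i) over the functions being tested, and Bartlett's approximation chi-square = -(N - 1 - (p + g) / 2) * ln(lambda), with N = 150 cases, p = 4 variables and g = 3 groups. The function names are ours, and because the eigenvalues are rounded the chi-square values are reproduced only to within about 0.05.

```python
import math

EIGENVALUES = [32.192, 0.285]          # from the eigenvalues table
N_CASES, N_VARS, N_GROUPS = 150, 4, 3  # sample size, variables, species


def canonical_correlation(eigenvalue):
    """Correlation between the discriminant scores and group membership."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalues):
    """Wilks' lambda for the set of functions with the given eigenvalues."""
    result = 1.0
    for eigenvalue in eigenvalues:
        result /= 1.0 + eigenvalue
    return result


def bartlett_chi_square(lam, n_cases, n_vars, n_groups):
    """Bartlett's chi-square approximation for a Wilks' lambda."""
    return -(n_cases - 1 - (n_vars + n_groups) / 2.0) * math.log(lam)


correlations = [canonical_correlation(e) for e in EIGENVALUES]
lambda_1_to_2 = wilks_lambda(EIGENVALUES)    # functions 1 through 2
lambda_2 = wilks_lambda(EIGENVALUES[1:])     # function 2 alone
chi_1_to_2 = bartlett_chi_square(lambda_1_to_2, N_CASES, N_VARS, N_GROUPS)
chi_2 = bartlett_chi_square(lambda_2, N_CASES, N_VARS, N_GROUPS)

print([round(r, 3) for r in correlations])
print(round(lambda_1_to_2, 3), round(chi_1_to_2, 2))
print(round(lambda_2, 3), round(chi_2, 2))
```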

The values in the following table are the standardized coefficients of the two discriminant factors.

Standardized canonical discriminant function coefficients

           Function
           1       2
SEPAL.L    -.427   .012
SEPAL.W    -.521   .735
PETAL.L    .947    -.401
PETAL.W    .575    .581

The values in the following table are the correlation coefficients between each variable and each of the two discriminant functions. For each variable (row), an asterisk marks the larger of its two correlations in absolute value, and the variables are ordered so that those associated with the same factor are grouped together.

Structure matrix

           Function
           1       2
PETAL.L    .706*   .168
SEPAL.W    -.119   .864*
PETAL.W    .633    .737*
SEPAL.L    .223    .311*

Pooled within-groups correlations between the discriminating variables and the standardized canonical discriminant functions. Variables ordered by the size of their correlation with the function.

*. Largest absolute correlation between each variable and any discriminant function.

1. In this case the petal length is the variable most clearly associated with the first factor, and positively so (.706).

2. The petal width is also positively associated with this first factor, but to a lesser extent (.633).

3. The second discriminant factor, in turn, shows a strong positive correlation with the sepal width (.864) and the petal width (.737).

4. The correlation of this second factor with the two lengths is weaker (.311 for the sepal and .168 for the petal).

The next table gives, for each group, the coordinates of its centroid, that is, the group means of the two discriminant factors. The centroids are also plotted in the territorial map.

Functions at group centroids

             Function
SPECIES      1        2
Setosa       -7.608   .215
Versicolor   1.825    -.728
Virginica    5.783    .513

Unstandardized canonical discriminant functions evaluated at group means.
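The centroids tie back to the eigenvalues reported earlier. Since the discriminant scores are scaled to have pooled within-groups variance 1 and (with equal group sizes) a grand mean of 0, each eigenvalue is the between-groups sum of squares of the centroids divided by the within-groups degrees of freedom: e = sum(n_g * c_g^2) / (N - k). A small sketch, assuming 50 cases per species; the function name is ours.

```python
# Group centroids on (function 1, function 2), from the SPSS table.
CENTROIDS = {
    "Setosa": (-7.608, 0.215),
    "Versicolor": (1.825, -0.728),
    "Virginica": (5.783, 0.513),
}
GROUP_SIZE = {"Setosa": 50, "Versicolor": 50, "Virginica": 50}


def eigenvalue_from_centroids(centroids, group_size, function_index):
    """Between/within sum-of-squares ratio of one discriminant function.

    Assumes the scores have grand mean 0 and pooled within-groups
    variance 1, so the within sum of squares equals N - k.
    """
    ss_between = sum(
        group_size[name] * centroid[function_index] ** 2
        for name, centroid in centroids.items()
    )
    ss_within = sum(n - 1 for n in group_size.values())
    return ss_between / ss_within


eig_1 = eigenvalue_from_centroids(CENTROIDS, GROUP_SIZE, 0)
eig_2 = eigenvalue_from_centroids(CENTROIDS, GROUP_SIZE, 1)
print(f"function 1: {eig_1:.3f}   function 2: {eig_2:.3f}")
```

The values come out close to the 32.192 and .285 of the eigenvalues table, with small differences due to the centroids being rounded to three decimals.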

Group covariances of the canonical discriminant functions

SPECIES      Function   1       2
Setosa       1          .718    -.534
             2          -.534   .835
Versicolor   1          1.074   .243
             2          .243    .763
Virginica    1          1.208   .292
             2          .292    1.402

The pooled within-groups covariance matrix of the canonical discriminant functions is by definition an identity matrix.

Box's test of equality of the covariance matrices of the canonical discriminant functions.

Test results

Box's M          46.862
F    Approx.     7.657
     df1         6
     df2         538562.8
     Sig.        .000

Tests the null hypothesis of equal population covariance matrices of the canonical discriminant functions.

Classification statistics

Prior probabilities for groups

                     Cases used in analysis
SPECIES      Prior   Unweighted   Weighted
Setosa       .333    50           50.000
Versicolor   .333    50           50.000
Virginica    .333    50           50.000
Total        1.000   150          150.000

Classification function coefficients

             SPECIES
             Setosa    Versicolor   Virginica
SEPAL.L      23.544    15.698       12.446
SEPAL.W      23.588    7.073        3.685
PETAL.L      -16.431   5.211        12.767
PETAL.W      -17.398   6.434        21.079
(Constant)   -86.308   -72.853      -104.368

Fisher's linear discriminant functions

To classify a case, we would simply evaluate the three linear combinations and assign the case to the species whose function takes the largest value.
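The rule just described can be sketched in a few lines of Python using the coefficients of the table above. The function names are ours, and the example classifies the three group means from the group statistics table rather than real cases. Note that this is the linear rule only; the classification in this run was set up with separate covariance matrices, so borderline cases may be assigned differently there.

```python
# Fisher classification function coefficients, variable order:
# SEPAL.L, SEPAL.W, PETAL.L, PETAL.W (from the SPSS table).
COEFFICIENTS = {
    "Setosa": (23.544, 23.588, -16.431, -17.398),
    "Versicolor": (15.698, 7.073, 5.211, 6.434),
    "Virginica": (12.446, 3.685, 12.767, 21.079),
}
CONSTANTS = {"Setosa": -86.308, "Versicolor": -72.853, "Virginica": -104.368}

# Group means from the group statistics table, same variable order.
GROUP_MEANS = {
    "Setosa": (5.006, 3.428, 1.462, 0.246),
    "Versicolor": (5.936, 2.770, 4.260, 1.326),
    "Virginica": (6.588, 2.974, 5.552, 2.026),
}


def classification_scores(measurements):
    """Value of each species' linear classification function."""
    return {
        species: CONSTANTS[species]
        + sum(c * x for c, x in zip(coefficients, measurements))
        for species, coefficients in COEFFICIENTS.items()
    }


def classify(measurements):
    """Species whose classification function is largest."""
    scores = classification_scores(measurements)
    return max(scores, key=scores.get)


for species, means in GROUP_MEANS.items():
    print(f"mean flower of {species} -> classified as {classify(means)}")
```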

Territorial map

[Figure: territorial map. Horizontal axis: canonical discriminant function 1, from -12.0 to 12.0. Vertical axis: canonical discriminant function 2, from -12.0 to 12.0. The boundaries between regions are drawn with the digits of the two adjacent groups.]

Symbols used in the territorial map

Symbol   Group   Label
1        1       Setosa
2        2       Versicolor
3        3       Virginica
*                Indicates a group centroid

This map shows the three regions into which a future observation would be classified, one for each of the three groups (the three species, in this case). As can be seen, the boundaries between regions are not straight lines, because the rule in use is not Fisher's discriminant rule: we allowed separate variance-covariance matrices for each group (as selected in the classification options).

Separate-groups plots

[Figures: three scatter plots, titled SPECIES = Setosa, SPECIES = Versicolor and SPECIES = Virginica. Each one plots Function 2 against Function 1 for the cases of a single species and marks the group centroid.]

The three plots above are scatter plots of the two discriminant factors for each species.


[Figure: combined scatter plot of Function 2 against Function 1 for all cases, with points labelled by SPECIES (Setosa, Versicolor, Virginica) and the group centroids marked.]

The plot above shows that the three species are (visually) very well separated, since the data from different species hardly overlap.

Classification results (a)

                                Predicted group membership
                   SPECIES      Setosa   Versicolor   Virginica   Total
Original   Count   Setosa       50       0            0           50
                   Versicolor   0        47           3           50
                   Virginica    0        1            49          50
           %       Setosa       100.0    .0           .0          100.0
                   Versicolor   .0       94.0         6.0         100.0
                   Virginica    .0       2.0          98.0        100.0

a. 97.3% of original grouped cases correctly classified.

The misclassification rate is therefore about 2.7% (4 of the 150 cases). The largest error is that 6% of the Versicolor cases are classified as Virginica. Setosa is classified correctly in 100% of the cases and Virginica in 98%.
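The percentages quoted here follow directly from the counts in the classification table: 146 of the 150 cases lie on the diagonal, which gives 97.3% correct and 2.7% misclassified. A short Python sketch of that arithmetic, with function names of our own choosing:

```python
SPECIES = ["Setosa", "Versicolor", "Virginica"]
# Rows: original species; columns: predicted species (counts from the table).
COUNTS = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 1, 49],
]


def accuracy(counts):
    """Share of cases on the diagonal of the classification table."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


def per_species_hit_rate(counts, labels):
    """Share of each original species that is classified correctly."""
    return {label: counts[i][i] / sum(counts[i]) for i, label in enumerate(labels)}


overall = accuracy(COUNTS)
hit_rates = per_species_hit_rate(COUNTS, SPECIES)
print(f"correct: {100 * overall:.1f}%  misclassified: {100 * (1 - overall):.1f}%")
print(hit_rates)
```

Keep in mind that these are resubstitution figures: the same 150 cases are used to fit the rule and to evaluate it, so they tend to be optimistic.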

We can also save the predicted group (grupo_pr) and the discriminant scores (score_1 and score_2).

[Figures: two box plots by species (Setosa, Versicolor, Virginica; N = 50 in each group), one of score_1 and one of score_2.]

Discriminant factor analysis with R

The function lda from the MASS package is used.

> library(MASS)
> iris.discrim <- lda(SPECIES ~ SEPAL.L + SEPAL.W + PETAL.L + PETAL.W, data = iris)
> iris.discrim
Call:
lda(SPECIES ~ SEPAL.L + SEPAL.W + PETAL.L + PETAL.W, data = iris)

Prior probabilities of groups:
    Setosa Versicolor  Virginica
 0.3333333  0.3333333  0.3333333

Group means:
           SEPAL.L SEPAL.W PETAL.L PETAL.W
Setosa       5.006   3.428   1.462   0.246
Versicolor   5.936   2.770   4.260   1.326
Virginica    6.588   2.974   5.552   2.026

Coefficients of linear discriminants:
               LD1         LD2
SEPAL.L  0.8293776 -0.02410215
SEPAL.W  1.5344731 -2.16452123
PETAL.L -2.2012117  0.93192121
PETAL.W -2.8104603 -2.83918785

Proportion of trace:
   LD1    LD2
0.9912 0.0088

predict.lda classifies multivariate observations in conjunction with lda, and also projects data onto the linear discriminants.

> group <- predict(iris.discrim, method = "plug-in")$class
> table(group, iris$SPECIES)

group        Setosa Versicolor Virginica
  Setosa         50          0         0
  Versicolor      0         48         1
  Virginica       0          2        49

> plot(iris.discrim)

[Figure: output of plot(iris.discrim). Scatter plot of LD2 (vertical axis, about -6 to 6) against LD1 (horizontal axis, about -5 to 10), with each case drawn as its species label: Setosa, Versicolor or Virginica.]
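The raw coefficients printed by lda and the standardized coefficients of the SPSS table are two views of the same functions: a standardized coefficient is the raw coefficient multiplied by the pooled within-groups standard deviation of its variable. The Python sketch below checks this with the values printed in this document. It assumes that the "- 0.02410215" in the R printout is a negative number, and the helper name `standardize` is ours. The sign of a discriminant function is arbitrary, and here both R functions come out with the opposite sign to the SPSS ones, so the comparison is made up to that sign flip.

```python
import math

# Raw coefficients (LD1, LD2) as printed by lda in the R session.
RAW = {
    "SEPAL.L": (0.8293776, -0.02410215),
    "SEPAL.W": (1.5344731, -2.16452123),
    "PETAL.L": (-2.2012117, 0.93192121),
    "PETAL.W": (-2.8104603, -2.83918785),
}
# Diagonal of the pooled within-groups covariance matrix (SPSS table).
POOLED_VARIANCE = {
    "SEPAL.L": 0.265,
    "SEPAL.W": 0.115,
    "PETAL.L": 0.185,
    "PETAL.W": 0.042,
}
# Standardized coefficients (function 1, function 2) from the SPSS table.
SPSS_STANDARDIZED = {
    "SEPAL.L": (-0.427, 0.012),
    "SEPAL.W": (-0.521, 0.735),
    "PETAL.L": (0.947, -0.401),
    "PETAL.W": (0.575, 0.581),
}


def standardize(raw, pooled_variance):
    """Multiply each raw coefficient by its variable's pooled std. deviation."""
    return {
        name: tuple(c * math.sqrt(pooled_variance[name]) for c in coefficients)
        for name, coefficients in raw.items()
    }


standardized = standardize(RAW, POOLED_VARIANCE)
for name, (ld1, ld2) in standardized.items():
    spss_1, spss_2 = SPSS_STANDARDIZED[name]
    print(f"{name}: R {ld1:+.3f} {ld2:+.3f}   SPSS {spss_1:+.3f} {spss_2:+.3f}")
```

The agreement is to about 0.001, which is what the three-decimal rounding of the pooled variances allows.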

DISCRIMINANT ANALYSIS IN R

lda                package:MASS                R Documentation

Linear Discriminant Analysis

Description:

Linear discriminant analysis.

Arguments:

formula: A formula of the form 'groups ~ x1 + x2 + ...' That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data: Data frame from which variables specified in 'formula' are preferentially to be taken.

x: (required if no formula is given as the principal argument.) a matrix or data frame or Matrix containing the explanatory variables.

grouping: (required if no formula principal argument is given.) a factor specifying the class for each observation.

prior: the prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.

tol: A tolerance to decide if a matrix is singular; it will reject variables and linear combinations of unit-variance variables whose variance is less than 'tol^2'.

subset: An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action: A function to specify the action to be taken if 'NA's are found. The default action is for the procedure to fail. An alternative is 'na.omit', which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)

method: '"moment"' for standard estimators of the mean and variance, '"mle"' for MLEs, '"mve"' to use 'cov.mve', or '"t"' for robust estimates based on a t distribution.

CV: If true, returns results (classes and posterior probabilities) for leave-one-out cross-validation. Note that if the prior is estimated, the proportions in the whole dataset are used.

nu: degrees of freedom for 'method = "t"'.

...: arguments passed to or from other methods.

Details:

The function tries hard to detect if the within-class covariance matrix is singular. If any variable has within-group variance less than 'tol^2' it will stop and report the variable as constant. This could result from poor scaling of the problem, but is more likely to result from constant variables.

Specifying the 'prior' will affect the classification unless over-ridden in 'predict.lda'. Unlike in most statistical packages, it will also affect the rotation of the linear discriminants within their space, as a weighted between-groups covariance matrix is used. Thus the first few linear discriminants emphasize the differences between groups with the weights given by the prior, which may differ from their prevalence in the dataset.

If one or more groups is missing in the supplied data, they are dropped with a warning, but the classifications produced are with respect to the original set of levels.

Value:

If 'CV = TRUE' the return value is a list with components 'class', the MAP classification (a factor), and 'posterior', posterior probabilities for the classes.

Otherwise it is an object of class '"lda"' containing the following components:

prior: the prior probabilities used.

means: the group means.

scaling: a matrix which transforms observations to discriminant functions, normalized so that within groups covariance matrix is spherical.

svd: the singular values, which give the ratio of the between- and within-group standard deviations on the linear discriminant variables. Their squares are the canonical F-statistics.

N: The number of observations used.

call: The (matched) function call.

Note:

This function may be called giving either a formula and optional data frame, or a matrix and grouping factor as the first two arguments. All other arguments are optional, but 'subset=' and 'na.action=', if required, must be fully named.

If a formula is given as the principal argument the object may be modified using 'update()' in the usual way.

References:

Venables, W. N. and Ripley, B. D. (2002) _Modern Applied Statistics with S._ Fourth edition. Springer.

Ripley, B. D. (1996) _Pattern Recognition and Neural Networks_. Cambridge University Press.

See Also:

'predict.lda', 'qda', 'predict.qda'

predict.lda(MASS) R Documentation

Classify Multivariate Observations by Linear Discrimination

Description

Classify multivariate observations in conjunction with lda, and also project data onto the linear discriminants.

Usage

## S3 method for class 'lda':
predict(object, newdata, prior = object$prior, dimen,
        method = c("plug-in", "predictive", "debiased"), ...)

Arguments

object object of class "lda"

newdata data frame of cases to be classified or, if object has a formula, a data frame with columns of the same names as the variables used. A vector will be interpreted as a row vector. If newdata is missing, an attempt will be made to retrieve the data used to fit the lda object.

prior The prior probabilities of the classes, by default the proportions in the training set or what was set in the call to lda.

dimen the dimension of the space to be used. If this is less than min(p, ng-1), only the first dimen discriminant components are used (except for method="predictive"), and only those dimensions are returned in x.

method This determines how the parameter estimation is handled. With "plug-in" (the default) the usual unbiased parameter estimates are used and assumed to be correct. With "debiased" an unbiased estimator of the log posterior probabilities is used, and with "predictive" the parameter estimates are integrated out using a vague prior.

... arguments passed from or to other methods

Details

This function is a method for the generic function predict() for class "lda". It can be invoked by calling predict(x) for an object x of the appropriate class, or directly by calling predict.lda(x) regardless of the class of the object.

Missing values in newdata are handled by returning NA if the linear discriminants cannot be evaluated. If newdata is omitted and the na.action of the fit omitted cases, these will be omitted on the prediction.

This version centres the linear discriminants so that the weighted mean (weighted by prior) of the group centroids is at the origin.

Value

a list with components

class The MAP classification (a factor)

posterior posterior probabilities for the classes

x the scores of test cases on up to dimen discriminant variables

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

See Also

lda, qda, predict.qda

Examples

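library(MASS)  # provides lda and its predict method
data(iris3)
# 25 randomly chosen training cases per species; the rest are test cases
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
# class labels for the training rows: setosa, versicolor, virginica
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
predict(z, test)$class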

[Package MASS version 7.2-40 Index]
