Multi Variate Part 2

download Multi Variate Part 2

of 57

Transcript of Multi Variate Part 2

  • 8/13/2019 Multi Variate Part 2

    1/57

    Multivariate Data Analysis Using

    Statgraphics Centurion: Part 2

    Dr. Neil W. Polhemus

    Statpoint Technologies, Inc.

    1

  • 8/13/2019 Multi Variate Part 2

    2/57

    Multivariate Statistical Methods

    The simultaneous observation and analysis of more than one responsevariable.

    *Primary Uses

    1. Data reduction or structural simplification2. Sorting and grouping

    3. Investigation of the dependence among variables

    4. Prediction

    5. Hypothesis construction and testing

    *Johnson and Wichern, Applied Multivariate Statistical Analysis

    2

  • 8/13/2019 Multi Variate Part 2

    3/57

    Data set: U.S. Demographics

    3

  • 8/13/2019 Multi Variate Part 2

    4/57

    Variables

    Education: percent completing college

    Income per Capita: in 2011 dollars

    Urban Percentage: percent living in urban areas

    Population Density: people per square mile

    Non-English Language: percent that dont speak English

    Median Age: in years

    Unemployment Rate: for 2012

    Percent Female: of general population

    Crime Rate: violent crimes per 100,000

    Manufacturing: % of GDP

    Infant Mortality: 2006

    Poverty Rate: percentage below poverty level

    Election 2012: results of 2012 presidential election (Obama or Romney)

    4

  • 8/13/2019 Multi Variate Part 2

    5/57

    Urban Percentage

    5

  • 8/13/2019 Multi Variate Part 2

    6/57

  • 8/13/2019 Multi Variate Part 2

    7/57

    Poverty Rate

    7

  • 8/13/2019 Multi Variate Part 2

    8/57

    Star Glyphs

    8

  • 8/13/2019 Multi Variate Part 2

    9/57

    Data Input

    9

  • 8/13/2019 Multi Variate Part 2

    10/57

    10

  • 8/13/2019 Multi Variate Part 2

    11/57

    Cluster Analysis

    Method for assigning objects to groups so thatobjects in the same group are more similar toeach other than they are to objects in other

    groups.

    Some questions:

    1. How many natural groups are there?

    2. Which objects belong to each group?

    11

  • 8/13/2019 Multi Variate Part 2

    12/57

    Data Input

    12

  • 8/13/2019 Multi Variate Part 2

    13/57

  • 8/13/2019 Multi Variate Part 2

    14/57

    Dendrogram Nearest Neighbor

    14

  • 8/13/2019 Multi Variate Part 2

    15/57

    Dendrogram Furthest Neighbor

    15

    6

  • 8/13/2019 Multi Variate Part 2

    16/57

    Agglomeration Distance Plot

    16

  • 8/13/2019 Multi Variate Part 2

    17/57

    Dendrogram - 4 Clusters

    17

    8

  • 8/13/2019 Multi Variate Part 2

    18/57

    Save Cluster Numbers

    18

  • 8/13/2019 Multi Variate Part 2

    19/57

    Map of Cluster Numbers

    19

    20

  • 8/13/2019 Multi Variate Part 2

    20/57

    Method of k-Means

    1. Starts with a set of typical objects (seeds) foreach cluster.

    2. Each object is assigned to group with nearest

    seed.3. Centroids of groups are computed.

    4. Objects are assigned to different groups ifnearer to a different centroid.

    5. Steps 3 and 4 are repeated until no change.

    20

    21

  • 8/13/2019 Multi Variate Part 2

    21/57

    Method of k-Means

    21

    22

  • 8/13/2019 Multi Variate Part 2

    22/57

    Seeds

    22

    Row 5 = California

    Row 32 = New York

    Row 34 = North Dakota

    Row 43 = Texas

    23

  • 8/13/2019 Multi Variate Part 2

    23/57

    Map for k-Means

    23

    24

  • 8/13/2019 Multi Variate Part 2

    24/57

    Classification

    The classification problem deals with theassignmentof objects to knowngroups. Goalsinclude:

    1. Understanding what differentiates the groups.

    2. Predicting group membership for unclassifiedobjects.

    24

    25

  • 8/13/2019 Multi Variate Part 2

    25/57

    Methods

    Discriminant Analysis (on the CenturionRelatemenu) based on Fishers lineardiscriminant functions.

    Neural Network Classifier (on the CenturionRelatemenu) based on a Bayesian classifierwith priors and costs.

    Uniwin(companion program to Statgraphics) includes methods for dealing with qualitative

    factors.

    25

    26

  • 8/13/2019 Multi Variate Part 2

    26/57

    (Linear) Discriminant Analysis

    Constructs linear functions of the standardizedclassification variables:

    These functions are derived so as to maximize theseparation of the groups. If there are ggroups,there areg-1 discriminant functions.

    26

    pjpjjj ZdZdZdD ...

    2211

    27

  • 8/13/2019 Multi Variate Part 2

    27/57

    Example

    27

    28

  • 8/13/2019 Multi Variate Part 2

    28/57

    Data Input

    28

    29

  • 8/13/2019 Multi Variate Part 2

    29/57

    Analysis Options

    29

    30

  • 8/13/2019 Multi Variate Part 2

    30/57

    All Variables

    30

    31

  • 8/13/2019 Multi Variate Part 2

    31/57

    Discriminant Function Plot

    31

    32

  • 8/13/2019 Multi Variate Part 2

    32/57

    Backward selection

    32

    33

  • 8/13/2019 Multi Variate Part 2

    33/57

    Classification Functions

    These functions create a score for each group:

    Objects are assigned to whatever group has thelargest value of (Cj* priorj) wherepriorjis theprior probability of belonging to groupj.

    33

    02211 ... jpjpjjj cXcXcXcC

    34

  • 8/13/2019 Multi Variate Part 2

    34/57

    Classification Functions

    34

    35

  • 8/13/2019 Multi Variate Part 2

    35/57

    Classification Summary

    35

    36

  • 8/13/2019 Multi Variate Part 2

    36/57

    Classification Table

    3

    37

  • 8/13/2019 Multi Variate Part 2

    37/57

    Validation Set

    May leave some of the objects out of the trainingset and use them to validate the results.

    37

    Correctly classified 8 out of 10 omitted states.

    38

  • 8/13/2019 Multi Variate Part 2

    38/57

    Neural Network Classifier

    Implements a nonparametric method forclassifying objects based on the product of 3quantities:

    1. The estimated density function in the neighborhood of the object(given a specified value of s).

    2. The prior probabilities of belonging to each group.

    3. The costs of misclassifying cases that belong to a given group.

    3

    39

  • 8/13/2019 Multi Variate Part 2

    39/57

    Bivariate Density (s= 0.5)

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    Romney

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0123

    45

    (X 0.001)

    density

    RomneyObama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    Obama

    Bivariate Density

    3555

    7595

    115Urban Percentage

    8

    12

    16

    20

    24

    Poverty rate0

    1

    2

    3

    4(X 0.001)

    density

    40

  • 8/13/2019 Multi Variate Part 2

    40/57

    Schematic Diagram

    41

  • 8/13/2019 Multi Variate Part 2

    41/57

    Data Input

    42

  • 8/13/2019 Multi Variate Part 2

    42/57

    Analysis Options

    43

  • 8/13/2019 Multi Variate Part 2

    43/57

    Classification Summary

    44

  • 8/13/2019 Multi Variate Part 2

    44/57

    Reduced Set of Variables

    45

  • 8/13/2019 Multi Variate Part 2

    45/57

    Some Improvement

    46

  • 8/13/2019 Multi Variate Part 2

    46/57

    Classification Plot

    47

  • 8/13/2019 Multi Variate Part 2

    47/57

    Classification Plot

    48

  • 8/13/2019 Multi Variate Part 2

    48/57

    UNIWIN Plus from Sigma Plus

    Software package written by our longtimecolleague Christian Charles at Sigma Plus inFrance.

    Contains additional features for multivariateanalysis.

    Reads Statgraphics data files.

    49

  • 8/13/2019 Multi Variate Part 2

    49/57

    Uniwin Cluster and Classification

    Handle qualitative variables.

    50

  • 8/13/2019 Multi Variate Part 2

    50/57

    New Variables

    51

  • 8/13/2019 Multi Variate Part 2

    51/57

    Uniwin Cluster Analysis

    52

  • 8/13/2019 Multi Variate Part 2

    52/57

    Dendrogram

    53

  • 8/13/2019 Multi Variate Part 2

    53/57

    Uniwin Clusters

    54

  • 8/13/2019 Multi Variate Part 2

    54/57

    Uniwin Classification

    55

  • 8/13/2019 Multi Variate Part 2

    55/57

    Classification Results

    Wrongly classified only Nevada (only cattle state to vote for Obama).

    56

  • 8/13/2019 Multi Variate Part 2

    56/57

    More Information

    Statgraphics Centurion:www.statgraphics.com

    Uniwin Plus:www.statgraphics.frorwww.sigmaplus.fr

    Or send e-mail to [email protected]

    http://www.statgraphics.com/http://www.statgraphics.fr/http://www.sigmaplus.fr/mailto:[email protected]:[email protected]://www.sigmaplus.fr/http://www.statgraphics.fr/http://www.statgraphics.com/
  • 8/13/2019 Multi Variate Part 2

    57/57

    Join the Statgraphics Community on:

    Follow us on

    https://twitter.com/STATGRAPHICShttp://www.linkedin.com/groups?gid=1796550https://www.facebook.com/pages/STATGRAPHICS/79706768276