Using SPSS - York University

Starting SPSS

From the Windows Start menu choose:
1. SPSS for Windows
2. then SPSS for Windows.

Alternatively, if there is an SPSS icon on the desktop, you can start the program by double-clicking it. SPSS is a large program and can take a while to load; while it is loading, the pointer changes to an hourglass. When loading is complete, the Data Editor window appears. This is the main SPSS screen, with several toolbars and sometimes buttonbars across the top. You can adjust what is visible using the options on the View menu.

Note that the Data Editor has two panels, and the tabs at the bottom of the window show which one you are in. One panel is the Data View, where you enter and examine the information you want to analyse. The other is the Variable View, where you tell SPSS what the columns of numbers mean. To switch between the panels, simply click the tab that you want. You can move back and forth between the Data View and the Variable View freely, but the Data View is usually the more useful panel.

Opening a Data File

There are several ways to open a data file that already exists. From the SPSS menus choose:

1. File
2. Open
3. Data…

Alternatively, you can click the Open File icon on the buttonbar.

The Open File dialog box will appear and you can select the file as usual. By default, only files with the SPSS suffix .sav will appear. You can make others appear by changing the type of file option using the drop-down list.

Creating a Data File

In most cases you will start working by creating a new SPSS data file containing the information that you wish to analyse. You enter your data into the Data View panel of the editor. Click on the Data View tab at the bottom-left of the display if necessary.

Defining Variables

The first step in creating a data file to analyse is to define the variables in your data set. As part of this introduction we are going to use the following dataset.

Table 1. Dataset for workshop use. Taken from http://webpub.allegheny.edu/dept/psych/SPSS/SPSSIntro.html

ID   STUDY HOURS   GPA   GENDER
1    32            3.6   M
2    16            3.5   F
3    21            2.8   M
4    23            3.7   F
6    8             3.5   F
7    4             3.7   F
8    10            2.5   M
10   15            2.3   F
11   31            3.0   F
12   40            3.9   M
13   5             3.1   F
14   28            2.7   M
15   15            2.3   F

To define the first variable, ID, click on the Variable View tab at the bottom-left of the Data Editor display. Your window should now look like this:

First, we will want to give the variable a name that is useful to us. Click on the Name column, and type id (see figure below). Then press the Enter key.

In the case of id we want the entries to be whole numbers, without decimal places, and one or two digits long. To do this, click the small button with three dots in the Type column, next to the word Numeric. The panel seen below will appear.

We want to treat this data as a number, so leave the Numeric option selected. But we want at most two digits and no decimal places, so set the Width to 2 and the Decimal Places to 0, and then press OK. Notice that the Width and Decimals entries for id have changed on the Variable View page.

This is all we need to do with this variable. Using the same procedure, name the second column StudyHrs. It should also be numeric, with a format of 2.0 (width 2, no decimal places). Note: even if you type the variable name StudyHrs, SPSS changes it to studyhrs. And remember that the maximum length of a variable name is 8 characters. The next variable we want to define is grade point average (GPA). Call this new variable GPA (or gpa) and set it to hold two digits, one of them after the decimal point. Because SPSS counts the decimal point as part of the width, this means a Width of 3 and 1 decimal place. Sometimes it is easy to forget what a variable is from its abbreviated name (max 8 characters). To get around this problem we use the Label column to fill in a longer, more informative name. In this case, set the label for gpa to Grade Point Average. Now, when analyses are performed, the extended label Grade Point Average will be included, reminding you of the meaning of the variable.
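The two naming rules this handout relies on (at most 8 characters, and, as noted later, no leading digit) can be written as a small check. This is a hypothetical helper for illustration only, not part of SPSS, and it models just those two rules rather than every rule SPSS applies.

```python
# Hypothetical helper (not part of SPSS): checks a proposed variable name
# against the two naming rules quoted in this handout.
MAX_NAME_LENGTH = 8  # limit stated in this handout


def check_variable_name(name):
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if name[:1].isdigit():
        problems.append("starts with a digit")
    return problems


for candidate in ["id", "StudyHrs", "gpa", "StudyHours", "98grads", "grad98"]:
    print(f"{candidate:<10} {check_variable_name(candidate) or 'OK'}")
```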

The last variable is gender. Although it is possible to use M and F for male and female, it is better to code this information numerically, e.g. 0 for males and 1 for females. Begin by defining the variable as before, setting its type to numeric with a format of 1.0. Give it a more complete description in the Label column by typing Gender of Participants. But how do we remember what 0 and 1 stand for? Click in the Values cell for the gender variable. A small button with three dots should appear, and clicking on it opens a dialog box like the one below.

Now enter a value, e.g. 0, in the Value box and the information it represents, e.g. male, in the Label box, and click the Add button. Do the same for ‘1’ and ‘female’ and add those to the list.

Click OK when you have finished adding values and labels. We now have a complete definition of the possible values and their meanings for the Gender variable.
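The idea behind value labels, storing compact numeric codes and attaching a readable label to each, can be sketched in a few lines using the GENDER column of Table 1. The dictionaries below are illustrative stand-ins for what the Value Labels dialog records; they are not an SPSS file format.

```python
# Illustrative stand-ins for the value labels defined above
VALUE_LABELS = {0: "male", 1: "female"}  # what each stored code means
CODE_FOR = {"M": 0, "F": 1}              # how the raw letters are recoded

# GENDER column of Table 1, in ID order
raw_gender = ["M", "F", "M", "F", "F", "F", "M", "F", "F", "M", "F", "M", "F"]
gender = [CODE_FOR[g] for g in raw_gender]  # stored values are 0 or 1

# Output shows the label for each code rather than the bare number
for code in sorted(set(gender)):
    print(code, VALUE_LABELS[code], gender.count(code))
```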

Entering the Data

We have now set up each of the variables in our dataset, and it is time to add the data for each individual. Click the Data View tab at the bottom-left of the Editor display screen. Click the first cell in the id column, type 1 and press Enter.

One of the advantages of defining your variables before you begin entering data is that it provides some checks on the data that you enter. For example, try to enter a 4-digit number in the ID column. Rather than entering an incorrect number, SPSS just puts ** in the cell to remind you that incorrect data has been entered.
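The asterisks come from a mismatch between the value and the width you defined. A simplified model of that display rule is sketched below, assuming the width counts every character of the formatted number, including any decimal point. The function name is hypothetical, and SPSS's real formatting has refinements that are not modelled here.

```python
def display_cell(value, width, decimals):
    """Format `value` with `decimals` places; show asterisks if it needs more than `width` characters."""
    text = f"{value:.{decimals}f}"
    if len(text) > width:
        return "*" * width       # too wide for the defined format
    return text.rjust(width)     # numbers are right-aligned in their column


print(repr(display_cell(7, 2, 0)))     # an id that fits a width of 2
print(repr(display_cell(1234, 2, 0)))  # the 4-digit id from the example above
print(repr(display_cell(3.6, 3, 1)))   # a GPA with Width 3, Decimals 1
```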

Continue typing the correct information from the dataset into the Data View of the Editor.

Saving Your Data

After the data has been entered into the spreadsheet you should save it, even if you plan to do all your analyses before you log off. There are several ways to save your data. You can select the File menu at the top of the SPSS window, and then select Save (or Save As), and you will see a window like this:

To save the data just click on the File Name box and enter a name for this file. Press enter and it will be saved in the folder indicated at the top of the panel, in a file with the name you have given it and .sav on the end.

Exiting SPSS

Now that we have entered our data and saved it, we can quit SPSS. Just go to the File menu and select Exit. This will close the main window and unload SPSS.

Entering Non-numeric (String) Data

In addition to numeric data, SPSS can handle text (string) data. In the example dataset below (based on one taken from http://webpub.allegheny.edu/dept/psych/SPSS/SPSSIntro.html#Defining%20Variables) the data is the number of students who graduated from each major in 1998.

Major                            Number of 1998 graduates
Art                              10
Biology                          51
Chemistry                        12
Classics                         0
Communication Arts               21
Computer Science                 8
Economics                        26
English                          21
Environmental Science/Studies    54
Geology                          6
History                          15
International Studies            4
Mathematics                      6
Modern Languages                 6
Music                            2
Neuroscience                     12
Philosophy                       3
Physics                          6
Political Science                28
Psychology                       46
Religious Studies                4
Sociology/Anthropology           9
Student-Designed Majors          2
Women’s Studies                  1
Total                            353

First we need to define the variables. The first is the major, which you define in the Variable View panel. Remembering that variable names cannot be longer than 8 characters, name the first variable "major". Click the three small dots in the Type column (next to Numeric) and change the variable to "string" (a "string" variable is one whose values are words or sets of letters, that is, not numbers). Because the values are text, also make the width large enough to hold the longest major name (Environmental Science/Studies has 29 characters). In the Label column, call this variable "Allegheny College Major"; the label lets us give a fuller description of the variable than the name alone allows.

When you have entered the data into the Data View screen, your data editor should now look like this:


Now we need to enter the actual number of students in each major for 1998. Follow the same procedure using the second column, except that this variable name should be "grad98" (note: you cannot begin a variable name with numbers, so we have to start with letters). This variable is a numeric variable, and since it is the number of graduates, it is always going to be a whole number, and we can type 0 as the number of decimal places. Give it the label "1998 graduates", and when you are finished, begin to type in the second column of data.

Be careful to check your numbers after you have entered them, though – entering the data is where most errors occur.
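One way to follow this advice is to compare your entries with a total you already know. The sketch below (plain Python, not an SPSS feature) adds up the graduate counts from the table above and compares them with the published total of 353. It also reports the longest major name, which tells you how wide the string variable major needs to be.

```python
# 1998 graduates per major, as listed in the table above
grad98 = {
    "Art": 10, "Biology": 51, "Chemistry": 12, "Classics": 0,
    "Communication Arts": 21, "Computer Science": 8, "Economics": 26,
    "English": 21, "Environmental Science/Studies": 54, "Geology": 6,
    "History": 15, "International Studies": 4, "Mathematics": 6,
    "Modern Languages": 6, "Music": 2, "Neuroscience": 12, "Philosophy": 3,
    "Physics": 6, "Political Science": 28, "Psychology": 46,
    "Religious Studies": 4, "Sociology/Anthropology": 9,
    "Student-Designed Majors": 2, "Women's Studies": 1,
}
PUBLISHED_TOTAL = 353  # "Total" row of the table

entered_total = sum(grad98.values())
if entered_total == PUBLISHED_TOTAL:
    print(f"OK: {len(grad98)} majors, {entered_total} graduates")
else:
    print(f"Check your entries: they sum to {entered_total}, not {PUBLISHED_TOTAL}")

longest = max(grad98, key=len)
print(f"Longest major name: {longest!r} ({len(longest)} characters)")
```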

Descriptive Statistics

There are a variety of ways to compare distributions, including dot plots (to get an idea of the spread of the data) as well as the more traditional summary statistics (mean, median, etc.). Descriptive statistics also include bar charts and histograms, and relationships between variables can be examined using visual association (scatter plots) and correlation. Let's begin by assuming that we have a data set that contains hypothetical scores on an exam for 11 different introductory psychology classes. There were 100 points possible on the exam, and there were 50 students in each class. We want to compare the performance across classes.

Frequency Counts Using Dot Plots

Dot plots can be used to provide a simple frequency chart for each class, enabling us to compare the resulting distributions to each other. Creating these simple frequency charts is very easy and is best done using the Interactive Graphs menu. To begin, click on the Graphs menu and choose Interactive. In that submenu, choose Dot. The kind of chart we are going to create is called a "dotplot", in which each dot represents a number (or several numbers, if they happen to be the same) in the distribution. When the Create Dots panel appears, you will see a list of all the variables (the classes in our case) in the left window and spaces representing the vertical and horizontal axes in the right window.

All of the Interactive Graphs are done with a drag-and-drop technique (see the description of preparing histograms for more details). Simply highlight the class that you want to examine (Class A, in this case), and drag it into the box for the horizontal axis. When you have done that, click OK, and the chart will pop up in the Output window. The window also provides a number of options for adding titles, lines, and various other accessories to your chart. Now do the same thing for Classes B and C. (Notice that when you choose another variable to plot, it automatically replaces the one in the horizontal axis box.) Fortunately, SPSS makes it extremely easy for you to examine these charts and compare them. To do so, simply locate the small icon in the left panel that represents each of the charts. When you single-click on one, you will see it in the Output window.
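The exam scores for the 11 classes are hypothetical and are not listed here, so the sketch below uses the GPA column of Table 1 instead. It prints a sideways text version of a dot plot, stacking one dot per case at each distinct value; it only illustrates the idea and is not how SPSS draws the chart. The function name is hypothetical.

```python
from collections import Counter

# GPA column of Table 1
gpa = [3.6, 3.5, 2.8, 3.7, 3.5, 3.7, 2.5, 2.3, 3.0, 3.9, 3.1, 2.7, 2.3]


def text_dotplot(values, dot="o"):
    """Print each distinct value with one dot per case that has that value."""
    counts = Counter(values)
    for value in sorted(counts):
        print(f"{value:>4} | {dot * counts[value]}")
    return counts


counts = text_dotplot(gpa)
```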

Numerical Summaries of Distributions

To produce numerical summaries of any distribution, go to the Analyze menu, choose Descriptive Statistics, and then choose Frequencies. When you get the Frequencies dialogue box, highlight each variable name and then click the arrow to move it to the right side of the box. You can do the variables one at a time or all at once. Then click the button labeled Statistics. This takes you to a dialogue box labeled Frequencies: Statistics, where you can decide which descriptive statistics you would like to have calculated. When you have selected the statistics you want, click Continue. Then click OK, and SPSS will calculate the descriptive statistics.
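For a rough cross-check of the kind of numbers Frequencies reports, the sketch below computes a few of the same statistics for the study hours and GPA columns of Table 1 using Python's standard statistics module. Its stdev uses the n - 1 denominator (the sample standard deviation). The summarize helper is hypothetical.

```python
import statistics

# Columns of Table 1
study_hours = [32, 16, 21, 23, 8, 4, 10, 15, 31, 40, 5, 28, 15]
gpa = [3.6, 3.5, 2.8, 3.7, 3.5, 3.7, 2.5, 2.3, 3.0, 3.9, 3.1, 2.7, 2.3]


def summarize(values):
    """A few of the statistics offered in the Frequencies: Statistics dialogue."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Std. Deviation": statistics.stdev(values),  # n - 1 denominator
        "Minimum": min(values),
        "Maximum": max(values),
    }


for name, column in [("StudyHrs", study_hours), ("gpa", gpa)]:
    print(name, {label: round(value, 3) for label, value in summarize(column).items()})
```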

Bar Charts

One of the best ways to present data is by using a bar chart or a histogram. Bar charts are appropriate when you have discrete sets of data (males, females), while histograms should be used when you have a continuous variable (grade point average). Below are instructions on how to produce each type using SPSS.

Consider a situation where we have data on Allegheny graduates in various majors for 1998 and we want to compare the numbers of graduates from the various majors. The easiest way to do that graphically is to create a bar chart. To do that, follow these steps.

1. Open the Graphs menu.
2. Select Legacy Dialogs.
3. Select Bar.

When the Bar Charts panel comes up, you will see several choices for the kind of bar chart you want. (We have selected Simple.) Select the Values of Individual Cases option and then click Define.

In this case, we want the bars to represent the number of graduates for each major, so we highlight that variable in the list on the left (by clicking on it), and then we click on the small arrow to move that variable into the Bars Represent box. Under Category Labels we are given two choices, Case Number or Variable; we want to see the major names under each bar (instead of the case numbers, which would simply be 1,2,3,4, etc.), so we click the button for Variable, highlight the "major" variable on the left and then click on the arrow to move it into the Category Labels box. The Define Simple Bar dialogue box should now look like this:


We have now finished defining our bar chart. Click OK on the upper right corner of the panel, and in a few moments, your finished graph should pop up on to your screen!
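Choosing Values of Individual Cases gives one bar per row of the data file, with the bar length taken from grad98 and the label from major. The sketch below mimics that layout as a text chart for the first six majors in the table; it is an illustration only, and the text_bar_chart function and its scale parameter are hypothetical.

```python
# First six rows of the major / grad98 data
majors = ["Art", "Biology", "Chemistry", "Classics", "Communication Arts", "Computer Science"]
grad98 = [10, 51, 12, 0, 21, 8]


def text_bar_chart(labels, values, scale=2):
    """Return one text bar per case; each '#' stands for `scale` graduates (rounded down)."""
    width = max(len(label) for label in labels)
    return [
        f"{label:<{width}} | {'#' * (value // scale)} {value}"
        for label, value in zip(labels, values)
    ]


for line in text_bar_chart(majors, grad98):
    print(line)
```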

There are a few things you should know about the SPSS output file that you have created. The information appears in a new, separate Output window, which has two parts. On the right is the output from whatever procedures you have just completed. On the left is the Outline, an outline representation of all the recent SPSS procedures that you have run. You can explore this panel as you want, but the main thing to remember is that it is a good way of keeping track of the procedures that you have run. You can also edit, rearrange, and delete items in your output file using this feature.

Histograms

In situations where the variable of concern is continuous, a histogram is more appropriate. Consider a situation where we want to examine the lengths of the reigns of British rulers since 1066. Here is the data:

Reigns of British Rulers since 1066

Ruler         Reign (in years)   Ruler         Reign (in years)
William I     21                 Edward VI     6
William II    13                 Mary I        5
Henry I       35                 Elizabeth I   44
Stephen       19                 James I       22
Henry II      35                 Charles I     24
Richard I     10                 Charles II    25
John          17                 James II      3
Henry III     56                 William III   13
Edward I      35                 Mary II       6
Edward II     20                 Anne          12
Edward III    50                 George I      13
Richard II    22                 George II     33
Henry IV      13                 George III    59
Henry V       9                  George IV     10
Henry VI      39                 William IV    7
Edward IV     22                 Victoria      63
Edward V      0                  Edward VII    9
Richard III   2                  George V      25
Henry VII     24                 Edward VIII   1
Henry VIII    38                 George VI     15

Rather than simply presenting a figure in which each king or queen is represented by a bar, we want a histogram that summarizes the data by grouping it into categories. In this case, the categories will be ranges of lengths of reigns.

To begin, open a new Data Editor in SPSS. We are going to enter all of the lengths of the reigns. However, since we are not going to use the individual kings’ and queens’ names in this analysis (we are grouping the data into categories), we only need one column, which you should name reign and label as "Reigns of British Rulers." Define that variable (it is numeric, and we don’t need any decimal places), and enter the data into the Data Editor.

Once you have entered the data, we need to tell SPSS to create a histogram that displays it. There are two ways to do this, and we will try both.

Legacy Dialogs: First, towards the bottom of the Graphs menu you will find an item called Histogram… Click on this item. In the left portion of the panel you should see the description of the "reign" variable. Highlight that variable label and then click the arrow to insert it under Variable. Then click OK, and in a few moments you should have a histogram.

Changing the Look of Histograms

One thing that you may have noticed is that SPSS decided how many intervals to create for your histogram and how wide those intervals should be. However, you can change that if you want, and changing the interval width can change the look of the histogram significantly. To change the interval width we will use a slightly different feature of SPSS, called "Interactive Graphs". This part of SPSS does much the same thing as the histogram procedure that we used above; the only difference is that with an interactive graph we can control more of how the histogram will eventually look.

Select Graphs, then Interactive, and Histogram to get the panel seen below.


This panel works on the principle of "drag and drop"; instead of typing in the variable names that you want, you can simply highlight the one you want and drag it to its proper place. For example, you see in the panel two arrows, representing the vertical and horizontal axes of the histogram. The vertical axis is already labeled "count," since it will simply show how many of the reigns fall into each interval. Now highlight the variable "reign" and drag it into the open space on the horizontal axis, since that is where we want it. To customize the look of the histogram, we now simply click on the tab labeled Histogram. There are all kinds of things we can do at this point, such as superimposing a normal curve over the histogram, or giving it a 3-D look; you can experiment with those when you want. For now, the most important feature is the Interval Size box. Here you can either ask SPSS to set the interval size, or you can specify a particular number of intervals or width of intervals. To see how different the histogram might look with a different number of intervals, count how many intervals were in your original histogram, and use half that number here. Then click on the Titles tab, and type in a title for your histogram in the top space. Then click OK to create the histogram. Now choose a different interval width, and create a histogram.
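To see why the interval width matters, the sketch below counts the 40 reigns from the table into equal-width intervals starting at 0, first 10 years wide and then 20 years wide, and prints a text histogram for each. SPSS chooses its own interval boundaries, so its bars need not match these exactly; bin_counts is a hypothetical helper, not an SPSS function.

```python
# Reign lengths (years) from the table, left column then right column
reigns = [21, 13, 35, 19, 35, 10, 17, 56, 35, 20, 50, 22, 13, 9, 39, 22, 0, 2, 24, 38,
          6, 5, 44, 22, 24, 25, 3, 13, 6, 12, 13, 33, 59, 10, 7, 63, 9, 25, 1, 15]


def bin_counts(values, width, start=0):
    """Count values in the intervals [start, start+width), [start+width, start+2*width), ..."""
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


for width in (10, 20):
    print(f"Interval width {width}:")
    for lower, upper, count in bin_counts(reigns, width):
        print(f"  {lower:>2}-{upper - 1:>2} years  {'*' * count} ({count})")
```

Halving the number of intervals merges neighbouring bars, which is exactly the kind of change in shape the Interval Size setting lets you explore.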

Visual Association

The relationships between a number of variables are often best examined by looking at a scatter plot of pairs of variables. These plots reveal not only relationships but also outliers, problems of limited range, and so on. To prepare scatter plots, enter the data into SPSS, one variable per column. To make the output more readable, make sure that you have provided labels for the variables. Then pull down the Graphs menu and choose Interactive > Scatter. Put one variable on the Y (vertical) axis and another on the X (horizontal) axis.

The Interactive Graphs use a drag-and-drop technique. Highlight the variable you want on the Y-axis and drag it to the vertical box. Do the same for the X-axis (drag the name of the variable to the horizontal box). The Create Scatterplot box should look something like this.

Unless you are doing a plot with more than two variables you will not need to make use of the Legend Variables options.

Bivariate Correlation

In addition to examining our data using scatter plots, we will often want to examine correlation coefficients among the variables. Once you have entered your data (one variable per column), click the Analyze menu, then go to Correlate and choose Bivariate (each coefficient describes the relationship between a pair of variables). When you get to the Bivariate Correlations dialogue box, insert the variables into the Variables box, check the Pearson box (if your data are interval/ratio; otherwise choose Kendall's tau-b or Spearman), and click OK.

Your output will provide you with correlation coefficients between each pair of variables you selected (three in the above example).

In addition, a measure of the significance of each correlation (whether the correlation is significantly different from zero) is included. Note that the correlation between a particular pair of variables is given twice (Miles per Gallon vs. Time to Accelerate as well as Time to Accelerate vs. Miles per Gallon). Correlations between a variable and itself are shown as 1.000.
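The Pearson coefficient in this output can be computed directly from its definition. The Miles per Gallon data are not listed, so the sketch below uses the study hours and GPA columns of Table 1. It also converts r to the t statistic with n - 2 degrees of freedom on which the usual significance test is based; turning t into a p value needs a t-distribution function, which Python's standard library does not provide, so that step is left out. The function names are hypothetical.

```python
import math

# Columns of Table 1
study_hours = [32, 16, 21, 23, 8, 4, 10, 15, 31, 40, 5, 28, 15]
gpa = [3.6, 3.5, 2.8, 3.7, 3.5, 3.7, 2.5, 2.3, 3.0, 3.9, 3.1, 2.7, 2.3]


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long lists with at least 3 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero; undefined when |r| = 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


variables = {"StudyHrs": study_hours, "gpa": gpa}
for row_name, row in variables.items():
    # Each pair appears twice and the diagonal is 1, as in the SPSS matrix
    print(row_name, [round(pearson_r(row, col), 3) for col in variables.values()])

r = pearson_r(study_hours, gpa)
print(f"r = {r:.3f}, t({len(gpa) - 2}) = {t_for_r(r, len(gpa)):.3f}")
```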

Analysing Data

t-tests

t-tests for Between-Subjects Designs

The between-subjects t-test is used to compare the means of two separate samples that represent two separate treatments or two separate populations. A significant difference indicates that there appears to be a consistent, systematic difference between the two treatments and that the obtained mean difference is very unlikely (p < .05) to have occurred by chance alone. The significance is determined by the p value that is reported as part of the computer output.

Example: The following is the mean response time of eight participants who pressed a button when a light appeared. One group carried out this task in a dark room, while the other group did the same task in bright surroundings.

Dark Room   Bright Room
200         290
210         320
280         360
270         330

Data Entry

1. The scores are entered in the stacked format – all scores from both samples are entered as a single variable e.g. RT (in a single column in the Data View).

2. Values are then entered in the second column to designate the sample or treatment corresponding to each of the scores. For example, use the Values column of the Variable View (the Value Labels dialog) to define 1 to mean Dark Room and 2 to mean Bright Room. The advantage of setting up the variables before entering the data is that, once these labels are defined, they are available in a drop-down menu in the Data View.

Data Analysis

1. Click Analyze on the tool bar.
2. Select Compare Means.
3. Click on Independent-Samples T Test.
4. Highlight the column label for the set of scores (e.g. RT) in the left box.
5. Click the arrow to move the column label into the Test Variable(s) box.
6. Highlight the column label containing the sample numbers in the left box.
7. Click the arrow to move the column label into the Grouping Variable box.
8. Click on Define Groups.
9. Assuming that you used 1 to identify the Dark Room sample and 2 to identify the Bright Room sample, enter the values 1 and 2 into the appropriate group boxes.
10. Click Continue.
11. Click OK.

SPSS will produce a summary table showing the number of scores, the mean, the standard deviation and the standard error of the mean for each of the two samples. It also reports Levene's test for equality of variances. Homogeneity of variance is an assumption of the t test and requires that the two populations from which the samples were obtained have equal variances. This test should not be significant (you do not want the two variances to be different), so you want the reported Sig. value to be greater than .05.

Next, the results of the hypothesis test are presented under two different assumptions; we will focus on the top row, where equal variances are assumed. The results include the calculated t value, the degrees of freedom, the level of significance (probability of a Type I error) and the size of the mean difference. Finally, the output reports the standard error of the mean difference and a 95% Confidence Interval that provides a range of values estimating how much difference exists between the two treatment conditions.

Group Statistics (RT)

Lighting      N   Mean     Std. Deviation   Std. Error Mean
Dark Room     4   240.00   40.825           20.412
Bright Room   4   325.00   28.868           14.434

Independent Samples Test (RT)

                                         Equal variances   Equal variances
                                         assumed           not assumed
Levene's Test for Equality of Variances
  F                                      2.700
  Sig.                                   .151
t-test for Equality of Means
  t                                      -3.400            -3.400
  df                                     6                 5.400
  Sig. (2-tailed)                        .014              .017
  Mean Difference                        -85.000           -85.000
  Std. Error Difference                  25.000            25.000
  95% CI of the Difference, Lower        -146.173          -147.859
  95% CI of the Difference, Upper        -23.827           -22.141

In this example, responses to the onset of a target are reliably faster in the dark room (M = 240 msec, SD = 40.8 msec) than in the bright room (M = 325 msec, SD = 28.9 msec), t(6) = -3.40, p < .05.
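Every number in the equal-variances row of this output can be reproduced by hand. The sketch below computes the pooled-variance t test and Levene's statistic (a one-way ANOVA on each score's absolute deviation from its group mean) for the Dark Room and Bright Room data. The critical value 2.446912 for a two-tailed 5% test with 6 degrees of freedom is copied from a t table, because Python's standard library has no t distribution; the function names are hypothetical.

```python
import math
import statistics

dark = [200, 210, 280, 270]
bright = [290, 320, 360, 330]


def pooled_t_test(a, b):
    """Independent-samples t test assuming equal variances."""
    n1, n2 = len(a), len(b)
    diff = statistics.mean(a) - statistics.mean(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return diff, se_diff, diff / se_diff, df


def levene_f(a, b):
    """Levene's F: one-way ANOVA on absolute deviations from each group's mean."""
    groups = [[abs(x - statistics.mean(g)) for x in g] for g in (a, b)]
    all_z = [z for g in groups for z in g]
    grand = statistics.mean(all_z)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((z - statistics.mean(g)) ** 2 for g in groups for z in g)
    df_between, df_within = len(groups) - 1, len(all_z) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


T_CRIT_DF6 = 2.446912  # two-tailed 5% critical t for df = 6, from a t table

diff, se, t, df = pooled_t_test(dark, bright)
low, high = diff - T_CRIT_DF6 * se, diff + T_CRIT_DF6 * se
print(f"Levene F = {levene_f(dark, bright):.3f}")
print(f"t({df}) = {t:.3f}, mean difference = {diff}, SE = {se:.3f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
```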

Repeated-measures t-test

Data Entry

1. Define the two variables as before in the Variable View.
2. Enter the data into two columns in the Data View, with the score for each participant in the first condition in the first column and the score for the second condition in the second column.

Data Analysis

1. Click Analyze on the tool bar.
2. Select Compare Means.
3. Click on Paired-Samples T Test.
4. Highlight the labels for both data columns in the left box (click on each one).
5. Click the arrow to move the two labels into the Paired Variables box.
6. Click OK.

SPSS will produce a summary table showing descriptive statistics for each of the two sets of scores, and a table showing the correlation between the first and second scores. Finally, SPSS conducts the t test on the difference scores. The output shows the mean, standard deviation and standard error of the difference scores, as well as the t value, the degrees of freedom (df) and the level of significance (p value). It also includes a 95% Confidence Interval that provides a range of values estimating how much difference exists between the two treatment conditions.
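No data accompany this example, so, purely for illustration, the sketch below borrows the 1st and 2nd treatment columns of the repeated-measures ANOVA example later in this handout, which come from the same five participants. It forms the difference scores as first minus second, the order in which SPSS lists a pair, and computes their mean, standard deviation, standard error and t statistic. paired_t_test is a hypothetical name.

```python
import math
import statistics


def paired_t_test(first, second):
    """Paired-samples t test on the difference scores (first minus second)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)   # n - 1 denominator
    se_d = sd_d / math.sqrt(n)       # standard error of the mean difference
    return {"mean": mean_d, "sd": sd_d, "se": se_d, "t": mean_d / se_d, "df": n - 1}


# Illustration only: 1st and 2nd treatment scores from the repeated-measures example
first_treat = [0, 4, 0, 1, 0]
second_treat = [6, 8, 5, 4, 2]

result = paired_t_test(first_treat, second_treat)
print({key: round(value, 3) for key, value in result.items()})
```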


Single-Factor Independent-Measures Analysis of Variance (ANOVA)

The single-factor, independent-measures ANOVA is used to compare the means from a between-subjects research study that uses two or more separate samples to compare two or more treatment conditions or populations. A significant difference indicates that there appears to be a consistent, systematic difference between at least two of the treatments and that the obtained mean differences are very unlikely (p < .05) to have occurred by chance alone.

Example:

1st Treatment   2nd Treatment   3rd Treatment
0               6               6
4               8               5
0               5               9
1               4               4
0               2               6

Data entry

1. The scores are entered in a stacked format in the data matrix, with the scores from all of the different treatments entered in a single column.

2. The second column is set up to denote the different conditions/treatments.


Data Analysis

1. Click Analyze on the tool bar.
2. Select Compare Means.
3. Click on One-Way ANOVA.
4. Highlight the column for the scores in the left box and click the arrow to move it into the Dependent List box.
5. Highlight the column that identifies the treatment conditions and click the arrow to move it into the Factor box.
6. Click on the Options button and select Descriptives if you want descriptive statistics for each sample, and then click Continue.
7. Click OK.

If you selected the Descriptives Option, SPSS will produce a table showing descriptive statistics for each of the samples along with a summary table showing the results from the analysis of variance.

Example: For the example data, the first treatment has M = 1.00 (SD = 1.73), the second treatment has M = 5.00 (SD = 2.24) and the third treatment has M = 6.00 (SD = 1.87). An independent-measures ANOVA indicates that there is a reliable effect of treatment, F(2, 12) = 9.13, p < .01.

Descriptives

obs_data
                   N    Mean   Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
first treatment    5    1.00   1.732            .775         -1.15                3.15                 0         4
second treatment   5    5.00   2.236            1.000        2.22                 7.78                 2         8
third treatment    5    6.00   1.871            .837         3.68                 8.32                 4         9
Total              15   4.00   2.878            .743         2.41                 5.59                 0         9

(95% CI = 95% Confidence Interval for Mean)

ANOVA

obs_data
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   70.000           2    35.000        9.130   .004
Within Groups    46.000           12   3.833
Total            116.000          14
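The ANOVA summary table can be reproduced from the raw scores. The sketch below splits the total variability of the example data into between-groups and within-groups sums of squares and forms the F ratio. It stops at F, since the Sig. value needs an F-distribution function that Python's standard library does not provide; one_way_anova is a hypothetical name.

```python
import statistics

treatments = {
    "first treatment": [0, 4, 0, 1, 0],
    "second treatment": [6, 8, 5, 4, 2],
    "third treatment": [6, 5, 9, 4, 6],
}


def one_way_anova(samples):
    """Between/within sums of squares, df, mean squares and F for independent samples."""
    all_scores = [x for s in samples for x in s]
    grand_mean = statistics.mean(all_scores)
    ss_between = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS between": ss_between, "df between": df_between, "MS between": ms_between,
        "SS within": ss_within, "df within": df_within, "MS within": ms_within,
        "F": ms_between / ms_within,
    }


table = one_way_anova(list(treatments.values()))
for key, value in table.items():
    print(f"{key:<11} {value:.3f}")
```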

Single-Factor, Repeated-Measures ANOVA

This test is used to compare the means from a within-subjects research study which uses a single sample of participants to compare the DV in two or more treatment conditions. Each individual is measured in each of the treatment conditions. A significant difference indicates that there appears to be a consistent, systematic difference between at least two of the treatments and that the obtained mean differences are very unlikely (p < .05) to have occurred by chance alone.

Participant ID   1st Treatment   2nd Treatment   3rd Treatment
1                0               6               6
2                4               8               5
3                0               5               9
4                1               4               4
5                0               2               6

Data Entry

1. Set up a variable for each of the treatment conditions in the Variable View.
2. Enter the scores for each individual in a single row, so that all the scores for a given treatment condition appear in the same column.

Data Analysis

1. Click Analyze on the tool bar.
2. Select General Linear Model.
3. Click on Repeated Measures.
4. SPSS will present a box titled Repeated-Measures Define Factors. Within the box, the Within-Subjects Factor Name should already contain Factor 1. Change Factor 1 to something meaningful (it is surprisingly difficult to remember what Factor 1 refers to after a few weeks).

5. Enter the Number of Levels (number of different treatment conditions) in the next box.

6. Click on Add.
7. Click Define.
8. One by one, move the column labels for your treatment conditions into the Within-Subjects Variables box. (Highlight the column label on the left and click the arrow to move it into the box.)

9. Click on Options. Select Descriptive Statistics if you want descriptive statistics; to display the mean for each treatment, select the factor name in the left (Factor(s)) box and move it to the right (Display Means for:) box. Then click Continue.

10. Click OK.

If you selected the Descriptive Statistics option, SPSS will produce a table showing the mean and standard deviation for each treatment condition.

Descriptive Statistics

               Mean   Std. Deviation   N
first_treat    1.00   1.732            5
second_treat   5.00   2.236            5
third_treat    6.00   1.871            5

The rest of the SPSS output is relatively complex and includes a lot of statistical information that you are not expected to understand. However, the table of Tests of Within-Subjects Effects contains all the information that you need to test the hypothesis. The table is divided into two parts: the upper part considers the factor (in this example called treatment) and the lower part the error associated with the factor (Error(treatment) in this example). The dfs, the mean squares for the numerator and denominator of the F-ratio, the F-value and the p-value are reported across the two parts. In this example, there is a reliable effect of the treatment, F(2, 8) = 10.00, p < .01.

Tests of Within-Subjects Effects

Measure: MEASURE_1
Source                                  Type III Sum of Squares   df      Mean Square   F        Sig.
treatment          Sphericity Assumed   70.000                    2       35.000        10.000   .007
                   Greenhouse-Geisser   70.000                    1.413   49.524        10.000   .017
                   Huynh-Feldt          70.000                    1.959   35.731        10.000   .007
                   Lower-bound          70.000                    1.000   70.000        10.000   .034
Error(treatment)   Sphericity Assumed   28.000                    8       3.500
                   Greenhouse-Geisser   28.000                    5.654   4.952
                   Huynh-Feldt          28.000                    7.836   3.573
                   Lower-bound          28.000                    4.000   7.000
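The Sphericity Assumed rows of this table can be reproduced by partitioning the total sum of squares into treatment, participant and error components. The sketch below does this for the five participants' scores; the Greenhouse-Geisser, Huynh-Feldt and Lower-bound rows apply further corrections to the degrees of freedom that are not shown. repeated_measures_anova is a hypothetical name.

```python
# Rows are participants 1-5; columns are the 1st, 2nd and 3rd treatments
scores = [
    [0, 6, 6],
    [4, 8, 5],
    [0, 5, 9],
    [1, 4, 4],
    [0, 2, 6],
]


def repeated_measures_anova(rows):
    """Single-factor repeated-measures ANOVA, sphericity assumed; rows = participants."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(row) for row in rows) / (n * k)
    treatment_means = [sum(row[j] for row in rows) / n for j in range(k)]
    participant_means = [sum(row) / k for row in rows]
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_treatment = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_participants = k * sum((m - grand) ** 2 for m in participant_means)
    ss_error = ss_total - ss_treatment - ss_participants  # left after removing both
    df_treatment, df_error = k - 1, (k - 1) * (n - 1)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "SS treatment": ss_treatment, "df treatment": df_treatment, "MS treatment": ms_treatment,
        "SS participants": ss_participants,
        "SS error": ss_error, "df error": df_error, "MS error": ms_error,
        "F": ms_treatment / ms_error,
    }


for key, value in repeated_measures_anova(scores).items():
    print(f"{key:<15} {value:.3f}")
```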

Two-Factor Independent-Measures ANOVA

The Two-Factor Independent-Measures ANOVA is used to compare the means from a between-subjects research study using two independent variables (or quasi-independent variables). The structure of a two-factor study can be represented as a matrix with one IV defining the rows and the other IV defining the columns. Each cell in the matrix corresponds to a separate treatment and there is a separate sample for each cell.


                    Factor B
                    B1                 B2                 B3
Factor A   A1       3, 1, 1, 6, 4      1, 4, 8, 6, 6      5, 5, 9, 2, 4
                    M = 3, SD = 2.12   M = 5, SD = 2.65   M = 5, SD = 2.55
           A2       0, 2, 0, 0, 3      3, 8, 3, 3, 3      0, 0, 0, 5, 0
                    M = 1, SD = 1.41   M = 4, SD = 2.24   M = 1, SD = 2.24

The two-factor ANOVA consists of three separate tests for mean differences: (1) the main effect of Factor A, which consists of the mean differences between the rows of the matrix; (2) the main effect of Factor B, which consists of the mean differences between the columns of the matrix; and (3) the interaction, which consists of any additional mean differences that are not accounted for by the two main effects. For each of the three tests, a significant difference indicates that there appears to be a consistent, systematic difference between at least two of the treatments, and that the observed mean differences are very unlikely (p < .05) to have occurred by chance.

Data Entry

The scores are entered into SPSS in a stacked format, which means that all the scores from all the different treatment conditions are entered into a single column.

1. Set up the first variable in the Variable View to hold all the observed data.
2. Set up the second variable to designate the level of Factor A for each score. If Factor A defines the rows of the data matrix, enter the code for level 1 (as defined in the Value Labels) if the score occurs in the first row, the code for level 2 if it occurs in the second row, and so on.
3. Set up the third variable to designate the level of Factor B. If Factor B defines the columns of the data matrix, enter the code for level 1 (as defined in the Value Labels) if the score occurs in the first column, the code for level 2 if it occurs in the second column, and so on.
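To make the stacked format concrete, the sketch below turns the example matrix into one row per score, with the codes 1/2 for Factor A and 1/2/3 for Factor B, and then computes the per-cell mean and standard deviation that SPSS reports when Descriptives is selected. The cell scores are read from the example matrix and reproduce its means and SDs; the variable names `cells` and `stacked` are our own, and the column names mirror `obs_data`, `factor_A` and `factor_B` from the SPSS output.

```python
# Build the stacked (long) layout: one row per score with its Factor A and Factor B codes.
from statistics import mean, stdev

# Scores in each cell of the example matrix, keyed by (level of A, level of B).
cells = {
    (1, 1): [3, 1, 1, 6, 4], (1, 2): [1, 4, 8, 6, 6], (1, 3): [5, 5, 9, 2, 4],
    (2, 1): [0, 2, 0, 0, 3], (2, 2): [3, 8, 3, 3, 3], (2, 3): [0, 0, 0, 5, 0],
}

# Stacked format: every score goes into a single column, tagged with its level codes.
stacked = [
    (score, a, b)
    for (a, b), scores in cells.items()
    for score in scores
]

for score, a, b in stacked[:6]:
    print(f"obs_data={score}  factor_A={a}  factor_B={b}")

# Per-cell mean and sample SD (n - 1 denominator), as in the SPSS Descriptives table.
for a, b in sorted(cells):
    group = [s for s, fa, fb in stacked if (fa, fb) == (a, b)]
    print(f"A{a} B{b}: M = {mean(group):.2f}, SD = {stdev(group):.3f}, N = {len(group)}")
```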


Data Analysis

1. Click Analyze on the menu bar.
2. Select General Linear Model.
3. Click Univariate.
4. Highlight the column for the scores in the left box.
5. Click the arrow to move the column label into the Dependent Variable box.
6. One by one, highlight the column labels for the two factors and click the arrow to move the labels into the Fixed Factor(s) box.
7. Click Options and select Descriptive statistics if you want descriptive statistics for each sample, then click Continue.
8. Click OK.


If you selected Descriptive statistics in Options, SPSS will produce a table showing the means and standard deviations for each treatment condition.

Descriptive Statistics
Dependent Variable: obs_data

factor_A     factor_B       Mean   Std. Deviation   N
first_row    first_column   3.00   2.121            5
             second_col     5.00   2.646            5
             third_col      5.00   2.550            5
             Total          4.33   2.469            15
second_row   first_column   1.00   1.414            5
             second_col     4.00   2.236            5
             third_col      1.00   2.236            5
             Total          2.00   2.360            15
Total        first_column   2.00   2.000            10
             second_col     4.50   2.369            10
             third_col      3.00   3.091            10
             Total          3.17   2.653            30

The results of the ANOVA are shown in the Tests of Between-Subjects Effects summary table, in which each effect is identified by its column label. The table also includes some extra rows, such as Corrected Model and Intercept, that are beyond the scope of this workshop.


Tests of Between-Subjects Effects
Dependent Variable: obs_data

Source                Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model       84.167a                   5    16.833        3.367    .019
Intercept             300.833                   1    300.833       60.167   .000
factor_A              40.833                    1    40.833        8.167    .009
factor_B              31.667                    2    15.833        3.167    .060
factor_A * factor_B   11.667                    2    5.833         1.167    .328
Error                 120.000                   24   5.000
Total                 505.000                   30
Corrected Total       204.167                   29

a. R Squared = .412 (Adjusted R Squared = .290)

The data for the example would be written up as follows. There was a reliable main effect of Factor A, F(1,24) = 8.167, p < .01. The effect of Factor B did not reach statistical significance, F(2,24) = 3.167, p = .06. There was no interaction between Factor A and Factor B, F(2,24) = 1.167, p = .328. Extract this information from the table above and make sure that you understand the relationship between the statistical output and the text.
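As a check on where the sums of squares come from, the sketch below computes the two-factor ANOVA by hand from the cell scores of the example matrix. Because every cell has the same n, the simple decomposition into row, column, interaction and within-cells variability gives the same values as SPSS's Type III sums of squares here. The function name `two_way_anova` is our own, it assumes a balanced design, and it stops at F because Python's standard library has no F distribution; the Sig. column comes from SPSS.

```python
# Two-factor independent-measures ANOVA for a balanced design, computed from cell scores.


def two_way_anova(cells):
    """cells maps (level_A, level_B) -> list of scores; every cell must have the same n."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    all_scores = [x for scores in cells.values() for x in scores]
    grand = sum(all_scores) / len(all_scores)

    # Cell means, then row (A) and column (B) means; with equal n these are simple averages.
    cell_mean = {k: sum(v) / n for k, v in cells.items()}
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction: what the main effects leave unexplained
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(cells) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "factor_A": (ss_a, df_a, ss_a / df_a / ms_error),
        "factor_B": (ss_b, df_b, ss_b / df_b / ms_error),
        "factor_A * factor_B": (ss_ab, df_ab, ss_ab / df_ab / ms_error),
        "Error": (ss_error, df_error, None),
    }


cells = {
    (1, 1): [3, 1, 1, 6, 4], (1, 2): [1, 4, 8, 6, 6], (1, 3): [5, 5, 9, 2, 4],
    (2, 1): [0, 2, 0, 0, 3], (2, 2): [3, 8, 3, 3, 3], (2, 3): [0, 0, 0, 5, 0],
}

for source, (ss, df, f) in two_way_anova(cells).items():
    f_text = "" if f is None else f"  F = {f:.3f}"
    print(f"{source:<20} SS = {ss:8.3f}  df = {df:2d}{f_text}")
```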

The Pearson Correlation

The Pearson correlation measures the strength and direction of the linear relationship between two variables. The data are numerical scores, with two separate scores, representing two different variables, for each individual. The two scores are identified as X and Y. A positive relationship between X and Y shows that the two variables tend to vary in the same direction; a negative relationship indicates that they vary in opposite directions. A correlation of +1.00 or -1.00 indicates a perfect relationship between X and Y, and a correlation of 0.00 indicates no linear relationship. It is possible to evaluate the statistical significance of a correlation by determining the probability that the sample correlation was obtained by chance from a population in which there is no relationship (zero correlation).

Data Entry

1. Set up two variables in the Variable View.
2. Enter the data in two columns, with one score in each column for each individual.

Data Analysis

1. Click Analyze in the menu bar.
2. Select Correlate.
3. Select Bivariate.
4. One by one, move the labels for the two data columns into the Variables box. (Highlight each label and click the arrow to move it into the box.)
5. The Pearson box should be checked, but you can click on the Spearman box if you want to compute a Spearman correlation (SPSS will convert the scores to ranks).
6. Click OK.

Correlations

                               var_X    var_Y
var_X   Pearson Correlation    1        .875
        Sig. (2-tailed)                 .052
        N                      5        5
var_Y   Pearson Correlation    .875     1
        Sig. (2-tailed)        .052
        N                      5        5

SPSS will produce a correlation matrix, showing all the possible correlations. You want the correlation between X and Y, which is located in the top right of the panel. The output includes the significance level (p value) for the correlation.
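The Sig. (2-tailed) value for a Pearson correlation comes from converting r into a t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The raw scores for this example are not reproduced in the text, so the sketch below plugs in the printed values r = .875 and N = 5; because r is rounded, the p-value only approximately matches SPSS. The function names are our own, the t-distribution tail is obtained by numerically integrating the t density with Simpson's rule, and |r| must be less than 1.

```python
# Pearson r from raw scores, and the two-tailed p-value for H0: population correlation = 0.
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def r_two_tailed_p(r, n, steps=2000):
    """Return (t, p) with t = r*sqrt(n-2)/sqrt(1-r^2) and p = P(|T| > t), df = n - 2.

    p = 1 - 2 * integral of the t density from 0 to t (Simpson's rule; steps must be even).
    """
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return t, 1 - 2 * total * h / 3


t, p = r_two_tailed_p(0.875, 5)
print(f"t(3) = {t:.3f}, p = {p:.3f}")
```

With these inputs p comes out just above .05, which is why the correlation of .875 is not significant at the .05 level despite its size: with only five individuals, even a strong sample correlation could plausibly arise by chance.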

The Chi-square Test for Independence

When our data consist only of the frequencies of various events, the most commonly used statistic is the chi-square (χ²). Below are the procedures for doing both one-way and two-way chi-square analyses. Other non-parametric statistics are appropriate when the variable being measured is at an ordinal rather than at an interval or ratio level of measurement.

One-Way Chi Square

I just bought a bag of M&Ms containing 56 candies. The maker of the candy claims to put an equal number of each color of candy in every bag. To test this claim we can use the following data:

Colour   Number
Blue     3
Brown    17
Green    6
Yellow   10
Orange   6
Red      14

We enter the data using the following format. Note that the colours are given numbers rather than their original names. When you create the variable color, use the Define Labels option to provide a value label for each alternative, e.g., 1 = Blue, 2 = Brown, etc.

Unlike the previous procedures, the Chi-Square procedure here requires that cases be weighted by their frequencies. From the Data menu, select Weight Cases. This window is shown below.


Select Weight cases by, highlight the variable number and click the arrow button to the left of the "Frequency Variable" box. The text at the bottom of the window lets you check that the cases are weighted in the way you intended.

Select Analyze/Nonparametric Tests/Chi-Square to see this window:

Highlight the categorical variable and click the arrow to the left of the "Test Variable List" window and then select Options. In the options window, select Descriptive, Exclude cases test-by-test, and click the Continue button.


You should be back to the original window, as shown. Click OK to run the analyses.

The Chi-Square output provides us with descriptive statistics, frequencies, and finally the test statistic.
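SPSS does the arithmetic for you, but the sketch below shows what the one-way (goodness-of-fit) test computes for the M&M counts: under the maker's claim every colour has the same expected count, 56 / 6, and the statistic sums (observed - expected)² / expected over the six colours, with 6 - 1 = 5 degrees of freedom. The helper `chi2_sf` is our own; it uses the closed-form series for the chi-square tail that holds for integer degrees of freedom.

```python
# One-way (goodness-of-fit) chi-square for the M&M counts, testing equal proportions.
import math


def chi2_sf(x, df):
    """P(chi-square with integer df > x), using the closed-form series for integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_k x^(k - 1/2) / (2k - 1)!!
    term = math.sqrt(x)  # first term: x^(1/2) / 1
    series = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * k + 1)  # next term: x^(k + 1/2) / (2k + 1)!!
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


observed = {"Blue": 3, "Brown": 17, "Green": 6, "Yellow": 10, "Orange": 6, "Red": 14}
n = sum(observed.values())
expected = n / len(observed)  # equal numbers of each colour under the maker's claim
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1
print(f"chi-square({df}, N = {n}) = {chi2:.3f}, p = {chi2_sf(chi2, df):.3f}")
```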


Two-Way Chi Square

The two-way chi-square statistic is used in situations where two variables are involved. For example, research suggests that men prefer sportier car colours such as red or black, while women prefer less flashy car colours such as white or blue. We can test whether there is a sex difference in car colour preference among York University faculty.

                 Gender
Car Colour       Males    Females
Red              11       9
Blue             8        5
Black            7        8
White            9        10

Unlike the one-way chi-square, here the individual data must be entered, rather than the summary data from the table above.

Data entry

Create two variables, one called colour and the other called gender. The first will have values from 1 to 4 (1 = red; 2 = blue; 3 = black; and 4 = white) while the second will have values 1 (for male) and 2 (for female). Note below that since there are 11 males who have red cars, there are 11 separate rows with colour = 1 and gender = 1 in the data file.
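The expansion from the summary table to one row per person can be sketched as follows, using the codes given above (colour 1 to 4 for red, blue, black, white; gender 1 = male, 2 = female). The names `counts` and `rows` are our own. Tallying the expanded rows recovers the original frequencies, which is what the cross-tabulation in SPSS will show.

```python
# Expand the car-colour summary table into one (colour, gender) row per faculty member.
from collections import Counter

counts = {
    # (colour, gender): number of people
    (1, 1): 11, (1, 2): 9,   # red
    (2, 1): 8,  (2, 2): 5,   # blue
    (3, 1): 7,  (3, 2): 8,   # black
    (4, 1): 9,  (4, 2): 10,  # white
}

rows = [(colour, gender) for (colour, gender), k in counts.items() for _ in range(k)]
print(len(rows))  # 67 cases to type into the Data View
print(rows[:3])

# Cross-tabulating the individual rows gives back the original frequencies.
print(Counter(rows) == Counter(counts))
```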


Data analysis

1. Click Analyze in the menu bar.
2. Select Descriptive Statistics.
3. Select Crosstabs.
4. Move the variable colour to the Row(s) box and the variable gender to the Column(s) box, as indicated below.


5. Click Statistics and select Chi-square. You may also want additional statistics such as the Contingency coefficient (a measure of association for nominal data).

6. Click Continue to return to the main dialog.
7. Click OK to produce the results.

The SPSS output will include a cross-tabulation table showing the matrix of observed frequencies, and a Chi-Square Tests table in which you should focus on the Pearson Chi-Square row. That row gives the calculated chi-square value, the degrees of freedom and the level of significance (p-value).


A chi-square test of independence was performed to examine the relation between gender and car colour preference. The relation between these variables was not significant, χ²(3, N = 67) = .879, p = .83.
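To see where χ²(3, N = 67) = .879 comes from, the sketch below computes the test of independence from the summary table: each expected count is (row total × column total) / N, and df = (rows - 1)(columns - 1) = 3. It also computes the contingency coefficient C = sqrt(χ² / (χ² + N)) mentioned in step 5. The p-value line uses the closed-form chi-square tail for df = 3 only, so the sketch refuses other table sizes; the variable names are our own.

```python
# Chi-square test of independence for the gender x car-colour table.
import math

observed = [
    # males, females
    [11, 9],   # red
    [8, 5],    # blue
    [7, 8],    # black
    [9, 10],   # white
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / N.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)
if df != 3:
    raise ValueError("the p-value formula below is only valid for df = 3")
# For df = 3: P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
p = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
contingency_c = math.sqrt(chi2 / (chi2 + n))
print(f"chi-square({df}, N = {n}) = {chi2:.3f}, p = {p:.2f}, C = {contingency_c:.3f}")
```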
