Www Psych Utah Edu Stat Introstats Web Text Chi Square Assoc

  • 8/13/2019 Www Psych Utah Edu Stat Introstats Web Text Chi Square Assoc

    Chi Square Test of Association

    Copyright 2000, Tom Malloy

    Test of Association

    There are two different Chi Square tools. We discussed the Goodness of Fit Chi

    Square in the previous lecture. Now we will discuss the Chi Square Test of

    Association.

    The Chi Square Test of Association was derived mathematically by Karl Pearson

    early in the twentieth century, and is often known as Pearson's Chi Square Test of

    Association. Pearson showed that the formulas for the Goodness of Fit statistic

    we learned last lecture and the Test of Association statistic we will learn in this

    lecture both have Chi Square as their sampling distribution. In this class we don't

    have sufficient mathematical prerequisites to follow these proofs, but suffice it to

    say they were great insights at their time, insights which have allowed substantial

    scientific sophistication to be brought to bear on frequency-categorical data.

    Quite often we have two categorical variables such as gender (male, female) and

    job status (managers, clerks). We wonder whether there is an association (or

    correlation) between the two categorical variables; that is, is there a relationship

    between a person's gender and their job status? As another example, we might be

    interested in the potential association of Political Party (Republican, Democratic,

    Independent, Other) and Environmental Attitudes (Preservation, Development,

    Other). The Chi Square Test of Association allows us to evaluate associations

    (i.e., correlations) between categorical variables such as these.

    We will begin our discussion of the Chi Square Test of Association with the three criteria that are necessary for its appropriate use.

    THREE CRITERIA

    1) Recall that in the Chi Square Goodness of Fit we needed a partition (a system of

    mutually exclusive and exhaustive categories). Now, in the Chi Square Test of

    Association, we need TWO partitions. 2) We also need some number of independent

    observations. 3) Finally, we need frequency data.

    CRITERION #1: 2 Partitions. In probability theory a partition is a mutually exclusive and

    exhaustive set of categories. As you know, categories are pigeon holes,

    or places where we can put things conceptually. To be a partition, a set of categories has

    to be both mutually exclusive and exhaustive. Mutually exclusive means that an observation

    can go into one and only one category. No idea (nor thing, nor observation) may go in

    more than one category.

    Exhaustive means that the set of categories

    covers every possible case. It means that every

    object we observe can be put into one of the categories. More detail on these terms is

    available in the Chi Square Goodness of Fit

    lecture.

    Often when you fill out a survey you'll find that

    there is an extra category called "Other." "Other" is a great category because it ensures

    that whatever set of choices is offered must be exhaustive, because if you don't fit in any

    of the existing categories then you're in "Other".

    Examples of Partitions

    As we discussed in the Goodness of Fit lecture,

    gender is a partition. It is a set of mutually

    exclusive and exhaustive categories. There are

    only two categories in gender--male and female.

    You cannot be both at the same time, so male

    and female are mutually exclusive. Moreover,

    for human beings, gender is exhaustive

    because everybody goes in one of the two categories. Male and female exhaust the

    possibilities.

    Let's look at another example of a partition. Let's say at a particular corporation we classify people according to their job status. At this particular corporation the job status

    is either managerial or clerical and there are no other categories. (This is unrealistic, but it

    keeps the calculations in this example simple.) These categories are mutually exclusive.

    Any employee is either going to be a manager or they're going to be a clerk; they can't

    be both. And, since we've said that those are the only job types, those two job types

    exhaust all the possibilities.

    Example using 2 Partitions

    Now let's create a running example for this

    lecture. We will classify every employee of a

    business both by gender and by job status.

    That is, we will apply two partitions in

    categorizing people. As you can see from the

    graphic, that results in four possibilities: Male

    clerk, Female clerk, Male manager, and Female manager.

    Typically, in this kind of chi square you will make some kind of table. For this small

    example we have a two by two (2 x 2) table; but the table could be a seven by four table (7

    x 4) or whatever, depending on how many pigeon holes are in each partition.

    CATEGORICAL VARIABLES. Another way to speak about this example is that we have

    two categorical variables: Gender and Job Status. When you classify by two categorical

    variables, thus creating all the possible combinations of the two (clerical male, clerical

    female, managerial male, managerial female) it is called "crossing" the variables.
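    As a minimal sketch of what "crossing" means, the snippet below lists every cell produced by combining the two partitions of the running example. The variable names are illustrative and are not part of the lecture.

```python
from itertools import product

# The two partitions (categorical variables) from the running example.
genders = ["male", "female"]
job_statuses = ["clerk", "manager"]

# Crossing the variables: every combination of one category from each partition.
cells = list(product(job_statuses, genders))

for status, gender in cells:
    print(f"{gender} {status}")

# A table with J rows and K columns always has J * K cells.
print(len(cells))
```

    With two categories in each partition this gives the four cells of the 2 x 2 table; a 7 x 4 table would give 28 cells in the same way.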

    To do a Chi Square Test of Association you must classify each of your observations not

    with just one but with two partitions.

    In this simple example, male and female is one dimension and job status is the other

    dimension. You'll notice that our requirement is that every person we observe in our

    study can be classified by each of the partitions. So everybody will go into one of these

    "cells" created by crossing these two partitions with each other.

    CRITERION #2: N Independent Observations

    The second criterion for the Chi Square Test

    of Association is that we have some number

    of observations, each independent of the

    others. In our running example, say that we

    classify 200 people who work at a corporation by gender and job status. We will assume that these people are independent of each other and that they are just picked out at

    random.

    In the Goodness of Fit lecture we discussed the case of a rat in a T-maze. In a T-maze a

    rat has to turn right or left so you have a partition for each trial. However, if you observe the

    same rat over many trials, these observations would not be independent because the

    particular rat may have a turning bias.

    The important point is that you need to think about the observations that you are

    making and decide whether they're independent or not. For our example we are

    assuming that for each person who works at this particular company, their gender is

    independent of the next person's. The fact that one person was born a man doesn't

    have anything to do with someone else who works in a different part of the corporation

    being a woman. Their births are independent of each other and have no relationship to

    each other. The same must be true for job status.

    CRITERION #3: Frequency Data. The third criterion is frequency data.

    Frequency data means that we don't measure anything when we observe; all we do is put

    the observations in categories and count the number of observations that fall into each

    category.

    Each time someone falls into a category you can make a little hash mark in one of the cells

    of the table. We are just counting, not measuring.

    Think about how simply counting is different from the measuring we did for the t tests. Our

    dependent variable in t-tests has been things like someone's height or someone's weight,

    or some number of puzzles solved correctly. In those examples when we observe a person

    our dependent variable generated a measurement number - how many inches tall they are;

    how many pounds they weigh; how many puzzles they solve; and we actually assign that number to that person. That kind of data is called measurement data; and it requires

    statistics like t-tests, correlation coefficients, and so on.

    For Chi Square, we do not measure the participants; we just count their frequency in

    various categories.
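    The counting described above can be sketched in a few lines. The six observations below are hypothetical and are used only to show the tallying; they are not the lecture's data.

```python
from collections import Counter

# Hypothetical raw observations: one (gender, job status) pair per person.
observations = [
    ("female", "clerk"),
    ("male", "manager"),
    ("female", "clerk"),
    ("male", "clerk"),
    ("female", "manager"),
    ("female", "clerk"),
]

# Counting is all we do: each observation adds one tally to exactly one cell.
frequencies = Counter(observations)

for cell, count in sorted(frequencies.items()):
    print(cell, count)
```

    Nothing about a person is measured here; the only numbers in the result are the cell frequencies.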

    Scientific Hypotheses. Let's say that the scientific hypothesis is that there is gender bias in hiring and promotion in this corporation. That means job status will depend

    upon gender. More specifically, it is more likely that the men will be managers and the

    women will be clerks, which would be a classic kind of gender bias.

    In contrast, the skeptic (or maybe the corporation's lawyer or public relations representative)

    will say that hiring and promotion are completely fair. They will note that there will be

    different proportions of men and women in different categories but this is simply due to

    chance and there is nothing systematic about how different genders fall into different job

    status categories.

    So we have our scientific hypothesis of gender bias, and our skeptical hypothesis which

    is saying hiring practices are fair and if the data shows differently then it is only due to

    chance.

    What is the Question? The essential question being asked by the Chi Square Test of Association is: "Is one way of categorizing things related to the other way of categorizing things, or are

    they independent?" Another way of putting it is, "Are the two partitions (categorical variables) correlated

    (associated) with each other or are they unrelated (independent)?"

    In our example we are trying to determine whether gender is related to job status. Is there a correlation

    (association) between a person's gender and her/his job status?

    To answer this question we collect some data. We go to the business and collect the relevant information

    on 200 employees. We put the information into a table like the one we've been showing which crosses

    job status with gender.

    ASSOCIATION MEANS PREDICTABILITY. So let's repeat our essential question in yet another way. Are

    the two classifications correlated or associated or are they independent? That is, can you predict one

    from the other? More specifically, can you predict someone's job status from their gender? If there is an

    association between gender and status, it will mean that you can make a good guess about their job

    status by knowing their gender.

    Let's examine how the frequency data in the table would look in several cases from complete

    dependence to complete independence. We examine 200 people from the business. I've made the example so that of these 200, 110 are women and 90 are men.

    Complete Dependence

    We'll start with the extreme case--complete

    dependence between gender and job status. If

    your study showed the observed frequencies in

    the table on the graphic (not a single male clerk

    and not a single female manager), that data would

    have to come from watching old sitcoms from the

    1950's. In any event, if your data looked like this,

    where all 90 of the men are managers and all 110 of the women are clerks, the data would indicate complete dependence or complete predictability. In this example, the data would

    demonstrate a case of extreme gender bias. The association between gender and job

    status is as high as it can be.

    If I told you that someone is a clerk and then asked you to guess what their gender is, you

    would be able to predict their gender perfectly. If a person is a clerk, then she must be

    female. If I said someone is a man, you could guess his job status perfectly; he must be a

    manager. This is complete dependence of gender and status.

    Strong Dependence

    Here is an example of observed frequency data

    indicating partial but strong dependence. Although this

    example isn't quite so clear cut, there is still a

    great deal of predictability between the two

    categorical variables.

    In the current graphic, out of 200 people, there

    were 20 male clerks, 70 male managers; there

    were 100 clerical females, and 10 female managers. In other words, 22% of the men are

    clerks whereas 91% of the women are clerks.
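    The two percentages quoted above can be checked directly from the four cell counts. This is only an arithmetic sketch; the dictionary layout and the function name are illustrative choices.

```python
# Observed frequencies from the strong-dependence example (N = 200).
observed = {
    ("male", "clerk"): 20,
    ("male", "manager"): 70,
    ("female", "clerk"): 100,
    ("female", "manager"): 10,
}

def percent_clerks(gender):
    """Percentage of the given gender who are clerks."""
    clerks = observed[(gender, "clerk")]
    total = clerks + observed[(gender, "manager")]
    return 100 * clerks / total

print(round(percent_clerks("male")))    # 22
print(round(percent_clerks("female")))  # 91
```

    The large gap between the two percentages is what makes a guess about job status from gender a good one in this example.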

    Now you can't make a perfect prediction from gender to job status, but you can make a pretty good guess. If I said that somebody is a manager and asked you to guess whether

    they're a male or a female, you could make a pretty good guess. You'd guess a manager

    would be male. Now you wouldn't be right all of the time, but you'd be right most of the

    time.

    In this example there's strong predictability from gender to job status and vice versa. In

    other words there is a strong association between gender and job status.

    Independence

    What would the data look like if the partitions

    were independent or had no relationship

    whatsoever? In such a case, you would not

    be able to predict job status from gender any

    better than chance.

    I've changed the example such that half the employees are managers and half are clerks.

    In line with this, you'll notice that half of the women are clerical and half are managers.

    The same goes for the men, half of them are clerical and half of them are managers. In

    other words, 50% of all employees are managers, AND 50% of the men are managers

    and 50% of the women are managers.

    I've made the example such that men and women are not equal in number. Out of every

    100 people there are 45 men and 55 women. So 45% of the employees are men and 55%

    of the employees are women. If we look at the 100 clerks, we find that 45% of them are

    men and 55% of them are women. That is the same as the percentage of men and

    women in the company. There appears to be no bias whatsoever, not even chance

    variations.

    NO ASSOCIATION. In this case then, we have independence of the two partitions or the

    two category systems. The category system called gender is independent of the

    category system called job status. Or in the language used in the Chi Square, there is

    no association between gender and job status.
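    One way to see independence numerically is to compare the share of managers within each gender to the share of managers overall. The sketch below uses the cell counts implied by the description above (45 and 55 in each job status); the names are illustrative.

```python
from fractions import Fraction

# Observed frequencies from the independence example (N = 200).
observed = {
    ("male", "clerk"): 45,
    ("male", "manager"): 45,
    ("female", "clerk"): 55,
    ("female", "manager"): 55,
}

n = sum(observed.values())
managers = observed[("male", "manager")] + observed[("female", "manager")]

def share_managers(gender):
    """Exact proportion of the given gender who are managers."""
    m = observed[(gender, "manager")]
    return Fraction(m, m + observed[(gender, "clerk")])

overall = Fraction(managers, n)
print(overall, share_managers("male"), share_managers("female"))
```

    All three proportions are equal, so knowing a person's gender tells you nothing extra about their job status.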

    So that's the question for this kind of test statistic. It's a different kind of question than the one you

    would have for a t test which uses measurement data.

    The Research Data

    For our running example, let's suppose that our

    research yields the data shown in the current

    graphic.

    N = 200 total people. Males = 90 out of 200 or

    45%. Females = 110 out of 200 or 55%. Clerks =

    120 out of 200 or 60%. Managers = 80 out of 200

    or 40%.

    DATA PATTERN. In this particular case, the data pattern fits the scientific hypothesis. You can argue it any number of ways. For instance, 120 out of 200, or 60% of all employees are clerical, but 90

    out of 110, or 82% of females are clerical, and 30 out of 90, or 33% of the males are clerical.

    These are the kinds of arguments that someone who thought there was gender bias would make. They'd say "Look, 60% of all employees are clerical, but 82% of the females

    and only 33% of the males are clerical. There appears to be gender bias."
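    The percentages in this argument can be reproduced from the research data. In the sketch below the two clerk counts (30 and 90) are quoted in the text, while the two manager counts (60 and 20) are inferred here from the row and column totals, so treat them as an assumption.

```python
# Research data for the running example (N = 200).
# Clerk counts come from the text; manager counts are inferred from the totals
# (90 males - 30 male clerks = 60, 110 females - 90 female clerks = 20).
observed = {
    ("male", "clerk"): 30,
    ("female", "clerk"): 90,
    ("male", "manager"): 60,
    ("female", "manager"): 20,
}

n = sum(observed.values())
males = observed[("male", "clerk")] + observed[("male", "manager")]
females = observed[("female", "clerk")] + observed[("female", "manager")]
clerks = observed[("male", "clerk")] + observed[("female", "clerk")]

pct_all = round(100 * clerks / n)                               # 60
pct_female = round(100 * observed[("female", "clerk")] / females)  # 82
pct_male = round(100 * observed[("male", "clerk")] / males)        # 33

print(pct_all, pct_female, pct_male)
```

    The inferred counts reproduce every total given for the research data (90 males, 110 females, 120 clerks, 80 managers).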

    CHANCE

    The skeptic would say, "Well I think that data

    pattern is just happening by chance." The PCH

    of chance says that these data could have

    come about by chance variations in hiring and

    promotion.

    EVALUATE THE PCH OF CHANCE

    To deal with the plausible competing hypothesis

    (PCH) of chance, we're going to do a Chi Square

    Test of Association.

    Let's calculate the Expected Frequencies

    Review

    The slide to the left and the two slides below

    summarize the example. Review these three

    slides and then we'll start calculating the expected

    frequencies.

    FORMULA FOR EXPECTED FREQUENCIES

    This formula will make most sense after you

    have worked through the example.

    Calculating the Expected Frequencies

    We already have the observed frequencies for

    each cell in our table. Now we will calculate an

    expected frequency for each cell. The symbol

    for expected frequency is fe. We will learn how

    to calculate the expected frequencies by going

    through all the cells in the example.

    NOTATION. We want to be able to note and communicate which of the four cells we are

    talking about at any given moment. The table is 2-dimensional, so we need two

    dimensions to describe a location in it. By convention, we will call these two dimensions

    "j" and "k." As the blue arrow in the graphic shows, the index j runs down rows; it tells

    us which row we are in. And the index k runs across columns; it tells us which column

    we are in. [It's arbitrary which dimension we call j and which we call k. We just have to

    agree on which is which.] The general symbol for the expected frequency in a particular,

    unspecified cell is fe(jk). This can be read as the expected frequency for the cell where

    row j intersects with column k. [Generally, "jk" is a subscript, but it is currently difficult

    to write subscripts in web HTML text. So I'll use parentheses around jk when I need to be

    clear. In obvious cases I'll just write the indices without the parentheses.]

    Our notation is such that we put row (j) first and column (k) last when indexing a cell in a

    table. So fe(11) is the expected frequency for the first row in the first column. fe(12)

    indicates the expected frequency in row 1, column 2. fe(21) indicates the second row in

    the first column; and fe(22) is the cell that is in the second row of the second column.

    CALCULATING THE EXPECTED FREQUENCY FOR CELL 1-1. To repeat, the symbol,

    fe11, is the designation we give for the cell where j = 1 and k = 1. In this example cell 1-1 is

    the upper left hand cell (which is where we placed clerical male employees). The

    expected frequency for that cell is determined by the total number in that row (which is

    120) times the total number in that column (which is 90) divided by the total number of

    observations in the whole table (which is 200).

    This is somewhat confusing to describe in words. But it's really simple if you look at the

    examples in the graphics. All you need to do is find the row total, multiply it by the

    column total, and divide by the grand total (N).

    EXPECTED FREQUENCY FOR CELL 1,1. So you calculate the expected frequency for

    the first cell as 120 times 90 over 200. fe11 is equal to 54.
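    The formula graphic is not reproduced in this text, so the version below is a reconstruction from the verbal description (row total times column total, divided by the total number of observations), followed by the cell 1-1 calculation. The subscript layout is a typesetting choice, not necessarily the one used in the original slide.

```latex
f_{e(jk)} = \frac{(\text{total of row } j) \times (\text{total of column } k)}{N},
\qquad
f_{e(11)} = \frac{120 \times 90}{200} = 54
```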

    Attempt to find the expected frequencies for the other cells on your own before you look

    at the results below.

    Expected Frequencies for the Other Cells

    Go ahead and find the expected frequency

    for cell 1-2 (row 1, column 2) or the female

    clerks cell.

    One convenient way to summarize the

    information you will need when we

    eventually get to the formula is to write the

    expected frequency in the cell with the

    observed frequency.

    fe(12). The expected frequency for row 1, column 2 is 120 times 110 over 200, which

    you can see equals 66.

    fe(21). For row 2, column 1 the expected frequency works out to be 90 times 80 over

    200, or 36.

    Fe(2,2). Then the expected frequency for the cell in row two, column two is 110 times 80 over 200, or 44.

    In the final graphic of the series, the table has observed frequencies, which are the black

    colored numbers, and expected frequencies which are shown in blue.

    The observed frequencies are the data. The expected frequencies are what the data should

    have come out to be if Gender were NOT associated with Job Status.
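The four expected frequencies worked out above can be reproduced with a short Python sketch. The marginal totals (row totals 120 and 80, column totals 90 and 110, grand total 200) come from the example; the function and variable names are made up for illustration.

```python
# Marginal totals taken from the example in the text.
row_totals = [120, 80]      # row 1, row 2
col_totals = [90, 110]      # column 1, column 2
grand_total = 200           # observations in the whole table


def expected_frequencies(row_totals, col_totals, grand_total):
    """Return the expected table: row total * column total / grand total."""
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(row_totals, col_totals, grand_total)
for row in expected:
    print(row)
```

Each printed row corresponds to one row of the table, so the first row holds Fe(1,1) and Fe(1,2).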


    The Formula


    Let's look at the formula. It tells you to take

    the squared difference between the observed

    and the expected frequency in each cell, divide

    each squared difference by the expected

    frequency for that cell, and sum the results over

    all the cells. This is the same as the Goodness

    of Fit, except that it is for a 2-dimensional case,

    so the formula uses a double summation notation.
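The formula graphic did not survive in this copy of the text. A reconstruction consistent with the description is given here; the subscript layout is an assumption, with j indexing rows (1 to J) and k indexing columns (1 to K).

```latex
\chi^2 = \sum_{j=1}^{J} \sum_{k=1}^{K}
         \frac{\left( f_{o}(j,k) - f_{e}(j,k) \right)^2}{f_{e}(j,k)}
```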

    DEGREES OF FREEDOM. The degrees of

    freedom are the number of rows minus one

    times the number of columns minus one. In terms of notation, capital J represents the

    number of rows, and capital K represents the

    number of columns.


    Calculations

    Here is what our 2 x 2 table would look like

    given our data and the calculated expected

    frequencies.

    Next we determine the deviation between

    observed frequency and the expected

    frequency for each cell. We square the

    deviation for each cell. Finally we divide the

    squared deviation by the expected frequency

    for that cell. The four graphics below show all

    the calculations.


    IN WORDS. Get a value for each cell by determining (fo - Fe) squared over Fe. In the

    example the values for the four cells are 10.67, 8.73, 16, and 13.09.

    Then sum up the values for all the cells to get the final chi square value. In the example chi

    square = 10.67 + 8.73 + 16 + 13.09 = 48.49.
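A minimal Python sketch of this step follows. The observed table is not repeated in this part of the text, so the sketch works from the expected frequencies and from the size of the deviation implied by the stated cell values: 16 times 36 is 576, which is 24 squared, so each cell is assumed to differ from its expected frequency by 24 in absolute value. That inference and the variable names are assumptions made for illustration.

```python
# Expected frequencies from the text, in the order cell (1,1), (1,2), (2,1), (2,2).
expected = [54, 66, 36, 44]

# Assumption: absolute deviation |fo - Fe| inferred from the stated cell
# values (16 * 36 = 576 = 24 ** 2); the sign does not matter once squared.
deviation = 24

# (fo - Fe) squared over Fe, one value per cell.
cell_values = [deviation ** 2 / fe for fe in expected]
chi_square = sum(cell_values)

print([round(v, 2) for v in cell_values])
print(round(chi_square, 2))
```

Carried at full precision the total comes to about 48.48; adding the four rounded cell values gives 48.49. The difference is only rounding.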

    Degrees of Freedom

    The formula for degrees of freedom is (the

    number of rows minus one) times (the

    number of columns minus one). It can be

    symbolized by (J - 1)(K - 1). In this case, the

    degrees of freedom equal (2 - 1) times (2 - 1),

    or just 1.

    [Note: Capital J is used to indicate the

    number of rows. As we've already said, little j

    is the index for some particular row. The

    same is true of K and k.]
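The degrees of freedom rule is small enough to express as a one-line function. This is only a sketch; the function name is made up for illustration, while J and K follow the notation in the text.

```python
def degrees_of_freedom(num_rows, num_cols):
    """Degrees of freedom for a J x K table: (J - 1)(K - 1)."""
    return (num_rows - 1) * (num_cols - 1)


J, K = 2, 2                      # the 2 x 2 table from the example
df = degrees_of_freedom(J, K)
print(df)
```

For a larger table, say 3 rows by 4 columns, the same function gives 2 times 3, or 6.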

    Now we can go on to the topic of statistical

    conclusion validity.

    STATISTICAL HYPOTHESES

    The observed frequencies (fo's) are the data we

    collect. The expected frequencies are what the

    data should be if the two categorical variables are independent (not associated).

    NULL HYPOTHESIS. The skeptic thinks that there is no association between the categorical variables (in this case, between gender and status), so the observed frequencies should equal the expected frequencies apart from chance differences. So the corresponding null hypothesis is that we expect the difference between observed and expected frequency in each cell to equal zero.

    ALTERNATIVE HYPOTHESIS. The scientist thinks that there IS an association between the two categorical variables. So the scientist thinks the data will differ from the expected frequencies. So the corresponding alternative hypothesis is that the difference between the observed and expected frequencies will NOT be zero.


    Statistical Conclusion

    Validity

    Here we show the sampling distribution of chi


    square again. The number line along the bottom

    of the graph goes from zero to positive infinity

    for chi square. Chi square is a squared entity -

    everything in it is squared. Even if you get

    negative numbers from your cell calculations, they are going to be squared and made into

    positive numbers. You cannot get a chi square

    below zero. If you calculate a chi square below

    zero, you made a mistake.

    So the range of the Chi Square test statistic goes from zero to positive infinity.

    What H0 predicts

    If you picture the null hypothesis in your mind,

    you'll remember that we expect the difference

    between observed and expected frequencies to be

    zero. If the data matched H0 exactly, then every term (for

    every cell) in the Chi Square formula would be

    zero. That is, there would be no differences

    between the observed and the expected


    frequencies in any cell. Therefore, each of those

    differences would be zero, zero squared is zero,

    and the whole Chi Square would be equal to

    zero. So H0 is predicting values of Chi Square

    near zero. So high values of Chi Square are not what H0 is predicting.
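The claim that a perfect match between observed and expected frequencies gives a chi square of exactly zero can be checked with a small Python sketch. The function name is hypothetical, and the "observed" table is deliberately set equal to the expected frequencies from the example; it is not the real data.

```python
def chi_square(observed, expected):
    """Sum of (fo - Fe) squared over Fe across all cells of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


expected = [[54, 66], [36, 44]]

# Hypothetical case: the observed frequencies match the expected ones exactly.
perfect_fit = chi_square(expected, expected)
print(perfect_fit)
```

Because every term is a square divided by a positive expected frequency, no choice of observed frequencies can push the result below zero.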

    The Rejection Region

    To find the critical value you need to know the

    degrees of freedom and your selected alpha

    level. Then you just look the critical value up in

    your table. The critical value of chi square, with

    one degree of freedom and alpha of .05, is 3.84.

    Chi Square tables are available on the course

    web site.

    You draw your "Reject H0" and "Do not reject

    H0" regions based on this critical value of 3.84.

    We found that our calculated chi square of

    48.49 falls in the rejection region; therefore, we


    would reject H0.

    Why we Reject H0

    H0 is predicting that there will be no difference between expected and observed frequencies

    and so you should get a chi square in the

    neighborhood of zero if H0 is true. By chance

    alone, you may get a Chi Square value bigger

    than zero. However, the probability of getting a

    value of Chi Square beyond 3.84 by chance

    alone is only .05.

    Therefore, using the logic we have used before

    with other statistical tests, we'll reject H0

    because it's very improbable that you would get

    a Chi Square of this magnitude by chance

    alone.
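The reasoning can be sketched numerically in Python. The sketch relies on a standard fact that the text does not state: with 1 degree of freedom, chi square is the square of a standard normal value, so the upper-tail probability is erfc(sqrt(x / 2)). The function name is made up for illustration; the critical value and the four cell values are taken from the example.

```python
import math


def chi_square_tail_df1(x):
    """P(chi square > x) for 1 degree of freedom, via the normal distribution."""
    return math.erfc(math.sqrt(x / 2))


critical_value = 3.84                    # from the text: df = 1, alpha = .05
cell_values = [10.67, 8.73, 16, 13.09]   # per-cell values from the text
calculated = sum(cell_values)

tail_at_critical = chi_square_tail_df1(critical_value)
tail_at_calculated = chi_square_tail_df1(calculated)
reject_h0 = calculated > critical_value

print(f"P(chi square > {critical_value}) = {tail_at_critical:.3f}")
print(f"P(chi square > {calculated:.2f}) = {tail_at_calculated:.2e}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The first probability comes out at about .05, which is what makes 3.84 the critical value for this alpha level. The second is far smaller, which is why H0 is rejected.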


    Sampling Distribution of

    Chi Square

    Let's review the sampling distribution of chi-

    square. This slide shows the overall 4 step

    process and is the same slide you saw for the

    Goodness of Fit Chi Square.

    The sampling distribution of the test statistic

    you just calculated is called the Chi Square

    probability distribution. This distribution starts

    at zero and goes to positive infinity. Notice also

    that it is not symmetrical. It's different from a

    bell curve: it has a big lump down by zero

    where most of the probability is and it has only

    one tail going off toward positive infinity.
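The shape described here can be seen in a small simulation. This is a sketch for the 1 degree of freedom case only, and it assumes that such a chi square value can be generated by squaring a single standard normal draw; the seed and sample size are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumption: with 1 degree of freedom, a chi square value can be simulated
# as the square of a single standard normal draw.
draws = [random.gauss(0, 1) ** 2 for _ in range(10_000)]

share_below_1 = sum(d < 1 for d in draws) / len(draws)
share_beyond_critical = sum(d > 3.84 for d in draws) / len(draws)

print(min(draws) >= 0)                  # no value falls below zero
print(round(share_below_1, 2))          # the "lump" near zero
print(round(share_beyond_critical, 2))  # the single tail beyond 3.84
```

Roughly two thirds of the simulated values should land below 1 and about 5 percent beyond 3.84, which matches the picture of a lump near zero with one long tail.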


    Copyright 1997, 2000 Tom Malloy
