Statistics Project - A Matter of Change


  • 8/14/2019 Statistics Project - A Matter of Change


    A Matter of Change


    AP Statistics

    Period 3

    Nadim Imam, Brian Shi


    Table of Contents

    Preface
    Experiment: Part 1
        Randomization
        Visual Representation
        More Questions
    Experiment: Part 2
        Contingency Tables
        Bar Graphs
    Experiment: Part 3
        χ² (Chi-Squared) Test of Independence
        Two Sample Z Hypothesis Test for Proportions
        Power and Error
        Two Sample Z Confidence Interval for Proportions
    More Inferences
        χ² (Chi-Squared) Test of Homogeneity
    Response Bias
        χ² (Chi-Squared) Test of Homogeneity
    Experiment: Part 4
        Conclusion and Error Report
    Appendix
    Afterword


    Preface

    We live in a world of bias.

    From the beginning of mankind to the present day, the concept of gender equality has been just that: a concept. Although we must credit the progress we have achieved, a question arises: have we really progressed as far as we may believe?

    From zero to sixty in 3.5 seconds, we drive our cars while others zoom by, but eventually we face the red light. We look to one side and see a Mercedes-Benz, but we look to the other side and see a male solicitor. Then we wonder: why do they panhandle? How do they survive? How much do they make?

    Green light; time to coast down the street; red light already? We look to one side and see a Bentley, but we look to the other side and see a homeless woman. Tough life. Hmm... Is she more likely to receive money?

    In our society, based on democratic values of equality, it would be easy to assume that they receive change at a similar frequency. But do they? Generalizing beyond the homeless population, a similar question arises: do males and females receive money at a similar frequency if they choose to ask, or is the gender of the solicitor a factor that determines how often he/she receives money from a random person?

    This question is a very interesting one, and we decided to conduct an experimental study to explore any potential relationship between the gender of a solicitor and the number of people who agree to give some money. The experimental study¹ will be conducted in the cafeteria² from 1:53 pm to 2:28 pm for two days, so any results that we collect can only be generalized to people in the cafeteria at this time. However, we hope that even in this small setting we may be able to test whether there is any slight hint of gender inequality.

    In short, we will design an experiment to answer the following question:

    Does the gender of a solicitor affect how often a random person gives him/her a requested amount of money in the school cafeteria³?

    While our experiment is designed to answer the above question, other questions may arise. Such questions will be discussed later in the report.

    What follows is a detailed procedure of our experimental study and the individual stages that were executed. Before beginning this experiment, we hypothesized that the female solicitor may receive money more often.

    ¹ The basics will be detailed later.

    ² We will discuss why later.

    ³ Our question remains restricted to the school cafeteria because the population that we draw the data from resides only in the cafeteria. Our decision to choose the cafeteria will be explained as we go on.


    Experiment: PART 1

    Planning

    Outline and Basics

    The first step of our experiment was to decide on its basic mechanics. We planned to have two solicitors ask a random person for some money. One of the solicitors would be a male while the other would be a female. We would rotate between the solicitors and record the responses.

    Based on these results, we would investigate whether there is any relationship between the gender of the solicitor and the number of people who give money to him/her. We would conduct these investigations using multiple statistical inference tests.

    Another important decision was that we chose to ask for a quarter each time. It seemed more common for a person to ask for a quarter than for a dime or nickel. We did not plan to ask for any amount higher than a quarter because a larger amount of money would influence an individual's willingness to give any money, whereas a quarter is not significant in value compared to a dollar.

    Location and Time

    The next step of the experiment was to designate a time and location. We considered several locations:

    The Library
    The Gym
    The Main Mall⁴
    The Mall
    The Cafeteria

    Given that we required a large enough sample to conduct our experiment, we ruled out the library. We considered the gym because a lot of people usually hang around there. However, we realized that it was AP testing week and many students were cleared out. Furthermore, most students in the gym were dressed out in their uniforms and were very unlikely to carry any bags, purses, or wallets. Next we considered the main mall. At first we felt that the main mall offered a large sample of students, so we would have been able to generalize any results to the school. However, the large number of students stays in the main mall only very briefly, about five to seven minutes. We felt that this would not give us adequate time to conduct an experiment with a decent-sized sample, so we ruled the main mall out. We also considered the shopping mall. The mall provided a very large sample of people, and we would have been able to generalize any results to the population of people who went to the mall. Not only is this a larger population than the population of students at school, it also includes adults of all ages, which would have allowed us to generalize our results to a broad range of ages rather than only students between the ages of 14 and 18. Unfortunately, like the preceding options, the mall also had its cons. We realized that approaching such a large and broad range of people could be dangerous⁵; something bad could happen unexpectedly.

    ⁴ The large hallway that connects the A wing and the B wing, starting at the cafeteria and ending at the C wing hallway.

    After all these considerations we were left with the school cafeteria. The cafeteria offered a large sample of students and, thanks to the hall monitors, they were forced to remain there. By conducting our experiment in the cafeteria we were able to sample the students for about 40 minutes, which we believed was an adequate amount of time to collect sufficient data for our experiment.

    Since we settled on collecting data in the school cafeteria, the time at which to conduct the experiment became relatively easy to decide: it would be the time that we had lunch, from 1:53 to 2:28. However, this meant that we would have to sacrifice our lunch to conduct the experiment.

    Randomization

    After deciding on a location, we needed to decide on a way to randomize every step of the experiment in order to reduce bias. We decided that our main source of randomness would be a random number generator. We assigned the male the number zero and the female the number one, the male and female being the two different treatments. We used the random number generator from the Texas Instruments TI-89 Titanium⁶ to choose either zero or one at random. When the randint( ) function output a zero, we would ask the male solicitor to ask a person for a quarter. When the randint( ) function output a one, we would ask the female solicitor to ask a person for a quarter. Both solicitors would ask the exact same question: "Do you have a quarter I can have?" The question was standardized in order to reduce the effect of any response bias.
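A minimal Python sketch of the assignment rule described above. It is only an illustration: the report used the TI-89 randint( ) function, and Python's `random.randint(0, 1)` is assumed here as a stand-in. The name `assign_solicitor` is hypothetical.

```python
import random

# Treatment labels as described in the report: 0 = male solicitor, 1 = female solicitor.
TREATMENTS = {0: "male", 1: "female"}

def assign_solicitor(rng=random):
    """Pick the solicitor for the next respondent with a fair 0/1 draw."""
    return TREATMENTS[rng.randint(0, 1)]

if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed only so the illustration is repeatable
    print([assign_solicitor(rng) for _ in range(10)])
```

Because each draw is independent, the two treatment groups need not end up the same size, which is consistent with the unequal group totals reported later (63 and 70).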

    ⁵ We were always told never to talk to strangers.

    ⁶ TI-89 Titanium Operating System v3.10 with the Statistics with List Editor App for the TI-89 Titanium is needed.


    Some Considerations

    When designing this experiment we took several considerations into account. First, we decided that any volunteer solicitor would have to be an average person, neither well liked nor despised. We also considered conducting a blinded experiment by not informing the solicitors why they were asking for a quarter. However, realistically speaking, if we had done that, no one would have volunteered to be a solicitor for us. Therefore we decided to tell both solicitors that they were helping us in a statistical experiment.

    Another extremely important consideration was what we should do with the money. We decided that if the random person agreed to give a quarter, we would tell them it was a statistical experiment and return the quarter. Furthermore, when the solicitors went to ask for the quarter, we would stay at a distance in order not to influence a respondent's decision.

    Each time a person was chosen at random, we would ask the solicitors if they knew the person at all. If they did, we would skip that person and ask the next 12th person.

    After considering these factors we proceeded to one of the most crucial parts of the experiment: finding volunteers to be the average solicitor.

    Volunteers

    Finding the volunteers was a bit hard because not many people were willing to be a solicitor. Some responses included: "Yeah, I'm no hobo," "Go ask yourself," "I have work to do" (don't we all), and "Maybe later." After a while we found a sophomore male student in the library who would volunteer as a solicitor. We were also able to find a female junior who agreed to help us. The male was African American while the female was Caucasian. With our volunteers found, we were able to execute the experiment.

    [Figure: Visual Representation of our experiment. Respondents (males and females) are assigned at random* to Treatment one (Male Solicitor) or Treatment two (Female Solicitor); the solicitor asks "Do you have a quarter I can have?" and the responses are used to make inferences.]

    *To understand the randomization process, please refer to the section Randomization above.


    More Questions

    After planning out our experiment we were met with two other questions that we decided to explore as well:

    Is there a relation between solicitors asking across genders, such as male to male and male to female, or female to female and female to male?

    Is there a relation between how the solicitor asks the question and the response that they get?

    To explore the first question, we decided to further categorize the data that we had planned to take in our initial experiment into male-yes, male-no, female-yes, and female-no. Doing it this way, we can explore the first question and then combine male-yes with female-yes and male-no with female-no into total yes and no counts to explore our initial question regarding gender equality.

    To explore our second question, we decided to have one of the solicitors ask two different questions: one that is biased and one that is not. In this case we will not consider the gender of the random person and will focus our attention on the person's response based on the question that he/she is asked. We will use the data that we collect for the male solicitor as the non-biased data⁷. Then we will ask the male solicitor to ask in a biased form afterwards and add that to our data table to form a non-biased vs. biased data table.

    The biased question will be: "Do you have a quarter I can have? I'm not paying you back."

    This question carries a negative bias; therefore we hypothesize that there may be a difference in responses.

    Summing up our proposed procedure, we have several relations:

    Initial Question-Variables: See the relationship between gender overall to see who is more likely to get the quarter. The independent variable is the treatment of a male or female solicitor. The dependent variable is the response that is received.

    Further Questions 1-Variables: See relationships across genders to see how each gender responds to the question. The independent variable is the treatment of a male or female solicitor. The dependent variable is the response that is received.

    Further Questions 2-Variables: See whether the biased question is less effective in soliciting a quarter. The independent variable is the treatment of a biased or non-biased question. The dependent variable is the response that is received.

    ⁷ We do not have time to conduct a separate experiment; therefore we decided to reuse data that are independent of one another.


    Experiment: PART 2

    Execution

    Day One

    On our first day we gathered data on the question: "Do you have a quarter I can have?" The two solicitors we had chosen would go up to respondents and ask that question many, many times. As the experiment progressed, we observed distinct patterns in the responses of the respondents.

    As our solicitors cordially asked their questions, it appeared that women were more likely to give a definite answer. Unlike their male counterparts, women seemed to know whether they had the change or not, so they were quick to respond either no or yes. Men, on the contrary, often fumbled through their pockets, checked their wallets, or patted their backpacks before giving a definite answer. On rare occasions we also came upon those who offered lesser amounts of money, i.e., dimes and nickels.

    Other behavioral characteristics included response methods. Of the men who agreed to give the change, some quickly found it, handed it over, and walked away. Others stopped to find it, handed it over, and waited for a "thanks." Interestingly enough, most women who responded "yes" almost always stopped to look through their purses, unlike the "drive-by give" of their male counterparts.

    There were also distinct patterns among those who said no. Most commonly, men would check their pockets and give a solemn look of "sorry" before they walked away. Others blatantly walked away as if they had not heard the question.

    Some gender-to-gender interactions were also noticeable. When our female solicitor approached a male respondent, he rarely walked away without any response. Our male solicitor, by contrast, had fairly equal response rates across genders. When our female solicitor approached a female respondent we saw similar results; the female respondent was rarely unresponsive.

    During day one, one of us worked the number generator while the other recorded the data. The solicitors remained the same.

    Day Two

    On day two, we gathered data on the question: "Do you have a quarter I can have? I can't pay you back." In contrast to the original question, we wondered how people would respond if they were blatantly told this wasn't a loan. Initially our solicitor asked the question awkwardly, as if it were scripted, but eventually he became comfortable with the question at hand. Although both questions solicit the same item, change, and so in principle should gather the same amount, we predicted that the biased question would gather far less change than the original question.


    Many respondents initially did not understand the question, or took time to think about it. Most seemed to look at our solicitor awkwardly and slowly said, in an almost questioning manner, "No?" Some stalled for time with a casual "uh," and as we had predicted there were fewer "Yes" responses. The behavioral characteristics noted above applied to our biased question as they did before. At the conclusion of our experiment, our solicitor seemed to feel exploited and demanded to be relieved of the job.

    During day two, we switched roles: one of us worked the number generator while the other recorded the data. The solicitors remained the same.

    Day Two Continued

    Organizing the Data

    In organizing our data we considered several options. However, since our data are categorical rather than quantitative, we cannot use a histogram, nor can we use a five-number summary, to accurately describe our data.

    Table 1
    Question: "Do you have a quarter I can have?"
    (Rows: solicitor. Columns: respondent's gender and response.)

    Solicitor   Male Yes   Male No   Female Yes   Female No   Total
    Male               8        28            7          20      63
    Female             9        23           12          26      70
    Total             17        51           19          46     133

    Table 2
    Question: "Do you have a quarter I can have?"

    Solicitor   Yes   No   Total
    Male         15   48      63
    Female       21   49      70
    Total        36   97     133

    Table 3
    Question 1 (Unbiased): "Do you have a quarter I can have?"
    Question 2 (Biased): "Do you have a quarter I can have? I'm not paying you back."

    Question     Yes   No   Total
    Non-Biased    15   48      63
    Biased         3   34      37
    Total         18   82     100
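The relationship between Table 1 and Table 2 (summing the male and female respondents for each solicitor) can be checked with a short Python sketch. The dictionary layout and the helper name `collapse` are illustrative assumptions; the counts are taken from Table 1.

```python
# Table 1 counts: solicitor -> (respondent gender, response) -> count
table1 = {
    "Male":   {("Male", "Yes"): 8, ("Male", "No"): 28,
               ("Female", "Yes"): 7, ("Female", "No"): 20},
    "Female": {("Male", "Yes"): 9, ("Male", "No"): 23,
               ("Female", "Yes"): 12, ("Female", "No"): 26},
}

def collapse(table):
    """Sum over respondent gender, leaving solicitor -> response -> count."""
    out = {}
    for solicitor, cells in table.items():
        totals = {"Yes": 0, "No": 0}
        for (_gender, response), count in cells.items():
            totals[response] += count
        out[solicitor] = totals
    return out

table2 = collapse(table1)
print(table2)
```

The collapsed counts agree with Table 2: 15 yes and 48 no for the male solicitor, 21 yes and 49 no for the female solicitor.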


    [Bar graph of Table 1: Males vs. Females across gender. Categories: Male Yes, Male No, Female Yes, Female No; series: Male, Female.]

    [Bar graph of Table 2: Males vs. Females. Categories: Yes Response, No Response; series: Males, Females.]


    [Bar graph of Table 3: Biased vs. Non-Biased. Categories: Yes, No; series: Non-Biased, Biased.]

    Initially we expressed our data in bar graphs, but we decided to focus on the contingency tables because they could further be used for the χ² (chi-square) test. Furthermore, the contingency tables allow us to look at the categorical data in numeric terms, while the bar graphs give a visual representation without specific numbers.

    Since our data are categorical, we had no way to find the mean of our data, and the mean would not provide any benefit in any case. As stated above, a five-number summary would not provide any insight either, as our data are categorical. Lastly, as with the mean and five-number summary, we decided that there was also no need for a standard deviation⁸.

    ⁸ However, we did use the standard error for our Z-tests.


    Experiment: PART 3

    Making Inferences

    χ² (Chi-Squared) Test of Independence

    Population of interest: Students in the Mc Neil high school cafeteria.

    Ho: The results of male and female solicitors asking the question "Do you have a quarter I can have?" are independent of gender.

    Ha: The results of male and female solicitors asking the question "Do you have a quarter I can have?" are NOT independent of gender.

    α = 0.05

    Conditions:

    Counted data condition:

    The data must be in counts for the categories of a categorical variable.

    Independence Assumption:

    Randomization Condition: The individuals who have been counted and whose counts are

    available for analysis should have been randomly selected.

    Sample Size Assumption:

    Expected Cell Frequency: We should expect to see at least 5 individuals in each cell.

    Since:

    The gathered data are in counts, as we counted the number of yes and no responses in our data table. Each individual received a treatment that was randomized by the Texas Instruments TI-89 Titanium randint( ) function, so it is reasonable to assume that the randomization condition is met. All expected cell counts are at least 5 (we have checked this in the calculations), so the sample size is large enough and the assumptions are met.

    Then:

    It is reasonable to proceed with the hypothesis test: χ² (Chi-Squared) Test of Independence, with degrees of freedom (Rows − 1) × (Columns − 1) = (2 − 1) × (2 − 1) = 1.


    Calculations:

    General Inference for Independence

    Since the P-value of 0.4224 is greater than our alpha level (α = 0.05), we fail to reject Ho. There is not sufficient evidence to claim that the results of male and female solicitors asking the question "Do you have a quarter I can have?" are not independent of gender.
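The test statistic and P-value can be recomputed from the Table 2 counts with a short Python sketch. It assumes the standard Pearson χ² statistic without a continuity correction, and the helper names `expected_counts` and `chi_square` are illustrative. For 1 degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so only the standard library is needed.

```python
from math import erfc, sqrt

observed = [[15, 48],   # male solicitor: yes, no
            [21, 49]]   # female solicitor: yes, no

def expected_counts(obs):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(r) for r in obs]
    col_totals = [sum(c) for c in zip(*obs)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(obs):
    """Pearson chi-square statistic, sum of (O - E)^2 / E over all cells."""
    exp = expected_counts(obs)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(obs, exp)
               for o, e in zip(o_row, e_row))

expected = expected_counts(observed)
stat = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = erfc(sqrt(stat / 2))  # valid for df = 1 only
print(round(stat, 4), df, round(p_value, 4))
```

Under these assumptions the P-value agrees with the reported 0.4224, and every expected count is well above 5.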


    Two Sample Z Hypothesis Test for Proportions

    Population of Interest: Students in the Mc Neil high school cafeteria.

    P1: The true proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a female.

    P2: The true proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a male.

    Ho: P1 − P2 = 0

    Ha: P1 − P2 ≠ 0

    α = 0.05

    Conditions:

    Independence Assumption:

    Randomization Condition: Participants must be randomly assigned to experimental treatment

    groups.

    Sample Size condition: Each sample must be reasonably less than 10% of their respective

    populations.

    Success Failure Assumption:

    There must be at least 10 successes and 10 failures in order for the sample size to be large

    enough.

    Since:

    The individuals asked were randomly assigned to either the male or the female solicitor by the Texas Instruments TI-89 Titanium randint( ) function, so it is reasonable to assume that the randomization condition is met. It is reasonable to assume that the people asked make up less than 10% of all students in the cafeteria. There are at least 10 successes and 10 failures for both proportions: the male solicitor had 15 successes and 48 failures, and the female solicitor had 21 successes and 49 failures.

    Then:

    It is reasonable to proceed and use the normal model to conduct a Two Sample Z Hypothesis Test for Proportions.


    Calculations:


    General Inference for Difference in Gender

    Since the P-value of 0.4313 is greater than our alpha level (α = 0.05), we fail to reject Ho. There is not sufficient evidence to claim that the true proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a female is different from the true proportion of Mc Neil High School students who answer "Yes" when asked by a male.
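A hedged Python sketch of a two-proportion z-test on the same counts. It assumes the pooled standard error, which is the usual choice when Ho states that the proportions are equal; the report's own calculator steps are not preserved in this transcript, so this is not necessarily the exact procedure that produced 0.4313. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

yes_f, n_f = 21, 70   # female solicitor (P1): yes responses, people asked
yes_m, n_m = 15, 63   # male solicitor (P2): yes responses, people asked

p1, p2 = yes_f / n_f, yes_m / n_m
p_pooled = (yes_f + yes_m) / (n_f + n_m)

# Pooled standard error under Ho: P1 - P2 = 0 (an assumption of this sketch)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))
z = (p1 - p2) / se_pooled

# Two-sided P-value from the standard normal model
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_value, 4))
```

With the pooled standard error the P-value comes out near 0.42, close to but not identical to the 0.4313 reported above; either value is far above α = 0.05, so the conclusion is unchanged.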


    Type I Error: We reject the null hypothesis when it is in fact true.

    In context: We conclude that there is a difference between the proportions of students who say yes to the male and to the female solicitor, suggesting inequality due to gender, when in fact there is none.

    Type II Error: We fail to reject the null hypothesis when it is in fact false.

    In context: We conclude that there is no difference between the proportions of students who say yes to the male and to the female solicitor, suggesting equality between the genders, when in fact there is a difference.

    Power: (1 − β)

    The power of a test is the probability that it correctly rejects a false null hypothesis. The distance between the null hypothesis value, Po, and the truth, P, is the effect size. Reducing the type I error rate increases the type II error rate, and vice versa. By increasing α we decrease β, which increases power (the ability to avoid a type II error), but as a result we increase the chance of a type I error. The best way to increase power is to increase the sample size. A larger sample decreases the standard error, which lowers the type II error rate without raising α.
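The effect of sample size on power can be illustrated with a normal-approximation sketch in Python. The "true" proportions 0.30 and 0.24 below are hypothetical values chosen only for illustration (they are not results from the report), and `approx_power` is an illustrative helper, not a calculator function.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-proportion z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # SE assuming Ho
    se_true = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE under the assumed truth
    d = abs(p1 - p2)
    upper = 1 - nd.cdf((z_crit * se_null - d) / se_true)  # reject in the upper tail
    lower = nd.cdf((-z_crit * se_null - d) / se_true)     # reject in the lower tail
    return upper + lower

# Same assumed effect size, group sizes scaled up from 70 and 63
for scale in (1, 4, 16):
    print(scale, round(approx_power(0.30, 0.24, 70 * scale, 63 * scale), 3))
```

Holding α at 0.05, the printed power rises as the groups grow, which is the sense in which a larger sample reduces the type II error rate.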


    Two Sample Z Confidence Interval for Proportions

    Population of Interest: Students in the Mc Neil high school cafeteria.

    P1: The true proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a female.

    P2: The true proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a male.

    We will use a 95% confidence interval.

    Conditions:

    Independence Assumption:

    Randomization Condition: Participants must be randomly assigned to experimental treatment groups.

    Sample Size condition: Each sample must be reasonably less than 10% of its respective population.

    Success Failure Assumption:

    There must be at least 10 successes and 10 failures in order for the sample size to be large enough.

    Since:

    The individuals asked were randomly assigned to either the male or the female solicitor by the Texas Instruments TI-89 Titanium randint( ) function, so it is reasonable to assume that the randomization condition is met. It is reasonable to assume that the people asked make up less than 10% of all students in the cafeteria. There are at least 10 successes and 10 failures for both proportions: the male solicitor had 15 successes and 48 failures, and the female solicitor had 21 successes and 49 failures.

    Then:

    It is reasonable then to proceed with the Two Proportion Z Confidence Interval using the normal model

    with 95% confidence.

    Calculations:


    General Inference for Male vs. Female Soliciting

    Based on these samples, we are 95% confident that the true difference between the proportion of Mc Neil High School students who answer "Yes" to the question "Do you have a quarter I can have?" when asked by a female and the proportion who answer "Yes" when asked by a male is between -0.2121 and 0.0883.

    If we randomly and independently sampled from the two populations many, many times, about 95 out of every 100 intervals constructed this way would capture the true difference between the proportion of Mc Neil High School students who answer "Yes" when asked by a female and the proportion who answer "Yes" when asked by a male.
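The interval can be recomputed from the counts with a short Python sketch. It assumes the usual unpooled standard error for a two-proportion interval; the variable names are illustrative. The reported endpoints are reproduced when the difference is taken as male minus female, so that ordering is used here as an assumption about how the calculator was set up.

```python
from math import sqrt
from statistics import NormalDist

yes_f, n_f = 21, 70   # female solicitor
yes_m, n_m = 15, 63   # male solicitor
p_f, p_m = yes_f / n_f, yes_m / n_m

z_star = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence
se = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)  # unpooled standard error

diff = p_m - p_f  # assumed order: male minus female
low, high = diff - z_star * se, diff + z_star * se
print(round(low, 4), round(high, 4))
```

The computed endpoints agree with the reported interval to about three decimal places, and the interval contains zero.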

    What Does it ALL Mean?

    From the above inferences we have observed several trends. First of all, we found no significant evidence that the responses to the question "Do you have a quarter I can have?" depend on the gender of the solicitor. This suggests that gender does not affect a respondent's reaction, nor whether they agree to give change. From the hypothesis test for proportions, we found no sufficient evidence of a difference in the proportion of respondents who said "Yes" to our male or female solicitor. In context, this means that our data do not show that being a certain gender factors into receiving the change; males and females appear to have a similar opportunity to receive the favorable answer: "Yes." Furthermore, the confidence interval for the difference between the two proportions includes zero. This means that it is plausible that there is no difference between the responses to our male and female solicitor. Based on these data, it is plausible that there may in fact be gender equality, at least in the cafeteria.


    More Inferences

    χ² (Chi-Squared) Test of Homogeneity

    Population of interest: Students in the McNeil High School cafeteria.

    Ho: The results of male and female solicitors asking the question "Do you have a quarter I can have?" are independent of gender.

    Ha: The results of male and female solicitors asking the question "Do you have a quarter I can have?" are NOT independent of gender.

    α = 0.05

    Conditions:

    Counted data condition:

    The data must be in counts for the categories of a categorical variable.

    Independence Assumption:

    Randomization Condition: The individuals who have been counted and whose counts are

    available for analysis should have been randomly selected. The samples must be independent.

    Sample Size Assumption:

    Expected Cell Frequency: We should expect to see at least 5 individuals in each cell.

    Since:

    The gathered data are in counts (we counted the number of yes and no responses from the random people). The individuals asked were randomized by the Texas Instruments TI-89 Titanium randint( ) function, so it is reasonable to assume that the randomization condition is met. All expected cell frequency counts are at least 5, so the sample size is large enough and the assumption is met.

    Then:

    It is reasonable to proceed with the hypothesis test: χ² (Chi-Squared) Test of Homogeneity, with degrees of freedom (Rows − 1) × (Columns − 1) = 3.
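    The expected cell counts, the test statistic, and the degrees of freedom can be sketched as follows. The 2 × 4 table is hypothetical (it is not the project's data) and was chosen only because it gives 3 degrees of freedom; the function name is also an assumption made for illustration.

```python
def chi_squared_homogeneity(observed):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count for each cell: (row total * column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical 2 x 4 table, NOT the project's data.
# Rows: "Yes", "No"; columns: four solicitor/respondent gender pairings.
observed = [
    [10, 12, 8, 10],
    [30, 28, 32, 30],
]
statistic, df, expected = chi_squared_homogeneity(observed)
print(f"chi-squared = {statistic:.4f}, df = {df}")
print("smallest expected count:", min(min(row) for row in expected))
```

    Checking the smallest expected count against 5 is the Expected Cell Frequency condition listed above.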


    Calculations:

    [Figure: χ² distribution curve for the test of homogeneity]


    General Inference of Gender To Gender Soliciting

    Since the P-value of 0.5153 is greater than any reasonable alpha value (α = 0.05), we fail to reject Ho. There is not sufficient evidence to suggest that the results of male and female solicitors asking the question "Do you have a quarter I can have?" across gender have different distributions.

    What Does it ALL Mean?

    From the above test we have concluded that across genders (a male asking men and women, and a female asking men and women) there is not enough evidence to suggest that the response distributions among the different categories are different. In context, this means that our male and female solicitors received similar proportions of responses from men and from women. So based on these results, it does not matter whom a solicitor asks; he or she would probably get similar responses.


    Response Bias

    χ² (Chi-Squared) Test of Homogeneity

    Population of interest: Students in the McNeil High School cafeteria.

    Ho: The results of a solicitor asking the question "Do you have a quarter I can have?" and asking "Do you have a quarter I can have? I can't pay you back." have the same distribution.

    Ha: The results of a solicitor asking the question "Do you have a quarter I can have?" and asking "Do you have a quarter I can have? I can't pay you back." do not have the same distribution.

    α = 0.05

    Conditions:

    Counted data condition:

    The data must be in counts for the categories of a categorical variable.

    Independence Assumption:

    Randomization Condition: The individuals who have been counted and whose counts are

    available for analysis should have been randomly selected. The samples must be independent.

    Sample Size Assumption:

    Expected Cell Frequency: We should expect to see at least 5 individuals in each cell.

    Since:

    The gathered data are in counts (we counted the number of yes and no responses from the random people). The individuals asked were randomized by the Texas Instruments TI-89 Titanium randint( ) function, so it is reasonable to assume that the randomization condition is met. All expected cell frequency counts are at least 5, so the sample size is large enough and the assumption is met.

    Then:

    It is reasonable to proceed with the hypothesis test: χ² (Chi-Squared) Test of Homogeneity, with degrees of freedom (Rows − 1) × (Columns − 1) = 1.

    Calculations:


    General Inference for Response Bias

    Since the P-value of 0.0484 is less than the alpha of 0.05, we reject Ho. There is sufficient evidence to claim that the results of a solicitor asking "Do you have a quarter I can have?" and asking "Do you have a quarter I can have? I can't pay you back." do not have the same distribution.
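    For a test with 1 degree of freedom, the P-value and the decision can be sketched with the standard library alone. The statistic below is a hypothetical value picked for illustration (the report states only the P-value), and the function name is an assumption.

```python
from math import erfc, sqrt


def chi_squared_p_value_df1(statistic):
    """Upper-tail P-value for a chi-squared statistic with 1 degree of freedom."""
    # With df = 1 the statistic is the square of a standard normal variable,
    # so P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(statistic / 2))


alpha = 0.05
# Hypothetical statistic, NOT taken from the report.
p_value = chi_squared_p_value_df1(3.9)
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"P-value = {p_value:.4f}: {decision}")
```

    The shortcut through erfc works only for df = 1; tables with more degrees of freedom need the general χ² distribution.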

    [Figure: χ² distribution curve for the response bias test]


    Experiment: PART 4

    Conclusion and Error Report

    Based on our experiment we found that there is considerable gender equality within the cafeteria. For the most part, the male and female solicitors would have collected about the same number of quarters if they had decided to keep them. Contrary to our initial belief that the female would receive more money, we discovered that the gender difference had no profound effect on the money received. In general, we believe that it is not worth it to ask random people for money, as it does not yield a substantial profit [9]. The P-values for our tests were generally higher than any reasonable alpha, except for the response bias test. Therefore, apart from that test, our results were not statistically significant.

    Going back to our initial question (in case it has been forgotten, it is: "Does the gender of a solicitor affect the frequency that a random person would give him/her a requested amount of money in the school cafeteria?"), we feel that there is no difference in how often a person receives money depending on gender. In short, we believe that our inference tests, based on the data that we collected, suggest that it does not matter what gender an individual is; he or she would probably be faced with the same number of yes and no responses from any random person they happen to choose.

    When conducting this experiment, several factors contributed to a list of potential errors that probably occurred. First of all, we felt that we were too restricted in the cafeteria. We initially wanted to find out how equal males and females were; however, we had to settle for how equal the males and females in the cafeteria were.

    There were also a number of discrepancies that occurred:

    Towards the end of the lunch period our solicitors began to ask for quarters half-heartedly, which pretty much translated as "I don't need a quarter" to the random person.

    Some of the people we asked were sitting in groups and were influenced by their peers. At first,

    when the solicitor asked for a quarter the random person responded no. However sometimes

    their peers commented on how they should give (rarely though!) and they ended up giving.

    At times some of the random people were too preoccupied eating or talking and totally ignored

    the solicitor.

    Some of the people that we told our solicitors to ask actually spotted us from a distance (they actually saw us writing stuff down). Though we cannot give direct proof of the bias that this caused, we can assume it had an effect because almost immediately upon seeing us they shook their heads.

    [9] Do we see another experiment coming up!?


    Our solicitors tried to cheat us at times and asked their friends for a quarter. We realized something was wrong and quickly asked the person if he/she knew the solicitor. If they said yes, we would cross out the data collected from that trial.

    Our solicitors also tried to cheat us by pretending to ask someone. At times, they felt embarrassed to ask some people, whether due to awkwardness or intimidation. They would also pass off their own quarters as ones they had received.

    We cannot guarantee that the random number generator on the TI-89 Titanium is perfectly random, as it follows an algorithm to generate its random numbers. Such algorithms contradict the definition of randomness, since algorithms follow a fixed procedure. For our purposes [10] we believe that the TI-89 Titanium is sufficient.

    Due to time constraints we were forced to reuse data. Although we believe that the data we reused are independent across the different experiments, we would need to formally test that in order to make sure. So there is a potential that the reused data are related across the gender bias and response bias experiments.

    The week in which we conducted the experiment was during the AP testing week so there were

    some people that were gone during the lunch period. Therefore we had some of the population

    missing.
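    The point in the list above about the calculator's generator following an algorithm can be illustrated with Python's own pseudo-random generator. This is a sketch of the general idea only: it says nothing about how the TI-89 Titanium works internally, and the seed, the function name, and the range of 1 to 40 are hypothetical.

```python
import random


def pick_numbers(seed, n_picks, upper):
    """Draw integers from 1..upper with a seeded pseudo-random generator."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [rng.randint(1, upper) for _ in range(n_picks)]


# Hypothetical setup: 5 picks from the numbers 1 to 40.
first_run = pick_numbers(seed=2019, n_picks=5, upper=40)
second_run = pick_numbers(seed=2019, n_picks=5, upper=40)
print(first_run == second_run)  # True: the same seed replays the same sequence
```

    A sequence that can be replayed exactly is not random in the strict sense, but for choosing whom to ask it behaves unpredictably enough.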

    However, the BIGGEST problem for us was that the experiment did not go as we had anticipated. We expected more random people to respond yes to our solicitors; however, we discovered that the data were very one-sided. There just were not enough people saying yes, so we had to compensate by taking a larger sample until we felt that there was an adequate number of successes and failures.

    Overall, we felt that this experiment went relatively smoothly. We were able to collect our data, and the solicitors cooperated somewhat. However, like most experiments, the reality of the process was far different from the theory of our planning. All in all, we were able to conduct an investigation and, as a result, find a direction for our question. Though we have not fully established that the genders are equal, our original hypothesis that the female would receive a more favorable response was discredited.

    [10] Pseudo-random; this is probably good enough for us.


    APPENDIX

    Normal Distribution (Gaussian Distribution)

    The Normal Distribution, also known as the Gaussian Distribution, is a probability distribution that describes how data are spread around an average value. The function is bell shaped, with its peak at the mean, and is known as the bell curve. The distribution is named after Carl Friedrich Gauss, who used it for the analysis of astronomical data. Its formula is defined by the probability density function. In order to use the normal model, a distribution must be symmetrical.

    The empirical rule gives the area under the curve within 1, 2, and 3 standard deviations of the mean. The first interval includes approximately 68% of the distribution, the second interval includes 95% of the distribution, and the final interval includes 99.7% of the distribution. The PDF for the Normal Distribution with mean μ and standard deviation σ is:

    $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
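    The empirical rule can be checked numerically. The sketch below assumes the standard normal density formula and uses Python's math.erf (the error function described later in this appendix); the function names are chosen only for illustration.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal model with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

    The three areas come out to about 0.6827, 0.9545, and 0.9973, which is where the rounded 68%, 95%, and 99.7% figures come from.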

    Chi-Square Distribution

    Given that the assumptions are met, the Chi-Square Distribution is used in statistical significance tests. Through this method, the test statistics can be shown to have distributions that approximate a right-skewed, heavy-tailed Chi-Square distribution, given that the null hypothesis is true. Common Chi-Square tests include the tests of goodness of fit, homogeneity, and independence. A Chi-Square test is conducted to make inferences on counts across several different categories. The Chi-Square test will always be a one-sided test; there is no two-sided version as there is with the Normal Distribution and the t Distribution. The PDF for the Chi-Square Distribution with k degrees of freedom is:

    $$f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \quad x > 0$$
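    As a rough illustration of how a one-sided P-value comes out of this density, the sketch below evaluates the standard χ² density and approximates the upper-tail area with a midpoint sum. The statistic and the function names are hypothetical; a calculator or statistics package would normally do this step.

```python
from math import exp, gamma


def chi_squared_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))


def upper_tail_area(statistic, k, steps=10_000):
    """Approximate P(X >= statistic) as 1 minus a midpoint-rule integral."""
    width = statistic / steps
    lower_area = sum(
        chi_squared_pdf((i + 0.5) * width, k) * width for i in range(steps)
    )
    return 1 - lower_area


# Hypothetical statistic, NOT a value from the report.
print(round(upper_tail_area(2.0, k=3), 3))
```

    The midpoint sum is used here because it never evaluates the density at x = 0, where the formula is awkward for small k.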


    General Conditions:

    Counted data condition:

    The data must be in counts for the categories of a categorical variable.

    Independence Assumption:

    Randomization Condition: The individuals who have been counted and whose counts are

    available for analysis should have been randomly selected. The samples must be independent.

    Sample Size Assumption:

    Expected Cell Frequency: We should expect to see at least 5 individuals in each cell.

    Gamma Function

    Though not explicitly used in our calculations, the Gamma Function plays an important role in many distributions such as the χ² Distribution and the Student's t Distribution. The gamma function is denoted by the capital Greek letter Γ. The formal definition of the gamma function is:

    $$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt$$

    The Gamma function extends the notion of the factorial to all real numbers excluding the negative integers. Basically,

    $$x! = \Gamma(x + 1)$$

    for all values of x excluding the negative integers. Below is a plot of the gamma function.

    [Figure: plot of the gamma function]
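    The link between the gamma function and the factorial described above can be checked directly with Python's math.gamma. This is a small illustrative check, not part of the project's calculations.

```python
from math import factorial, gamma, pi, sqrt

# Gamma(n + 1) reproduces n! for non-negative integers.
for n in range(6):
    print(n, factorial(n), gamma(n + 1))

# It also fills in values between the integers, e.g. Gamma(1/2) = sqrt(pi).
half_value = gamma(0.5)
print(half_value, sqrt(pi))
```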


    Error Function

    The error function is defined as:

    $$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^{x} e^{-t^2}\, dt$$

    The error function has many uses in Statistics and is instrumental in calculations involving the normal model. The derivatives of the family of curves based on the error function are the probability density functions for the normal model. Below is a plot of the error function.

    Degrees of Freedom

    By definition, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. The degrees of freedom for an estimate are equal to the number of independent data values that go into the estimate minus the number of parameters estimated. Several statistical distributions, such as the Student's t and the Chi-Squared Distributions, use degrees of freedom as a parameter. The degrees of freedom (df) emerge from the residual sum of squares. Although the term is commonly used among the different distributions, the degrees of freedom are often calculated by different methods and may not correspond to one another.

    PDFs

    The probability density function is a function that gives the probability density (the relative likelihood) at a given x-value. To find the probability that a random variable falls in a given interval, one takes the integral of the probability density function over that interval. By definition, the probability of a random variable falling within a given interval (take the interval [a, b] for instance) is equal to:

    The sum of all probabilities within the interval [a, b] = P(a) or P(a+Δx) or P(a+2Δx) or … or P(b−Δx) or P(b)

    = P(a) + P(a+Δx) + P(a+2Δx) + … + P(b−Δx) + P(b), where each term P(x) = f(x)Δx is the area of a thin strip of width Δx under the density f. As Δx shrinks toward zero, this sum becomes:

    $$P(a \le X \le b) = \int_a^b f(x)\, dx$$
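    The idea of adding up thin strips can be sketched numerically. The example below assumes the standard normal density and compares the strip sum with the closed form from math.erf; the function names, the interval, and the number of strips are arbitrary choices made for illustration.

```python
from math import erf, exp, pi, sqrt


def standard_normal_pdf(x):
    """Density of the normal model with mean 0 and standard deviation 1."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def probability_between(pdf, a, b, strips=10_000):
    """Approximate P(a <= X <= b) by summing thin strips of area pdf(x) * dx."""
    dx = (b - a) / strips
    return sum(pdf(a + (i + 0.5) * dx) * dx for i in range(strips))


approx = probability_between(standard_normal_pdf, -1.96, 1.96)
exact = erf(1.96 / sqrt(2))  # closed form for the standard normal model
print(f"{approx:.4f} vs {exact:.4f}")
```

    Both values round to 0.9500, which is why 1.96 standard errors are used for a 95% confidence interval.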

    [Figure: plot of the error function]


    Afterword

    We live in a world of bias.

    From the beginning of mankind to the present day, the concept of gender equality has been just that, a concept. Although we must credit the progress we have achieved, a question arises: have we really progressed as far as we may believe?

    From zero to sixty in 3.5 seconds, we drive our cars while others zoom by, but eventually we face the red light. We look to one side and see a Mercedes-Benz, but we look to the other side and see a male solicitor. Then we wonder: why do they panhandle? How do they survive? How much do they make? Green light; time to coast down the street; red light already? We look to one side and see a Bentley, but we look to the other side and see a homeless woman. Tough life. Hmm, it seems reasonable to assume that she makes the same as her male counterpart. Doesn't it?

    Although our experiment suggested that high school students may give money evenly without regard to gender, can we make a judgment on the nature of gender influence in other settings? In short, no, because we have to consider the confounding factors that may exist in the decision-making process. First, our experiment was done on high school students, who fall within a narrow age range. The age of a person could be a significant factor in how they respond to gender-influenced questions, partly because the brain develops during the teenage years and matures during adulthood, causing us to think in different ways. Furthermore, our data are based on students cordially asking for money, not asking out of a need to survive as the panhandlers on the road may.

    Were we to do this experiment again, we could better investigate gender influence and biased questioning. During this experiment we were short on time for gathering data. Furthermore, we were restricted to the cafeteria of McNeil High School. Since our data were collected within a narrow setting, we cannot generalize past what we sampled. Our generalizations are constrained to the population of McNeil High School students rather than students in general or around the country. Better places to conduct our experiment may have been public parks or the mall, because they include a better representation of the population at large. Due to time constraints and potential liability issues we stayed away from those places.

    Now that we have summarized our results from this experiment, we look to a broader question. We wonder how people would react if they were faced with a person with a minor injury. How would men react compared to women? Does gender have a place in this question? It's too bad we have run out of time in this edition, but maybe we will investigate this new issue in a future edition.