Statistical Reasoning For Communication Majors

Mean

► This is a common statistic, and it’s simple.
► When we refer to the “average,” this is usually what we mean.
► Add the values and divide by the number of values you have.

Mean Example

► A weekly newspaper has seven employees. What’s the mean salary? Here are their salaries:

► Editor -- $37,000
► Assistant Editor -- $32,000
► Reporter -- $28,000
► Ad Sales Manager -- $38,000
► Ad Sales Agent -- $31,000
► 2 Circulation People -- $22,000 each

Mean Example

► Calculation: Add 37,000 + 32,000 + 28,000 + 38,000 + 31,000 + 22,000 + 22,000 = 210,000. Then divide by 7: 210,000 / 7 = 30,000. The mean salary is $30,000.

► NOTE: Mean can be deceptive if there is a wide spread in the numbers. For example, if the editor and ad sales manager made $60,000 each, the sales agent made $40,000, and each of the other four workers made $12,500, the mean would be the same, but the picture of the average salary at the newspaper would be much different.
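The calculation above can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the second list is the hypothetical "wide spread" payroll described in the NOTE, not real data.

```python
from statistics import mean

# Salaries from the newspaper example.
actual = [37_000, 32_000, 28_000, 38_000, 31_000, 22_000, 22_000]

# Hypothetical payroll from the NOTE: editor and ad sales manager at $60,000,
# the sales agent at $40,000, and the other four workers at $12,500 each.
wide_spread = [60_000, 60_000, 40_000, 12_500, 12_500, 12_500, 12_500]

# Add the values and divide by the number of values.
mean_actual = sum(actual) / len(actual)

# statistics.mean performs the same calculation.
mean_wide = mean(wide_spread)

print(mean_actual, mean_wide)
```

Both lists produce the same mean, even though four of the seven people on the second payroll earn less than half of it.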

Median

► The median means the middle.
► It is the value in the dead center of the list of values when they are lined up from largest to smallest.
► It represents the average person or group. For example, if we say “the average household” or “the average worker,” then what we are looking for is the median, as in “ordinary” or “most common.” We aren’t really talking about the “average” or mean.

Median Example

► Consider the newsroom salaries used in the previous example lined up from largest to smallest: 38,000, 37,000, 32,000, 31,000, 28,000, 22,000, 22,000.

► The salary in the middle, the “median,” is $31,000.

► If the halfway point lies between two numbers (which happens when the list has an even number of values), split the difference: add the two middle values and divide by 2.
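As a rough sketch, the steps above look like this in Python. The function name `middle_value` is made up for illustration, and the six-salary list is a hypothetical variation (one circulation employee removed) used only to show the "split them" case.

```python
from statistics import median

salaries = [38_000, 37_000, 32_000, 31_000, 28_000, 22_000, 22_000]

def middle_value(values):
    """Return the median: the middle value of the sorted list."""
    ordered = sorted(values, reverse=True)  # largest to smallest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value sits in the dead center.
        return ordered[mid]
    # Even count: split the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

newsroom_median = middle_value(salaries)

# Hypothetical six-person newsroom (one circulation employee removed).
six_salaries = salaries[:-1]
even_median = middle_value(six_salaries)

print(newsroom_median, even_median)
print(median(salaries))  # the standard library gives the same answer
```

With six salaries, the two middle values are 32,000 and 31,000, so the median falls halfway between them.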

Percent Change

► If the city increased parking fines from $10 to $15, by what percentage did the fines increase?

► This is simple, too. Subtract the old value from the new value (15 - 10 = 5), then divide by the old value (5/10 = 0.5). Multiply the result by 100 (0.5 x 100 = 50 percent) and that’s the percent change.

► 15 - 10 = 5; 5/10 = 0.5; 0.5 x 100 = 50 percent.

Tax Example

► If the average property tax increased by $2,000 a year (we’re using the median here to find $2,000), what is the average percent change?

► New value = $10,000
► Old value = $8,000
► 10,000 – 8,000 = 2,000
► 2,000/8,000 = .25
► 100 x .25 = 25 percent
► So the percent change is +25 percent
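The percent-change steps used for the parking fines and the property tax can be wrapped in one small function. This is a minimal sketch; the name `percent_change` is chosen here for illustration.

```python
def percent_change(old, new):
    """Subtract the old value from the new, divide by the old, multiply by 100."""
    return (new - old) / old * 100

parking = percent_change(10, 15)      # parking fines: $10 to $15
tax = percent_change(8_000, 10_000)   # property tax: $8,000 to $10,000

print(parking, tax)
```

A negative result would indicate a decrease; for example, going from 15 back to 10 is a drop of about 33 percent, not 50, because the old value is now 15.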

Per capita, Rates and Comparisons

► Per capita refers to the rate per person. It helps make comparisons among large groups, like cities.

► To get per capita, simply divide the number of incidents by the number of people.

► A Southern city with a population of 450,000 experienced 16 murders during 2009. What is the city’s murder rate per 100,000 population?

► 450,000/100,000 = 4.5; 16/4.5 = 3.56, or about 3.6 per 100,000

Per capita example

► If a city has a population of 600,000 and experiences 12 murders a year, the per capita murder rate would be 12 divided by 600,000.

► To avoid tiny decimals, divide 600,000 by 100,000 and report the rate as a number per 100,000 population.

► 600,000/100,000 = 6; 12/6 = 2, so the murder rate is 2 per 100,000 people.

► You can also find the percent change of the per capita rate over time to discover the trend in the murder rate.

Comparison Example

► Suppose you want to know how dangerous the city is compared to other cities. Our example city has a population of 600,000 with 12 murders. A nearby city has a population of 26,000 and 4 murders. Which is more dangerous? Find the per capita murder rate of each to know.

► The rate for City 1 is 2 per 100,000; the raw rate for City 2 is 4 per 26,000. City 2 is more dangerous because it has 15.4 murders per 100,000 people (4/.26 = 15.38) compared to City 1’s 2 murders per 100,000.
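The per-100,000 rates for the three cities mentioned above can be computed the same way. This is a sketch only; `rate_per_100k` is a name invented here, and it multiplies before dividing simply to keep the arithmetic tidy.

```python
def rate_per_100k(incidents, population):
    """Number of incidents per 100,000 people."""
    return incidents * 100_000 / population

southern_city = rate_per_100k(16, 450_000)  # the 2009 Southern city example
city_1 = rate_per_100k(12, 600_000)
city_2 = rate_per_100k(4, 26_000)

for name, rate in [("Southern city", southern_city),
                   ("City 1", city_1),
                   ("City 2", city_2)]:
    print(f"{name}: {rate:.1f} per 100,000")
```

Putting every city on the same per-100,000 scale is what makes the comparison fair: City 2 has far fewer murders than City 1 but a much higher rate.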

Standard Deviation

► In most situations, most people or values will group toward the middle.
► Those that don’t are different.
► If many group outside the middle, then that tells you something about the situation – it tells you that whatever you’re looking at isn’t expected.

Standard Deviation

► For normal situations, the “curve” will look bell-shaped, like this:

Standard Deviation

► Most healthy women will eat between 1,700 and 2,000 calories a day. If you plot how many calories women eat, each woman’s intake will be one value. Plot them on a sheet of paper along a line and most of the values (number of calories) will land in the middle of the spread. That will be what is called a “normal distribution.”

Normal distribution

Standard Deviation

► In a normal distribution, about 68% of the women will gather in the middle. They are within “one standard deviation” of the middle on either side. (The blue area on the graph.)

► Two standard deviations away will account for about 95%. (The blue areas and the brown areas.)

► So, 95% of the values in most situations will be considered “normal.” However, all but the middle 68% will be somewhat abnormal, but not excessively abnormal.

► Three standard deviations away from the middle will account for about 99% of the values (more precisely, about 99.7%). (The blue, brown, and green areas.) The values in the green areas are more abnormal, but we expect about 4% of values to fall into these areas, because life is not perfect.
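The 68%, 95% and 99% figures above come from the shape of the normal curve itself. As a rough check, the share of a normal distribution within k standard deviations of the middle can be computed with the error function from Python's standard library; the function name `share_within` is made up for this sketch.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")

# The "green" band: between two and three standard deviations from the middle.
green_band = share_within(3) - share_within(2)
print(f"between 2 and 3 standard deviations: {green_band:.1%}")
```

The last figure is where the "about 4%" for the green areas comes from.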

Standard Deviation

► If a scientific study concludes that 99% of the values fall within three standard deviations, then you have a normal situation and the conclusions can be trusted.

► A good public opinion survey, for example, that concludes Americans support the President’s policies can be trusted if the values (support for the president) fall in a normal bell curve with most of the people saying they support the policies.

► But what about the situations where the values don’t fall in a normal bell curve?

► Then you have untrustworthy results, or at least you know that more than you would expect don’t fit the normal pattern. In the graph at top, most of the values fell to the left of center. In other words, most of the values are outside the normal range.

Margin of Error

► Margin of Error deserves better than the throw-away line it gets in the bottom of stories about polling data. Writers who don't understand margin of error, and its importance in interpreting scientific research, can easily embarrass themselves and their news organizations.

Margin of Error

► The margin of error is what statisticians call a confidence interval. The math behind it is much like the math behind the standard deviation. So you can think of the margin of error at the 95 percent confidence interval as being equal to about two standard deviations in your polling sample. Occasionally you will see surveys with a 99 percent confidence interval, which would correspond to roughly 3 standard deviations (more precisely, about 2.6) and a larger margin of error because the more you include the fringe, the more likely your results will be untrustworthy.

Margin of Error

► Let’s consider a particular week's poll as a repeat of the previous week's. In the first week, Candidate A received support from 57% of those polled. Candidate B received 43%, a 14 point difference. In the second week, Candidate A received 53% support and Candidate B received 47%, a 6 point difference. Both polls had a margin of error of 4 points. So, is Candidate B gaining on Candidate A?

► No. Statistically, there is no change from the previous week's poll. Candidate B has made up no measurable ground on Candidate A because the movement for both candidates is within the 4 point margin of error.
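The reasoning in the poll example can be written as a simple check. This sketch follows the rule of thumb used above (is each candidate's movement within the margin of error?) and is not a formal significance test; the function name `within_margin` is invented for illustration.

```python
def within_margin(previous, current, margin):
    """True if the week-to-week movement is no larger than the margin of error."""
    return abs(current - previous) <= margin

margin = 4  # both polls had a margin of error of 4 points

a_within = within_margin(57, 53, margin)  # Candidate A: 57% then 53%
b_within = within_margin(43, 47, margin)  # Candidate B: 43% then 47%

if a_within and b_within:
    print("No measurable change: both movements are within the margin of error.")
else:
    print("At least one candidate moved by more than the margin of error.")
```

Each candidate moved exactly 4 points, which sits at the edge of the margin, so by this rule the second poll does not show that Candidate B is gaining.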

Questions Journalists Should Ask

► Where did the data come from? Always ask this one first. You always want to know who did the research that created the data you're going to write about. Just because a report comes from a group with a vested interest in its results doesn't guarantee the report is a sham. But you should always be extra skeptical when looking at research generated by people with a political agenda. At the least, they have plenty of incentive NOT to tell you about data they found that contradict their organization's position.

Questions

► Have the data been peer-reviewed? If they were, you know that the data you'll be looking at are at least minimally reliable because other researchers have given their blessing on the data. If they weren’t, that’s a sign that the data might not be valid.

Questions

► How were the data collected? This one is really important to ask, especially if the data were not peer-reviewed. If the data come from a survey, for example, you want to know that the people who responded to the survey were selected at random.

Questions

► Be skeptical when dealing with comparisons. Researchers like to do something called a "regression," a process that compares one thing to another to see if they are statistically related. They will call such a relationship a "correlation." Always remember that a correlation DOES NOT mean causation.

Questions

► Finally, be aware of numbers taken out of context. Again, data that are "cherry picked" to look interesting might mean something else entirely once they are placed in a different context.

Survey Sample Sizes

► The population of a study is everyone who could have been included. For a national poll, then, the population would include every adult in the U.S. – a number that would be impractical to poll. Instead, researchers take a random sample. The larger the sample, the more likely it will be representative of the population. But a sample of 400 is usually good enough for most surveys. Most national polls, though, survey 1,500 to 2,500 people. The margin of error in a sample = 1 divided by the square root of the number of people in the sample.

Survey Sample Sizes

► The margin of error in a sample = 1 divided by the square root of the number of people in the sample

► In a survey of 2,500 people, the square root is 50. So, 1/50 = .02, or 2 percentage points.

► In a survey of 400 people, the square root is 20. So, 1/20 = .05, or 5 percentage points.

► This shows the margin of error increases significantly as the number surveyed decreases.
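The rule of thumb above (1 divided by the square root of the sample size) is easy to try for different sample sizes. This is a sketch of that approximation only, with a function name chosen for illustration; the 1,500 figure is the low end of the national-poll range mentioned earlier.

```python
import math

def margin_of_error(sample_size):
    """Rule-of-thumb margin of error: 1 / sqrt(n), expressed as a proportion."""
    return 1 / math.sqrt(sample_size)

for n in (400, 1_500, 2_500):
    moe = margin_of_error(n)
    print(f"n = {n}: about {moe * 100:.1f} percentage points")
```

Because of the square root, cutting the margin of error in half takes roughly four times as many respondents.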

Picking the Right Statistical Test

► There are different kinds of stats tests and the correct one will be the one that provides the best answers based on the type of data you have collected.

► It is best to enlist the help of a statistics pro to analyze your data.

► You can also use SPSS, a computer program that conducts the statistical computations for you when you enter the data. So by knowing what type of test to run, you can enter the data into SPSS and run the test.

Use of Statistics

► Statistical tests allow researchers to find out whether their findings are “significant” – i.e., what is the probability that what we think is a relationship between two variables is really just a chance occurrence? The lower the probability of chance, the more believable the results.

► Researchers hypothesize. They write a statement that they believe will be true from the data they collect. They base this on previous research and on common sense. Then, they write the “null hypothesis.” The null is the exact opposite of the hypothesis the researcher has chosen. The statistical tests are done to test whether the null hypothesis is correct. If it is WRONG, then the researcher’s hypothesis is considered supported.

Use of Statistics

► Researchers use statistics to determine the probability of the data being correct. They usually want a significance level of .05, written p < .05, which corresponds to a 95 percent confidence level. (In other words, if the data were collected 100 more times, the results would fall within the range of the current study about 95 times.) That means the data are pretty reliable.

ANOVA

► Most common statistical test: Analysis of Variance (ANOVA) is a statistical technique that is used to compare the means of more than two groups. There are One-way ANOVA (one dependent variable and one independent variable) and Two-way ANOVA (one dependent and two independent variables). [Note about variables: the dependent variable (say, choice of candidate) is what will be affected by the question or the experiment; the independent variables are controlled by the researcher (say, choosing gender or income as factors that affect the dependent variable – choice of candidate).]

► Use the ANOVA test only if you are comparing data from at least 3 groups.
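To give a feel for what a one-way ANOVA computes, here is a minimal hand-rolled sketch of its F statistic: the variation between group means divided by the variation within groups. The function name and the three small groups of ratings are hypothetical, made up purely for illustration; a program such as SPSS would also report the p-value, which this sketch does not.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group over within-group variance."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical approval ratings (1 to 5 scale) from three age groups.
young = [1, 2, 3]
middle = [2, 3, 4]
older = [3, 4, 5]

f_stat = one_way_anova_f(young, middle, older)
print(f_stat)
```

A larger F means the group means differ more than the scatter inside each group would suggest; whether it is "significant" depends on comparing it against the F distribution for the sample sizes involved.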

T-test

► Another common statistical test: the t-test uses the standard deviation of the sample to help determine whether a difference seen in the sample is likely to hold in the larger population.

► Use when you have only 2 groups of data, say results from men and women, and you want to know whether their answers are significantly different or just from random chance.
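As a rough illustration of how the standard deviation enters a t-test, the sketch below computes a two-sample t statistic by hand. It uses Welch's form of the statistic (one common variant, chosen here as an assumption), and the men's and women's answers are hypothetical numbers invented for the example. Turning the statistic into a p-value requires the t distribution, which a package such as SPSS handles.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Two-sample t statistic (Welch's form): mean difference over its standard error."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical survey answers (1 to 7 scale).
men = [2, 3, 4]
women = [4, 5, 6]

t_stat = welch_t(men, women)
print(round(t_stat, 2))
```

The further the t statistic is from zero, the less likely the difference between the two groups is due to random chance alone.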

Other types of tests

► There are many other types of tests for interpreting data that require a rather high level of skill in statistics. If your data are complicated and you want to find out as much about the data as possible, you may want to consult a stats pro for help.