Transcript of 6 - 5 - Week 6 Part 5


Welcome back. This is week six, and this is module five of five. In this module, we're going to talk about statistics. Now, obviously, you can take a whole Ph.D. or more in this topic. The point of this module, the point I want to get across to you, is that p-values (I'll explain what those are) are not an indicator of quality research. That's the take-home message: p-values are not an indicator of quality research. Let's explore what that means. So first, let's start with some basic jargon, some basic understanding. We can imagine a target population, a group we wish to study. Sometimes epidemiologists call this the source population.

This might be a group of persons living in America in the year 2013. It might be schoolchildren in a school at some given time. It might be persons living in South Africa. Whatever it might be, the target population is what we want to learn about. And the target population has some characteristics. Those characteristics could theoretically be measured if we measured everybody in the target population, did a census if you will.

That kind of measure is called a parameter. That is, it's a summary measure from the whole target population, a parameter. But it's very rare that we can measure everyone in a whole target population, so instead we take a sample. The best samples, of course, are representative of the target population. Representative. The best way to do that is random sampling.

In any case, we draw a sample from the target population. And the summary measure we take from that sample is called a statistic. So here's the big distinction: a statistic comes from a sample, and a parameter comes from the whole target population.

The key idea of inference, more technically statistical inference, is the idea of taking repeated samples from the target population. Importantly, this is not usually done; instead, we imagine it. We imagine taking repeated samples from the same target population. This is statistical theory. So, here we can see the target population and we take sample number one. We take another independent sample from the target population; call it two. Yet another, three. And so on and so forth. Each of these samples, either real or hypothetical, has a summary measure, maybe an average health measure. This is a statistic. In statistics we write it with the letter y, the outcome variable; the subscript one says it comes from the first sample, and the bar above it, that horizontal bar, represents an average, sometimes called the mean.

So here we have four means, four averages, from four independent samples. The idea is to do this over and over again, frankly, infinitely many times. Then if we took those averages and graphed them on a piece of graph paper, what we'd see is the well-known bell-shaped curve, sometimes called a normal distribution. At the center would be the mean, or the average of all the averages.
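To make the repeated-sampling idea concrete, here is a minimal sketch in Python. It is not from the lecture; the population values, sample size, and number of repeats are invented purely for illustration. It draws many samples from one made-up target population, takes the mean of each (a statistic), and shows that those means pile up around the population mean (the parameter).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical target population: a health measure for 100,000 people.
# (All numbers here are made up for illustration.)
population = rng.normal(loc=70.0, scale=12.0, size=100_000)
parameter = population.mean()  # the parameter: the mean of the whole target population

# Imagine drawing many independent samples and taking each sample's mean.
n_repeats = 5_000    # how many repeated samples we imagine
sample_size = 100    # people per sample
sample_means = np.array([
    rng.choice(population, size=sample_size).mean()  # each sample mean is a statistic
    for _ in range(n_repeats)
])

print(f"parameter (population mean): {parameter:.2f}")
print(f"mean of the sample means:    {sample_means.mean():.2f}")  # about the same
print(f"spread of the sample means:  {sample_means.std(ddof=1):.2f}")
# A histogram of sample_means would show the bell-shaped curve described above;
# its spread is what the lecture will shortly call the standard error.
```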

There's symmetry here; that happens a lot, but not always. That is, the shape is the same on both sides. You could flip this distribution on a piece of paper and fold it onto itself, and it would be the same. That's one way to think about symmetry. In any case, this is a very special distribution, because it's the distribution of the sample statistics, of the averages from the repeated samples. This distribution has some variability, some spread.

That variability of the sample statistics is called the standard error. Technically, the standard error is the standard deviation of the sampling distribution. That's a lot of jargon. But all you need to know is that there is some variation in averages if we keep taking averages from the target population, one sample after another.

If we compare two groups, one exposed to a virus, the other not, or one that gets a treatment program and one that does not, we say that these two groups have an average difference. That is, we compare the average in the treated group to the average in the control or comparison group, and they're either the same or different. That's really what we're doing in epidemiology. This group has a lot of social capital, this group doesn't: how does their health differ? This group got exposed to the virus, this group didn't: what's the prevalence of a cold, or some sort of symptom? If the difference, the average difference, between these two groups is 0, we say there's no difference.
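As a small sketch of that comparison (made-up numbers, not data from the lecture), here are a hypothetical treated group and control group and the average difference between them:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical outcome scores for two groups (say, a symptom measure).
treated = rng.normal(loc=5.0, scale=2.0, size=50)  # exposed / treated group
control = rng.normal(loc=7.0, scale=2.0, size=50)  # comparison / control group

avg_difference = treated.mean() - control.mean()
print(f"treated mean:       {treated.mean():.2f}")
print(f"control mean:       {control.mean():.2f}")
print(f"average difference: {avg_difference:.2f}")  # 0 would mean 'no difference'
```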

On the other hand, the average difference might be above 0 or in fact less than 0; it doesn't really matter which, the point is whether it's different from 0. But the rub in statistics, the important point, is that there's variation around each of those summary measures, those averages. So the first group could have an average of five, the second group could have an average of seven, but that five and that seven each have some variability.

So we use some statistical calculations to discern, to figure out, whether 5 minus 7 is actually a difference, or whether it's due to chance alone. Chance alone is the key idea. To do this, we often calculate a statistic called the p-value. A p-value. There's some math behind it, and you can Google it if you want; I'm not going to go into that here. The point I want to make is that the p-value has no bearing on the quality of the study.
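One way to make "chance alone" concrete is a permutation-style sketch; this is just one of several ways to arrive at a p-value and is not taken from the lecture. Shuffle the group labels many times and count how often a difference at least as large as the observed one shows up when the labels carry no information:

```python
import numpy as np

rng = np.random.default_rng(7)

# Same kind of hypothetical groups as above (averages near 5 and 7).
treated = rng.normal(loc=5.0, scale=2.0, size=50)
control = rng.normal(loc=7.0, scale=2.0, size=50)

observed = abs(treated.mean() - control.mean())
pooled = np.concatenate([treated, control])

# Reshuffle the labels many times; under chance alone, any split is as good as any other.
n_shuffles = 10_000
at_least_as_big = 0
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    fake_diff = abs(shuffled[:50].mean() - shuffled[50:].mean())
    if fake_diff >= observed:
        at_least_as_big += 1

p_value = at_least_as_big / n_shuffles
print(f"approximate p-value: {p_value:.4f}")
# A small p-value says the observed difference is unlikely under chance alone.
# It says nothing about whether the study was well designed, unbiased, or important.
```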

So if we calculate a p-value associated with some statistical test, an average difference, and we see that the p-value is less than some threshold that we set, usually .05, so it's small, then we can say, hey, we can reject the null hypothesis. All that means is we say that the observed difference is unlikely to be due to chance alone.
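For the decision rule itself, here is a hedged sketch using a standard two-sample t-test from SciPy (again with invented data; .05 is simply the conventional threshold mentioned above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treated = rng.normal(loc=5.0, scale=2.0, size=50)
control = rng.normal(loc=7.0, scale=2.0, size=50)

result = stats.ttest_ind(treated, control)  # two-sample t-test on the average difference
alpha = 0.05                                # the threshold set in advance

print(f"p-value: {result.pvalue:.4f}")
if result.pvalue < alpha:
    print("Reject the null hypothesis of no difference.")
else:
    print("Fail to reject the null hypothesis.")
# Either way, the decision speaks only to chance, not to the quality of the study.
```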

So this is more jargon. Again, I want to emphasize, the point is, the p-value does not indicate the quality of the study. Importantly, for over 50 years, methodologists have clearly rejected the naive, if you will, use of p-values: p-values without appreciating the other aspects of the study. But the practice goes on today, which is why I'm spending this module reminding you, or perhaps teaching you for the first time: if you see a study that says a difference has a small p-value, that does not mean it's true. It also doesn't mean it's a quality or unbiased estimate. And we talked about that earlier in this lecture. Some cautions, some take-home messages. P-values don't imply the strength of a relationship. It could be 5 minus 7, or 5 minus 700; there's just some difference. A p-value does not imply or connote how big that difference is.
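A tiny sketch of that point (invented numbers, not from the lecture): a trivially small difference measured on a huge sample can carry a smaller p-value than a large difference measured on a small sample, so the p-value cannot tell you how big the difference is.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Tiny difference in means, very large samples.
tiny_a = rng.normal(loc=5.00, scale=2.0, size=100_000)
tiny_b = rng.normal(loc=5.05, scale=2.0, size=100_000)

# Large difference in means, small samples.
big_a = rng.normal(loc=5.0, scale=2.0, size=8)
big_b = rng.normal(loc=9.0, scale=2.0, size=8)

print(f"tiny difference, huge n:   p = {stats.ttest_ind(tiny_a, tiny_b).pvalue:.2g}")
print(f"large difference, small n: p = {stats.ttest_ind(big_a, big_b).pvalue:.2g}")
# The p-value reflects sample size and variability as much as the size of the
# difference, so a small p-value does not mean a strong or important relationship.
```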

We also don't want to compare p-values from tests across different studies. A p-value is within a study. There are some exceptions, but for now we'll say, don't go across studies. Furthermore, we don't want to compare p-values within the same study: this is the effect for race, this is the effect for SES, this is the effect for gender. That is a very complicated analysis.

And so it takes some caution. If you're an expert, of course; if you're new to this, I urge caution. The important point I'm going to come back to again is that p-values don't tell us that the effect is identified, the idea we talked about in the last lecture. Nor do they tell us it is not confounded, and in epidemiology that is the key point. So this means that identification, confounding bias, potential alternative explanations: these things, far more than statistics with a p-value, are the most important parts of judging a social epidemiologic study.