Statistics for Librarians, Session 3: Inferential statistics

INFERENTIAL STATISTICS

GOALS OF SERIES

Comfort

Fears

SESSION OBJECTIVES

Purpose of Inferential Statistics

Probability

Elements of Significance Testing

Three key tests
• T-test
• Chi-squared
• Correlation (or binomial)

Effect Measures

PURPOSE OF INFERENTIAL STATISTICS

• Infer results
• Draw conclusions
• Increase the Signal-to-Noise ratio

Signal

Noise

INFERENTIAL STATISTICS

Tests of hypotheses
• Expectations
• Associations

Accounts for uncertainty
• Random error
• Confidence interval

HYPOTHESES

Your Hypothesis (H1)

Null Hypothesis (H0)

NOT TO PROVE, BUT TO FALSIFY

H1: Difference

H0: No Difference

NOT TO PROVE, BUT TO FALSIFY

H1: >= 10% Increase

H0: < 10% Increase

REVIEW OF DESCRIPTIVE STATISTICS

LEVELS OF MEASUREMENT (NOIR)

Nominal
• Counts by category
• Binary (Yes/No)
• No meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Space between ranks is subjective

Interval
• Integers
• Zero is just another value; it doesn’t mean “absence of”
• Space between values is equal and objective, but discrete

Ratio
• Interval data with a baseline
• Zero (0) means “absence of”
• Space between is continuous
• Includes simple counts

Central Tendency

Error

Spread

DESCRIPTIVE STATISTICAL ANALYSIS

CENTRAL TENDENCY BY LEVELS OF MEASUREMENT

Interval or Ratio

Mean

Median

Nominal or Rank

Mode

Median (rank only)

SPREAD

Interval & Ratio

• Range
• Quantiles
• Standard Deviation

Nominal & Rank

• Distribution Tables

• Bar Graphs

How variable is the data?

RANGE & QUARTILES

FORMULAS

Mean Standard Deviation
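The formulas on this slide did not survive in the transcript. As a minimal sketch, assuming the usual sample definitions (the mean is the sum of the values divided by n; the standard deviation is the square root of the summed squared deviations divided by n - 1), both can be checked in Python. The `scores` values are hypothetical and are not the case-study data.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [70, 75, 80, 85, 90]

# Mean: sum of the values divided by how many there are.
mean = sum(scores) / len(scores)

# Sample standard deviation: root of the summed squared deviations over n - 1.
squared_deviations = [(x - mean) ** 2 for x in scores]
std_dev = math.sqrt(sum(squared_deviations) / (len(scores) - 1))

# The standard library should agree with the hand calculation.
print(mean, statistics.mean(scores))
print(round(std_dev, 3), round(statistics.stdev(scores), 3))
```

In Excel the equivalent functions are AVERAGE and STDEV.S.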

PROBABILITY

WHAT’S PROBABILITY GOT TO DO WITH STATISTICS?

WHAT IS PROBABILITY?

Chance of something happening (x)

Expressed as P(x)=y

Between 0 and 1

Based on distribution of events

STEM-AND-LEAF

Stem (groups) | Leaf (last digit)
0 | 01112222222222222233333344445556666677788899
1 | 0000000011122223333356778899
2 | 00122234444799
3 | 0245

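Years at UNT (sorted values, in the three columns shown on the slide)
0 | 5 | 13
1 | 6 | 13
1 | 6 | 13
1 | 6 | 13
2 | 6 | 15
2 | 6 | 16
2 | 7 | 17
2 | 7 | 17
2 | 7 | 18
2 | 8 | 18
2 | 8 | 19
3 | 11 | 29
4 | 11 | 29
4 | 12 | 30
4 | 12 | 32
4 | 12 | 34
5 | 12 | 35
5 | 13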

Stem | Leaf | Count
0 | 1122223334445555666666677777899 | 31
1 | 000011122222222333346677889 | 27
2 | 0122234468 | 10
3 | 1112355888 | 11
4 | 12 | 2

Range | Count
0-9 | 31
10-19 | 27
20-29 | 10
30-39 | 11
40-49 | 2
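Because probability is based on the distribution of events, the Range/Count table can be turned directly into probabilities. The sketch below uses the counts from that table; treating each count divided by the total as P(x) is the usual empirical reading, stated here as an assumption about what the slide intends.

```python
# Counts per range, copied from the "Years at UNT" Range/Count table.
counts = {"0-9": 31, "10-19": 27, "20-29": 10, "30-39": 11, "40-49": 2}

total = sum(counts.values())

# Empirical probability of each range: its count divided by the total.
probabilities = {label: n / total for label, n in counts.items()}

for label, p in probabilities.items():
    print(f"P({label}) = {p:.3f}")

# Every probability lies between 0 and 1, and together they sum to 1.
print(round(sum(probabilities.values()), 6))
```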

[Figure: Histogram of Years at UNT. X-axis: ranges 0-9, 10-19, 20-29, 30-39, 40-49; y-axis: count, 0 to 40.]

NORMAL DISTRIBUTIONS

PROBABILITY DISTRIBUTION

Set the mean to 0

Standard deviations above and below the mean
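Setting the mean to 0 and counting in standard deviations is what a z-score does. The following is a minimal sketch using Python's standard library; the example score, mean, and standard deviation are hypothetical, and `z_score` and `share_within` are illustrative helper names.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def z_score(value, mean, std_dev):
    """Express a value as standard deviations above (+) or below (-) the mean."""
    return (value - mean) / std_dev

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Hypothetical example: a score of 90 when the mean is 80 and the SD is 5.
z = z_score(90, mean=80, std_dev=5)
print(z)

# Roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 SDs.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```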

DEMONSTRATION OF DISTRIBUTIONS

Distribution of the Population: The “Truth”

N is the number of samples

n is the number of items in each sample

Watch the cumulative mean & median slowly converge on the population values

http://www.stat.tamu.edu/~west/ph/sampledist.html
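The same demonstration can be approximated offline. This is only a sketch of the idea, not a reproduction of the linked applet: the skewed "population" is simulated, and the seed, N, and n are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

# Hypothetical skewed "population" of 10,000 values (not the case-study data).
population = [random.expovariate(1 / 10) for _ in range(10_000)]
population_mean = statistics.mean(population)

N = 500  # number of samples drawn
n = 25   # number of items in each sample

# Draw N samples of n items each and record every sample's mean.
sample_means = []
for _ in range(N):
    sample = random.sample(population, n)
    sample_means.append(statistics.mean(sample))

# The mean of the sample means should land close to the population mean.
mean_of_means = statistics.mean(sample_means)
print(round(population_mean, 2), round(mean_of_means, 2))
```

Raising N tightens the agreement, which is the behaviour the demonstration asks you to watch for.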

ACTIVITIES

CASE STUDY

• Background: Info-Lit course is meeting resistance from skeptical faculty.
• Research Questions:
  • Does the IL course improve grades on final papers?
  • Can the IL course improve passing rates for the course?
  • Do students in different majors respond differently to the IL training?
  • Is the final score related to the number of credit hours enrolled for each student?

METHODOLOGY

Selection

• Two sections of same course with different instructors.

• Random Assignment

Outcome

• Blinded scoring by 2 TAs
• Scores range from 1-100
• Passing grade: 70

ACTIVITIES

Table 1
• Distribution of scores

Table 2
• Distribution of passing rates by major

Table 3
• Correlation of scores with credit hours

DESCRIPTIVE STATISTICS OF CASE STUDY

DISTRIBUTION OF SCORES

Table 1
• Distribution of scores

Table 2
• Distribution of passing rates by broad field of major

Table 3
• Correlation of scores & credit hours

SIGNIFICANCE TESTING

SIGNIFICANCE TESTING

Comparing significance of differences
• Groups against each other
• A group against the population or standard

What is “significant”?
• Risk of being wrong
• Alpha (α)
• Set in advance

Critical Value
• The value that the statistic must meet or exceed to be statistically significant.
• Based on statistic and α

STEPS IN SIGNIFICANCE TESTING

Which Test?

Calculate Statistic

Critical Value of Statistic?

Probability (p-value)

KEY ELEMENTS OF SIGNIFICANCE TESTING

Null Hypothesis

Measure of Central Tendency

Standard deviations

Risk of being wrong (alpha)
• Usually .05 or .025 or .01 or .001

Degrees of freedom (df)

DEGREES OF FREEDOM

Number of values in the final calculation of a statistic that are free to vary.

DEGREES OF FREEDOM EXPLAINED

• All these have a mean of 5:
  • 5, 5, 5
  • 2, 8, 5
  • 3, 2, 10
  • 7, 4, & ?
• If 2 values are known and the mean is known, then the 3rd value is also known.
• Only 2 of the 3 values are free to vary.
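The "7, 4, & ?" example can be worked out directly: with the mean fixed, the last value is whatever is left of the required total. A minimal sketch, where `missing_value` is an illustrative helper name:

```python
def missing_value(known_values, mean, n):
    """Return the one remaining value once n - 1 values and the mean are fixed."""
    # The n values must add up to mean * n, so the last one is forced.
    return mean * n - sum(known_values)

# The slide's last example: 7, 4, and an unknown third value, with a mean of 5.
third = missing_value([7, 4], mean=5, n=3)
print(third)  # 4
```

Only two of the three values could be chosen freely, which is why df = n - 1 = 2 here.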

CALCULATING DEGREES OF FREEDOM (DF)

For a single sample:
• Degrees of freedom (df) for t-test = n-1

For more than one group:
• df = ∑(n-1) for all groups (k)
• OR, ∑n - k

For comparing proportions in categories (k):
• df = k-1 (# of categories minus 1)
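These three rules are small enough to write as functions. The sketch below uses illustrative function names; the group sizes 51 and 50 are the ones that appear on the READING T-TEST RESULTS slide.

```python
def df_single_sample(n):
    """df for a single sample: n - 1."""
    return n - 1

def df_groups(group_sizes):
    """df for several groups: sum of (n - 1) over the groups, i.e. total n minus k."""
    return sum(size - 1 for size in group_sizes)

def df_categories(k):
    """df when comparing proportions across k categories: k - 1."""
    return k - 1

print(df_single_sample(10))   # 9
print(df_groups([51, 50]))    # 99, the same as (51 + 50) - 2
print(df_categories(5))       # 4
```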

COMPARING VALUES: T-TEST

T-TEST

Used with interval or ratio data

Based on normal distribution

Four Decisions
• Paired or un-paired samples?
• Equal or unequal variances (standard deviations)?
• Risk?
• One- or two-tail?
  • Direction of expected difference
  • Best to bet on difference in both directions (2-tail)

One-Tail

Two-Tail

T-TEST FORMULA FOR UNPAIRED SAMPLES

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_{\bar{x}_1 - \bar{x}_2}} = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Difference Between Group Means}}{\text{Variability of Groups}}$$
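As a rough illustration of the signal-over-noise idea, the sketch below computes t for two unpaired samples. It assumes equal variances (the pooled form), which is one of the two options the slides mention; the two score lists are hypothetical, and `unpaired_t` is an illustrative name.

```python
import math
import statistics

def unpaired_t(group1, group2):
    """t statistic for two independent samples, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)

    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

    # Signal: difference between the group means.
    signal = mean1 - mean2
    # Noise: standard error of that difference.
    noise = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return signal / noise

# Hypothetical scores, invented for illustration only.
intervention = [2, 4, 6, 8]
control = [1, 3, 5, 7]

t = unpaired_t(intervention, control)
print(round(t, 4))
```

A larger difference between means raises t; more spread within the groups lowers it.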

ELEMENTS OF T-TEST USING EXCEL DATA ANALYSIS TOOLPAK

Paired or Unpaired samples?
• Unpaired

Equal or Unequal Variances?
• Equal*

Data
• Data for intervention group
• Data for control group

Hypothesized difference
• 0

Alpha
• 0.025 (for a 2-tail test)

T-TEST IN EXCEL

READING T-TEST RESULTS

df = ∑(n-1) = (51-1) + (50-1) = 50 + 49 = 99

Is p <= 0.025?

IS THE DIFFERENCE SIGNIFICANT?

p=0.0005

TESTING DISTRIBUTION OF NOMINAL DATA

PEARSON’S CHI-SQUARED (χ²) GOODNESS OF FIT TEST

Does an observed frequency distribution differ from an expected distribution?
• Observed is the sample or the intervention.
• Expected is the population or the control or a theoretical distribution.
• Will depend on your Null Hypothesis

Nominal or categorical data

• Counts by category

EXPECTED RATIOS FOR CASE STUDY

Research Question:
• Do students in different majors respond differently to the IL training?

Null Hypothesis
• The ratio of students who passed will be the same for all majors.

WHEN TO USE PEARSON’S CHI-SQUARED GOODNESS OF FIT TEST

Nominal Data

Sample Size
• Not too large:
  • Sample is at most 1/10th of population
• Not too small:
  • At least five in each of the categories for the expected group.

OBSERVED PASSING RATES BY MAJOR

Major | Passed | Not Passed | Grand Total
Arts | 6 | 7 | 13
Humanities | 8 | 5 | 13
Social Sciences | 17 | 10 | 27
STEM | 20 | 5 | 25
Undeclared | 16 | 7 | 23
Total | 67 | 34 | 101

EXPECTED RATIOS OF PASSING RATES BY MAJOR

• H0: Rates of passing will be the same for all majors.
• Expected rates: 70% of class passes.
• Expected ratios: 70% of each major passes.

Major | Passed | Not Passed | Grand Total
Arts | 11.2 (16*.7) | 4.8 | 16
Humanities | 11.2 (16*.7) | 4.8 | 16
Social Sciences | 18.2 (26*.7) | 7.8 | 26
STEM | 16.1 (23*.7) | 6.9 | 23
Undeclared | 14 (20*.7) | 6 | 20

CHI-SQUARED GOF TEST FORMULA

• Critical value of Chi-squared depends on degrees of freedom.
• Degrees of freedom
  • Based on the number of categories or table cells (k)
  • df=k-1
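The formula itself did not survive in the transcript. The standard Pearson statistic sums (observed - expected)² / expected over all categories, and the sketch below computes it along with df = k - 1. The counts are hypothetical and deliberately simple; they are not the case-study tables.

```python
def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories (not the case-study data).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_squared(observed, expected)
df = len(observed) - 1  # k - 1 categories are free to vary

print(round(statistic, 3), df)
```

The statistic is then compared with the critical value for that df, or converted to a p-value as Excel's CHISQ.TEST does.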

CHI-SQUARED IN EXCEL

What is Null Hypothesis?

There is no difference between the majors regarding passing rates.

What is your alpha (risk)?

0.05

Data in summary tables?

Actual Ratios

Expected Ratios

Excel function:

=CHISQ.TEST(actual_range, expected_range)

Provides a p-value

0.0000172

Is p-value <= alpha?

Yes

CORRELATION OF SCORE & SEMESTER HOURS ENROLLED

STATISTICAL CORRELATION

Quantitative value of relationship of 2 variables

Expressed in range of -1 to +1
• -1 represents a perfect indirect correlation
• 0 represents no correlation
• +1 represents a perfect direct correlation

Based on co-variance
• How much two variables change together

PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT

Most commonly used statistic

Normally distributed interval or ratio data only

Labeled as r

Multiplication = Interaction

Signal (numerator) over Noise (denominator):

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y}$$
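The coefficient can be computed by hand to see how the multiplication of paired deviations drives it. This is a minimal sketch with hypothetical data; `pearson_r`, `hours`, and `scores` are illustrative names, and the result should match Excel's PEARSON on the same values.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson's r: summed products of paired deviations over (n - 1) * s_x * s_y."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)

    # Signal: how much x and y move together (multiplication = interaction).
    signal = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Noise: (n - 1) times the two sample standard deviations.
    noise = (n - 1) * statistics.stdev(xs) * statistics.stdev(ys)
    return signal / noise

# Hypothetical credit hours and scores, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 4))
```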

CORRELATION IN EXCEL

Null Hypothesis?
• No correlation

Coefficient function (r):
• =PEARSON(range1,range2)

Excel does NOT have a single function to test a correlation for significance

Calculate Probability:

Item | Formula | Value
n | # in sample | 101
df | # in sample - 2 | 99
alpha | 0.025 for 2-Tail Test | 0.025
r | =PEARSON(range1,range2) | 0.362287
t | =r*SQRT(df)/SQRT(1-r^2) | 3.867434
p | =T.DIST.2T(t,df) | 0.000197
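The t value in this calculation can be reproduced from r and n alone, using the standard conversion t = r·√df / √(1 - r²) with df = n - 2. The sketch below uses the slide's own n and r. The final step from t to p needs a t-distribution function, which Python's standard library does not provide, so it is left to Excel's T.DIST.2T.

```python
import math

n = 101        # number in sample (from the slide)
r = 0.362287   # Pearson's r (from the slide)

df = n - 2     # degrees of freedom for a correlation

# Convert r into a t statistic so it can be tested against the t distribution.
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

print(df, round(t, 4))
```

Note that the square root is taken of df, not of alpha; only that form reproduces the t of about 3.867 shown on the slide.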

CORRELATIONS FOR ORDINAL DATA

Spearman’s ρ (rho)
• Use if there are limited ties in rank.

Kendall’s τ (tau)
• Use if you have a number of ties.

SELECTING THE TESTS

KNOW THE TESTS

Assumptions

Limitations

Appropriate data type

What the test tests

FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD

Level of Measurement

What is being compared

Independence of units

Underlying variance in the population

Distribution

Sample size

Number of comparison groups

USE A FLOW CHART

GOING BEYOND THE P-VALUE: EFFECT SIZES

AND THE P-VALUE SAYS…

Much about the distributions

More about the H0 than H1

Little about size of differences

MORE USEFUL STATISTICS

Effect Sizes
• Tell the real story

Confidence Intervals
• State your certainty

EFFECT SIZES OF QUANTITATIVE DATA

Differences from the mean
• Standardized
• Weighted against the pooled (average) standard deviation
• Cohen’s d

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{x_1, x_2}}$$

Correlations
• Cohen’s guidelines for Pearson’s r
• r = 0.362

Effect Size | r >
Small | .10
Medium | .30
Large | .50
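Both effect sizes on this slide are quick to compute. The sketch below assumes the pooled standard deviation is the df-weighted one commonly used for Cohen's d, and it treats the cut-offs of .10, .30, and .50 as inclusive, which is an assumption about the boundaries. The score lists are hypothetical, and the "Below small" label is added here for completeness rather than taken from the slide.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Difference between means, standardized by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)

def r_effect_size(r):
    """Label |r| using the Small/Medium/Large cut-offs of .10, .30, and .50."""
    size = abs(r)
    if size >= 0.50:
        return "Large"
    if size >= 0.30:
        return "Medium"
    if size >= 0.10:
        return "Small"
    return "Below small"  # label not on the slide; added for completeness

# Hypothetical scores, invented for illustration only.
d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(round(d, 4))

# The case study's correlation of r = 0.362 falls in the Medium band.
print(r_effect_size(0.362))
```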

EFFECT SIZES OF QUALITATIVE DATA

Based on Contingency table

Relative risk
• Uses probabilities

RR of Case Study
• RR = 1.608
• The passing rate for the intervention group was 1.6 times the passing rate for control group.

Group | Pass | No Pass | Total
Intervention | a (41) | b (24) | a+b (65)
Control | c (26) | d (10) | c+d (36)
Totals | a+c (67) | b+d (34) | a+b+c+d (101)
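Using the cell labels a, b, c, and d from the contingency table, relative risk is usually defined as the intervention group's rate divided by the control group's rate: RR = [a / (a + b)] / [c / (c + d)]. The sketch below illustrates that definition with hypothetical counts, not the case-study table.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)], using the contingency-table cell labels."""
    rate_intervention = a / (a + b)  # share of the intervention group that passed
    rate_control = c / (c + d)       # share of the control group that passed
    return rate_intervention / rate_control

# Hypothetical counts, invented for illustration only.
rr = relative_risk(a=30, b=20, c=15, d=35)
print(round(rr, 3))
```

An RR above 1 means the intervention group passed at a higher rate than the control group; an RR of 1 means no difference.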

CONFIDENCE INTERVALS

Point estimates
• Single value
• Mean

Intervals
• Degree of uncertainty
• Range of certainty around the point estimate

Based on
• Point estimate (e.g. mean)
• Confidence level (usually .95)
• Standard deviation

Expressed as:
• The mean score of the students who had the IL training was 79.5 with a 95% CI of 76.4 to 82.5.
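An interval like this can be built from the three ingredients listed: the point estimate, the confidence level, and the standard deviation. The sketch below assumes a normal (z) multiplier, which is a simplification; for small samples a t multiplier gives a slightly wider interval. The scores are hypothetical and will not reproduce the case-study interval.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, confidence=0.95):
    """Approximate CI for the mean: mean +/- z * (standard deviation / sqrt(n))."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)

    # z multiplier for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return mean - margin, mean + margin

# Hypothetical scores, invented for illustration only.
scores = [72, 75, 78, 80, 81, 83, 85, 88]

low, high = mean_confidence_interval(scores)
print(round(low, 1), round(high, 1))
```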

CASE STUDY CONCLUSIONS

• Research Questions:
  • Could the IL course improve grades on final papers?
  • Could the IL course improve passing rates for the course?
  • Do students in different majors respond differently to the IL training?
  • Is the final score related to the number of credit hours enrolled for each student?

• Control for external variables

STATISTICAL ANALYSIS

Signal

Noise
