Extra Exercises Basic Statistics

Exercise 1:

Data results.txt (see Results on web-page: Goodyear)

This dataset contains the results of students from the following study disciplines some years ago: Chemistry, Biology and Geography. The variables are as follows:

length : height of the student in cm;

gender : Male (M) or Female (F);

high_school : study results of the last year in high school (in percentages);

bachelor : study results of the first bachelor year (in percentages);

study_direction : Chemistry (Ch), Biology (B) or Geography (G);

color : preferred car color, Light (L), Dark (D) or Red (R).

Check whether the bachelor score is significantly higher than the high school score.


Exercise 2:

Data chol.txt

The data contains information about the cholesterol level of 200 people.

AGE : age of the person;

HEIGHT : body height;

WEIGHT : body weight;

CHOL : cholesterol level;

SMOKE : smoking status (nosmo/pipe/sigare);

BLOOD : blood group (a/ab/b/o);

MORT : alive/dead;

and other variables.

a. Make a box plot of the cholesterol level for the smoker and non-smoker groups.

b. Check whether the average cholesterol level of the smokers is significantly different from that of the non-smokers:

H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm


Solution:

Exercise 1:

H0: µbach = µh_sch vs. H1: µbach > µh_sch

Step 1:

Create a new variable D = bachelor - high_school

Step 2: Reformulate the hypothesis

H0: µD = 0 vs. H1: µD > 0
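Steps 1 and 2 can be sketched in Python as follows. This is only an illustration: the six score pairs are made up (they are not taken from results.txt), and the names `high_school`, `bachelor` and `d` simply mirror the columns described in the exercise.

```python
from statistics import mean, stdev

# Hypothetical scores (in percentages) for six students; NOT the real results.txt data.
high_school = [72.0, 65.0, 80.0, 58.0, 69.0, 75.0]
bachelor = [68.0, 66.0, 74.0, 55.0, 70.0, 71.0]

# Step 1: paired difference D = bachelor - high_school, one value per student.
d = [b - h for b, h in zip(bachelor, high_school)]

# Step 2: the hypotheses now concern the mean of D only.
n = len(d)
d_mean = mean(d)
d_sd = stdev(d)  # sample standard deviation (n - 1 in the denominator)

print(f"D = {d}")
print(f"n = {n}, mean(D) = {d_mean:.3f}, sd(D) = {d_sd:.3f}")
```

Working with D turns the comparison of two paired columns into a question about a single column, which is what the normality check and the t-test in the next steps operate on.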

Step 3: Check the normality of the variable D:

On the data window:

Analyze → Distribution: Select Y, Columns: D

On D menu: Continuous Fit → Normal

On D menu: Normal Quantile Plot

On Fitted Normal menu: Goodness of Fit


On the histogram and Q-Q plot we do not see a departure from normality. From the Shapiro-Wilk test we obtain p=0.7028 > 0.05. Hence, we have no reason to reject normality.
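Outside JMP, the same kind of check could be run with `scipy.stats.shapiro`. The sketch below assumes NumPy and SciPy are installed and uses synthetic differences generated from a normal distribution, so its p-value has nothing to do with the p=0.7028 reported above.

```python
import numpy as np
from scipy import stats

# Synthetic differences drawn from a normal distribution; NOT the real data.
rng = np.random.default_rng(seed=0)
d = rng.normal(loc=-2.0, scale=5.0, size=100)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution.
w_stat, p_value = stats.shapiro(d)

alpha = 0.05
if p_value > alpha:
    print(f"W = {w_stat:.4f}, p = {p_value:.4f}: no reason to reject normality")
else:
    print(f"W = {w_stat:.4f}, p = {p_value:.4f}: normality is rejected")
```

A large p-value does not prove normality; it only means the test found no evidence against it, which is why the histogram and Q-Q plot are inspected as well.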

Step 4:

Since normality is not rejected, we can apply a one-sample t-test on the differences, which is the same as a paired t-test on the two scores:

On the data window:

Analyze → Matched Pairs


JMP took the difference high_school - bachelor, which is -D. Hence, we reformulate H1 as follows:

H1: µh_sch < µbach or, equivalently, H1: µh_sch - µbach < 0 (the mean of JMP's difference is negative).


Then, the corresponding p-value is “Prob<t”, which equals 1.00. So, we do not reject H0: there is no evidence that the bachelor score is higher than the high school score.
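Why the tail has to be switched can be seen from the t statistic itself: reversing the order of subtraction only flips its sign. The sketch below uses made-up scores (not the real data) and a hypothetical helper `paired_t`.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic for H0: mean(x - y) = 0, computed from the paired differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical scores; NOT the real data.
high_school = [72.0, 65.0, 80.0, 58.0, 69.0, 75.0]
bachelor = [68.0, 66.0, 74.0, 55.0, 70.0, 71.0]

t_bach_minus_hs = paired_t(bachelor, high_school)  # direction used in the exercise
t_hs_minus_bach = paired_t(high_school, bachelor)  # direction the solution says JMP used

print(f"t (bachelor - high_school) = {t_bach_minus_hs:.3f}")
print(f"t (high_school - bachelor) = {t_hs_minus_bach:.3f}")
```

Because the two statistics are mirror images, the upper-tail probability for bachelor - high_school equals the lower-tail probability ("Prob<t") for high_school - bachelor.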

Remark: the same result is obtained with a one-sample t-test on the difference D. Here, we test H0: µD = 0 vs. H1: µD > 0, as originally formulated.

On the Distribution window:

In D menu: Test Mean


Exercise 2:

(a.)

Step 1: Create a variable Sm_Status with 2 levels, Smoker and Non-Smoker:

Make a new column Sm_Status.

On the variable window:

Column Properties → Formula

Edit Formula

On the formula window:

Functions (grouped) → Conditional: Select: Match

Table Columns: Select: Smoke

Bottom: ^ (= insert)
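The recoding done by the Match formula can be sketched in Python with a lookup table. The mapping below is an assumption: the solution only states that Sm_Status has the two levels Smoker and Non-Smoker, so treating both "pipe" and "sigare" as Smoker is our reading, and the helper name `sm_status` is hypothetical.

```python
# Assumed mapping: the source only says Sm_Status has the levels
# Smoker / Non-Smoker; treating "pipe" and "sigare" as Smoker is an assumption.
SMOKE_TO_STATUS = {
    "nosmo": "Non-Smoker",
    "pipe": "Smoker",
    "sigare": "Smoker",
}

def sm_status(smoke):
    """Map one SMOKE value to its Sm_Status level (KeyError on unknown values)."""
    return SMOKE_TO_STATUS[smoke]

# A few made-up SMOKE values, NOT taken from chol.txt.
smoke = ["nosmo", "pipe", "sigare", "nosmo", "sigare"]
status = [sm_status(s) for s in smoke]
print(status)
```

Raising an error on an unknown value, rather than silently assigning a default level, makes misspelled categories in the data visible.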


Make a grouped box plot:

Analyze → Fit Y by X: Y, Response: Chol; X, Factor: Sm_Status

In the Oneway menu: Display Options → Box Plots
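A box plot is drawn from a handful of summary values per group. The sketch below computes them with the standard library for made-up cholesterol values (not taken from chol.txt). The quartile convention used here (`method="inclusive"`) is one of several in use, so the quartiles may differ slightly from those JMP reports.

```python
from statistics import quantiles

# Made-up cholesterol values per group; NOT taken from chol.txt.
chol_by_status = {
    "Non-Smoker": [180.0, 195.0, 200.0, 210.0, 215.0, 230.0, 240.0],
    "Smoker": [170.0, 190.0, 205.0, 220.0, 250.0, 290.0, 360.0],
}

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot is drawn from."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

summaries = {group: five_number_summary(v) for group, v in chol_by_status.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Comparing the two summaries side by side gives the same first impression as the grouped box plot: location from the medians, spread from the quartiles, and skewness from how far the maximum lies from Q3.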


(b.)

Step 1:

Make a bar plot to get an idea of the sample sizes:

Graph → Chart: Statistics: N(Sm_Status)

The data contain 49 non-smokers and 151 smokers. Both sample sizes are larger than 30, but they differ greatly. In this case, if the distributions are skewed, a t-test is not suited for comparing the means. Hence, we will check normality.


Step 2: H0: µsm = µnon-sm vs. H1: µsm ≠ µnon-sm

First, split the column:

Tables → Split: Split Columns: Chol; Split By: Sm_Status

Then, test normality:

The group of the smokers is skewed to the right. We will try to transform the data to improve the normality. Try a square root transformation:


After the transformation, the normality is satisfactory. Hence, we will apply a t-test on the transformed data.

Step 3: Transform the Chol data to sqrt(Chol) in a new column Sqrt_Chol.

Now, we will test the following:

H0: µsqrt_sm = µsqrt_non-sm vs. H1: µsqrt_sm ≠ µsqrt_non-sm.

Any conclusion applies only to the means of the square root transformed measurements, not to the means of the original cholesterol levels.
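The effect of the square root transformation from Step 3 on a right-skewed sample can be sketched as follows. The cholesterol values are made up (not taken from chol.txt), and `sample_skewness` is a hypothetical helper that computes the simple moment-based skewness coefficient.

```python
from math import sqrt
from statistics import mean

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (positive for a long right tail)."""
    n = len(values)
    centre = mean(values)
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed cholesterol values; NOT taken from chol.txt.
chol = [150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 220.0, 260.0, 320.0, 400.0]

# New column Sqrt_Chol = sqrt(Chol).
sqrt_chol = [sqrt(v) for v in chol]

skew_raw = sample_skewness(chol)
skew_sqrt = sample_skewness(sqrt_chol)
print(f"skewness of Chol      = {skew_raw:.3f}")
print(f"skewness of Sqrt_Chol = {skew_sqrt:.3f}")
```

The square root compresses large values more than small ones, which pulls in the right tail. Whether the result is close enough to normal still has to be checked on the transformed column.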

Step 4: Check the equality of the variances.

H0: σsqrt_sm = σsqrt_non-sm

Analyze → Fit Y by X


In Oneway window: Unequal Variances

Since p=0.0126 < 0.05, we reject the equality of the variances. We will apply a t-test for unequal variances.
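The solution does not say which of the variance tests in JMP's Unequal Variances report produced p=0.0126. As one possible counterpart, the sketch below runs the median-centred Levene test (also known as the Brown-Forsythe test) with `scipy.stats.levene`. It assumes NumPy and SciPy are installed and uses synthetic data, so its p-value is unrelated to the one above.

```python
import numpy as np
from scipy import stats

# Synthetic square-root cholesterol values with different spreads; NOT the real data.
rng = np.random.default_rng(seed=1)
sqrt_smokers = rng.normal(loc=15.0, scale=2.0, size=151)
sqrt_non_smokers = rng.normal(loc=15.0, scale=1.0, size=49)

# H0: both groups have the same variance.
stat, p_value = stats.levene(sqrt_smokers, sqrt_non_smokers, center="median")

equal_var = p_value >= 0.05  # keep H0 only if the test does not reject it
print(f"statistic = {stat:.3f}, p = {p_value:.4f}, assume equal variances: {equal_var}")
```

The outcome of this check decides which version of the two-sample t-test is used in the next step.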

Step 5: t-test for unequal variances

Since p=0.1337 > 0.05, we do not reject H0: µsqrt_sm = µsqrt_non-sm. The mean square root transformed cholesterol levels of the smokers and the non-smokers are not significantly different.
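The t-test for unequal variances (Welch's test) differs from the pooled t-test in its standard error and its degrees of freedom. A minimal sketch with made-up values (not taken from chol.txt) and a hypothetical helper `welch_t`:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Return (t, df) for Welch's two-sample t-test, which does not assume equal variances."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Made-up square-root cholesterol values; NOT taken from chol.txt.
sqrt_smokers = [13.0, 14.5, 15.0, 16.0, 18.5, 19.0]
sqrt_non_smokers = [14.0, 14.5, 15.0, 15.5]

t_stat, df = welch_t(sqrt_smokers, sqrt_non_smokers)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

The two-sided p-value is then read from a t distribution with `df` degrees of freedom, which statistical software such as JMP reports directly.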