Study Guide for Exams

Concepts:

A. Course objectives

B. The process and philosophy of science (observations, questions, hypotheses, theories, prediction, if-

then, correlational and experimental tests, data, facts, scientific “proof”, relationship of ideas and data -

ID, limitations of science)

C. Statistical basics (variables [measured, derived, dependent, independent, response, predictor], data,

case, observation)

data collection (population, sample, error, random, independence, sample size)

measurement scales and kinds of variables (nominal/categorical, ranked/ordinal, interval/ratio,

continuous, discrete)

practical: identifying variables and measurement scales

D. Frequency distributions (histogram)

E. Description of data

central tendency (mode, median, mean, weighted mean)

dispersion (maximum, minimum, range, interquartile range, sum-of-squares, standard deviation,

variance, coefficient of variation)

parameters and statistics

reporting sample means (necessity of measure of dispersion, error bars)

calculating descriptive statistics with a calculator and with SYSTAT (raw data file, frequency data

file)

F. Goodness-of-fit

G. Probability distributions

Discrete (binomial, Poisson)

a. binomial (mutually exclusive [either/or] categories; defined by p, n)

b. Poisson (rare and random events; defined by mean)

c. calculating terms of binomial and Poisson distributions

d. comparison of observed and expected distributions (influence of sample size)

H. SYSTAT

File Menu (New, Open, Save, Save As, Print, Exit)

Edit Menu (Undo, Cut, Copy, Paste, Copy Graph, Delete, Options)

Data Menu (Variable Properties, Transform, By Groups, Select Cases, Case Weighting By

Frequency)

Utilities Menu (Probability Calculator)

Graph Menu (Bar, Dot, Histogram, Box, Scatterplot).

Analyze Menu (One-Way Frequency Tables, Basic Statistics, Tables)

_______________________________________

Test-Taking Strategies For Multiple-Choice Exams: When taking multiple-choice exams, one often hears

the claim, “It’s better to stay with your first choice; don’t go back and change your answers.” Is there any

evidence for this claim? Surprisingly, psychological research has shown that, overall, about 50% of changes

go from wrong to right; 25% go from right to wrong; and 25% go from wrong to wrong. Women change their

answers more often than men, and women are more likely than men to go from right to wrong. Thus, there is

no evidence to support the claim but no study can actually prove the claim wrong either because we don’t

know how those who stay with their first answers fare.

The following questions have been taken from old exams in Biostats classes over the

last 10-15 years. Because Biostats changes to some degree each semester, some

questions may address items not covered during the latest semester. In addition, new

material covered in the latest Biostats class may not have representative questions below.

Exam 1 Questions

1. The mode, median, and mean should be very nearly equivalent in this type of frequency distribution.

(a) binomial; (b) skewed; (c) normal; (d) Poisson; (e) probability

2. A frequency distribution is also known as a _____. (a) histogram; (b) central tendency; (c) table; (d)

plot; (e) parametric distribution

3. The shape of a Poisson distribution is determined by the _____. (a) mean, standard deviation; (b) p, n;

(c) geometric mean, SD, n; (d) mode, coefficient of variation; (e) mean

4. Which is the quickest (i.e., fewest steps) SYSTAT method of determining the sample size of several

categories? (a) scatterplot; (b) transform; (c) select cases; (d) Tables; (e) K-S test

5. This is the sum-of-squares divided by the sample size. (a) mode; (b) mean; (c) range; (d) median; (e)

variance

6. Which is not a “statistic” (term used in the narrow, technical sense)? (a) x̄; (b) s; (c) SD; (d) ; (e)

sample variance

7. This is the probability of obtaining a “4” with one roll of a single die. (a) 1 in 6; (b) 1 in 4; (c) 2 in 3;

(d) 1 in 8; (e) 1 in 2

8. This is an important measure of data dispersion. (a) mean; (b) variance; (c) mode; (d) median; (e)

Goodness-of-Fit

9. On a SYSTAT dot graph, these graphics portray variation around the mean. (a) parameters; (b)

interquartile plots; (c) z-scores; (d) descriptive statistics; (e) error bars

10. The position of a team in major league baseball standings is measured on this scale. (a) categorical; (b)

ordinal; (c) ratio; (d) interval; (e) continuous

11. This is the square root of the sum-of-squares divided by the sample size. (a) standard deviation; (b)

mean; (c) range; (d) median; (e) variance

12. This standardized expression permits one to directly compare the relative amount of variation

associated with two or more means of one variable. (a) average deviation; (b) variance; (c) median; (d)

coefficient of variation; (e) z-score

13. This is a measure of dispersion. (a) mean; (b) variance; (c) mode; (d) median; (e) regression plot

14. This is a measure of central tendency. (a) variance; (b) mode; (c) standard deviation; (d) range; (e) z-

score

15. In an interval scale of measurement, values are neither quantitative nor ranked, and there is no

mathematical or value relationship among them. (a) true; (b) false

16. The only time scientists will use a theory is if they know for sure that the theory is correct. (a) true; (b)

false

17. The basic reason scientific knowledge has advanced so remarkably through the years is because many

dedicated scientists have proved thousands of hypotheses and theories (a) true; (b) false

18. A “statistic” is _____. (a) a numerical property of a sample; (b) a numerical property of a population;

(c) a normal distribution; (d) a single case; (e) a single observation

19. Which is not an “essential descriptive statistic” as used in class? (a) weighted mode; (b) sample size;

(c) average; (d) mean; (e) standard deviation

20. The temperature of a human body in Celsius should be measured on a ratio scale. (a) true; (b) false

21. The various species contained within a particular genus of birds should be measured on a ranked scale.

(a) true; (b) false

22. Assuming one had the proper instrument, which could be measured as a continuous variable? (a) size

ranking; (b) number of RBCs; (c) frequency of predation events; (d) species; (e) color

23. This is the probability of obtaining two heads with one flip of two coins. (a) 0.50; (b) 0.25; (c) 0.10;

(d) 1.0; (e) 0.75

24. The term “random” refers to a condition where the value of one case does not affect the value of other

cases. (a) true; (b) false

25. These are the measured values of variables for individual cases. (a) tables; (b) observations; (c) data;

(d) error; (e) expected values

26. This is a very important measure of central tendency especially for continuous and many discrete

variables. (a) variance; (b) standard deviation; (c) range; (d) mean; (e) sum-of-squares

27. Scientists will only use a theory if they know for sure that the theory is absolutely true. (a) true; (b)

false

28. This term represents all of the individuals of a specified part of the statistical universe. (a) sample; (b)

range; (c) population; (d) mode; (e) frequency distribution

29. This is the difference between the maximum and minimum in a data set. (a) median; (b) variance; (c)

range; (d) geometric mean; (e) coefficient of variation

30. An observed frequency distribution of a given type will more closely conform to a theoretical

frequency distribution of the same type under this condition. (a) decreased N; (b) increased N; (c)

decreased range; (d) increased mean; (e) decreased mean

31. In this type of frequency distribution, the value of the mean is generally very low. (a) Poisson; (b)

discrete; (c) bimodal; (d) normal; (e) skewed

32. Statistical “error” often refers to the level of confidence that one has regarding how well the statistics

of _____ estimate the statistics of _____. (a) samples, populations; (b) populations, samples; (c)

parametrics, nonparametrics; (d) precision, accuracy; (e) accuracy, precision.

33. On a SYSTAT data sheet, individual variables are usually found in ______ whereas individual cases

are found in _____. (a) rows, columns; (b) rows, rows; (c) columns, columns; (d) columns, rows

34. An error bar usually illustrates _____. (a) a measure of central tendency; (b) a measure of dispersion;

(c) sample size; (d) a measure of ; (e) the coefficient of variation

35. In this scale of measurement, values are neither quantitative nor ranked, and there is no mathematical

or value relationship among them. (a) ordinal; (b) interval; (c) continuous; (d) categorical; (e) ratio

36. The volume of blood (ml) is measured on this scale. (a) categorical; (b) ordinal; (c) ratio; (d) interval;

(e) continuous

37. The species of tree is measured on this scale. (a) categorical; (b) ordinal; (c) ratio; (d) interval; (e)

continuous

38. The age of a viral particle is measured on this scale. (a) categorical; (b) ordinal; (c) ratio; (d) interval;

(e) continuous

Identify the variables and respective scales of measurement

39. It is thought that endotherms from northern areas have shorter appendages than endotherms from

southern areas. Test this hypothesis using the following data on wing lengths (mm) of house sparrows.

northern: 120, 113, 125, 118 variables_________________________________

southern: 116, 117, 121, 114 respective scales___________________________

40. Use the following data on number of ladybird beetles collected from sunflowers in different seasons to

test the hypothesis that sex ratio of beetles is unrelated to season.

Spring Sum Fall

male 163 135 71 variables___________________________________

female 86 77 40 respective scales_____________________________

41. A mammalogist was interested in possible relationships of prey size and predator size. Use the data

below on otters and their prey to determine if there a relationship between predator and prey masses

(g).

otter wgt. - 1500, 500, 750, 1000; variables____________________________

prey wgt. - 128, 190, 75, 125; respective scales______________________

42. Use the following data on ranked scores on a keyboarding skills test to test the hypothesis that high

school training improves keyboarding skills of college students.

with HS training: 44, 48, 36, 32, 51; variables____________________________

without training: 32, 40, 44, 44, 34; respective scales______________________

43. In a study of snake hibernation, fifteen pythons of similar size and age were randomly assigned to three

groups. One group was treated with drug A, one group with drug B, and the third group was not

treated. Their systolic blood pressure (mmHg) was measured 24 hours after administration of the

treatments. Does either drug affect blood pressure?

control: 130, 135, 132, 128, 130

drug A: 118, 120, 125, 119, 121 variables_______________________________

drug B: 105, 110, 98, 106, 105 respective scales_________________________

44. Use the following data on mean adult body weight (mg) and larval density (no./mm3) of fruit flies to

determine if there is a functional relationship between adult body mass and the density at which it was

reared.

Density 1.000 3.000 5.000 6.000 variables _____________________________

Weight 1.356 1.356 1.284 1.252 respective scales________________________

45. The following data are frequency of individuals with different hair colors according to sex. According

to these data, is human hair color dependent on sex? (Protocol link)

sex black brown blond red

male 32 43 16 9 variables_______________________________

female 55 65 64 16 respective scales_________________________

46. Use the following data on human blood-clotting times (min.) of individuals given one of two different

drugs to test the hypothesis that drug B induces clotting at a faster rate.

Drug B: 8.8, 8.4, 7.9, 8.7; variables__________________________________

Drug G: 9.9, 9.0, 11.1, 9.6; respective scales____________________________

47. Calculate the essential descriptive statistics of systolic blood pressures (mmHg) in the following sample

of HU students: 144, 136, 163, 117, 133, 141, 152, 140, 140, 138, 127, 120, 161, 124, 137.

mean = _______ standard deviation = _______ sample size = _______
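
A calculation like the one in question 47 can be checked with a short script. The following is only a sketch, not the course's SYSTAT procedure: it assumes the sample standard deviation uses the n - 1 denominator, and the variable names are illustrative.

```python
import math
import statistics

# Systolic blood pressures (mmHg) from question 47.
pressures = [144, 136, 163, 117, 133, 141, 152, 140, 140, 138,
             127, 120, 161, 124, 137]

n = len(pressures)                   # sample size
mean = statistics.mean(pressures)    # arithmetic mean
# Sum-of-squares: squared deviations from the mean, summed.
sum_of_squares = sum((x - mean) ** 2 for x in pressures)
# Sample SD, assuming the n - 1 denominator (an assumption; see the note above).
sd = math.sqrt(sum_of_squares / (n - 1))

print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.2f}")
```

If the course divides the sum-of-squares by n instead, replace `n - 1` with `n` (equivalent to `statistics.pstdev`).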

48. AmIV is a disease caused by a certain pathogenic amoeba that invades human red blood cells. The rate

and pattern of infection is known for some populations but unknown for most. A medical

parasitologist had some knowledge that infection in the Mexican population was uncommon, but he

did not know whether infection was random with respect to which individuals were infected. IF the

infection rate was in fact random, THEN the observed distribution of infection rates should closely

correspond to a theoretical distribution that describes the occurrence of a rare and random event.

Complete the following table showing the observed number and the expected number (to 3 decimal

places) of humans infected with amoebas in a sample collected in Mexico City. Based on your

analysis of these data, answer the questions and support your answers.

No. amoebas per      Observed no.       Expected no.
1 × 10^6 cells       infected humans    infected humans
0                    317                ______
1                    41                 ______
2                    1                  ______
3                    12                 ______
4                    7                  ______
5                    0
Total                _____              ______

Based on your analysis of the data, was the infection rate rare? Why?

Based on your analysis of the data, was the infection rate random? Why?
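
The expected column for a rare-and-random (Poisson) model can be sketched as follows. This is a minimal illustration, assuming the Poisson mean is estimated from the observed table as the frequency-weighted mean; the function name `poisson_probability` is hypothetical, not part of any course software.

```python
import math

# Observed counts from question 48: index = amoebas per 1 x 10^6 cells,
# value = number of infected humans observed in that class.
observed = [317, 41, 1, 12, 7, 0]

n_total = sum(observed)
# Mean number of amoebas per human, weighted by class frequency.
mean = sum(k * count for k, count in enumerate(observed)) / n_total

def poisson_probability(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Expected number of humans in each class if infection is rare and random.
expected = [n_total * poisson_probability(k, mean) for k in range(len(observed))]

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k}  observed = {obs:3d}  expected = {exp:.3f}")
```

Comparing the two columns class by class is what supports (or undermines) the "random" conclusion; the script only produces the numbers.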

49. Calculate the probability of having a total of four boys and one girl in a family of five children.
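
A binomial term such as the one asked for in question 49 can be computed as sketched below. The sketch assumes each child is a boy with probability p = 0.5, independently of siblings; that value is an assumption, not stated in the question, and `binomial_probability` is an illustrative name.

```python
import math

def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumption: each child is a boy with probability p = 0.5,
# independently of its siblings.
p_four_boys = binomial_probability(k=4, n=5, p=0.5)
print(p_four_boys)
```

The same function applies to any either/or problem defined by p and n, for example `binomial_probability(4, 8, 0.5)` for four of each sex in a litter of eight.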

50. The following sample contains data for the 5th digit claw length (mm) for two species of Asiatic bats.

Road-winged bat - 0.45, 0.39, 0.39, 0.42, 0.23

Flap-winged bat - 0.69, 0.99, 0.85, 0.98, 0.87, 0.95, 0.92, 0.81

Calculate the mean, standard deviation, and coefficient of variation for claw length in each bat

species.

Species Mean SD CV___

R-W bat _____ _______ ______

F-W bat _____ _______ ______

In which species is claw length relatively more variable? _______ Support your answer ___-

_______
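
The coefficient of variation expresses the standard deviation as a percentage of the mean, which is what allows relative variability to be compared between samples with different means. A minimal sketch, assuming the sample SD with the n - 1 denominator (the helper name is illustrative):

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample SD (n - 1 denominator) over the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Claw lengths (mm) from question 50.
rw_bat = [0.45, 0.39, 0.39, 0.42, 0.23]
fw_bat = [0.69, 0.99, 0.85, 0.98, 0.87, 0.95, 0.92, 0.81]

for name, sample in (("R-W bat", rw_bat), ("F-W bat", fw_bat)):
    print(name,
          round(statistics.mean(sample), 3),
          round(statistics.stdev(sample), 3),
          round(coefficient_of_variation(sample), 1))
```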

51. A particularly severe strain of brucellosis was detected at low frequencies in early 21st-century

Rwandan cattle herds. A government veterinarian needed to know whether the disease was being

spread by a correctable human ranching practice or if the disease was just occurring randomly among

herds. Data collected on frequency of infection in herds from throughout the country are shown below.

Determine if there is evidence that the disease was being spread by anything other than random

chance.

http://www.harding.edu/plummer/biostats/protocolanswers/independence2.htm

Choose the appropriate probability distribution formula and enter the value(s) of the variables

necessary for its solution. Show formula here ________________________________

Complete the following table showing the observed number and the expected number of cattle

infected with brucellosis in the following sample.

No. infected       Observed     Expected
cattle per herd    no. herds    no. herds
0                  523          ______
1                  71           ______
2                  22           ______
3                  11           ______
4                  5            ______
Total              _____        ______

Based on your analysis of the data, do you think there is evidence that the disease was being

spread by poor ranching practices? Support your answer.

52. Calculate the essential descriptive statistics of systolic blood pressures (mmHg) in the following sample

of Japanese fishermen: 134, 146, 143, 117, 123, 124, 142, 147, 130, 138, 123, 120, 131, 134

mean = _______ SD = _______ n = _______

53. Calculate the essential descriptive statistics of the number of fluorescent worms per petri dish in the

following sample:

no. of worms    no. of dishes
0               3
10              41
20              53
30              52
40              92
50              114
60              292
70              7

mean = _______    SD = _______    n = _______
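
For frequency data like the worm counts in question 53, each value is weighted by how many cases share it (the role played by case weighting by frequency in SYSTAT). A minimal sketch, assuming the n - 1 denominator for the sample SD; the dictionary layout is just one convenient representation.

```python
import math

# Frequency data from question 53: number of worms -> number of dishes.
frequency = {0: 3, 10: 41, 20: 53, 30: 52, 40: 92, 50: 114, 60: 292, 70: 7}

n = sum(frequency.values())                            # total number of dishes
mean = sum(x * f for x, f in frequency.items()) / n    # frequency-weighted mean
# Weighted sum-of-squares around the mean.
sum_of_squares = sum(f * (x - mean) ** 2 for x, f in frequency.items())
sd = math.sqrt(sum_of_squares / (n - 1))               # assumes n - 1 denominator

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
```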

54. Calculate the probability of having a total of four boys and four girls in a litter of eight gerbils. Choose

the appropriate probability distribution formula and enter the value(s) of the variables necessary for

solution.

Show formula here ____________________________________

Answer = ________________

55. The following data are maximum sprint speeds (m/sec) of gravid females measured from two species

of captive lizards.

Coal skink - 0.45, 0.39, 0.59, 0.55, 0.23, 0.51, 0.28, 0.47, 0.34, 0.50, 0.65

Broad-headed skink - 0.69, 0.99, 0.85, 0.98, 0.87, 0.95, 0.92, 0.81, 0.79, 1.09, 0.65

a. Calculate the mean, SD, and CV for sprint speed in each laboratory sample.

Laboratory sample    Mean     Standard deviation    Coefficient of variation
Coal skink           _____    _______               ______
B-H skink            _____    _______               ______

b. In which species is sprint speed relatively more variable? Support your answer.

Use the file GINMOVE.SYD which contains data on a population of spiny softshell turtles (Apalone

spinifera) inhabiting Gin Creek in Searcy, AR. Data were collected in 1995. Variables are: no = turtle

number, sex$ = sex of turtle (M, F), date = date (year [yy], month [mm], day [dd]), time = 24 hr time,

tb = body temperature (°C), tamb = ambient temperature (°C), tair = air temperature (°C), twat = water

temperature (°C), wlev = water level, clar$ = water clarity (clear, turbid, muddy), sky = sky condition,

wind = wind speed, hab$ = habitat (P, pools; R, riffles; Q, backwater), beh = behavior (buried,

basking, moving), loc = location along the stream (m), loc2 = location 24 hr later (m), dist = distance

moved the previous 24 hours (m). N = 973

56. Calculate the descriptive statistics of body temperature for all female cases in May.

http://www.harding.edu/plummer/biostats/data/Ginmove.syd

mean = _______ SD = _______ n = _______

57. In how many cases were turtles found in pool habitat? _______

Prepare graphs for the following (hognose.syz)

58. Daily movements associated with courtship range between 50-100 m. Construct a single graph

that illustrates the descriptive statistics of distance moved the previous 24 hr separately for males

and females when they are courting.

59. Construct a graph that illustrates the frequency of male residents located in May over all years by

date.

60. Graph the cases of Tb against cases of Tair when Tair exceeds 10 C

61. Construct a single graph that illustrates separately for active and inactive snakes the descriptive

statistics of body temperature when air temperature is below 30 C.

Exam 2

Updated 9 February 2016

In addition to being comprehensive, the following new concepts are covered:

importance of normal distribution in statistics

properties of normal distribution (defined by mean and standard deviation)

areas of normal curve

standard normal distribution (z-scores)

testing for normality (probability plot and Kolmogorov-Smirnov test); skewness

test statistic

parametric and non-parametric tests

data transformation (logarithmic, square root, arcsine)

statistical inference

major categories of statistical inference

sampling distribution

central limit theorem

Student’s t-distribution

standard error of mean

95% confidence limits

reporting sample means

graphical error bars

hypothesis testing

research hypothesis

null hypothesis

test statistic

critical value

alpha level

one and two-tailed tests

type I & II errors

relationship of type I & II errors

power of a test

significance level

statistical significance

parametric and nonparametric tests

assumptions of a test

Bartlett’s test

Levene’s test

robust test

testing for differences

independent samples t-test

paired samples t-test

repeated measures tests

Mann-Whitney test

Wilcoxon test

graphical analysis of differences between means

Exam 2 Questions

1. In a standard normal distribution, a z-score of _____ on each side of the mean encloses 95% of the

cases. (a) 0.68; (b) 1.96; (c) 1.0; (d) 0.05; (e) 0.0

2. This is the most common data transformation used in biology. (a) square root; (b) Lilliefors; (c)

arcsine; (d) logarithm; (e) interquartile

3. In SYSTAT, this is the preferred quantitative method for students to determine if data are

normally distributed. (a) histogram; (b) Tables; (c) dot graph; (d) probability plot; (e)

Kolmogorov-Smirnov test

4. Statistical tests of this type are very powerful but have relatively rigid assumptions that must be

met. (a) parametric; (b) nonparametric; (c) normality; (d) distribution-free; (e) interval plot

5. The specific shape of a normal distribution is determined by these. (a) mean, sample size; (b)

mean, median; (c) mean, standard deviation; (d) mean; (e) variance, sample size

6. Data that are influenced by many small and unrelated random effects are frequently normally

distributed. As a consequence, normally distributed data are widespread and common in nature.

(a) true; (b) false

Use the attached SND Table (1-tailed, showing proportion included) to determine the percent of the area of

the standard normal distribution that is either included or excluded by each of the following z-scores.

Each question asks for the percent in either one or two tails of the distribution.

7. 1.00 (one-tail, included) (a) 99.0; (b) 95.0; (c) 68.0; (d) 47.5; (e) 34.0; (ab) 5.0; (ac) 2.5; (ad) 1.0

8. 1.96 (two-tails, included) (a) 99.0; (b) 95.0; (c) 68.0; (d) 47.5; (e) 34.0; (ab) 5.0; (ac) 2.5; (ad) 1.0

9. 1.96 (one-tail, excluded) (a) 99.0; (b) 95.0; (c) 68.0; (d) 47.5; (e) 34.0; (ab) 5.0; (ac) 2.5; (ad) 1.0

10. 2.58 (two-tails, excluded) (a) 99.0; (b) 95.0; (c) 68.0; (d) 47.5; (e) 34.0; (ab) 5.0; (ac) 2.5; (ad) 1.0

11. Statistical tests of this type are not very powerful but have relatively few assumptions. (a)

parametric; (b) nonparametric; (c) tests based on the normal distribution; (d) cross-tabulation; (e)

sum-of-squares

12. When using SYSTAT to quantitatively test whether data in a frequency distribution table are

normally distributed, this function from the data menu must be enabled. (a) select cases; (b)

transform; (c) case weighting by frequency; (d) by groups; (e) variable properties

13. Use the attached Table (1-tailed, showing proportion included) to determine the percent (to the

nearest 0.1) of the area of the standard normal distribution which is enclosed by each of the

following z-scores. Each question asks for the percent in either one or two tails of the distribution.

a. 0.34 (one-tail) = __________% b. 1.96 (two-tails) = __________%

14. Use the attached Table (1-tailed, showing proportion included) to determine the percent (to the

nearest 0.1) of the area of the standard normal distribution which is excluded by each of the

following z-scores. Each question asks for the percent in either one or two tails of the distribution.

a. 1.96 (one-tail) = __________% b. 2.58 (two-tails) = __________%
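
The areas read from a standard normal table can be cross-checked numerically. The sketch below assumes that "one-tail, included" means the area between the mean and z on one side, and that "excluded" means the area beyond z; both helper functions are illustrative names, not part of SYSTAT.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def percent_included(z, tails=1):
    """Percent of area between the mean and z (one tail) or within +/- z (two tails)."""
    one_tail = standard_normal.cdf(abs(z)) - 0.5
    return 100 * one_tail * tails

def percent_excluded(z, tails=1):
    """Percent of area beyond z in one tail, or beyond +/- z in both tails."""
    one_tail = 1 - standard_normal.cdf(abs(z))
    return 100 * one_tail * tails

print(round(percent_included(1.96, tails=2), 1))
print(round(percent_excluded(1.96, tails=1), 1))
```

Because the normal curve is symmetric, the two-tailed value is always twice the one-tailed value, which is the only step the table itself does not do for you.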

Use HOGNOSE.SYD - data on radiotracked hognose snakes (Heterodon platirhinos) recorded near

Riverside Park north of Searcy, AR. Variables are: yr = year, date (month, day), no = snake ID, sex$ = sex,

rc = recapture status, xloc = x-axis location, yloc = y-axis location, hab1$ = habitat1, hab2$ = habitat2,

grdcov$ = groundcover, act$ = activity (inactive or active), beh$ = behavior, dist = distance moved in the

last 24 hr (m), tb = body temperature, tair = air temperature, status$ = residency status (resident or

nonresident). N = 783

15. For resident snakes, quantitatively test variable DIST to determine if it is normally distributed.

Is the distribution normal? _________ sample size = ______

Support your answer ________________________________________________

16. Which is false regarding data that are suitable for parametric tests? (a) sampled data are

independent of each other; (b) data are randomly sampled; (c) sampled data are normally

distributed; (d) sampled data are measured on an ordinal scale; (e) sampled data are measurements

of continuous variables

17. This is the mathematical relationship between the standard deviation and the standard error of the

mean.

18. This frequency distribution could be described as a normal distribution whose shape varies with

sample size.

19. H0: σa² = σb² is the proper null hypothesis for this statistical test.

20. This is the result when a true null hypothesis is rejected.

21. This term describes a general property of statistical tests in which the probability of rejecting a

false null hypothesis is relatively high.

22. This process is an example of statistical inference.

23. This is the value of the alpha level that most scientists use when testing a null hypothesis.

http://www.harding.edu/plummer/biostats/data/Hognose.syd

24. This is an example of a population parameter.

25. This is a general hypothesis of no difference or no relationship.

26. The goal of this statistical test is to detect differences among variances of normally distributed

data sets.

27. The goal of this statistical test is to detect differences between the means of two separate groups.

Data are not normally distributed nor are the group variances equal.

28. This term describes a general property of statistical tests that are relatively insensitive to

deviations from their assumptions.

29. Statistical tests of this type are very powerful but have relatively rigid assumptions.

30. H0: μa ≤ μb is a proper null hypothesis for this non-parametric statistical test.

31. How many asterisks indicate a significance level of P<0.01?

32. This calculated value is used in conjunction with a statistical table to determine the probability of a

null hypothesis being true.

33. If the research hypothesis is A>B, the null hypothesis is ____.

34. The shape of this theoretical probability distribution is determined by the mean and standard

deviation.

35. The cases of this distribution consist of individual sample means taken from a population.

36. How is the chance of making a type 2 error affected when alpha is decreased?

37. The ability to consistently apply this attribute of a good scientific research hypothesis contributes

to differentiating everyday scientists from great scientists.

38. The goal of this statistical test is to determine if the means of two separate groups are different.

Data are not normally distributed but the group variances are equal.

39. If the probability of rejecting a false null hypothesis is relatively high for a given test, we would

say the test is ____.

40. This is a general hypothesis of no difference between groups.

41. In the early 1900s, this biologist contributed greatly to the areas of population genetics and

statistical applications in biological research.


43. This general term describes the conclusion about any null hypothesis that has been statistically

rejected.

44. This is a quantitative test for determining if sample data are normally distributed.

45. These two values represent the approximate 95% confidence intervals for this mean: x̄ = 7.4, SD = 0.40, N =

46. H0: σa² = σb²


47. This is the alpha level that most biologists use when testing a null hypothesis.


49. The risk of making a Type 2 error can be reduced by ________.

50. If the null hypothesis is A=B, the research hypothesis is ____.

51. The goal of this statistical test is to detect differences between the means of repeated

measurements on individuals. Data are skewed and the variances are unequal.

52. This is the test statistic for a Mann-Whitney test.

53. Nonparametric tests address either questions of differences or questions of _____.

54. This parametric test is considered to be robust.

55. This is the name of the tabled value of a test statistic at the specified alpha level.

56. This is the result when a false null hypothesis is not rejected.

57. The goal of this statistical test is to detect differences between two dependent means when the data

meet parametric test assumptions.

58. Statistical tests of this general type are not very powerful but they are easy to use because they

have relatively few assumptions.

59. This is an example of a statistic.

60. T-tests assume that variances between groups are homogeneous. How would you test this

assumption?

61. H0: μa ≤ μb is a suitable null hypothesis for this nonparametric test.

62. Data that are suitable for this category of statistical tests must be normally distributed and

continuous

63. This is the quantitative relationship between the standard error of the mean and the standard

deviation.

64. This mathematical theorem predicts that sample means from a non-normally distributed

population will have a normal distribution if the sample size is large enough.

65. This is the symbol for a parametric variance.

66. The goal of this statistical test is to detect differences among variances of skewed data sets.

67. This process is an example of statistical inference.

68. The goal of this statistical test is to detect differences between two dependent means when the data

meet parametric test assumptions.

69. Statistical tests of this general type are not very powerful but they are easy to use because they

have relatively few assumptions.

70. The goal of this statistical test is to detect differences between the means of repeated

measurements on individuals. Data are not normally distributed; the variances are equal.

71. This principle states that means of samples from a non-normally distributed population will have a

normal distribution if the sample size is large enough.

72. This term describes the statistical conclusion regarding a null hypothesis that has been rejected at a

probability level of 0.05.

73. This is a nonparametric test for determining whether data are normally distributed.

74. Statistical tests address either questions of differences or questions of _____.

75. What are the primary attributes of a good scientific research hypothesis?

_______________________________
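
Several of the short-answer questions above turn on the standard error of the mean and on 95% confidence limits. The sketch below shows how the two quantities relate. It uses hypothetical sample values and assumes the large-sample critical value 1.96; for small samples the critical value would come from Student's t-distribution instead.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def approximate_95_limits(mean, sd, n, critical=1.96):
    """Approximate 95% confidence limits: mean +/- critical * SE."""
    half_width = critical * standard_error(sd, n)
    return mean - half_width, mean + half_width

# Hypothetical sample values, chosen only for illustration.
lower, upper = approximate_95_limits(mean=50.0, sd=4.0, n=16)
print(lower, upper)
```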

Problems

76. Use the following data on salivary gimetz concentration (mg/100 ml) in male and female college

students to test the research hypothesis that sex affects gimetz levels.

males: 220.1, 218.6, 229.6, 228.8, 222.0, 224.1, 226.5

females: 23.4, 221.5, 230.2, 224.3, 223.8, 230.8

77. Use the data in HOGNOSE.SYS to test the hypothesis that active (act$=act) male resident snakes

(status$=res) move greater distances each day than do active female residents.

78. Density of voles is hypothesized to vary differently from year to year in grassland habitats that

have either been unburned, burned annually, or burned every 4-5 years. To test this idea, an

ecologist measured the population density of voles each year for 10 years in each of three different

habitats. The data are below. Does variation in population density of voles differ among the

habitats?

Density (number per hectare)

burned periodically 348, 244, 198, 321, 276, 239, 287, 311, 302, 271

burned 4-5 years 147, 172, 133, 111, 109, 113, 096, 115, 110, 107

unburned 167, 231, 098, 177, 216, 179, 195, 154, 163, 134

79. Use the following data on ranked scores on a keyboarding skills test to test the hypothesis that

high school training improves keyboarding skills of college students.

with training: 44, 48, 36, 32, 51, 45, 54, 56

without training: 32, 40, 44, 44, 34, 30, 26
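A hand calculation for problem 79 can be checked by counting, for every pair of scores, which group ranks higher. The sketch below is a minimal pure-Python illustration of the two Mann-Whitney U values, assuming that test is the one chosen for ranked data from two independent groups. The function name `mann_whitney_u` is hypothetical and is not a SYSTAT command.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by pairwise comparison; a tie counts one half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to n_a * n_b.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


with_training = [44, 48, 36, 32, 51, 45, 54, 56]
without_training = [32, 40, 44, 44, 34, 30, 26]

u_with, u_without = mann_whitney_u(with_training, without_training)
print(u_with, u_without)
```

Which of the two values is compared with the tabled critical value depends on the convention of the table being used, so check the table heading before drawing a conclusion.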

Common errors when working exam problems (listed in approximate

decreasing order of potential point loss)

Incorrect or incomplete reading of the problem

Not knowing what the variables and respective measurement scales are

Not knowing the assumptions of chosen test

Incorrect correspondence between stated variables and null hypothesis

Rejecting H0 when P>0.05 or not rejecting H0 when P≤0.05

Not knowing how to work with logarithms

Incorrect data entry

80. Use the data in GINMOVE.SYD to test the hypothesis that male and female turtles move different

distances each day.

81. Using the following data on volume (cubic microns) of avian erythrocytes taken from normal

(diploid) and intersex (triploid) individuals, test the hypothesis that ploidy affects erythrocyte

volume.

diploid: 248, 236, 269, 254, 249, 251, 260, 245, 239, 255

triploid: 380, 391, 377, 392, 398, 374

82. Use the following data on clutch size to test the hypothesis that variability in clutch size differs

between zoo-bred and wild-bred snow geese.

zoo: 10, 11, 12, 11, 10, 11, 11

wild: 9, 8, 11, 12, 10, 13, 11, 10, 10
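As a rough check on problem 82, the two sample variances and their ratio can be computed directly. This is only a sketch of a variance-ratio (F) comparison, which is an assumption here; the course protocol may call for Levene's or Bartlett's test instead, and those are not shown.

```python
from statistics import variance

zoo = [10, 11, 12, 11, 10, 11, 11]
wild = [9, 8, 11, 12, 10, 13, 11, 10, 10]

# Sample variances (n - 1 in the denominator).
var_zoo = variance(zoo)
var_wild = variance(wild)

# Put the larger variance in the numerator so that the ratio is at least 1.
f_ratio = max(var_zoo, var_wild) / min(var_zoo, var_wild)

print(round(var_zoo, 3), round(var_wild, 3), round(f_ratio, 3))
```

The ratio would then be compared with a tabled F value using 8 and 6 degrees of freedom (numerator and denominator, respectively).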

83. Five sophomores volunteered for an exercise physiology class project. Maximum oxygen

consumption (ml O2/min) was measured twice in each of the five sophomores over a period of one

month. One measurement was taken two days before a vigorous cardio exercise program began

and the second two days after the program ended. Use the following data to test the hypothesis

that training produced a greater ability to consume oxygen.

Individual no.      1       2       3       4       5
Before treatment    1920    2020    2060    1960    1960
After treatment     2250    2410    2260    2200    2360
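Because each sophomore in problem 83 was measured twice, the analysis works on the within-individual differences. The following is a minimal sketch of a paired t statistic, assuming the differences meet parametric assumptions; the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

before = [1920, 2020, 2060, 1960, 1960]
after = [2250, 2410, 2260, 2200, 2360]

# A paired design analyzes the after-minus-before difference for each individual.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff
df = n - 1

print(mean_diff, round(t_stat, 3), df)
```

Since the research hypothesis is directional (training increases oxygen consumption), the calculated t would be compared with a one-tailed critical value at 4 degrees of freedom.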

84. Crop yields were measured in each of nine experimental plots over two successive years, one

using “old” fertilizer and one using “new” fertilizer. Use the following data to test the research

hypothesis that the new fertilizer produced greater yields.

plot 1 2 3 4 5 6 7 8 9

old fertilizer 1920 2020 2060 1960 1960 2140 1980 1940 1790

new fertilizer 2250 2410 2260 2200 2360 2320 2240 2300 2090

Exam 3

Correlation and Regression - Content

testing for relationships

correlation

positive correlation

negative correlation

causation

correlation coefficient

strength of relationship

coefficient of determination (r²)

Pearson’s correlation

Bonferroni probabilities

Spearman’s correlation

regression

dependent variable

independent variable

residual

least squares fit

intercept

slope

regression coefficient

predicting Y from X

inverse prediction

extrapolation

semilog and log-log regressions

exponential form of log-log equation

Questions from old exams

1. This statistical procedure is used when one desires to predict the value of a dependent variable

from knowledge of the value of an independent variable. (a) regression analysis; (b) correlation

analysis; (c) goodness-of-fit; (d) data transformation; (e) analysis of variance

2. This term refers to the prediction of “Y” from a known value of “X” that is beyond the range of

the actual data. (a) extrapolation; (b) guessing; (c) transformation; (d) goodness-of-fit; (e) type II

error

3. R-squared (r²) is also known as the _____. (a) coefficient of variation; (b) coefficient of

determination; (c) parametric measure; (d) critical value; (e) measure of statistical power

4. The strength of the relationship in a correlation analysis is shown by this value. (a) intercept; (b)

correlation coefficient; (c) slope; (d) probability; (e) regression coefficient

5. In a regression analysis, “Y” is the independent variable and “X” is the dependent variable. (a)

true; (b) false

6. In a regression analysis, the regression line is fitted to the data points by this method. (a)

Kolmogorov-Smirnov; (b) extrapolation; (c) ANOVA; (d) data transformation; (e) least squares

7. How heart rate relates to oxygen consumption varies from person to person. Age, weight, sex,

body composition, fitness level, and other factors all play a role. Drawing from population models

and their own research, the companies that manufacture heart rate monitors have developed

formulas that couple heart rate with those different variables and massage it all into an estimate of

calorie usage. The onboard calculators found on treadmills, elliptical trainers and other devices use

basically the same approach. Depending on the machines, however, they typically don’t allow you

to enter as much information about yourself as a heart monitor. The machine might ask for your

weight and age, for example, but not your sex or an estimate of your fitness level. Fewer variables

mean a rougher guess. In statistical terms, what is the meaning of the last sentence, “Fewer

variables mean a rougher guess?” (a) lower CV; (b) higher CV; (c) lower r²; (d) higher r²; (e) lower

probability

Problems

Example problems from lectures:

1. Use the following data on wing length (cm) and tail length (cm) in cowbirds to determine if there is a

relationship between the two variables. (Protocol link)

wing 10.4 10.8 11.1 10.2 10.3 10.2 10.7 10.45 10.8 11.2 10.6

tail 7.4 7.6 7.9 7.2 7.4 7.1 7.4 7.2 7.8 7.7 7.8
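For the cowbird data in problem 1, the correlation coefficient can be checked from the sums of squares and the sum of cross-products. The sketch below assumes Pearson's correlation is appropriate (both variables measured on a ratio scale and roughly normal); `pearson_r` is a hypothetical helper written for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sp_xy / sqrt(ss_x * ss_y)


wing = [10.4, 10.8, 11.1, 10.2, 10.3, 10.2, 10.7, 10.45, 10.8, 11.2, 10.6]
tail = [7.4, 7.6, 7.9, 7.2, 7.4, 7.1, 7.4, 7.2, 7.8, 7.7, 7.8]

r = pearson_r(wing, tail)
print(round(r, 3), round(r ** 2, 3))  # r and the coefficient of determination
```

The sign of r gives the direction of the relationship and its square gives the proportion of variation shared by the two variables; significance still has to be judged against a table or software output.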

2. Use the following data taken from crabs to determine if there is a relationship between weight of gills

(g) and weight of body (g) and between weight of thoracic shield (g) and weight of body. (Protocol

link)

body 159 179 100 45 384 230 100 320 80 220 320

gill 14.4 15.2 11.3 2.5 22.7 14.9 11.4 15.81 4.19 15.39 17.25

thorax 80.5 85.2 49.9 21.1 195.3 111.5 56.6 156.1 39.0 108.9 160.1

3. The following data are ranked scores for ten students who took both a math and a biology aptitude

examination. Is there a relationship between math and biology aptitude scores for these students?

(Protocol link)

math 53 45 72 78 53 63 86 98 59 71

biology 83 37 41 84 56 85 77 87 70 59

4. Test the following data to determine if there is a relationship between the total length of aphid stem

mothers and the mean thorax length of their parthenogenetic offspring. (Protocol link)

mother 8.7 8.5 9.4 10.0 6.3 7.8 11.9 6.5 6.6 10.6

offspring 5.95 5.65 6.00 5.70 4.40 5.53 6.00 4.18 6.15 5.93

5. The following data are rate of oxygen consumption (ml/g/hr) in crows at different temperatures (°C).

Does temperature affect oxygen consumption in crows? Determine the equation for predicting oxygen

consumption from temperature. (Protocol link)

temp -18 -15 -10 -5 0 5 10 19

oxygen 5.2 4.7 4.5 3.6 3.4 3.1 2.7 1.8
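The prediction equation asked for in problem 5 has the form Y = a + bX, where b is the slope and a is the intercept of the least squares line. The sketch below shows one way to obtain both values from the data; `least_squares` is a hypothetical helper, and the test of whether the slope differs from zero is not included.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


temp = [-18, -15, -10, -5, 0, 5, 10, 19]            # independent variable (X)
oxygen = [5.2, 4.7, 4.5, 3.6, 3.4, 3.1, 2.7, 1.8]   # dependent variable (Y)

intercept, slope = least_squares(temp, oxygen)
print(f"oxygen = {intercept:.3f} + ({slope:.4f}) * temp")
```

Predictions from this equation are only trustworthy within the measured temperature range (-18 to 19); anything beyond that is extrapolation.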

6. Use the following data on mean adult body weight (mg) and larval density (no./mm³) of fruit flies to

determine if there is a functional relationship between adult body mass and the density at which it was

reared. Determine the equation for predicting body weight from larval density. (Protocol link)

density 1 3 5 6 10 20 40

weight 1.356 1.356 1.284 1.252 0.989 0.664 0.475

Practice problems

Nos. 5, 7, 10, 12, 14, 25, 26, 28, 29, 31, 38, 39, 46, 52, 55, 58, 62, 65, 66, 72

ANOVA - Content

In addition to being comprehensive, the following new concepts are covered:

analysis of variance

F-ratio

F-distribution

Between-group variance

Within-group (error) variance

Post-hoc pairwise tests

one-way ANOVA

two-way ANOVA

Tukey test

factor

http://www.harding.edu/plummer/biostats/protocolanswers/pearson1.pdf



http://www.harding.edu/plummer/biostats/protocolanswers/spearman1.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/spearman2.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/regression1.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/regression2.pdf

interaction

synergism

antagonism

residuals

Kruskal-Wallis test

DSCF test

ANCOVA

interaction plot

covariate

least squares means

the problem of multiple comparisons

Circular statistics

Principal components

MANOVA

Repeated measures ANOVA

Logistic regression

Non-linear regression

Multiple regression

Questions from old exams

1. In an ANCOVA, the covariate is a _____ variable. (a) dependent; (b) multivariate; (c) categorical; (d)

continuous; (e) derived

2. Which test is least powerful? (a) ANOVA; (b) Pearson’s correlation; (c) independent-samples t-test; (d) paired

t-test; (e) Mann-Whitney test

3. To determine the effect of two independent variables on a dependent variable, what is the advantage of doing a

single two-way ANOVA as opposed to two separate one-way ANOVAs? (a) a two-way ANOVA is more

robust; (b) a two-way ANOVA calculates the effect of a covariate; (c) a two-way ANOVA is easier to use on a

calculator; (d) a two-way ANOVA assesses possible interaction between the independent variables; (e) a two-

way ANOVA provides a test statistic

4. In an ANOVA, this is the normal variation expected in individuals that is not a result of being part of a “group.”

It results from such things as individual genetic makeup and environmental history. (a) standard deviation; (b)

SE; (c) between group variance; (d) error variance; (e) coefficient of variation

5. The goal of this test is to detect differences between >2 independent means. Data are not normally distributed

nor are the variances among groups equal.

Problems

Example problems from lectures

1. Random samples of a certain species of zooplankton were collected from five lakes and their selenium content

(ppm) was determined. Was there a difference among lakes with respect to selenium content? (Protocol link)

lake A: 23, 30, 28, 32, 35, 27, 30, 32

lake B: 34, 42, 39, 40, 38, 41, 40, 39

lake C: 15, 18, 12, 10, 8, 16, 20, 19

lake D: 18, 15, 9, 12, 10, 17, 10, 12

lake E: 25, 20, 22, 18, 30, 22, 20, 19
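The F-ratio for the lake data in problem 1 is the between-group mean square divided by the within-group (error) mean square. The sketch below works through that arithmetic, assuming a one-way ANOVA is the appropriate test; `one_way_f` is a hypothetical helper, and post-hoc pairwise tests are not shown.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


lakes = {
    "A": [23, 30, 28, 32, 35, 27, 30, 32],
    "B": [34, 42, 39, 40, 38, 41, 40, 39],
    "C": [15, 18, 12, 10, 8, 16, 20, 19],
    "D": [18, 15, 9, 12, 10, 17, 10, 12],
    "E": [25, 20, 22, 18, 30, 22, 20, 19],
}

f_ratio, df_between, df_within = one_way_f(list(lakes.values()))
print(round(f_ratio, 2), df_between, df_within)
```

A large F only indicates that at least one lake differs; identifying which lakes differ requires a post-hoc test such as Tukey's.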

2. The following data are amount of food (kg) consumed per day by adult deer at different times of the year. Test

the null hypothesis that food consumption was the same for all the months tested. (Protocol link)

February May August November

4.7 4.6 4.8 4.9

4.9 4.4 4.7 5.2

5.0 4.3 4.6 5.4

4.8 4.4 4.4 5.1

4.7 4.1 4.7 5.6

4.2 4.8

3. In a study of snake hibernation, fifteen pythons of similar size and age were randomly assigned to three groups.

One group was treated with drug A, one group with drug B, and the third group was not treated. Their systolic

blood pressure (mmHg) was measured 24 hours after administration of the treatments. Do the drugs affect

blood pressure? If so, do they have similar effects? (Protocol link)

control: 130, 135, 132, 128, 130

drug A: 118, 120, 125, 119, 121

drug B: 105, 110, 98, 106, 105

http://www.harding.edu/plummer/biostats/protocolanswers/anova1.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/anova2.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/bonferroni1.pdf

4. Fourteen hucksters were assigned at random to one of three experimental groups and fed a different diet for six

months. Use the following data on huckster mass (kg) at the end of the experiment to determine if diet affected

body size. Which diet produced the heaviest hucksters? (Protocol link)

diet 1 diet 2 diet 3

60.8 68.7 102.6

57.0 67.7 102.1

65.0 74.0 100.2

58.6 66.3 96.5

61.7 69.8

5. Twenty-four freshwater clams were randomly assigned to four groups of six each. One group was placed in

pond water, one group was placed in deionized water, one group was placed in a solution of 0.5 mM sodium sulfate, and one group was placed in a

solution of 0.74 mM sodium chloride. At the end of a specified time period, blood potassium levels (M K+)

were determined. Did treatment affect blood potassium levels? (Protocol link)

pond water: 0.518, 0.523, 0.499, 0.502, 0.520, 0.507

deionized water: 0.308, 0.385, 0.301, 0.390, 0.307, 0.371

sodium sulfate: 0.393, 0.415, 0.351, 0.390, 0.385, 0.397

sodium chloride: 0.383, 0.405, 0.398, 0.352, 0.381, 0.407

6. An entomologist interested in the vertical distribution of a fly species collected the following data on numbers

of flies (no. flies/m³) from each of three different vegetation layers. Use these data to test the hypothesis that fly

abundance was the same in all three vegetation layers. (Protocol link)

herbs shrubs trees

14.0 8.4 6.9

12.1 5.1 7.3

5.6 5.5 5.8

6.2 6.6 4.1

12.2 6.3 5.4

7. Use USOPHEO.SYD to determine if body size is affected by sex and/or location. Read the description of the

data file before proceeding. (Protocol link)

8. Qualime epithelial cancer is hypothesized to result from either genotype or several environmental factors that

vary by season. To address this hypothesis, use the data below on QSA level (g/g; the diagnostic test indicator

of qualime cancer) that were collected on 20 individuals in different seasons. (Protocol link)

QSA Genotype Season QSA Genotype Season QSA Genotype Season QSA Genotype Season

478 ZZ Winter 425 ZW Summer 428 ZZ Summer 466 ZW Winter





Practice problems

Nos. 3, 21, 22, 27, 54, 56, 57, 69, 70

http://www.harding.edu/plummer/biostats/protocolanswers/bonferroni2.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/kruskalwallis1.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/kruskalwallis2.pdf

http://www.harding.edu/plummer/biostats/datafiles.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/twoway1.pdf

http://www.harding.edu/plummer/biostats/protocolanswers/twoway2.pdf

Final Exam

Questions

1. A “powerful” statistical test is a test in which _____. (a) the probability of rejecting a false null hypothesis is

high; (b) the probability of rejecting a true null hypothesis is high; (c) the probability of accepting a false

null hypothesis is high; (d) the probability of accepting a true null hypothesis is high

2. This statistical procedure is used when one desires to predict the value of a dependent variable from

knowledge of the value of an independent variable. (a) regression analysis; (b) correlation analysis; (c)

goodness-of-fit; (d) data transformation; (e) analysis of variance

3. One can get a general idea of whether two means are significantly different if, on a graph, the values of these

do not overlap. (a) mean ± 1 SE; (b) mean ± 2 SD; (c) 95% confidence limits; (d) ranges; (e) means ± 1 SD

4. This term refers to the prediction of “Y” from a known value of “X” that is beyond the range of the actual

data. (a) extrapolation; (b) guessing; (c) transformation; (d) goodness-of-fit; (e) type II error

5. Which is false regarding data that are suitable for parametric tests? (a) sampled data are independent of each

other; (b) data are randomly sampled; (c) sampled data are normally distributed; (d) sampled data are

measured on an ordinal scale; (e) sampled data are measurements of continuous variables

6. This is the sum-of-squares divided by the sample size. (a) mode; (b) mean; (c) range; (d) median; (e)

weighted mean

7. R-squared (r²) is also known as the _____. (a) coefficient of variation; (b) coefficient of determination; (c)

parametric measure; (d) critical value; (e) measure of statistical power

8. The central limit theorem states that _____. (a) the means of samples from a normally distributed population

have a normal distribution; (b) the means of samples from a normally distributed population are always

skewed; (c) the means of samples from a normally distributed population are not normally distributed; (d)

the means of samples from a normally distributed population have no variance; (e) the means of samples

from a normally distributed population are significantly different from one another

9. The outcomes of statistical tests are usually found in this section of a primary literature paper. (a)

introduction; (b) materials and methods; (c) results; (d) discussion; (e) literature cited

10. A t-distribution with infinite degrees of freedom is identical to this distribution. (a) Poisson; (b) binomial; (c)

F; (d) Chi-square; (e) normal

11. An observed frequency distribution of a given type will more closely conform to a theoretical frequency

distribution of the same type under this condition. (a) decreased N; (b) increased N; (c) decreased range; (d)

increased mean; (e) decreased mean

12. Statistical “error” often refers to the level of confidence that one has regarding how well the statistics of

_____ estimate the statistics of _____. (a) samples, populations; (b) populations, samples; (c) parametrics,

nonparametrics; (d) precision, accuracy; (e) accuracy, precision.

13. The volume of blood (ml) is measured on this scale. (a) categorical; (b) ordinal; (c) ratio; (d) interval; (e)

continuous

14. Which of the following is a type I error? (a) rejection of a false null hypothesis; (b) rejection of a true null

hypothesis; (c) acceptance of a true null hypothesis; (d) acceptance of a false null hypothesis

15. The percentage results of political polls as reported on television usually have a “margin of error”

accompanying the percentages. What is a “margin of error?” (a) 95% confidence interval; (b) error

variance; (c) z-score; (d) normality test; (e) r²

16. In an ANCOVA, the covariate is a _____ variable. (a) dependent; (b) multivariate; (c) categorical; (d)

continuous; (e) derived

17. This calculated value is used in conjunction with a statistical table to determine the probability of a null

hypothesis being true.

18. The shape of this theoretical probability distribution is determined by the mean and standard deviation.

19. The cases of this distribution consist of individual sample means taken from a population.

The following questions have been taken from old exams in Biostats classes over the last 10-15 years.

Because Biostats changes to some degree each year, some questions may address items not covered

during the latest semester. In addition, new material covered in the latest Biostats class may not have

representative questions below.

20. The strength of the relationship in a correlation analysis is shown by this value. (a) intercept; (b) correlation

coefficient; (c) slope; (d) probability; (e) regression coefficient

21. The goal of this statistical test is to determine if the means of two separate groups are different. Data are not

normally distributed but the group variances are equal.


23. This general term describes the conclusion about any null hypothesis that has been statistically rejected.

24. H0: σa² = σb²


25. This is the alpha level that most biologists use when testing a null hypothesis.

26. This principle states that sample means from a normally distributed population will be normally distributed

regardless of sample size.


28. The risk of making a Type 2 error can be reduced by ________.

29. If the null hypothesis is A=B, the research hypothesis is ____.

30. The goal of this statistical test is to detect differences between the means of repeated measurements on

individuals in one group. Data are skewed and the group variances are unequal.

31. This is the test statistic for a Mann-Whitney test.

32. Nonparametric tests address either questions of differences or questions of _____.

33. This parametric test is considered to be robust.

34. This is the name of the tabled value of a test statistic at the specified alpha level.

35. The goal of this statistical test is to detect differences between two dependent means when the data meet

parametric test assumptions.

36. T-tests assume that variances between groups are homogeneous. How would you test this assumption?

37. H0: μa ≤ μb is a suitable null hypothesis for this nonparametric test.

38. This is the numerical relationship between the standard error of the mean and the standard deviation.

39. This mathematical theorem predicts that sample means from a non-normally distributed population will have

a normal distribution if the sample size is large enough.

40. This frequency distribution is basically a normal distribution whose shape varies with sample size.

41. The goal of this statistical test is to detect differences among variances of skewed data sets.

42. Which test is least powerful? (a) ANOVA; (b) Pearson’s correlation; (c) independent-samples t-test; (d)

paired t-test; (e) Mann-Whitney test

43. To determine the effect of two independent variables on a dependent variable, what is the advantage of

doing a single two-way ANOVA as opposed to two separate one-way ANOVAs? (a) a two-way ANOVA is

more robust; (b) a two-way ANOVA calculates the effect of a covariate; (c) a two-way ANOVA is easier to

use on a calculator; (d) a two-way ANOVA assesses possible interaction between the independent variables;

(e) a two-way ANOVA provides a test statistic

44. In a standard normal distribution, a z-score of _____ on each side of the mean encloses 95% of the cases. (a)

0.68; (b) 1.96; (c) 1.0; (d) 0.05; (e) 0.0

45. In a regression analysis, “Y” is the independent variable and “X” is the dependent variable. (a) true; (b) false

46. The shape of a Poisson distribution is determined by the _____. (a) mean, standard deviation; (b) p, n; (c)

geometric mean, SD, n; (d) mode, coefficient of variation; (e) mean

47. This is an important measure of data dispersion. (a) mean; (b) variance; (c) mode; (d) median; (e) Goodness-

of-Fit

48. On a SYSTAT dot graph, these graphics portray variation around the mean. (a) parameters; (b) interquartile

plots; (c) z-scores; (d) descriptive statistics; (e) error bars

49. This is the square root of the sum-of-squares divided by the sample size. (a) standard deviation; (b) mean;

(c) range; (d) median; (e) variance

50. In an interval scale of measurement, values are neither quantitative nor ranked, and there is no mathematical

or value relationship among them. (a) true; (b) false

51. This is an important measure of data central tendency. (a) scatterplot; (b) variance; (c) standard deviation;

(d) median; (e) sum-of-squares

52. The basic reason scientific knowledge has advanced so remarkably through the years is because many

dedicated scientists have proved thousands of hypotheses and theories (a) true; (b) false

53. The temperature of a human body in Celsius should be measured on a ratio scale. (a) true; (b) false

54. The various species contained within a particular genus of birds should be measured on a ranked scale. (a)

true; (b) false

55. The most common data transformation used in biology is the logarithmic transformation. (a) true; (b) false

56. This is the probability of obtaining two heads with one flip of two coins. (a) 0.50; (b) 0.25; (c) 0.10; (d) 1.0;

(e) 0.75

57. Data that are influenced by many small and unrelated random effects are frequently normally distributed.

As a consequence, normally distributed data are widespread and common in nature. (a) true; (b) false

58. The discipline of statistics concerns _____. (a) using quantitative properties of samples to answer questions

about populations; (b) tallying sports information; (c) how to confuse and frustrate students; (d) how to

support one’s predetermined ideas with numbers; (e) how to maximize business profit

59. This standardized expression permits one to directly compare the relative amount of variation associated

with two or more means of one variable. (a) average deviation; (b) variance; (c) median; (d) coefficient of

variation; (e) z-score

60. In SYSTAT, this is the preferred quantitative method for students to determine if data are normally

distributed. (a) histogram; (b) Tables; (c) dot graph; (d) probability plot; (e) Kolmogorov-Smirnov test

61. In an ANOVA, this is the normal variation expected in individuals that is not a result of being part of a

“group.” It results from such things as individual genetic makeup and environmental history. (a) standard

deviation; (b) SE; (c) between group variance; (d) error variance; (e) coefficient of variation

62. These provide a graphical portrayal of variation around the mean. (a) error bars; (b) sampling distributions;

(c) z-scores; (d) essential descriptive statistics; (e) parameters

63. In this scale of measurement, values are neither quantitative nor ranked, and there is no mathematical or

value relationship among them. (a) ordinal; (b) interval; (c) continuous; (d) categorical; (e) ratio

64. The age of a viral particle is measured on this scale. (a) categorical; (b) ordinal; (c) ratio; (d) interval; (e)

continuous

65. In a regression analysis, the regression line is fitted to the data points by this method. (a) Kolmogorov-

Smirnov; (b) extrapolation; (c) ANOVA; (d) data transformation; (e) least squares

66. This is a measure of dispersion. (a) mean; (b) variance; (c) mode; (d) median; (e) regression plot

67. Who made this statement, “Isn’t that what science is all about...eliminating possibilities?” (a) Student; (b)

Sean Connery; (c) Dr. Who; (d) William Gosset; (e) Ronald Fisher

68. A robust statistical test is a test which _____. (a) is sensitive to deviations from the assumptions; (b) is

insensitive to deviations from the assumptions; (c) has no assumptions; (d) has a high probability of

accepting a true null hypothesis; (e) has a low probability of rejecting a true null hypothesis

69. Which is not an example of statistical inference? (a) calculating a sample mean; (b) estimating a population

mean; (c) estimating a population variance; (d) testing a statistical hypothesis; (e) estimating a population

median

70. “How heart rate relates to oxygen consumption varies from person to person. Age, weight, sex, body

composition, fitness level, and other factors all play a role. Drawing from population models and their own

research, the companies that manufacture heart rate monitors have developed formulas that couple heart rate

with those different variables and massage it all into an estimate of calorie usage. The onboard calculators

found on treadmills, elliptical trainers and other devices use basically the same approach. Depending on the

machines, however, they typically don’t allow you to enter as much information about yourself as a heart

monitor. The machine might ask for your weight and age, for example, but not your sex or an estimate of

your fitness level. Fewer variables mean a rougher guess.” In statistical terms, what is the meaning of the

last sentence, “Fewer variables mean a rougher guess?” (a) lower CV; (b) higher CV; (c) lower r²; (d) higher

r²; (e) lower probability

71. The computer output below contains the results of a log-log regression analysis of hemoglobin concentration

on blood volume. What is the proper exponential regression equation for these results? (a) Hemo =

1.45 Vol^2.56; (b) Hemo = 2.56 Vol^1.45; (c) Hemo = 28.2 Vol^2.56; (d) Hemo = 0.33 Vol^1.45; (e) Hemo = 505 Vol^0.14

Predictor    Coef    SE Coef    T         P
Constant     1.45    0.33       1.091     0.317
Volume       2.56    0.14       22.469    0.000

Analysis of Variance

Source            DF     SS         MS         F          P
Regression        1      156.661    156.661    504.866    0.000
Residual Error    6      1.862      0.310
Total             234    582.46
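A log-log regression has the linear form log Y = a + b log X, which back-transforms to the exponential form Y = (base^a) X^b. The sketch below applies that conversion to the coefficients in the output above. It assumes base-10 logarithms were used for the transformation, which the output itself does not state; `exponential_form` is a hypothetical helper.

```python
def exponential_form(intercept, slope, base=10.0):
    """Convert log Y = intercept + slope * log X into Y = multiplier * X ** exponent."""
    multiplier = base ** intercept  # antilog of the intercept
    exponent = slope                # the slope carries over unchanged
    return multiplier, exponent


intercept = 1.45  # "Constant" coefficient from the output
slope = 2.56      # "Volume" coefficient from the output

multiplier, exponent = exponential_form(intercept, slope)
print(f"Hemo = {multiplier:.1f} * Vol^{exponent}")
```

If natural logarithms had been used instead, the multiplier would be e raised to the intercept, so the base of the transformation must be known before writing the final equation.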

Choose the most appropriate statistical test

a. independent t-test    ab. Kruskal-Wallis         bcd. Pearson correlation     ad. 2-way ANOVA
b. paired t-test         bc. Bartlett’s             cde. Spearman correlation    ae. Wilcoxon
c. Mann-Whitney          cd. goodness-of-fit        abcd. Tukey test
d. Levene’s              de. test of independence   bcde. Regression
e. 1-Way ANOVA           abc. Kolmogorov-Smirnov    ac. ANCOVA

72. The concentration of unicellular algae (measured as chlorophyll concentration per liter) was measured in two

independent samples each at two different depths in each of four lakes. We wish to know if there is a

difference in algae concentration between the two depths. The data are normally distributed and the group

variances are equal.

lake    surface    1 m
1       425        130
1       433        147
2       500        215
2       488        221
3       100        30
3       113        29
4       312        103
4       325        100

73. White-throated sparrows occur in 2 distinct color morphs, referred to as brown and white. It was suspected

that females select mates of the opposite morph (i.e., white females select brown males and vice versa).

This phenomenon is known as negative assortative mating. In 30 mated pairs, the color combinations were as

follows. Do these data support the assumption that negative assortative mating occurs in this species?

                 males
                 white    brown
females white    27       43
        brown    44       25

74. Four tomato plants were treated with chlorogenic acid to determine if this would influence the activity (% of

maximum) of the enzyme o-diphenol oxidase in the leaves. A control group of four plants was not treated.

We do not know if the variable (activity) is normally distributed, nor is it possible to determine this. Does

the treatment affect activity of the enzyme?

treated    untreated
35         10
45         18
38         11
29         8

75. We suspect that a certain strain of laboratory rats has a genetic tendency to make left turns in a “T” maze.

Of 12 rats that were tested in such a maze, 8 chose to go into the left arm and 4 chose the right arm. Do

these results support our suspicion about a left-turning tendency?

76. Determine if the following data on sprint speed (m/sec) in five-lined skinks are normally distributed: 1.7,

0.8, 1.1, 0.9, 1.2, 1.6, 0.9, 0.8, 1.0, 1.4, 0.7, 1.1, 0.7.

77. Random samples of a certain species of zooplankton were collected from 5 randomly selected lakes and

their selenium content was determined. We wish to know if there is a difference among lakes with respect

to selenium content in this species (i.e., is there a significant “lake effect”?). The data are normally

distributed and the group error variances are equal.

lake A lake B lake C lake D lake E

23 34 15 18 25

30 42 18 15 20

35 38 8 10 30

27 41 16 17 22

30 40 20 10 20

32 39 19 12 19

78. The goal is to detect differences among variances of normally-distributed data.

79. The goal is to detect differences among >2 means. Data are ranked.

80. The goal is to determine if each of 2 data sets are normally distributed.

81. The goal is to detect whether frequency data differ from a theoretical distribution.

82. The goal is to detect differences between >2 independent means. Data are not normally distributed nor are

the variances among groups equal.

83. The goal is to detect relationships between 2 or more ordinal variables. Data are not normally distributed.

84. The goal is to detect differences between the means of 2 separate groups. Data are not normally distributed

nor are the group variances equal.

Computer output interpretation

_____________________________________________________

Using the output below, answer the following questions:

TABLE OF SEX (ROWS) BY LOCATION (COLUMNS)

FREQUENCIES

          AR     KS    NFL    SFL    STX    VA    TOTAL
-------------------------------------------------------
F         90     21    28     70     67     42    318
M         90     20    47     47     79     48    331
-------------------------------------------------------
TOTAL     180    41    75     117    146    90    649

TEST STATISTIC         VALUE     DF    PROB
PEARSON CHI-SQUARE     10.489    5     0.063

85. The value of the test statistic is _____. (a) 649; (b) <0.10; (c) 0.063; (d) 5; (e) 10.489

86. The null hypothesis should _____. (a) be rejected; (b) not be rejected

87. The conclusion is _____. (a) mean SEX is different from mean LOC; (b) SEX is unrelated to LOC; (c)

observed LOC = expected LOC; (d) AR = KS = NFL = SFL = STX = VA; (e) males are larger than females
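The Pearson chi-square value in the SEX by LOCATION output can be reproduced from the observed frequencies: each expected frequency is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The sketch below is a minimal pure-Python illustration; `chi_square_independence` is a hypothetical helper, and the probability still has to come from a table or from software.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a test of independence on a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Columns: AR, KS, NFL, SFL, STX, VA
observed = [
    [90, 21, 28, 70, 67, 42],  # F
    [90, 20, 47, 47, 79, 48],  # M
]

chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)
```

The degrees of freedom follow from (rows - 1) × (columns - 1), which is why a 2 × 6 table has 5.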

_________________________________________________

Using the output below, answer the following questions:

H2OOUT versus H2OIN

Predictor    Coef     SE Coef    T         P
Constant     0.433    0.472      1.091     0.317
H2OIN        0.317    0.912      22.469    0.000

S = 0.988    R-Sq = 98.8%    R-Sq(adj) = 98.6%

Analysis of Variance

Source            DF     SS         MS         F          P
Regression        1      156.661    156.661    504.866    0.000
Residual Error    6      1.862      0.310
Total             234    582.46

88. The regression equation is _____. (a) Y=0.433+0.317X; (b) Y=0.472+0.912X; (c) Y=0.317-0.433X; (d)

Y=156.661+0.317X; (e) Y=504.866+0.433X

89. The value of Y (H2OOUT) when X (H2OIN) = 9.4 is _____. (a) 9.04; (b) -2.55; (c) 507.78; (d) 159.58; (e)

3.400

90. The dependent variable is _____. (a) RESIDUAL; (b) H2OIN; (c) H2OOUT; (d) X; (e) CONSTANT
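
As a rough illustration of how regression output of this kind is read, the sketch below takes the intercept and slope from the Coef column and uses them for prediction, then recomputes R-Sq from the regression and residual sums of squares. The function name `predict_h2oout` is hypothetical, and the Total row of the printed table is not used.

```python
# Reading a simple linear regression output (standard library only).
intercept = 0.433  # "Constant" row, Coef column
slope = 0.317      # H2OIN row, Coef column


def predict_h2oout(h2oin):
    """Predicted H2OOUT (Y) for a given H2OIN (X): Y = intercept + slope * X."""
    return intercept + slope * h2oin


# R-squared = SS(regression) / (SS(regression) + SS(residual)).
ss_regression = 156.661
ss_residual = 1.862
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"predicted H2OOUT at H2OIN = 9.4: {predict_h2oout(9.4):.2f}")
print(f"R-Sq = {100 * r_squared:.1f}%")
```

Predictions are only trustworthy for X values inside the range of the data used to fit the line; anything beyond that range is extrapolation.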

Terms and concepts

process and philosophy of science (observations, questions, hypotheses, theories, prediction, if-then, correlational and experimental tests, data, facts, scientific “proof”, relationship of ideas and data - ID, limitations of science)

variables (measured, derived, dependent, independent, response, predictor), data, case, observation

data collection (population, sample, error, random, independence, sample size)

measurement scales and kinds of variables (nominal/categorical, ranked/ordinal, interval/ratio, continuous, discrete)

practical: identifying variables and measurement scales

Frequency distributions (histogram)

central tendency (mode, median, mean, weighted mean)

dispersion (maximum, minimum, range, interquartile range, sum-of-squares, standard deviation, variance, coefficient of variation)

parameters and statistics

reporting sample means (necessity of measure of dispersion, error bars)

calculating descriptive statistics with a calculator and with SYSTAT (raw data file, frequency data file)

Goodness-of-fit

Probability distributions

binomial (mutually exclusive [either/or] categories; defined by p, n)

Poisson (rare and random events; defined by mean)

calculating terms of binomial and Poisson distributions

comparison of observed and expected distributions (influence of sample size)

importance of normal distribution in statistics

properties of normal distribution (defined by mean and standard deviation)

areas of normal curve

standard normal distribution (z-scores)

testing for normality (probability plot and Kolmogorov-Smirnov test)

skewness

test statistic

parametric and non-parametric tests

data transformation (logarithmic, square root, arcsine)

statistical inference

major categories of statistical inference

sampling distribution

central limit theorem

Student’s t-distribution

standard error of mean

95% confidence limits

reporting sample means

graphical error bars

hypothesis testing

research hypothesis

null hypothesis

test statistic

critical value

alpha level

one and two-tailed tests

type I & II errors

relationship of type I & II errors

power of a test

significance level

statistical significance

parametric and nonparametric tests

assumptions of a test

Bartlett’s test

Levene’s test

robust test

testing for differences

independent samples t-test

paired samples t-test

repeated measures tests

Mann-Whitney test

Wilcoxon test

graphical analysis of differences between means

correlation, causation

regression (assumptions; null hypothesis, intercept, slope)

residuals

regression equation (linear, semi-log, log-log; exponential)

prediction

extrapolation

model building

use of exponential regression in biology

analysis of variance

F-ratio

F-distribution

Between-group variance

Within-group (error) variance

Post-hoc pairwise tests

one-way ANOVA

two-way ANOVA

Tukey test

factor

interaction

synergism

antagonism

residuals

Kruskal-Wallis test

DSCF test

ANCOVA

interaction plot

covariate

least squares means

the problem of multiple comparisons

circular statistics

Principal components

MANOVA

Repeated measures ANOVA

Logistic regression

Polynomial regression

Multiple regression

analysis of covariance (ANCOVA)
