Life after linear regression


Page 1: Life after linear regression

Life after linear regression

A survey of Penn State applied statistics graduate courses

Page 2: Life after linear regression

The courses

• Stat 500: Applied Statistics
• Stat 501: Regression Methods
• Stat 502: Analysis of Variance and Design of Experiments
• Stat 503: Design of Experiments
• Stat 504: Analysis of Discrete Data
• Stat 505: Applied Multivariate Statistical Analysis
• Stat 506: Sampling Theory and Methods
• Stat 509: Biostatistical Methods
• Stat 510: Applied Time Series Analysis

Page 3: Life after linear regression

Stat 500: Applied Statistics

• Topics covered:
  – Descriptive statistics
  – Hypothesis testing and power
  – Estimation and confidence intervals
  – Regression
  – One- and two-way ANOVA
  – Chi-square tests
• Prerequisites:
  – 2 credits of algebra

Page 4: Life after linear regression

Stat 501: Regression Methods

• Topics covered:
  – Analysis of research data through simple and multiple regression and correlation
  – Polynomial models
  – Indicator variables
  – Stepwise and piecewise regression
  – Logistic regression
• Prerequisites:
  – 6 credits of statistics or Stat 500; matrix algebra

Page 5: Life after linear regression

Stat 502: Analysis of Variance and Design of Experiments

• Analysis of data when:
  – the response y is continuous
  – the predictors (called factors or treatments) are all qualitative
  – the error assumptions are the same as for regression
• Do the means differ among the groups defined by the factor combinations?

Page 6: Life after linear regression

Stat 502: Analysis of Variance and Design of Experiments

• Topics covered:
  – Analysis of variance and design concepts
  – Factorial, nested and unbalanced data
  – Analysis of covariance
  – Blocked designs
  – Latin-square, split-plot, repeated measures designs
  – Multiple comparisons
• Prerequisites:
  – Stat 501 (or undergraduate version Stat 462)

Page 7: Life after linear regression

A Stat 502 Example: Intertidal Seaweed Grazers

• To study the influence of ocean grazers on the regeneration rate of seaweed in the intertidal zone, a researcher scraped square rock plots free of seaweed and observed the seaweed regeneration when certain types of seaweed-grazing animals were denied access.
• Research questions:
  – Which grazer consumes the most seaweed?
  – Do different grazers influence each other's impact?
  – Are grazing effects similar in all microhabitats?

Page 8: Life after linear regression

A Stat 502 Example: Intertidal Seaweed Grazers

• The grazers were limpets (L), small fishes (f), and large fishes (F):
  – LfF: all three grazers were allowed access
  – fF: limpets were excluded using caustic paint
  – Lf: large fish were excluded using coarse net
  – f: limpets and large fish were excluded
  – L: small and large fish were excluded using fine net
  – C: the control group, all excluded

Page 9: Life after linear regression

A Stat 502 Example: Intertidal Seaweed Grazers

• The intertidal zone is a highly variable environment. The researcher applied treatments in 8 blocks of 12 plots each:
  – #1: Just below high tide, exposed to heavy surf
  – #2: Just below high tide, protected from surf
  – #3: Midtide, exposed
  – #4: Midtide, protected
  – #5: Just above low tide level, exposed
  – #6: Just above low tide level, protected
  – #7: On near-vertical rock wall, midtide, protected
  – #8: On near-vertical rock wall, above low tide, protected

Page 10: Life after linear regression

A Stat 502 Example: Percent of regenerated seaweed on intertidal plots with some grazers excluded

Block   Control   L        f        Lf       fF       LfF
1       14, 23    4, 4     11, 24   3, 5     10, 13   1, 2
2       22, 35    7, 8     14, 31   3, 6     10, 15   3, 5
3       67, 82    28, 58   52, 59   9, 31    44, 50   6, 9
4       94, 95    27, 35   83, 89   21, 57   57, 73   7, 22
5       34, 53    11, 33   33, 34   5, 9     26, 42   5, 6
6       58, 75    16, 31   39, 52   26, 43   38, 42   10, 17
7       19, 47    6, 8     43, 53   4, 12    29, 36   5, 14
8       53, 61    15, 17   30, 37   12, 18   11, 40   5, 7
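As a first look at the table above, before any formal analysis of variance, the treatment and block means can be computed directly. The sketch below is only a descriptive summary, not the Stat 502 analysis itself; the names `TREATMENTS`, `DATA`, `treatment_means`, and `block_means` are illustrative and do not come from the slides.

```python
from statistics import mean

# Column order of the table above.
TREATMENTS = ["Control", "L", "f", "Lf", "fF", "LfF"]

# Percent regenerated seaweed: block -> one (plot 1, plot 2) pair per
# treatment, in the column order of TREATMENTS, copied from the table.
DATA = {
    1: [(14, 23), (4, 4), (11, 24), (3, 5), (10, 13), (1, 2)],
    2: [(22, 35), (7, 8), (14, 31), (3, 6), (10, 15), (3, 5)],
    3: [(67, 82), (28, 58), (52, 59), (9, 31), (44, 50), (6, 9)],
    4: [(94, 95), (27, 35), (83, 89), (21, 57), (57, 73), (7, 22)],
    5: [(34, 53), (11, 33), (33, 34), (5, 9), (26, 42), (5, 6)],
    6: [(58, 75), (16, 31), (39, 52), (26, 43), (38, 42), (10, 17)],
    7: [(19, 47), (6, 8), (43, 53), (4, 12), (29, 36), (5, 14)],
    8: [(53, 61), (15, 17), (30, 37), (12, 18), (11, 40), (5, 7)],
}


def treatment_means(data):
    """Mean percent regeneration for each treatment, pooled over blocks."""
    means = {}
    for col, name in enumerate(TREATMENTS):
        values = [v for pairs in data.values() for v in pairs[col]]
        means[name] = mean(values)
    return means


def block_means(data):
    """Mean percent regeneration for each block, pooled over treatments."""
    return {
        block: mean(v for pair in pairs for v in pair)
        for block, pairs in data.items()
    }


if __name__ == "__main__":
    for name, m in treatment_means(DATA).items():
        print(f"{name:8s} {m:6.2f}")
    for block, m in block_means(DATA).items():
        print(f"block {block}: {m:6.2f}")
```

Large differences among the block means would be consistent with the slides' point that the intertidal zone is highly variable, which is the reason for blocking.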

Page 11: Life after linear regression

Stat 503: Design of Experiments

• The key word is “experiments”
• When you can control the values of your predictors (factors), you should ensure you can answer your research question by:
  – Collecting the appropriate measurements
  – Setting the values of your factors appropriately
  – Reducing extraneous variation by “blocking”
  – Having an appropriate sample size

Page 12: Life after linear regression

Stat 503: Design of Experiments

• Topics covered:
  – Design principles
  – Optimality
  – Confounding in split-plot designs
  – Repeated measures designs, fractional factorial designs, response surface designs
  – Balanced/partially balanced incomplete block designs
• Prerequisites:
  – Stat 501 (or undergraduate Stat 462)
  – Stat 502

Page 13: Life after linear regression

A Stat 503 Example: The BARGE Study

• The current standard treatment for patients with mild to moderate asthma is scheduled daily use of inhaled albuterol.

• It is now hypothesized that such regular use has a negative effect on lung function in patients with the B16Arg/Arg genotype, but not in those with the B16Gly/Gly genotype.

Page 14: Life after linear regression

A Stat 503 Example: The BARGE Study

• The BARGE Study compares the regular use of inhaled albuterol (A) to placebo (P) in patients with the B16Arg/Arg genotype (R) and in patients with the B16Gly/Gly genotype (G).

• The primary hypothesis concerns inference about whether $(\mu_{RA} - \mu_{RP}) - (\mu_{GA} - \mu_{GP}) = 0$.

Page 15: Life after linear regression

A Stat 503 Example: BARGE Study’s Paired Crossover

Genotype   Order    Period 1   Washout   Period 2
R          1 (AP)   Y1jRA      ---       Y1jRP
R          2 (PA)   Y2jRP      ---       Y2jRA
G          1 (AP)   Y1jGA      ---       Y1jGP
G          2 (PA)   Y2jGP      ---       Y2jGA
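One common way to estimate a treatment effect from a two-period crossover like the layout above uses within-patient differences between periods. The sketch below assumes that approach and assumes the washout removes any carryover; the slides do not state which analysis the BARGE investigators used. The function names and all numbers in the usage block are hypothetical, made up for illustration only.

```python
from statistics import mean


def crossover_effect(ap_pairs, pa_pairs):
    """Estimate the A-minus-P effect within one genotype.

    Each argument is a list of (period 1, period 2) responses, one pair
    per patient, for order AP and order PA respectively.
    """
    # Order AP: period 1 - period 2 = (A - P) + (period 1 - period 2 effect)
    d_ap = mean(y1 - y2 for y1, y2 in ap_pairs)
    # Order PA: period 1 - period 2 = (P - A) + (period 1 - period 2 effect)
    d_pa = mean(y1 - y2 for y1, y2 in pa_pairs)
    # Halving the difference cancels the period effect and leaves A - P.
    return (d_ap - d_pa) / 2


def genotype_contrast(r_ap, r_pa, g_ap, g_pa):
    """Estimate (mu_RA - mu_RP) - (mu_GA - mu_GP)."""
    return crossover_effect(r_ap, r_pa) - crossover_effect(g_ap, g_pa)


if __name__ == "__main__":
    # Hypothetical responses, NOT data from the BARGE Study.
    r_ap = [(10, 8), (12, 9)]
    r_pa = [(7, 10), (8, 9)]
    g_ap = [(9, 9), (11, 10)]
    g_pa = [(10, 10), (9, 10)]
    print(genotype_contrast(r_ap, r_pa, g_ap, g_pa))
```

A contrast near zero would suggest the albuterol effect is similar in the two genotypes; a formal test would also need a standard error, which this sketch does not compute.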

Page 16: Life after linear regression

Stat 504: Analysis of Discrete Data

• Analysis of data when:
  – the response y is binary or discrete
  – the predictors are qualitative or quantitative
• Summarized data are frequency counts
• How do the predictors affect the response?

Page 17: Life after linear regression

Stat 504: Analysis of Discrete Data

• Topics covered:
  – Models for frequency arrays
  – Goodness-of-fit tests
  – Two-, three- and higher-way tables
  – Latent models
  – Logistic and Poisson regression models
• Prerequisites:
  – Stat 502 (or undergraduate Stat 460 or major Stat 512)
  – Matrix algebra

Page 18: Life after linear regression

A Stat 504 Example: Survival in the Donner Party

• In 1846, the Donner and Reed families traveled from Illinois to California by covered wagon.

• The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snow.

• 40 of the 87 members died from famine and exposure; 45 of the members were adults over age 15.

• Are females better able to withstand harsh conditions than males?

Page 19: Life after linear regression

A Stat 504 Example: Survival in the Donner Party

[Figure: Probability of survival (0.0 to 0.9) versus Age (15 to 65), plotted separately for Female and Male]

Page 20: Life after linear regression

A Stat 504 Example: Survival in the Donner Party

Link Function: Logit

Response Information

Variable   Value      Count
STATUS     SURVIVED   20      (Event)
           DIED       25
           Total      45

Logistic Regression Table

                                                   Odds     95% CI
Predictor   Coef       SE Coef   Z       P       Ratio   Lower   Upper
Constant    1.633      1.110     1.47    0.141
AGE         -0.07820   0.03729   -2.10   0.036   0.92    0.86    0.99
Gender      1.5973     0.7555    2.11    0.034   4.94    1.12    21.72
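The odds ratios and confidence limits in the output above follow from the coefficients: each odds ratio is exp(Coef), and each 95% limit is exp(Coef ± 1.96 × SE Coef). The sketch below reproduces that arithmetic and turns the fitted logit into a survival probability. The slides do not say how Gender is coded; the code assumes 1 = female and 0 = male, and the function names are illustrative.

```python
import math

# Coefficients and standard errors copied from the logistic regression table.
COEF = {"Constant": 1.633, "AGE": -0.07820, "Gender": 1.5973}
SE = {"Constant": 1.110, "AGE": 0.03729, "Gender": 0.7555}


def odds_ratio(name):
    """Odds ratio for a one-unit increase in the named predictor."""
    return math.exp(COEF[name])


def odds_ratio_ci(name, z=1.96):
    """Approximate 95% confidence limits for the odds ratio."""
    lower = math.exp(COEF[name] - z * SE[name])
    upper = math.exp(COEF[name] + z * SE[name])
    return lower, upper


def survival_probability(age, gender):
    """Fitted probability of survival.

    ASSUMPTION: gender is coded 1 for female and 0 for male; the slides
    do not state the coding.
    """
    eta = COEF["Constant"] + COEF["AGE"] * age + COEF["Gender"] * gender
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    for name in ("AGE", "Gender"):
        lower, upper = odds_ratio_ci(name)
        print(f"{name}: OR={odds_ratio(name):.2f} CI=({lower:.2f}, {upper:.2f})")
    print(f"age 30, gender=1: {survival_probability(30, 1):.3f}")
    print(f"age 30, gender=0: {survival_probability(30, 0):.3f}")
```

Under the assumed coding, the Gender odds ratio of 4.94 says the estimated odds of survival for a woman are about five times those for a man of the same age.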

Page 21: Life after linear regression

Stat 505: Applied Multivariate Statistical Analysis

• Analysis of data when you have several correlated, continuous responses is called multivariate data analysis.

• A repeated measure is a special kind of multivariate response obtained by measuring the same variable on each subject several times, possibly under different conditions.

Page 22: Life after linear regression

Stat 505: Applied Multivariate Statistical Analysis

• Topics covered:
  – Multivariate data: matrix review, graphical displays, probability theory, multivariate normal distribution, partial correlations
  – Inferences about multivariate means: Hotelling’s T² tests, multivariate analysis of variance, repeated measures experiments and growth curves, discriminant analysis
  – Data reduction: principal components, factor analysis, canonical correlation analysis, cluster analysis
  – Structural equation modeling
• Prerequisites:
  – 6 credits in statistics
  – Matrix algebra

Page 23: Life after linear regression

A Stat 505 Example: Pottery Data

• Pottery samples were collected from four sites in the British Isles: Llanedyrn, Caldicot, Isle Thornes, and Ashley Rails.

• Each piece analyzed for its aluminum, iron, magnesium, calcium, and sodium content.

• Do the pottery samples from the four sites differ with respect to their composition?

Page 24: Life after linear regression

A Stat 505 Example: Pottery Data

Page 25: Life after linear regression

Stat 506: Sampling Theory and Methods

• Topics covered:
  – Basic methods: simple random sampling, selecting sample sizes, unequal probability sampling, ratio and regression estimation, stratified sampling, cluster and systematic sampling, multistage designs, double sampling
  – Special topics: sampling hidden human populations, environmental sampling, sampling to study cause-and-effect relationships, resampling of data, measurement errors and nonresponse in surveys, adaptive sampling, network and snowball sampling
• Prerequisites:
  – Calculus
  – 3 credits in statistics

Page 26: Life after linear regression

A Stat 506 Example: A Water Pollution Survey

• The study region of interest has 320 lakes.
• Take a random sample of the lakes by:
  – Drawing a rectangle of length l and width w around the study region.
  – Generating pairs of (0,1) random numbers. Multiply the first number by l and the second by w to get random location coordinates within the rectangle.
  – If the location is a lake, then the lake is selected.
  – Continuing until the required number of lakes is selected.
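The selection steps above can be sketched as a short loop. Everything below is illustrative: `sample_lakes`, `lake_at`, the `max_draws` safety cap, and the three toy circular lakes are hypothetical stand-ins, since the slides give no map of the 320 lakes. The sketch also assumes that a lake hit more than once is kept only once, which the slides do not specify.

```python
import random


def sample_lakes(length, width, lake_at, n_required, rng=None, max_draws=100_000):
    """Select lakes by drawing uniform random points in a bounding rectangle.

    lake_at(x, y) is a caller-supplied (hypothetical) lookup returning a
    lake identifier, or None when the point falls on land.
    """
    rng = rng or random.Random()
    selected = []
    for _ in range(max_draws):  # safety cap so the loop always terminates
        if len(selected) >= n_required:
            break
        x = rng.random() * length  # first (0,1) number times l
        y = rng.random() * width   # second (0,1) number times w
        lake = lake_at(x, y)
        # ASSUMPTION: repeat hits on an already selected lake are skipped.
        if lake is not None and lake not in selected:
            selected.append(lake)
    return selected


# Toy map for illustration only: name -> (center x, center y, radius).
TOY_LAKES = {"A": (2.0, 2.0, 1.5), "B": (7.0, 5.0, 2.0), "C": (4.0, 8.0, 1.0)}


def toy_lake_at(x, y):
    """Return the toy lake containing (x, y), or None if the point is on land."""
    for name, (cx, cy, r) in TOY_LAKES.items():
        if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
            return name
    return None


if __name__ == "__main__":
    print(sample_lakes(10.0, 10.0, toy_lake_at, 2, rng=random.Random(0)))
```

Because a random point is more likely to land in a large lake than a small one, this scheme selects lakes with unequal probabilities, roughly in proportion to surface area, which ties it to the unequal probability sampling topic in the course outline.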

Page 27: Life after linear regression

Stat 509: Biostatistical Methods

• Topics covered:
  – An introduction to the design and statistical analysis of randomized and observational studies in biomedical research
• Prerequisites:
  – Stat 500

Page 28: Life after linear regression

Stat 510: Applied Time Series Analysis

• Topics covered:
  – Identification of models for empirical data collected over time
  – Use of models in forecasting
• Prerequisites:
  – Stat 501 (or undergraduate Stat 462 or major Stat 511)

Page 29: Life after linear regression

A Stat 510 Example: Measuring Global Warming

• Temperature (in degrees Celsius) averaged for the northern hemisphere over a full year.
• Temperature series collected from 1880 to 1987.
• All measurements expressed as differences from their 108-year mean.
• Research questions:
  – Is the mean temperature increasing over the 108 years?
  – What is the rate of increase in global temperature over the past century?
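A natural first attempt at the rate-of-increase question is a straight-line fit of temperature on year, followed by a check of whether the residuals are correlated from one year to the next. The sketch below shows both steps in plain Python. The slides do not list the temperature values, so the usage block runs on a made-up series; `fit_trend` and `lag1_autocorrelation` are illustrative names.

```python
def fit_trend(years, temps):
    """Ordinary least squares fit of temps = intercept + slope * year."""
    n = len(years)
    year_bar = sum(years) / n
    temp_bar = sum(temps) / n
    sxx = sum((x - year_bar) ** 2 for x in years)
    sxy = sum((x - year_bar) * (t - temp_bar) for x, t in zip(years, temps))
    slope = sxy / sxx
    intercept = temp_bar - slope * year_bar
    return intercept, slope


def lag1_autocorrelation(residuals):
    """Sample correlation between each residual and the one before it."""
    n = len(residuals)
    r_bar = sum(residuals) / n
    num = sum(
        (residuals[i] - r_bar) * (residuals[i - 1] - r_bar) for i in range(1, n)
    )
    den = sum((r - r_bar) ** 2 for r in residuals)
    return num / den


if __name__ == "__main__":
    # Hypothetical series, NOT the real temperature data: a slow trend
    # plus a wiggle that repeats every four years.
    years = list(range(1880, 1988))
    wiggle = [0.1, 0.05, -0.1, -0.05]
    temps = [0.005 * (y - 1933.5) + wiggle[i % 4] for i, y in enumerate(years)]

    intercept, slope = fit_trend(years, temps)
    residuals = [t - (intercept + slope * y) for y, t in zip(years, temps)]
    print(f"slope per year: {slope:.4f}")
    print(f"lag-1 residual autocorrelation: {lag1_autocorrelation(residuals):.2f}")
```

A lag-1 autocorrelation far from zero signals that the independent-errors assumption of ordinary regression is doubtful, which is the situation time series models are built to handle.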

Page 30: Life after linear regression

A Stat 510 Example: Measuring Global Warming

[Figure: Scatterplot of TEMP vs YEAR; TEMP axis from -0.5 to 0.4, YEAR axis from 1880 to 2000]

Page 31: Life after linear regression

A Stat 510 Example: Measuring Global Warming

[Figure: Residuals Versus the Order of the Data (response is TEMP); Residual axis from -0.3 to 0.3, Observation Order axis from 1 to 100]