Statistical Analysis of Potential Causes of Obesity in the U.S.

A Statistical Model to Explain Potential Causes of Obesity in the U.S.

By Amber Oldfield
St. John’s University- MBA
New York, NY
(570) 407-0224

Submitted on May 9, 2009


This research evaluates factors that may contribute to the growing obesity rate in the U.S.


I. Introduction

This statistical research seeks to identify potential causes of obesity throughout the United States. The National Institutes of Health (NIH) defines obesity as a body mass index (BMI) of 30 or greater. The study is a cross-sectional analysis of the 50 U.S. states and of factors that may contribute to obesity. All of the data are from year-end 2007. Six factors are evaluated to determine whether they contribute to a growing obesity rate. These independent variables are per capita income, unemployment rate, percent of graduates from High School (25 years and older), diabetes rate, population density, and percentage of uninsured individuals. Evaluating the effect of each of these variables on the obesity rate will show the degree to which each is associated with the obesity rate in the U.S., if it has any effect at all. This research is relevant and may prove valuable to doctors, dieticians, trainers, and health care insurers. It may also prove valuable to those who are currently obese and are trying to determine what factors contribute to their condition. The results can help all of these individuals understand obesity to a greater degree and may change the actions they take in trying to alleviate the condition.

II. Prior Research

Prior research has examined potential causes of obesity. Some of this research has proven more successful than others in determining what may be contributing to a growing obesity rate. Below is a list of this prior research detailing the independent variables used, with the corresponding functional specifications, and the resulting coefficient of determination (R²).

- Analysis of Obesity Across the U.S.: Obesity Rate = f(Unemployment Rate [+], Income [-]), R² = .363

- Analysis of Obesity Across the U.S.: Obesity Rate = f(# fast food restaurants [+], commute time [+], % Bachelor degrees [-]), R² = .694

- Analysis of Obesity Across the U.S.: Obesity Rate = f(per capita income [+], unemployment rate [+]), R² = .431

- Analysis of Obesity Across the U.S.: Obesity Rate = f(% Bachelor Degrees [-], Age [-], Income [-]), R² = .575

(The sign in brackets after each variable is the sign given for that variable in the study's functional specification.)

By evaluating this prior research, it is possible to build on what has already been done or to attempt new independent variable combinations in the hopes of increasing the coefficient of determination (R²).

III. Methodology

As previously mentioned, the research is a cross-sectional analysis evaluating six independent variables that may contribute to obesity. The relationship between the obesity rate and per capita income is hypothesized to be negative: as per capita income increases, the obesity rate is expected to decrease. The relationships between the obesity rate and the unemployment rate, the diabetes rate, and the percent uninsured are hypothesized to be positive: as these independent variables increase, the obesity rate is expected to increase as well. As with per capita income, the percent of High School graduates (25 years and older) and population density are hypothesized to have a negative effect on the obesity rate.

The data for this research were obtained from statemaster.com, the U.S. Department of Commerce, the Bureau of Labor Statistics, the Centers for Disease Control and Prevention (CDC), and the U.S. Census Bureau. A more detailed description of these sources can be found in the appendix of this report. All of the data analysis was performed using SPSS. The techniques used in this research are graphical presentations (scatterplots and histograms), descriptive statistics, correlation analysis, and regression analysis.

The functional specification for this research is as follows:

Eqn. 1: Obesity Rate = f(Per Cap. Income [-], Unemployment % [+], % Grads HS [-], Diabetes Rate [+], Population Density [-], % Uninsured [+])

(The sign in brackets after each variable is its hypothesized sign.)

IV. Results

Figure 1- Histogram of Obesity Rate

Figure 1, below, shows a histogram of the dependent variable, the Obesity Rate. The distribution appears to be approximately normal, with a slight skew to the left.

[Figure 1: Histogram of Obesity Rate across the U.S. Horizontal axis: obesity rate (18.0 to 32.0); vertical axis: frequency (0 to 14). Mean = 25.656, Std. Dev. = 2.8188, N = 50.]

Table 1- Descriptive Statistics

Table 1, below, confirms what was shown in the histogram: the dependent variable, Obesity Rate, is slightly skewed to the left, with a skewness equal to -.194. Also, the kurtosis for population density (5.89) shows that the data are leptokurtic, meaning that the distribution is sharply peaked with heavy tails. Most states cluster at low population densities, while a few states have very high densities.

Variable                              Mean        StdDev     Variance       Skewness   Kurtosis
Obesity Rate                          25.66       2.82       7.95           -.194      -.243
Per Capita Income                     35328.66    5155.68    26581060.00    .898       .674
Unemployment Rate                     4.39        1.10       1.215          1.027      1.44
% Grad from HS (25 years and older)   85.28       3.89       15.13          -.430      -.985
Diabetes Rate                         6.84        1.25       1.56           .452       1.452
Population Density                    181.90      250.15     62577.43       2.44       5.89
Percent Uninsured                     14.20       3.99       15.98          .491       -.277
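The skewness and kurtosis columns of Table 1 can be reproduced from raw state-level data. The following is a minimal sketch in Python, assuming the bias-adjusted sample formulas that statistical packages commonly report; whether SPSS used exactly these formulas here is an assumption. The function names and the sample values are hypothetical and are not the study's data.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (0 for a perfectly symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis (positive values indicate leptokurtic data)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Hypothetical densities: most values are low, one is very high.
densities = [20.0, 35.0, 50.0, 60.0, 80.0, 1100.0]
print(round(sample_skewness(densities), 3))
print(round(sample_excess_kurtosis(densities), 3))
```

A sample with one very large value, like the hypothetical one above, produces positive skewness and positive excess kurtosis, which is the same pattern Table 1 shows for population density.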

Table 2- Correlation Matrix

Table 2, below, shows the correlations among the dependent variable, the Obesity Rate, and the six independent variables. All of the correlations with the obesity rate agree in sign with Eqn. 1, the functional specification. Per capita income has a moderately strong negative correlation with the obesity rate at -.542. The unemployment rate has a moderately weak positive correlation with the obesity rate at .413. The percent of graduates from HS (25 years and older) has a moderately strong negative correlation at -.513. The diabetes rate has the strongest correlation with the obesity rate of all the independent variables evaluated, at .685. Population density has a weak negative correlation with the obesity rate at -.321. The percent uninsured also has a rather weak, but positive, correlation with the obesity rate at .237. It is important to note that high multicollinearity exists in a few places in the correlation matrix, as indicated with an asterisk (*). Multicollinearity is high correlation between independent variables. It may result in imprecise, unstable coefficient estimates (inflated standard errors) in the estimated sample regression line equation.

Column key: OR = Obesity Rate, PCI = Per Capita Income, UR = Unemployment Rate, HS = % Grads from HS (25 years and older), DR = Diabetes Rate, PD = Population Density, PU = Percent Uninsured

Variable                               OR       PCI      UR       HS       DR       PD       PU
Obesity Rate                           1        -.542    .413     -.513    .685     -.321    .237
Per Capita Income                      -.542    1        -.136    .396     -.385    .661*    -.293
Unemployment Rate                      .413     -.136    1        -.342    .260     .083     .190
% Grads from HS (25 years and older)   -.513    .396     -.342    1        -.721*   .002     -.568*
Diabetes Rate                          .685     -.385    .260     -.721*   1        .040     .243
Population Density                     -.321    .661*    .083     .002     .040     1        -.262
Percent Uninsured                      .237     -.293    .190     -.568*   .243     -.262    1

* High correlation between two independent variables
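The screening for multicollinearity described above can be sketched in a few lines of Python. The correlations are copied from Table 2. The cutoff of .5 is an assumption, since the paper does not state the rule used to place the asterisks; it is chosen here only because it reproduces the three flagged pairs. The function name is hypothetical.

```python
# Correlations among the six independent variables, copied from Table 2.
CORRELATIONS = {
    ("Per Capita Income", "Unemployment Rate"): -0.136,
    ("Per Capita Income", "% Grads from HS"): 0.396,
    ("Per Capita Income", "Diabetes Rate"): -0.385,
    ("Per Capita Income", "Population Density"): 0.661,
    ("Per Capita Income", "Percent Uninsured"): -0.293,
    ("Unemployment Rate", "% Grads from HS"): -0.342,
    ("Unemployment Rate", "Diabetes Rate"): 0.260,
    ("Unemployment Rate", "Population Density"): 0.083,
    ("Unemployment Rate", "Percent Uninsured"): 0.190,
    ("% Grads from HS", "Diabetes Rate"): -0.721,
    ("% Grads from HS", "Population Density"): 0.002,
    ("% Grads from HS", "Percent Uninsured"): -0.568,
    ("Diabetes Rate", "Population Density"): 0.040,
    ("Diabetes Rate", "Percent Uninsured"): 0.243,
    ("Population Density", "Percent Uninsured"): -0.262,
}

# Assumed cutoff (not stated in the paper): |r| above this is "high".
THRESHOLD = 0.5


def flag_collinear_pairs(correlations, threshold):
    """Return (variable, variable, r) for pairs with |r| above the threshold."""
    return [
        (a, b, r)
        for (a, b), r in correlations.items()
        if abs(r) > threshold
    ]


flagged = flag_collinear_pairs(CORRELATIONS, THRESHOLD)
for a, b, r in flagged:
    print(f"{a} / {b}: r = {r}")
```

Under this assumed cutoff, the three pairs returned are the same three pairs marked with an asterisk in Table 2.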

Figure 2- Scatterplot of Obesity Rate v. Per Capita Income

Figure 2, below, presents a scatterplot of the obesity rate v. per capita income. The scatterplot appears to show a moderately strong, negative, linear relationship.

[Figure 2: Scatterplot of Obesity Rate v. Per Capita Income, r = -.542. Horizontal axis: per capita income (30000 to 50000); vertical axis: obesity rate (18.0 to 32.0).]

Figure 3- Scatterplot of Obesity Rate v. Unemployment Rate

Figure 3, below, presents the scatterplot of the obesity rate v. the unemployment rate. The scatterplot appears to show a moderately weak, positive, linear relationship.

[Figure 3: Scatterplot of Obesity Rate v. Unemployment Rate, r = .413. Horizontal axis: unemployment rate (2.0 to 8.0); vertical axis: obesity rate (18.0 to 32.0).]

Figure 4- Scatterplot of Obesity Rate v. % High School Grads (25 years and older)

Figure 4, below, presents the scatterplot of the obesity rate v. percent of graduates from High School (25 years and older). The scatterplot appears to show a moderately strong, negative, linear relationship.

[Figure 4: Scatterplot of Obesity Rate v. High School Grads (25 yrs. and older), r = -.513. Horizontal axis: percent of graduates from High School, 25 years and older (78.0 to 90.0); vertical axis: obesity rate (18.0 to 32.0).]


Figure 5- Scatterplot of Obesity Rate v. Diabetes Rate

Figure 5, below, presents the scatterplot of the obesity rate v. the diabetes rate. The scatterplot appears to show a moderately strong, positive, linear relationship.

[Figure 5: Scatterplot of Obesity Rate v. Diabetes Rate, r = .685. Horizontal axis: diabetes rate (4.0 to 11.0); vertical axis: obesity rate (18.0 to 32.0).]

Figure 6- Scatterplot of Obesity Rate v. Population Density

Figure 6, below, presents the scatterplot of the obesity rate v. population density. The scatterplot appears to show a weak, negative, linear relationship.

[Figure 6: Scatterplot of Obesity Rate v. Population Density, r = -.321. Horizontal axis: population density (0.0 to 1200.0); vertical axis: obesity rate (18.0 to 32.0).]

Figure 7- Scatterplot of Obesity Rate v. Percent of Uninsured

Figure 7, below, presents the scatterplot of the obesity rate v. percent uninsured. The scatterplot appears to show a weak, positive, linear relationship.

[Figure 7: Scatterplot of Obesity Rate v. Percentage Uninsured, r = .237. Horizontal axis: percentage uninsured (6.0 to 24.0); vertical axis: obesity rate (18.0 to 32.0).]


Table 3, below, shows the regression analysis for the research. The independent variables were entered stepwise, with the probability to enter set at .200 and the probability to remove set at .250. The independent variables that remained after the stepwise procedure were the diabetes rate, population density, and the unemployment rate. The variables removed were therefore per capita income, % grads from HS (25 years and older), and the percent uninsured. The resulting R Square is moderately strong at .663.

Table 3- Regression Results

Eqn. 2: Y = 13.59 + 1.413*Diabetes Rate - .004*Population Density + .719*Unemployment Rate

Term                  Coefficient   t-stat      p-value   r
Constant              13.59         9.14        .000
Diabetes Rate         1.413         7.07**      .000      .626
Population Density    -.004         -4.302**    .000      -.369
Unemployment Rate     .719          3.17**      .003      .281

n = 50   R-Sq. = .663   F = 30.23   F-Prob. = .000   SE = 1.688

** Significant at the 1% level of significance

From the regression results, the R-Sq., which is the coefficient of determination, is equal to .663. This means that 66.3% of the variation in the obesity rate can be explained by, or attributed to, variation in the diabetes rate, population density, and the unemployment rate.

T-Statistics

Each independent variable is tested for significance with the following null and alternative hypotheses (results are shown in the table above):

H0: B = 0

Ha: B > 0 or B < 0, based on the sign in the functional specification

The null hypothesis was rejected in favor of the alternative for each of the independent variables, as the one-tailed p-value (p-value/2) is below .01 for each. These independent variables are significant at the 1% level of significance.
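The halving of the p-value for a one-tailed test can be illustrated with a short Python sketch. It assumes that the p-values in Table 3 are two-tailed, which the division by 2 implies but the paper does not state outright. The function name is hypothetical; the hypothesized signs are those of Eqn. 1.

```python
def one_tailed_p(two_tailed_p, t_stat, expected_sign):
    """Convert a two-tailed p-value to a one-tailed p-value.

    expected_sign is +1 or -1, the hypothesized sign of the coefficient.
    Halving is only valid when the estimate has the hypothesized sign.
    """
    if t_stat * expected_sign > 0:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2


# (t-statistic, reported p-value, hypothesized sign) from Table 3 and Eqn. 1.
TERMS = {
    "Diabetes Rate": (7.07, 0.000, +1),
    "Population Density": (-4.302, 0.000, -1),
    "Unemployment Rate": (3.17, 0.003, +1),
}

ALPHA = 0.01  # 1% level of significance

for name, (t_stat, p_value, sign) in TERMS.items():
    p_one = one_tailed_p(p_value, t_stat, sign)
    print(f"{name}: one-tailed p = {p_one:.4f}, significant: {p_one < ALPHA}")
```

The sign check matters: had a coefficient come out with the sign opposite to the hypothesis, halving the p-value would overstate its significance.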

The equation is interpreted as follows:

For each one-percentage-point increase in the diabetes rate, the obesity rate would increase by 1.413 percentage points, on average, all else being equal.

For each one-unit increase in population density (one person per sq. mile), the obesity rate would decrease by .004 percentage points, on average, all else being equal.

For each one-percentage-point increase in the unemployment rate, the obesity rate would increase by .719 percentage points, on average, all else being equal.
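As an illustration of how Eqn. 2 is used, the sketch below evaluates it in Python at the sample means reported in Table 1. The function and constant names are hypothetical; the coefficients are the rounded values from Eqn. 2.

```python
# Coefficients from Eqn. 2 (rounded as reported).
INTERCEPT = 13.59
B_DIABETES = 1.413
B_DENSITY = -0.004
B_UNEMPLOYMENT = 0.719


def predict_obesity_rate(diabetes_rate, population_density, unemployment_rate):
    """Predicted obesity rate (%) for one state under Eqn. 2."""
    return (
        INTERCEPT
        + B_DIABETES * diabetes_rate
        + B_DENSITY * population_density
        + B_UNEMPLOYMENT * unemployment_rate
    )


# Sample means from Table 1: diabetes rate, population density, unemployment.
at_means = predict_obesity_rate(6.84, 181.90, 4.39)
print(round(at_means, 2))  # 25.68

# Raising the diabetes rate by one point, holding the others fixed.
one_point_higher = predict_obesity_rate(7.84, 181.90, 4.39)
print(round(one_point_higher - at_means, 3))  # 1.413
```

The prediction at the means (25.68) is close to the mean obesity rate of 25.66 in Table 1, as expected for a least-squares line; the small gap is consistent with rounding in the reported coefficients.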

F- Statistic

The regression as a whole appears to be statistically significant at the 1% level, given that the F-statistic is equal to 30.23 and its significance is equal to .000, where:

H0: B(Diabetes Rate) = B(Population Density) = B(Unemployment Rate) = 0

Ha: at least one B is not equal to zero.

The null hypothesis is rejected in favor of the alternative, that at least one B is not equal to zero, given that the F significance is equal to .000.
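The reported F-statistic can be cross-checked against the reported R-Sq. using the standard relationship between the two. The sketch below is illustrative only; the function name is hypothetical, and n = 50 and k = 3 are taken from Table 3.

```python
def f_statistic(r_squared, n, k):
    """Overall F-statistic implied by R-squared, n observations, k predictors."""
    explained = r_squared / k                      # explained share per predictor
    unexplained = (1 - r_squared) / (n - k - 1)    # unexplained share per residual df
    return explained / unexplained


f_value = f_statistic(0.663, 50, 3)
print(round(f_value, 2))  # 30.17
```

The result, 30.17, differs slightly from the reported 30.23, which is consistent with R-Sq. having been rounded to three decimals in Table 3.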

Figure 8- Histogram of Residuals

Figure 8, below, presents the histogram of the residuals. The distribution appears to be approximately normal.

[Figure 8: Histogram of Residuals. Horizontal axis: RES_1 (-2.5 to 2.5); vertical axis: frequency (0 to 10). Mean = 6.3976602E-15 (effectively zero), Std. Dev. = 1.63519572, N = 50.]
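The standard deviation of the residuals reported with Figure 8 can be tied back to the regression results in Table 3. The sketch below assumes that both reported standard deviations use an n - 1 denominator (an assumption about the SPSS output) and that the residual mean is zero; the constant names are hypothetical.

```python
import math

N = 50                     # number of states
K = 3                      # independent variables kept in Eqn. 2
RESIDUAL_SD = 1.63519572   # Std. Dev. of the residuals (Figure 8)
OBESITY_SD = 2.8188        # Std. Dev. of the Obesity Rate (Figure 1)

# Sums of squares recovered from the standard deviations.
sse = (N - 1) * RESIDUAL_SD ** 2   # unexplained (residual) variation
sst = (N - 1) * OBESITY_SD ** 2    # total variation in the obesity rate

# Standard error of the regression uses the residual degrees of freedom.
standard_error = math.sqrt(sse / (N - K - 1))
r_squared = 1 - sse / sst

print(round(standard_error, 3))  # 1.688
print(round(r_squared, 3))       # 0.663
```

Both values agree with Table 3 (SE = 1.688, R-Sq. = .663), which suggests the reported figures are internally consistent.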


Figure 9- Scatterplot of Actual and Predicted Values

Figure 9, below, presents the scatterplot of the dependent variable, Obesity Rate, and the predicted value. The relationship appears to be positive and linear, with no outliers.

[Figure 9: Scatterplot of Actual v. Predicted. Horizontal axis: predicted value, PRE_1 (20.0 to 32.0); vertical axis: obesity rate (18.0 to 32.0).]

Figure 10- Scatterplot of Residuals v. Per Capita Income

Figure 10, below, presents the scatterplot of the residuals v. per capita income. There appears to be a linear relationship with no visible curves.

[Figure 10: Scatterplot of Residuals v. Per Capita Income. Horizontal axis: per capita income (30000 to 50000); vertical axis: residuals, RES_1 (-2.5 to 2.5).]

Figure 11- Scatterplot of Residuals v. Unemployment Rate

Figure 11, below, presents the scatterplot of the residuals v. the unemployment rate. There appears to be a linear relationship with a "cluster" of points. There also appears to be one possible outlier.

[Figure 11: Scatterplot of Residuals v. Unemployment Rate. Horizontal axis: unemployment rate (2.0 to 8.0); vertical axis: residuals, RES_1 (-2.5 to 2.5).]


Figure 12- Scatterplot of Residuals v. Percent Grads from HS (25 years and older)

Figure 12, below, presents the scatterplot of the residuals v. % HS grads (25 years and older). There appears to be a linear relationship with no visible curves.

[Figure 12: Scatterplot of Residuals v. Percent of High School Grads (25 years and older). Horizontal axis: percent of graduates from High School, 25 years and older (78.0 to 90.0); vertical axis: residuals, RES_1 (-2.5 to 2.5).]

Figure 13- Scatterplot of Residuals v. Diabetes Rate

Figure 13, below, presents the scatterplot of the residuals v. diabetes rate. There appears to be a linear relationship with no curves and two potential outliers.

[Figure 13: Scatterplot of Residuals v. Diabetes Rate. Horizontal axis: diabetes rate (4.0 to 11.0); vertical axis: residuals, RES_1 (-2.5 to 2.5).]


Figure 14- Scatterplot of Residuals v. Population Density

Figure 14, below, presents the scatterplot of the residuals v. population density. There appears to be a discontinuous, random, linear relationship with a few potential outliers.

[Figure 14: Scatterplot of Residuals v. Population Density. Horizontal axis: population density (0.0 to 1200.0); vertical axis: residuals, RES_1 (-2.5 to 2.5).]

Figure 15- Scatterplot of Residuals v. Percent Uninsured

Figure 15, below, presents the scatterplot of the residuals v. the percent uninsured. There appears to be a random, linear relationship.

[Figure 15: Scatterplot of Residuals v. Percentage Uninsured. Horizontal axis: percentage uninsured (6.0 to 24.0); vertical axis: residuals, RES_1 (-2.5 to 2.5).]


V. Conclusions

The research presented was fairly successful, but may need some changes before being presented to a panel of professionals. The explanatory power (R-Sq. = .663) is moderately strong, which lends the model some validity. The independent variable with the greatest effect on obesity in this research proved to be the diabetes rate. This may warrant further investigation, as there is a question of causality: is it diabetes that increases obesity, or does obesity increase diabetes? This issue may be of interest to healthcare professionals, who may need to do further research to draw any definitive conclusions. The multicollinearity shown in the correlation matrix may have made the coefficients in Eqn. 2 imprecise and unstable. Therefore, the interpretation of this sample regression line may not be very accurate. The research may be improved by investigating other independent variables that were used neither in this research nor in the prior research outlined in Section II- Prior Research. This research can be used as a starting point for healthcare professionals in further investigating the link between diabetes and obesity. Also, government and public policy advocates may have an interest in the link between the unemployment rate and the obesity rate.
