GS/PPAL 6200 3.00 Section N Research Methods and Information Systems A QUANTITATIVE RESEARCH PROJECT...

GS/PPAL 6200 3.00 Section N
Research Methods and Information Systems

A QUANTITATIVE RESEARCH PROJECT - (1) DATA COLLECTION (2) DATA DESCRIPTION (3) DATA ANALYSIS II

Agenda

• Correlations

• Correlation Coefficient: a quantitative measure of linear correlations

• Correlation Strength versus Statistical Significance

• Simple Regression Analyses

• Quantitative Research Project – Recap

• Course Conclusion

Correlations

• Is CGPA related in some way to total hours studied (H)? Statistically, is the mean value of CGPA varying in some way with H?

• Remember, we need to account for the fact that they each tend to deviate from their true mean randomly.

• The “correlation coefficient” for a set of observations is a function of how much the paired observed values deviate together from their sample means, scaled by (i.e., adjusted for) the random deviation of each variable on its own

Correlation Images

"Correlation examples2" by Denis Boigelot, original uploader was Imagecreator - Own work, original uploader was Imagecreator. Licensed under CC0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Correlation_examples2.svg#/media/File:Correlation_examples2.svg

Representing Linear Correlation

1. For a population, the typical notation is:
$\rho(H, C) = \mathrm{corr}(H, C) = \frac{\mathrm{cov}(H, C)}{\sigma_H \sigma_C} = \frac{\frac{1}{n-1} \sum (H - \mu_H)(C - \mu_C)}{\sigma_H \sigma_C}$

2. For a sample from that same population (changing the notation only):
$r(H, C) = \frac{\frac{1}{n-1} \sum (H - \bar{H})(C - \bar{C})}{s_H s_C}$

• Excel functions to calculate (2) above:
= CORREL (data array (H), data array (CGPA)), OR
= PEARSON (data array (H), data array (CGPA))
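As a minimal sketch of formula (2), the following Python function computes the sample correlation coefficient directly from the definition. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the course material; Excel's CORREL should return the same value for the same arrays.

```python
from math import sqrt


def pearson_r(h, c):
    """Sample correlation r(H, C) computed directly from formula (2)."""
    n = len(h)
    if n != len(c) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    avg_h = sum(h) / n
    avg_c = sum(c) / n
    # Sample covariance: 1/(n-1) * sum of cross-products of deviations
    cov_hc = sum((x - avg_h) * (y - avg_c) for x, y in zip(h, c)) / (n - 1)
    # Sample standard deviations s_H and s_C (also with the n-1 divisor)
    s_h = sqrt(sum((x - avg_h) ** 2 for x in h) / (n - 1))
    s_c = sqrt(sum((y - avg_c) ** 2 for y in c) / (n - 1))
    return cov_hc / (s_h * s_c)


# Hypothetical toy data: C rises by exactly 0.5 for every extra 10 units of H
toy_h = [10, 20, 30, 40]
toy_c = [2.0, 2.5, 3.0, 3.5]
r_toy = pearson_r(toy_h, toy_c)                # perfectly linear: r is 1
r_toy_reversed = pearson_r(toy_h, toy_c[::-1])  # perfectly linear, falling: r is -1
```

Because the same divisor (n-1) appears in the covariance and in both standard deviations, it cancels, which is why r always lies between -1 and +1.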

Population Correlation Coefficient

• The Pearson correlation coefficient (numbers above images) measures only the linear relationship between two variables
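To see what “only the linear relationship” means, here is a small Python sketch with made-up data in which Y is completely determined by X (Y = X squared) yet the correlation coefficient is zero. The helper `pearson_r` and the data are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x]      # exact but non-linear dependence on x
y_line = [2 * v + 1 for v in x]    # exact linear dependence on x

r_curve = pearson_r(x, y_curve)    # 0: the symmetric curve has no linear trend
r_line = pearson_r(x, y_line)      # 1: all points lie on an upward-sloping line
```

A coefficient of zero therefore rules out a linear association, not an association of any kind.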

Correlation Coefficient (= 0.816) versus Visual Inspection of Data

"Anscombe's quartet 3" by Anscombe.svg: Schutzderivative work (label using subscripts): Avenue (talk) - Anscombe.svg. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg#/media/File:Anscombe%27s_quartet_3.svg

Correlations and Predictions

• Presence of a (linear) correlation may offer predictive information that may be useful

• It may (but may not) suggest causality to be examined further - “correlation does not imply causation” (when there is no control group)

• It may suggest policy considerations (policy action, spillover effects, consequences)

10-case Study

Raw Data Scatter Plot with Linear Trend

Case | CGPA | Total Hours Studied
1 | 7.67 | 35
2 | 6.83 | 29
3 | 4.17 | 23
4 | 7.67 | 50
5 | 5.00 | 32
6 | 4.17 | 22
7 | 5.00 | 17
8 | 7.33 | 40
9 | 6.83 | 44
10 | 6.33 | 38

Correlation for 10-case Study

• = CORREL (CGPA, HOURS)

• = PEARSON (CGPA, HOURS)

• = 0.7944

• R-squared = 0.7944 * 0.7944 = 0.63

• If CGPA is a linear function of HOURS and CGPA is normally distributed, then R-squared gives the “explained variance”: 63% of the variation in CGPA can be “explained” by variation in HOURS
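The spreadsheet result can be reproduced outside Excel. This is a minimal Python sketch using the ten (CGPA, hours) pairs from the raw-data slide; the variable and function names are illustrative assumptions.

```python
from math import sqrt

# Ten cases from the raw data slide: (CGPA, total hours studied)
cases = [
    (7.67, 35), (6.83, 29), (4.17, 23), (7.67, 50), (5.00, 32),
    (4.17, 22), (5.00, 17), (7.33, 40), (6.83, 44), (6.33, 38),
]
cgpa = [c for c, _ in cases]
hours = [h for _, h in cases]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(cgpa, hours)
r_squared = r ** 2   # share of the variation in CGPA "explained" by HOURS

print(f"r = {r:.4f}, R-squared = {r_squared:.2f}")   # r = 0.7944, R-squared = 0.63
```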

Strength versus Significance

• A “strong” correlation may or may not be significant

• A “weak” correlation may or may not be significant

• Key is the size of the sample – for small samples a strong correlation may still be by chance; for large samples it is easy to achieve significance for weak correlations
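A small numeric illustration of this point, sketched in Python. It uses the t-statistic r / sqrt((1 - r^2)/(n - 2)) that the following slide introduces. The two scenarios (r = 0.79 with n = 4, and r = 0.10 with n = 1000) are hypothetical, and the two-tailed 5% critical values are taken from standard t-tables rather than computed.

```python
from math import sqrt


def t_stat(r, n):
    """t-statistic for testing whether a correlation differs from zero."""
    return r / sqrt((1 - r ** 2) / (n - 2))


# Hypothetical scenarios: (label, r, n, two-tailed 5% critical t for n-2 df)
scenarios = [
    ("strong r, tiny sample", 0.79, 4, 4.303),     # df = 2
    ("weak r, large sample", 0.10, 1000, 1.962),   # df = 998
]

results = {}
for label, r, n, t_crit in scenarios:
    t = t_stat(r, n)
    results[label] = abs(t) > t_crit   # True means "statistically significant"
    print(f"{label}: r = {r}, n = {n}, t = {t:.2f}, significant = {results[label]}")
```

In this sketch the strong correlation from four observations is not significant, while the weak correlation from a thousand observations is.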

T-test for Significance

• Null Hypothesis: H0: ρ = 0

• Alternative Hypothesis: Ha: ρ ≠ 0 (i.e., there is a positive or negative correlation that is significant)

• Correlation coefficient (r)

• Adjust by weighting (dividing) r by its standard error: $se(r) = \sqrt{(1 - r^2)/(n - 2)}$

• T-stat* = r/se(r)

• Compare t-stat* to critical t-value for (n-2) degrees of freedom and chosen significance level

10-case Study

• Correlation coefficient (r) for our study = 0.79

• $se(r) = \sqrt{(1 - 0.63)/(10 - 2)} = \sqrt{0.046} = 0.214$

• T-stat = r/se(r) = 0.79/0.214 = 3.69

• For 8 df, two-tailed test @ 95% Confidence, critical t-value = T.INV.2T(.05,8) = 2.306

• 3.69 > 2.306: if the true correlation were zero, a t-stat this large would occur by chance less than 5% of the time; therefore reject the null hypothesis and conclude that hours studied is (positively) correlated with CGPA
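The arithmetic on this slide can be re-run in a few lines of Python. This sketch starts from the unrounded r = 0.7944 reported earlier and takes the critical value 2.306 from the slide; the variable names are illustrative. Carrying more decimals gives a t-stat of about 3.70 rather than the 3.69 obtained from rounded inputs.

```python
from math import sqrt

r = 0.7944        # correlation coefficient from the 10-case study
n = 10            # sample size
t_crit = 2.306    # T.INV.2T(.05, 8), as quoted on the slide

se_r = sqrt((1 - r ** 2) / (n - 2))   # standard error of r
t = r / se_r                          # t-statistic with n-2 = 8 df
reject_null = abs(t) > t_crit

print(f"se(r) = {se_r:.3f}, t = {t:.2f}, reject H0: {reject_null}")
```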

Representing Linear Relationships

• Since CGPA and HOURS appear to be strongly positively correlated (though the strength may only be an artifact of the small sample size) and the correlation is statistically significant (despite the small sample), examine the relationship more closely

• General linear relationship: Y = mX + b

• for Y the dependent variable, X the independent or explanatory variable, m the slope (the coefficient on X), and b some constant (the intercept)

Graphically

• Locate coordinates (2, 4) that is, X = 2, Y = 4

• Locate coordinates (3, 5)

• When X increases by +1 (from 2 to 3) how much does Y increase by? (= m)

• When X = 0, what does Y equal? (= b)

• Therefore model is Y = 1*X + 2

[Figure: plot of the line Y = mX + b; horizontal axis: X Variable (0 to 7), vertical axis: Y Variable (1 to 10)]
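The two questions on this slide (how much Y rises per +1 in X, and what Y equals at X = 0) amount to computing a slope and an intercept from two points. A minimal Python sketch, where the function name `line_through` is an illustrative assumption:

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line Y = m*X + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)   # change in Y per +1 change in X
    b = y1 - m * x1             # value of Y when X = 0
    return m, b


m, b = line_through((2, 4), (3, 5))
print(f"Y = {m:g}*X + {b:g}")   # Y = 1*X + 2
```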

CGPA and HOURS

• For the linear trend line, CGPA = Intercept (b) + coefficient (m) * HOURS

• CGPA = 2.6 + 0.105*HOURS

• For every +1 hour studied per month, by how much does CGPA increase?

• How did we obtain the linear trend line?

[Figure: scatter plot of CGPA (vertical axis, 0.00 to 9.00) against Hours Studied (horizontal axis, 15 to 55) with linear trend line]

Regression Analysis - Intuition

• The estimated linear trend line specifies the linear relationship that “best fits” the data

• A “best fit” model is one that minimizes the amount an observation deviates from the hypothesized model

• “Best fit” here means to minimize the sum of the squared deviations between the data points and the linear trend line (model)

• “Linear Least Squares Regression Model”
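For a single explanatory variable, the least-squares line has a well-known closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The Python sketch below applies this to the ten cases from the raw-data slide and checks that nudging the fitted line increases the sum of squared deviations. Function and variable names are illustrative assumptions.

```python
# Ten cases from the raw data slide
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]


def fit_line(x, y):
    """Least-squares slope m and intercept b for y = b + m*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def sum_squared_dev(x, y, m, b):
    """Sum of squared vertical deviations of the data from the line."""
    return sum((c - (b + m * a)) ** 2 for a, c in zip(x, y))


m, b = fit_line(hours, cgpa)
best = sum_squared_dev(hours, cgpa, m, b)
# Any other line, e.g. a slightly steeper or shifted one, fits worse
steeper = sum_squared_dev(hours, cgpa, m + 0.01, b)
shifted = sum_squared_dev(hours, cgpa, m, b + 0.1)

print(f"CGPA = {b:.2f} + {m:.3f}*HOURS")
print(f"squared deviations: best = {best:.3f}, steeper = {steeper:.3f}, shifted = {shifted:.3f}")
```

The fitted values come out at roughly 2.61 and 0.106, which agrees with the trend line CGPA = 2.6 + 0.105*HOURS quoted in these slides up to rounding.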

Regression Analysis - Mechanics

• In Excel: “Data Analysis” → “Regression”

• Coefficients: values of “b” (intercept) and “m” (coefficient on the explanatory variable)

• Standard Error, t-stat, P-value and CI (95%) for each estimate
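The quantities in the regression output can be reproduced by hand. The following Python sketch computes the standard error, t-stat and 95% confidence interval for the coefficient on HOURS using the usual simple-regression formulas. The critical value 2.306 (8 degrees of freedom) is taken from a t-table rather than computed, and all names are illustrative assumptions.

```python
from math import sqrt

# Ten cases from the raw data slide
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]

n = len(hours)
mean_h, mean_c = sum(hours) / n, sum(cgpa) / n
sxx = sum((h - mean_h) ** 2 for h in hours)
sxy = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, cgpa))

m = sxy / sxx                 # coefficient on HOURS
b = mean_c - m * mean_h       # intercept

# Residual variance uses n-2 degrees of freedom (two estimated parameters)
sse = sum((c - (b + m * h)) ** 2 for h, c in zip(hours, cgpa))
resid_var = sse / (n - 2)

se_m = sqrt(resid_var / sxx)  # standard error of the slope
t_m = m / se_m                # t-stat for H0: coefficient = 0

t_crit = 2.306                # two-tailed 5% critical t for 8 df (from a t-table)
ci_low, ci_high = m - t_crit * se_m, m + t_crit * se_m

print(f"m = {m:.4f}, se = {se_m:.4f}, t = {t_m:.3f}")
print(f"95% CI for m: ({ci_low:.3f}, {ci_high:.3f})")
```

With one explanatory variable, the slope's t-stat equals the t-stat for the correlation coefficient, so both tests lead to the same conclusion.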

Data Interpretation (again)

• From the Regression Output we know: CGPA = 2.6 + 0.105*HOURS

• For every +1 hour studied, CGPA on graduation increases by 0.105 on average

• Graduating students with +1 grade point higher than other graduating students, studied on average + 9.52 more hours per month (9.52 = 1 / 0.105)
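Read as a prediction rule, the fitted line can be wrapped in a small function. A Python sketch using the rounded coefficients from the slide (2.6 and 0.105); the function name and the 40-hour example are illustrative assumptions, and predictions are only meaningful within the range of hours observed in the sample (17 to 50).

```python
INTERCEPT = 2.6   # b, from the regression output on the slide
SLOPE = 0.105     # m, coefficient on HOURS


def predict_cgpa(hours):
    """Predicted CGPA for a given number of hours studied."""
    return INTERCEPT + SLOPE * hours


predicted = predict_cgpa(40)                          # about 6.8
gain_per_hour = predict_cgpa(41) - predict_cgpa(40)   # about 0.105
hours_per_grade_point = 1 / SLOPE                     # about 9.52

print(f"{predicted:.2f} {gain_per_hour:.3f} {hours_per_grade_point:.2f}")
```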

P-Value Approach to Statistical Significance of Total Hours Studied

• H0: coefficient on HOURS = 0; HA: ≠ 0

• P-value approach: P-value = 0.0061 < .05, i.e., if the true coefficient were 0, the probability of obtaining an estimate at least this far from 0 purely by chance is less than 5%; reject H0, data support HA

• Note: for a 1-sided test (e.g., coefficient > 0) divide reported P-value by 2
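Excel reports the P-value directly; as a cross-check, it can be approximated from the t-distribution. The sketch below is one way to do this in plain Python: it integrates the Student t density numerically with Simpson's rule (a choice made here for self-containment, not necessarily the method Excel uses). It assumes the t-stat of 3.699 quoted in these slides and 8 degrees of freedom (n - 2 for 10 observations); function names are illustrative.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_value, df, steps=2000):
    """P(|T| >= |t_value|), via Simpson's rule on [0, |t_value|]."""
    upper = abs(t_value)
    h = upper / steps   # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3        # P(0 <= T <= |t_value|)
    return 1 - 2 * central_half


p_two_sided = two_tailed_p(3.699, 8)
p_one_sided = p_two_sided / 2   # valid when the estimate has the hypothesized sign

print(f"two-sided P = {p_two_sided:.4f}, one-sided P = {p_one_sided:.4f}")
```

The two-sided value comes out close to the 0.0061 reported in the regression output.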

Critical Value Approach to Statistical Significance of Total Hours Studied

• H0: coefficient on HOURS = 0; HA: ≠ 0

• Critical value approach: critical value = T.INV.2T (0.05, 8) = 2.306 (n - 2 = 8 degrees of freedom)

• t-stat = 3.699 > 2.306; reject H0
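T.INV.2T can also be approximated without Excel. This Python sketch (a self-contained approximation with hypothetical function names) finds the two-tailed critical value by bisection on a numerically integrated Student t density. With n = 10 observations and one explanatory variable there are n - 2 = 8 degrees of freedom, for which the result is about 2.306, the same critical value used for the correlation t-test earlier in these slides.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def central_area(t, df, steps=1000):
    """P(-t <= T <= t) by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_inv_2t(alpha, df):
    """Critical value c with P(|T| > c) = alpha, found by bisection."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if 1 - central_area(mid, df) > alpha:
            low = mid    # tail area still too large: move right
        else:
            high = mid
    return (low + high) / 2


t_crit = t_inv_2t(0.05, 8)
reject_h0 = 3.699 > t_crit

print(f"critical value = {t_crit:.3f}, reject H0: {reject_h0}")
```

For comparison, the same function with 9 degrees of freedom gives about 2.262, so the conclusion (reject H0) does not depend on this choice.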

A Quantitative Research Project: Recapitulation

Research Topic: Academic Performance

Research Questions: How well do graduating students perform academically? What explains that performance? Measure “academic performance” by graduating CGPA

Research Design: Cross-sectional analysis of graduating students in a given year

Data Collection: Survey of a random sample of 10 students graduating in 2014

Data Description: Describe the data with basic statistics

Data Analysis: Reasons for attending university and performance; Total hours studied and CGPA

Research in Public Policy
Excerpted from Morçöl and Ivanova (2010)

Categories of Methods | Quantitative Orientation | Qualitative Orientation
Empirical Inquiry - Design Methods | Experimental, Cross-sectional, Longitudinal | Case study
Empirical Inquiry - Data Collection Methods | Surveys, Secondary Data | Qualitative (long, in-depth, or semi-structured) Interviews
Empirical Inquiry - Data Analysis Methods | Statistical, Regression, or Time-series Analyses | (Computer-assisted) Qualitative Data Analyses
Empirical Inquiry - Combined Methods | Game Theory, Simulations, Systems Analysis, Meta-Analyses, Network Analyses | Case study, Legal Analyses, Archival, Ethnography, Grounded Theory, Textual Analyses
Methods of Decision Making and Planning | Cost-benefit, Decision Analyses, Linear Programming | Brainstorming, Delphi

Quality of Quantitative (Qualitative) Research: Reliability, Relevance, Validity

• Reliability: can we replicate the research results? (are the results dependable?)

• Relevance: are results of practical significance? (are results trustworthy or authentic?)

• Construct Validity: do quantities observed reflect research variables of interest? (is there objectivity?)

• Internal Validity: is there a causal relationship between the independent and dependent variables? (is there credibility?)

• External Validity: can we generalize beyond the one study? (are results transferable?)

Achieving Learning Outcomes

• Basic user familiarity requires familiarity with

– research ethics

– existing data sets

– the collection of qualitative and quantitative data

– data measurement

– sampling

– advantages and disadvantages of different research methods

– descriptive and inferential statistics

Learning Outcomes?

• Understand key concepts in research

• Apply critical analytical skills to published research

• Understand the application, value and limits of quantitative and qualitative research methodologies and techniques / tools

• Develop skills in devising and designing research methods suitable for different policy contexts and for rigorous analysis

• Provide a grounding in ethical issues related to:

– academic research

– the role of the public servant as a custodian of data and information, balancing the public’s right to know against the personal data and information which an individual citizen has a right to have kept confidential

Good Luck!

And THANK YOU…

…for the journey,

…for your patience,

…your curiosity,

…your humour!
