GS/PPAL 6200 3.00 Section N: Research Methods and Information Systems
A QUANTITATIVE RESEARCH PROJECT: (1) DATA COLLECTION, (2) DATA DESCRIPTION, (3) DATA ANALYSIS II
Agenda
• Correlations
• Correlation Coefficient: a quantitative measure of linear correlation
• Correlation Strength versus Statistical Significance
• Simple Regression Analyses
• Quantitative Research Project – Recap
• Course Conclusion
Correlations
• Is CGPA related in some way to total hours studied (H)? Statistically, does the mean value of CGPA vary in some way with H?
• Remember, we need to account for the fact that each variable tends to deviate randomly from its true mean.
• The “correlation coefficient” for a set of observations is a function of how much the observed values of the two variables jointly deviate from their sample means (their covariance), adjusted for (i.e., divided by) how much each variable deviates randomly on its own (its standard deviation)
Correlation Images
[Figure: "Correlation examples2" by Denis Boigelot, original uploader was Imagecreator. Own work. Licensed under CC0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Correlation_examples2.svg#/media/File:Correlation_examples2.svg]
Representing Linear Correlation
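1. For a population of N units, the typical notation is:
$$\rho(H,C) = \operatorname{corr}(H,C) = \frac{\operatorname{cov}(H,C)}{\sigma_H\,\sigma_C} = \frac{\frac{1}{N}\sum_{i=1}^{N}(H_i-\mu_H)(C_i-\mu_C)}{\sigma_H\,\sigma_C}$$
2. For a sample of n observations from that same population (replacing the population means and standard deviations with their sample counterparts):
$$r(H,C) = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(H_i-\bar{H})(C_i-\bar{C})}{s_H\,s_C}$$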
• Excel functions to calculate (2) above: =CORREL(data array (H), data array (CGPA)) or =PEARSON(data array (H), data array (CGPA))
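As a rough cross-check of formula (2), here is a minimal Python sketch of what CORREL/PEARSON compute. The data values are made up for illustration and the function names are hypothetical. It also shows that using 1/N divisors throughout (the population convention) gives the same r, because the divisor appears in both the numerator and the denominator and cancels.

```python
import math


def sample_pearson_r(h, c):
    """Formula (2): sample covariance divided by s_H * s_C, all using 1/(n-1)."""
    n = len(h)
    avg_h = sum(h) / n
    avg_c = sum(c) / n
    cov = sum((hi - avg_h) * (ci - avg_c) for hi, ci in zip(h, c)) / (n - 1)
    s_h = math.sqrt(sum((hi - avg_h) ** 2 for hi in h) / (n - 1))
    s_c = math.sqrt(sum((ci - avg_c) ** 2 for ci in c) / (n - 1))
    return cov / (s_h * s_c)


def population_style_r(h, c):
    """Same ratio computed with 1/N divisors throughout (population convention)."""
    n = len(h)
    mu_h = sum(h) / n
    mu_c = sum(c) / n
    cov = sum((hi - mu_h) * (ci - mu_c) for hi, ci in zip(h, c)) / n
    sigma_h = math.sqrt(sum((hi - mu_h) ** 2 for hi in h) / n)
    sigma_c = math.sqrt(sum((ci - mu_c) ** 2 for ci in c) / n)
    return cov / (sigma_h * sigma_c)


# Made-up illustrative values, not the course data
h = [1, 2, 3, 4, 5]
c = [2, 4, 5, 4, 5]
print(round(sample_pearson_r(h, c), 4))    # 0.7746
print(round(population_style_r(h, c), 4))  # 0.7746
```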
Population Correlation Coefficient
• The Pearson correlation coefficient (numbers above images) measures only the linear relationship between two variables
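The point that r captures only the linear part of a relationship can be checked numerically with Anscombe's quartet, the data set pictured on the next slide. This sketch uses the standard published quartet values; the helper name `pearson_r` is hypothetical. All four data sets give r of roughly 0.816, even though their scatter plots look very different.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of deviations (the 1/(n-1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values for sets I-III
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Same correlation coefficient, very different patterns when plotted
for name, (x, y) in quartet.items():
    print(name, f"{pearson_r(x, y):.4f}")
```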
Correlation Coefficient (= 0.816) versus Visual Inspection of Data
[Figure: "Anscombe's quartet 3" by Anscombe.svg: Schutz; derivative work (label using subscripts): Avenue (talk) - Anscombe.svg. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Anscombe%27s_quartet_3.svg#/media/File:Anscombe%27s_quartet_3.svg]
Correlations and Predictions
• The presence of a (linear) correlation may offer useful predictive information
• It may (or may not) suggest a causal relationship to be examined further; “correlation does not imply causation,” especially when there is no control group
• It may suggest policy considerations (policy action, spillover effects, consequences)
10-case Study
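Raw Data

| Case | CGPA | Total Hours Studied |
|---|---|---|
| 1 | 7.67 | 35 |
| 2 | 6.83 | 29 |
| 3 | 4.17 | 23 |
| 4 | 7.67 | 50 |
| 5 | 5.00 | 32 |
| 6 | 4.17 | 22 |
| 7 | 5.00 | 17 |
| 8 | 7.33 | 40 |
| 9 | 6.83 | 44 |
| 10 | 6.33 | 38 |

[Figure: Scatter Plot with Linear Trend]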
Correlation for 10-case Study
• =CORREL(CGPA, HOURS) or =PEARSON(CGPA, HOURS) = 0.7944
• R-squared = 0.7944 * 0.7944 = 0.63
• If CGPA is a linear function of HOURS and CGPA is normally distributed, then R-squared gives the “explained variance”: 63% of the variation in CGPA can be “explained” by variation in HOURS
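The figures above can be reproduced from the 10-case raw data table in a short Python sketch. It computes r and r squared, and then, as an illustration of the “explained variance” reading, fits the least squares line and computes 1 − (unexplained variation / total variation) in CGPA. For a single explanatory variable this share equals r squared. Variable names are illustrative only.

```python
import math

# 10-case study data from the raw data table
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]
n = len(cgpa)

mean_c = sum(cgpa) / n
mean_h = sum(hours) / n
s_ch = sum((c - mean_c) * (h - mean_h) for c, h in zip(cgpa, hours))
s_cc = sum((c - mean_c) ** 2 for c in cgpa)  # total variation in CGPA
s_hh = sum((h - mean_h) ** 2 for h in hours)

r = s_ch / math.sqrt(s_cc * s_hh)  # same value CORREL/PEARSON return
r_squared = r ** 2

# Variance-explained view: least squares line CGPA = b + m*HOURS
m = s_ch / s_hh
b = mean_c - m * mean_h
sse = sum((c - (b + m * h)) ** 2 for c, h in zip(cgpa, hours))  # unexplained variation
explained_share = 1 - sse / s_cc

print(f"r = {r:.4f}")                              # r = 0.7944
print(f"r^2 = {r_squared:.2f}")                    # r^2 = 0.63
print(f"explained share = {explained_share:.2f}")  # explained share = 0.63
```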
Strength versus Significance
• A “strong” correlation may or may not be significant
• A “weak” correlation may or may not be significant
• The key is sample size: for small samples, a strong correlation may still arise by chance; for large samples, even weak correlations easily achieve significance
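To see how sample size drives significance, here is a minimal sketch using the t-statistic t = r/se(r) with se(r) = [(1 − r²)/(n − 2)]^{1/2}. The two cases are hypothetical, and the critical values are hardcoded from a t table (the values Excel's T.INV.2T(0.05, df) returns for df = 3 and df = 198) rather than computed.

```python
import math


def correlation_t_stat(r, n):
    """t-stat = r / se(r), with se(r) = sqrt((1 - r^2) / (n - 2))."""
    se_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / se_r


# Hypothetical cases, not from the course data
t_strong_small = correlation_t_stat(0.8, 5)   # strong r, n = 5, df = 3
t_weak_large = correlation_t_stat(0.2, 200)   # weak r, n = 200, df = 198

# Two-tailed 5% critical values taken from a t table
CRIT_DF3 = 3.182
CRIT_DF198 = 1.972

print(f"{t_strong_small:.3f}", t_strong_small > CRIT_DF3)  # 2.309 False
print(f"{t_weak_large:.3f}", t_weak_large > CRIT_DF198)    # 2.872 True
```

The strong correlation in the tiny sample is not significant, while the weak correlation in the large sample is.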
T-test for Significance
• Null Hypothesis: H0: ρ = 0 (no linear correlation in the population)
• Alternative Hypothesis: Ha: ρ ≠ 0 (i.e., there is a positive or negative correlation in the population)
• Compute the sample correlation coefficient (r)
• Adjust by weighting (dividing) r by its standard error: $se(r) = \sqrt{(1-r^2)/(n-2)}$
• t-stat* = r/se(r)
• Compare t-stat* to the critical t-value for (n − 2) degrees of freedom and the chosen significance level
10-case Study
• Correlation coefficient (r) for our study = 0.7944, so r² = 0.6310
• $se(r) = \sqrt{(1-0.6310)/(10-2)} = \sqrt{0.0461} \approx 0.2148$
• t-stat = r/se(r) = 0.7944/0.2148 ≈ 3.70
• For 8 df, two-tailed test at the 5% significance level (95% confidence), critical t-value = T.INV.2T(0.05, 8) = 2.306
• Since 3.70 > 2.306, a correlation this strong would occur by chance less than 5% of the time if there were no correlation in the population; therefore reject the null hypothesis and conclude that hours studied is (positively) correlated with CGPA
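The 10-case calculation can be sketched end to end in Python, from the raw data to the decision. The critical value 2.306 is taken from the slide (T.INV.2T(0.05, 8)) rather than computed, since Python's standard library has no inverse t function.

```python
import math

# 10-case study data (CGPA, total hours studied)
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]
n = len(cgpa)

mean_c = sum(cgpa) / n
mean_h = sum(hours) / n
s_ch = sum((c - mean_c) * (h - mean_h) for c, h in zip(cgpa, hours))
s_cc = sum((c - mean_c) ** 2 for c in cgpa)
s_hh = sum((h - mean_h) ** 2 for h in hours)
r = s_ch / math.sqrt(s_cc * s_hh)  # Pearson r

se_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t_stat = r / se_r
critical_t = 2.306  # T.INV.2T(0.05, 8), as on the slide

print(f"r = {r:.4f}, se(r) = {se_r:.4f}, t = {t_stat:.2f}")  # r = 0.7944, se(r) = 0.2148, t = 3.70
print("reject H0" if abs(t_stat) > critical_t else "do not reject H0")  # reject H0
```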
Representing Linear Relationships
• Since CGPA and HOURS appear to be strongly positively correlated (though this may be an artifact of the small sample size) and the correlation is statistically significant (despite the small sample), we examine the relationship more closely
• General linear relationship: Y = mX + b, where Y is the dependent variable, X is the independent or explanatory variable, m is the slope, and b is a constant (the intercept)
Graphically
• Locate coordinates (2, 4), that is, X = 2, Y = 4
• Locate coordinates (3, 5)
• When X increases by +1 (from 2 to 3), how much does Y increase? (= m)
• When X = 0, what does Y equal? (= b)
• Therefore the model is Y = 1*X + 2
[Figure: Y = mX + b, plotted with X Variable on the horizontal axis and Y Variable on the vertical axis]
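The two-point reasoning on this slide can be written as a tiny Python sketch; `line_through` is a hypothetical helper name, not something from the course materials.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line Y = mX + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # change in Y per +1 change in X
    b = y1 - m * x1            # value of Y when X = 0
    return m, b


m, b = line_through((2, 4), (3, 5))
print(m, b)  # 1.0 2.0
```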
CGPA and HOURS
• For the linear trend line, CGPA = Intercept (b) + coefficient (m) * HOURS
• CGPA = 2.6 + 0.105*HOURS
• For every +1 hour studied per month, by how much does CGPA increase?
• How did we obtain the linear trend line?
[Figure: scatter plot of CGPA against Hours Studied with the linear trend line]
Regression Analysis - Intuition
• The estimated linear trend line specifies the linear relationship that “best fits” the data
• A “best fit” model is one that minimizes the overall amount by which the observations deviate from the hypothesized model
• “Best fit” here means minimizing the sum of the squared deviations between the data points and the linear trend line (model)
• This is the “Linear Least Squares Regression Model”
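For one explanatory variable, the least squares line has a closed form: m = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)², and b = Ȳ − mX̄. The sketch below applies it to the 10-case data and then checks that nudging either coefficient increases the sum of squared deviations. The helper name `sum_squared_deviations` is hypothetical. The result is close to the slide's CGPA = 2.6 + 0.105*HOURS.

```python
cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]


def sum_squared_deviations(b, m, x, y):
    """Sum of squared vertical distances between the points and the line y = b + m*x."""
    return sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y))


n = len(hours)
mean_h = sum(hours) / n
mean_c = sum(cgpa) / n
# Closed-form least squares coefficients for one explanatory variable
m = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, cgpa)) / sum(
    (h - mean_h) ** 2 for h in hours
)
b = mean_c - m * mean_h
print(f"CGPA = {b:.2f} + {m:.4f}*HOURS")  # CGPA = 2.61 + 0.1058*HOURS

# Moving either coefficient away from the least squares values increases the sum
best = sum_squared_deviations(b, m, hours, cgpa)
print(best < sum_squared_deviations(b, m + 0.01, hours, cgpa))  # True
print(best < sum_squared_deviations(b + 0.1, m, hours, cgpa))   # True
```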
Regression Analysis - Mechanics
• In Excel: “Data Analysis” → “Regression”
• Coefficients: values of “b” (the intercept) and “m” (the coefficient on the explanatory variable)
• Standard Error, t-stat, P-value and CI (95%) for each estimate
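A rough sketch of where the slope's standard error, t-stat and 95% CI in the regression output come from: se(m) = [SSE/(n − 2) / Σ(X − X̄)²]^{1/2}. The critical value 2.306 for n − 2 = 8 degrees of freedom is hardcoded, as on the correlation slide. The t-stat matches the 3.699 reported for HOURS.

```python
import math

cgpa = [7.67, 6.83, 4.17, 7.67, 5.00, 4.17, 5.00, 7.33, 6.83, 6.33]
hours = [35, 29, 23, 50, 32, 22, 17, 40, 44, 38]
n = len(hours)

mean_h = sum(hours) / n
mean_c = sum(cgpa) / n
sxx = sum((h - mean_h) ** 2 for h in hours)
sxy = sum((h - mean_h) * (c - mean_c) for h, c in zip(hours, cgpa))
m = sxy / sxx
b = mean_c - m * mean_h

# Residual (unexplained) variation around the fitted line
sse = sum((c - (b + m * h)) ** 2 for h, c in zip(hours, cgpa))
df = n - 2  # one explanatory variable plus an intercept
se_m = math.sqrt(sse / df / sxx)
t_m = m / se_m

t_crit = 2.306  # T.INV.2T(0.05, 8)
ci = (m - t_crit * se_m, m + t_crit * se_m)

print(f"se = {se_m:.4f}, t = {t_m:.3f}")  # se = 0.0286, t = 3.699
print(f"95% CI for the HOURS coefficient: {ci[0]:.3f} to {ci[1]:.3f}")
```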
Data Interpretation (again)
• From the Regression Output we know: CGPA = 2.6 + 0.105*HOURS
• For every +1 hour studied, predicted CGPA on graduation increases by 0.105
• Equivalently, along the fitted line, a CGPA that is 1 grade point higher corresponds to about 9.52 more hours studied per month (9.52 = 1/0.105)
P-Value Approach to Statistical Significance of Total Hours Studied
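• H0: coefficient on HOURS = 0; HA: coefficient on HOURS ≠ 0
• P-value approach: P-value = 0.0061 < 0.05, i.e., if H0 were true, the probability of obtaining a coefficient at least this far from zero purely by chance is less than 5%; reject H0, the data support HA
• Note: for a one-sided test (e.g., HA: coefficient > 0), divide the reported P-value by 2, provided the estimated coefficient has the hypothesized sign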
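For intuition about where the reported P-value comes from, here is a sketch that integrates Student's t density numerically with Simpson's rule. This is a swapped-in numerical method, not how Excel computes it (Excel's T.DIST.2T(3.699, 8) gives the two-tailed value directly). The result for t = 3.699 with 8 degrees of freedom is roughly 0.006, in line with the reported 0.0061.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|) = 1 - 2 * area under the pdf from 0 to |t_stat| (Simpson's rule)."""
    a, b = 0.0, abs(t_stat)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3
    return 1 - 2 * area


p = two_tailed_p(3.699, 8)
print(f"two-tailed p = {p:.4f}")
print(f"one-tailed p = {p / 2:.4f}")  # valid only if the estimate has the hypothesized sign
```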
Critical Value Approach to Statistical Significance of Total Hours Studied
• H0: coefficient on HOURS = 0; HA: coefficient on HOURS ≠ 0
• Critical value approach: critical value = T.INV.2T(0.05, 8) = 2.306 (n − 2 = 8 residual degrees of freedom)
• t-stat = 3.699 > 2.306, so reject H0
A Quantitative Research Project: Recapitulation
Research Topic: Academic Performance
Research Questions: How well do graduating students perform academically? What explains that performance?
Measure “academic performance” by graduating CGPA
Research Design: Cross-sectional analysis of graduating students in a given year
Data Collection: Survey of a random sample of 10 students graduating in 2014
Data Description: Describe the data with basic statistics
Data Analysis: Reasons for attending university and performance; total hours studied and CGPA
Research in Public Policy
Excerpted from Morçöl and Ivanova (2010)

| Categories of Methods | Quantitative Orientation | Qualitative Orientation |
|---|---|---|
| Empirical Inquiry - Design Methods | Experimental, Cross-sectional, Longitudinal | Case study |
| Empirical Inquiry - Data Collection Methods | Surveys, Secondary Data | Qualitative (long, in-depth, or semi-structured) Interviews |
| Empirical Inquiry - Data Analysis Methods | Statistical, Regression, or Time-series Analyses | (Computer-assisted) Qualitative Data Analyses |
| Empirical Inquiry - Combined Methods | Game Theory, Simulations, Systems Analysis, Meta-Analyses, Network Analyses | Case study, Legal Analyses, Archival, Ethnography, Grounded Theory, Textual Analyses |
| Methods of Decision Making and Planning | Cost-benefit, Decision Analyses, Linear Programming | Brainstorming, Delphi |
Quality of Quantitative (Qualitative) Research: Reliability, Relevance, Validity
• Reliability: can we replicate the research results? (are the results dependable?)
• Relevance: are results of practical significance? (are results trustworthy or authentic?)
• Construct Validity: do quantities observed reflect research variables of interest? (is there objectivity?)
• Internal Validity: is there a causal relationship between the independent and dependent variables? (is there credibility?)
• External Validity: can we generalize beyond the one study? (are results transferable?)
Achieving Learning Outcomes
• Basic user familiarity requires knowledge of:
– research ethics
– existing data sets
– the collection of qualitative and quantitative data
– data measurement
– sampling
– advantages and disadvantages of different research methods
– descriptive and inferential statistics
Learning Outcomes?
• Understand key concepts in research
• Apply critical analytical skills to published research
• Understand the application, value and limits of quantitative and qualitative research methodologies and techniques/tools
• Develop skills in devising and designing research methods suitable for different policy contexts and for rigorous analysis
• Provide a grounding in ethical issues related to:
– academic research
– the role of the public servant as a custodian of data and information, balancing the public’s right to know against the personal data and information that an individual citizen has a right to have kept confidential
Good Luck!
And THANK YOU…
…for the journey,…for your patience,
…your curiosity, …your humour!