Posted on 05-Apr-2022
Defining, Measuring, and Manipulating Variables
Operational definition of a construct
Constructs: hunger, aggression, happiness, success, intelligence …
Operational definition: How is the construct measured?
Hunger: subjective feeling rated on a 1-7 scale
Hunger: # hrs since last ate
Accuracy of operational definitions
Circular or tautological definitions
Definitions may not match construct
Definitions may differ between researchers
Caffeine consumption questionnaire
How is caffeine consumption operationally defined?
Scales of measurement
Nominal, ordinal, interval, ratio (p61)
Also distinguished as discrete vs. continuous variables
Or qualitative vs. quantitative
What scale of measurement is used?
Item: Scale of measurement
True-false test: Nominal
IQ test scores: Interval
Political affiliation: Nominal
Top 10 basketball teams: Ordinal
Time to finish an exam: Ratio
List of favorite to least favorite teachers: Ordinal
Zip code: Nominal
Class rank: Ordinal
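The scale matters because it limits which operations on the numbers are meaningful. The sketch below assumes the usual cumulative view, in which each scale supports everything the scales below it support plus something new. The operation labels and the `allowed` helper are illustrative and do not come from the course text.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations each scale ADDS on top of the scales below it (illustrative labels).
NEW_OPERATIONS = {
    Scale.NOMINAL: {"same/different category", "count and mode"},
    Scale.ORDINAL: {"greater/less than", "median"},
    Scale.INTERVAL: {"meaningful differences", "mean"},
    Scale.RATIO: {"meaningful ratios (twice as much)"},
}


def allowed(scale: Scale) -> set[str]:
    """Return every operation that is meaningful for data on `scale`."""
    ops: set[str] = set()
    for level in Scale:
        if level <= scale:
            ops |= NEW_OPERATIONS[level]
    return ops


# A few items from the exercise, with the scales given in the answer key.
for item, scale in [
    ("Zip code", Scale.NOMINAL),
    ("Class rank", Scale.ORDINAL),
    ("IQ test scores", Scale.INTERVAL),
    ("Time to finish an exam", Scale.RATIO),
]:
    print(f"{item} ({scale.name.lower()}): {sorted(allowed(scale))}")
```

Under this view you can average zip codes arithmetically, but the result means nothing. Saying one exam took twice as long as another is meaningful (ratio). Saying an IQ of 140 is "twice" an IQ of 70 is not (interval).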
What scale of measurement is used?
Indicate your attitude toward scientific research by placing a check mark on each scale:
Positive __ __ __ __ __ __ __ Negative
Worthless __ __ __ __ __ __ __ Valuable
Unethical __ __ __ __ __ __ __ Ethical
Circle your answer: Scientific research has produced many advances that have significantly enhanced the quality of human life.
strongly agree / agree / neutral / disagree / strongly disagree
The examples above use Likert-type rating scales
Each response can be numbered (1-7 on the check-mark scales, 1-5 on the agree/disagree item) and treated as an interval scale
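Before numbered responses are combined, every item has to point in the same direction. In the check-mark scales, "Positive" is the left anchor, while "Valuable" and "Ethical" are the right anchors. If the seven positions are numbered 1 to 7 from left to right (an assumption about how the form is coded), the Positive/Negative item must be reverse-scored. The function names below are hypothetical.

```python
POINTS = 7  # assumed: seven check-mark positions numbered 1-7 from left to right


def reverse_score(response: int, points: int = POINTS) -> int:
    """Flip a response so 1 <-> points, 2 <-> points - 1, and so on."""
    if not 1 <= response <= points:
        raise ValueError(f"response must be between 1 and {points}")
    return points + 1 - response


def attitude_score(positive_negative: int, worthless_valuable: int,
                   unethical_ethical: int) -> float:
    """Mean attitude score where 7 = most favorable on every item."""
    # "Positive" is the LEFT anchor, so a low number is a favorable answer;
    # reverse it so all three items agree in direction before averaging.
    items = [
        reverse_score(positive_negative),
        worthless_valuable,
        unethical_ethical,
    ]
    return sum(items) / len(items)


# A respondent who checks near "Positive", "Valuable", and "Ethical":
print(attitude_score(2, 6, 7))
```

If the reverse step is skipped, a very favorable respondent would get a middling average, because the first item would pull the score toward "unfavorable."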
Weinle (2003)
Examined the use of drawing to facilitate children’s narratives about emotional events
Participants: 6-, 7-, and 8-yr-olds
Method: Interviewed about “mad” or “sad” events; ½ asked to draw a picture while talking, ½ just talked
Results: Children who drew while talking provided significantly longer and richer narratives
What are the scales of measurement for the IVs and DVs?
IVs: Age = ratio; Activity while talking = nominal; Emotion of event = nominal
DVs: Length of narrative = ratio; Richness of narrative = interval or ordinal
Caffeine consumption questionnaire
What scales of measurement are used?
What other questions could be asked that use other scales?
Nominal
Ordinal
Interval
Ratio
Types of measures (p65)
What type of measurement is used? What scale of measurement?
Geriatric Depression Scale (GDS)
Choose the best answer for how you have felt over the past week: YES / NO
1. Are you basically satisfied with your life?
2. Have you dropped many of your activities and interests?
3. Do you feel that your life is empty?
4. Do you often get bored?
5. Are you in good spirits most of the time?
6. Are you afraid that something bad is going to happen to you?
7. Do you feel happy most of the time?
8. Do you often feel helpless?
9. Do you prefer to stay at home, rather than going out and doing new things?
10. Do you feel you have more problems with memory than most?
11. Do you think it is wonderful to be alive now?
12. Do you feel pretty worthless the way you are now?
13. Do you feel full of energy?
14. Do you feel that your situation is hopeless?
15. Do you think that most people are better off than you are?
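Each GDS item is a yes/no (nominal) response, but the items are summed into a total count. Some items are worded so that "NO" is the depressive answer. The sketch below assumes the commonly published 15-item key: NO on items 1, 5, 7, 11, and 13 adds a point, and YES on each remaining item adds a point. Confirm the key against the official scoring instructions before relying on it. `gds15_score` is a hypothetical helper.

```python
# Assumed GDS-15 key: on these items "NO" indicates depression;
# on all other items "YES" indicates depression.
NO_KEYED_ITEMS = {1, 5, 7, 11, 13}
ALL_ITEMS = set(range(1, 16))


def gds15_score(answers: dict[int, str]) -> int:
    """Count depression-indicating answers (0-15) on the 15-item GDS.

    `answers` maps each item number (1-15) to "YES" or "NO".
    """
    if set(answers) != ALL_ITEMS:
        raise ValueError("expected exactly one answer for each item 1-15")
    score = 0
    for item, answer in answers.items():
        answer = answer.strip().upper()
        if answer not in {"YES", "NO"}:
            raise ValueError(f"item {item}: answer must be YES or NO")
        depressive_answer = "NO" if item in NO_KEYED_ITEMS else "YES"
        if answer == depressive_answer:
            score += 1
    return score


# Answering YES to every item scores only the YES-keyed items.
print(gds15_score({item: "YES" for item in ALL_ITEMS}))  # -> 10
```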
Reliability
“Consistency and stability of a measuring instrument” (p65)
Is the scale free from random error?
Observed score = true score + error
“High reliability” = low error
Types of errors:
Method error (e.g. test situation, equipment error)
Trait error (e.g. fatigue, health, truthfulness)
Theoretical reliability = true-score variance / (true-score variance + error variance)
Measured reliability
Correlation coefficient: ranges from -1.0 through 0 to +1.0
.70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
Not all-or-none; a measure is more or less reliable
Correlational design
Scatterplot: relationship between 2 quantitative variables
How 1 variable relates to (co-varies with) another variable; a correlation alone does not show that one influences the other
Each individual = one dot (their X and Y values form one data point)
Lexical decision task and measurement error
Press “yes” button when you see a word (“crow”)
Press “no” button when you see a non-word (“cwor”)
IV: stimulus (word/non-word)
DV: RT (time to press button)
Types of measurement errors?
S responds more slowly on later trials due to fatigue
S responds more quickly b/c just saw the word before coming to the lab
S responds more slowly because of sneezing during a trial
S performs poorly b/c can’t read words clearly on screen; b/c room is too warm; b/c thinking about other things…
Something affects behavior other than the variable you are studying
Types of reliability
How can you measure reliability?
Test-retest reliability
Compare same test on 2 occasions
Alternative forms reliability
Compare equivalent or similar tests
Split-half reliability
Compare performance on 2 halves of a test
Inter-rater reliability
Consistency/agreement between 2 judges
Percent agreement = (# agreements / # possible agreements) x 100
What types of measures would use this?
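The percent-agreement formula translates directly into code. In the sketch below, two judges code the same observations, and agreements are counted item by item. The behavior codes ("aggressive", "neutral") and the playground setting are hypothetical examples of the observational measures where inter-rater reliability is typically used.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Percent agreement = (# agreements / # possible agreements) x 100."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same observations")
    if not rater_a:
        raise ValueError("no observations to compare")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements * 100 / len(rater_a)


# Two judges code the same 5 playground episodes (hypothetical codes).
judge_1 = ["aggressive", "aggressive", "neutral", "neutral", "aggressive"]
judge_2 = ["aggressive", "neutral", "neutral", "neutral", "aggressive"]
print(percent_agreement(judge_1, judge_2))  # 4 of 5 agree -> 80.0
```

Percent agreement does not adjust for agreement expected by chance. Statistics such as Cohen's kappa are designed to make that adjustment.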
Kazdin (1990): Automatic thoughts questionnaire
“An examination of the internal consistency of the ATQ yielded a coefficient alpha of .96… These statistics suggest a high level of internal consistency.”
Reliability statistic: Cronbach’s alpha
Based on the average correlation among all items of the scale (and on the number of items)
.70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
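Cronbach's alpha rises with both the average inter-item correlation and the number of items. The sketch below (Python 3.10+) shows two forms of the statistic. The first is the usual variance-based formula computed from raw item scores. The second is the "standardized" version, written in terms of the number of items k and the mean inter-item correlation. The rating data and function names are hypothetical.

```python
import statistics


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variances = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


def standardized_alpha(k: int, mean_r: float) -> float:
    """Alpha from the number of items and their average inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical 1-5 ratings: 6 respondents x 4 items.
ratings = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 4, 3],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
print(f"10 items, mean r = .30: alpha = {standardized_alpha(10, 0.30):.2f}")  # 0.81
```

In the second line, items that correlate only moderately with each other (r = .30) still yield a strong alpha once there are ten of them.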
“Individual item-total score correlations, presented in Table 1, were in the moderate to high range (r’s = .39 to .81). The mean item-total correlation… was .69.”
Reliability measurements: Inter-item correlation matrix
All correlations should be positive
Validity
Does the measure provide info on what we really want to measure?
Multiple types of validity
Content validity
Criterion validity
Construct validity
Validity is not all-or-none, but on a scale
Can be high in 1 type of validity and low in others
Later… (ch8)
Internal validity: eliminated extraneous variables
External validity: findings will generalize to other contexts
Content validity
Does the test contain representative samples of behavior?
Does the content of the test reflect what we want to measure?
Are all aspects of the content represented fairly?
e.g. exams in courses: do test items match what you’ve learned/studied?
e.g. depression questionnaire: does the test measure all behaviors that would be of interest?
The more specific the variable, the easier it is to get good content validity
Face validity: does the test appear to be valid?
Examine what the test appears to measure on the surface
But it does not provide any real evidence of validity
Criterion validity
Extent to which the measure predicts behavior or ability in a given area
Compare scores on the measure with another criterion (area)
Concurrent validity
Test used to predict present performance
e.g. pilot or driving test
Predictive validity
Test used to predict future performance
e.g. SAT or GRE
Convergent validity
Significant (pos or neg) correlations found where expected
Discriminant (or divergent) validity
Near-zero (nonsignificant) correlations found between variables that are supposed to be unrelated
Construct validity
Degree to which the test accurately measures the construct
Examine if concept is being operationalized in a useful way
e.g. depression questionnaire
Is test measuring same construct in all populations that are tested (young – older adults; all cultures)?
e.g. induce depression
Ss read negative or positive statements to induce or diminish depressed mood, but does the induced mood resemble naturally occurring depression? Does the method have construct validity?
Kazdin (1990): ATQ
“Criterion validity: Depressed versus nondepressed children” … A one-way ANOVA of total ATQ scores indicated that depressed children were significantly higher in negative thoughts (M = 82.8) than were nondepressed children (M = 52.9), F(1, 136) = 47.02, p < .001.
The overall ATQ score predicts which group children belong to.
Another section examines which particular items (or statements on the scale) distinguish the groups
“Convergent validity. … As shown in Table 2 performance on ATQ correlated significantly with other measures of cognitive processes related to depression. Children who indicated more negative thoughts showed lower self-esteem, greater hopelessness, and more external attribution of control. The correlations … support convergent validity of the ATQ.”
Kazdin (1990): ATQ
“Discriminant validity. … As can be seen in Table 2, the ATQ did not correlate significantly with severity of impairment or social competence. These findings would seem to support the discriminant validity of the ATQ. However the absence of … correlations might have been due to the different raters (children vs parents).”
“These results suggest that the ATQ tended to correlate more highly with other measures of cognitive processes and with depression than with measures of prosocial behavior and positive affective experience.”
Reliability and validity
What is the relationship between reliability and validity?
Can measure be valid w/o being reliable?
No
Can measure be reliable w/o being valid?
Yes
Can give same score each time but not give any useful information!
e.g. measuring height as an estimate of intelligence
The SAT: What type of validity?
Is the SAT useful in predicting how well students perform in college?
Does SAT-math test concepts from math courses at high school level?
Do SAT questions measure “academic strength”?
Are SAT-math and SAT-verbal positively correlated?
The SAT
Is the SAT useful in predicting how well students perform in college?
Criterion or predictive validity
Does SAT-math test concepts from math courses at high school level?
Content validity
Do SAT questions measure “academic strength”?
Construct validity
Are SAT-math and SAT-verbal positively correlated?
Convergent validity