Temporal Stability of Performance Failure Appraisal Inventory Items

This article was downloaded by: [University of Calgary] On: 29 September 2013, At: 04:26 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Measurement in Physical Education and Exercise Science Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/hmpe20 Temporal Stability of Performance Failure Appraisal Inventory Items David E. Conroy & Jonathan N. Metzler Published online: 18 Nov 2009. To cite this article: David E. Conroy & Jonathan N. Metzler (2003) Temporal Stability of Performance Failure Appraisal Inventory Items, Measurement in Physical Education and Exercise Science, 7:4, 243-261, DOI: 10.1207/S15327841MPEE0704_3 To link to this article: http://dx.doi.org/10.1207/S15327841MPEE0704_3 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is

expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

MEASUREMENT IN PHYSICAL EDUCATION AND EXERCISE SCIENCE, 7(4), 243–261
Copyright © 2003, Lawrence Erlbaum Associates, Inc.

Temporal Stability of Performance Failure Appraisal Inventory Items

David E. Conroy and Jonathan N. Metzler
Department of Kinesiology
The Pennsylvania State University

This study was designed to investigate the stability of responses to items on the long and short forms of the Performance Failure Appraisal Inventory (PFAI; Conroy, 2001), a dispositional measure of fear of failure and cognitive-motivational-relational appraisals associated with the fear of failure. Female and male college students (N = 356) enrolled in physical activity classes completed the PFAI four times in a 3-week interval. In general, responses to PFAI items exhibited a level of stability that would be expected based on previous investigations of dispositional measures of anxiety in physical activity settings (proportions of ±1 agreement exceeded .70 for all items at all time periods). Two potentially problematic items were identified, and proposals were offered for improving these items. These results contribute to the accumulating evidence supporting the validity and reliability of PFAI scores; researchers are encouraged to consider the PFAI for research on fear of failure.

Key words: stability, fear of failure, item analysis

Traditionally, psychometric theorists have emphasized measurement reliability or stability at the level of the scale score representing an unobserved variable (Nunnally & Bernstein, 1994; Schutz, 1998). Poor reliability is an undesirable psychometric property not only because it decreases confidence in the accuracy of any single score estimate but also because it attenuates correlations with other measures. Recently, researchers have highlighted the importance of assessing the stability of responses to individual items (Nevill, Lane, Kilgour, Bowes, & Whyte, 2001; Wilson & Batterham, 1999). A primary reason for evaluating item-level stability in addition to factor-level stability is that “poor reliability or stability of individual items may be overlooked in the ‘averaging or canceling out’ process when assessing the reliability or stability of [scale scores]” (Nevill et al., 2001, p. 273). Increased attention to the importance of item-level stability resulted in the development of methods (e.g., proportion of exact response agreement, proportion of negligible response differences, 95% limits of agreement) that overcome limitations of popular indices of relative agreement (i.e., Pearson product-moment correlations). This study was designed to apply those methods to investigate the stability of responses to Performance Failure Appraisal Inventory items (Conroy, 2001; Conroy, Willow, & Metzler, 2002) over six different time periods.

Requests for reprints should be addressed to David E. Conroy, 267 Rec Hall, Department of Kinesiology, The Pennsylvania State University, University Park, PA 16802. E-mail: [email protected]

THE PERFORMANCE FAILURE APPRAISAL INVENTORY

The Performance Failure Appraisal Inventory assesses beliefs concerning the likelihood that failure leads to aversive consequences (i.e., beliefs that would predispose individuals to appraise threat and experience state anxiety when they are failing in achievement situations; Conroy, 2001; Conroy et al., 2002). These beliefs pertain to fears of (a) experiencing shame and embarrassment, (b) devaluing one’s self-estimate, (c) having an uncertain future, (d) important others losing interest, and (e) upsetting important others. Relationships between these five appraisals can be modeled accurately and parsimoniously with a single higher-order factor representing general fear of failure (FF; Conroy et al., 2002).

Several studies provide a basis for valid interpretations of scores from the PFAI and PFAI short-form (PFAI-S), which includes one item from each PFAI subscale. The beliefs sampled by the PFAI and PFAI-S were drawn from a content analysis of athletes’ and performing artists’ descriptions of the consequences of failing (and not succeeding; Conroy, Poczwardowski, & Henschen, 2001). Factorial validity of scores was documented by Conroy (2001) and Conroy et al. (2002). Those studies also established expected relationships between PFAI and PFAI-S scores and measures of other constructs such as trait anxiety, achievement goal orientations, social desirability, hope, optimism, fear of success, sport anxiety, worry, concentration disruption, and somatic anxiety. Conroy, Metzler, and Hofer (2003) found strong support for the structural stability (i.e., longitudinal factorial invariance), differential stability (i.e., test-retest reliability of latent variable scores), and latent mean stability in fixed and random effect models of PFAI and PFAI-S scores.

ITEM-LEVEL STABILITY ESTIMATES

Relying solely on factor-level stability estimates based on combinations of several items may mask individual items that draw inconsistent responses across administrations (Nevill et al., 2001; Oppenheim, 1966; Wilson & Batterham, 1999). Evaluating stability at the item level enhances understanding of how individuals respond to individual items on a measure over time. Such evidence can either bolster users’ confidence that test stimuli evoke consistent behaviors over time or serve a diagnostic function by providing direction(s) for scale refinements beyond those indicated by investigations of factor score stability.

Although Pearson product-moment correlations would appear to be a logical tool for evaluating item stability, several problems with such an approach have been noted (Bland & Altman, 1986, 1995; Nevill, 1996; Nevill et al., 2001; Vincent, 1995; Wilson & Batterham, 1999). For example, Pearson correlations are indices of association and not within-person agreement and are insensitive to systematic changes in means. Additionally, Pearson correlations assume a bivariate normal distribution of responses and can be attenuated in homogeneous samples compared to heterogeneous samples with the same proportion agreement.
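The insensitivity of correlations to systematic shifts can be illustrated with a minimal sketch. The responses below are invented for illustration (they are not data from this study), and the helper names `pearson_r` and `exact_agreement` are hypothetical. Every respondent moves up exactly one scale point at retest, so the correlation is perfect even though no one gives the same answer twice.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from first principles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def exact_agreement(x, y):
    """Proportion of respondents giving the identical response twice."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

# Invented responses on a -2..+2 scale; retest is shifted up by one point.
test = [-2, -1, 0, 1, -1, 0]
retest = [score + 1 for score in test]

r = pearson_r(test, retest)
agreement = exact_agreement(test, retest)
print(f"r = {r:.2f}, exact agreement = {agreement:.2f}")
```

The correlation registers only the preserved rank order, whereas the agreement index exposes the uniform upward shift.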

Given that Pearson product-moment correlations are problematic when evaluating the stability of responses to individual items on a scale, several alternative indices have been proposed. Wilson and Batterham (1999) initially proposed the proportion of exact response agreement from test to retest as an index of item-level stability. Unlike Pearson product-moment correlations, the proportion of exact agreement statistic does not require an assumption of bivariate normality and does not depend on high between-individual variance. The utility of this approach is limited by its inability to distinguish near agreements from large disagreements and to identify the direction of systematic bias in response patterns (Nevill et al., 2001).

To address these limitations, Nevill et al. (2001) proposed the use of discrete difference scores between test-retest responses. Given that these difference scores are not normally distributed, parametric statistics are inappropriate for assessing agreement. The range of the differences (excluding the top and bottom 2.5% of scores) can be reported as a nonparametric equivalent of the 95% limits of agreement. A second option is to calculate the proportion of difference scores within some reference value “chosen to equate to no practically important difference (for example ±1)” (Nevill et al., 2001, p. 275). The choice of a reference value for this index is arbitrary and appropriate reference values may vary for different ranges of response scales. These approaches are superior to correlations between responses at two time points because they are sensitive to and will reveal systematic bias from test to retest. For items that appear to be unstable over time, a nonparametric median sign test can reveal whether the instability is the product of random error or systematic bias (i.e., do retest responses tend to be larger or smaller over time?).
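The agreement indices described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used by the cited authors: the responses are invented, the function names are hypothetical, difference scores are taken as earlier minus later response, and the percentile rule (linear interpolation between ranks) is an assumption because the interpolation method is not specified in the text.

```python
def difference_scores(test, retest):
    """Discrete test-retest differences (earlier minus later response)."""
    return [t - r for t, r in zip(test, retest)]

def proportion_within(diffs, tolerance):
    """Share of differences no larger than `tolerance` in absolute value."""
    return sum(abs(d) <= tolerance for d in diffs) / len(diffs)

def percentile(sorted_values, p):
    """Percentile by linear interpolation between ranks (assumed rule)."""
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

def nonparametric_limits(diffs):
    """2.5th and 97.5th percentiles of the difference scores."""
    ordered = sorted(diffs)
    return percentile(ordered, 0.025), percentile(ordered, 0.975)

# Invented responses on a -2..+2 scale for ten respondents.
test = [0, 1, -1, 2, -2, 0, 1, -1, 0, 1]
retest = [0, 0, -1, 1, -2, 1, 1, 1, 0, -1]

diffs = difference_scores(test, retest)
exact = proportion_within(diffs, 0)       # proportion of exact agreement
within_one = proportion_within(diffs, 1)  # proportion of +/-1 agreement
lower_limit, upper_limit = nonparametric_limits(diffs)
print(exact, within_one, lower_limit, upper_limit)
```

Setting the tolerance to 0 yields exact agreement, so a single function covers both proportion indices.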

Purpose

This study was designed to investigate the stability of responses to PFAI items across six time periods. Both the long form and short form of the PFAI were included in the investigation. Previous investigators of item-level stability utilized a 3-month time interval to “reduce recall of responses by individuals” (Nevill et al., 2001, p. 275); however, we administered the PFAI measures four times in a 3-week period. This shorter interval was used because theorists have argued that achievement motivation can be changed in as little as 3 weeks (McClelland, 1965; Theebom, De Knop, & Weiss, 1995) and potential applications of the PFAI (e.g., evaluating interventions to prevent or treat FF) will require frequent test administrations in relatively short time periods to estimate accurate change trajectories. These four assessments permitted stability estimation for six unique time periods (i.e., 2, 5, 7, 14, 19, and 21 days). Data were collected from physical activity classes because competence development is the focus of these activities and FF may be a particularly relevant motive for these learners.
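The six time periods follow from pairing the four assessments. A short sketch, assuming the waves fall on days 0, 2, 7, and 21 (the initial session, then two days, one week, and three weeks later, as described under Procedures):

```python
from itertools import combinations

# Day of each wave relative to the first administration.
wave_days = [0, 2, 7, 21]

# Every pair of waves defines one test-retest interval.
intervals = sorted(later - earlier for earlier, later in combinations(wave_days, 2))
print(intervals)
```

Four waves give C(4, 2) = 6 pairs, which is why six stability estimates are available for each item.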

METHODS

Participants

College students enrolled in physical activity classes (N = 356) at a large northeastern university participated in this study on four occasions over a 3-week interval for extra credit in their physical activity classes (classes lasted a total of six weeks). Participants included 106 women (30%), 250 men (70%), and two individuals who did not report their gender. Participants ranged in age from 18 to 34 years (M = 21.57, SD = 1.92). Participants were recruited from strength training (n = 175), golf (n = 110), jogging (n = 61), and walking (n = 10) classes. Three participants self-reported receiving treatment for FF or test anxiety and were removed from these analyses.

Instruments

Both the 25-item long form and 5-item short form versions of the PFAI (Conroy et al., 2002) were administered in this study. Each item began with one of two stems, “When I am failing…” or “When I am not succeeding…,” followed by an aversive consequence of failing. Participants rated how often they believed each statement was true for them on a 5-point scale ranging from –2 (do not believe at all) to +2 (believe 100% of the time); the midpoint of the response scale was anchored by 0 (believe 50% of the time). Scores from the PFAI and PFAI-S have demonstrated factorial invariance both across groups (Conroy et al., 2002) and over time (Conroy et al., 2003). Internal consistency estimates (i.e., coefficient alpha) for first- and second-order factors have ranged from .69 to .90. Interpretations of PFAI and PFAI-S scores as FF indices are supported by their location in a nomological network that includes trait anxiety, achievement goals, social desirability, hope, optimism, fear of success, sport anxiety, worry, cognitive disruption, and somatic anxiety.

Procedures

After being introduced to the purpose of this study, the risks of participating, and their rights, participants signed an informed consent document and completed the PFAI and PFAI-S at the end of their class in exchange for a small amount of extra credit in the class. A third instrument unrelated to this research was administered between these measures. On average, participants needed 10 min in total to complete the questionnaires. Three additional waves of data collection were completed two days, one week, and three weeks after initial data collection.

Data Analysis

Missing data were handled using pairwise deletion to maximize the amount of data available to estimate the stability of responses. On average, data from 236 participants were used to estimate the stability of an item and no stability estimates were calculated based on responses from fewer than 204 participants. For each item, proportion exact agreement, proportion ±1 agreement, and nonparametric 95% limits of agreement for test-retest differences were calculated using the procedures described by Wilson and Batterham (1999) and Nevill et al. (2001). A reference value of ±1 was selected for indexing agreement to facilitate comparisons with previous research (Nevill et al., 2001) and because larger intervals seemed unnecessarily broad in light of the 5-point scale used for responses. Although Wilson and Batterham advocated bootstrapping a sampling distribution to estimate 95% confidence intervals surrounding point estimates of agreement, Nevill et al. (2001) demonstrated the inefficiency and redundancy of these calculations because confidence intervals can be easily estimated and bootstrapped estimates of the sampling distribution do not provide additional unique information that informs understanding of item-level stability. Consequently, we omitted the bootstrap analyses and estimated confidence intervals only for difference scores (and not for proportions of agreement) based on normal approximations to the binomial distribution. These confidence intervals can range from –4 to +4 (indicating a switch from one extreme rating to the other between the test and retest occasions).
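Pairwise deletion can be sketched in a few lines. The responses below are invented, missing responses are represented as `None` by assumption, and `pairwise_complete` is a hypothetical helper name. For each pair of waves, only respondents who answered at both occasions are retained, so different intervals can rest on different subsets of participants.

```python
def pairwise_complete(wave_a, wave_b):
    """Keep respondents with a response at both waves (pairwise deletion)."""
    return [(a, b) for a, b in zip(wave_a, wave_b)
            if a is not None and b is not None]

# Invented responses from four respondents to one item at three waves.
wave1 = [0, None, 1, -1]
wave2 = [0, 1, None, -2]
wave3 = [None, 1, 1, -1]

pairs_1_2 = pairwise_complete(wave1, wave2)
pairs_1_3 = pairwise_complete(wave1, wave3)
pairs_2_3 = pairwise_complete(wave2, wave3)

# Listwise deletion, by contrast, keeps only respondents present at every wave.
listwise = [row for row in zip(wave1, wave2, wave3) if None not in row]
print(len(pairs_1_2), len(pairs_1_3), len(pairs_2_3), len(listwise))
```

In this toy case each interval retains two respondents under pairwise deletion, whereas listwise deletion would leave only one.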

Cut-off criteria for the agreement indices can vary and should be selected based on the nature of the characteristics being measured, the length of the time interval between measurements, and the intended use of scores (Nunnally & Bernstein, 1994; Wilson & Batterham, 1999). Given that no cut-off values exist for identifying items associated with inconsistent responding, we emphasized relative comparisons of stability coefficients between items on a factor when interpreting results. If the responses to a particular item were noticeably unstable relative to other items on that subscale, we computed a median sign test to determine whether a systematic bias in responses to that item existed during retests.
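A sign test on test-retest differences can be sketched as below. This is an illustrative exact two-sided binomial version written for this example; the study's own computational details are not given in the text, the difference scores are invented, and `sign_test` is a hypothetical name. Tied responses (differences of zero) are discarded, as is conventional for sign tests.

```python
from math import comb

def sign_test(diffs):
    """Exact two-sided sign test; returns (n_positive, n_negative, p_value)."""
    positive = sum(d > 0 for d in diffs)
    negative = sum(d < 0 for d in diffs)
    n = positive + negative          # ties carry no directional information
    if n == 0:
        return positive, negative, 1.0
    k = min(positive, negative)
    # Probability of a split at least this lopsided when each sign is equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positive, negative, min(1.0, 2 * tail)

# Invented differences (earlier minus later response) for ten respondents.
diffs = [1, 1, 2, 0, 1, 1, -1, 1, 1, 1]
n_pos, n_neg, p_value = sign_test(diffs)
print(n_pos, n_neg, p_value)
```

A preponderance of positive differences, as here, would indicate that retest responses tend to be smaller than the original responses.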

RESULTS

Item-level descriptive statistics for the 25 items on the long form PFAI and the five PFAI short form items are presented in Table 1. A series of paired t-tests (with Bonferroni corrections to preserve a family-wise alpha level of .05 for the six tests) indicated that no items on the PFAI-S exhibited significant mean differences between waves of measurement (i.e., measurement occasions). Long form PFAI item means were more variable across waves of testing. Means tended to be highest at wave 1 and exhibited a general tendency to decrease in subsequent waves; means at wave 4 were the lowest. Eight items exhibited no changes between any waves. Responses to items on the fear of important others losing interest subscale appeared to be especially stable because three of the five items did not exhibit any statistically significant changes in mean scores over time. The two items whose means did change increased over time instead of decreasing like items on the other subscales.
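The Bonferroni correction for the six pairwise comparisons can be sketched as follows. The p-values are invented placeholders, not results from this study; only the division of the family-wise alpha by the number of tests reflects the procedure named above.

```python
from itertools import combinations

family_alpha = 0.05
wave_pairs = list(combinations([1, 2, 3, 4], 2))   # six paired comparisons per item
alpha_per_test = family_alpha / len(wave_pairs)     # Bonferroni-adjusted threshold

# Hypothetical p-values from six paired t-tests on a single item.
p_values = [0.001, 0.02, 0.009, 0.5, 0.008, 0.04]
significant = [p < alpha_per_test for p in p_values]
print(round(alpha_per_test, 4), significant)
```

With six tests the per-comparison threshold is roughly .0083, so several differences that would pass an uncorrected .05 criterion are not flagged.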

Stability results for seven items on the fear of experiencing shame and embarrassment subscale are presented in Table 2. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these items across all time periods were .51, .87, and –1.95 to 2.20, respectively. No items on this subscale were distinguished from other subscale items by low stability coefficients. Although the range of nonparametric 95% limits of agreement for several items began to broaden after 7 days, it was only at the 21-day mark that response stability demonstrated a noticeable degradation on more than one stability index (particularly for items 18, 20, and 25, whose retest responses tended to be lower than the original responses, ps < .05).

Stability results for four items on the fear of devaluing one’s self-estimate subscale are presented in Table 3. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these items across all time periods were .55, .90, and –1.75 to 2.00, respectively. Responses to items 1, 4, and 7 exhibited greater stability over time than did responses to item 16. This difference in stability was evident after a 2-day interval and was also marked after 7 and 21 days. A median sign test indicated that differences between earlier and later responses to this question tended to be positive (i.e., later responses were smaller than earlier responses, ps < .05). Item 1 distinguished itself from the other items on this subscale due to its high relative stability.

TABLE 1
Item-Level Means and Standard Deviations

          Wave 1                 Wave 2                 Wave 3                 Wave 4
Item      N    M         ±σ      N    M         ±σ      N    M         ±σ      N    M         ±σ

Fear of Experiencing Shame and Embarrassment
Item 10   328  –0.46a    ±1.28   253  –0.57     ±1.22   273  –0.63a    ±1.19   276  –0.59     ±1.19
Item 15   328  0.36ab…   ±1.21   254  0.22ad    ±1.23   273  0.03…     ±1.24   276  –0.02cd   ±1.29
Item 18   328  0.30ab…   ±1.24   254  0.04…     ±1.19   273  –0.13b    ±1.29   276  –0.14c    ±1.28
Item 20   328  –0.4…     ±1.17   254  –0.4…     ±1.19   273  –0.5…     ±1.19   276  –0.6…     ±1.13
Item 22   328  –0.4…     ±1.27   254  –0.6…     ±1.16   273  –0.5…     ±1.24   276  –0.5…     ±1.21
Item 24   328  –0.04ab   ±1.32   254  –0.22     ±1.24   273  –0.26a    ±1.28   276  –0.38b    ±1.24
Item 25   328  –0.08a…   ±1.30   254  –0.2…     ±1.27   273  –0.35a    ±1.21   276  –0.39b    ±1.26

Fear of Devaluing One's Self-Estimate
Item 1    328  –1.23     ±0.99   254  –1.25     ±1.02   273  –1.29     ±0.96   276  –1.21     ±1.02
Item 4    328  –0.63a    ±1.14   254  –0.6…     ±1.15   273  –0.8…     ±1.12   276  –0.84a    ±1.09
Item 7    328  –0.60a    ±1.16   254  –0.60b    ±1.18   273  –0.75     ±1.16   276  –0.84a…   ±1.08
Item 16   328  0.49abc   ±1.22   254  0.23ad    ±1.24   273  0.07b     ±1.32   276  –0.05c…   ±1.25

Fear of Having an Uncertain Future
Item 2    328  –0.54abc  ±1.11   254  –0.66a    ±1.14   273  –0.76b    ±1.09   276  –0.75c    ±1.20
Item 5    328  –0.52a    ±1.16   254  –0.52     ±1.14   273  –0.66     ±1.15   275  –0.69a    ±1.17
Item 8    327  –0.49a    ±1.17   254  –0.5…     ±1.16   273  –0.67a    ±1.16   276  –0.6…     ±1.18
Item 12   328  –0.0…     ±1.25   254  –0.1…     ±1.28   273  –0.1…     ±1.27   276  –0.1…     ±1.29

Fear of Important Others Losing Interest
Item 11   328  –0.8…     ±1.12   254  –0.8…     ±1.10   273  –0.7…     ±1.11   276  –0.8…     ±1.09
Item 13   328  –1.16a    ±0.97   254  –1.05b    ±1.01   273  –0.91ab   ±1.05   276  –0.92     ±1.05
Item 17   328  –0.95     ±0.96   254  –0.83     ±0.97   273  –0.89     ±1.00   276  –0.85     ±1.01
Item 21   328  –0.98a    ±0.99   254  –0.94b    ±1.04   273  –0.79a…   ±1.07   276  –0.8…     ±1.08
Item 23   328  –0.7…     ±1.09   254  –0.7…     ±1.15   273  –0.6…     ±1.15   276  –0.8…     ±1.10

Fear of Upsetting Important Others
Item 3    328  –0.20abc  ±1.17   254  –0.45a    ±1.13   273  –0.49b    ±1.17   276  –0.53c    ±1.15
Item 6    328  –0.21a…   ±1.16   253  –0.43a    ±1.18   273  –0.44b    ±1.16   276  –0.62b    ±1.08
Item 9    328  –1.34a…   ±0.97   254  –1.25c    ±0.93   273  –1.06ac   ±1.04   276  –1.11b    ±1.04
Item 14   327  –0.45     ±1.19   254  –0.43     ±1.15   273  –0.48     ±1.12   276  –0.53     ±1.11
Item 19   328  –0.25abc  ±1.15   254  –0.38a    ±1.15   273  –0.42b    ±1.15   276  –0.50c    ±1.16

Short Form
Item 1    325  –0.65     ±1.19   253  –0.71     ±1.23   271  –0.77     ±1.22   274  –0.87     ±1.20
Item 2    325  –0.5…     ±1.19   253  –0.5…     ±1.21   271  –0.6…     ±1.18   274  –0.6…     ±1.24
Item 3    325  –0.99     ±1.02   253  –0.91     ±1.03   271  –0.82     ±1.09   274  –0.93     ±1.08
Item 4    325  –0.3…     ±1.18   253  –0.4…     ±1.20   271  –0.3…     ±1.22   274  –0.5…     ±1.19
Item 5    325  –0.29     ±1.30   253  –0.31     ±1.28   271  –0.30     ±1.33   274  –0.38     ±1.33

Note. Item 12 should be reverse-scored when scoring the inventory. Means in the same row that share a common subscript differ significantly at p < .05 (Bonferroni corrected).

TABLE 2
Stability Estimates for Fear of Experiencing Shame and Embarrassment Items

            Proportion of Agreement
Item        Exact   ±1     2.5%ile   97.5%ile

2-day interval
Item 10     .48     .85    –2        +3
Item 15     .49     .91    –2        +2
Item 18     .46     .88    –2        +2
Item 20     .44     .87    –2        +2
Item 22     .55     .88    –2        +2
Item 24     .49     .83    –2        +2
Item 25     .44     .80    –2        +2

5-day interval
Item 10     .50     .88    –2        +2
Item 15     .58     .91    –1        +2
Item 18     .53     .91    –1.63     +2
Item 20     .53     .93    –1.63     +2
Item 22     .59     .92    –2        +1
Item 24     .56     .88    –2        +2
Item 25     .54     .88    –1.63     +2

7-day interval
Item 10     .50     .88    –2        +3
Item 15     .58     .91    –1.6      +2
Item 18     .53     .91    –2        +3
Item 20     .53     .93    –2        +2
Item 22     .59     .92    –2.6      +2
Item 24     .56     .88    –2        +2
Item 25     .54     .88    –2        +3

14-day interval
Item 10     .59     .89    –2        +2
Item 15     .49     .87    –2        +2
Item 18     .51     .89    –2        +2
Item 20     .57     .91    –2        +2
Item 22     .54     .88    –2        +2
Item 24     .56     .87    –2        +2.1
Item 25     .54     .87    –2        +2

19-day interval
Item 10     .53     .87    –2        +2
Item 15     .50     .85    –2        +3
Item 18     .42     .84    –2        +2
Item 20     .49     .88    –2        +2
Item 22     .55     .91    –2        +2
Item 24     .50     .87    –2        +2
Item 25     .50     .88    –2        +2

(continued)

Stability results for four items on the fear of having an uncertain future subscale are presented in Table 4. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these items across all time periods were .51, .87, and –1.91 to 2.34, respectively. Responses to item 12, the only reverse-scored item on the PFAI, were noticeably less consistent on all stability indices than were responses to other items on this subscale. As indicated by median sign tests, the inconsistencies in responses to this item were not systematic at any time interval (ps > .05).

Stability results for five items on the fear of having important others lose interest subscale are presented in Table 5. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these items across all time periods were .54, .90, and –1.97 to 1.80, respectively. Item 17 was associated with inconsistent responding more than other items on this subscale, but only at 2-, 7-, and 21-day time periods. The median sign tests indicated that those inconsistencies were not systematic (ps > .05).

Stability results for five items on the fear of upsetting important others subscale are presented in Table 6. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these items across all time periods were .53, .89, and –1.77 to 2.08, respectively. Items 3 and 6 exhibited slightly less consistency in responses than other items on this subscale, but only at 2-, 7-, and 21-day time periods. Median sign tests for responses associated with these time periods revealed that responses to item 3 and item 6 tended to become smaller over time (ps < .05).

Stability results for items on the PFAI-S are presented in Table 7. The mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement for these five items across all time periods were .58, .91, and –1.77 to 2.02, respectively. By way of comparison, for the 25 long form items, the mean proportions of exact and ±1 agreement, and nonparametric 95% limits of agreement (across all time

TABLE 2 (Continued)

            Proportion of Agreement
Item        Exact   ±1     2.5%ile   97.5%ile

21-day interval
Item 10     .48     .82    –2        +3
Item 15     .42     .84    –2        +2
Item 18     .37     .76    –2        +3
Item 20     .41     .85    –2        +2.55
Item 22     .44     .85    –2        +2
Item 24     .44     .83    –2        +2.55
Item 25     .35     .78    –2        +3

periods) were .52, .88, and –1.88 to 2.08, respectively. No items on the short form stood out as being noticeably more or less consistent than other items on the scale.

DISCUSSION

The purpose of this research was to evaluate the stability of responses to items on the long and short form versions of the PFAI over a 3-week interval. A general trend

TABLE 3
Stability Estimates for Fear of Devaluing One's Self-Estimate Items

            Proportion of Agreement
Item        Exact   ±1     2.5%ile   97.5%ile

2-day interval
Item 1      .71     .93    –2        +2
Item 4      .56     .93    –2        +2
Item 7      .55     .94    –1        +2
Item 16     .43     .80    –2        +3

5-day interval
Item 1      .73     .99    –1        +1
Item 4      .54     .93    –2        +2
Item 7      .56     .88    –2        +2
Item 16     .51     .85    –2        +2

7-day interval
Item 1      .67     .92    –2        +2
Item 4      .47     .89    –2        +2
Item 7      .51     .88    –1        +2
Item 16     .41     .77    –2        +3

14-day interval
Item 1      .69     .94    –2        +1
Item 4      .55     .91    –2        +2
Item 7      .62     .92    –2        +2
Item 16     .50     .86    –2        +2

19-day interval
Item 1      .70     .94    –2        +1
Item 4      .59     .95    –1        +2
Item 7      .54     .90    –1        +2
Item 16     .49     .87    –2        +2

21-day interval
Item 1      .63     .90    –2        +2
Item 4      .47     .93    –1        +2
Item 7      .47     .86    –2        +2
Item 16     .39     .80    –2        +3

for item means to decline over time was apparent. Several plausible reasons may explain why item means declined in the course of this study. Given that the data were collected during instructional courses for the different activities, skill improvements may have been responsible for decreased fear of failure. It is also possible that failure became less threatening as participants became more comfortable with their teacher, classmates, and the activity setting in general. Incorporating measures of self-efficacy and objective performance improvement into future research may help to rule out this possibility. Alternatively, a habituation effect may

TABLE 4
Stability Estimates for Fear of Having an Uncertain Future Items

            Proportion of Agreement
Item        Exact   ±1     2.5%ile   97.5%ile

2-day interval
Item 2      .49     .89    –2        +2
Item 5      .49     .87    –2        +2
Item 8      .49     .89    –2        +2
Item 12     .39     .73    –3        +3

5-day interval
Item 2      .65     .94    –1        +2
Item 5      .63     .91    –2        +2
Item 8      .64     .92    –1        +2
Item 12     .45     .79    –2        +4

7-day interval
Item 2      .49     .90    –1        +2
Item 5      .46     .85    –2        +2
Item 8      .50     .86    –2        +2.63
Item 12     .40     .72    –3        +4

14-day interval
Item 2      .67     .95    –2        +1
Item 5      .61     .94    –1        +2
Item 8      .60     .96    –2        +1
Item 12     .45     .78    –3        +3

19-day interval
Item 2      .59     .93    –1.83     +2
Item 5      .57     .93    –1        +2
Item 8      .56     .95    –1        +1.83
Item 12     .44     .81    –2        +3

21-day interval
Item 2      .47     .90    –2        +2
Item 5      .41     .83    –2        +2
Item 8      .42     .84    –2        +2.58
Item 12     .34     .72    –3        +4

TABLE 5
Stability Estimates for Fear of Important Others Losing Interest Items

            Proportion of Agreement
Item        Exact   ±1     2.5%ile   97.5%ile

2-day interval
Item 11     .52     .89    –2        +2
Item 13     .57     .92    –2        +2
Item 17     .52     .92    –2        +1
Item 21     .60     .90    –2        +2
Item 23     .56     .89    –2        +2

5-day interval
Item 11     .56     .91    –2        +2
Item 13     .55     .89    –2        +1
Item 17     .63     .93    –1        +2
Item 21     .59     .93    –2        +1
Item 23     .57     .88    –2        +2

7-day interval
Item 11     .53     .85    –2        +2
Item 13     .54     .87    –2        +1
Item 17     .48     .87    –2        +2
Item 21     .56     .92    –2        +1
Item 23     .51     .88    –2        +2

14-day interval
Item 11     .62     .92    –2        +2
Item 13     .61     .89    –2.1      +2
Item 17     .57     .91    –2        +1.1
Item 21     .57     .91    –2        +2
Item 23     .56     .88    –2        +2

19-day interval
Item 11     .53     .93    –2        +1.83
Item 13     .51     .91    –2        +2
Item 17     .50     .90    –2        +2
Item 21     .60     .91    –2        +2
Item 23     .55     .89    –2        +2

21-day interval
Item 11     .46     .86    –2        +2
Item 13     .48     .86    –2        +2
Item 17     .49     .86    –2        +2
Item 21     .54     .91    –2        +2
Item 23     .46     .86    –2        +2


TABLE 6
Stability Estimates for Fear of Upsetting Important Others Items

                  Proportion of Agreement
Item              Exact     ±1        2.5%ile   97.5%ile

2-day interval
  Item 3          .41       .82       –2        +3
  Item 6          .43       .83       –2        +2
  Item 9          .57       .92       –2        +2
  Item 14         .48       .87       –2        +2
  Item 19         .52       .90       –1        +2

5-day interval
  Item 3          .62       .90       –2        +2
  Item 6          .60       .90       –2        +2
  Item 9          .62       .93       –2        +1
  Item 14         .61       .93       –2        +2
  Item 19         .63       .94       –1        +2

7-day interval
  Item 3          .43       .82       –2        +2.6
  Item 6          .37       .81       –2        +2.6
  Item 9          .52       .85       –2        +2.6
  Item 14         .44       .89       –2        +2
  Item 19         .54       .90       –1        +2.6

14-day interval
  Item 3          .62       .93       –2        +2
  Item 6          .60       .92       –1        +2
  Item 9          .63       .92       –2        +2
  Item 14         .61       .94       –1.1      +2
  Item 19         .56       .89       –2        +2

19-day interval
  Item 3          .57       .91       –2        +2
  Item 6          .49       .92       –1        +2
  Item 9          .60       .93       –2        +1
  Item 14         .61       .93       –2        +2
  Item 19         .58       .94       –1        +2

21-day interval
  Item 3          .41       .79       –2        +2
  Item 6          .37       .79       –2        +3
  Item 9          .56       .87       –2        +2
  Item 14         .47       .88       –2        +2
  Item 19         .42       .89       –2        +2


TABLE 7
Stability Estimates for PFAI-S Items

                  Proportion of Agreement
Item              Exact     ±1        2.5%ile   97.5%ile

2-day interval
  Item 1          .53       .92       –2        +2
  Item 2          .56       .93       –1.05     +2
  Item 3          .64       .94       –1.05     +2
  Item 4          .53       .90       –2        +2
  Item 5          .52       .89       –2        +2

5-day interval
  Item 1          .63       .92       –2        +2
  Item 2          .67       .93       –2        +2
  Item 3          .68       .93       –2        +1.68
  Item 4          .65       .91       –2        +2
  Item 5          .61       .91       –2        +2

7-day interval
  Item 1          .57       .89       –2        +2
  Item 2          .54       .95       –1        +2
  Item 3          .62       .91       –2        +2
  Item 4          .55       .89       –2        +2
  Item 5          .53       .86       –2        +2

14-day interval
  Item 1          .66       .91       –2        +2
  Item 2          .65       .94       –1.2      +2
  Item 3          .60       .91       –1        +2
  Item 4          .60       .93       –1.2      +2
  Item 5          .56       .89       –2        +2

19-day interval
  Item 1          .56       .90       –1.85     +2
  Item 2          .61       .94       –1.85     +1.85
  Item 3          .62       .92       –2        +2
  Item 4          .58       .92       –2        +2
  Item 5          .50       .88       –2        +3

21-day interval
  Item 1          .55       .90       –2        +2
  Item 2          .50       .90       –1        +2
  Item 3          .56       .90       –2        +2
  Item 4          .52       .88       –2        +2
  Item 5          .52       .84       –2        +2


have been responsible for the change in means over time (i.e., with repeated administrations, participants become desensitized to item content). Researchers or consultants who employ the PFAI should be sensitive to these possibilities. Multiple assessments, inclusion of possible covariates to explain change, and careful planning of the timing of PFAI administration are recommended to provide more accurate estimates of score changes over time. Ultimately, additional research will be needed to rule out one or more of these hypotheses. Although some significant differences existed between item means across waves, observed effect sizes ranged from trivial to small magnitudes. Regardless of the reason for mean change or its magnitude, this average group trend was less important for our purposes than was intraindividual response stability for each item.

In general, items exhibited a similar level of intraindividual consistency across subscales. Interestingly, response stability tended to be highest when data from wave 1 were not involved (i.e., 5-day, 14-day, and 19-day time periods). The golf classes involved in this study had a skills test on the first day of data collection. Reports of dispositional fear of failure during wave 1 may have been biased by elevated state anxiety associated with these performance evaluations (see Conroy et al., 2003). Despite this unexpected pattern of stability across time periods, the overall patterns of response stability were similar for each item, so further interpretation will focus on identifying potentially problematic items.
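The wave 1 pattern can be checked against the published values. The following Python sketch is illustrative only: it copies the exact-agreement proportions from Table 4 and compares, for each item, the mean over the intervals that the text identifies as excluding wave 1 (5, 14, and 19 days) with the mean over the remaining intervals (2, 7, and 21 days). The variable and function names are hypothetical, and treating the remaining three intervals as all involving wave 1 is an assumption inferred from the text.

```python
# Exact-agreement proportions copied from Table 4 (fear of having an
# uncertain future). Keys are retest intervals in days; values are listed
# in item order 2, 5, 8, 12.
ITEMS = [2, 5, 8, 12]
EXACT = {
    2:  [.49, .49, .49, .39],
    5:  [.65, .63, .64, .45],
    7:  [.49, .46, .50, .40],
    14: [.67, .61, .60, .45],
    19: [.59, .57, .56, .44],
    21: [.47, .41, .42, .34],
}

# Per the text, the 5-, 14-, and 19-day intervals exclude wave 1.
# Assumption: the other three intervals all include wave 1.
WITHOUT_WAVE_1 = (5, 14, 19)
WITH_WAVE_1 = (2, 7, 21)


def mean_over(intervals, item_index):
    """Mean exact agreement for one item across the given intervals."""
    values = [EXACT[days][item_index] for days in intervals]
    return sum(values) / len(values)


# Map each item number to (mean with wave 1, mean without wave 1).
comparison = {
    item: (mean_over(WITH_WAVE_1, i), mean_over(WITHOUT_WAVE_1, i))
    for i, item in enumerate(ITEMS)
}

for item, (with_w1, without_w1) in comparison.items():
    print(f"Item {item}: with wave 1 = {with_w1:.2f}, without = {without_w1:.2f}")
```

For every item in this table, the mean proportion of exact agreement is higher over the intervals that exclude wave 1, which is consistent with the pattern described above.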

No items on the fear of experiencing shame and embarrassment, fear of important others losing interest, or fear of upsetting important others subscales or on the PFAI-S could be considered outliers with respect to response stability compared to other items on those subscales. The fear of devaluing one’s self-estimate subscale possessed the most consistent items across time periods. Items on the fear of having an uncertain future subscale were the least consistent items across time periods (although items on the fear of experiencing shame and embarrassment subscale exhibited a similar inconsistency). The fear of devaluing one’s self-estimate and the fear of having an uncertain future subscales also contained the two items associated with the lowest levels of response consistency. This finding was quite surprising considering the former subscale had the highest average stability estimates.

The least stable item on the PFAI was item 12 on the long form fear of having an uncertain future subscale (“When I am failing, I am not worried about it affecting my future plans”). Across the six time periods, item 12 exhibited the lowest mean proportions of exact and ±1 agreement (.41 and .76, respectively); no other item had mean proportions of exact and ±1 agreement less than .44 and .83, respectively. This item was the only negatively-scored item and has previously been identified as problematic based on its low squared multiple correlation in research on the structural validity of PFAI scores (Conroy et al., 2002). This new evidence from a separate dataset reinforces the notion that test developers may wish to reword item 12 to increase the consistency of its responses. Simply reorienting the question to be positively scored may be sufficient (e.g., removing the word “not”). Except for this item, the stability of other items on the fear of having an uncertain future subscale compared quite favorably with the rest of the PFAI items.
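The mean proportions reported for item 12 can be reproduced from the Table 4 entries. A brief Python check, with illustrative variable names, might look like this:

```python
# Item 12 proportions of agreement copied from Table 4, ordered by
# interval (2, 5, 7, 14, 19, and 21 days).
exact = [.39, .45, .40, .45, .44, .34]
within_one = [.73, .79, .72, .78, .81, .72]

# Unweighted means across the six time periods.
mean_exact = sum(exact) / len(exact)
mean_within_one = sum(within_one) / len(within_one)

print(f"mean exact agreement: {mean_exact:.2f}")
print(f"mean ±1 agreement:    {mean_within_one:.2f}")
```

Both means round to the values reported in the text (.41 and .76).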

The second problematic item identified in this research was item 16 (“When I am failing, I hate the fact that I am not in control of the outcome”). This finding was somewhat surprising because previous research on the psychometric properties of PFAI scores had not indicated any problems with this item (cf. Conroy, 2001; Conroy et al., 2002). The low stability of responses to this item may be attributable to the negative wording of the item even though that negative wording does not change the direction of item scoring. Although negatively-worded items are known to introduce systematic method variance into response patterns (Marsh, 1996; Motl & Conroy, 2000; Motl, Conroy, & Horan, 2000; Motl & DiStefano, 2002; Tomàs & Oliver, 1999), several other PFAI items included that negative wording, and negative statements are central to avoidance motivational orientations such as those associated with fear of failure. Thus, variance attributed to the negative wording alone seemed insufficient and inappropriate as an explanation of the inconsistent responses to item 16. A more likely explanation may involve the affective content of the item. Rather than rating a belief, this item uniquely required participants to rate an affective experience (i.e., hate). Affects are naturally expected to vary more than well-learned beliefs and it may not be optimal for a measure of appraisal patterns (cf., emotional traits, Vallerand & Blanchard, 2000) to use affective statements as stimuli. To address both of these plausible explanations, it may be worth rewording this item to read, “When I am failing, my lack of control over the outcome bothers me.” Aside from item 16, a relatively high level of response consistency characterized the remaining items on the fear of devaluing one’s self-estimate subscale across all time periods.

Although investigations of intraindividual response stability have not been common in the sport and exercise psychology literature, researchers have explored the stability of responses to the Social Physique Anxiety Scale (SPAS), another dispositional anxiety construct (Wilson & Batterham, 1999; Nevill et al., 2001). The average proportions of exact agreement for SPAS responses were .56 (Wilson & Batterham, 1999) and .49 (Nevill et al., 2001), compared to .52 for the long form PFAI items and .58 for the short form PFAI items. The mean proportion of ±1 agreement for SPAS responses was .89 (Nevill et al.) compared to estimates of .91 and .88 for the PFAI and PFAI-S items, respectively. The nonparametric 95% limits of agreement for long- and short-form PFAI responses (–1.77 to +2.02 and –1.88 to +2.08, respectively) were narrower than for 9-item (–2.07 to +2.27) or 12-item (–2.14 to +2.22) models of SPAS responses (Nevill et al., 2001). Thus, responses to the long and short form versions of the PFAI appeared to be slightly more stable than responses to the SPAS. This slight increase in stability was not too surprising considering the shorter time period used in this research. Future research employing longer time periods is warranted based on these results. Future research with athletes at higher competitive levels would also be informative because certain consequences of failing may be more or less salient at different competitive levels.
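For readers who want to compute comparable indices on their own test-retest data, the three statistics discussed here can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the responses are invented, the function names are hypothetical, differences are taken as retest minus test (the direction is an assumption), and the percentiles rely on the linear interpolation of the standard library's `statistics.quantiles`, which may differ from the interpolation rule behind the published tables.

```python
from statistics import quantiles


def differences(test, retest):
    """Retest-minus-test difference for each participant (assumed direction)."""
    return [b - a for a, b in zip(test, retest)]


def proportion_exact(diffs):
    """Share of participants who gave the identical response twice."""
    return sum(d == 0 for d in diffs) / len(diffs)


def proportion_within_one(diffs):
    """Share of participants whose two responses differ by at most one point."""
    return sum(abs(d) <= 1 for d in diffs) / len(diffs)


def nonparametric_limits(diffs):
    """2.5th and 97.5th percentiles of the differences."""
    # n=40 yields 39 cut points spaced 2.5 percentage points apart.
    cuts = quantiles(diffs, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Invented responses from ten participants to one item on two occasions.
test = [1, 2, 3, 4, 5, 2, 3, 4, 1, 5]
retest = [1, 3, 3, 2, 5, 2, 4, 4, 3, 4]

diffs = differences(test, retest)
exact = proportion_exact(diffs)
within_one = proportion_within_one(diffs)
lower, upper = nonparametric_limits(diffs)

print(exact, within_one, lower, upper)
```

With these invented data, half of the responses agree exactly and 80% agree within one point. Narrower percentile limits indicate more stable responses, which is the sense in which the PFAI limits are compared with the SPAS limits above.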

In conclusion, a reasonable level of intraindividual response stability to the vast majority of items on the PFAI and PFAI-S during three weeks of instructional courses was demonstrated. Two potentially problematic items were highlighted and several plausible explanations for the instability of responses to those items were advanced. Overall, the available evidence suggested that responses to PFAI and PFAI-S items are as stable as would be expected for the construct being measured. Based on these results and previous studies investigating the psychometric properties of PFAI scores, we strongly encourage researchers and consultants to consider this instrument when studying fear of failure or appraisal styles associated with fear of failure.

ACKNOWLEDGMENTS

We thank Jessica Miller and Jason Willow for their assistance with data collection.

REFERENCES

Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, I, 307–310.

Bland, J. M., & Altman, D. G. (1995). Comparing two methods of measurement: A personal history. International Journal of Epidemiology, 24 (Suppl. 1), S17–S24.

Conroy, D. E. (2001). Progress in the development of a multidimensional measure of fear of failure: The Performance Failure Appraisal Inventory (PFAI). Anxiety, Stress, & Coping, 14, 431–452.

Conroy, D. E., Metzler, J. N., & Hofer, S. M. (2003). Factorial invariance and latent mean stability of performance failure appraisals. Structural Equation Modeling, 10, 401–422.

Conroy, D. E., Poczwardowski, A., & Henschen, K. P. (2001). Evaluative criteria and consequences associated with failure and success for elite athletes and performing artists. Journal of Applied Sport Psychology, 13, 300–322.

Conroy, D. E., Willow, J. P., & Metzler, J. N. (2002). Multidimensional measurement of fear of failure: The Performance Failure Appraisal Inventory. Journal of Applied Sport Psychology, 14, 76–90.

Marsh, H. W. (1996). Positive and negative global self-esteem: A substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology, 70, 810–819.

McClelland, D. C. (1965). Toward a theory of motive acquisition. American Psychologist, 20, 321–333.

Motl, R. W., & Conroy, D. E. (2000). Validity and factorial invariance of the Social Physique Anxiety Scale. Medicine and Science in Sports and Exercise, 32, 1007–1017.

Motl, R. W., Conroy, D. E., & Horan, P. (2000). The Social Physique Anxiety Scale: An example of the potential consequences of negatively-worded items. Journal of Applied Measurement, 1, 327–345.

Motl, R. W., & DiStefano, C. (2002). Longitudinal invariance of self-esteem and method effects associated with negatively worded items. Structural Equation Modeling, 9, 562–578.

Nevill, A. M. (1996). Validity and measurement agreement in sports performance. Journal of Sports Sciences, 18, 569–570.


Nevill, A. M., Lane, A. M., Kilgour, L. J., Bowes, N., & Whyte, G. P. (2001). Stability of psychometric questionnaires. Journal of Sports Sciences, 19, 273–278.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.

Oppenheim, A. N. (1966). Questionnaire design and attitude measurement. London: Heinemann.

Schutz, R. W. (1998). Assessing the stability of psychological traits and measures. In J. L. Duda (Ed.), Advances in sport and exercise psychology measurement (pp. 393–408). Morgantown, WV: Fitness Information Technology.

Theebom, M., De Knop, P., & Weiss, M. R. (1995). Motivational climate, psychological responses, and motor skill development in children’s sport: A field-based intervention study. Journal of Sport & Exercise Psychology, 17, 294–311.

Tomàs, J. M., & Oliver, A. (1999). Rosenberg’s Self-Esteem Scale: Two factors or method effects. Structural Equation Modeling, 6, 84–98.

Vallerand, R. J., & Blanchard, C. M. (2000). The study of emotion in sport and exercise: Historical, definitional, and conceptual perspectives. In Y. L. Hanin (Ed.), Emotions in sport (pp. 3–37). Champaign, IL: Human Kinetics.

Vincent, W. J. (1995). Statistics in kinesiology. Champaign, IL: Human Kinetics.

Wilson, K., & Batterham, A. (1999). Stability of questionnaire items in sport and exercise psychology: Bootstrap limits of agreement. Journal of Sports Sciences, 17, 725–734.
